I’ve spent the better part of 15 years living inside SIEM programmes—building them, rescuing them, and occasionally shutting them down when the economics stopped making sense.
The pattern is consistent, and most organisations follow it almost instinctively: select a platform, onboard as much telemetry as possible, and work out the economics later. The differentiator is rarely the tool. It’s the operating model.
This path is common, but that doesn’t make it the right starting point.
What’s usually missing is a true zero step: a risk-based view of what actually matters before any logs are onboarded. Not assets in isolation, but business processes—and the assets, identities, data flows, and systems that support them.
Crucially, this is not just about identifying risk. It’s about understanding the cost of that risk.
A mature SIEM programme does not ask, “Is this risky?”
It asks, “What does this risk cost the business if it materialises?”
Only once risk is expressed in economic terms can it be meaningfully prioritised, aggregated, and compared. Over time, those individual risk costs can be summed, allowing organisations to reason about total exposure—and, just as importantly, to compare that exposure against the cost of mitigation.
This is where many SIEM programmes go wrong. They optimise for visibility first and attempt to justify cost later. By the time spend becomes visible, the operating model is already fixed, and course correction becomes difficult.
A more sustainable starting point forces different questions upfront: which business processes matter most, what their compromise would cost, and what it is worth spending to reduce that risk.
Without this foundation, coverage-driven SIEM programmes inevitably turn into cost-driven debates later. Not because the tools are expensive, but because the operating model was never aligned to the economics of risk in the first place.
There is a SIEM cost equation that almost nobody writes down but every organisation eventually experiences, and it needs to be more explicit than most vendor conversations allow.
In practice, SIEM TCO looks closer to this:
SIEM TCO = Platform Costs + People Costs + Change Costs
Platform Costs ≈ (Data Volume × Retention Depth) → compute/storage + SIEM licensing/consumption
People Costs = L1–L3 monitoring/investigation/engineering
Change Costs = implementation & integration + continuous development & tuning
This reflects reality far better than any simplified “cost per GB” discussion.
Data volume multiplied by retention depth defines your storage and processing baseline.
Licensing defines how that data is monetised by the platform.
Implementation effort covers onboarding, parsing, normalisation, and integrations.
Monitoring teams (L1–L3) represent the ongoing human cost of triage, investigation, and response.
And continuous development—detections, tuning, automation, platform changes—is not optional; it is the price of keeping SIEM relevant as the environment evolves.
Most organisations account for some of these costs. Very few account for all of them together.
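As a rough illustration, the equation can be made concrete in a few lines. Every figure below is an assumption, not a benchmark; the point is that the people and change terms are first-class components, not footnotes.

```python
# Illustrative SIEM TCO model. Every figure is an assumption; substitute
# your own contract rates, salaries, and volumes.
GB_PER_DAY = 300                 # average daily ingestion
RETENTION_DAYS = 90              # searchable retention depth
STORAGE_PER_GB_MONTH = 0.10      # assumed blended compute/storage rate
LICENSE_PER_GB = 2.00            # assumed consumption licensing per GB ingested

# Platform: (volume x retention) drives storage; volume drives licensing.
platform = (GB_PER_DAY * RETENTION_DAYS * STORAGE_PER_GB_MONTH * 12
            + GB_PER_DAY * 365 * LICENSE_PER_GB)

# People: L1-L3 monitoring, investigation, and engineering (assumed loaded costs).
headcount = {"L1": 6, "L2": 3, "L3": 2}
loaded = {"L1": 70_000, "L2": 95_000, "L3": 120_000}
people = sum(headcount[level] * loaded[level] for level in headcount)

# Change: amortised implementation/integration plus continuous tuning (assumed).
change = 150_000 + 100_000

print(f"platform {platform:,.0f} + people {people:,.0f} + change {change:,.0f} "
      f"= TCO {platform + people + change:,.0f}")
```

Even with these modest assumptions, the people term dominates the platform term, which is why the staffing discussion below treats it as structural rather than incidental.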
What’s usually missing entirely is the counterbalance: the economic cost of risk the SIEM is supposed to manage.
A useful starting approximation looks like this:
Cost of Risk = Likelihood × Business Impact
Where Business Impact is not abstract severity, but an estimated economic outcome: the cost of service disruption, regulatory breach, safety incidents, or large-scale compromise.
This doesn’t need to be perfect to be useful.
It needs to be directionally correct and consistently applied.
The problem most SIEM programmes face is not that costs are too high—it’s that they are precise, while risk remains vague.
SIEM invoices arrive monthly and are measured in currency.
Risk is discussed qualitatively, often without aggregation or comparison.
A mature operating model forces these two equations to meet.
Once risks are expressed in economic terms, they can be prioritised, aggregated, and compared against the cost of mitigation.
At that point, SIEM optimisation stops being a reactive cost-cutting exercise and becomes a mechanical decision-making process:
does the cost of detection, monitoring, and response make sense relative to the cost of the risk it addresses?
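A minimal sketch of that comparison, assuming a hypothetical risk register and an assumed monitoring budget (all numbers are invented for illustration):

```python
# Hypothetical risk register: annual likelihood x economic impact.
risks = {
    "payment_fraud": {"likelihood": 0.15, "impact": 2_000_000},
    "ransomware":    {"likelihood": 0.05, "impact": 5_000_000},
    "insider_exfil": {"likelihood": 0.10, "impact":   800_000},
}

# Cost of Risk = Likelihood x Business Impact, annualised per risk.
for r in risks.values():
    r["annual_cost"] = r["likelihood"] * r["impact"]

total_exposure = sum(r["annual_cost"] for r in risks.values())
monitoring_cost = 450_000  # assumed annual cost of detection and response

# A fuller model compares spend against the *reduction* in exposure the
# monitoring achieves, not against total exposure.
print(f"exposure {total_exposure:,.0f} vs monitoring {monitoring_cost:,.0f}")
```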
Without this linkage, SIEM programmes drift. Costs grow predictably, value remains implicit, and optimisation discussions eventually collapse into debates about tooling rather than economics.
Most SIEM budgets underestimate one thing consistently: the cost of people is not linear, optional, or easily optimised away. It is structural.
A simple reality check for mid-size organisations makes this clear.
Twenty-four by seven coverage means 168 hours per week.
A single full-time analyst realistically covers around 40 hours per week.
On paper, that suggests:
168 ÷ 40 = 4.2 FTE to staff a single continuous function.
In reality, once you account for holidays, sickness, training, attrition, escalation overhead, and the simple fact that humans cannot operate at full cognitive capacity indefinitely, the number is closer to 5–6 FTE to reliably sustain a single 24×7 role.
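The same arithmetic in code, with the availability factor made explicit (the 75% figure is an assumed rule of thumb, not a standard):

```python
# Back-of-envelope 24x7 staffing maths for one continuous seat.
HOURS_TO_COVER = 24 * 7      # 168 hours per week
CONTRACT_HOURS = 40          # what one FTE covers on paper

naive_fte = HOURS_TO_COVER / CONTRACT_HOURS    # 4.2

# Assumed effective availability (~75%) once holidays, sickness,
# training, and sustainable workload are accounted for. A rule of
# thumb, not a standard; substitute your own HR data.
AVAILABILITY = 0.75
realistic_fte = HOURS_TO_COVER / (CONTRACT_HOURS * AVAILABILITY)

print(f"naive: {naive_fte:.1f} FTE, realistic: {realistic_fte:.1f} FTE")  # 4.2 vs 5.6
```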
And this is before you talk about depth.
A functioning SIEM operation is not just a monitoring seat. It is a pipeline: L1 triage, L2 investigation, and L3 engineering.
Each layer absorbs different kinds of work, but they are tightly coupled. Noise at L1 creates backlog at L2. Weak engineering support at L3 increases noise everywhere else.
This is where SIEM economics often break down.
When organisations say “we’ll run it in-house,” what they usually mean is:
we’ll fund the tool, and we’ll see how much human effort we can absorb.
That model works only as long as expectations remain low.
The moment leadership expects faster response, deeper investigations, and continuous coverage, the human cost stops being marginal and becomes dominant.
And there is an uncomfortable truth here:
human decision-making capacity does not scale with telemetry growth.
Telemetry grows exponentially. Human attention grows linearly—at best.
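A toy projection makes the mismatch visible; both growth rates are assumptions chosen only to show the shape of the curve:

```python
# Toy projection: compounding telemetry vs linear analyst capacity.
# Both growth rates are assumptions chosen to show the shape, not forecasts.
telemetry = 100.0   # arbitrary units of daily signal
capacity = 100.0    # units one team can meaningfully review per day

for year in range(1, 6):
    telemetry *= 1.35   # assumed 35% annual telemetry growth
    capacity += 10.0    # assumed one extra analyst-equivalent per year
    unreviewed = max(telemetry - capacity, 0)
    print(f"year {year}: telemetry {telemetry:6.0f}  "
          f"capacity {capacity:5.0f}  unreviewed {unreviewed:6.0f}")
```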
This creates a hidden risk multiplier.
When alert volume exceeds analytical capacity, organisations don’t just pay more in salaries. They incur residual risk: alerts that are never triaged, investigations cut short, and signals that are quietly ignored.
From a risk perspective, this is critical.
Understaffed or overloaded SOCs do not simply cost less—they leave more risk unmanaged. That unmanaged risk has an economic cost, even if it is not immediately visible on a balance sheet.
This is why a mature SIEM optimisation programme must optimise two meters at the same time: the cost of the platform and the capacity of the people who operate it.
If you optimise only the tool and ignore the people, costs reappear as burnout, turnover, delayed response, and ultimately incidents.
At scale, SIEM TCO is not driven by how much data you ingest.
It is driven by how efficiently humans can convert that data into decisions.
And that efficiency is an operating model choice—not a tooling feature.
Most SIEM cost discussions eventually collapse into a single phrase:
“We need to reduce ingestion.”
That instinct is understandable—but incomplete.
Data volume is not just a cost driver. It defines the shape of your SOC, the way analysts think, prioritise, and ultimately make decisions.
A SOC that ingests everything will spend its life searching through everything. Not because analysts are inefficient, but because the system forces them into that mode. This is where volume turns into cognitive load.
Every additional data source increases parsing and normalisation effort, alert volume, query scope, and the cognitive load on the analysts who must reason over it.
At a certain point, the problem is no longer tooling performance or query speed. It becomes human throughput.
When telemetry volume grows faster than analytical capacity, SOC teams adapt in predictable ways: triage gets shallower, alerts are quietly suppressed, and investigations close on time pressure rather than evidence.
From a cost perspective, this is dangerous.
From a risk perspective, it is worse.
This is why the goal is not “ingest less data.” The goal is ingest with intent.
Intent means that every major data source is onboarded with a clear answer to three questions: which decisions it supports, which risks it addresses, and what it costs to collect, store, and analyse.
Without intent, data volume becomes self-justifying. Logs stay because they are already there. Retention grows because nobody wants to be responsible for removing something “important.” Over time, the SOC’s operating model shifts from risk-driven to volume-driven.
And once that happens, SIEM optimisation becomes reactive:
cut logs, reduce retention, suppress alerts—often without understanding which risks are being reintroduced.
A mature operating model reverses this logic. It starts with decisions and risk, and works backwards to data. Not every signal deserves real-time analytics. Not every log deserves premium storage. And not every possible correlation deserves an alert.
Data volume, when aligned to risk and decision-making, becomes manageable. When it is not, it shapes the SOC into something reactive, noisy, and expensive. And that shape, once established, is hard to change.
In practice, aligning data to risk means distinguishing three classes of telemetry, each with different economics.
1. Detection-grade data
Detection-grade data is the telemetry that directly protects business-critical processes and systems.
This is where SIEM exists first and foremost.
It powers real-time decisions around assets whose compromise would immediately translate into business, operational, or regulatory impact.
In practice, this includes telemetry from the identities, applications, and infrastructure that underpin business-critical processes.
Detection-grade data must be available in near real time, complete, and reliable enough to drive confident decisions.
From an economic perspective, this is where premium spend is justified.
Detection-grade data directly reduces the cost of high-impact risk: service disruption, regulatory breach, safety incidents, and large-scale compromise.
If detection fails at this layer, the organisation doesn’t just lose visibility — it loses control. And the downstream cost is measured not in alerts, but in business impact.
2. Investigation-grade data
Investigation-grade data exists to answer questions that cannot be fully anticipated at detection time.
This is the telemetry that supports depth, context, and confidence when something has already gone wrong—or when weak signals need to be validated before action is taken.
It typically includes the broader telemetry surrounding critical systems: context sources that add depth rather than primary detection signal.
Investigation-grade data is not about speed.
It is about explainability.
When incidents escalate to management, legal, or regulators, this is the data that allows security teams to explain what happened, how it happened, and what was affected.
From an economic perspective, investigation-grade data is where discipline matters most.
Treating it like detection-grade data—forcing it into premium, real-time analytics tiers—creates disproportionate cost without proportional risk reduction.
At the same time, under-retaining it creates a different kind of risk: inconclusive investigations, prolonged incident handling, and weak post-incident assurance.
The right balance accepts that investigation-grade data is accessed episodically, but when it is accessed, it must be complete, reliable, and searchable.
3. Compliance-grade data
Compliance-grade data exists to satisfy external obligations, not operational curiosity.
Its primary purpose is evidence.
This class typically includes audit trails, access records, and whatever else regulation or contract obliges the organisation to keep.
The defining characteristics of compliance-grade data are immutability, integrity, and long-term retrievability.
Speed is not the requirement.
Accuracy and trustworthiness are.
One of the most common and expensive mistakes in SIEM programmes is treating compliance-grade data as if it were operationally active. Doing so drives up costs while providing little to no improvement in security outcomes.
From a risk perspective, compliance-grade data mitigates a different category of risk: failed audits, regulatory penalties, and the inability to produce evidence when it is demanded.
These risks are real—but they are not reduced by faster queries or real-time correlation. They are reduced by correct retention, governance, and evidence handling.
In a mature operating model, compliance-grade data is deliberately separated from detection and investigation workflows. It is retained according to policy, protected against tampering, and accessed only when required.
When this separation is clear, compliance stops being a hidden cost driver and becomes a controlled, predictable component of SIEM TCO.
Retention strategy is where many SIEM programmes quietly lose control of their economics.
In a large number of environments, the effective policy is simple:
“Keep everything searchable for a year. Just in case.”
This feels cautious. In reality, it is one of the most expensive and least disciplined decisions organisations make.
Fear-based retention is not driven by risk analysis. It is driven by uncertainty and lack of clarity around data purpose.
When teams cannot clearly distinguish between detection-grade, investigation-grade, and compliance-grade data, the safest option appears to be treating everything the same. Everything stays hot. Everything stays searchable. Everything stays expensive.
Operationally, this is the SIEM equivalent of leaving every light on in a building 24×7 because you might walk into a room.
A cost-optimised retention model is intentionally boring, explicit, and engineered.
In practice, it looks like this:
Hot tier: a short, clearly defined window (typically 30 to 90 days) reserved for detection-grade data and a limited subset of high-value investigation-grade telemetry. This data supports real-time decisions, active incidents, and rapid response.
Warm tier: extended retention for investigation-grade data that is accessed episodically: threat hunting, forensic reconstruction, regulatory inquiries, or post-incident analysis. Speed is traded for cost efficiency without compromising completeness.
Archive tier: compliance-grade data retained for regulatory, legal, and audit purposes. Immutability, integrity, and long-term retrievability matter. Query performance does not.
Microsoft’s Sentinel architecture explicitly reflects this separation through its analytics tier and data lake tier. The principle, however, is vendor-agnostic. What matters operationally is not where data is stored, but why it is stored there.
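A rough comparison of fear-based retention against this tiered design, using placeholder storage rates (not vendor prices):

```python
# "Everything hot for a year" vs a tiered design. Per-GB monthly rates
# are placeholders, not vendor prices.
GB_PER_DAY = 300
HOT, WARM, ARCHIVE = 0.30, 0.05, 0.01   # assumed $/GB-month by tier

def monthly_cost(days_hot, days_warm, days_archive):
    # Steady-state data held in a tier = daily volume x days retained there.
    return (GB_PER_DAY * days_hot * HOT
            + GB_PER_DAY * days_warm * WARM
            + GB_PER_DAY * days_archive * ARCHIVE)

fear_based = monthly_cost(365, 0, 0)    # everything searchable, everything hot
tiered = monthly_cost(60, 305, 365)     # 60d hot, rest warm, full-year archive copy

print(f"Fear-based: {fear_based:,.0f}/month  Tiered: {tiered:,.0f}/month")
```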
Retention is an engineering design decision, not a finance argument and not a compliance panic response.
Each retention choice introduces an explicit trade-off between access speed and cost.
In a mature operating model, those trade-offs are intentional and documented. They are directly linked back to the risks they accept and the obligations they satisfy.
When retention is driven by fear, SIEM costs grow without improving security outcomes.
When retention is driven by purpose, costs become predictable, defensible, and aligned to risk.
Alert fatigue is often discussed as a human problem: stress, burnout, morale. Those effects are real—but they are secondary. At its core, alert fatigue is an economic problem. Every alert consumes time. Every triage decision consumes cognitive capacity. Every false positive converts directly into labour cost—and indirectly into unmanaged risk. If an alert does not lead to a decision or an action, its return on investment is negative.
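A back-of-envelope sketch of what that negative return looks like, with all inputs assumed:

```python
# What noise costs in labour alone. Every input is an assumption.
ALERTS_PER_DAY = 500
FALSE_POSITIVE_RATE = 0.80     # share of alerts that lead to no action
MINUTES_PER_TRIAGE = 10
HOURLY_COST = 45.0             # loaded analyst cost per hour (assumed)

fp_hours_per_year = (ALERTS_PER_DAY * FALSE_POSITIVE_RATE
                     * MINUTES_PER_TRIAGE / 60) * 365
fp_cost = fp_hours_per_year * HOURLY_COST

# At roughly 1,880 productive hours per FTE-year, this is ~13 FTE of
# effort spent on alerts that produce no decision and no action.
print(f"{fp_hours_per_year:,.0f} hours/year, {fp_cost:,.0f} in labour")
```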
Most SIEM environments accumulate alerts over time that no one explicitly owns. They persist because they are already there, because nobody wants to be responsible for disabling something that might matter, and because ownership was never assigned in the first place.
The result is predictable.
As alert volume grows, organisations pay twice: once in the labour spent on triage, and again in the risk that slips past overloaded analysts.
This is where the economics of alerting intersect directly with the economics of risk. When analysts are overloaded, decisions slow down. When decisions slow down, detection latency increases. When detection latency increases, the cost of incidents rises, even if the same incidents are eventually detected.
This is why reducing noise is not an optimisation exercise. It is a risk reduction activity.
Independent studies illustrate this clearly. For example, Forrester’s Total Economic Impact analysis for Microsoft Sentinel highlighted that automation and improved signal quality can reduce false positives by up to 79% and reduce the effort required for advanced investigations by up to 85% in the composite organisation studied.
The specific numbers matter less than the direction.
Less noise means faster decisions, lower labour cost, and shorter detection latency.
In a mature operating model, alerting is governed like any other production system.
Each detection has an owner, a defined purpose, and a measurable cost.
Alerts that do not justify their operational and economic cost are tuned, automated, or retired.
Without this discipline, SIEM optimisation efforts stall. Costs remain visible and measurable, while risk reduction remains assumed.
And when that happens, organisations cut logs instead of fixing the alerting model—often reintroducing risk they never intended to accept.
As of 2026, SIEM selection is less about feature parity and more about how the platform scales with your infrastructure and security operating model.
All major platforms—IBM QRadar, Microsoft Sentinel, Elastic, and Splunk—solve the same core problem. What differs is how they monetise it. Each licensing model encourages certain operational behaviours and penalises others. Over time, those incentives—not headline pricing—determine total cost of ownership.
QRadar was designed for environments where stability and predictability matter more than rapid elasticity. It remains a common choice for large enterprises and regulated sectors, often in on-premise or tightly controlled hybrid deployments.
Its economics are built around fixed capacity limits: licensing is tied to events per second (EPS) and flows per minute (FPM), purchased as capacity tiers.
This model offers budget predictability. Organisations generally know what they will pay year over year. The trade-off is scalability: sustained growth in telemetry typically requires purchasing additional capacity or redesigning ingestion pipelines.
QRadar performs well when telemetry volumes are stable, growth is predictable, and the environment changes slowly.
In highly dynamic or cloud-native environments, however, QRadar must be carefully engineered. Without early filtering and strict flow control, licensing quickly shifts toward peak capacity rather than actual security value.
Microsoft Sentinel is a fully cloud-native SIEM built on Azure. Its economics are driven by data consumption, not infrastructure.
The core model is straightforward: you pay for what you ingest and retain, with pay-as-you-go pricing per gigabyte and discounted commitment tiers as volumes grow.
A material differentiator is included Microsoft telemetry. Azure Activity Logs, Office 365 Audit Logs, and Defender alerts can be ingested at no cost or with significant allowances. For Microsoft-centric environments, this changes the economics of baseline detection coverage.
The most important cost control lever in Sentinel is explicit data tiering: routing high-value telemetry to the analytics tier and lower-value telemetry to cheaper destinations such as the data lake tier.
Sentinel is economically viable when it is operated with discipline: pre-ingestion filtering, continuous review of sources, and cost treated as an operational metric. Without that discipline, cloud flexibility turns into unpredictable spend.
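A simplified sketch of that consumption model, comparing pay-as-you-go against hypothetical commitment tiers; the rates are placeholders, so check current Azure pricing rather than reusing them:

```python
# Simplified Sentinel-style consumption comparison. Rates are placeholders;
# check current Azure pricing before using any of these numbers.
GB_PER_DAY = 120
PAYG_RATE = 4.30                       # assumed $/GB pay-as-you-go
COMMIT_TIERS = {100: 3.20, 200: 3.00}  # assumed GB/day commitment: $/GB

def tier_daily(commit_gb: int, rate: float) -> float:
    # You pay for the commitment even when ingesting less; overage is
    # billed at the tier's effective rate (simplified model).
    return max(commit_gb, GB_PER_DAY) * rate

options = {"payg": GB_PER_DAY * PAYG_RATE}
options |= {f"commit_{c}GB": tier_daily(c, r) for c, r in COMMIT_TIERS.items()}
best = min(options, key=options.get)
print({k: round(v) for k, v in options.items()}, "->", best)
```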
Elastic offers the most flexibility—and places the most responsibility on the organisation.
In Elastic Cloud, pricing is resource-based (CPU, memory, storage). In self-managed deployments, costs are tied to cluster size, node count, and subscription level. There is no direct per-gigabyte ingestion fee.
This allows large volumes of data to be retained at relatively low cost when access patterns are infrequent. The trade-off is operational complexity.
Elastic requires deliberate cluster design, index lifecycle management, and ongoing capacity and performance engineering.
Elastic works well for organisations with very high telemetry volumes and strong engineering teams. It becomes risky when “open source” is assumed to mean “low effort.” In practice, cost is shifted from licensing to payroll and operational risk.
Splunk remains one of the most powerful analytics platforms in the SIEM space, but its economics are unforgiving.
Traditional pricing is based on daily ingest volume. Newer workload-based models (such as Splunk Virtual Compute) shift cost toward compute consumption, but the underlying principle remains the same: anything indexed is expensive.
Splunk rewards discipline about what gets indexed: aggressive filtering, routing low-value data to cheaper destinations, and indexing only what genuinely supports decisions.
Splunk is rarely “too expensive” by default. It becomes expensive when organisations avoid hard decisions about which data actually supports security decisions.
The difference between SIEM platforms is not which one is “cheapest.”
It is which operating behaviours they reinforce.
When a SIEM is perceived as expensive, it is usually a signal that the operating model is misaligned with the platform’s economic incentives—not that the wrong tool was chosen.
At some point, every organisation trying to optimise SIEM cost arrives at the same crossroads:
do we continue to run this ourselves, or does a managed model make more sense?
This is often framed as a cultural or control discussion. In reality, it is an economic decision. Managed SIEM is not inherently better. In-house SIEM is not inherently cheaper. Each model becomes rational—or irrational—based on how the cost of risk, data, and human effort intersect.
Whether you are starting from scratch or resetting an existing SIEM estate, the work of reducing SIEM cost without degrading detection outcomes almost always follows the same sequence.
Step 1: Map cost to value before touching the platform
For every data source, table, or index, answer four questions: which detections it powers, which investigations it supports, which compliance obligations it satisfies, and what it costs to keep.
If a data source cannot be tied to detection, investigation, or compliance, it does not belong in an expensive tier.
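A minimal sketch of this mapping as data, with hypothetical sources and an assumed blended cost rate:

```python
# Step 1 as data: annotate every source with purpose, detections, and cost,
# then flag what cannot justify a premium tier. All entries are hypothetical.
sources = [
    {"name": "ad_signin",  "gb_day": 20, "detections": 12, "purpose": "detection"},
    {"name": "fw_allow",   "gb_day": 90, "detections": 0,  "purpose": "investigation"},
    {"name": "db_audit",   "gb_day": 15, "detections": 0,  "purpose": "compliance"},
    {"name": "debug_logs", "gb_day": 60, "detections": 0,  "purpose": None},
]

COST_PER_GB_DAY_YEAR = 700.0  # assumed annual cost of one GB/day ingested

for s in sources:
    s["annual_cost"] = s["gb_day"] * COST_PER_GB_DAY_YEAR
    # Not tied to detection, investigation, or compliance -> review it.
    review = s["purpose"] is None
    print(f"{s['name']:<11} {s['gb_day']:>3} GB/d  "
          f"{s['annual_cost']:>9,.0f}/yr  {'REVIEW' if review else 'ok'}")
```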
Step 2: Enforce data tiering at the pipeline level
Optimisation happens upstream, not in dashboards.
Detection-grade data belongs in real-time analytics tiers.
Investigation-grade data belongs in cheaper, searchable storage.
Compliance-grade data belongs in immutable archives.
This is an architectural decision, not a financial one.
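A sketch of what this routing looks like upstream of the SIEM; the class assignments and destination names are assumptions for illustration:

```python
# Step 2 as code: classify at the pipeline, route before ingestion.
# Class assignments and destination names are illustrative assumptions.
ROUTES = {
    "detection":     "analytics_tier",     # real-time, premium
    "investigation": "cheap_search_tier",  # searchable, slower, cheaper
    "compliance":    "immutable_archive",  # evidence, not queries
}

SOURCE_CLASS = {
    "ad_signin": "detection",
    "fw_allow":  "investigation",
    "db_audit":  "compliance",
}

def route(event: dict) -> str:
    """Return the storage destination for an event, based on its source."""
    data_class = SOURCE_CLASS.get(event["source"], "investigation")  # safe default
    return ROUTES[data_class]

assert route({"source": "ad_signin"}) == "analytics_tier"
assert route({"source": "db_audit"}) == "immutable_archive"
```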
Step 3: Treat detections as an engineering lifecycle
Noise reduction is not tuning once—it is continuous governance.
Every detection should have an owner, a documented purpose, and a measurable cost-to-value ratio.
Detections that consistently fail to justify their cost should be automated, redesigned, or removed.
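A minimal sketch of that lifecycle test, with an invented detection and an assumed cost tolerance:

```python
# Step 3: each detection is a governed asset with an explicit economic test.
# Fields, numbers, and the tolerance threshold are illustrative.
from dataclasses import dataclass

@dataclass
class Detection:
    name: str
    owner: str
    alerts_per_month: int
    true_positives_per_month: int
    minutes_per_triage: int

    def cost_per_true_positive(self, hourly_rate: float = 45.0) -> float:
        hours = self.alerts_per_month * self.minutes_per_triage / 60
        if self.true_positives_per_month == 0:
            return float("inf")  # pure noise: infinite cost per outcome
        return hours * hourly_rate / self.true_positives_per_month

d = Detection("impossible_travel", "detection-eng", 900, 2, 8)
if d.cost_per_true_positive() > 2_000:  # assumed tolerance per true positive
    print(f"{d.name}: tune, automate, or retire")
```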
Step 4: Monitor cost as an operational metric
In mature environments, cost visibility is part of SOC reporting: cost per data source, cost per detection, and cost per investigated incident, reviewed alongside operational metrics.
This shifts optimisation from ad-hoc reactions to deliberate decisions.
Step 5: Use platform-native cost levers
Each SIEM provides mechanisms to control cost. Fighting the platform usually fails. Aligning with it works.
Managed SIEM becomes attractive when the human cost, not the tool cost, dominates.
Common triggers include the inability to sustain 24×7 staffing, persistent attrition in scarce skill sets, and alert backlogs that never clear.
At that point, the question is no longer “Can we run SIEM ourselves?”
It becomes “Can we economically sustain the operating model we need?”
Managed SIEM is only effective when trade-offs are acknowledged upfront.
Managed providers standardise to achieve scale. That lowers cost, but it also limits bespoke complexity, and it raises governance questions:
Who decides what data is ingested?
Who owns detection quality?
Who accepts residual risk introduced by filtering or tiering?
A serious managed model provides visibility into what is filtered, what is retained, which detections exist, and which risks remain with the customer.
Without this, managed SIEM becomes a black box—and black boxes are expensive in a different way.
When managed well, the operating model splits into three continuous workstreams:
Security data engineering
Pipeline design, parsing, enrichment, tiering, and health monitoring.
This is where data volume and cost are controlled.
Detection engineering
Threat modelling, rule lifecycle management, automation, and response logic.
This is where analyst effort and alert quality are optimised.
SOC operations
24×7 triage, escalation discipline, communication, and reporting.
This is where time-to-decision and business confidence are earned.
A managed SIEM should not simply “run the tool.”
It should continuously reduce the cost of converting telemetry into decisions.
SIEM TCO optimisation is not about austerity. It is about alignment.
Organisations that succeed can clearly explain what data they collect and why, what it costs, and which risks that spend reduces.
When that clarity exists, SIEM costs become predictable—and defensible.
When it doesn’t, optimisation turns into log cutting, and risk quietly returns through the side door.