How this is measured

Every rule we ship that has a mapped Atomic Red Team test is exercised by a real attack technique against a live TinySocs install, on a weekly schedule, in public. This page explains exactly what the numbers on the dashboard mean, including the unflattering parts.

What runs

We use Atomic Red Team — an open library of small, self-contained tests, each mapped to a MITRE ATT&CK technique. For each rule in the detection pack that has a mapped technique, the harness:

1. executes the Atomic Red Team test for that technique on the test workstation (e.g. simulating failed logons for brute force, or dumping LSASS memory for credential access);
2. waits, up to a timeout, for the TinySocs detection pipeline to process the resulting Windows events;
3. queries OpenSearch for an alert carrying one of the expected rule IDs;
4. records the outcome, with a timestamp and the commit that produced it.

The harness source is open: the runner is tests/Test-AtomicDetection.ps1, and the technique→rule mapping is in tests/atomic-tests.yaml.
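The OpenSearch lookup is an ordinary search request. A minimal Python sketch of the request body, assuming alerts store the rule ID in a `rule.id` field and the alert time in `@timestamp` (both field names are hypothetical, as are any rule IDs you pass in); the real PowerShell harness may build its query differently:

```python
from datetime import datetime, timezone  # timezone is used in the usage note below
from typing import Dict, Iterable


def build_alert_query(expected_rule_ids: Iterable[str], started_at: datetime) -> Dict:
    """Build an OpenSearch query DSL body that matches alerts for any of the
    expected rules, raised at or after the moment the atomic test started.

    Assumed (hypothetical) alert schema: `rule.id` holds the rule ID and
    `@timestamp` holds the alert time.
    """
    return {
        "size": 1,  # a single matching alert is enough to count as a detection
        "query": {
            "bool": {
                # filter context: pure yes/no matching, no relevance scoring
                "filter": [
                    # match any of the rules mapped to this technique
                    {"terms": {"rule.id": list(expected_rule_ids)}},
                    # ignore alerts left over from earlier runs
                    {"range": {"@timestamp": {"gte": started_at.isoformat()}}},
                ]
            }
        },
    }
```

For example, `build_alert_query(["rule-bruteforce-01"], datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))` yields a body whose range filter starts at `2024-01-01T12:00:00+00:00`. Bounding the search by the test's start time keeps a stale alert from a previous run from turning a miss into a pass.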

What counts as a pass

A test passes when at least one of its expected rules fires an alert within the timeout. That is detection at the technique level: the attack happened, and the pack caught it. If the test ran and no expected rule fired in time, it's a miss.
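The pass rule is an any-of check with a deadline. A minimal Python sketch, assuming a caller-supplied function `fetch_fired_rule_ids()` (hypothetical) that returns the set of rule IDs that have alerted since the test started; the default poll interval is illustrative, not the harness's real setting:

```python
import time
from typing import Callable, Iterable, Set


def wait_for_detection(
    expected_rule_ids: Iterable[str],
    fetch_fired_rule_ids: Callable[[], Set[str]],
    timeout_s: float,
    poll_interval_s: float = 5.0,
) -> bool:
    """Return True as soon as any expected rule has fired, False on timeout."""
    expected = set(expected_rule_ids)
    deadline = time.monotonic() + timeout_s
    while True:
        # Any overlap counts: one expected rule firing is a technique-level detection.
        if expected & set(fetch_fired_rule_ids()):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Timed out with nothing fired. Calling this a miss is the caller's
            # decision, and only valid if the test actually ran.
            return False
        # Sleep until the next poll, but never past the deadline.
        time.sleep(min(poll_interval_s, remaining))
```

Using an intersection rather than "all expected rules fired" is what makes this a technique-level check: a pack with two overlapping rules for one technique passes if either one catches the attack.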

The categories

Each result lands in exactly one bucket. The dashboard colours by these:

Category	Meaning
Detected	An expected rule fired. The pack caught the attack.
Missed	A rule should have fired and didn't. The only alarming category. Every miss gets a written postmortem.
Skip — platform	The test environment can't run this test (e.g. it needs a Domain Controller or the FIM module). Not a rule defect.
Skip — prerequisite	An environmental precondition isn't met (e.g. Tamper Protection is on, blocking a Defender change). Expected behaviour, not a defect.
Error	The harness itself hit a problem (a network fault, an Atomic Red Team install issue, or a failed query). Investigated, but not a rule failure.

A skipped test is not a failed rule: skips come from the test environment, not from a broken detection. Because we run on a single workstation, anything that needs a Domain Controller or a feature we don't have installed is skipped, while the rule itself ships unchanged. The dashboard keeps skips visually distinct from misses for exactly this reason.
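One way to guarantee that every result lands in exactly one bucket, and that a skip can never be reported as a miss, is to check conditions in a fixed order. A minimal Python sketch; the `RunOutcome` fields, the lowercase category values, and the precedence order are assumptions for illustration, not the harness's actual data model:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    DETECTED = "detected"
    MISSED = "missed"
    SKIP_PLATFORM = "skip_platform"
    SKIP_PREREQUISITE = "skip_prerequisite"
    ERROR = "error"


@dataclass
class RunOutcome:
    harness_error: Optional[str] = None       # e.g. a failed OpenSearch query
    platform_missing: Optional[str] = None    # e.g. "needs a Domain Controller"
    prerequisite_unmet: Optional[str] = None  # e.g. "Tamper Protection is on"
    detected: bool = False                    # an expected rule fired within the timeout


def classify(outcome: RunOutcome) -> Category:
    # Harness problems first: the result says nothing about the rule.
    if outcome.harness_error:
        return Category.ERROR
    # Skips next: the test never exercised the rule in this environment.
    if outcome.platform_missing:
        return Category.SKIP_PLATFORM
    if outcome.prerequisite_unmet:
        return Category.SKIP_PREREQUISITE
    # Only a test that actually ran can be Detected or Missed.
    return Category.DETECTED if outcome.detected else Category.MISSED
```

Because `Missed` is only reachable after every skip and error condition has been ruled out, the one alarming category is reserved for tests that genuinely ran and caught nothing.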

The venue

Validation currently runs on a single Windows 11 workstation with TinySocs, OpenSearch, and Sysmon installed. That's an honest limit: rules that need a Domain Controller (e.g. NTDS extraction) or the optional FIM module show as platform-skips here, not passes. Multi-platform validation (Windows Server, additional OS versions) is planned, not yet live.

Coverage, named

Not every rule in the pack has an Atomic Red Team test yet. The dashboard counts and lists the untested rules rather than hiding them, and we add coverage on a weekly cadence. A rule without a test is marked untested — it is not counted as a pass.
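Finding untested rules takes only a set difference between the pack and the technique→rule mapping. A minimal Python sketch, assuming the mapping has been loaded into a dict from ATT&CK technique ID to expected rule IDs; the dict shape and the rule IDs are hypothetical, and tests/atomic-tests.yaml may be structured differently:

```python
from typing import Dict, Iterable, List, Set


def untested_rules(pack_rule_ids: Iterable[str], mapping: Dict[str, List[str]]) -> Set[str]:
    """Rules in the pack that no Atomic Red Team test expects to fire."""
    covered = {rule for rules in mapping.values() for rule in rules}
    return set(pack_rule_ids) - covered


if __name__ == "__main__":
    # Hypothetical rule IDs; T1110.001 is Password Guessing, T1003.001 is LSASS Memory.
    mapping = {
        "T1110.001": ["rule-bruteforce-01"],
        "T1003.001": ["rule-lsass-access-01", "rule-lsass-dump-02"],
    }
    pack = ["rule-bruteforce-01", "rule-lsass-access-01", "rule-lsass-dump-02", "rule-usb-01"]
    print(sorted(untested_rules(pack, mapping)))  # ['rule-usb-01']
```

Computing the untested list from the mapping, rather than maintaining it by hand, means a rule can only leave the list by gaining a test.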

When we miss

A first-time miss is the failure mode that matters, so we handle it in the open: same-week investigation, a written postmortem committed alongside the results, and a link to that postmortem from the affected rule. "Yes, we missed; here's why; here's the fix" is the whole point of publishing this.

The raw data

Every weekly run is committed as a JSON file under results/, which you can read or diff. The dashboard is built from those files and nothing else; there's no separate, prettier source of truth.
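Because the dashboard is derived only from these files, anyone can recompute its headline counts. A minimal Python sketch, assuming a hypothetical per-run schema with a `commit` field and a `results` list whose entries carry a `category` string; the actual files in results/ may use different field names and values:

```python
import json
from collections import Counter
from typing import Dict


def summarize_run(raw_json: str) -> Dict[str, int]:
    """Count results per category for one weekly run file."""
    run = json.loads(raw_json)
    return dict(Counter(result["category"] for result in run["results"]))


if __name__ == "__main__":
    # Hypothetical run file contents; the commit hash is a placeholder.
    sample = json.dumps({
        "commit": "abc1234",
        "results": [
            {"technique": "T1110.001", "category": "detected"},
            {"technique": "T1003.001", "category": "detected"},
            {"technique": "T1003.003", "category": "skip_platform"},
        ],
    })
    print(summarize_run(sample))  # {'detected': 2, 'skip_platform': 1}
```

Running the same function over two consecutive run files and comparing the dictionaries gives a quick week-over-week diff without going through the dashboard at all.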
