Find skew

Find patterns where a single value dominates one slot, so the dominant case can be sampled without losing signal. Returns a ranked list of {patternIdentity, skewedSlot, dominantValue, dominantPct, samplingOpportunityPct} and chains to Mitigate for the top findings.

Distinct from analyzer field histograms: skew is measured within each structural pattern (per field set), so dominance is still detected when a slot is `get` 78% of the time inside one pattern but balanced across the environment as a whole.
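To make the per-pattern measurement concrete, here is a minimal sketch. It assumes events already carry a pattern label and parsed slot values; `SlotEvent`, `SlotSkew`, and `dominantPerPattern` are hypothetical names for illustration, not the engine's templating or its API.

```typescript
// Hypothetical shapes: the real engine derives patterns by templating raw events.
type SlotEvent = { pattern: string; slots: Record<string, string> };
type SlotSkew = { pattern: string; slot: string; dominantValue: string; dominantPct: number };

function dominantPerPattern(events: SlotEvent[]): SlotSkew[] {
  // counts[pattern][slot][value] = occurrences; totals[pattern] = events in that pattern
  const counts: Record<string, Record<string, Record<string, number>>> = {};
  const totals: Record<string, number> = {};
  for (const ev of events) {
    totals[ev.pattern] = (totals[ev.pattern] || 0) + 1;
    const bySlot = counts[ev.pattern] || (counts[ev.pattern] = {});
    for (const slot of Object.keys(ev.slots)) {
      const byValue = bySlot[slot] || (bySlot[slot] = {});
      const value = ev.slots[slot];
      byValue[value] = (byValue[value] || 0) + 1;
    }
  }
  const out: SlotSkew[] = [];
  for (const pattern of Object.keys(counts)) {
    for (const slot of Object.keys(counts[pattern])) {
      const byValue = counts[pattern][slot];
      let dominantValue = "";
      let best = 0;
      for (const value of Object.keys(byValue)) {
        if (byValue[value] > best) { best = byValue[value]; dominantValue = value; }
      }
      // Fraction of the pattern's events that carry the most common value.
      out.push({ pattern, slot, dominantValue, dominantPct: best / totals[pattern] });
    }
  }
  return out;
}

// Across both patterns `verb` is split 4 get / 4 post, yet each pattern is 75% one value.
const mixed: SlotEvent[] = [
  { pattern: "A", slots: { verb: "get" } },
  { pattern: "A", slots: { verb: "get" } },
  { pattern: "A", slots: { verb: "get" } },
  { pattern: "A", slots: { verb: "post" } },
  { pattern: "B", slots: { verb: "post" } },
  { pattern: "B", slots: { verb: "post" } },
  { pattern: "B", slots: { verb: "post" } },
  { pattern: "B", slots: { verb: "get" } },
];
const perPattern = dominantPerPattern(mixed);
```

A global histogram over `mixed` would show no dominant verb, while grouping by pattern first surfaces a dominant value in each group.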

Example

You

skew in this batch of 1,800 events

Log10x

3 skew findings. Top:

pattern	top skewed slot	dominant value	dominant %	sampling opportunity (1/N)
`Payment_Gateway_Timeout`	`verb`	`get`	78%	70%
`HealthCheck_Pass`	`endpoint`	`/healthz`	96%	86%
`AddItemAsync`	`tenant`	`acme-corp`	64%	58%

Sample the get case in Payment_Gateway_Timeout at 1/10 → save ~70% of bytes without losing signal.
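The sampling-opportunity column is consistent with a simple estimate: keeping 1 in N of the dominant case drops roughly dominantPct × (1 − 1/N) of the pattern. The sketch below is an inference from the table's numbers, not code from the engine, and `samplingOpportunityPct` is a hypothetical helper. It also assumes events are of similar size, so a share of events approximates a share of bytes.

```typescript
// Assumed formula, inferred from the example table (not taken from the engine source):
// sampling the dominant case at 1/N removes dominantPct * (1 - 1/N) of the pattern.
function samplingOpportunityPct(dominantPct: number, sampleN: number): number {
  if (dominantPct < 0 || dominantPct > 100) {
    throw new RangeError("dominantPct must be between 0 and 100");
  }
  if (sampleN < 2) {
    throw new RangeError("sampleN must be at least 2");
  }
  return Math.round(dominantPct * (1 - 1 / sampleN));
}

// The three rows of the example table at N = 10.
const opportunities = [78, 96, 64].map(pct => samplingOpportunityPct(pct, 10));
// opportunities is [70, 86, 58]
```

Under this reading, a larger N raises the savings only up to the dominant percentage itself, since the non-dominant events are never sampled.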

More to ask

"skew over 80%, top 10 only"
"check this Slack dump for sampling opportunities"
"skew but only patterns with 100+ events"

Prerequisites

None beyond a local install: the 10x engine runs on the pasted events, so tenx 1.0.22 or later must be installed.

Schema and samples

Input example

Real call against the demo env (captured by scripts/capture-tool-envelopes.mjs).

{
  "events": [
    "audit verb=get user=u0 status=200",
    "audit verb=get user=u1 status=200",
    "audit verb=get user=u2 status=200",
    "audit verb=get user=u3 status=200",
    "audit verb=get user=u4 status=200",
    "audit verb=get user=u5 status=200",
    "audit verb=get user=u6 status=200",
    "audit verb=get user=u7 status=200",
    "audit verb=post user=u9 status=201",
    "audit verb=delete user=u10 status=204"
  ],
  "min_concentration": 0.6,
  "top_n": 5,
  "view": "summary"
}

Input schema

Agent-facing JSON Schema (the canonical shape the MCP server publishes via tools/list):

{
  "type": "object",
  "properties": {
    "events": {
      "type": "array",
      "items": {},
      "description": "Events to analyze for slot skew. Same shape as log10x_resolve_batch: raw strings or JSON objects. Each event is templated locally; skew is computed across the resulting patterns."
    },
    "min_concentration": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "default": 0.6,
      "description": "Minimum dominant-value fraction for a slot to be flagged as skewed. Default 0.6 (a slot is \"skewed\" when one value is 60%+ of events). Hand-picked default tagged as `unvalidated_default` in the output. Compare against the `observed_dominant_pct_distribution` in `threshold_audit` to judge whether 0.6 is well above or below this dataset's noise."
    },
    "top_n": {
      "type": "number",
      "minimum": 1,
      "maximum": 50,
      "default": 20,
      "description": "Number of findings to return. Default 20."
    },
    "min_events": {
      "type": "number",
      "minimum": 1,
      "default": 10,
      "description": "Minimum events per pattern to bother checking. Default 10 (filters low-sample noise)."
    },
    "sample_n": {
      "type": "number",
      "minimum": 2,
      "default": 10,
      "description": "Sampling rate N for the savings projection (1/N of the dominant case kept). Default 10. Same calibration caveat: sample_n=10 is a defensible starting point but not validated for any specific cost target."
    }
  },
  "required": [
    "events"
  ],
  "additionalProperties": false
}

Source: src/tools/find-skew.ts.
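For callers that build arguments in code, the schema can be mirrored as a TypeScript type with the documented defaults and bounds applied. This is a sketch only: `FindSkewArgs` and `withDefaults` are hypothetical names, not exports of src/tools/find-skew.ts. The `view` key appears in the input example but not in the published properties, so it is left out here.

```typescript
// Mirrors the published input schema; only `events` is required.
interface FindSkewArgs {
  events: unknown[];
  min_concentration?: number; // 0..1, default 0.6
  top_n?: number;             // 1..50, default 20
  min_events?: number;        // >= 1, default 10
  sample_n?: number;          // >= 2, default 10
}

// Hypothetical helper: fills in schema defaults and enforces the schema bounds.
function withDefaults(args: FindSkewArgs): Required<FindSkewArgs> {
  if (!Array.isArray(args.events)) {
    throw new TypeError("events must be an array");
  }
  const resolved = {
    events: args.events,
    min_concentration: args.min_concentration ?? 0.6,
    top_n: args.top_n ?? 20,
    min_events: args.min_events ?? 10,
    sample_n: args.sample_n ?? 10,
  };
  if (resolved.min_concentration < 0 || resolved.min_concentration > 1) {
    throw new RangeError("min_concentration must be between 0 and 1");
  }
  if (resolved.top_n < 1 || resolved.top_n > 50) {
    throw new RangeError("top_n must be between 1 and 50");
  }
  if (resolved.min_events < 1) {
    throw new RangeError("min_events must be at least 1");
  }
  if (resolved.sample_n < 2) {
    throw new RangeError("sample_n must be at least 2");
  }
  return resolved;
}

const resolvedArgs = withDefaults({ events: ["audit verb=get user=u0 status=200"], top_n: 5 });
```

Validating on the client side is optional, since the server publishes the same constraints, but it surfaces out-of-range values before a call is made.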

Output example

Real envelope from the demo env. view: "summary" returns the full StructuredOutput with typed data. Long arrays and base64 PNG bodies are trimmed for readability; the real call returns them in full.

Headline (the 1-line agent-facing answer):

No skewed slots found above the threshold.

{
  "schema_version": "1.0",
  "schema_epoch": "2026-05-25",
  "tool": "log10x_find_skew",
  "generated_at": "2026-05-26T15:38:16.519Z",
  "view": "summary",
  "summary": {
    "headline": "No skewed slots found above the threshold.",
    "bullets": []
  },
  "data": {
    "findings": []
  },
  "actions": [],
  "truncated": false,
  "warnings": []
}

Output schema

The data block inside the StructuredOutput envelope:

interface ToolData {
  findings: unknown[];
}

Envelope-level fields the agent should also read: summary.headline (1-line answer), actions[] (next-call chain hints as {tool, args, reason}), truncated: boolean, images[] (PNG attachments where applicable), schema_epoch (engine-ID stability boundary).
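A consumer might read those envelope fields along the following lines. The `Envelope` type is reconstructed from the output example above and is an assumption, as are the element types of `bullets` and `warnings` (both are empty arrays in the example); `describeEnvelope` is a hypothetical helper.

```typescript
// Assumed shapes, reconstructed from the example envelope rather than a published type.
interface ChainAction { tool: string; args: Record<string, unknown>; reason: string }
interface Envelope {
  schema_version: string;
  schema_epoch: string;
  tool: string;
  generated_at: string;
  view: string;
  summary: { headline: string; bullets: string[] };
  data: { findings: unknown[] };
  actions: ChainAction[];
  truncated: boolean;
  warnings: string[];
  images?: unknown[];
}

// Collects the fields an agent would read first: headline, truncation, next calls, warnings.
function describeEnvelope(env: Envelope): string[] {
  const lines = [env.summary.headline];
  if (env.truncated) {
    lines.push("(output truncated)");
  }
  for (const action of env.actions) {
    lines.push("next: " + action.tool + " (" + action.reason + ")");
  }
  for (const warning of env.warnings) {
    lines.push("warning: " + warning);
  }
  return lines;
}

// The example envelope from this page, with no findings and no follow-up actions.
const emptyEnvelope: Envelope = {
  schema_version: "1.0",
  schema_epoch: "2026-05-25",
  tool: "log10x_find_skew",
  generated_at: "2026-05-26T15:38:16.519Z",
  view: "summary",
  summary: { headline: "No skewed slots found above the threshold.", bullets: [] },
  data: { findings: [] },
  actions: [],
  truncated: false,
  warnings: [],
};
const described = describeEnvelope(emptyEnvelope);
```

For the empty envelope, `described` contains only the headline, since there are no actions, warnings, or truncation to report.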
