Investigate

Produces a root-cause report for a symptom: an alert, a service, a pattern, or a pasted log line. The report gives the onset time, the named cause with supporting evidence, the other metrics that moved at the same time, and kubectl, curl, or PromQL commands to verify the finding. It works for both sudden spikes (the pager just fired) and slow drift (a pattern worsening over weeks).

Example

"spike on payments-svc — root cause?"

Onset: 14:30 today. Payment_Gateway_Timeout jumped 200/min → 45,000/min.

Cause: CPU spike on db-replica-2 matched the onset (r=0.94). What moved with it: db.replica.cpu, apm.payments.latency, kafka.consumer.lag.

Verify: kubectl describe pod db-replica-2

More to ask

"why is Retry_Backoff_Exhausted firing?"
"slow drift in checkout-svc, last 30 days"
"full environment audit, last 30 days"

Prerequisites

This tool requires the Reporter to be deployed. Slow-drift investigations need continuous historical metrics, which CLI-only mode does not produce.

Tool schema (advanced)

Field	Type	Required	Default	Description
`starting_point`	string	yes	—	What to investigate, in the user's own words: a pasted log line, a pattern name, a service name, or one of the literal strings `environment`, `all`, or `audit` for an environment-wide sweep.
`window`	string	no	`1h`	Analysis window. `1h` for acute spikes; `30d` for drift. Accepts any PromQL range string. Alias: `timeRange`.
`timeRange`	string	no	—	Alias for `window` for consistency with the other tools. If both are set, `window` wins.
`depth`	string	no	`normal`	`shallow` = anchor service only. `normal` = anchor + immediate dependencies. `deep` = full environment-wide.
`baseline_offset`	string	no	`24h` or `window`	Offset of the baseline period that the current window is compared against. Defaults to `24h` for short windows (acute spikes) and to the `window` value for long windows (drift).
`use_bytes`	boolean	no	`false`	Use byte-based rate instead of event-count. Event-count is strongly preferred.
`environment`	string	no	—	Environment nickname — required in multi-env setups.
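The defaulting and precedence rules in this table can be summarized as a small resolver. This is a sketch assuming Python and single-unit durations such as `5m`, `1h`, or `30d` (PromQL also accepts compound forms like `1h30m`, which this sketch rejects). The table does not say where a "short" window ends, so the 24h cutoff below is an assumption, as are all function names.

```python
import re

# Seconds per duration unit (PromQL's ms and y units omitted for brevity).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

# ASSUMPTION: windows up to 24h count as "short" (acute-spike) windows.
SHORT_WINDOW_MAX_SECONDS = 24 * 3600

SWEEP_KEYWORDS = {"environment", "all", "audit"}
DEPTHS = {"shallow", "normal", "deep"}


def to_seconds(duration):
    """Convert a single-unit duration such as '30d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def is_sweep(starting_point):
    """True when starting_point is one of the literal sweep keywords."""
    return starting_point in SWEEP_KEYWORDS


def resolve_params(window=None, timeRange=None, baseline_offset=None, depth="normal"):
    """Apply the table's defaults: window wins over its alias timeRange,
    window falls back to 1h, and baseline_offset depends on window length."""
    if depth not in DEPTHS:
        raise ValueError(f"depth must be one of {sorted(DEPTHS)}, got {depth!r}")
    if window is not None:
        effective = window
    elif timeRange is not None:
        effective = timeRange
    else:
        effective = "1h"
    if baseline_offset is None:
        short = to_seconds(effective) <= SHORT_WINDOW_MAX_SECONDS
        baseline_offset = "24h" if short else effective
    return {"window": effective, "baseline_offset": baseline_offset, "depth": depth}
```

Under these assumptions, `resolve_params(timeRange="30d")` compares a 30d window against the 30 days before it, while a bare `resolve_params()` compares the last hour against the same hour one day earlier.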

Resolver fixes (2026-04-27). The pattern-exists probe now honors the user's window instead of a hardcoded `[5m]`, so a sparse pattern that fired heavily over a 7d window but has been silent for the last 5 minutes still resolves. When the pattern does not exist in the requested window, a 30d wide probe checks whether it exists at all; if it does, the report tells the SRE to widen the window instead of bouncing them to event lookup. Series missing the `message_pattern` label are now filtered out of the environment-audit movers, so `undefined` rows no longer appear.
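The probe-and-fallback order in these notes can be sketched as follows. Assumptions: the metric name `log_events_total` is hypothetical (only the `message_pattern` label comes from the notes), `pattern_exists` stands in for whatever runs a query against the metrics backend, series are shaped like entries in the Prometheus HTTP API `result` array (`{"metric": {...}, "value": [...]}`), and the return labels are illustrative.

```python
def probe_query(pattern, window):
    """Build a PromQL probe over the caller's window (not a fixed [5m]).

    HYPOTHETICAL metric name. The pattern is interpolated without escaping,
    so it must not contain double quotes or backslashes.
    A non-empty result means the pattern produced events in the window.
    """
    return f'sum(increase(log_events_total{{message_pattern="{pattern}"}}[{window}])) > 0'


def resolve_pattern(pattern, window, pattern_exists):
    """pattern_exists(query) -> bool runs a probe and reports a non-empty result."""
    if pattern_exists(probe_query(pattern, window)):
        return "resolved"
    # Wide probe: does the pattern exist at all over the last 30 days?
    if pattern_exists(probe_query(pattern, "30d")):
        return "widen_window"  # tell the SRE to widen the window
    return "not_found"


def drop_unlabeled(series):
    """Keep only series that carry a non-empty message_pattern label.

    In Prometheus an empty label value is equivalent to the label being absent.
    """
    return [s for s in series if s.get("metric", {}).get("message_pattern")]
```

Passing the backend call in as a function keeps the decision order testable without a live Prometheus: a fake that only answers for `[7d]` and `[30d]` ranges reproduces the sparse-pattern case from the notes.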