Investigate
Root-cause analysis on a service or pattern, and cross-pillar correlation between log patterns and APM / infra / business metrics.
"spike on payments-svc: what's driving it?"
Root cause: `Payment_Gateway_Timeout` jumped 200/min → 45,000/min at 14:30. CPU spike on `db-replica-2` matches.

"co-movers?"
Datadog: `db.replica.cpu` (r=0.94) · `apm.payments.latency` (r=0.91) · `kafka.consumer.lag` (r=0.87).

"verify"
`kubectl describe pod db-replica-2`: node pressure, OOM-killed twice in last hour.

| You ask | Example answer |
|---|---|
| spike on payments-svc: what's driving it? | Root cause: `Payment_Gateway_Timeout` jumped 200/min → 45,000/min at 14:30. CPU spike on `db-replica-2` matches. Verify: `kubectl describe pod db-replica-2`. |
| re-show investigation inv_a1b2 | Full report by ID (session-local, 50-item cache) |
| what moves with Payment_Gateway_Timeout? | Datadog: `db.replica.cpu` (r=0.94) · `apm.payments.latency` (r=0.91) · `kafka.consumer.lag` (r=0.87). 3 noise hits filtered. |
| log patterns behind apm.payments.latency? | `Payment_Gateway_Timeout` (r=0.91) · `DB_Connection_Refused` (r=0.83) · `Retry_Exhausted` (r=0.71) |
| `apm_request_duration_p99{service="payments-svc"}` | Direct passthrough to your Datadog / Grafana / Prometheus endpoint |
| join key for logs ↔ metrics? | Found `service` (matches in 87% of overlap). Used by Correlate / Translate automatically. |
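
To make the `r=` values in the example answers easier to read, here is a minimal sketch of how co-movers could be ranked. It assumes the scores are Pearson correlation coefficients computed over time-aligned per-minute series, which this page does not confirm. The function names, the `min_abs_r` threshold, and the sample numbers are all hypothetical and are not part of the product.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat series carries no co-movement signal; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_co_movers(target, candidates, min_abs_r=0.7):
    """Return (name, r) pairs with |r| >= min_abs_r, strongest first.

    The threshold is an illustrative stand-in for noise filtering.
    """
    scored = [(name, pearson_r(target, series)) for name, series in candidates.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= min_abs_r]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    # Made-up per-minute samples, for illustration only.
    timeouts_per_min = [200, 210, 45000, 44000]
    metrics = {
        "db.replica.cpu": [20, 22, 95, 93],
        "flat.metric": [5, 5, 5, 5],
    }
    for name, r in rank_co_movers(timeouts_per_min, metrics):
        print(f"{name} (r={r:.2f})")
```

A high coefficient only indicates that two series move together over the sampled window; it does not establish which one drives the other, which is why the example answers pair it with a verification step.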
Prerequisites
Investigate and Resume need the Reporter deployed. The metric-correlation tools (Correlate, Translate, Query, Join keys) additionally need an APM / infra metric endpoint linked — point LOG10X_CUSTOMER_METRICS_URL at your Grafana, Datadog, or Prometheus instance.
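
As a rough pre-flight check, the sketch below reads `LOG10X_CUSTOMER_METRICS_URL` from the environment and confirms it looks like an HTTP(S) URL. Only the variable name comes from this page. The helper `metrics_endpoint` and the example hostname are hypothetical, and the product may validate the value differently.

```python
import os
from urllib.parse import urlparse


def metrics_endpoint(env=None):
    """Return the configured metrics URL, or None when it is unset.

    Raises ValueError when the value is set but is not an http(s) URL.
    """
    env = os.environ if env is None else env
    raw = env.get("LOG10X_CUSTOMER_METRICS_URL", "").strip()
    if not raw:
        return None
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {raw!r}")
    return raw


if __name__ == "__main__":
    url = metrics_endpoint()
    if url is None:
        print("LOG10X_CUSTOMER_METRICS_URL is not set; metric-correlation tools need it.")
    else:
        print(f"Metrics endpoint configured: {url}")
```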