KV Store

Validating the KV Store, diagnosing "Consume KV" silent failures, monitoring capacity, recovering from template/event ordering issues, and distributed-cluster setup.

How do I validate that KV Store is working correctly

Quick Health Check (run all three):

  1. Verify KV collection exists:

    | rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
    | search title="tenx_dml"
    
    Expected: Returns 1 result. If 0 results, collection wasn't created.

  2. Check KV store population:

    | inputlookup tenx-dml-lookup | stats count
    
    Expected: Shows N (number of templates). If 0, no templates loaded yet.

  3. Verify "Consume KV" scheduled search is running:

    index=_internal sourcetype=scheduler savedsearch_name="Consume KV"
    | stats latest(status) as status, latest(_time) as last_run by savedsearch_name
    
    Expected: status=success, last_run within last 2 minutes.
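
The pass/fail rules in the three checks above can be sketched in Python as one function. This is an illustrative sketch, not part of the app: HealthInputs and evaluate_health are hypothetical names, and the values are assumed to come from running the three searches by whatever means you prefer.

```python
from dataclasses import dataclass

# "last_run within last 2 minutes", taken from check 3 above
MAX_AGE_SEC = 120


@dataclass
class HealthInputs:
    collection_rows: int    # number of results from check 1
    template_count: int     # count from check 2
    last_status: str        # status from check 3
    last_run_epoch: float   # last_run from check 3 (epoch seconds)
    now_epoch: float        # current time, passed in to keep the check deterministic


def evaluate_health(h: HealthInputs) -> list[str]:
    """Return a description of each failed check; an empty list means healthy."""
    problems = []
    if h.collection_rows != 1:
        problems.append("collection tenx_dml not found")
    if h.template_count <= 0:
        problems.append("no templates loaded")
    if h.last_status != "success":
        problems.append(f"Consume KV status is {h.last_status!r}")
    if h.now_epoch - h.last_run_epoch > MAX_AGE_SEC:
        problems.append("Consume KV has not run in the last 2 minutes")
    return problems
```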

If any check fails, see troubleshooting below.

"Consume KV" scheduled search is failing silently

The "Consume KV" search populates the KV Store with templates from the tenx_dml index. If it fails, templates are not available for expansion.

Diagnostic procedure:

Step 1: Check scheduler logs
index=_internal sourcetype=scheduler savedsearch_name="Consume KV"
| table _time, status, result_count, alert_actions
| stats latest(*) as * by status

Common failure modes:

Status              Cause                                     Fix
error               Search syntax error in saved search       Edit saved search "Consume KV" and verify query syntax
success / count=0   No templates in tenx_dml index            Run: index=tenx_dml | stats count. If 0, send templates via HEC
failure             Alert action (tenx_dml_to_kv.py) failed   Check: index=_internal sourcetype=action_handler savedsearch_name="Consume KV"
No results          Search never ran                          Verify the scheduler is enabled (Settings > Scheduled Searches)
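
The table above can also be read as a small decision function. The Python sketch below only restates the table; triage is a hypothetical helper, and passing None as the status is an assumed way of representing "no results from the scheduler search".

```python
from typing import Optional


def triage(status: Optional[str], result_count: Optional[int] = None) -> str:
    """Map the latest scheduler status to the fix suggested in the table."""
    if status is None:
        # The scheduler search returned no rows at all
        return "Search never ran: verify the scheduler is enabled"
    if status == "error":
        return 'Syntax error: edit saved search "Consume KV" and verify the query'
    if status == "failure":
        return "Alert action tenx_dml_to_kv.py failed: check the action handler logs"
    if status == "success" and result_count == 0:
        return "No templates in tenx_dml: send templates via HEC"
    return "No known failure mode matched"
```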

Recovery steps:

1. Verify templates exist:
   index=tenx_dml sourcetype=tenx_dml_raw_json | stats count

2. Force immediate execution:
   Click saved search "Consume KV" > Run
   (Or use: | savedsearch "Consume KV")

3. Wait 2 minutes and verify population:
   | inputlookup tenx-dml-lookup | stats count
   (Should show > 0)

4. If still 0, check KV collection exists:
   | rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
   | search title="tenx_dml"

How do I monitor KV Store size and capacity

KV Store size affects search performance. Monitor it proactively:

Monthly capacity check:

| inputlookup tenx-dml-lookup
| stats count as num_templates, max(timestamp_format) as latest_update

Growth and partitioning guidance:

The KV Store holds one row per template, so its size grows with the number of templates rather than with event volume. Track num_templates over time and watch expansion latency (query below). If lookup latency climbs well above your normal baseline as the template count grows, archive old templates. Splitting templates across several collections is not an option here, because the tenx-inflate macro reads a single collection.

If the template count keeps growing:

Archive old templates by exporting them out of the collection and removing them from tenx_dml:

| inputlookup tenx-dml-lookup
| search timestamp_format < "2024-01-01"
| ... (export to archive)

The tenx-inflate macro reads a single collection (tenx_dml), so keep all active templates in that one collection rather than splitting them.

Monitor expansion latency:

index=<your-index> sourcetype=tenx_encoded
| `tenx-inflate`
| stats avg(eval(round(now() - _time, 3))) as inflate_latency_sec

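Establish a baseline for inflate_latency_sec when the collection is small, then watch for it rising well above that baseline as the template count grows. Note that the value is the age of the events at search time (now() minus _time), so it also reflects the search time range and any indexing delay. Treat it as a rough proxy and compare only searches run over equivalent time ranges.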
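
The baseline comparison can be sketched as a small Python check. The text does not define "well above", so the factor of 2.0 below is an assumed placeholder to tune for your environment, and latency_regressed is a hypothetical helper name.

```python
from statistics import mean
from typing import Sequence


def latency_regressed(
    baseline_samples: Sequence[float],
    current_sec: float,
    factor: float = 2.0,  # assumed threshold, not a value from the app
) -> bool:
    """True when current_sec exceeds the mean baseline by more than `factor`."""
    if not baseline_samples:
        raise ValueError("need at least one baseline sample")
    return current_sec > mean(baseline_samples) * factor
```

Collect baseline_samples from several runs of the latency search while the collection is small, then pass later readings as current_sec.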

What if I accidentally send encoded events before templates are loaded

If encoded events arrive before templates, expansion will fail silently until templates load.

Prevention:

Always verify template population BEFORE sending encoded events. Wait until this search returns a count greater than 0:

| inputlookup tenx-dml-lookup | stats count
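
If the wait is automated, for example in a deployment script, it could look like the Python sketch below. get_count is a hypothetical callable that you supply. It is assumed to run the count search above by whatever means you use and return the number. The timeout and interval defaults are likewise assumptions.

```python
import time
from typing import Callable


def wait_for_templates(
    get_count: Callable[[], int],
    timeout_sec: float = 300.0,
    interval_sec: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_count() until it returns > 0 or the timeout elapses."""
    waited = 0.0
    while True:
        if get_count() > 0:
            return True   # templates are loaded; safe to send encoded events
        if waited >= timeout_sec:
            return False  # gave up; do not send encoded events yet
        sleep(interval_sec)
        waited += interval_sec
```

The sleep function is injectable so the loop can be exercised without real delays.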

Recovery (if already happened):

  1. Load the missing templates
     - Re-send template data via HEC (same format as before)
     - Wait ~2 minutes for "Consume KV" to process (it runs every 2 minutes)

  2. Re-index the encoded events (optional)

    # If using Kubernetes:
    kubectl delete pod <forwarder-pod-name>  # Triggers reprocessing
    
    # If using file-based forwarder:
    # Delete offset tracking file, restart forwarder
    

  3. Verify recovery:

    index=<your-index> sourcetype=tenx_encoded
    | head 10 | `tenx-inflate`

    Expected: Returns expanded events.
    

Distributed KV Store setup for multi-node Splunk clusters

For production Splunk clusters, KV Store can be:
- Replicated (HA across nodes)
- Partitioned (scaled across multiple collections)

For a multi-node Splunk cluster:

KV Store replication across search heads follows your cluster's KV Store configuration, so confirm the tenx_dml collection is present and consistent on each node:

| rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
| search title="tenx_dml"
| table splunk_server, title, eai:acl.sharing, eai:acl.perms.read, eai:acl.perms.write

Each node should return the same collection.

Collection schema:

The tenx_dml collection is defined in the app's default/collections.conf. The tenx-inflate macro reads these fields at search time:

[tenx_dml]
field.pattern_hash = string
field.pattern = string
field.pattern_parts = array
field.part_0 = string
field.pattern_terminator = string
field.timestamp_format = string
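
To sanity-check a row against this schema before or after loading, a sketch like the following Python function may help. It assumes every listed field is required and that rows are available as plain dictionaries. schema_problems is a hypothetical helper, not part of the app.

```python
# Field names and types copied from the [tenx_dml] stanza above
EXPECTED_TYPES = {
    "pattern_hash": str,
    "pattern": str,
    "pattern_parts": list,   # "array" in collections.conf
    "part_0": str,
    "pattern_terminator": str,
    "timestamp_format": str,
}


def schema_problems(row: dict) -> list[str]:
    """Return one message per missing or mistyped field; empty means the row fits."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            actual = type(row[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems
```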

Monitoring cluster KV Store health:

Run the following on each search head and compare the results:

| inputlookup tenx-dml-lookup
| stats count as templates

The count should be the same on every node (healthy replication).
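
Once the per-node counts are collected, the comparison itself can be sketched in Python. The node names in the usage line are hypothetical, and treating the most common count as the expected value is an assumption made for illustration.

```python
from collections import Counter


def replication_mismatches(counts_by_node: dict[str, int]) -> dict[str, int]:
    """Return the nodes whose template count differs from the most common count."""
    if not counts_by_node:
        return {}
    expected, _ = Counter(counts_by_node.values()).most_common(1)[0]
    return {node: n for node, n in counts_by_node.items() if n != expected}
```

For example, replication_mismatches({"sh1": 100, "sh2": 100, "sh3": 97}) flags only sh3. An empty result means all nodes agree.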