KV Store

Validating the KV Store, diagnosing "Consume KV" silent failures, monitoring capacity, recovering from template/event ordering issues, and distributed-cluster setup.

How do I validate that KV Store is working correctly

Quick Health Check (run all three):

  1. Verify KV collection exists:

    | rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
    | search title="tenx_dml"
    
    Expected: Returns 1 result. If 0 results, collection wasn't created.

  2. Check KV store population:

    | inputlookup tenx-dml-lookup | stats count
    
    Expected: Shows N (number of templates). If 0, no templates loaded yet.

  3. Verify "Consume KV" scheduled search is running:

    index=_internal sourcetype=scheduler savedsearch_name="Consume KV"
    | stats latest(status) as status, latest(_time) as last_run by savedsearch_name
    
    Expected: status=success, last_run within last 2 minutes.

If any check fails, see troubleshooting below.
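The three checks above can be folded into one scripted gate. A minimal Python sketch, assuming you fetch each value yourself (e.g. via Splunk's REST API, not shown here); only the pass/fail logic from the manual checks is encoded:

```python
import time

def evaluate_health(collection_exists, template_count,
                    last_status, last_run_epoch,
                    now=None, max_age_sec=120):
    """Mirror the three manual health checks; return (ok, failures).

    Inputs are assumed to come from the searches above; the REST
    wiring that fetches them is not part of this sketch.
    """
    now = time.time() if now is None else now
    failures = []
    if not collection_exists:
        failures.append("KV collection tenx_dml not found")
    if template_count == 0:
        failures.append("no templates loaded into tenx-dml-lookup")
    if last_status != "success":
        failures.append("Consume KV last status: %s" % last_status)
    elif now - last_run_epoch > max_age_sec:
        failures.append("Consume KV has not run in the last 2 minutes")
    return (len(failures) == 0, failures)
```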

"Consume KV" scheduled search is failing silently

The "Consume KV" search populates templates from tenx_dml index into the KV Store. If it fails, templates won't be available for expansion.

Diagnostic procedure:

Step 1: Check scheduler logs
index=_internal sourcetype=scheduler savedsearch_name="Consume KV"
| table _time, status, result_count, alert_action
| stats latest(*) as * by status

Common failure modes:

| Status | Cause | Fix |
|---|---|---|
| error | Search syntax error in saved search | Edit saved search "Consume KV" and verify query syntax |
| success / count=0 | No templates in tenx_dml index | Run: `index=tenx_dml \| stats count`; if 0, send templates via HEC |
| failure | Alert action (tenx_dml_to_kv.py) failed | Check: `index=_internal sourcetype=action_handler savedsearch_name="Consume KV"` |
| No results | Search never ran | Verify the scheduler is enabled (Settings > Searches, reports, and alerts) |

Recovery steps:

1. Verify templates exist:
   index=tenx_dml sourcetype=tenx_dml_raw_json | stats count

2. Force immediate execution:
   Click saved search "Consume KV" > Run
   (Or use: | savedsearch "Consume KV")

3. Wait 2 minutes and verify population:
   | inputlookup tenx-dml-lookup | stats count
   (Should show > 0)

4. If still 0, check KV collection exists:
   | rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
   | search title="tenx_dml"

How do I monitor KV Store size and capacity

KV Store size affects search performance. Monitor it proactively:

Monthly capacity check:

| inputlookup tenx-dml-lookup
| stats count as num_templates, max(timestamp_format) as latest_update

Recommended capacity limits:

| Template Count | Action | Performance |
|---|---|---|
| < 100K | No action needed | Excellent (< 5ms lookup) |
| 100K-500K | Monitor monthly | Good (5-20ms lookup) |
| 500K-1M | Plan optimization | Fair (20-50ms lookup) |
| > 1M | Contact engineering | Needs partitioning |
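The thresholds in the table can be encoded directly, e.g. in an alerting script that consumes the monthly capacity check. A sketch (function name and return shape are illustrative):

```python
def capacity_tier(num_templates):
    """Map a template count to the recommended action and expected
    lookup performance from the capacity table above."""
    if num_templates < 100_000:
        return ("no action needed", "excellent (< 5ms lookup)")
    if num_templates < 500_000:
        return ("monitor monthly", "good (5-20ms lookup)")
    if num_templates < 1_000_000:
        return ("plan optimization", "fair (20-50ms lookup)")
    return ("contact engineering", "needs partitioning")
```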

If approaching 1M templates:

Option 1: Archive old templates (move to secondary collection)

| inputlookup tenx-dml-lookup
| search timestamp_format < "2024-01-01"
| ... (export to archive)

Option 2: Partition templates across multiple collections

Create: tenx_dml_2024, tenx_dml_2025, etc.
Route by year in your expansion macro
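The year-based routing can be as simple as deriving the collection name from each template's timestamp before writing it. A sketch (function and field names are illustrative, not part of the app):

```python
from datetime import datetime

def collection_for(timestamp_iso):
    """Pick the per-year collection (tenx_dml_2024, tenx_dml_2025, ...)
    from a template's ISO-8601 timestamp."""
    year = datetime.fromisoformat(timestamp_iso).year
    return "tenx_dml_%d" % year
```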

Monitor expansion latency:

index=<your-index> sourcetype=tenx_encoded
| `tenx-inflate`
| eval inflate_latency_sec = round(now() - _time, 3)
| stats avg(inflate_latency_sec) as inflate_latency_sec

If latency > 1 second, KV Store may be oversized.

What if I accidentally send encoded events before templates are loaded

If encoded events arrive before templates, expansion will fail silently until templates load.

Prevention:

Always verify template population BEFORE sending encoded events:

# Wait for this to return > 0:
| inputlookup tenx-dml-lookup | stats count
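In an automated pipeline, that wait can be a small polling gate in front of the sender. A sketch; `get_count` stands in for however you query the lookup count (REST, SDK, etc.):

```python
import time

def wait_for_templates(get_count, timeout_sec=180, poll_sec=10):
    """Poll until the KV store reports at least one template, or time out.

    get_count: callable returning the current template count
    (the Splunk query behind it is assumed, not shown).
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        count = get_count()
        if count > 0:
            return count
        time.sleep(poll_sec)
    raise TimeoutError("KV store still empty; hold encoded events")
```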

Recovery (if already happened):

  1. Load the missing templates:
     - Re-send template data via HEC (same format as before)
     - Wait 2-3 minutes for "Consume KV" to process

  2. Re-index the encoded events (optional)

    # If using Kubernetes:
    kubectl delete pod <forwarder-pod-name>  # Triggers reprocessing
    
    # If using file-based forwarder:
    # Delete offset tracking file, restart forwarder
    

  3. Verify recovery:

    | index=<your-index> sourcetype=tenx_encoded
    | head 10 | `tenx-inflate`
    # Should now return expanded events
    

Distributed KV Store setup for multi-node Splunk clusters

For production Splunk clusters, KV Store can be:

- Replicated (HA across nodes)
- Partitioned (scaled across multiple collections)

For 3-node Splunk cluster:

In a search head cluster, KV Store automatically replicates to all members (no special config). To verify:

# On each node:
| rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
| search title="tenx_dml"
| table title, eai:acl.app, eai:acl.sharing

All three nodes should return the same collection.

Performance optimization for distributed setup:

# In app's local/collections.conf (or via REST):

[tenx_dml]
field.pattern_hash = string
field.pattern = string
# Index pattern_hash for faster lookups:
accelerated_fields.pattern_hash_accel = {"pattern_hash": 1}

This creates an index on pattern_hash (faster expansion macro joins).

For very large clusters (10+ nodes):

Consider isolating KV Store load. Splunk does not provide a supported "KV-Store-only node" role; instead, tune the KV Store itself via the [kvstore] stanza in $SPLUNK_HOME/etc/system/local/server.conf and keep heavy ad-hoc search load off the members that serve template lookups (see Splunk's server.conf documentation for the available [kvstore] settings).

Monitoring cluster KV Store health:

| rest splunk_server=* /servicesNS/nobody/tenx-for-splunk/storage/collections/data/tenx_dml
| stats count as templates by splunk_server

All members should report the same count (healthy replication).
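That equality check is easy to automate once you have per-node counts in hand. A sketch (gathering the counts is assumed, e.g. from the search above):

```python
def replication_healthy(counts_by_node):
    """True when every cluster member reports the same number of KV
    records for the collection, i.e. replication has converged."""
    return len(set(counts_by_node.values())) <= 1
```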