KV Store

Validating the KV Store, diagnosing "Consume KV" silent failures, monitoring capacity, recovering from template/event ordering issues, and distributed-cluster setup.

How do I validate that KV Store is working correctly

Quick Health Check (run all three):

  1. Verify KV collection exists:

    | rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
    | search title="tenx_dml"
    
    Expected: Returns 1 result. If 0 results, collection wasn't created.

  2. Check KV store population:

    | inputlookup tenx-dml-lookup | stats count
    
    Expected: Shows N (number of templates). If 0, no templates loaded yet.

  3. Verify "Consume KV" scheduled search is running:

    index=_internal sourcetype=scheduler savedsearch_name="Consume KV"
    | stats latest(status) as status, latest(_time) as last_run by savedsearch_name
    
    Expected: status=success, last_run within last 2 minutes.

If any check fails, see troubleshooting below.
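The three checks above can be folded into one scripted gate. A minimal Python sketch, assuming you fetch each value yourself (e.g. via Splunk's REST API, not shown here); only the pass/fail logic from the manual checks is encoded:

```python
import time

def evaluate_health(collection_exists, template_count,
                    last_status, last_run_epoch,
                    now=None, max_age_sec=120):
    """Mirror the three manual health checks; return (ok, failures).

    Inputs are assumed to come from the searches above; the REST
    wiring that fetches them is not part of this sketch.
    """
    now = time.time() if now is None else now
    failures = []
    if not collection_exists:
        failures.append("KV collection tenx_dml not found")
    if template_count == 0:
        failures.append("no templates loaded into tenx-dml-lookup")
    if last_status != "success":
        failures.append("Consume KV last status: %s" % last_status)
    elif now - last_run_epoch > max_age_sec:
        failures.append("Consume KV has not run in the last 2 minutes")
    return (len(failures) == 0, failures)
```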

"Consume KV" scheduled search is failing silently

The "Consume KV" search populates templates from tenx_dml index into the KV Store. If it fails, templates won't be available for expansion.

Diagnostic procedure:

Step 1: Check scheduler logs
index=_internal sourcetype=scheduler savedsearch_name="Consume KV"
| table _time, status, result_count, alert_action
| stats latest(*) as * by status

Common failure modes:

| Status | Cause | Fix |
|---|---|---|
| error | Search syntax error in saved search | Edit saved search "Consume KV" and verify query syntax |
| success / count=0 | No templates in tenx_dml index | Run: `index=tenx_dml \| stats count`; if 0, send templates via HEC |
| failure | Alert action (tenx_dml_to_kv.py) failed | Check: `index=_internal sourcetype=action_handler savedsearch_name="Consume KV"` |
| No results | Search never ran | Verify the scheduler is enabled (Settings > Searches, reports, and alerts) |

Recovery steps:

1. Verify templates exist:
   index=tenx_dml sourcetype=tenx_dml_raw_json | stats count

2. Force immediate execution:
   Click saved search "Consume KV" > Run
   (Or use: | savedsearch "Consume KV")

3. Wait 2 minutes and verify population:
   | inputlookup tenx-dml-lookup | stats count
   (Should show > 0)

4. If still 0, check KV collection exists:
   | rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
   | search title="tenx_dml"

How do I monitor KV Store size and capacity

KV Store size affects search performance. Monitor it proactively:

Monthly capacity check:

| inputlookup tenx-dml-lookup
| stats count as num_templates, max(timestamp_format) as latest_update

Recommended capacity limits:

| Template Count | Action | Performance |
|---|---|---|
| < 100K | No action needed | Excellent (< 5ms lookup) |
| 100K-500K | Monitor monthly | Good (5-20ms lookup) |
| 500K-1M | Plan optimization | Fair (20-50ms lookup) |
| > 1M | Contact engineering | Needs partitioning |
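The thresholds in the table can be encoded directly, e.g. in an alerting script that consumes the monthly capacity check. A sketch (function name and return shape are illustrative):

```python
def capacity_tier(num_templates):
    """Map a template count to the recommended action and expected
    lookup performance from the capacity table above."""
    if num_templates < 100_000:
        return ("no action needed", "excellent (< 5ms lookup)")
    if num_templates < 500_000:
        return ("monitor monthly", "good (5-20ms lookup)")
    if num_templates < 1_000_000:
        return ("plan optimization", "fair (20-50ms lookup)")
    return ("contact engineering", "needs partitioning")
```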

If approaching 1M templates:

Option 1: Archive old templates (move to secondary collection)

| inputlookup tenx-dml-lookup
| search timestamp_format < "2024-01-01"
| ... (export to archive)

Option 2: Partition templates across multiple collections

Create: tenx_dml_2024, tenx_dml_2025, etc.
Route by year in your expansion macro
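The year-based routing can be as simple as deriving the collection name from each template's timestamp before writing it. A sketch (function and field names are illustrative, not part of the app):

```python
from datetime import datetime

def collection_for(timestamp_iso):
    """Pick the per-year collection (tenx_dml_2024, tenx_dml_2025, ...)
    from a template's ISO-8601 timestamp."""
    year = datetime.fromisoformat(timestamp_iso).year
    return "tenx_dml_%d" % year
```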

Monitor expansion latency:

index=<your-index> sourcetype=tenx_encoded
| `tenx-inflate`
| eval inflate_latency_sec = round(now() - _time, 3)
| stats avg(inflate_latency_sec) as inflate_latency_sec

If latency > 1 second, KV Store may be oversized.

What if I accidentally send encoded events before templates are loaded

If encoded events arrive before templates, expansion will fail silently until templates load.

Prevention:

Always verify template population BEFORE sending encoded events:

# Wait for this to return > 0:
| inputlookup tenx-dml-lookup | stats count
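In an automated pipeline, that wait can be a small polling gate in front of the sender. A sketch; `get_count` stands in for however you query the lookup count (REST, SDK, etc.):

```python
import time

def wait_for_templates(get_count, timeout_sec=180, poll_sec=10):
    """Poll until the KV store reports at least one template, or time out.

    get_count: callable returning the current template count
    (the Splunk query behind it is assumed, not shown).
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        count = get_count()
        if count > 0:
            return count
        time.sleep(poll_sec)
    raise TimeoutError("KV store still empty; hold encoded events")
```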

Recovery (if already happened):

  1. Load the missing templates:
     - Re-send template data via HEC (same format as before)
     - Wait 2-3 minutes for "Consume KV" to process

  2. Re-index the encoded events (optional)

    # If using Kubernetes:
    kubectl delete pod <forwarder-pod-name>  # Triggers reprocessing
    
    # If using file-based forwarder:
    # Delete offset tracking file, restart forwarder
    

  3. Verify recovery:

    | index=<your-index> sourcetype=tenx_encoded
    | head 10 | `tenx-inflate`
    # Should now return expanded events
    

Distributed KV Store setup for multi-node Splunk clusters

For production Splunk clusters, KV Store can be:

- Replicated (HA across nodes)
- Partitioned (scaled across multiple collections)

For 3-node Splunk cluster:

In a search head cluster, KV Store automatically replicates to all members (no special config). To verify:

# On each node:
| rest /servicesNS/nobody/tenx-for-splunk/storage/collections/config
| search title="tenx_dml"
| table title, eai:acl.app, eai:acl.sharing

All three nodes should return the same collection.

Performance optimization for distributed setup:

# In app's local/collections.conf (or via REST):

[tenx_dml]
field.pattern_hash = string
field.pattern = string
# Index pattern_hash for faster lookups:
accelerated_fields.pattern_hash_accel = {"pattern_hash": 1}

This creates an index on pattern_hash (faster expansion macro joins).

For very large clusters (10+ nodes):

Consider isolating KV Store load. Splunk does not provide a supported "KV-Store-only node" role; instead, tune the KV Store itself via the [kvstore] stanza in $SPLUNK_HOME/etc/system/local/server.conf and keep heavy ad-hoc search load off the members that serve template lookups (see Splunk's server.conf documentation for the available [kvstore] settings).

Monitoring cluster KV Store health:

| rest splunk_server=* /servicesNS/nobody/tenx-for-splunk/storage/collections/data/tenx_dml
| stats count as templates by splunk_server

All members should report the same count (healthy replication).
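That equality check is easy to automate once you have per-node counts in hand. A sketch (gathering the counts is assumed, e.g. from the search above):

```python
def replication_healthy(counts_by_node):
    """True when every cluster member reports the same number of KV
    records for the collection, i.e. replication has converged."""
    return len(set(counts_by_node.values())) <= 1
```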