ClickHouse
Search and visualize compact events in ClickHouse losslessly: the decoder returns the full original log text at query time. This open-source decoder ships as pure SQL, no plugin and no binary, and exposes a tenx.events view that returns full original log text at query time. Existing Grafana dashboards, alerts, and SQL queries continue to work against the compacted data without modification.
Works identically on self-hosted ClickHouse, Altinity Cloud, and ClickHouse Cloud. The install is a single SQL file applied via standard clickhouse-client --multiquery.
How It Works
The ClickHouse port differs structurally from the Splunk app and L1ES plugin. Splunk uses a JavaScript search hook plus a SPL macro. Elasticsearch uses a plugin that intercepts queries and rewrites them. ClickHouse needs neither, because it already exposes the right primitives natively:
- Templates dictionary: ClickHouse's built-in
Dictionarykeeps the template patterns in memory with automatic refresh from a backing table - Decoded view: a standard
CREATE VIEWjoins encoded events with their templates and returns expanded text via SQL UDFs that ClickHouse compiles into the query plan
Users SELECT FROM tenx.events. The view does the work. No client-side rewriting, no server plugin, no extra service.
Ingestion Flow
Events are compact at the edge and ingested into ClickHouse with reduced payload size:
graph LR
A["<div style='font-size: 14px;'>🗜️ Compact</div><div style='font-size: 10px;'>Compact Events</div>"] --> B["<div style='font-size: 14px;'>📡 Ingest</div><div style='font-size: 10px;'>HTTP / native</div>"]
B --> C["<div style='font-size: 14px;'>📋 Templates</div><div style='font-size: 10px;'>tenx.templates</div>"]
B --> D["<div style='font-size: 14px;'>💾 Encoded</div><div style='font-size: 10px;'>tenx.encoded_events</div>"]
C --> E["<div style='font-size: 14px;'>📚 Dictionary</div><div style='font-size: 10px;'>auto-refresh</div>"]
classDef edge fill:#7c3aed88,stroke:#6d28d9,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef ingest fill:#9333ea88,stroke:#7c3aed,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef store fill:#2563eb88,stroke:#1d4ed8,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef dict fill:#059669,stroke:#047857,color:#ffffff,stroke-width:2px,rx:8,ry:8
class A edge
class B ingest
class C,D store
class E dict
🗜️ Compact: Receiver (Compact mode) compacts events, extracting repetitive patterns into templates
📡 Ingest: Encoded events forwarded to ClickHouse via the HTTP interface or native protocol, with reduced payload size
📋 Templates: Templates stored in the tenx.templates MergeTree table
💾 Encoded: Compact events stored in tenx.encoded_events, with materialized columns extracting the template hash and Kubernetes envelope fields at INSERT time for fast filter pushdown
📚 Dictionary: tenx.templates_dict loads the templates into memory and auto-refreshes every 60-120 seconds from the source table
Search Flow
There is no client interceptor and no server-side query rewriter. Queries run as normal SQL against the view:
graph LR
E["<div style='font-size: 14px;'>👤 User</div><div style='font-size: 10px;'>SQL / Grafana</div>"] --> F["<div style='font-size: 14px;'>📋 View</div><div style='font-size: 10px;'>tenx.events</div>"]
F --> G["<div style='font-size: 14px;'>🔎 Lookup</div><div style='font-size: 10px;'>Dictionary</div>"]
G --> H["<div style='font-size: 14px;'>📖 Expand</div><div style='font-size: 10px;'>SQL UDFs</div>"]
H --> I["<div style='font-size: 14px;'>📊 Results</div><div style='font-size: 10px;'>Full Data</div>"]
classDef user fill:#059669,stroke:#047857,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef view fill:#f59e0b,stroke:#d97706,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef result fill:#ea580c88,stroke:#c2410c,color:#ffffff,stroke-width:2px,rx:8,ry:8
class E user
class F,G,H view
class I result
👤 User: Submits SQL through clickhouse-client, the HTTP API, Grafana, or any ClickHouse client
📋 View: tenx.events joins encoded events with their templates per row
🔎 Lookup: dictGetOrDefault retrieves the template literals and slot patterns from tenx.templates_dict (in-memory, microsecond latency)
📖 Expand: SQL UDFs (tenx_inflate, tenx_inflate_iso) interleave the template literals with the encoded values per row
📊 Results: Full events returned with original field names and values
Compact Events in ClickHouse
A compact event replaces the log message with a template reference and variable values. Here is the same event before and after Compact mode:
Original event (raw):
{"log": "2026-02-25T14:03:22Z INFO [http-handler] POST /api/v2/orders completed in 42ms status=200 bytes=1583 user=acct_7291", "kubernetes": {"container_name": "order-svc", "namespace_name": "checkout", "pod_name": "order-svc-6f8b4d-xk2lp"}}
Compact event (as stored in tenx.encoded_events):
{"log": "~a3f29c01,2026-02-25T14:03:22Z,/api/v2/orders,42,200,1583,acct_7291", "kubernetes": {"container_name": "order-svc", "namespace_name": "checkout", "pod_name": "order-svc-6f8b4d-xk2lp"}}
Expanded event (returned by SELECT FROM tenx.events):
Identical to the original. The view looks up template a3f29c01 in tenx.templates_dict, reconstructs the full message from the template pattern and variable values, and returns it in the decoded_log column. Grafana, BI tools, and SQL clients see the original text.
What changes and what stays the same:
| Field | Compact? | Notes |
|---|---|---|
log (compact payload) |
Yes | Replaced with ~hash,val1,val2,... |
kubernetes.container_name |
No | Extracted into a materialized container column for filter pushdown |
kubernetes.namespace_name, pod_name |
No | Same, materialized for indexed access |
| Templates table | New | Holds the template dictionary; required for decode |
| Existing tables and dashboards | No | Untouched; the view is purely additive |
Compact mode encodes the log message itself while leaving the Kubernetes metadata fields (container_name, namespace_name, pod_name) as normal JSON. That metadata stays indexable, so dashboard filters on container or namespace remain fast and the primary key keeps working at INSERT time. The clickhouse-app User Guide covers the Receiver setting.
Query Behavior
The view exposes six columns: container, namespace, pod, templateHash, encoded_log, decoded_log. The first four are pre-extracted at INSERT time and indexed; the last two are computed at SELECT time.
Filter pushdown works on indexed columns
-- FAST: filter pushes down on indexed columns, only matching rows decode
SELECT decoded_log FROM tenx.events
WHERE container = 'order-svc'
AND templateHash = 'a3f29c01'
LIMIT 100;
ClickHouse evaluates the WHERE clause against the materialized columns first, then runs the decoder only on surviving rows. For pattern-aware queries that filter on templateHash, latency is sub-50 ms even on tables holding millions of events.
Filters on decoded_log cannot push down
-- SLOW: filter operates after decode, every row gets expanded
SELECT decoded_log FROM tenx.events
WHERE decoded_log LIKE '%error%'
LIMIT 100;
The SQL UDFs are opaque to the query optimizer. A LIKE '%error%' against decoded_log forces the view to expand every row in the scan range before the filter can apply.
Query-hygiene rule for users: filter on the indexed columns first (container, templateHash, namespace, time range), then expand the survivors.
Two views for two timestamp behaviors
The install creates two views over the same data. They differ only in how timestamp slots render.
| View | Timestamp output | Speed |
|---|---|---|
tenx.events |
Preserves the original format per template (e.g. 2025-10-02 01:28:24) |
~600 ms fixed cost per query from multiIf format dispatch |
tenx.events_iso |
Normalizes to ISO 8601 (2025-10-02T01:28:24.000Z) regardless of template |
Sub-100 ms full-table decode at ~2.3M rows/sec |
Both views expose identical column shape; switching is a one-identifier change. Use tenx.events_iso when downstream tooling accepts ISO 8601 (most Grafana time pickers, most BI tools); use tenx.events when alerts or compliance rules depend on the original format.
Quickstart
Get compact events flowing into ClickHouse in under 15 minutes.
Step 1: Apply the SQL Schema
Clone the repository and apply install.sql against your ClickHouse server. Works identically on self-hosted, Altinity Cloud, and ClickHouse Cloud.
Prerequisites:
| Requirement | Description |
|---|---|
| ClickHouse | Version 24.x or later (tested through 26.x) |
| Access | SQL access sufficient to create databases, tables, dictionaries, functions, and views |
Verify Installation
After applying, confirm the six decoder functions and two views exist:
Expected: 6 functions (tenx_inflate, tenx_inflate_iso, tenx_inflate_core, tenx_inflate_core_iso, tenx_substitute_slot, tenx_substitute_slot_iso)
Expected: templates, encoded_events, events, events_iso
Step 2: Configure Forwarder
Configure your log forwarder to send compact events and templates to ClickHouse.
Include 10x optimizer configuration:
@INCLUDE ${TENX_MODULES}/pipelines/run/modules/input/forwarder/fluentbit/conf/tenx-optimize.conf
@INCLUDE ${TENX_MODULES}/pipelines/run/modules/input/forwarder/fluentbit/conf/tenx-unix.conf
Configure ClickHouse HTTP outputs:
# ========================= TEMPLATES OUTPUT =========================
# Routes templates to tenx.templates via the HTTP interface
[OUTPUT]
Name http
Match tenx-template
Host your-clickhouse-host.com
Port 8123
URI /?query=INSERT+INTO+tenx.templates+(templateHash,template)+FORMAT+JSONEachRow
Format json_lines
Header Content-Type application/x-ndjson
TLS On
TLS.Verify Off
# ========================= ENCODED EVENTS OUTPUT ====================
# Routes encoded log events to tenx.encoded_events as one line per row
[OUTPUT]
Name http
Match_Regex ^(?!tenx-template).*
Host your-clickhouse-host.com
Port 8123
URI /?query=INSERT+INTO+tenx.encoded_events+(raw)+FORMAT+LineAsString
Format plain
TLS On
TLS.Verify Off
Include 10x optimizer configuration:
@include "#{ENV['TENX_MODULES']}/pipelines/run/modules/input/forwarder/fluentd/conf/tenx-optimize-unix.conf"
Configure ClickHouse HTTP outputs using fluent-plugin-clickhouse or fluent-plugin-http:
<match tenx-template>
@type http
endpoint http://your-clickhouse-host.com:8123/?query=INSERT+INTO+tenx.templates+(templateHash,template)+FORMAT+JSONEachRow
content_type application/x-ndjson
<format>
@type json
</format>
</match>
<match **>
@type http
endpoint http://your-clickhouse-host.com:8123/?query=INSERT+INTO+tenx.encoded_events+(raw)+FORMAT+LineAsString
content_type text/plain
<format>
@type single_value
message_key log
</format>
</match>
Configure exporters in your OTel config using the ClickHouse exporter:
exporters:
clickhouse/templates:
endpoint: tcp://your-clickhouse-host.com:9000?secure=true
database: tenx
logs_table_name: templates
tls:
insecure_skip_verify: true
clickhouse/encoded:
endpoint: tcp://your-clickhouse-host.com:9000?secure=true
database: tenx
logs_table_name: encoded_events
tls:
insecure_skip_verify: true
service:
pipelines:
logs/templates:
receivers: [tenx_templates]
exporters: [clickhouse/templates]
logs/encoded:
receivers: [tenx_encoded]
exporters: [clickhouse/encoded]
Step 3: Verify End-to-End
Run these queries to confirm everything is working:
1. Check templates are arriving:
Expected: count > 02. Force a dictionary refresh, then check it loaded:
SYSTEM RELOAD DICTIONARY tenx.templates_dict;
SELECT status, element_count FROM system.dictionaries
WHERE name = 'templates_dict';
LOADED, element_count > 0
3. Check encoded events are indexed:
Expected: count > 04. Query the view and verify decoded text:
Expected: full expanded events, not~hash,val1,val2... format
Step 4: Point Grafana at the View
In your Grafana ClickHouse data source, query tenx.events (or tenx.events_iso) as if it were a normal logs table. Filter on container, templateHash, namespace, or time range for sub-50 ms response on indexed columns. No dashboard rebuild, no panel changes, no query rewrites.
For new dashboards that want template-aware filtering, add WHERE templateHash IN ('hash1', 'hash2') to scope to specific patterns. Pattern hashes are visible in the Reporter cost attribution views and via the MCP server's top_patterns tool.
Verification Checklist
Use this checklist to diagnose issues at each stage of the pipeline.
Schema Applied?
Test:
| Result | Meaning | Action |
|---|---|---|
| 6 | Schema complete | Proceed to templates check |
| < 6 | Partial install | Re-apply install.sql (idempotent) |
| 0 | Schema not applied | Apply install.sql |
Templates Loaded?
Test:
| Result | Meaning | Action |
|---|---|---|
| Count > 0 | Templates present | Proceed to dictionary check |
| Count = 0 | No templates received | Check forwarder config, verify templates routed to tenx.templates |
Dictionary Refreshed?
Test:
SELECT status, element_count, last_successful_update_time
FROM system.dictionaries WHERE name = 'templates_dict';
| Result | Meaning | Action |
|---|---|---|
status = LOADED, element_count > 0 |
Working | Proceed to view check |
status = LOADED, element_count = 0 |
Templates table empty | Load templates first |
status = NOT_LOADED or FAILED |
Reload failed | SYSTEM RELOAD DICTIONARY tenx.templates_dict; check ClickHouse logs |
View Returning Expanded Results?
Test:
| Result | Meaning | Action |
|---|---|---|
| Full expanded text | Working correctly | Done |
~hash,val1,val2,... |
Template missing from dictionary | Verify SELECT count() FROM tenx.templates WHERE templateHash = '<hash>'; force reload |
| Empty | No matching encoded events | Verify ingestion path |
Troubleshooting
Events Show Raw Compact Form
Symptom: decoded_log returns ~hash,val1,val2,... instead of expanded text.
Common Causes:
| Cause | Solution |
|---|---|
| Template missing from dictionary | Check SELECT count() FROM tenx.templates WHERE templateHash = '<the-hash>' |
| Dictionary not refreshed since templates loaded | Run SYSTEM RELOAD DICTIONARY tenx.templates_dict |
| Template never produced by Receiver | Check Receiver logs; the template may have been filtered out before forward |
Slow Queries on Small Result Sets (Cloud)
Symptom: Queries against tenx.events consistently take ~600 ms even when the result set is small.
Cause: The tenx.events view uses a multiIf dispatch over 17 timestamp format patterns. Each multiIf branch passes a constant format string to formatDateTimeInJodaSyntax (ClickHouse requires the format to be a compile-time constant). The dispatch itself has a fixed cost per query.
Solution: Switch to tenx.events_iso if your downstream tooling accepts ISO 8601 timestamps. That view uses a single constant format and runs at native ClickHouse scan speed.
Dictionary Stuck at Old Templates
Symptom: Newly-loaded templates do not appear in tenx.events queries.
Cause: Default LIFETIME(MIN 60 MAX 120) means the dictionary refreshes every 60-120 seconds; new templates take up to 2 minutes to become queryable.
Solution for high-throughput workloads: Tighten the LIFETIME by dropping and recreating the dictionary:
DROP DICTIONARY tenx.templates_dict;
CREATE DICTIONARY tenx.templates_dict
(templateHash String, literals Array(String), slots Array(String))
PRIMARY KEY templateHash
SOURCE(CLICKHOUSE(TABLE 'templates' DB 'tenx'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 5 MAX 15);
Or force an immediate refresh after every batch of templates: SYSTEM RELOAD DICTIONARY tenx.templates_dict.
Full-Text Search Is Slow
Symptom: WHERE decoded_log LIKE '%error%' is slow even on small tables.
Cause: The SQL UDFs that expand events are opaque to the query optimizer. ClickHouse cannot push the filter through the decode; it expands every row in the scan range.
Solution: Filter on indexed columns first. The view exposes container, templateHash, namespace, pod as pre-materialized indexed columns. For pattern-aware search, use WHERE templateHash IN ('hash1', 'hash2').
For full-text search across expanded text at scale, materialize a separate index downstream, or use Retriever to fetch and expand the exact offloaded events from your own S3 bucket on demand.
Components
| Component | Description |
|---|---|
| install.sql | Single-file install: database, tables, dictionary, six SQL UDFs, two views |
| tenx.templates | MergeTree source table for templates, with materialized literals[] and slots[] columns |
| tenx.templates_dict | In-memory COMPLEX_KEY_HASHED dictionary over the templates table, with auto-refresh |
| tenx.encoded_events | MergeTree table for compact events with materialized columns for filter pushdown |
| tenx.events | View that returns expanded log text, preserving the original timestamp format per template |
| tenx.events_iso | View that returns expanded log text with all timestamps normalized to ISO 8601 |
| Helm chart | Job-pattern install against an existing ClickHouse service in Kubernetes |
Platform Support
| Platform | Status |
|---|---|
| ClickHouse self-hosted | Supported (any 24.x or later; tested through 26.x) |
| Altinity Cloud | Supported (same install) |
| ClickHouse Cloud | Supported (same install, pure SQL works inside Cloud's executable-UDF restrictions) |
| ClickHouse on Kubernetes (Helm) | Supported via the included starter chart |
CI tests every change against ClickHouse 24.8 (LTS), 25.3, and latest.
Production Operations
Templates table is critical infrastructure
Compact events cannot be expanded without the templates table. Treat its availability and integrity at least as carefully as authentication data. For a replicated cluster:
CREATE TABLE tenx.templates ON CLUSTER '{cluster}'
(
templateHash String,
template String,
literals Array(String) MATERIALIZED splitByRegexp('\\$\\([^)]*\\)|\\$', template),
slots Array(String) MATERIALIZED extractAll(template, '\\$\\([^)]*\\)|\\$')
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/templates', '{replica}')
ORDER BY templateHash;
Back up the templates table at least daily, replicate it across availability zones, and test restore procedures.
Codec selection
ClickHouse's column compression codec (LZ4 default, ZSTD optional) interacts with the templating layer. Measured on a 200 MB OpenTelemetry-demo sample:
| Codec | Raw on disk | Compact on disk | Reduction |
|---|---|---|---|
| LZ4 (default) | 10.83 MiB | 6.23 MiB | 42% |
| ZSTD (level 3) | 3.37 MiB | 2.45 MiB | 27% |
ZSTD overlaps significantly with the templating layer because both find structural redundancy; the on-disk reduction is smaller under ZSTD. LZ4 leaves more headroom for the templating contribution.
Query performance
Measured on the same 200 MB sample:
| Workload | tenx.events (multiIf) |
tenx.events_iso (ISO 8601) |
|---|---|---|
| Decode 100 rows | 620 ms | 82 ms |
| Decode 10,000 rows | 2.26 s | 66 ms |
| Decode full table (137,418 rows) | 6.7 s | 60 ms |
The ISO variant runs at near-native ClickHouse scan speed because it stays inside the vectorized execution engine with no per-row format dispatch.
Operator Checklist
| Category | Check | How to Verify |
|---|---|---|
| Install | Schema applied | SELECT count() FROM system.functions WHERE name LIKE 'tenx%' returns 6 |
| Setup | Tables, dictionary, views present | SELECT name FROM system.tables WHERE database = 'tenx' lists all four |
| Forwarder | Templates routed to tenx.templates |
SELECT count() FROM tenx.templates > 0 |
| Forwarder | Compact events routed to tenx.encoded_events |
SELECT count() FROM tenx.encoded_events > 0 |
| Dictionary | Loaded with non-zero element count | SELECT status, element_count FROM system.dictionaries WHERE name = 'templates_dict' |
| Search | View returns expanded text | SELECT decoded_log FROM tenx.events WHERE templateHash != '' LIMIT 1 |
| Backup | Templates table backed up | Same retention + replication policy as authentication data |
| Codec | Codec chosen with eyes open | Configured CODEC(...) on tenx.encoded_events.raw matches storage vs CPU trade |
This decoder is open source. View on GitHub.