Skip to content

ClickHouse

Search and visualize compact events in ClickHouse losslessly: the decoder returns the full original log text at query time. This open-source decoder ships as pure SQL, no plugin and no binary, and exposes a tenx.events view that returns full original log text at query time. Existing Grafana dashboards, alerts, and SQL queries continue to work against the compacted data without modification.

Works identically on self-hosted ClickHouse, Altinity Cloud, and ClickHouse Cloud. The install is a single SQL file applied via standard clickhouse-client --multiquery.


How It Works

The ClickHouse port differs structurally from the Splunk app and L1ES plugin. Splunk uses a JavaScript search hook plus a SPL macro. Elasticsearch uses a plugin that intercepts queries and rewrites them. ClickHouse needs neither, because it already exposes the right primitives natively:

  1. Templates dictionary: ClickHouse's built-in Dictionary keeps the template patterns in memory with automatic refresh from a backing table
  2. Decoded view: a standard CREATE VIEW joins encoded events with their templates and returns expanded text via SQL UDFs that ClickHouse compiles into the query plan

Users SELECT FROM tenx.events. The view does the work. No client-side rewriting, no server plugin, no extra service.

Ingestion Flow

Events are compact at the edge and ingested into ClickHouse with reduced payload size:

graph LR
    A["<div style='font-size: 14px;'>🗜️ Compact</div><div style='font-size: 10px;'>Compact Events</div>"] --> B["<div style='font-size: 14px;'>📡 Ingest</div><div style='font-size: 10px;'>HTTP / native</div>"]
    B --> C["<div style='font-size: 14px;'>📋 Templates</div><div style='font-size: 10px;'>tenx.templates</div>"]
    B --> D["<div style='font-size: 14px;'>💾 Encoded</div><div style='font-size: 10px;'>tenx.encoded_events</div>"]
    C --> E["<div style='font-size: 14px;'>📚 Dictionary</div><div style='font-size: 10px;'>auto-refresh</div>"]

    classDef edge fill:#7c3aed88,stroke:#6d28d9,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef ingest fill:#9333ea88,stroke:#7c3aed,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef store fill:#2563eb88,stroke:#1d4ed8,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef dict fill:#059669,stroke:#047857,color:#ffffff,stroke-width:2px,rx:8,ry:8

    class A edge
    class B ingest
    class C,D store
    class E dict

🗜️ Compact: Receiver (Compact mode) compacts events, extracting repetitive patterns into templates

📡 Ingest: Encoded events forwarded to ClickHouse via the HTTP interface or native protocol, with reduced payload size

📋 Templates: Templates stored in the tenx.templates MergeTree table

💾 Encoded: Compact events stored in tenx.encoded_events, with materialized columns extracting the template hash and Kubernetes envelope fields at INSERT time for fast filter pushdown

📚 Dictionary: tenx.templates_dict loads the templates into memory and auto-refreshes every 60-120 seconds from the source table

Search Flow

There is no client interceptor and no server-side query rewriter. Queries run as normal SQL against the view:

graph LR
    E["<div style='font-size: 14px;'>👤 User</div><div style='font-size: 10px;'>SQL / Grafana</div>"] --> F["<div style='font-size: 14px;'>📋 View</div><div style='font-size: 10px;'>tenx.events</div>"]
    F --> G["<div style='font-size: 14px;'>🔎 Lookup</div><div style='font-size: 10px;'>Dictionary</div>"]
    G --> H["<div style='font-size: 14px;'>📖 Expand</div><div style='font-size: 10px;'>SQL UDFs</div>"]
    H --> I["<div style='font-size: 14px;'>📊 Results</div><div style='font-size: 10px;'>Full Data</div>"]

    classDef user fill:#059669,stroke:#047857,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef view fill:#f59e0b,stroke:#d97706,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef result fill:#ea580c88,stroke:#c2410c,color:#ffffff,stroke-width:2px,rx:8,ry:8

    class E user
    class F,G,H view
    class I result

👤 User: Submits SQL through clickhouse-client, the HTTP API, Grafana, or any ClickHouse client

📋 View: tenx.events joins encoded events with their templates per row

🔎 Lookup: dictGetOrDefault retrieves the template literals and slot patterns from tenx.templates_dict (in-memory, microsecond latency)

📖 Expand: SQL UDFs (tenx_inflate, tenx_inflate_iso) interleave the template literals with the encoded values per row

📊 Results: Full events returned with original field names and values


Compact Events in ClickHouse

A compact event replaces the log message with a template reference and variable values. Here is the same event before and after Compact mode:

Original event (raw):

{"log": "2026-02-25T14:03:22Z INFO  [http-handler] POST /api/v2/orders completed in 42ms status=200 bytes=1583 user=acct_7291", "kubernetes": {"container_name": "order-svc", "namespace_name": "checkout", "pod_name": "order-svc-6f8b4d-xk2lp"}}

Compact event (as stored in tenx.encoded_events):

{"log": "~a3f29c01,2026-02-25T14:03:22Z,/api/v2/orders,42,200,1583,acct_7291", "kubernetes": {"container_name": "order-svc", "namespace_name": "checkout", "pod_name": "order-svc-6f8b4d-xk2lp"}}

Expanded event (returned by SELECT FROM tenx.events):

Identical to the original. The view looks up template a3f29c01 in tenx.templates_dict, reconstructs the full message from the template pattern and variable values, and returns it in the decoded_log column. Grafana, BI tools, and SQL clients see the original text.

What changes and what stays the same:

Field Compact? Notes
log (compact payload) Yes Replaced with ~hash,val1,val2,...
kubernetes.container_name No Extracted into a materialized container column for filter pushdown
kubernetes.namespace_name, pod_name No Same, materialized for indexed access
Templates table New Holds the template dictionary; required for decode
Existing tables and dashboards No Untouched; the view is purely additive

Compact mode encodes the log message itself while leaving the Kubernetes metadata fields (container_name, namespace_name, pod_name) as normal JSON. That metadata stays indexable, so dashboard filters on container or namespace remain fast and the primary key keeps working at INSERT time. The clickhouse-app User Guide covers the Receiver setting.


Query Behavior

The view exposes six columns: container, namespace, pod, templateHash, encoded_log, decoded_log. The first four are pre-extracted at INSERT time and indexed; the last two are computed at SELECT time.

Filter pushdown works on indexed columns

-- FAST: filter pushes down on indexed columns, only matching rows decode
SELECT decoded_log FROM tenx.events
WHERE container = 'order-svc'
  AND templateHash = 'a3f29c01'
LIMIT 100;

ClickHouse evaluates the WHERE clause against the materialized columns first, then runs the decoder only on surviving rows. For pattern-aware queries that filter on templateHash, latency is sub-50 ms even on tables holding millions of events.

Filters on decoded_log cannot push down

-- SLOW: filter operates after decode, every row gets expanded
SELECT decoded_log FROM tenx.events
WHERE decoded_log LIKE '%error%'
LIMIT 100;

The SQL UDFs are opaque to the query optimizer. A LIKE '%error%' against decoded_log forces the view to expand every row in the scan range before the filter can apply.

Query-hygiene rule for users: filter on the indexed columns first (container, templateHash, namespace, time range), then expand the survivors.

Two views for two timestamp behaviors

The install creates two views over the same data. They differ only in how timestamp slots render.

View Timestamp output Speed
tenx.events Preserves the original format per template (e.g. 2025-10-02 01:28:24) ~600 ms fixed cost per query from multiIf format dispatch
tenx.events_iso Normalizes to ISO 8601 (2025-10-02T01:28:24.000Z) regardless of template Sub-100 ms full-table decode at ~2.3M rows/sec

Both views expose identical column shape; switching is a one-identifier change. Use tenx.events_iso when downstream tooling accepts ISO 8601 (most Grafana time pickers, most BI tools); use tenx.events when alerts or compliance rules depend on the original format.


Quickstart

Get compact events flowing into ClickHouse in under 15 minutes.

Step 1: Apply the SQL Schema

Clone the repository and apply install.sql against your ClickHouse server. Works identically on self-hosted, Altinity Cloud, and ClickHouse Cloud.

git clone https://github.com/log-10x/clickhouse-app.git
docker exec -i my-clickhouse clickhouse-client --multiquery \
    < clickhouse-app/tenx-for-clickhouse/install.sql
git clone https://github.com/log-10x/clickhouse-app.git
clickhouse-client --multiquery \
    < clickhouse-app/tenx-for-clickhouse/install.sql
clickhouse-client --host <your-cloud-host>.clickhouse.cloud \
                  --port 9440 --secure \
                  --user <your-user> --password <your-password> \
                  --multiquery < clickhouse-app/tenx-for-clickhouse/install.sql

Or paste the contents of install.sql into the Cloud Console SQL editor.

Prerequisites:

Requirement Description
ClickHouse Version 24.x or later (tested through 26.x)
Access SQL access sufficient to create databases, tables, dictionaries, functions, and views
Verify Installation

After applying, confirm the six decoder functions and two views exist:

SELECT name FROM system.functions
WHERE name LIKE 'tenx%' ORDER BY name;

Expected: 6 functions (tenx_inflate, tenx_inflate_iso, tenx_inflate_core, tenx_inflate_core_iso, tenx_substitute_slot, tenx_substitute_slot_iso)

SELECT name FROM system.tables WHERE database = 'tenx';

Expected: templates, encoded_events, events, events_iso

Step 3: Verify End-to-End

Run these queries to confirm everything is working:

1. Check templates are arriving:

SELECT count() FROM tenx.templates;
Expected: count > 0

2. Force a dictionary refresh, then check it loaded:

SYSTEM RELOAD DICTIONARY tenx.templates_dict;
SELECT status, element_count FROM system.dictionaries
WHERE name = 'templates_dict';
Expected: status = LOADED, element_count > 0

3. Check encoded events are indexed:

SELECT count() FROM tenx.encoded_events;
Expected: count > 0

4. Query the view and verify decoded text:

SELECT decoded_log FROM tenx.events
WHERE templateHash != ''
LIMIT 5;
Expected: full expanded events, not ~hash,val1,val2... format


Verification Checklist

Use this checklist to diagnose issues at each stage of the pipeline.

Schema Applied?

Test:

SELECT count() FROM system.functions WHERE name LIKE 'tenx%';

Result Meaning Action
6 Schema complete Proceed to templates check
< 6 Partial install Re-apply install.sql (idempotent)
0 Schema not applied Apply install.sql
Templates Loaded?

Test:

SELECT count() FROM tenx.templates;

Result Meaning Action
Count > 0 Templates present Proceed to dictionary check
Count = 0 No templates received Check forwarder config, verify templates routed to tenx.templates
Dictionary Refreshed?

Test:

SELECT status, element_count, last_successful_update_time
FROM system.dictionaries WHERE name = 'templates_dict';

Result Meaning Action
status = LOADED, element_count > 0 Working Proceed to view check
status = LOADED, element_count = 0 Templates table empty Load templates first
status = NOT_LOADED or FAILED Reload failed SYSTEM RELOAD DICTIONARY tenx.templates_dict; check ClickHouse logs
View Returning Expanded Results?

Test:

SELECT decoded_log FROM tenx.events
WHERE templateHash != ''
LIMIT 1;

Result Meaning Action
Full expanded text Working correctly Done
~hash,val1,val2,... Template missing from dictionary Verify SELECT count() FROM tenx.templates WHERE templateHash = '<hash>'; force reload
Empty No matching encoded events Verify ingestion path

Troubleshooting

Events Show Raw Compact Form

Symptom: decoded_log returns ~hash,val1,val2,... instead of expanded text.

Common Causes:

Cause Solution
Template missing from dictionary Check SELECT count() FROM tenx.templates WHERE templateHash = '<the-hash>'
Dictionary not refreshed since templates loaded Run SYSTEM RELOAD DICTIONARY tenx.templates_dict
Template never produced by Receiver Check Receiver logs; the template may have been filtered out before forward
Slow Queries on Small Result Sets (Cloud)

Symptom: Queries against tenx.events consistently take ~600 ms even when the result set is small.

Cause: The tenx.events view uses a multiIf dispatch over 17 timestamp format patterns. Each multiIf branch passes a constant format string to formatDateTimeInJodaSyntax (ClickHouse requires the format to be a compile-time constant). The dispatch itself has a fixed cost per query.

Solution: Switch to tenx.events_iso if your downstream tooling accepts ISO 8601 timestamps. That view uses a single constant format and runs at native ClickHouse scan speed.

SELECT decoded_log FROM tenx.events_iso
WHERE container = 'order-svc'
LIMIT 100;
Dictionary Stuck at Old Templates

Symptom: Newly-loaded templates do not appear in tenx.events queries.

Cause: Default LIFETIME(MIN 60 MAX 120) means the dictionary refreshes every 60-120 seconds; new templates take up to 2 minutes to become queryable.

Solution for high-throughput workloads: Tighten the LIFETIME by dropping and recreating the dictionary:

DROP DICTIONARY tenx.templates_dict;
CREATE DICTIONARY tenx.templates_dict
(templateHash String, literals Array(String), slots Array(String))
PRIMARY KEY templateHash
SOURCE(CLICKHOUSE(TABLE 'templates' DB 'tenx'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 5 MAX 15);

Or force an immediate refresh after every batch of templates: SYSTEM RELOAD DICTIONARY tenx.templates_dict.

Full-Text Search Is Slow

Symptom: WHERE decoded_log LIKE '%error%' is slow even on small tables.

Cause: The SQL UDFs that expand events are opaque to the query optimizer. ClickHouse cannot push the filter through the decode; it expands every row in the scan range.

Solution: Filter on indexed columns first. The view exposes container, templateHash, namespace, pod as pre-materialized indexed columns. For pattern-aware search, use WHERE templateHash IN ('hash1', 'hash2').

For full-text search across expanded text at scale, materialize a separate index downstream, or use Retriever to fetch and expand the exact offloaded events from your own S3 bucket on demand.


Components

Component Description
install.sql Single-file install: database, tables, dictionary, six SQL UDFs, two views
tenx.templates MergeTree source table for templates, with materialized literals[] and slots[] columns
tenx.templates_dict In-memory COMPLEX_KEY_HASHED dictionary over the templates table, with auto-refresh
tenx.encoded_events MergeTree table for compact events with materialized columns for filter pushdown
tenx.events View that returns expanded log text, preserving the original timestamp format per template
tenx.events_iso View that returns expanded log text with all timestamps normalized to ISO 8601
Helm chart Job-pattern install against an existing ClickHouse service in Kubernetes

Platform Support

Platform Status
ClickHouse self-hosted Supported (any 24.x or later; tested through 26.x)
Altinity Cloud Supported (same install)
ClickHouse Cloud Supported (same install, pure SQL works inside Cloud's executable-UDF restrictions)
ClickHouse on Kubernetes (Helm) Supported via the included starter chart

CI tests every change against ClickHouse 24.8 (LTS), 25.3, and latest.


Production Operations

Templates table is critical infrastructure

Compact events cannot be expanded without the templates table. Treat its availability and integrity at least as carefully as authentication data. For a replicated cluster:

CREATE TABLE tenx.templates ON CLUSTER '{cluster}'
(
    templateHash String,
    template     String,
    literals     Array(String) MATERIALIZED splitByRegexp('\\$\\([^)]*\\)|\\$', template),
    slots        Array(String) MATERIALIZED extractAll(template, '\\$\\([^)]*\\)|\\$')
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/templates', '{replica}')
ORDER BY templateHash;

Back up the templates table at least daily, replicate it across availability zones, and test restore procedures.

Codec selection

ClickHouse's column compression codec (LZ4 default, ZSTD optional) interacts with the templating layer. Measured on a 200 MB OpenTelemetry-demo sample:

Codec Raw on disk Compact on disk Reduction
LZ4 (default) 10.83 MiB 6.23 MiB 42%
ZSTD (level 3) 3.37 MiB 2.45 MiB 27%

ZSTD overlaps significantly with the templating layer because both find structural redundancy; the on-disk reduction is smaller under ZSTD. LZ4 leaves more headroom for the templating contribution.

Query performance

Measured on the same 200 MB sample:

Workload tenx.events (multiIf) tenx.events_iso (ISO 8601)
Decode 100 rows 620 ms 82 ms
Decode 10,000 rows 2.26 s 66 ms
Decode full table (137,418 rows) 6.7 s 60 ms

The ISO variant runs at near-native ClickHouse scan speed because it stays inside the vectorized execution engine with no per-row format dispatch.

Operator Checklist

Category Check How to Verify
Install Schema applied SELECT count() FROM system.functions WHERE name LIKE 'tenx%' returns 6
Setup Tables, dictionary, views present SELECT name FROM system.tables WHERE database = 'tenx' lists all four
Forwarder Templates routed to tenx.templates SELECT count() FROM tenx.templates > 0
Forwarder Compact events routed to tenx.encoded_events SELECT count() FROM tenx.encoded_events > 0
Dictionary Loaded with non-zero element count SELECT status, element_count FROM system.dictionaries WHERE name = 'templates_dict'
Search View returns expanded text SELECT decoded_log FROM tenx.events WHERE templateHash != '' LIMIT 1
Backup Templates table backed up Same retention + replication policy as authentication data
Codec Codec chosen with eyes open Configured CODEC(...) on tenx.encoded_events.raw matches storage vs CPU trade


This decoder is open source. View on GitHub.