
FAQ

The Compiler extracts symbol vocabulary from source code and container images to enable the 10x Engine's JIT runtime to recognize log event structure.

Security & Access

What does the compiler do

The compiler creates a vocabulary for the runtime to recognize your logs. At runtime, the 10x Engine tokenizes incoming events and matches tokens against the symbol library. When symbols match, the engine infers event structure and creates cached hidden classes (TenXTemplates). Subsequent events with the same structure reuse the cached template — fast, accurate, and memory-efficient.
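The tokenize → match → cache flow described above can be sketched in a few lines. This is illustrative only — all names here (`tokenize`, `TemplateCache`) are hypothetical, not the 10x runtime API:

```python
# Illustrative sketch of the tokenize -> match -> cached-template flow.
# All names are hypothetical, not the actual 10x runtime API.
import re

def tokenize(event: str) -> list[str]:
    # Split on whitespace and common delimiters (simplified).
    return [t for t in re.split(r"[\s=:,\[\]]+", event) if t]

class TemplateCache:
    def __init__(self, symbols: set[str]):
        self.symbols = symbols                 # vocabulary from the symbol library
        self.templates: dict[tuple, str] = {}  # structure key -> template id

    def process(self, event: str) -> tuple[str, bool]:
        tokens = tokenize(event)
        # Known symbols keep their value; everything else is a variable slot.
        structure = tuple(t if t in self.symbols else "<VAR>" for t in tokens)
        hit = structure in self.templates
        if not hit:
            # JIT inference: create a template on first sight of this shape.
            self.templates[structure] = f"tpl-{len(self.templates)}"
        return self.templates[structure], hit

cache = TemplateCache(symbols={"ERROR", "connection", "timeout"})
tpl1, hit1 = cache.process("ERROR connection timeout host=db1")  # first event: miss
tpl2, hit2 = cache.process("ERROR connection timeout host=db7")  # same shape: cache hit
```

Both events share the same symbol structure, so the second reuses the template created for the first — the caching behavior the answer above describes.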

Without a symbol library: The runtime still processes events using JIT inference — creating templates on-demand as events arrive.

With a symbol library: The runtime recognizes patterns upfront, builds more accurate templates, and improves message extraction for cost tracking and metrics — the core value of the engine.

What information does the compiler generate

The compiler extracts symbol values — tokens that appear in log statements, such as string literals, enum values, and format patterns. Extracted string constants — from log statement formats (e.g., "error in %s"), binary executables (via strings), or text/JSON/YAML files — are stored as 64-bit hashes in the symbol library. The original plain text is never written to the output. The runtime engine matches incoming events against these hashes, not against your source strings.

Class and method names are stored alongside hashes to identify the source of each log statement — comparable to what appears in a stack trace.

The compiler does not extract application logic, control flow, full source file contents, or environment variables. It does not read .env files or runtime state.

Output is a symbol library file (.10x.tar) containing hash mappings and their source locations. You can inspect the contents before deploying (tar tf library.10x.tar).

What permissions does the compiler need, and how do I scope access

The compiler needs read-only access to the sources you configure:

  • GitHub repos — GH_TOKEN: Fine-grained PAT with Contents: Read-only, scoped to target repos (recommended). Classic PAT with repo scope also supported. Public repos need no token.
  • Docker images — DOCKER_USERNAME + DOCKER_TOKEN: Pull access to target registries.
  • Helm charts — Same credentials as GitHub + Docker: Resolves chart image references, then pulls each image.
  • Artifactory — ARTIFACTORY_TOKEN: Read access to target repositories.

You control exactly which repos and images the compiler accesses via your configuration. The compiler only pulls sources you explicitly list — it does not discover or scan anything beyond what you configure.
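The allowlist model looks roughly like the fragment below. The field names and repo/image paths are hypothetical, not the compiler's actual configuration schema — the point is that only explicitly listed sources are ever pulled:

```yaml
# Hypothetical compiler config: only sources listed here are pulled.
# Field names and paths are illustrative, not the actual 10x schema.
sources:
  github:
    repos:
      - my-org/payments-service
      - my-org/checkout-api
  docker:
    images:
      - registry.internal/my-org/payments:latest
```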

All processing runs inside your infrastructure (CI/CD pipeline or k8s CronJob). No source code or symbol data is sent to Log10x or any external service.

Does any data leave my network

All source code and symbol output stays inside your infrastructure. No source code, symbol data, or log content is transmitted to Log10x or any external service.

The only outbound calls are:

  • Pulling the 10x container image from ghcr.io (can be mirrored to a private registry for air-gapped environments)
  • Billing telemetry — engines send lightweight Prometheus metrics to prometheus.log10x.com over TLS 1.3. See Metrics API for the complete list of metrics and labels. Metric payloads contain no log content, no source code, and no PII. For self-managed deployments, see License for air-gap configuration

Input (source code, container images) is pulled from your own repositories. Output (symbol library files) is written to your own storage or pushed to your own GitHub repo via GitOps.

Can I run the compiler in an air-gapped environment with no egress

Yes. The two outbound calls (container image pull + billing telemetry) can both be internalized:

  • Container image — Mirror the 10x image from ghcr.io to your private registry. Point the CronJob or CI/CD step at your internal registry instead.
  • Billing telemetry — Deploy a local License Receiver to capture engine heartbeats internally. Quarterly usage reports are exported manually.

With both in place, the compiler runs with zero egress. All source access is to your own internal repos and registries.
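Mirroring the image is ordinary registry-to-registry copying. A sketch using the Docker CLI — the ghcr.io image path and the internal registry name are placeholders; substitute the real values from your install docs:

```shell
# Mirror the 10x image into a private registry (all paths are placeholders).
docker pull ghcr.io/<org>/<10x-image>:<tag>
docker tag ghcr.io/<org>/<10x-image>:<tag> registry.internal/<org>/<10x-image>:<tag>
docker push registry.internal/<org>/<10x-image>:<tag>
```

Then point the CronJob or CI/CD step at `registry.internal/...` and no pull ever leaves your network.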

When to Use

Do I need to run the compiler, or do the built-in defaults cover my stack

The 10x runtime ships with a default symbol library covering 150+ open-source frameworks across Java, Python, Node.js, Go, C++, Rust, Ruby, .NET, and Scala — including Kubernetes, OpenTelemetry, Spring Boot, Django, Express, nginx, PostgreSQL, Redis, Kafka, and many more.

Start with the defaults. Run the Dev app on your own log files to see how much the built-in library covers. The runtime reports which events matched known symbols and which did not.

Run the compiler if:

  • Your applications generate custom logging (lots of proprietary log statements in your code)
  • You use third-party frameworks not in the default library
  • You want symbol coverage for your specific application log statements (improves template accuracy for custom events)

What happens to log events that don't have matching symbols

The runtime engine still processes them. At runtime, the engine tokenizes each event and matches tokens against the symbol library, then hashes the symbol+delimiter sequence to upsert into the template map.

With symbol coverage: Tokens are classified as known symbols, producing longer, coherent symbol sequences. This enables accurate message extraction that captures the intended "essence" of the event — essential for cost attribution and metrics.

Without symbol coverage: Tokens are classified as variables, producing shorter, fragmented symbol sequences. The runtime still builds templates and processes events normally, but message extraction falls back to a best effort based on the symbols available, so message precision is reduced. Structure inference alone still yields ~80% efficiency — no failure, just less accurate classification.
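The difference between rich and sparse coverage can be made concrete with a toy extraction function. This is illustrative only — the function and the symbol sets are hypothetical, not the 10x runtime's actual extraction logic:

```python
# Illustrative only: how symbol coverage changes message extraction.
# Function and symbol sets are hypothetical, not the 10x runtime API.

def extract_message(tokens: list[str], symbols: set[str]) -> str:
    # Keep known symbols (the event's "essence"); drop variable tokens.
    return " ".join(t for t in tokens if t in symbols)

tokens = ["Payment", "failed", "for", "order", "8841", "after", "350ms"]

rich = {"Payment", "failed", "for", "order", "after"}   # compiler-supplied symbols
sparse = {"for", "after"}                               # generic defaults only

msg_rich = extract_message(tokens, rich)      # coherent message
msg_sparse = extract_message(tokens, sparse)  # fragmented message
```

With rich coverage the extracted message reads "Payment failed for order after" — the intended pattern with variables removed. With sparse coverage it collapses to "for after", which still works but carries far less meaning for cost attribution.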

Coverage

My stack uses common open-source frameworks — do I need to run the compiler at all

The 10x runtime ships with a default symbol library covering 150+ open-source frameworks. If your stack is covered by the defaults, you don't need to run the compiler.

Run the compiler to add custom symbols for your proprietary application code on top of the defaults. This improves template accuracy for your specific log formats. The compiler output supplements the default library, not replaces it.

How to decide: Use the Dev app to verify default coverage on your own log files. If default coverage is sufficient, you're done. If you want to optimize logs from custom frameworks or proprietary services, run the compiler.

I have logs from services where I have no source code — vendor agents, third-party images, AWS managed services. Can the compiler cover these

Source code is not required. The compiler handles closed-source workloads via three approaches:

  1. Docker image scanning — Pull any Docker image from any registry (public or private), including closed-source vendor agents. The compiler exports and scans the image filesystem for symbol content. Images are only re-pulled when their SHA256 changes, making incremental runs efficient. Works for nginx:latest, Datadog agent images, Splunk UF, or any vendor-provided image.

  2. Executable scanning — Runs the OS strings utility on compiled binaries (.so, .dll, .dylib) to extract symbol values. Scanning happens inside your infrastructure, and the output contains only 64-bit hashes (no binary content leaves your network).

  3. JIT fallback — For truly opaque sources (AWS-managed service logs, SaaS tools with no Docker image), the JIT runtime handles them automatically at runtime. The engine creates a template on the first event seen, giving ~80% efficiency without any compiler setup needed.

Do you support custom and proprietary log formats, or just the 150+ default frameworks

Both. The default library covers common frameworks, and the compiler lets you add custom symbols for proprietary logging patterns. Custom formats work out of the box with roughly 80% coverage from the built-in library.

How it works: The runtime tokenizes incoming events and matches tokens against the symbol library. With 80% built-in coverage, most generic tokens (class names, log levels, timestamps, method names) already match. For logs from your proprietary services, the runtime still processes them, but template inference is less precise.

When to optimize: If you want accurate message extraction for cost attribution and metrics — capturing the intended "essence" of each custom event type — run the compiler on your custom code. This generates additional symbols, so the runtime recognizes your proprietary patterns and builds more accurate templates. The compiler is iterative: initial runs take 10-30 minutes to scan your code. Subsequent runs only process changed files, completing in seconds — perfect for CI/CD pipelines. You point it at your repositories once, then it automatically supplements the default symbol library on each deploy.

See real-world before/after examples showing how symbol coverage improves lossless volume reduction on production logs.

My custom format isn't in the library — does it still work

Yes. The runtime handles unknown formats gracefully:

  • Events are tokenized and matched against whatever symbols exist in the library
  • A template is built from the symbol+delimiter sequence found, even if sparse
  • Message extraction works with best effort based on available symbols

For precise cost tracking, run the compiler on that format — then message extraction captures the intended pattern. Otherwise, the format works as-is without configuration or manual regex rules.

Operations

How often do I need to run the compiler, and what is the maintenance overhead

Minimal. Initial runs take a few minutes; incremental runs on unchanged code complete in seconds. Deploy as a k8s CronJob (every 30 minutes) or trigger from CI/CD on commits.
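A CronJob deployment might be wired roughly as below. Everything here — the image path, resource name, and secret name — is a placeholder, not the actual 10x artifact names:

```yaml
# Hypothetical CronJob wiring; image path and secret names are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tenx-compiler
spec:
  schedule: "*/30 * * * *"   # every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: compiler
              image: registry.internal/<org>/<10x-image>:<tag>  # placeholder
              envFrom:
                - secretRef:
                    name: compiler-credentials  # GH_TOKEN, DOCKER_TOKEN, ...
```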

How it stays lightweight:

  • Checksums — Unchanged source files are skipped automatically. Only modified code is re-scanned, so incremental runs are fast.
  • Docker images — Images are only re-pulled when their SHA256 hash changes since the last run. Routine re-runs on an unchanged codebase are near-instantaneous.
  • Distribution — Edge and cloud apps pull updated symbol libraries via GitOps at startup and poll for changes periodically. Between runs, events are processed using the previous symbol library version — no data is lost.

The compiler runs as a Docker container, processes changed files, and exits. No long-running daemon, no runtime agent, no application code changes.
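The checksum-skip behavior is the key to cheap incremental runs. A minimal sketch, similar in spirit to what the compiler does — the function names and the persistence of `seen` are assumptions, not the actual implementation:

```python
# Illustrative checksum-skip logic: only files whose content changed since
# the last run are re-scanned. Names are hypothetical, not the 10x internals.
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def files_to_scan(paths: list[Path], seen: dict[str, str]) -> list[Path]:
    """Return only files whose checksum differs from the previous run."""
    changed = []
    for p in paths:
        digest = file_digest(p)
        if seen.get(str(p)) != digest:
            changed.append(p)
            seen[str(p)] = digest  # record for the next incremental run
    return changed
```

On a second run over an unchanged tree, `files_to_scan` returns an empty list, which is why routine re-runs complete in seconds.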

How do symbols relate to templates, and how do I troubleshoot recognition issues

Symbols are vocabulary — tokens extracted from your source code by the AOT compiler. Templates (TenXTemplates) are cached hidden classes created by the runtime engine when it encounters events that match a known symbol pattern.

The chain: symbol library → runtime engine loads symbols → incoming event matches symbol pattern → engine assigns a cached TenXTemplate → subsequent events with the same structure reuse the template.

To troubleshoot:

  • Extract and inspect symbol library contents (tar tf library.10x.tar) — verify your target symbols were captured
  • Run the Dev app locally on test logs and enable debug output to see symbol matching and template assignment in detail
  • Check the engine log file for runtime template assignment messages