Index

The Index module enables storage queries to fetch byte ranges of blobs held in object storage (e.g., AWS S3, Azure) that match a specified app/service name, search term(s), and timestamp range, predictably and at scale.

Workflow

The Index module runs when S3 event notifications arrive in SQS queues and trigger index workers to process the uploaded files.

The module comprises an input stream and an output stream:

The input stream reads the events from the uploaded file to transform them into TenXObjects.
The output stream performs the following actions for each TenXObject:

Write its template to the index container (if it does not already exist).
Map its timestamp to a Bloom filter associated with a rolling time window specified by indexWriteResolution.
Append its templateHash and vars to the Bloom filter's hash set. Once a filter's size exceeds the object storage's key byte length (e.g., 1024 bytes for AWS S3), the stream writes it to the index container and assigns a new filter to the time window.
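The three output stream steps above can be sketched roughly as follows. This is an illustrative toy, not the module's implementation: the `IndexWriter` class, the truncated SHA-1 `templateHash`, and the use of a plain `set` in place of a real Bloom filter are all assumptions made for readability.

```python
import hashlib
from collections import defaultdict

MAX_FILTER_BYTES = 1024  # assumed cap, mirroring the AWS S3 key length limit


def window_start(timestamp_ms: int, resolution_ms: int) -> int:
    """Floor a timestamp to the start of its rolling time window."""
    return timestamp_ms - (timestamp_ms % resolution_ms)


class IndexWriter:
    """Hypothetical stand-in for the output stream; a set replaces the Bloom filter."""

    def __init__(self, resolution_ms: int):
        self.resolution_ms = resolution_ms
        self.templates = {}                    # templateHash -> template text
        self.open_filters = defaultdict(set)   # window start -> pending values
        self.flushed = []                      # (window start, values) "written" to the index

    @staticmethod
    def _size(values) -> int:
        # Toy size measure: total UTF-8 bytes of the stored values.
        return sum(len(v.encode("utf-8")) for v in values)

    def write(self, timestamp_ms: int, template: str, variables: list) -> None:
        template_hash = hashlib.sha1(template.encode("utf-8")).hexdigest()[:8]
        # Step 1: write the template only if it is not already stored.
        self.templates.setdefault(template_hash, template)
        # Step 2: map the timestamp to its time window.
        window = window_start(timestamp_ms, self.resolution_ms)
        # Step 3: append the hash and variables; flush once the cap is exceeded.
        current = self.open_filters[window]
        current.update([template_hash, *variables])
        if self._size(current) > MAX_FILTER_BYTES:
            self.flushed.append((window, frozenset(current)))
            self.open_filters[window] = set()
```

Events that share a template add only one template entry, while each distinct time window keeps its own open filter until it grows past the cap.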

TenXTemplate Filters

TenXTemplate Bloom filters enable parallel traversal of the index container (e.g., S3 bucket) with high membership-test accuracy when determining which byte ranges to fetch and scan for matching log/trace events.

Separating low-cardinality symbol values into TenXTemplates and writing only template hashes and high-cardinality variables to Bloom filters reduces their volume by over 75% compared to appending both low and high-cardinality values.
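To make the separation concrete, here is a minimal sketch of splitting an event into a template and its variables. The digit-based heuristic, the `$` placeholder, and the function names are assumptions for illustration only; the actual TenXTemplate extraction is not described in this section.

```python
import hashlib
import re


def split_event(line: str):
    """Split a log line into a template (symbols) and its variables.

    Toy heuristic (an assumption): any token containing a digit is a variable.
    """
    tokens = line.split()
    variables = [t for t in tokens if re.search(r"\d", t)]
    template = " ".join("$" if re.search(r"\d", t) else t for t in tokens)
    return template, variables


def filter_entries(line: str):
    """Return the values that would be appended to a filter: hash plus variables."""
    template, variables = split_event(line)
    template_hash = hashlib.sha1(template.encode("utf-8")).hexdigest()[:8]
    return [template_hash, *variables]
```

Two events produced by the same log statement yield the same template hash, so the repeated symbol text is stored once as a template rather than once per event in every filter.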

Restricting Bloom filter size to the object storage's key length enables batch retrieval of filters via list operations (e.g., AWS S3: 1000 keys/request, Azure: 5000 keys/request, GCP: 1000 keys/request).
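One way to picture why the key length matters: if a filter's bytes are encoded into the object key itself, a single list request returns many filters without fetching any object bodies. The sketch below assumes a hypothetical `prefix/encoded-filter` key layout and base64url encoding; the module's real key format is not specified here.

```python
import base64

MAX_KEY_BYTES = 1024  # AWS S3 key length limit cited above


def filter_to_key(prefix: str, filter_bits: bytes) -> str:
    """Encode filter bytes into an object key (hypothetical layout)."""
    encoded = base64.urlsafe_b64encode(filter_bits).decode("ascii")
    key = f"{prefix}/{encoded}"
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("filter does not fit in a single object key")
    return key


def key_to_filter(key: str) -> bytes:
    """Recover the filter bytes from a key returned by a list operation."""
    encoded = key.rsplit("/", 1)[1]
    return base64.urlsafe_b64decode(encoded)
```

Because base64 expands data by roughly a third, the usable filter payload under this assumed layout is smaller than the raw key limit.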

graph LR
    A["📥 Log File Upload<br/>(S3/Azure/GCP)"] --> B["Read Events<br/>Stream"]
    B --> C["Transform to<br/>TenXObjects"]
    C --> D["⚙️ Process<br/>Each Object"]

    D --> E["Extract Variables<br/>& Template"]
    E --> F["Map to<br/>Time Window"]
    F --> G{"Filter Size<br/>< 1024 bytes?"}

    G -->|Yes| H["Append to<br/>Current Filter"]
    G -->|No| I["Write Filter<br/>to Index"]

    H --> J{"More<br/>Events?"}
    I --> K["Create New<br/>Filter"]
    K --> J

    J -->|Yes| D
    J -->|No| L["✅ Index Objects<br/>Ready for Query"]

    E --> M["Write Template<br/>(if new)"]
    M --> L

    classDef input fill:#2563eb88,stroke:#1d4ed8,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef processing fill:#05966988,stroke:#047857,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef output fill:#dc262688,stroke:#b91c1c,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef decision fill:#7c3aed88,stroke:#6d28d9,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef result fill:#ea580c88,stroke:#c2410c,color:#ffffff,stroke-width:2px,rx:8,ry:8

    class A,B,C input
    class D,E,F processing
    class G,J decision
    class H,I,K,M output
    class L result

Compute Resources

Indexing is CPU and memory intensive during file parsing. Default k8s pod resources:

1 CPU and 2 GB memory per pod (see deployment guide)
Autoscaling: 2–10 replicas depending on queue depth (minimum of 2 by default; scales up to 10 as the backlog grows)
Throughput: One pod handles ~10–50 GB/day depending on event size and CPU availability
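As a rough sizing aid, the figures above can be combined into a back-of-the-envelope replica estimate. The function below is a hypothetical helper, not part of the product; it simply divides daily volume by an assumed per-pod throughput and clamps the result to the default 2–10 autoscaling range.

```python
import math


def replicas_needed(daily_gb: float, per_pod_gb_per_day: float,
                    min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Estimate index worker replicas, clamped to the autoscaling bounds."""
    wanted = math.ceil(daily_gb / per_pod_gb_per_day)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 200 GB/day at an assumed 50 GB/day per pod suggests 4 replicas, while the same volume at 10 GB/day per pod would exceed the default ceiling of 10.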

Indexing runs asynchronously: S3 event notifications trigger it, and it runs in parallel with queries. Multiple index workers process files concurrently from the SQS queue. Indexes are built once at ingest time and never recomputed.

Cost

Index building cost is part of the k8s pod resource costs; there is no per-GB indexing fee. You pay for:

k8s pod (CPU + memory) running the index workers
S3 storage for index objects (~1–5% overhead vs. original data size)
SQS queue operations (~$0.40 per million messages)
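The storage and queue components can be estimated with simple arithmetic. The helpers below are illustrative only; they use the approximate figures listed above (1–5% index overhead, about $0.40 per million SQS messages) as default parameters, and actual pricing may differ.

```python
def index_overhead_gb(data_gb: float, low: float = 0.01, high: float = 0.05):
    """Return the (low, high) estimated index storage in GB."""
    return data_gb * low, data_gb * high


def sqs_cost_usd(messages: int, usd_per_million: float = 0.40) -> float:
    """Return the estimated SQS cost for a number of queue messages."""
    return messages / 1_000_000 * usd_per_million
```

Under these assumptions, 1 TB of original data adds roughly 10 to 50 GB of index objects, and 5 million queue messages cost about $2.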

Scaling

If files upload faster than they are indexed, the SQS queue buffers pending work, so no events are lost. Index worker pods scale up automatically via the Kubernetes HPA.

Unindexed files remain queryable via full scan (slower than indexed queries but functional).

Deployment topologies:

All-in-one: Single pod cluster handles index, query, and stream roles (suitable for <100 GB/day)
Separate clusters: Dedicated index/query/stream pods allow independent scaling (recommended for >500 GB/day)

See the deployment guide for sizing guidance.

Config Files

To configure the Object storage index output module, edit these files.

Options

Specify the options below to configure one or more Object storage index outputs:

Name	Description	Category
indexObjectStorageName	Object storage logical name	Container
indexReadContainer	Name of the input Object Storage container	Container
indexReadObject	Name of the object storage blob to index	Container
indexWriteContainer	Name of target index container	Container
indexWriteTarget	Logical name identifying the origin of 'indexReadObject'	Output
indexReadExtractMessage	Use extractor for inner message	Parsing
indexReadMessageField	Message field name	Parsing
indexWriteByteRange	Max byte range size to index 'indexReadObject'	Accuracy
indexWriteResolution	Index time window resolution	Accuracy
indexWriteAccuracy	Bloom filter accuracy of index objects.	Accuracy
indexWriteTemplateMergeInterval	Merge template interval	Advanced
indexObjectStorageArgs	Custom Object storage args	Advanced
indexReadPrintProgress	Sets whether this input prints throughput stats to the console	General

Container

`indexObjectStorageName`

Object storage logical name.

Type	Required	Category
String	✔	Container

Identifies the Object Storage containing the blob to index (e.g., AWS).

`indexReadContainer`

Name of the input Object Storage container.

Type	Required	Category
String	✔	Container

Specifies the Object Storage container (e.g., AWS S3 bucket) containing the blob (e.g., log file) to index.

`indexReadObject`

Name of the object storage blob to index.

Type	Required	Category
String	✔	Container

Specifies the blob (e.g., log file) name within indexReadContainer to index.

`indexWriteContainer`

Name of target index container.

Type	Required	Category
String	✔	Container

Specifies the storage container (e.g., AWS S3 bucket) to which TenXTemplate Filters (i.e., TenXTemplates and Bloom filters) are written.

Output

`indexWriteTarget`

Logical name identifying the origin of 'indexReadObject'.

Type	Required	Category
String	✔	Output

Specifies the logical name under which to store the index objects produced for indexReadObject. This name commonly identifies the app that generated the events enclosed within this blob (e.g., acme-client).

Parsing

`indexReadExtractMessage`

Use extractor for inner message.

Type	Default	Category
Boolean	false	Parsing

Specifies whether to extract an inner field from the full event JSON to use as the base for constructing the TenXObject.

`indexReadMessageField`

Message field name.

Type	Default	Category
String	log	Parsing

Name of the message field in the event JSON used to construct the TenXObject. Applies only if indexReadExtractMessage is true.
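The interaction between the two parsing options can be sketched as below. The `event_text` function is hypothetical, and raising `KeyError` for a missing field is an assumption; how the module handles events that lack the field is not stated here.

```python
import json


def event_text(raw_event: str, extract_message: bool = False,
               message_field: str = "log") -> str:
    """Return the text used as the base for constructing a TenXObject.

    Mirrors indexReadExtractMessage / indexReadMessageField in spirit only.
    """
    if not extract_message:
        # Default behavior: use the entire event as-is.
        return raw_event
    # Assumed behavior: a missing field raises KeyError.
    return str(json.loads(raw_event)[message_field])
```

With the defaults, an event such as `{"log": "disk full", "stream": "stderr"}` is used whole; with extraction enabled, only `disk full` is used.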

Accuracy

`indexWriteByteRange`

Max byte range size to index 'indexReadObject'.

Type	Required	Category
Number	✔	Accuracy

Controls the chunk size in which to index the target object.

For example, if the target object is 1GB and this value is 2MB, the object is indexed in 2MB segments so that matching queries retrieve only the relevant chunks rather than the entire object. To learn more, see: byte range fetches.
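The chunking arithmetic can be illustrated with a short sketch. The helper names are hypothetical; the `bytes=start-end` string follows the standard HTTP `Range` header form used for byte range fetches.

```python
def byte_ranges(object_size: int, chunk_size: int):
    """Yield inclusive (start, end) byte offsets that cover the object."""
    for start in range(0, object_size, chunk_size):
        yield start, min(start + chunk_size, object_size) - 1


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

A 1 GiB object split into 2 MiB segments yields 512 ranges, so a query matching a single segment fetches about 0.2% of the object.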

`indexWriteResolution`

Index time window resolution.

Type	Required	Category
Number	✔	Accuracy

Controls the index time range resolution.

For example, setting this to 1min means that queries over time ranges longer than 1min (e.g., 15min) do not unnecessarily fetch byte ranges outside the requested time frame.

The lower this value is, the greater the output index size will be.

This value should satisfy the minimum resolution for querying the index. For example, if queries to the index are in 5-minute increments:

query:
  filter:
    from: $=now("-5m")
    to: $=now()

Setting this value to 5min will create the most efficient index.
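The trade-off can be seen by counting how many index time windows a query range touches. This is a sketch under stated assumptions: windows are aligned to multiples of the resolution, times are in milliseconds, and the query range is half-open; `windows_for_query` is a hypothetical helper.

```python
def windows_for_query(from_ms: int, to_ms: int, resolution_ms: int):
    """Return the start times of all windows overlapping [from_ms, to_ms)."""
    first = from_ms - (from_ms % resolution_ms)
    return list(range(first, to_ms, resolution_ms))
```

An aligned 5-minute query touches one window at 5min resolution but five windows at 1min resolution, while an unaligned 5-minute query at 5min resolution straddles two windows and may scan up to 10 minutes of data.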

`indexWriteAccuracy`

Bloom filter accuracy of index objects.

Type	Required	Category
Number	✔	Accuracy

Controls the accuracy of the Bloom filters in TenXTemplate Filters.

The index output stream produces a list of Bloom filters for each indexWriteResolution and indexWriteByteRange combination of the target blob. Query inputs utilize these filter objects to rule out byte ranges where the query criteria are known NOT to match.

For example, suppose a 10MB target blob contains events whose timestamps range from the beginning of the hour to 3min later, indexWriteResolution is set to 1min, and indexWriteByteRange is set to 2MB. Up to 6 ranges are then indexed separately: the templateHash and vars members of each TenXObject within a range are added to a list of Bloom filters whose accuracy must not fall below this value. The higher the accuracy, the longer the list of filters created.

The query input uses Bloom filters to evaluate whether their corresponding byte ranges contain the target search terms, with an accuracy set by this value (i.e., 100 minus the percentage chance of a false positive). In other words, if this value is 95, there is a 5% chance that a byte range that does NOT contain the target terms is fetched and scanned.
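To see why higher accuracy produces more filter data, the textbook Bloom filter sizing formulas can be applied: for n items and false-positive probability p, the optimal bit count is m = -n·ln(p) / (ln 2)² with k = (m/n)·ln 2 hash functions. Whether the module sizes its filters exactly this way is an assumption; the sketch only illustrates the general relationship.

```python
import math


def bloom_parameters(n_items: int, accuracy_percent: float):
    """Return (bits, hash_count) for a standard Bloom filter.

    accuracy_percent follows the convention above: 95 means a 5% false-positive rate.
    """
    if not 0 < accuracy_percent < 100:
        raise ValueError("accuracy must be strictly between 0 and 100")
    p = 1 - accuracy_percent / 100
    bits = math.ceil(-n_items * math.log(p) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / n_items * math.log(2)))
    return bits, hash_count
```

For 1000 values, moving from 95 to 99 raises the required size from roughly 6.2 to 9.6 bits per value, which, with each filter capped at the key length, translates into more filters per range.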

Advanced

`indexWriteTemplateMergeInterval`

Merge template interval.

Type	Default	Category
Number	0	Advanced

Specifies the interval to wait between template merge operations. Each index operation stores output TenXTemplate objects in the indexWriteContainer. Index operations periodically merge template files into a single file, with the period set by this value.

`indexObjectStorageArgs`

Custom Object storage args.

Type	Default	Category
List	[]	Advanced

Custom arguments passed as a map to the constructor of the underlying object storage. This list is expected to hold alternating keys and values (e.g., args: [key1, value1, key2, value2]).
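The flat list-to-map convention can be sketched as follows. The `args_to_map` helper is hypothetical, and rejecting an odd-length list is an assumption about how malformed input might reasonably be handled.

```python
def args_to_map(args: list) -> dict:
    """Convert [key1, value1, key2, value2, ...] into {key1: value1, ...}."""
    if len(args) % 2 != 0:
        raise ValueError("args must hold an even number of entries")
    # Even positions are keys, odd positions are their values.
    return dict(zip(args[0::2], args[1::2]))
```

The default empty list maps to an empty dictionary, meaning no custom arguments are passed.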

General

`indexReadPrintProgress`

Sets whether this input prints throughput stats to the console.

Type	Default	Category
Boolean	false	General

When true, this input prints throughput stats to the console. This value is commonly used when testing an integration with a remote endpoint.

This module is defined in index/module.yaml.