Lambda
Deploy the Retriever app to AWS Lambda with the `terraform-aws-tenx-retriever-lambda` module.
Step 1: Prerequisites
| Requirement | Description |
|---|---|
| Log10x License | Your license key (get one) |
| AWS CLI | Configured with credentials that can create Lambda, SQS, IAM, and S3 notifications |
| Terraform | Version 1.5 or newer |
| Docker | For building the container image |
| ECR Repository | Any ECR repo in the same region — you'll push the retriever image here |
| S3 Bucket(s) | One for raw log uploads (source), one for index artifacts (can be the same bucket) |
Step 2: Build and Push the Container Image
The retriever ships as a Lambda container image. Build it from the engine repo:
```shell
# In the l1x-inc engine repo
./gradlew :pipeline:run-lambda:shadowJar

cd pipeline/run-lambda
docker build -t tenx-retriever-lambda:1.0.0 \
  -f Dockerfile \
  ../..
```
Tag and push to your ECR repo:
```shell
REGION=us-east-1
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
REPO=tenx-retriever-lambda

aws ecr get-login-password --region $REGION \
  | docker login --username AWS --password-stdin $ACCOUNT.dkr.ecr.$REGION.amazonaws.com

docker tag tenx-retriever-lambda:1.0.0 \
  $ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:1.0.0
docker push $ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:1.0.0
```
The resulting image URI is what you pass to the module as `image_uri`.
Step 3: Configure Terraform
```hcl
provider "aws" {
  region = "us-east-1"
}

data "aws_caller_identity" "current" {}

module "retriever" {
  source = "log-10x/tenx-retriever-lambda/aws"

  name_prefix = "my-retriever"
  image_uri   = "${data.aws_caller_identity.current.account_id}.dkr.ecr.us-east-1.amazonaws.com/tenx-retriever-lambda:1.0.0"

  # Bring-your-own buckets. Source holds raw logs; index holds the bloom/reverse artifacts.
  # Can be the same bucket (EKS-style single-bucket layout).
  source_bucket_name = "my-raw-log-bucket"
  index_bucket_name  = "my-raw-log-bucket"

  # Scope the S3 event trigger to the prefix/suffix where uploads land.
  # Required when source_bucket_name == index_bucket_name — the module
  # refuses an empty `source_prefix` in that case to prevent the
  # indexer's writes from re-triggering the indexer via the S3
  # notification (recursive-invocation loop).
  source_prefix = "app/"
  source_suffix = ".log"

  tenx_api_key = var.tenx_api_key

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

variable "tenx_api_key" {
  type      = string
  sensitive = true
}

output "query_function_url" {
  value = module.retriever.query_function_url
}
```
Apply:
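The standard Terraform workflow does it:

```shell
terraform init
terraform plan
terraform apply
```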
Step 4: Verify Indexing
Upload a test file matching the trigger prefix/suffix:
```shell
echo "{\"timestamp\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",\"level\":\"ERROR\",\"message\":\"Test error\"}" > test.log
aws s3 cp test.log s3://my-raw-log-bucket/app/test.log
```
Within 10-20 s the indexer Lambda fires and writes bloom + reverse-index artifacts under tenx/ keys in the index bucket:
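One way to confirm the artifacts landed (bucket name and the `tenx/` prefix are from the configuration above):

```shell
aws s3 ls s3://my-raw-log-bucket/tenx/ --recursive
```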
Check CloudWatch Logs for the indexer Lambda (/aws/lambda/my-retriever-indexer):
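With AWS CLI v2, for example:

```shell
aws logs tail /aws/lambda/my-retriever-indexer --since 10m
```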
Step 5: Verify Querying
Submit a query against the Function URL. Requests must be signed with SigV4 when `query_url_auth = "AWS_IAM"` (the default):
```shell
URL=$(terraform output -raw query_function_url)

awscurl --service lambda --region us-east-1 -X POST \
  "$URL/retriever/query" \
  -H "Content-Type: application/json" \
  -d '{"from": "now(\"-5m\")", "to": "now()", "search": "severity_level == \"ERROR\"", "name": "my_query"}'
```
The query Lambda writes _DONE.json on completion:
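The exact output key depends on the query name and output layout, so one hedge-free way to find it is to search the index bucket rather than guess the path:

```shell
aws s3 ls s3://my-raw-log-bucket/ --recursive | grep _DONE.json
```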
Matched events are visible in the stream Lambda's CloudWatch log group (/aws/lambda/my-retriever-stream). Fluent Bit is not used in the Lambda deployment — the pipeline emits directly from the stream worker.
Step 6: Front with API Gateway (Optional)
The built-in Function URL is the fastest path to a public query endpoint. If you need custom auth, routing, or rate limiting, disable it and front with API Gateway:
```hcl
module "retriever" {
  # ... other inputs
  enable_query_url = false
}

resource "aws_apigatewayv2_api" "query" {
  name          = "tenx-retriever-query"
  protocol_type = "HTTP"
}

resource "aws_apigatewayv2_integration" "query" {
  api_id                 = aws_apigatewayv2_api.query.id
  integration_type       = "AWS_PROXY"
  integration_uri        = module.retriever.lambda_function_arns["query"]
  payload_format_version = "2.0"
}
```
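As written, the integration still needs a route, a stage, and permission for API Gateway to invoke the function. A sketch of the remaining standard pieces (resource names are illustrative):

```hcl
resource "aws_apigatewayv2_route" "query" {
  api_id    = aws_apigatewayv2_api.query.id
  route_key = "POST /retriever/query"
  target    = "integrations/${aws_apigatewayv2_integration.query.id}"
}

resource "aws_apigatewayv2_stage" "default" {
  api_id      = aws_apigatewayv2_api.query.id
  name        = "$default"
  auto_deploy = true
}

resource "aws_lambda_permission" "apigw" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = module.retriever.lambda_function_arns["query"]
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.query.execution_arn}/*/*"
}
```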
Step 7: Monitor Operations
Useful CloudWatch alarms:
| Metric | Source | Threshold | Meaning |
|---|---|---|---|
| `ApproximateAgeOfOldestMessage` | SQS `*-index-queue` | > 300 s | Indexing is falling behind — concurrency limit? |
| `ApproximateNumberOfMessages` | SQS `*-index-queue-dlq` | > 0 | Failed indexing messages — inspect DLQ body |
| `Errors` | Lambda `*-query` | > 0 | Query handler crashes |
| `Duration` p95 | Lambda `*-query` | > 8 s | Cold starts dominating — consider provisioned concurrency |
Inspect a DLQ message:
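For example (the queue name assumes the `my-retriever` prefix from the configuration above and the module's `*-index-queue-dlq` naming pattern):

```shell
DLQ_URL=$(aws sqs get-queue-url --queue-name my-retriever-index-queue-dlq \
  --query QueueUrl --output text)

aws sqs receive-message --queue-url "$DLQ_URL" --max-number-of-messages 1 \
  --query 'Messages[0].Body' --output text
```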
Step 8: Teardown
The module removes the Lambdas, queues, DLQs, IAM role, and S3 event notification. S3 buckets and the ECR image are left alone (you own them). To remove indexed artifacts:
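Destroy the module's resources, then clean the index artifacts out of your bucket (bucket name and `tenx/` prefix from the configuration above):

```shell
terraform destroy

# The bucket is yours, so indexed artifacts must be removed separately:
aws s3 rm s3://my-raw-log-bucket/tenx/ --recursive
```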
Components
What `terraform apply` creates. Tabs describe each component and what it's for.
Four Lambda functions run from a single container image. The `ROLE` environment variable tells each one which role to play.

- `indexer` runs once per log-file upload. It builds the Bloom filter and reverse index, then writes both back to S3.
- `query` is the public entry point. It receives a query, picks the sub-windows to scan, and fans them out.
- `subquery` scans one time-slice of the Bloom index. It finds matching byte-ranges and fans those out to `stream`.
- `stream` fetches matched byte-ranges from S3, decodes the events, and emits them to the SIEM.
Three SQS queues (index, subquery, stream) buffer work between roles. Each has a dead-letter queue. When a consumer Lambda fails max_receive_count times on a message, the message moves to the DLQ instead of being silently dropped. DLQs retain messages for 14 days so you can inspect failures.
One role shared by all four Lambdas. It grants S3 read and write on the source and index buckets, SQS send on the three queues, and CloudWatch Logs access for diagnostics.
The role also includes s3:PutObjectTagging. The engine tags its bloom-filter writes with S3 object tags. Without this permission, tagged PUTs silently return HTTP 400 and bloom filters never land.
The source bucket's ObjectCreated:* notification sends messages to the index queue, and the index queue invokes the indexer. The notification is scoped by source_prefix and source_suffix so the indexer's own writes under tenx/ don't re-trigger it. A terraform plan precondition refuses any configuration that would create a recursive loop.
Three Lambda event-source mappings connect each SQS queue to its consumer.
A Lambda Function URL on the query Lambda exposes POST `/retriever/query`. It is cheaper than API Gateway and simpler to wire up. Callers must sign requests with SigV4 (`AWS_IAM`) by default. Set `enable_query_url = false` to front the query Lambda with API Gateway instead, or `query_url_auth = "NONE"` to make the endpoint publicly invocable for demos.
Performance
Measured against retriever v1.0 (engine merge a32bd0a3) on AWS us-east-1, Lambda x86_64 at 6144 MB, April 2026. Corpus: otel-sample JSON-lines files, ~21 MB each. Queries target a single file's byte-range with a simple severity filter. Figures are wall-clock from the caller's perspective — HTTP POST to last event in SIEM for queries, S3 PUT to bloom-filter-written for indexing.
| Condition | p50 | p95 |
|---|---|---|
| Warm | 1.2 s | 1.4 s |
| Cold | 6.7 s | 10 s |
Enable Provisioned Concurrency on query and stream to eliminate cold starts. 3–5 warm instances per Lambda at ~$15/month each cover ~1 query/min.
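A sketch of wiring that up with standard aws provider resources. The function name is an assumption derived from the `my-retriever` prefix, and provisioned concurrency requires a published version or alias, which the module may or may not manage for you; check its outputs before using this as-is:

```hcl
resource "aws_lambda_alias" "query_live" {
  name             = "live"
  function_name    = "my-retriever-query"  # assumed from name_prefix
  function_version = "1"                   # a published version is required
}

resource "aws_lambda_provisioned_concurrency_config" "query" {
  function_name                     = aws_lambda_alias.query_live.function_name
  qualifier                         = aws_lambda_alias.query_live.name
  provisioned_concurrent_executions = 3
}
```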
| File size | p50 | p95 | Throughput |
|---|---|---|---|
| 21 MB | 15.4 s | 18.3 s | 1.4 MB/s |
Indexing time is a fixed ~5 s bootstrap plus a work phase that grows linearly with file size.
| File size | Expected p50 | Dominant cost |
|---|---|---|
| 1 MB | 5–7 s | Bootstrap |
| 21 MB | 15 s | Measured |
| 100 MB | 60–80 s | Work phase |
For files under 5 MB, batch producer-side before S3 PUT — one 20 MB file indexes ~3× faster than twenty 1 MB files serially.
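A producer-side batching sketch: roll many small files into one object before upload, so the indexer pays the ~5 s bootstrap once. File names are illustrative; the bucket and `app/` prefix are from the configuration above:

```shell
BATCH="batch-$(date -u +%Y%m%dT%H%M%SZ).log"

# Concatenate the small files into one object, then upload it.
cat part-*.log > "$BATCH"
aws s3 cp "$BATCH" s3://my-raw-log-bucket/app/
```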
Cost
Unit rates (Lambda at 6144 MB, us-east-1). Multiply by your volume.
| Line item | Rate |
|---|---|
| Indexer compute | ~$0.074 per GB of logs indexed |
| Query chain compute (warm) | ~$0.0005 per query |
| S3 storage | $0.023 per GB-month (Standard) |
| SQS messages | $0.40 per million (one per indexed file) |
Cost scales roughly linearly with daily ingest. The AWS Lambda default account concurrency (1000) covers up to ~100 TB/day.
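As a worked example of the linear scaling, a back-of-envelope estimate for 1 TB/day. The ~21 MB average file size is an assumption carried over from the performance measurements:

```shell
# Rough daily cost at 1 TB/day ingest, using the unit rates above.
awk 'BEGIN {
  gb      = 1024                 # 1 TB/day in GB
  indexer = gb * 0.074           # indexer compute, USD/day
  files   = gb * 1024 / 21       # object count, assuming ~21 MB files
  sqs     = files * 0.40 / 1e6   # one SQS message per indexed file
  printf "indexer %.2f USD/day, sqs %.2f USD/day\n", indexer, sqs
}'
# indexer 75.78 USD/day, sqs 0.02 USD/day
```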
For mode-vs-mode and market comparisons, see the cost table on the picker page.
Tunables
| Variable | Default | Effect |
|---|---|---|
| `memory_size` | 6144 | CPU scales linearly with memory. 6144 MB is measured optimal; 10240 MB plateaus. Lower memory is dramatically slower. |
| `pipeline_shutdown_grace_ms` | 250 | How long the engine waits for sequencer queues to drain on pipeline close. The engine default of 5000 adds a flat 5 s to every warm Lambda invocation, because sequencer queues are already empty at close time in a single-shot invocation. 250 ms bounds the wait safely. Override upward only if you observe dropped events on a long-running workload (EKS). |
| `indexer_batch_size` | 1 | SQS batch size for the indexer. A batch of 1 is safest (ordered, no redelivery). Increase it to trade latency for throughput under backlog. |
| `enable_query_url` | true | Whether to create the Lambda Function URL exposing POST `/retriever/query`. Set to false if you front the query Lambda with API Gateway instead. |
| `query_url_auth` | `AWS_IAM` | Function URL auth mode. `AWS_IAM` requires SigV4-signed requests. `NONE` makes the URL publicly invocable (use only for demos). |
See the module README for the full input and output reference.