Rate

Prevent log analytics over-billing while ensuring critical events always reach your analysis tools.

The rate regulator uses automatic message enrichment combined with byte-based cost calculation to apply sampling based on actual ingestion costs per logical event type.

By tracking costs (event byte size × vendor ingestion rate), the regulator provides business-aligned budget enforcement that accounts for variable event sizes (e.g., a 10KB error log is weighted correctly against a 100-byte debug message).
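As a rough illustration of the byte-based weighting, the sketch below computes the cost of a single event. The function name `event_cost_usd` and the decimal definition of a gigabyte (10^9 bytes) are assumptions made for this example, not part of the regulator's API.

```python
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabyte


def event_cost_usd(size_bytes: int, cost_per_gb: float) -> float:
    """Cost of ingesting one event: byte size x vendor rate per GB."""
    return size_bytes / BYTES_PER_GB * cost_per_gb


error_cost = event_cost_usd(10_000, 1.50)  # 10KB error log
debug_cost = event_cost_usd(100, 1.50)     # 100-byte debug message
ratio = error_cost / debug_cost            # the error log weighs 100x more
```

Counting events alone would treat both messages as equal; weighting by bytes charges the error log 100 times as much against the budget.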

This approach enables more precise control than regex-based rules, which require manual configuration and lack awareness of logical event type, severity, and cost.

Control Strategies

The rate regulator supports both local and global controls, which differ in how they track and apply cost-based budgets for event sampling.

Local Control

Forwarders maintain independent cost counters without cross-node communication, tracking spend per event type (symbolMessage) based on byte volume and configured ingestion costs. The value lies in simple per-service setup, quick processing without network delays, and fault isolation that contains issues to single nodes.

Trade-offs include decisions limited to local data (risking cluster-wide budget overruns) and no visibility into cross-service patterns.

Example: A single application forwarder tracks its own $0.01/min budget, throttling high-cost debug logs based solely on its own traffic and ignoring cluster-wide spending patterns.

Global Control

Forwarders share cost data via lookup tables (synchronized through centralized storage like GitHub) for unified budget decisions. The value comes from enforcing organization-wide budgets, detecting cross-service cost patterns, and applying consistent policies everywhere based on cluster-wide spend.

Trade-offs include the need for cluster communication, added setup complexity for coordination, and minor decision latency.

Example: The cluster aggregates spend data showing that "api_trace" events cost $0.08/min across all nodes. The global lookup shares this information, allowing all forwarders to throttle this pattern proportionally to stay under budget.

Multi-App Regulation

For central forwarders handling logs from multiple applications (common in Kubernetes), the rate regulator prevents individual apps from bypassing budget caps by scaling pods. Use the k8s container name field to aggregate spend per app across all replicas. Two approaches are available:

Option A: Cap Total App Spend (All Event Types)

Prevents any single app from dominating the budget regardless of how many event types it emits.

rateRegulatorFieldNames: [container]  # App only
rateRegulatorMaxSharePerFieldSet: 0.2
rateRegulatorBudgetPerHour: 1.50

Result:

Frontend app (all events, 5 pods): Cannot exceed 20% of total budget ($0.30/hour)
Backend app (all events, 2 pods): Cannot exceed 20% ($0.30/hour)
Payment app (all events, 1 pod): Cannot exceed 20% ($0.30/hour)

Trade-off: Loses event-type intelligence—can't prioritize ERROR over DEBUG within an app.

Option B: Cap Per Event Type Per App

Enforces fairness within each app—prevents a single noisy event type from dominating that app's spend.

rateRegulatorFieldNames: [symbolMessage, container]  # Event type + app
rateRegulatorMaxSharePerFieldSet: 0.2
rateRegulatorBudgetPerHour: 1.50

Result:

"heartbeat_debug|frontend" (5 pods): Cannot exceed 20% of total budget
"error_login|frontend" (5 pods): Separate 20% cap
"timeout|payment-service" (1 pod): Separate 20% cap

Trade-off: Each (event type × app) combo gets its own 20% cap—apps with many event types could theoretically exceed 20% total (though unlikely in practice).

Key Insight: Use container (not pod) for aggregation—container name is stable across replicas, while pod names are unique per instance. Scaling from 1→10 pods doesn't bypass limits.
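To see why the choice of field matters, the sketch below builds a counter key by joining field values with `|`, mirroring keys such as "heartbeat_debug|frontend" shown above. The `counter_key` helper, the dictionary event shape, and the `pod` field are hypothetical; the regulator's actual key construction may differ.

```python
def counter_key(event: dict, field_names: list[str]) -> str:
    """Join the configured field values into one bucket key (hypothetical helper)."""
    return "|".join(str(event.get(name, "")) for name in field_names)


# Two replicas of the same app emitting the same event type.
events = [
    {"symbolMessage": "heartbeat_debug", "container": "frontend", "pod": "frontend-7d9f-abcde"},
    {"symbolMessage": "heartbeat_debug", "container": "frontend", "pod": "frontend-7d9f-fghij"},
]

by_container = {counter_key(e, ["symbolMessage", "container"]) for e in events}
by_pod = {counter_key(e, ["symbolMessage", "pod"]) for e in events}
# by_container holds one shared bucket; by_pod holds one bucket per replica.
```

Keying on the pod name would give every new replica a fresh bucket with its own cap, which is the bypass that aggregating on container avoids.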

Workflow

The rate regulator executes the following steps:

graph LR
    A["<div style='font-size: 14px;'>📥 Input</div><div style='font-size: 10px; text-align: center;'>TenXObject</div>"] --> B["<div style='font-size: 14px;'>💰 Cost Calc</div><div style='font-size: 10px; text-align: center;'>Bytes × $/GB</div>"]
    B --> C{"<div style='font-size: 14px;'>🗂️ Global</div><div style='font-size: 10px; text-align: center;'>Has Lookup?</div>"}
    C -->|Yes| D["<div style='font-size: 14px;'>🌐 Global</div><div style='font-size: 10px; text-align: center;'>Cluster Spend</div>"]
    C -->|No| E["<div style='font-size: 14px;'>📈 Local</div><div style='font-size: 10px; text-align: center;'>Node Spend</div>"]
    D --> F["<div style='font-size: 14px;'>⚖️ Budget Check</div><div style='font-size: 10px; text-align: center;'>vs Target Rate</div>"]
    E --> F
    F --> G["<div style='font-size: 14px;'>📊 Event Share</div><div style='font-size: 10px; text-align: center;'>vs Max %</div>"]
    G --> H["<div style='font-size: 14px;'>🎯 Boost</div><div style='font-size: 10px; text-align: center;'>by Severity</div>"]
    H --> I{"<div style='font-size: 14px;'>🎲 Sample</div><div style='font-size: 10px; text-align: center;'>Decision</div>"}
    I -->|Drop| J["<div style='font-size: 14px;'>🗑️ Drop</div><div style='font-size: 10px; text-align: center;'>TenXObject</div>"]
    I -->|Keep| K["<div style='font-size: 14px;'>✅ Retain</div><div style='font-size: 10px; text-align: center;'>TenXObject</div>"]

    classDef input fill:#3b82f688,stroke:#2563eb,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef decision fill:#eab30888,stroke:#d97706,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef process fill:#059669,stroke:#047857,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef rate fill:#7c3aed88,stroke:#6d28d9,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef retain fill:#16a34a,stroke:#15803d,color:#ffffff,stroke-width:2px,rx:8,ry:8
    classDef drop fill:#dc2626,stroke:#b91c1c,color:#ffffff,stroke-width:2px,rx:8,ry:8

    class A input
    class B,G decision
    class C,D,F process
    class E rate
    class I retain
    class H drop

Local Mode (Without Lookup): Per-Node Filtering

Scenario: A Kubernetes node running 3 pods with config:

Budget: $1.50/hour ($0.025/min)
Max share per event type: 20%
Ingestion cost: $1.50/GB (Splunk)
5-minute tracking window

Step-by-step for a Kubernetes pod error event (ERROR level, 1.8KB):

📥 Event Arrives: Pod emits a CrashLoopBackOff error with full Kubernetes metadata (see raw JSON, 1835 bytes)
💰 Cost Calculated: 1835 bytes / 1GB × $1.50 = $0.0000028 per event
📊 Field Set Identified: symbolMessage = "Error_syncing_pod" (extracted by message enrichment) → counter key: Error_syncing_pod
📈 Track Spend (Local):

Current 5-min window spend: Error_syncing_pod = $0.06, total = $0.10
After increment: Error_syncing_pod = $0.0600028, total = $0.1000028
Normalize to per-minute: Error_syncing_pod = $0.012/min, total = $0.020/min

⚖️ Budget Check: Is total over budget?

Total spend rate: $0.020/min vs. target $0.025/min → Under budget → globalScale = 1.0 (no throttling)

📊 Event Share Check: Is "Error_syncing_pod" dominating?

Share: $0.06 / $0.10 = 60% vs. max 20% → Over share limit
Scale down: fieldSetRate = 0.2 / 0.6 = 0.33 (retain 33% of these events)

🎯 Severity Boost: ERROR level boost = 2.0

baseRate = 1.0 × 0.33 = 0.33
finalRate = 0.33 × 2.0 = 0.66 → 66% retention (boost helps but doesn't fully override)

🎲 Sample Decision: random(0-1) = 0.3 < 0.66 → ✅ Event Kept

Result: "Error_syncing_pod" is heavily over the 20% share (at 60%), so it gets throttled to 33% base rate. The ERROR severity boost (2.0×) increases retention to 66%, meaning 2/3 of these ERROR events are kept. If it were DEBUG (boost=0.5), final rate would be 0.165 → 83% chance of being dropped.
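The arithmetic in this walkthrough can be reproduced with a few lines of Python. This is a sketch of the steps as described above, not the regulator's implementation; the function and parameter names are illustrative.

```python
def retention_rate(total_per_min, budget_per_min, field_set_spend,
                   total_spend, max_share, boost):
    """Retention probability following the walkthrough's steps (illustrative)."""
    # Budget check: scale down only when total spend exceeds the target.
    global_scale = min(1.0, budget_per_min / total_per_min)
    # Share check: scale down only when the field set exceeds its share.
    share = field_set_spend / total_spend
    field_set_rate = min(1.0, max_share / share)
    # Severity boost, clamped to a valid probability.
    return min(1.0, global_scale * field_set_rate * boost)


error_rate = retention_rate(0.020, 0.025, 0.06, 0.10, 0.2, boost=2.0)
debug_rate = retention_rate(0.020, 0.025, 0.06, 0.10, 0.2, boost=0.5)
```

Without intermediate rounding, `error_rate` is about 0.667 and `debug_rate` about 0.167; the walkthrough's 0.66 and 0.165 come from rounding fieldSetRate to 0.33 first.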

Global Mode (With Lookup): Cluster-Wide Filtering

Scenario: Same app now across 10 nodes (30 pods total) with cluster-wide policy enforcement. The policy module generates a lookup from Prometheus with 6-hour average spend:

Lookup file contents:

field_set,cost_per_hour
Error_syncing_pod,0.60
Info_user_action,0.50
Debug_heartbeat,0.40
_global_cost_total,2.00
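
A lookup in this shape can be read with a standard CSV parser. The sketch below loads the sample rows into a dictionary and derives each field set's cluster-wide share; it illustrates the file format only and is not how the regulator itself loads the file.

```python
import csv
import io

LOOKUP_CSV = """field_set,cost_per_hour
Error_syncing_pod,0.60
Info_user_action,0.50
Debug_heartbeat,0.40
_global_cost_total,2.00
"""


def load_lookup(text: str) -> dict[str, float]:
    """Map each field set (and the global total row) to its hourly cost."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["field_set"]: float(row["cost_per_hour"]) for row in reader}


lookup = load_lookup(LOOKUP_CSV)
total = lookup["_global_cost_total"]
shares = {k: v / total for k, v in lookup.items() if k != "_global_cost_total"}
```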

Step-by-step for same pod error event on one node:

📥 Event Arrives: Same 1.8KB ERROR event (raw JSON)
💰 Cost Calculated: 1835 bytes / 1GB × $1.50 = $0.0000028
📊 Field Set Identified: Error_syncing_pod
🌐 Lookup Check:

Lookup file modified 2 minutes ago (within 5-min retention) → Fresh
Fetch cluster-wide data: Error_syncing_pod = $0.60/hour, total = $2.00/hour

📈 Track Spend (Global): Use cluster-wide spend from lookup

Cluster spend rate: Error_syncing_pod = $0.60/hour = $0.01/min, total = $2.00/hour = $0.0333/min
(Local counter still increments for fallback, but decision uses global data)

⚖️ Budget Check (Cluster-Wide): Is cluster over budget?

Total cluster spend: $0.0333/min vs. per-node target $0.025/min → Over budget (33% above)
Scale down globally: globalScale = 0.025 / 0.0333 = 0.75 (retain 75% cluster-wide)

📊 Event Share Check (Cluster-Wide): Is "Error_syncing_pod" dominating cluster?

Cluster share: $0.60 / $2.00 = 30% vs. max 20% → Over share
Scale down: fieldSetRate = 0.2 / 0.3 = 0.67 (retain 67% of "Error_syncing_pod" events)

🎯 Severity Boost: ERROR level boost = 2.0

baseRate = 0.75 × 0.67 = 0.50
finalRate = 0.50 × 2.0 = 1.0 (clamped to max 1.0) → 100% retention

🎲 Sample Decision: random(0-1) = 0.7 < 1.0 → ✅ Event Kept

Result: Even though the cluster is over budget (133% of target) and "Error_syncing_pod" exceeds its 20% share (at 30%), the ERROR boost ensures full retention. All 10 nodes see the same cluster-wide spend data and make consistent decisions. If this same event were logged at DEBUG (boost=0.5), the final rate would be 0.50 × 0.5 = 0.25 → 75% dropped cluster-wide.

Key Difference: Local mode only sees this node's $0.10 total spend (under budget), while global mode sees cluster's $2.00/hour (over budget), enabling coordinated throttling across all nodes.
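The difference between the two modes comes down to which spend figure feeds the budget check. A minimal sketch, assuming a hypothetical `budget_scale` helper that prefers the lookup total when one is supplied:

```python
def budget_scale(budget_per_hour, local_spend_per_min, lookup_total_per_hour=None):
    """Scale factor from the budget check, preferring cluster-wide spend when available."""
    budget_per_min = budget_per_hour / 60
    if lookup_total_per_hour is not None:
        spend_per_min = lookup_total_per_hour / 60  # global mode
    else:
        spend_per_min = local_spend_per_min         # local mode
    return min(1.0, budget_per_min / spend_per_min)


local_scale = budget_scale(1.50, 0.020)                               # under budget
global_scale = budget_scale(1.50, 0.020, lookup_total_per_hour=2.00)  # over budget
```

With the figures from the two walkthroughs, the local view yields a scale of 1.0 (no throttling) while the global view yields about 0.75.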

Configuration

To configure the Rate regulator module, edit these settings.

Below is the default configuration from: rate/config.yaml.

# 🔟❎ 'run' rate regulator configuration

# Rate regulators utilize cost-based sampling to filter noisy telemetry from event outputs (e.g., Splunk)
# Enforces spending limits by tracking byte volume and ingestion costs per event type
# To learn more see https://doc.log10x.com/run/regulate/rate/

# Set the 10x pipeline to 'run'
tenx: run

# =============================== Dependencies ================================

include: run/modules/regulate/rate

# ============================== Rate Options =================================

rateRegulator:

  # 'fieldNames' specifies the list of TenXObject fields to identify rate counter buckets
  #  The list usually contains the symbolMessage field from message enrichment
  #  Can include additional fields like container (k8s app), country (GeoIP), httpCode, etc.
  fieldNames:
    - $=yield TenXEnv.get("symbolMessageField")  # Matches lookup keys

  # 'resetIntervalMs' specifies reset interval for rate counters in milliseconds
  #  Default of 5 minutes provides stable cost averages while remaining responsive
  #  Trade-offs: 1min (reactive/noisy) vs 5min (balanced) vs 10min+ (stable/slow)
  resetIntervalMs: $=parseDuration("5m")

  # 'minRetentionThreshold' specifies minimum retention threshold for high-spend events (0.0 to 1.0)
  #  Ensures some events are always retained even when budget is exceeded
  #  Example: 0.1 = minimum 10% retention even for very high-spend patterns
  minRetentionThreshold: 0.1

  # 'levelBoost' specifies severity level boost mapping for sampling rates
  #  Higher severity events can be given higher retention rates through boost values
  #  Example: ERROR=2.0 means ERROR events are twice as likely to be retained
  levelBoost:
    - TRACE=0.25
    - DEBUG=0.5
    - INFO=1
    - WARN=1.5
    - ERROR=2
    - FATAL=3

  # ----------------------------- Budget Options ------------------------------

  # 'budgetPerHour' specifies target spending budget per hour in USD
  #  Soft target for total cost across all event types on this node
  #  Actual spend may exceed by 10-20% due to soft enforcement
  #  Examples: $1.50/hour (~$36/day), $0.10/hour (dev/test), $0.02/hour (minimal)
  budgetPerHour: 1.50

  # 'maxSharePerFieldSet' specifies maximum % of total spend any single field set can consume
  #  Enforced independently of budget—prevents noisy field sets from dominating even when under budget
  #  Example: 0.2 means no single event type can exceed 20% of total spend
  #  In global mode, enforced cluster-wide based on aggregate spend data from lookup
  maxSharePerFieldSet: 0.2

  # 'ingestionCostPerGB' specifies vendor ingestion cost per GB in USD
  #  Used to calculate per-event costs based on byte size
  #  Common values: Splunk $1.50/GB, Datadog $0.10-$0.25/GB, Elastic $0.109/GB
  #  Should match policyIngestionCostPerGB if using global mode with policy module
  ingestionCostPerGB: 1.5

  # -------------------------- Global Lookup Options --------------------------

  # Enables usage of a lookup file containing cluster-wide spend data for global budget enforcement
  # To learn more see https://doc.log10x.com/run/regulate/rate/#global-control
  #
  # Generating the lookup file is done using the policy module
  # To learn more see https://doc.log10x.com/run/regulate/policy
  #
  # Periodically pulling the lookup file to keep it fresh is done via the gitops configuration
  # To learn more see https://doc.log10x.com/config/github/#config
  #
  lookup:

    # 'file' specifies the lookup file path. Will reload on change.
    #  Contains cost-per-hour data per field set, generated by policy module
    #  Comment out to use local mode (per-node tracking only)
    # file: $=path("data/sample/policy") + "/policy.csv"

    # 'retain' specifies the period before the file is marked as stale
    #  If stale, regulator falls back to local counter rates
    retain: $=parseDuration("10m")

Options

Specify the options below to configure the Rate regulator:

Name	Description
rateRegulatorFieldNames	List of TenXObject fields to identify rate counter buckets
rateRegulatorResetIntervalMs	Reset interval for rate counters in milliseconds
rateRegulatorMinRetentionThreshold	Minimum retention threshold for events when budget is exceeded
rateRegulatorLevelBoost	Severity level boost mapping for retention rates
rateRegulatorLookupFile	Lookup file containing global event type rates
rateRegulatorLookupRetain	Retention period for the lookup file containing global event type rates
rateRegulatorBudgetPerHour	Target spending budget per hour in USD
rateRegulatorMaxSharePerFieldSet	Maximum % of total spend any single field set can consume
rateRegulatorIngestionCostPerGB	Vendor ingestion cost per GB in USD

`rateRegulatorFieldNames`

List of TenXObject fields to identify rate counter buckets.

Type	Default
List	[symbolMessage]

Defines the list of TenXObject field names extracted to identify which rate counter bucket an event belongs to. The list usually contains the symbolMessage field from the message enrichment module but can include additional fields like GeoIP, HTTP code, k8s container name, or custom enrichments for multi-dimensional rate tracking.

Common Use Cases:

Single-app regulation (per event type):

rateRegulatorFieldNames:
  - symbolMessage

Multi-dimensional tracking (event type + geography + HTTP status):

rateRegulatorFieldNames:
  - symbolMessage
  - country
  - httpCode

Multi-app regulation in Kubernetes:

Option A: Cap total spend per app (all event types combined):

rateRegulatorFieldNames:
  - container  # Aggregates all event types for each app

Each app's total spend (across all event types and pods) gets one cap. Simple but loses event-type intelligence.

Option B: Cap spend per event type per app:

rateRegulatorFieldNames:
  - symbolMessage  # Event type
  - container      # App identifier (same across all pods)

Each (event type × app) combo gets its own cap. Provides fairness within apps but allows apps with many event types to potentially exceed one total cap.

Use container (not pod) for aggregation—container name is stable across replicas while pod names are unique per instance.

`rateRegulatorResetIntervalMs`

Reset interval for rate counters in milliseconds.

Type	Default
Number	300000

Defines the interval in milliseconds after which to reset rate counters. Controls how frequently the regulator resets its tracking counters.

Default of 5 minutes (300000ms) provides stable cost averages and smooths out bursts while remaining responsive. Shorter windows (e.g., 1 minute) are more reactive but noisier; longer windows (e.g., 10 minutes) are smoother but slower to adapt.

Trade-offs:

1 minute: Very responsive for bursts, but unstable averages and over-reactive throttling
5 minutes: Balanced—stable averages, catches sustained patterns, aligns with hourly budgets (1/12 hour)
10+ minutes: Very stable for long-term trends, but slower to adapt and less suitable for short-lived nodes

Validation: Must be at least 60000 milliseconds (1 minute).
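
One way to picture the windowing is a counter that accumulates spend and is normalized to a per-minute rate using the window length, as in the walkthrough ($0.06 over 5 minutes is $0.012/min). The class below is a hypothetical sketch; the regulator's internal counters are not documented here.

```python
class WindowCounter:
    """Accumulates spend per key and resets after a fixed interval (hypothetical)."""

    def __init__(self, reset_interval_ms: int = 300_000):
        if reset_interval_ms < 60_000:
            raise ValueError("reset interval must be at least 60000 ms")
        self.reset_interval_ms = reset_interval_ms
        self.spend: dict[str, float] = {}

    def add(self, key: str, cost: float) -> None:
        self.spend[key] = self.spend.get(key, 0.0) + cost

    def per_minute(self, key: str) -> float:
        """Window spend normalized by the full window length in minutes."""
        return self.spend.get(key, 0.0) / (self.reset_interval_ms / 60_000)

    def reset(self) -> None:
        self.spend.clear()


counter = WindowCounter()
counter.add("Error_syncing_pod", 0.06)
rate = counter.per_minute("Error_syncing_pod")  # 0.06 spread over 5 minutes
```

A shorter window divides by fewer minutes, so a single burst moves the per-minute rate more sharply, which is the noisiness the trade-offs above describe.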

`rateRegulatorMinRetentionThreshold`

Minimum retention threshold for events when budget is exceeded.

Type	Default
Number	0.1

Defines the minimum retention rate (0.0 to 1.0) applied to events even when budget is exceeded. Ensures some events are always retained even for very high-spend patterns, preventing complete data loss.

The severity level boost (from rateRegulatorLevelBoost) is multiplied by this value to determine the actual minimum retention threshold for each severity level.

How boost works:

Boost only affects the minimum retention threshold, not the calculated threshold based on budget
When under budget: retention threshold is based on budget utilization (boost has no effect)
When over budget: retention threshold is clamped to minRetentionThreshold * boost

Examples:

minRetentionThreshold: 0.1 with boost: 1.0 (INFO) → minimum 10% retention when over budget
minRetentionThreshold: 0.1 with boost: 2.0 (ERROR) → minimum 20% retention when over budget
minRetentionThreshold: 0.1 with boost: 0.25 (TRACE) → minimum 2.5% retention when over budget

Important: Boost values < 1.0 reduce minimum retention for low-priority events. This prevents them from consuming budget when over limit, while still ensuring some events are retained.
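
The floor described above amounts to a clamp. A minimal sketch, assuming the calculated threshold is supplied by the caller and that the result is capped at 1.0 (the cap is an assumption for illustration):

```python
def clamp_retention(calculated: float, min_retention: float, boost: float) -> float:
    """Raise the calculated threshold to at least min_retention * boost."""
    floor = min_retention * boost
    return min(1.0, max(calculated, floor))


# Heavily over budget: the calculated threshold is near zero, so the floor applies.
info_floor = clamp_retention(0.0, 0.1, 1.0)    # 10% minimum
error_floor = clamp_retention(0.0, 0.1, 2.0)   # 20% minimum
trace_floor = clamp_retention(0.0, 0.1, 0.25)  # 2.5% minimum

# Under budget: the calculated threshold already exceeds the floor, so boost has no effect.
under_budget = clamp_retention(0.9, 0.1, 0.25)
```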

Trade-offs:

0.01 (1%): Very aggressive throttling, minimal retention when over budget. Use only if budget is critical.
0.1 (10%): Balanced default. Ensures observability even during budget overruns while still enforcing cost control.
0.25 (25%): Conservative. Prioritizes data retention over strict budget enforcement.

Validation: Must be at least 0.01.

`rateRegulatorLevelBoost`

Severity level boost mapping for retention rates.

Type	Default
List	[]

Defines a map of severity levels to boost multipliers for minimum retention thresholds. Higher severity events can be given higher minimum retention rates through boost values.

The boost multiplier is applied only to rateRegulatorMinRetentionThreshold, not to the entire retention threshold. This ensures critical events (ERROR, FATAL) have higher minimum retention floors when budget is exceeded, while preventing boost values < 1.0 from reducing retention when under budget.

How it works:

The regulator calculates a retention threshold based on budget utilization
The threshold is clamped to at least rateRegulatorMinRetentionThreshold * boost
Boost only affects the minimum floor, not the calculated threshold
Higher boost values result in higher minimum retention for events of that severity when over budget
Lower boost values (< 1.0) reduce minimum retention for low-priority events (e.g., DEBUG, TRACE)

For example:

levelBoost:
  - TRACE=0.25
  - DEBUG=0.5
  - INFO=1
  - WARN=1.5
  - ERROR=2
  - FATAL=3
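
The `LEVEL=value` entries can be read as a simple mapping. The parser below is a hypothetical sketch for illustration; the fallback boost of 1.0 for unlisted levels is an assumption, not documented behavior.

```python
def parse_level_boost(entries: list[str]) -> dict[str, float]:
    """Turn entries like 'ERROR=2' into {'ERROR': 2.0}."""
    boosts = {}
    for entry in entries:
        level, _, value = entry.partition("=")
        boosts[level.strip().upper()] = float(value)
    return boosts


boosts = parse_level_boost(
    ["TRACE=0.25", "DEBUG=0.5", "INFO=1", "WARN=1.5", "ERROR=2", "FATAL=3"]
)
error_boost = boosts.get("ERROR", 1.0)
unknown_boost = boosts.get("NOTICE", 1.0)  # assumed neutral default for unlisted levels
```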

`rateRegulatorLookupFile`

Lookup file containing global event type rates.

Type	Default
String

Defines the path to a lookup file containing cluster-wide cost-per-hour data per field set, generated by the policy module. Used to make sampling decisions based on cluster-wide spend.

`rateRegulatorLookupRetain`

Retention period for the lookup file containing global event type rates.

Type	Default
Number	300000

Defines the retention period, in milliseconds, for the lookup file containing cluster-wide spend data.

If the file's last modified time is older than this period, the lookup is considered stale and the regulator falls back to local counter rates.
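
The staleness rule can be sketched as a comparison between the file's modification time and the retention period. The function name and the use of `os.path.getmtime` are choices made for this example, not a description of the regulator's internals.

```python
import os
import tempfile
import time
from typing import Optional


def lookup_is_fresh(path: str, retain_ms: int, now: Optional[float] = None) -> bool:
    """True when the file was modified within the last retain_ms milliseconds."""
    now = time.time() if now is None else now
    age_ms = (now - os.path.getmtime(path)) * 1000
    return age_ms <= retain_ms


# Demonstration with a temporary file standing in for the lookup.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    lookup_path = tmp.name

modified = os.path.getmtime(lookup_path)
fresh = lookup_is_fresh(lookup_path, 300_000, now=modified + 120)  # 2 minutes old
stale = lookup_is_fresh(lookup_path, 300_000, now=modified + 600)  # 10 minutes old
os.remove(lookup_path)
```

When the check returns false, a caller would use its local counters instead of the lookup values.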

Validation: Must be greater than 60000 milliseconds.

`rateRegulatorBudgetPerHour`

Target spending budget per hour in USD.

Type	Default
Number	1.0

Defines the soft target budget per hour for total cost across all event types on this node. Actual spend may exceed this by 10-20% due to soft enforcement.

Hourly budgets align naturally with cloud infrastructure costs and work for both short-lived and long-lived nodes.

Examples:

$1.50/hour → reasonable for a production log forwarder (~$36/day, ~$1080/month if running 24/7)
$0.10/hour → conservative for dev/test environments
$0.02/hour → minimal for low-volume services

`rateRegulatorMaxSharePerFieldSet`

Maximum % of total spend any single field set can consume.

Type	Default
Number	0.2

Defines the maximum share (0.0 to 1.0) of the total budget that any single unique field set can use. A "field set" is a unique combination of the field values specified in rateRegulatorFieldNames.

Enforced independently of whether the total budget is exceeded—prevents noisy field sets from dominating even when under budget.

In global mode, this is enforced cluster-wide based on aggregate spend data from the lookup file.

Example with rateRegulatorFieldNames: [symbolMessage]:

0.2 means no single event type (e.g., "heartbeat_debug") can use more than 20% of total spend
If "heartbeat_debug" is costing 35% of total, it gets throttled to 20% regardless of whether you're over budget

Example with rateRegulatorFieldNames: [container] (Kubernetes: per-app total):

Each app (e.g., "frontend", "backend", "payment-service") is tracked separately
0.2 means no single app can exceed 20% of total spend across ALL its event types
Aggregates across all pods and all event types for each app
If "frontend" app (all events, 5 pods) costs 30%, it gets throttled to 20%

Example with rateRegulatorFieldNames: [symbolMessage, container] (Kubernetes: per event type per app):

Each unique combination (e.g., "error_login|frontend", "heartbeat_debug|backend") is tracked separately
0.2 means no single (event type × app) can exceed 20% of total spend
If "frontend" app's "heartbeat_debug" events cost 30% across its 5 pods, they get throttled to 20%
Note: "frontend" could have multiple event types, each with their own 20% cap.

`rateRegulatorIngestionCostPerGB`

Vendor ingestion cost per GB in USD.

Type	Default
Number	1.5

Defines the cost per GB charged by your observability vendor for log ingestion. Used to calculate per-event costs (event byte size × cost per GB) for budget enforcement.

Important: If using global mode with a policy module lookup, this value should match policyIngestionCostPerGB for consistency.

Common vendor pricing (2025):

Splunk Cloud: ~$1.50/GB (varies by contract, SKU)
Datadog Logs: ~$0.10-$0.25/GB (depends on tier: standard, flex, online archives)
Elastic Cloud: ~$0.109/GB (standard logging tier)
New Relic: ~$0.30/GB (Data Plus)
Sumo Logic: ~$1.50/GB (depends on plan)
AWS CloudWatch Logs: ~$0.50/GB ingestion + $0.03/GB storage

Example: A 10KB error log at $1.50/GB costs ~$0.000015. At 1 million such events per hour, that is $15/hour ($360/day). The regulator tracks this spend per field set and enforces your rateRegulatorBudgetPerHour by probabilistically sampling events when over budget.
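
The same arithmetic extends to a quick projection of hourly spend and of the average retention needed to fit a budget. The retention figure is a back-of-the-envelope illustration that ignores severity boosts and share caps; the function names are hypothetical and a decimal gigabyte is assumed.

```python
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabyte


def hourly_spend(event_bytes: int, events_per_hour: int, cost_per_gb: float) -> float:
    """Projected hourly ingestion cost for a uniform event stream."""
    return event_bytes * events_per_hour / BYTES_PER_GB * cost_per_gb


def retention_to_fit(budget_per_hour: float, spend_per_hour: float) -> float:
    """Average fraction of events that fits the budget, capped at 1.0."""
    return min(1.0, budget_per_hour / spend_per_hour)


spend = hourly_spend(10_000, 1_000_000, 1.50)  # USD per hour for 1M 10KB events
keep = retention_to_fit(1.50, spend)           # share of events a $1.50/hour budget allows
```

For this stream, a $1.50/hour budget against $15/hour of spend leaves room for roughly one event in ten on average.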

This module is defined in rate/module.yaml.