Rate
Prevent log analytics over-billing while ensuring critical events always reach your analysis tools.
The rate regulator uses automatic message enrichment combined with byte-based cost calculation to apply sampling based on actual ingestion costs per logical event type.
By tracking costs (event byte size × vendor ingestion rate), the regulator provides business-aligned budget enforcement that accounts for variable event sizes (e.g., a 10KB error log is correctly weighted against a 100-byte debug message).
This approach enables more precise control than regex-based rules, which require manual configuration and lack awareness of logical event type, severity, and cost.
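The byte-based weighting described above can be sketched in a few lines. This is a minimal illustration, not the module's code: the function name `event_cost` is hypothetical, and it assumes 1 GB = 10^9 bytes, which is consistent with the worked numbers later on this page.

```python
# Assumption: decimal gigabytes (10^9 bytes); the regulator's own constant may differ.
BYTES_PER_GB = 1_000_000_000


def event_cost(size_bytes: int, cost_per_gb: float) -> float:
    """Return the ingestion cost in USD of a single event (hypothetical helper)."""
    return size_bytes / BYTES_PER_GB * cost_per_gb


error_cost = event_cost(10_000, 1.50)  # 10KB error log
debug_cost = event_cost(100, 1.50)     # 100-byte debug message
ratio = error_cost / debug_cost        # how much more the error log weighs
```

With these inputs the error log counts roughly 100 times more toward the budget than the debug message, which is the weighting a per-event counter would miss.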
Control Strategies
The rate regulator supports both local and global controls, which differ in how they track and apply cost-based budgets for event sampling.
Local: Forwarders maintain independent cost counters without cross-node communication, tracking spend per event type (symbolMessage) based on byte volume and configured ingestion costs. Benefits include simple per-service setup, fast processing without network delays, and fault isolation that contains issues to a single node.
Trade-offs include decisions limited to local data (risking cluster-wide budget overruns) and no visibility into cross-service patterns.
Example: A single application forwarder tracks its own $0.01/min budget, throttling high-cost debug logs based solely on its own traffic, ignoring cluster-wide spending patterns.
Global: Forwarders share cost data via lookup tables (synchronized through centralized storage like GitHub) for unified budget decisions. Benefits include enforcing organization-wide budgets, detecting cross-service cost patterns, and applying consistent policies everywhere based on cluster-wide spend.
Trade-offs involve requiring cluster communication, added setup complexity for coordination, and minor decision latency.
Example: Cluster aggregates spend data showing "api_trace" events cost $0.08/min across all nodes. Global lookup shares this intelligence, allowing all forwarders to throttle this pattern proportionally to stay under budget.
Multi-App Regulation
For central forwarders handling logs from multiple applications (common in Kubernetes), the rate regulator prevents individual apps from bypassing budget caps by scaling pods. Use the k8s container name field to aggregate spend per app across all replicas. Two approaches are available:
Option A: Cap Total App Spend (All Event Types)
Prevents any single app from dominating the budget regardless of how many event types it emits.
rateRegulatorFieldNames: [container] # App only
rateRegulatorMaxSharePerFieldSet: 0.2
rateRegulatorBudgetPerHour: 1.50
Result:
- Frontend app (all events, 5 pods): Cannot exceed 20% of total budget ($0.30/hour)
- Backend app (all events, 2 pods): Cannot exceed 20% ($0.30/hour)
- Payment app (all events, 1 pod): Cannot exceed 20% ($0.30/hour)
Trade-off: Loses event-type intelligence—can't prioritize ERROR over DEBUG within an app.
Option B: Cap Per Event Type Per App
Enforces fairness within each app—prevents a single noisy event type from dominating that app's spend.
rateRegulatorFieldNames: [symbolMessage, container] # Event type + app
rateRegulatorMaxSharePerFieldSet: 0.2
rateRegulatorBudgetPerHour: 1.50
Result:
- "heartbeat_debug|frontend" (5 pods): Cannot exceed 20% of total budget
- "error_login|frontend" (5 pods): Separate 20% cap
- "timeout|payment-service" (1 pod): Separate 20% cap
Trade-off: Each (event type × app) combo gets its own 20% cap—apps with many event types could theoretically exceed 20% total (though unlikely in practice).
Key Insight: Use container (not pod) for aggregation—container name is stable across replicas, while pod names are unique per instance. Scaling from 1→10 pods doesn't bypass limits.
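To make the difference between the two options (and the pod anti-pattern) concrete, here is a rough sketch of how a counter key could be built from the configured field names. The helper names, the sample events, and the per-event `cost` values are hypothetical; the `|` separator follows the keys shown in the examples above.

```python
from collections import defaultdict


def field_set_key(event: dict, field_names: list) -> str:
    """Join the configured field values into one counter key (hypothetical helper)."""
    return "|".join(str(event.get(name, "")) for name in field_names)


# Hypothetical events: two frontend replicas and one payment-service pod.
events = [
    {"symbolMessage": "heartbeat_debug", "container": "frontend", "pod": "frontend-a1", "cost": 0.002},
    {"symbolMessage": "heartbeat_debug", "container": "frontend", "pod": "frontend-b2", "cost": 0.003},
    {"symbolMessage": "timeout", "container": "payment-service", "pod": "payment-c3", "cost": 0.001},
]


def spend_by_key(field_names: list) -> dict:
    """Aggregate spend per counter key for the given field names."""
    spend = defaultdict(float)
    for event in events:
        spend[field_set_key(event, field_names)] += event["cost"]
    return dict(spend)


by_app = spend_by_key(["container"])                            # Option A
by_type_and_app = spend_by_key(["symbolMessage", "container"])  # Option B
by_pod = spend_by_key(["pod"])                                  # one bucket per replica
```

Keying on `container` merges both frontend replicas into one bucket, while keying on `pod` gives every replica its own bucket, which is why scaling out would dilute each bucket's share.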
Workflow
The rate regulator executes the following steps:
graph LR
A["<div style='font-size: 14px;'>📥 Input</div><div style='font-size: 10px; text-align: center;'>TenXObject</div>"] --> B["<div style='font-size: 14px;'>💰 Cost Calc</div><div style='font-size: 10px; text-align: center;'>Bytes × $/GB</div>"]
B --> C{"<div style='font-size: 14px;'>🗂️ Global</div><div style='font-size: 10px; text-align: center;'>Has Lookup?</div>"}
C -->|Yes| D["<div style='font-size: 14px;'>🌐 Global</div><div style='font-size: 10px; text-align: center;'>Cluster Spend</div>"]
C -->|No| E["<div style='font-size: 14px;'>📈 Local</div><div style='font-size: 10px; text-align: center;'>Node Spend</div>"]
D --> F["<div style='font-size: 14px;'>⚖️ Budget Check</div><div style='font-size: 10px; text-align: center;'>vs Target Rate</div>"]
E --> F
F --> G["<div style='font-size: 14px;'>📊 Event Share</div><div style='font-size: 10px; text-align: center;'>vs Max %</div>"]
G --> H["<div style='font-size: 14px;'>🎯 Boost</div><div style='font-size: 10px; text-align: center;'>by Severity</div>"]
H --> I{"<div style='font-size: 14px;'>🎲 Sample</div><div style='font-size: 10px; text-align: center;'>Decision</div>"}
I -->|Drop| J["<div style='font-size: 14px;'>🗑️ Drop</div><div style='font-size: 10px; text-align: center;'>TenXObject</div>"]
I -->|Keep| K["<div style='font-size: 14px;'>✅ Retain</div><div style='font-size: 10px; text-align: center;'>TenXObject</div>"]
classDef input fill:#3b82f688,stroke:#2563eb,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef decision fill:#eab30888,stroke:#d97706,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef process fill:#059669,stroke:#047857,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef rate fill:#7c3aed88,stroke:#6d28d9,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef retain fill:#16a34a,stroke:#15803d,color:#ffffff,stroke-width:2px,rx:8,ry:8
classDef drop fill:#dc2626,stroke:#b91c1c,color:#ffffff,stroke-width:2px,rx:8,ry:8
class A input
class B,G decision
class C,D,F process
class E rate
class I retain
class H drop
Local Mode (Without Lookup): Per-Node Filtering
Scenario: A Kubernetes node running 3 pods with config:
- Budget: $1.50/hour ($0.025/min)
- Max share per event type: 20%
- Ingestion cost: $1.50/GB (Splunk)
- 5-minute tracking window
Step-by-step for a Kubernetes pod error event (ERROR level, 1.8KB):
- 📥 Event Arrives: Pod emits a CrashLoopBackOff error with full Kubernetes metadata (see raw JSON, 1835 bytes)
- 💰 Cost Calculated: 1835 bytes / 1GB × $1.50 = $0.0000028 per event
- 📊 Field Set Identified: symbolMessage = "Error_syncing_pod" (extracted by message enrichment) → counter key: Error_syncing_pod
- 📈 Track Spend (Local):
  - Current 5-min window spend: Error_syncing_pod = $0.06, total = $0.10
  - After increment: Error_syncing_pod = $0.0600028, total = $0.1000028
  - Normalize to per-minute: Error_syncing_pod = $0.012/min, total = $0.020/min
- ⚖️ Budget Check: Is total over budget?
  - Total spend rate: $0.020/min vs. target $0.025/min → Under budget → globalScale = 1.0 (no throttling)
- 📊 Event Share Check: Is "Error_syncing_pod" dominating?
  - Share: $0.06 / $0.10 = 60% vs. max 20% → Over share limit
  - Scale down: fieldSetRate = 0.2 / 0.6 = 0.33 (retain 33% of these events)
- 🎯 Severity Boost: ERROR level boost = 2.0
  - baseRate = 1.0 × 0.33 = 0.33
  - finalRate = 0.33 × 2.0 = 0.66 → 66% retention (boost helps but doesn't fully override)
- 🎲 Sample Decision: random(0-1) = 0.3 < 0.66 → ✅ Event Kept
Result: "Error_syncing_pod" is heavily over the 20% share (at 60%), so it gets throttled to 33% base rate. The ERROR severity boost (2.0×) increases retention to 66%, meaning 2/3 of these ERROR events are kept. If it were DEBUG (boost=0.5), final rate would be 0.165 → 83% chance of being dropped.
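The local-mode arithmetic above can be reproduced with a short sketch. The function names are hypothetical and the formula simply mirrors the steps in this walk-through (budget scale × share scale × severity boost, clamped to 1.0); it is not taken from the module's source.

```python
import random


def retention_rate(total_per_min, budget_per_min, field_share, max_share, boost):
    """Combine the budget check, share check, and severity boost from the walk-through."""
    # Budget check: scale down only when the total spend rate exceeds the target.
    global_scale = min(1.0, budget_per_min / total_per_min)
    # Share check: scale down only when the field set exceeds its maximum share.
    field_set_rate = min(1.0, max_share / field_share)
    # Severity boost, clamped so the result stays a probability.
    return min(1.0, global_scale * field_set_rate * boost)


def keep_event(rate, draw=None):
    """Sample decision: keep the event when a uniform draw falls below the rate."""
    draw = random.random() if draw is None else draw
    return draw < rate


budget_per_min = 1.50 / 60  # $1.50/hour
total_per_min = 0.10 / 5    # $0.10 over the 5-minute window
share = 0.06 / 0.10         # Error_syncing_pod share of window spend

error_rate = retention_rate(total_per_min, budget_per_min, share, 0.2, boost=2.0)
debug_rate = retention_rate(total_per_min, budget_per_min, share, 0.2, boost=0.5)
kept = keep_event(error_rate, draw=0.3)
```

Without intermediate rounding the rates come out at about 0.667 and 0.167; the walk-through rounds the share scale to 0.33 first, which gives 0.66 and 0.165.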
Global Mode (With Lookup): Cluster-Wide Filtering
Scenario: Same app now across 10 nodes (30 pods total) with cluster-wide policy enforcement. The policy module generates a lookup from Prometheus with 6-hour average spend:
Lookup file contents:
field_set,cost_per_hour
Error_syncing_pod,0.60
Info_user_action,0.50
Debug_heartbeat,0.40
_global_cost_total,2.00
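As a rough illustration of how such a lookup could be consumed, the sketch below parses the contents shown above and derives each field set's cluster-wide share. It assumes the file is plain comma-separated text with exactly the header shown; the helper name `load_lookup` is hypothetical.

```python
import csv
import io

# Lookup contents copied from the example above.
LOOKUP_CSV = """field_set,cost_per_hour
Error_syncing_pod,0.60
Info_user_action,0.50
Debug_heartbeat,0.40
_global_cost_total,2.00
"""

TOTAL_KEY = "_global_cost_total"


def load_lookup(text: str) -> dict:
    """Parse lookup text into {field_set: cost_per_hour} (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["field_set"]: float(row["cost_per_hour"]) for row in reader}


lookup = load_lookup(LOOKUP_CSV)
total = lookup[TOTAL_KEY]
# Share of cluster-wide spend per field set, excluding the total row itself.
shares = {key: cost / total for key, cost in lookup.items() if key != TOTAL_KEY}
```

Here `Error_syncing_pod` works out to a 30% share and `Debug_heartbeat` to 20%; the three listed rows sum to $1.50, so the remaining $0.50 of the total presumably belongs to field sets not shown.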
Step-by-step for same pod error event on one node:
- 📥 Event Arrives: Same 1.8KB ERROR event (raw JSON)
- 💰 Cost Calculated: 1835 bytes / 1GB × $1.50 = $0.0000028
- 📊 Field Set Identified: Error_syncing_pod
- 🌐 Lookup Check:
  - Lookup file modified 2 minutes ago (within 5-min retention) → Fresh
  - Fetch cluster-wide data: Error_syncing_pod = $0.60/hour, total = $2.00/hour
- 📈 Track Spend (Global): Use cluster-wide spend from lookup
  - Cluster spend rate: Error_syncing_pod = $0.60/hour = $0.01/min, total = $2.00/hour = $0.0333/min
  - (Local counter still increments for fallback, but decision uses global data)
- ⚖️ Budget Check (Cluster-Wide): Is cluster over budget?
  - Total cluster spend: $0.0333/min vs. per-node target $0.025/min → Over budget (33% above)
  - Scale down globally: globalScale = 0.025 / 0.0333 = 0.75 (retain 75% cluster-wide)
- 📊 Event Share Check (Cluster-Wide): Is "Error_syncing_pod" dominating cluster?
  - Cluster share: $0.60 / $2.00 = 30% vs. max 20% → Over share
  - Scale down: fieldSetRate = 0.2 / 0.3 = 0.67 (retain 67% of "Error_syncing_pod" events)
- 🎯 Severity Boost: ERROR level boost = 2.0
  - baseRate = 0.75 × 0.67 = 0.50
  - finalRate = 0.50 × 2.0 = 1.0 (clamped to max 1.0) → 100% retention
- 🎲 Sample Decision: random(0-1) = 0.7 < 1.0 → ✅ Event Kept
Result: Even though the cluster is over budget (133% of target) and "Error_syncing_pod" exceeds 20% share (at 30%), the ERROR boost ensures full retention. All 10 nodes see the same cluster-wide spend data and make consistent decisions. If the same event were DEBUG (boost=0.5), the final rate would be 0.50 × 0.5 = 0.25 → 75% dropped cluster-wide.
Key Difference: Local mode only sees this node's $0.10 total spend (under budget), while global mode sees cluster's $2.00/hour (over budget), enabling coordinated throttling across all nodes.
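The key difference can be checked numerically by running the same budget check against the node's view and the cluster's view. This is a sketch using the figures from the two scenarios; `budget_scale` is a hypothetical name for the budget-check step.

```python
def budget_scale(spend_per_min: float, budget_per_min: float) -> float:
    """Budget check: 1.0 when under target, otherwise target / spend."""
    return min(1.0, budget_per_min / spend_per_min)


budget_per_min = 1.50 / 60  # $1.50/hour target

# Local view: $0.10 spent on this node during the 5-minute window.
local_scale = budget_scale(0.10 / 5, budget_per_min)

# Global view: $2.00/hour cluster-wide spend from the lookup.
global_scale = budget_scale(2.00 / 60, budget_per_min)
```

The local view yields a scale of 1.0 (no throttling), while the global view yields 0.75, matching the globalScale values in the two walk-throughs.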
Configuration
To configure the Rate regulator module, edit these settings.
Below is the default configuration from: rate/config.yaml.
{
  "type" : "object",
  "properties" : {
    "include" : {
      "type" : "string"
    },
    "tenx" : {
      "type" : "string"
    },
    "rateRegulator" : {
      "type" : "object",
      "additionalProperties" : false,
      "properties" : {
        "fieldNames" : {
          "type" : [
            "array",
            "null"
          ],
          "markdownDescription" : "List of TenXObject fields to identify rate counter buckets\n\nDefines the list of TenXObject field names extracted to identify which rate counter bucket an event belongs to. The list usually contains the `symbolMessage` field from the [message enrichment](https://doc.log10x.com/run/initialize/message/) module but can include additional fields like [GeoIP](https://doc.log10x.com/run/initialize/geoIP/), [HTTP code](https://doc.log10x.com/run/initialize/httpCode/), [k8s container name](https://doc.log10x.com/run/initialize/k8s/), or custom enrichments for multi-dimensional rate tracking.  **Common Use Cases:**  **Single-app regulation (per event type):** ```yaml rateRegulatorFieldNames:   - symbolMessage ```  **Multi-dimensional tracking (event type + geography + HTTP status):** ```yaml rateRegulatorFieldNames:   - symbolMessage   - country   - httpCode ```  **Multi-app regulation in Kubernetes:**  *Option A: Cap total spend per app (all event types combined):* ```yaml rateRegulatorFieldNames:   - container  # Aggregates all event types for each app ``` Each app's total spend (across all event types and pods) gets one cap. Simple but loses event-type intelligence.  *Option B: Cap spend per event type per app:* ```yaml rateRegulatorFieldNames:   - symbolMessage  # Event type   - container      # App identifier (same across all pods) ``` Each (event type × app) combo gets its own cap. Provides fairness within apps but allows apps with many event types to potentially exceed one total cap.  Use `container` (not `pod`) for aggregation—`container` name is stable across replicas while `pod` names are unique per instance. (Default: [\"symbolMessage\"])",
          "items" : {
            "type" : "string"
          },
          "default" : [
            "symbolMessage"
          ]
        },
        "resetIntervalMs" : {
          "type" : [
            "number",
            "string"
          ],
          "markdownDescription" : "Reset interval for rate counters in milliseconds\n\nDefines the interval in milliseconds after which to reset rate counters. Controls how frequently the regulator resets its tracking counters.  Default of 5 minutes (300000ms) provides stable cost averages and smooths out bursts while remaining responsive. Shorter windows (e.g., 1 minute) are more reactive but noisier; longer windows (e.g., 10 minutes) are smoother but slower to adapt.  **Trade-offs:** - **1 minute**: Very responsive for bursts, but unstable averages and over-reactive throttling - **5 minutes**: Balanced—stable averages, catches sustained patterns, aligns with hourly budgets (1/12 hour) - **10+ minutes**: Very stable for long-term trends, but slower to adapt and less suitable for short-lived nodes  **Validation:** Must be at least 60000 milliseconds (1 minute). (Accepts number or string with $= prefix for runtime evaluation) (Default: 300000)",
          "default" : 300000
        },
        "minRetentionThreshold" : {
          "type" : [
            "number",
            "string"
          ],
          "markdownDescription" : "Minimum retention threshold for events when budget is exceeded\n\nDefines the minimum retention rate (0.0 to 1.0) applied to events even when budget is exceeded. Ensures some events are always retained even for very high-spend patterns, preventing complete data loss.  This value ensures a floor on retention, preventing complete data loss even when budget is exceeded. The severity level boost (from `rateRegulatorLevelBoost`) is multiplied with this value to determine the actual minimum retention threshold for each severity level.  **How boost works:** - Boost only affects the minimum retention threshold, not the calculated threshold based on budget - When under budget: retention threshold is based on budget utilization (boost has no effect) - When over budget: retention threshold is clamped to `minRetentionThreshold * boost`  **Examples:** - `minRetentionThreshold: 0.1` with `boost: 1.0` (INFO) → minimum 10% retention when over budget - `minRetentionThreshold: 0.1` with `boost: 2.0` (ERROR) → minimum 20% retention when over budget - `minRetentionThreshold: 0.1` with `boost: 0.25` (DEBUG) → minimum 2.5% retention when over budget  **Important:** Boost values < 1.0 reduce minimum retention for low-priority events. This prevents them from consuming budget when over limit, while still ensuring some events are retained.  **Trade-offs:** - **0.01 (1%)**: Very aggressive throttling, minimal retention when over budget. Use only if budget is critical. - **0.1 (10%)**: Balanced default. Ensures observability even during budget overruns while still enforcing cost control. - **0.25 (25%)**: Conservative. Prioritizes data retention over strict budget enforcement.  **Validation:** Must be greater than 0.01. (Accepts number or string with $= prefix for runtime evaluation) (Default: 0)",
          "default" : 0
        },
        "levelBoost" : {
          "type" : [
            "array",
            "null"
          ],
          "markdownDescription" : "Severity level boost mapping for retention rates\n\nDefines a map of [severity levels](https://doc.log10x.com/run/initialize/level/) to boost multipliers for minimum retention thresholds. Higher severity events can be given higher minimum retention rates through boost values.  The boost multiplier is applied only to `rateRegulatorMinRetentionThreshold`, not to the entire retention threshold. This ensures critical events (ERROR, FATAL) have higher minimum retention floors when budget is exceeded, while preventing boost values < 1.0 from reducing retention when under budget.  **How it works:** - The regulator calculates a retention threshold based on budget utilization - The threshold is clamped to at least `rateRegulatorMinRetentionThreshold * boost` - Boost only affects the minimum floor, not the calculated threshold - Higher boost values result in higher minimum retention for events of that severity when over budget - Lower boost values (< 1.0) reduce minimum retention for low-priority events (e.g., DEBUG, TRACE)  For example:  ``` yaml levelBoost:   - TRACE=0.25   - DEBUG=0.5   - INFO=1   - WARN=1.5   - ERROR=2   - FATAL=3 ```",
          "items" : {
            "type" : "string"
          }
        },
        "lookup" : {
          "type" : "object",
          "additionalProperties" : false,
          "properties" : {
            "file" : {
              "type" : [
                "string",
                "null"
              ],
              "markdownDescription" : "Lookup file containing global event type rates\n\nDefines the path to a lookup file containing global event type frequency data. Used to make sampling decisions based on [cluster-wide](https://doc.log10x.com/run/regulate/rate/#control-strategies) event patterns. (Default: )",
              "default" : ""
            },
            "retain" : {
              "type" : [
                "number",
                "string"
              ],
              "markdownDescription" : "Retention period for the lookup file containing global event type rates\n\nDefines the retention period for the lookup file containing global event type frequency data. If the file's last modified time is older than this period, the lookup is considered stale, and local counter rates are used. Used to make sampling decisions based on cluster-wide event patterns.  **Validation:** Must be greater than 60000 milliseconds. (Accepts number or string with $= prefix for runtime evaluation) (Default: 300000)",
              "default" : 300000
            }
          }
        },
        "budgetPerHour" : {
          "type" : [
            "number",
            "string"
          ],
          "markdownDescription" : "Target spending budget per hour in USD\n\nDefines the soft target budget per hour for total cost across all event types on this node. Actual spend may exceed this by 10-20% due to soft enforcement.  Hourly budgets align naturally with cloud infrastructure costs and work for both short-lived and long-lived nodes.  **Examples:** - $1.50/hour → reasonable for a production log forwarder (~$36/day, ~$1080/month if running 24/7) - $0.10/hour → conservative for dev/test environments - $0.02/hour → minimal for low-volume services (Accepts number or string with $= prefix for runtime evaluation) (Default: 1)",
          "default" : 1
        },
        "maxSharePerFieldSet" : {
          "type" : [
            "number",
            "string"
          ],
          "markdownDescription" : "Maximum % of total spend any single field set can consume\n\nDefines the maximum share (0.0 to 1.0) of the total budget that any single unique field set can use. A \"field set\" is a unique combination of the field values specified in `rateRegulatorFieldNames`.  Enforced independently of whether the total budget is exceeded—prevents noisy field sets from dominating even when under budget.  In global mode, this is enforced cluster-wide based on aggregate spend data from the lookup file.  **Example with `rateRegulatorFieldNames: [symbolMessage]`:** - 0.2 means no single event type (e.g., \"heartbeat_debug\") can use more than 20% of total spend - If \"heartbeat_debug\" is costing 35% of total, it gets throttled to 20% regardless of whether you're over budget  **Example with `rateRegulatorFieldNames: [container]` (Kubernetes: per-app total):** - Each app (e.g., \"frontend\", \"backend\", \"payment-service\") is tracked separately - 0.2 means no single app can exceed 20% of total spend across ALL its event types - Aggregates across all pods and all event types for each app - If \"frontend\" app (all events, 5 pods) costs 30%, it gets throttled to 20%  **Example with `rateRegulatorFieldNames: [symbolMessage, container]` (Kubernetes: per event type per app):** - Each unique combination (e.g., \"error_login|frontend\", \"heartbeat_debug|backend\") is tracked separately - 0.2 means no single (event type × app) can exceed 20% of total spend - If \"frontend\" app's \"heartbeat_debug\" events cost 30% across its 5 pods, they get throttled to 20% - Note: \"frontend\" could have multiple event types, each with their own 20% cap (Accepts number or string with $= prefix for runtime evaluation) (Default: 0)",
          "default" : 0
        },
        "ingestionCostPerGB" : {
          "type" : [
            "number",
            "string"
          ],
          "markdownDescription" : "Vendor ingestion cost per GB in USD\n\nDefines the cost per GB charged by your observability vendor for log ingestion. Used to calculate per-event costs (event byte size × cost per GB) for budget enforcement.  **Important:** If using global mode with a [policy module](https://doc.log10x.com/run/regulate/policy/) lookup, this value should match `policyIngestionCostPerGB` for consistency.  **Common vendor pricing (2025):** - **Splunk Cloud**: ~$1.50/GB (varies by contract, SKU) - **Datadog Logs**: ~$0.10-$0.25/GB (depends on tier: standard, flex, online archives) - **Elastic Cloud**: ~$0.109/GB (standard logging tier) - **New Relic**: ~$0.30/GB (Data Plus) - **Sumo Logic**: ~$1.50/GB (depends on plan) - **AWS CloudWatch Logs**: ~$0.50/GB ingestion + $0.03/GB storage  **Example:** A 10KB error log at $1.50/GB costs ~$0.000015. Over 1 million such events per hour, that's $15/hour ($360/day). The regulator tracks this spend per field set and enforces your `rateRegulatorBudgetPerHour` by probabilistically sampling events when over budget. (Accepts number or string with $= prefix for runtime evaluation) (Default: 1)",
          "default" : 1
        }
      }
    }
  },
  "additionalProperties" : false
}
# 🔟❎ 'run' rate regulator configuration
# Rate regulators utilize cost-based sampling to filter noisy telemetry from event outputs (e.g., Splunk)
# Enforces spending limits by tracking byte volume and ingestion costs per event type
# To learn more see https://doc.log10x.com/run/regulate/rate/
# Set the 10x pipeline to 'run'
tenx: run
# =============================== Dependencies ================================
include: run/modules/regulate/rate
# ============================== Rate Options =================================
rateRegulator:
  # 'fieldNames' specifies the list of TenXObject fields to identify rate counter buckets
  # The list usually contains the symbolMessage field from message enrichment
  # Can include additional fields like container (k8s app), country (GeoIP), httpCode, etc.
  fieldNames:
    - $=yield TenXEnv.get("symbolMessageField") # Matches lookup keys
  # 'resetIntervalMs' specifies reset interval for rate counters in milliseconds
  # Default of 5 minutes provides stable cost averages while remaining responsive
  # Trade-offs: 1min (reactive/noisy) vs 5min (balanced) vs 10min+ (stable/slow)
  resetIntervalMs: $=parseDuration("5m")
  # 'minRetentionThreshold' specifies minimum retention threshold for high-spend events (0.0 to 1.0)
  # Ensures some events are always retained even when budget is exceeded
  # Example: 0.1 = minimum 10% retention even for very high-spend patterns
  minRetentionThreshold: 0.1
  # 'levelBoost' specifies severity level boost mapping for sampling rates
  # Higher severity events can be given higher retention rates through boost values
  # Example: ERROR=2.0 means ERROR events are twice as likely to be retained
  levelBoost:
    - TRACE=0.25
    - DEBUG=0.5
    - INFO=1
    - WARN=1.5
    - ERROR=2
    - FATAL=3
  # ----------------------------- Budget Options ------------------------------
  # 'budgetPerHour' specifies target spending budget per hour in USD
  # Soft target for total cost across all event types on this node
  # Actual spend may exceed by 10-20% due to soft enforcement
  # Examples: $1.50/hour (~$36/day), $0.10/hour (dev/test), $0.02/hour (minimal)
  budgetPerHour: 1.50
  # 'maxSharePerFieldSet' specifies maximum % of total spend any single field set can consume
  # Enforced independently of budget; prevents noisy field sets from dominating even when under budget
  # Example: 0.2 means no single event type can exceed 20% of total spend
  # In global mode, enforced cluster-wide based on aggregate spend data from lookup
  maxSharePerFieldSet: 0.2
  # 'ingestionCostPerGB' specifies vendor ingestion cost per GB in USD
  # Used to calculate per-event costs based on byte size
  # Common values: Splunk $1.50/GB, Datadog $0.10-$0.25/GB, Elastic $0.109/GB
  # Should match policyIngestionCostPerGB if using global mode with policy module
  ingestionCostPerGB: 1.5
  # -------------------------- Global Lookup Options --------------------------
  # Enables usage of a lookup file containing cluster-wide spend data for global budget enforcement
  # To learn more see https://doc.log10x.com/run/regulate/rate/#global-control
  #
  # Generating the lookup file is done using the policy module
  # To learn more see https://doc.log10x.com/run/regulate/policy
  #
  # Periodically pulling the lookup file to keep it fresh is done via the gitops configuration
  # To learn more see https://doc.log10x.com/config/github/#config
  #
  lookup:
    # 'file' specifies the lookup file path. Will reload on change.
    # Contains cost-per-hour data per field set, generated by policy module
    # Comment out to use local mode (per-node tracking only)
    # file: $=path("data/sample/policy") + "/policy.csv"
    # 'retain' specifies the period before the file is marked as stale
    # If stale, regulator falls back to local counter rates
    retain: $=parseDuration("10m")
Options
Specify the options below to configure the Rate regulator:
| Name | Description |
|---|---|
| rateRegulatorFieldNames | List of TenXObject fields to identify rate counter buckets |
| rateRegulatorResetIntervalMs | Reset interval for rate counters in milliseconds |
| rateRegulatorMinRetentionThreshold | Minimum retention threshold for events when budget is exceeded |
| rateRegulatorLevelBoost | Severity level boost mapping for retention rates |
| rateRegulatorLookupFile | Lookup file containing global event type rates |
| rateRegulatorLookupRetain | Retention period for the lookup file containing global event type rates |
| rateRegulatorBudgetPerHour | Target spending budget per hour in USD |
| rateRegulatorMaxSharePerFieldSet | Maximum % of total spend any single field set can consume |
| rateRegulatorIngestionCostPerGB | Vendor ingestion cost per GB in USD |
rateRegulatorFieldNames
List of TenXObject fields to identify rate counter buckets.
| Type | Default |
|---|---|
| List | [symbolMessage] |
Defines the list of TenXObject field names extracted to identify which rate counter bucket an event belongs to.
The list usually contains the symbolMessage field from the message enrichment module
but can include additional fields like GeoIP, HTTP code, k8s container name, or custom enrichments for multi-dimensional rate tracking.
Common Use Cases:
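Single-app regulation (per event type):
rateRegulatorFieldNames:
- symbolMessage
Multi-dimensional tracking (event type + geography + HTTP status):
rateRegulatorFieldNames:
- symbolMessage
- country
- httpCode
Multi-app regulation in Kubernetes:
Option A: Cap total spend per app (all event types combined):
rateRegulatorFieldNames:
- container # Aggregates all event types for each app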
Each app's total spend (across all event types and pods) gets one cap. Simple but loses event-type intelligence.
Option B: Cap spend per event type per app:
rateRegulatorFieldNames:
- symbolMessage # Event type
- container # App identifier (same across all pods)
Each (event type × app) combo gets its own cap. Provides fairness within apps but allows apps with many event types to potentially exceed one total cap.
Use container (not pod) for aggregation—container name is stable across replicas while pod names are unique per instance.
rateRegulatorResetIntervalMs
Reset interval for rate counters in milliseconds.
| Type | Default |
|---|---|
| Number | 300000 |
Defines the interval in milliseconds after which to reset rate counters. Controls how frequently the regulator resets its tracking counters.
Default of 5 minutes (300000ms) provides stable cost averages and smooths out bursts while remaining responsive. Shorter windows (e.g., 1 minute) are more reactive but noisier; longer windows (e.g., 10 minutes) are smoother but slower to adapt.
Trade-offs:
- 1 minute: Very responsive for bursts, but unstable averages and over-reactive throttling
- 5 minutes: Balanced—stable averages, catches sustained patterns, aligns with hourly budgets (1/12 hour)
- 10+ minutes: Very stable for long-term trends, but slower to adapt and less suitable for short-lived nodes
Validation: Must be at least 60000 milliseconds (1 minute).
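One way to picture the reset interval is as a fixed window of per-key spend counters that is cleared once the interval elapses. The class below is a sketch under that assumption; the name `WindowedSpend`, the explicit `now_ms` argument, and the window-length normalization are illustrative choices, not the module's actual implementation.

```python
class WindowedSpend:
    """Per-key spend counters that reset after a fixed interval (illustrative only)."""

    def __init__(self, reset_interval_ms: int = 300_000, start_ms: int = 0):
        # Mirrors the documented validation: at least 60000 ms (1 minute).
        if reset_interval_ms < 60_000:
            raise ValueError("reset interval must be at least 60000 ms")
        self.reset_interval_ms = reset_interval_ms
        self.window_start_ms = start_ms
        self.spend = {}

    def add(self, key: str, cost: float, now_ms: int) -> None:
        """Add cost to a key, starting a new window when the interval has elapsed."""
        if now_ms - self.window_start_ms >= self.reset_interval_ms:
            self.spend.clear()
            self.window_start_ms = now_ms
        self.spend[key] = self.spend.get(key, 0.0) + cost

    def per_minute(self, key: str) -> float:
        """Normalize the window spend for a key to a per-minute rate."""
        return self.spend.get(key, 0.0) / (self.reset_interval_ms / 60_000)
```

With the default 5-minute window, $0.06 of accumulated spend normalizes to $0.012/min, the same conversion used in the local-mode walk-through.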
rateRegulatorMinRetentionThreshold
Minimum retention threshold for events when budget is exceeded.
| Type | Default |
|---|---|
| Number | 0.1 |
Defines the minimum retention rate (0.0 to 1.0) applied to events even when budget is exceeded. Ensures some events are always retained even for very high-spend patterns, preventing complete data loss.
This value ensures a floor on retention, preventing complete data loss even when budget is exceeded.
The severity level boost (from rateRegulatorLevelBoost) is multiplied with this value to determine the actual minimum retention threshold for each severity level.
How boost works:
- Boost only affects the minimum retention threshold, not the calculated threshold based on budget
- When under budget: retention threshold is based on budget utilization (boost has no effect)
- When over budget: retention threshold is clamped to minRetentionThreshold * boost
Examples:
- minRetentionThreshold: 0.1 with boost: 1.0 (INFO) → minimum 10% retention when over budget
- minRetentionThreshold: 0.1 with boost: 2.0 (ERROR) → minimum 20% retention when over budget
- minRetentionThreshold: 0.1 with boost: 0.25 (DEBUG) → minimum 2.5% retention when over budget
Important: Boost values < 1.0 reduce minimum retention for low-priority events. This prevents them from consuming budget when over limit, while still ensuring some events are retained.
Trade-offs:
- 0.01 (1%): Very aggressive throttling, minimal retention when over budget. Use only if budget is critical.
- 0.1 (10%): Balanced default. Ensures observability even during budget overruns while still enforcing cost control.
- 0.25 (25%): Conservative. Prioritizes data retention over strict budget enforcement.
Validation: Must be greater than 0.01.
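The floor behavior described in this section can be sketched as a clamp. This follows the description here (floor = minimum retention × boost); the function name `apply_floor` and the cap of the floor at 1.0 are assumptions.

```python
def apply_floor(calculated_rate: float, min_retention: float, boost: float) -> float:
    """Clamp a budget-derived retention rate to at least min_retention * boost."""
    floor = min(1.0, min_retention * boost)  # assumption: floor never exceeds 100%
    return max(calculated_rate, floor)


# Heavily over budget: the calculated rate (5%) falls below every floor except DEBUG's.
info_rate = apply_floor(0.05, 0.1, 1.0)    # INFO floor: 10%
error_rate = apply_floor(0.05, 0.1, 2.0)   # ERROR floor: 20%
debug_rate = apply_floor(0.05, 0.1, 0.25)  # DEBUG floor: 2.5%, so 5% stands
```

Note that for DEBUG the calculated 5% is already above the 2.5% floor, so the floor has no effect in that case.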
rateRegulatorLevelBoost
Severity level boost mapping for retention rates.
| Type | Default |
|---|---|
| List | [] |
Defines a map of severity levels to boost multipliers for minimum retention thresholds. Higher severity events can be given higher minimum retention rates through boost values.
The boost multiplier is applied only to rateRegulatorMinRetentionThreshold, not to the entire retention threshold.
This ensures critical events (ERROR, FATAL) have higher minimum retention floors when budget is exceeded,
while preventing boost values < 1.0 from reducing retention when under budget.
How it works:
- The regulator calculates a retention threshold based on budget utilization
- The threshold is clamped to at least rateRegulatorMinRetentionThreshold * boost
- Boost only affects the minimum floor, not the calculated threshold
- Higher boost values result in higher minimum retention for events of that severity when over budget
- Lower boost values (< 1.0) reduce minimum retention for low-priority events (e.g., DEBUG, TRACE)
For example:
levelBoost:
- TRACE=0.25
- DEBUG=0.5
- INFO=1
- WARN=1.5
- ERROR=2
- FATAL=3
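Entries in the LEVEL=multiplier form used by the default configuration can be turned into a lookup map with a few lines. This is a sketch; `parse_level_boost` is a hypothetical helper, and treating unlisted levels as a boost of 1.0 is an assumption rather than documented behavior.

```python
def parse_level_boost(entries: list) -> dict:
    """Parse ["ERROR=2", ...] into {"ERROR": 2.0, ...} (hypothetical helper)."""
    boosts = {}
    for entry in entries:
        level, _, value = entry.partition("=")
        boosts[level.strip().upper()] = float(value)
    return boosts


boosts = parse_level_boost(
    ["TRACE=0.25", "DEBUG=0.5", "INFO=1", "WARN=1.5", "ERROR=2", "FATAL=3"]
)

# Assumption: a level missing from the map is left unboosted.
DEFAULT_BOOST = 1.0
notice_boost = boosts.get("NOTICE", DEFAULT_BOOST)
```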
rateRegulatorLookupFile
Lookup file containing global event type rates.
| Type | Default |
|---|---|
| String | "" |
Defines the path to a lookup file containing global event type frequency data. Used to make sampling decisions based on cluster-wide event patterns.
rateRegulatorLookupRetain
Retention period for the lookup file containing global event type rates.
| Type | Default |
|---|---|
| Number | 300000 |
Defines the retention period for the lookup file containing global event type frequency data.
If the file's last modified time is older than this period, the lookup is considered stale, and local counter rates are used. Used to make sampling decisions based on cluster-wide event patterns.
Validation: Must be greater than 60000 milliseconds.
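The staleness rule above amounts to comparing the file's modification time with the retention period. The sketch below illustrates that check with the Python standard library; the function name `lookup_is_fresh` and the injectable `now` parameter are assumptions made for the example.

```python
import os
import tempfile
import time
from typing import Optional


def lookup_is_fresh(path: str, retain_ms: int = 300_000,
                    now: Optional[float] = None) -> bool:
    """Return True when the lookup file exists and was modified within retain_ms."""
    if not path or not os.path.exists(path):
        return False  # no lookup configured: fall back to local counters
    now = time.time() if now is None else now
    age_ms = (now - os.path.getmtime(path)) * 1000
    return age_ms <= retain_ms


# Demonstration with a temporary file standing in for the lookup.
with tempfile.NamedTemporaryFile() as handle:
    mtime = os.path.getmtime(handle.name)
    fresh = lookup_is_fresh(handle.name, now=mtime + 120)  # modified 2 minutes ago
    stale = lookup_is_fresh(handle.name, now=mtime + 600)  # modified 10 minutes ago
```

A file modified 2 minutes ago is within the default 300000 ms period, so global data would be used; at 10 minutes it counts as stale and local counter rates apply.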
rateRegulatorBudgetPerHour
Target spending budget per hour in USD.
| Type | Default |
|---|---|
| Number | 1.0 |
Defines the soft target budget per hour for total cost across all event types on this node. Actual spend may exceed this by 10-20% due to soft enforcement.
Hourly budgets align naturally with cloud infrastructure costs and work for both short-lived and long-lived nodes.
Examples:
- $1.50/hour → reasonable for a production log forwarder (~$36/day, ~$1080/month if running 24/7)
- $0.10/hour → conservative for dev/test environments
- $0.02/hour → minimal for low-volume services.
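The conversions behind these examples are simple enough to check directly. The helper below is illustrative only; it assumes a 30-day month, as the ~$1080 figure above does.

```python
def budget_breakdown(budget_per_hour: float) -> dict:
    """Convert an hourly budget to per-minute, per-day, and per-30-day figures."""
    return {
        "per_minute": budget_per_hour / 60,
        "per_day": budget_per_hour * 24,
        "per_30_days": budget_per_hour * 24 * 30,
    }


production = budget_breakdown(1.50)
```

For $1.50/hour this gives $0.025/min, $36/day, and $1080 per 30 days when running 24/7.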
rateRegulatorMaxSharePerFieldSet
Maximum % of total spend any single field set can consume.
| Type | Default |
|---|---|
| Number | 0.2 |
Defines the maximum share (0.0 to 1.0) of the total budget that any single unique field set can use.
A "field set" is a unique combination of the field values specified in rateRegulatorFieldNames.
Enforced independently of whether the total budget is exceeded—prevents noisy field sets from dominating even when under budget.
In global mode, this is enforced cluster-wide based on aggregate spend data from the lookup file.
Example with rateRegulatorFieldNames: [symbolMessage]:
- 0.2 means no single event type (e.g., "heartbeat_debug") can use more than 20% of total spend
- If "heartbeat_debug" is costing 35% of total, it gets throttled to 20% regardless of whether you're over budget
Example with rateRegulatorFieldNames: [container] (Kubernetes: per-app total):
- Each app (e.g., "frontend", "backend", "payment-service") is tracked separately
- 0.2 means no single app can exceed 20% of total spend across ALL its event types
- Aggregates across all pods and all event types for each app
- If "frontend" app (all events, 5 pods) costs 30%, it gets throttled to 20%
Example with rateRegulatorFieldNames: [symbolMessage, container] (Kubernetes: per event type per app):
- Each unique combination (e.g., "error_login|frontend", "heartbeat_debug|backend") is tracked separately
- 0.2 means no single (event type × app) can exceed 20% of total spend
- If "frontend" app's "heartbeat_debug" events cost 30% across its 5 pods, they get throttled to 20%
- Note: "frontend" could have multiple event types, each with their own 20% cap.
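As a sketch of how the share cap plays out across several field sets, the snippet below computes each key's share of total spend and a scale factor of max share ÷ actual share for keys over the cap (the same ratio used in the walk-throughs on this page). The spend figures and the backend keys are made up for illustration.

```python
def share_scales(spend: dict, max_share: float) -> dict:
    """Return a retention scale per field set: 1.0 under the cap, else max_share / share."""
    total = sum(spend.values())
    scales = {}
    for key, cost in spend.items():
        share = cost / total if total else 0.0
        scales[key] = 1.0 if share <= max_share else max_share / share
    return scales


# Hypothetical window spend in USD per (event type | app) field set.
spend = {
    "heartbeat_debug|frontend": 0.30,
    "error_login|frontend": 0.15,
    "timeout|payment-service": 0.10,
    "cache_miss|backend": 0.18,
    "slow_query|backend": 0.17,
    "retry|backend": 0.10,
}

scales = share_scales(spend, max_share=0.2)
```

Only `heartbeat_debug|frontend` (30% of spend) is scaled down, to about 0.67; the other frontend key keeps full retention, so the frontend app as a whole can still exceed 20%, as the note above points out.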
rateRegulatorIngestionCostPerGB
Vendor ingestion cost per GB in USD.
| Type | Default |
|---|---|
| Number | 1.5 |
Defines the cost per GB charged by your observability vendor for log ingestion. Used to calculate per-event costs (event byte size × cost per GB) for budget enforcement.
Important: If using global mode with a policy module lookup, this value should match policyIngestionCostPerGB for consistency.
Common vendor pricing (2025):
- Splunk Cloud: ~$1.50/GB (varies by contract, SKU)
- Datadog Logs: ~$0.10-$0.25/GB (depends on tier: standard, flex, online archives)
- Elastic Cloud: ~$0.109/GB (standard logging tier)
- New Relic: ~$0.30/GB (Data Plus)
- Sumo Logic: ~$1.50/GB (depends on plan)
- AWS CloudWatch Logs: ~$0.50/GB ingestion + $0.03/GB storage
Example: A 10KB error log at $1.50/GB costs ~$0.000015. Over 1 million such events per hour, that's $15/hour ($360/day). The regulator tracks this spend per field set and enforces your rateRegulatorBudgetPerHour by probabilistically sampling events when over budget.
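The projection in this example can be reproduced as follows. It is a rough sketch: `hourly_spend` is a hypothetical helper, 1 GB is taken as 10^9 bytes, and the "required retention" line is plain budget ÷ spend rather than the regulator's full decision logic.

```python
def hourly_spend(event_bytes: int, events_per_hour: int, cost_per_gb: float) -> float:
    """Project hourly ingestion spend in USD for a uniform stream of events."""
    return event_bytes / 1e9 * cost_per_gb * events_per_hour


spend = hourly_spend(10_000, 1_000_000, 1.50)  # 10KB events, 1M per hour, $1.50/GB
daily = spend * 24

# Fraction of events that could be kept to land on a $1.50/hour budget.
required_retention = min(1.0, 1.50 / spend)
```

This yields $15/hour ($360/day), so staying at a $1.50/hour budget would mean keeping roughly one event in ten on average.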
This module is defined in rate/module.yaml.