Input Extractor

Extractors filter, redact and select text to transform into TenXObjects from a stream of input events.

Configured extractors scan each input event for the specified JSON fields or regex capture group values, and perform the specified actions on them.

An extractor can be applied to an input via its targetInput member or by adding the extractor to a target input's extractors list.

For example, the k8s enrichment module uses JSON extractors to add pod and container context to TenXObjects for filtering and aggregation.

Actions

Extractors can perform the following actions on input events:

Capture

Capture actions control which segments of a log/trace event's text to transform into TenXObjects.

JSON extractors select field values to transform into TenXObjects. Regex extractors select values using the capture groups defined by the extractorPattern.

For example, to capture the message value of the following simple event:

{
   "event": {
     "origin": "localhost",
     "message": "some event"
   }
}

A JSON extractor can specify the message field, while a Regex extractor can specify a capture group (see example).
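As a rough illustration of the two approaches, the sketch below defines one extractor of each type for the event above. The extractor names, the input name 'myInput', and the regex are illustrative assumptions rather than shipped defaults, and in practice only one of the two extractors would be needed.

```yaml
extractor:

  # Option 1: a JSON extractor selects the 'message' field by name
  - name: jsonMessage            # hypothetical extractor name
    type: json
    targetInput: myInput         # hypothetical input name
    actions:
      - captureFirst:message

  # Option 2: a regex extractor selects the same value via a named capture group
  - name: patternMessage         # hypothetical extractor name
    type: pattern
    targetInput: myInput
    # Illustrative pattern: captures the text between the quotes following "message":
    pattern: '"message":\s*"(?<message>[^"]*)"'
    actions:
      - captureFirst:message
```

In both cases the captured value for the sample event would be: some event.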

Each capture action specifies which instances of the JSON field/Regex groups within an event to transform:

All

Transform all matching JSON field/capture group values in an event into TenXObjects.

First

Transform the first matching JSON field/capture group value in an event into a TenXObject.

Last

Transform the last matching JSON field/capture group value in an event into a TenXObject.

Default

Serves as a sink for selecting events that fail to match the extractorFilter pattern or do not contain JSON field names/regex capture groups specified by extractorActions.

This action captures the entire text of an event to transform into a TenXObject and sets its extractorKey value to the current action's name.

Only one default capture action is allowed.

Filter

Filter actions allow for filtering an entire event or redacting some of its values. These actions serve a dual purpose: they provide a fast mechanism for filtering out unnecessary events to save CPU resources, and they redact sensitive information (e.g., PII or data covered by HIPAA).

Drop

Deletes a matched JSON field and its value(s) from the event entirely. Available only for JSON extractors.

Redact

For JSON extractors, resets matching field values according to their JSON type:

| JSON Field Type | Redacted Value |
| --- | --- |
| object | {} |
| array | [] |
| number | 0 |
| string | "" |
| boolean | true |

For regex extractors, deletes all instances of matching capture groups from an event.
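To make the difference between redact and drop concrete, here is a minimal sketch of a JSON extractor that applies both. The extractor name, input name, and sample event are hypothetical, and the exact serialization of the resulting event may differ from what is shown in the comments.

```yaml
extractor:
  - name: scrubCredentials       # hypothetical extractor name
    type: json
    targetInput: myInput         # hypothetical input name
    actions:
      - captureFirst:message
      - redact:username          # a string value is reset to ""
      - drop:password            # the field name and its value are removed

# Sample event before the filter actions (hypothetical):
#   {"event": {"message": "login failed", "username": "alice", "password": "secret"}}
#
# Expected shape after them, following the rules described above:
#   {"event": {"message": "login failed", "username": ""}}
```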

Advanced

Advanced actions allow more granular control over how events are captured:

No Transform

Select events without transforming them into typed TenXObjects. This action enables writing raw events that do not require any structure to output. Only one noTransform action is allowed.

Outer Text

Sets a JSON field/capture group as the outer text value for TenXObjects extracted by captureAll, captureFirst, or captureLast actions.

For example, if a JSON extractor action specifies captureFirst:message, a setOuterText:event action can set the fullText value of a TenXObject to the event object that encloses the JSON message field.

In the example event, "some event" is set as the TenXObject's text value and its fullText value is the entire enclosing JSON object.

The encode function returns a compact representation of a TenXObject's text value enclosed within the outer text region.

If no setOuterText selector is specified, a TenXObject's fullText and text fields return the same value.

If setOuterText actions are defined but not matched, the extractor applies the default capture action if one is defined; otherwise, it drops the event.
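The outer text behavior described above could be configured along the following lines. This is a sketch only: the extractor and input names are hypothetical, and the comments restate the behavior described in this section rather than verified output.

```yaml
extractor:
  - name: messageWithContext     # hypothetical extractor name
    type: json
    targetInput: myInput         # hypothetical input name
    actions:
      # For the sample event, 'text' would be: some event
      - captureFirst:message

      # 'fullText' would be the enclosing 'event' JSON object
      - setOuterText:event

      # Fallback for events where the outer text is not matched;
      # without it, such events are dropped
      - captureDefault:other
```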

Configuration

To configure the Input extractor unit, edit the settings below.

Below is the default configuration from: extract/pattern.yaml.

# 🔟❎ 'run' Regex extractor configuration

# Configure input regex pattern extractors.
# To learn more see: https://doc.log10x.com/run/input/extract

# Set the 10x pipeline to 'run'
tenx: run

# ============================= Extractor Options =============================

# multiple extractors can be defined below
extractor:

    # 'name' provides a unique name for this extractor that is referenced
    #  by any inputs to which it is applied 
  - name: patternExtractor

    # 'type' controls the method for parsing input stream text (json or pattern)
    type: pattern

    # 'targetInput' sets a regex pattern to match all inputs to which the extractor is applied.
    #  For example, to apply this extractor to an input named 'datadog', set: 'targetInput: datadog'
    targetInput: myInput

    # 'filter' specifies a regex pattern that must match input events to scan them for capture groups.
    #  For example, the pattern below will filter out syslog events that are not errors
    filter: "error: "

    # 'pattern' sets regex pattern whose capture groups to apply extractor actions below.
    #  To learn more about capture groups, see: https://regexone.com/lesson/capturing_groups
    #  If the pattern does not define capture groups, matches are captured as events to transform into TenXObjects.

    #  The example pattern below uses capture groups to parse a syslog event's 'message' field,
    #  and drop 'password' values. 
    #  For an explanation, see: https://chat.openai.com/share/003ead39-421c-489a-a52f-ee8846028887
    pattern: "(error|info|debug): (?<message>.+?)(\\s+password=(?<password>\\S+))?(\\s+username=(?<username>\\S+))?$"

    # 'actions' specifies the actions taken by this extractor on a matching capture group. 
    #  Actions are defined by their type followed by ':' and the regex capture group they target.
    #  The following action types are supported (comment in/out the ones needed for your use case):
    actions: 

      # Capture 

      # 'captureAll' transforms all instances of a matching capture group (e.g., 'message')
      #  within the current event into TenXObjects
      - captureAll:message

      # 'captureFirst' transforms only the first instance of a matching capture group (e.g., 'message')
      #  within the current event into a TenXObject
      - captureFirst:message

      # 'captureLast' transforms only the last instance of a matching capture group (e.g., 'message')
      #  within the current event into a TenXObject
      - captureLast:message

      # Filter 

      # 'redact' removes all matching capture groups (e.g., 'password') found in the event
      - redact:password

      # Advanced 

      # 'captureDefault' serves as a sink for events that fail to match the 'filter' pattern
      #  or do not contain any of the capture group names specified by 'actions'. These events are
      #  transformed into TenXObjects whose 'extractorKey' field returns this action's name (i.e., 'other')
      - captureDefault:other

      # 'noTransform' is the same as 'captureDefault' (see above), except captured events are not
      #  transformed into TenXObjects and remain as 'plain text' objects.
      #  This is useful when the events need to be routed by the 10x pipeline to a specific
      #  output, but do not require access to structured elements (e.g., the intrinsic 'vars' and 'timestamp' fields).
      - noTransform:other

Below is the default configuration from: extract/json.yaml.

# 🔟❎ 'run' JSON extractor configuration

# Configure input JSON extractors.
# To learn more see: https://doc.log10x.com/run/input/extract

# Set the 10x pipeline to 'run'
tenx: run

# ============================= Extractor Options =============================

# multiple extractors can be defined below
extractor:

    # 'name' provides a unique name for this extractor that is referenced
    #  by any inputs to which it is applied 
  - name: jsonExtractor

    # 'type' controls the method for parsing input stream text (json or pattern)
    type: json

    # 'targetInput' sets a regex pattern to match all inputs to which the extractor is applied.
    #  For example, to apply this extractor to an input named 'datadog', set: 'targetInput: datadog'
    targetInput: myInput

    # 'filter' specifies a regex pattern the current event must match to scan for JSON objects.
    #  For example, the pattern below will filter out events that contain "level":"TRACE"
    filter: ^((?!"level":"TRACE").)*$

    # 'actions' specifies the actions taken by this extractor. 
    #  Define actions using <type> followed by ':' and the JSON field they target.
    #  The following action types are supported (comment in/out the ones needed for your use case):
    actions: 

      # Capture

      # 'captureAll' transforms all instances of a matching JSON field (e.g., 'message')
      #  within the current event into TenXObjects
      - captureAll:message

      # 'captureFirst' transforms only the first instance of a matching JSON field (e.g., 'message')
      #  within the current event into a TenXObject
      - captureFirst:message

      # 'captureLast' transforms only the last instance of a matching JSON field (e.g., 'message')
      #  within the current event into a TenXObject
      - captureLast:message

      #  Filter

      # 'redact' resets the value of any matching JSON field (e.g., 'username') found in the event
      #  based on its JSON value type:
      #  - number -> 0
      #  - string -> ""
      #  - boolean -> true
      #  - object -> {}
      #  - array -> [] 
      - redact:username

      # 'drop' removes both the field name and value of any matching
      #  JSON field (e.g., 'password') found in the event. This action applies to JSON extractors only.
      - drop:password

      # Advanced

      # 'captureDefault' serves as a sink for events that fail to match the 'filter' pattern
      #  or do not contain any of the JSON field names specified by 'actions'. These events are
      #  transformed into TenXObjects whose 'extractorKey' field returns this action's name (i.e., 'other')
      - captureDefault:other

      # 'noTransform' is the same as 'captureDefault' (see above), except captured events are not
      #  transformed into TenXObjects and remain as 'plain text' objects.
      #  This is useful when the events need to be routed by the 10x pipeline to a specific
      #  output, but do not require access to structured elements (e.g., the intrinsic 'vars' and 'timestamp' fields).
      - noTransform:other

Options

Specify the options below to configure one or more Input extractors:

| Name | Description | Category |
| --- | --- | --- |
| extractorName | Logical name identifying this extractor | General |
| extractorEnabled | A JavaScript expression that must be evaluated as 'truthy' to enable the extractor | General |
| extractorTargetInput | Regex pattern identifying all inputs to which this extractor should be applied | General |
| extractorType | The method used to extract values from the event. Possible values: [json, pattern] | General |
| extractorGroup | Name of group of extractors which will run together | General |
| extractorActions | Actions in the form of actionType:name. Possible values: [captureAll, captureFirst, captureLast, captureArrays, captureFirstArray, captureLastArray, captureDefault, captureField, setOuterText, drop, redact, noTransform] | General |
| extractorFilter | Regex pattern to match for the extractor to be applied | Pattern |
| extractorPattern | For regex extractors, the pattern for capturing named match groups. For JSON, the segment of text within events to scan for objects | Pattern |
| extractorForeach | Name of options group for whose instances to create matching extractors | Advanced |

General

extractorName

Logical name identifying this extractor.

Type Required Category
String General

Provides a logical name (e.g., 'message') for this JSON/regex extractor. Target input(s) can reference this name to apply the extractor to the events they produce.

extractorEnabled

A JavaScript expression that must be evaluated as 'truthy' to enable the extractor.

Type Default Category
Boolean true General

Enables or disables this extractor. If set, the JavaScript expression must evaluate to a truthy value for the extractor to be enabled. For example, to configure this value to use a startup argument/shell variable, use:

extractor:
  name: myExtractor
  # A truthy result enables the extractor
  enabled: TenXEnv.get("performExtraction")
  # ... remaining extractor options

extractorTargetInput

Regex pattern identifying all inputs to which this extractor should be applied.

Type Default Category
String "" General

Defines a regex pattern identifying all inputs to which this extractor should be applied.

For example, to apply an extractor to a datadog input, specify:

extractor:
  name: message
  type: json
  targetInput: datadog
  actions:
  - captureAll:message 

This option applies the extractor to input(s) without changing their definition, as opposed to having the input reference the extractor directly via its extractorName.

extractorType

The method used to extract values from the event. Possible values: [json, pattern].

Type Required Category
String General

Sets the type of extraction method to select, drop and redact values from a target input stream. Possible values:

  • json: scan events for JSON objects containing fields specified in extractorActions.
  • pattern: scan events for regex named capture groups of the pattern set by extractorPattern.

extractorGroup

Name of group of extractors which will run together.

Type Default Category
String "" General

Defines the name of the extractor group to which this extractor belongs. Extractors in the same group attempt to run together on events in a single processing pass to improve performance.
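A grouped setup might look like the sketch below. Note the assumptions: the 'group' key is inferred from how other options map to configuration keys (e.g., extractorName to 'name', extractorType to 'type') and is not confirmed by this page; the extractor names, field names, and group name are hypothetical.

```yaml
extractor:

  - name: podName                # hypothetical extractor name
    type: json
    targetInput: myInput         # hypothetical input name
    group: k8sContext            # assumed key for extractorGroup; hypothetical group name
    actions:
      - captureField:pod         # hypothetical field name

  - name: containerName          # hypothetical extractor name
    type: json
    targetInput: myInput
    group: k8sContext            # same group: both extractors aim to share one pass per event
    actions:
      - captureField:container   # hypothetical field name
```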

extractorActions

Actions in the form of actionType:name. Possible values: [captureAll, captureFirst, captureLast, captureArrays, captureFirstArray, captureLastArray, captureDefault, captureField, setOuterText, drop, redact, noTransform].

Type Default Category
List [] General

Defines a list of actions, each in the form 'actionType:name:alias' (e.g., 'captureAll:message:alias'), that capture and redact values to transform into TenXObjects.

For JSON extractors, 'name' refers to a field to look for in events and the value to which to apply the 'actionType'. The name can take the form 'x.y.z' to match nested fields in the JSON. For example, 'metadata.id' matches the 'id' field of the 'metadata' object in {"metadata": {"id": "1234"}}.

For regex extractors, 'name' refers to a regex pattern match group defined by 'extractorPattern' to which to apply the 'actionType'.

This setting must be specified when extractorType = 'json'. If it is not specified and extractorType is 'pattern', the 'extractorPattern' pattern scans for regex capture groups, performing a 'captureAll' action on each.

The 'alias' part is optional. If provided, it is used as the name by which the captured object is referenced. This is useful when multiple extractors extract different values that are later used in the same way.

If no capture groups are defined, any matches of the pattern within the input text line are captured.
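The action syntax described in this section, including nested field names and aliases, can be sketched as follows. The extractor names, input names, and the 'trace.id' field are hypothetical; only 'metadata.id' comes from the example above.

```yaml
extractor:

  - name: metadataId             # hypothetical extractor name
    type: json
    targetInput: inputA          # hypothetical input name
    actions:
      # 'metadata.id' matches the 'id' field nested in the 'metadata' object;
      # the optional third part names the captured object 'requestId'
      - captureAll:metadata.id:requestId

  - name: traceId                # hypothetical extractor name
    type: json
    targetInput: inputB          # hypothetical input name
    actions:
      # A different source field exposed under the same alias, so that
      # later stages can treat both captures the same way
      - captureAll:trace.id:requestId
```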

Pattern

extractorFilter

Regex pattern to match for the extractor to be applied.

Type Default Category
String "" Pattern

Specifies a regex pattern that an event must match to be scanned for JSON fields/regex pattern capture groups. This option provides a way to rule out events from being transformed into TenXObjects. To select events that fail to meet this filter, define a captureDefault action.
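The interplay between the filter and a captureDefault action might look like this minimal sketch. The extractor name, input name, and pattern are illustrative assumptions.

```yaml
extractor:
  - name: errorsOnly             # hypothetical extractor name
    type: pattern
    targetInput: myInput         # hypothetical input name

    # Only events containing 'error: ' are scanned for capture groups
    filter: "error: "

    # Illustrative pattern: everything after 'error: ' becomes the 'message' group
    pattern: "error: (?<message>.+)$"

    actions:
      - captureAll:message

      # Events that fail the filter are still captured in full,
      # with their 'extractorKey' set to 'other'
      - captureDefault:other
```

Without the captureDefault action, events that do not match the filter would not be transformed into TenXObjects by this extractor.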

extractorPattern

For regex extractors, the pattern for capturing named match groups. For JSON, the segment of text within events to scan for objects.

Type Default Category
String "" Pattern

Defines a regex pattern that applies to events read from an input stream.

For extractorType = pattern, matching groups are used as the 'name' portions of actions specified by 'extractorActions'.

If extractorType = json, events are scanned for JSON objects only within the boundaries of the pattern's matches.
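For the JSON case, a pattern can narrow the scan to the part of an event that holds the JSON payload. The following is a sketch under stated assumptions: the event layout, the extractor and input names, and the pattern itself are all hypothetical.

```yaml
extractor:
  - name: embeddedJson           # hypothetical extractor name
    type: json
    targetInput: myInput         # hypothetical input name

    # Hypothetical event layout:
    #   2024-01-01 INFO payload={"message": "some event"}
    # Only the text matched by 'pattern' is scanned for JSON objects
    pattern: 'payload=\{.*\}'

    actions:
      - captureAll:message
```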

Advanced

extractorForeach

Name of options group for whose instances to create matching extractors.

Type Default Category
String "" Advanced

Specifies the name of an options group for whose instances to replicate this extractor. For example, the Elastic input defines the 'elastic' options group and configures an extractor to apply to each input stream created from its configured instances.


This unit is defined in extract/unit.yaml.