Text File Scanner

Extracts symbol values from text/binary files using plain text tokenization or Jackson parsers.

Parses input by splitting lines with delimiters or using a JsonFactory for structured token reading.

When source code for a log format is unavailable (e.g., third-party services), scan a sample log to extract symbols for future parsing.

Size limit

Files over 50KB are skipped to avoid slow parsing of machine-generated content.
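
The size guard can be pictured as a simple check made before any parsing starts. The Python sketch below is illustrative only and is not the scanner's own code: the helper name `should_scan` and the reading of 50KB as 50 * 1024 bytes are assumptions.

```python
import os

# Assumption: "50KB" is read as 50 * 1024 bytes; the scanner's exact threshold may differ.
SIZE_LIMIT_BYTES = 50 * 1024


def should_scan(path: str, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if the file is small enough to scan (hypothetical helper)."""
    return os.path.getsize(path) <= limit
```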

Configuration

To configure the Text file scanner module, edit these settings.

Below is the default configuration from: text/jackson.yaml.

# 🔟❎ 'compile' text symbol scanner configuration

# The 'text' scanner parses text files for symbol values. It utilizes
# the Jackson parser library to parse any text/binary format it supports.
# This includes formats such as json, yaml, xml, ini, protobuf and more.

# The configuration below is added by default to the 10x 'compile' pipeline.
# Even so, if another text scanner is defined via the 'textScanners' options group
# whose 'fileNameFilter' matches that of the current target input file,
# it will take precedence over the text scanners defined below.
# To learn more about text scanner options below, see:
# https://doc.log10x.com/compile/scanner/text

# Set the 10x pipeline to 'compile'
tenx: compile

# =============================== Text Options ================================

text:

  - parserName: text
    fileNameFilter: '^.*\.(csv|txt|properties|tsv|ini|conf|sh|log|out)$'
    maxLines: 500
    lineOffset: 0
    allowDigits: false
    minLength: 2
    maxLength: 50

  - parserName: json
    fileNameFilter: '^.*\.(json)$'
    parserFactoryClass: com.fasterxml.jackson.core.JsonFactory
    scanFieldValues: true

  - parserName: yaml
    fileNameFilter: '^.*\.(yml|yaml)$'
    parserFactoryClass: com.log10x.eng.scanner.text.TextYamlFactory
    scanFieldValues: true

  - parserName: xml
    fileNameFilter: '^.*\.(xml|xsd)$'
    parserFactoryClass: com.fasterxml.jackson.dataformat.xml.XmlFactory
    scanFieldValues: true

Options

Specify the options below to configure one or more Text file scanners:

| Name | Description | Category |
| --- | --- | --- |
| textParserName | Parser logical name | General |
| textFileNameFilter | Pattern to match for target input file name | General |
| textParserFactoryClass | Parser factory class | Parser |
| textParserFactoryArgs | Arguments for the 'textParserFactoryClass' constructor | Parser |
| textScanFieldValues | Controls whether to capture Jackson parser field values | Parser |
| textMaxLines | Max number of lines to scan | Text |
| textLineOffset | Line number from which to start scan | Text |
| textAllowDigits | Controls whether to capture tokens containing numeric characters as symbol tokens | Text |
| textMinLength | Min character length for a token to be considered a symbol value | Text |
| textMaxLength | Max character length for a token to be considered a symbol value | Text |

General

textParserName

Parser logical name.

Type Required Category
String General

Defines a unique logical name for this parser (e.g., 'logs').

textFileNameFilter

Pattern to match for target input file name.

Type Required Category
String General

Defines a regex pattern that a file name must match for this scanner to apply to the file.
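
To make the filter behavior concrete, the sketch below matches file names against filters based on the default configuration. It is a Python illustration under stated assumptions, not the scanner's implementation: `pick_parser` is a hypothetical helper, the "first matching entry wins" order is assumed, and Python's `re` stands in for the regex engine the scanner actually uses.

```python
import re
from typing import Optional

# Filters based on the default configuration of the 'text' options group.
DEFAULT_FILTERS = {
    "text": r"^.*\.(csv|txt|properties|tsv|ini|conf|sh|log|out)$",
    "json": r"^.*\.(json)$",
    "yaml": r"^.*\.(yml|yaml)$",
    "xml": r"^.*\.(xml|xsd)$",
}


def pick_parser(file_name: str, filters: dict = DEFAULT_FILTERS) -> Optional[str]:
    """Return the name of the first parser whose filter matches, or None."""
    # Assumption: entries are tried in definition order and the first match wins.
    for parser_name, pattern in filters.items():
        if re.search(pattern, file_name):
            return parser_name
    return None
```

Under these assumptions, `pick_parser("server.log")` selects the `text` entry, while a name such as `image.png` matches no filter and is left unscanned. The patterns are case-sensitive as written, so `APP.LOG` would not match.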

Parser

textParserFactoryClass

Parser factory class.

Type Default Category
String "" Parser

Provides an optional fully qualified name of a class derived from Jackson's JsonFactory.

If specified, the scanner instantiates the factory using a parameterless constructor and invokes its createParser method to generate a parser instance. The scanner uses the parser to read token values from the file.

If 'textParserFactoryArgs' is specified, this class must define a constructor that receives a String[] to accept the optional config parameters.

textParserFactoryArgs

Arguments for the 'textParserFactoryClass' constructor.

Type Default Category
List [] Parser

Specifies arguments to pass to the parser factory class constructor. This option only applies if 'textParserFactoryClass' is set.

textScanFieldValues

Controls whether to capture Jackson parser field values.

Type Default Category
Boolean false Parser

Controls whether a Jackson parser's VALUE_STRING tokens are also scanned for symbol values, or only its FIELD_NAME tokens.

This option only applies if 'textParserFactoryClass' is set.
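
The difference between scanning only field names and also scanning string values can be sketched as follows. This is a Python approximation and not the scanner's code: Python's `json` module has no token stream, so dictionary keys stand in for FIELD_NAME tokens and string values for VALUE_STRING tokens. The function name `collect_candidates` and its `scan_field_values` parameter are hypothetical.

```python
import json


def collect_candidates(document: str, scan_field_values: bool = False) -> list:
    """Collect field names, plus string values when scan_field_values is True."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                found.append(key)      # analogous to a FIELD_NAME token
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str) and scan_field_values:
            found.append(node)         # analogous to a VALUE_STRING token

    walk(json.loads(document))
    return found
```

For the input `{"level": "ERROR", "tags": ["db"]}`, the sketch returns only `level` and `tags` with the flag off, and additionally `ERROR` and `db` with the flag on. Numeric values are never collected in either mode.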

Text

textMaxLines

Max number of lines to scan.

Type Default Category
Number 0 Text

Controls the maximum number of lines to scan for symbol values from the input file. This option is useful when scanning existing log files as 'templates' for parsing future logs from a similar input stream.

textLineOffset

Line number from which to start scan.

Type Default Category
Number 0 Text

Specifies the line number from which to start scanning for symbols. This option is useful when scanning a specific portion of a text file.
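
Together, 'textLineOffset' and 'textMaxLines' select a window of lines to scan. The Python sketch below shows one plausible reading and is not the scanner's implementation. It assumes a zero-based offset, a line cap counted from the offset, and that a cap of 0 means "no limit"; none of these are confirmed by this page. `lines_to_scan` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator


def lines_to_scan(lines: Iterable[str], line_offset: int = 0, max_lines: int = 0) -> Iterator[str]:
    """Return an iterator over the window of lines selected by offset and cap."""
    # Assumptions: zero-based offset; max_lines == 0 means no cap;
    # the cap is counted from the offset, not from the start of the file.
    stop = line_offset + max_lines if max_lines > 0 else None
    return islice(lines, line_offset, stop)
```

With ten input lines, an offset of 2 and a cap of 3 would yield the third, fourth, and fifth lines under these assumptions.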

textAllowDigits

Controls whether to capture tokens containing numeric characters as symbol tokens.

Type Default Category
Boolean false Text

Controls whether tokens containing numeric characters (e.g., 0-9) are accepted as symbol tokens. As alphanumeric combinations tend to have high cardinality (e.g., GUID, trace_id), adding them to symbol units is generally not advised unless they are specifically known to be 'constant' / low cardinality values.

textMinLength

Min character length for a token to be considered a symbol value.

Type Default Category
Number 0 Text

Sets the minimum character length a token must have to constitute a symbol value. Very short tokens (e.g., len < 3) have a high probability of being dynamic values with high cardinality and, as such, should not be captured as symbol values.

textMaxLength

Max character length for a token to be considered a symbol value.

Type Default Category
Number 0 Text

Sets the maximum character length a token may have to constitute a symbol value. Very long tokens (e.g., len > 100) have a high probability of being dynamic values with high cardinality and, as such, should not be captured as symbol values.
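
The three token filters (minimum length, maximum length, digits) can be combined as in the Python sketch below. It illustrates the idea and does not reproduce the scanner's tokenizer: splitting on non-word characters is an assumed delimiter rule, `symbol_tokens` is a hypothetical helper, and treating a maximum of 0 as "no upper bound" is an assumption. The parameter defaults mirror the values in the default configuration (2 and 50), not the option defaults.

```python
import re


def symbol_tokens(line: str, min_length: int = 2, max_length: int = 50,
                  allow_digits: bool = False) -> list:
    """Split a line on non-word characters and keep tokens that pass the filters."""
    kept = []
    for token in re.split(r"\W+", line):
        if not token:
            continue  # skip empty pieces produced by the split
        if len(token) < min_length:
            continue  # too short to be a stable symbol value
        if max_length > 0 and len(token) > max_length:
            continue  # assumption: 0 disables the upper bound
        if not allow_digits and any(ch.isdigit() for ch in token):
            continue  # tokens with digits tend to be high cardinality
        kept.append(token)
    return kept
```

For a line such as `2024-01-05 ERROR payment failed for order a81f3 in 35ms`, the sketch keeps the constant words (`ERROR`, `payment`, `failed`, `for`, `order`, `in`) and drops the date parts, the order id, and the duration, which is the kind of separation the options above aim for.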


This module is defined in text/module.yaml.