Text File Scanner
Extracts symbol values from text/binary files using plain text tokenization or Jackson parsers.
Parses input by splitting lines with delimiters or using a JsonFactory for structured token reading.
When source code for a log format is unavailable (e.g., third-party services), scan a sample log to extract symbols for future parsing.
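The line-splitting path described above can be pictured with a minimal Python sketch. The delimiter set used here (any run of characters other than letters, digits, or underscore) is an assumption for illustration; this page does not document the scanner's real delimiters.

```python
import re

# Assumed delimiter set: any run of characters that is not a letter, digit,
# or underscore. The scanner's actual delimiters are not documented here.
DELIMITERS = re.compile(r"[^A-Za-z0-9_]+")


def tokenize_line(line: str) -> list[str]:
    """Split one line into candidate tokens, dropping empty strings."""
    return [tok for tok in DELIMITERS.split(line) if tok]


def tokenize(text: str) -> list[list[str]]:
    """Tokenize every line of a text sample, one token list per line."""
    return [tokenize_line(line) for line in text.splitlines()]
```

For a line such as `2024-05-01 INFO [main] Server started on port=8080`, this yields candidates like `INFO`, `main`, and `Server` alongside numeric fragments; the options below decide which candidates are kept as symbol values.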
Size limit
Files over 50KB are skipped to avoid slow parsing of machine-generated content.
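A minimal sketch of the size check, assuming "50KB" means 50 × 1024 bytes and that a file exactly at the limit is still scanned. Neither detail is specified on this page, and `should_scan` is a hypothetical helper name.

```python
import os

# Assumption: 50KB is read as 50 * 1024 bytes, and the limit is inclusive.
MAX_SCAN_BYTES = 50 * 1024


def should_scan(path: str, limit: int = MAX_SCAN_BYTES) -> bool:
    """Hypothetical helper: return False for files larger than the limit."""
    return os.path.getsize(path) <= limit
```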
Configuration
To configure the Text file scanner module, edit these settings.
Below is the default configuration from: text/jackson.yaml.
```yaml
# 🔟❎ 'compile' text symbol scanner configuration
# The 'text' scanner parses text files for symbol values. It utilizes
# the Jackson parser library to parse any text/binary format it supports.
# This includes formats such as json, yaml, xml, ini, protobuf and more.
# The configuration below is added by default to the 10x 'compile' pipeline.
# Even so, if another text scanner is defined via the 'textScanners' options group
# whose 'fileNameFilter' matches that of the current target input file,
# it will take precedence over the text scanners defined below.
# To learn more about text scanner options below, see:
# https://doc.log10x.com/compile/scanner/text

# Set the 10x pipeline to 'compile'
tenx: compile

# =============================== Text Options ================================

text:
  - parserName: text
    fileNameFilter: '^.*\.(csv|txt|properties|csv|tsv|ini|conf|sh|log|out)$'
    maxLines: 500
    lineOffset: 0
    allowDigits: false
    minLength: 2
    maxLength: 50
  - parserName: json
    fileNameFilter: '^.*\.(json)$'
    parserFactoryClass: com.fasterxml.jackson.core.JsonFactory
    scanFieldValues: true
  - parserName: yaml
    fileNameFilter: '^.*\.(yml|yaml)$'
    parserFactoryClass: com.log10x.eng.scanner.text.TextYamlFactory
    scanFieldValues: true
  - parserName: xml
    fileNameFilter: '^.*\.(xml|xsd)$'
    parserFactoryClass: com.fasterxml.jackson.dataformat.xml.XmlFactory
    scanFieldValues: true
```
Options
Specify the options below to configure multiple Text file scanners:
| Name | Description | Category |
|---|---|---|
| textParserName | Parser logical name | General |
| textFileNameFilter | Pattern to match for target input file name | General |
| textParserFactoryClass | Parser factory class | Parser |
| textParserFactoryArgs | Arguments for 'textParserFactoryClass' constructor | Parser |
| textScanFieldValues | Controls whether to capture Jackson parser field values | Parser |
| textMaxLines | Max number of lines to scan | Text |
| textLineOffset | Line number from which to start scan | Text |
| textAllowDigits | Controls whether to capture tokens containing numeric characters as symbol tokens | Text |
| textMinLength | Min character length for a token to be considered a symbol value | Text |
| textMaxLength | Max character length for a token to be considered a symbol value | Text |
General
textParserName
Parser logical name.
| Type | Required | Category |
|---|---|---|
| String | ✔ | General |
Defines a logical unique name for this parser (e.g., 'logs').
textFileNameFilter
Pattern to match for target input file name.
| Type | Required | Category |
|---|---|---|
| String | ✔ | General |
Defines a regex pattern that a file name must match for this scanner to apply to it.
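The sketch below routes file names through the default filters from text/jackson.yaml. It assumes entries are tried in list order and the first match wins; this page only states that a matching entry from the 'textScanners' group takes precedence over these defaults. `select_parser` is a hypothetical helper.

```python
import re
from typing import Optional

# (parserName, fileNameFilter) pairs copied from the default text/jackson.yaml.
DEFAULT_FILTERS = [
    ("text", r"^.*\.(csv|txt|properties|csv|tsv|ini|conf|sh|log|out)$"),
    ("json", r"^.*\.(json)$"),
    ("yaml", r"^.*\.(yml|yaml)$"),
    ("xml", r"^.*\.(xml|xsd)$"),
]


def select_parser(file_name: str, filters=DEFAULT_FILTERS) -> Optional[str]:
    """Return the parserName of the first entry whose regex matches (assumed order)."""
    for parser_name, pattern in filters:
        if re.match(pattern, file_name):
            return parser_name
    return None  # no scanner applies to this file
```

With these defaults, `app.log` routes to `text`, `config.yaml` to `yaml`, and `image.png` matches no entry.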
Parser
textParserFactoryClass
Parser factory class.
| Type | Default | Category |
|---|---|---|
| String | "" | Parser |
Optionally provides the fully qualified name of a class derived from JsonFactory.
If specified, the scanner instantiates the factory using a parameterless constructor and invokes its createParser method to generate a parser instance. The scanner uses the parser to read token values from the file.
If 'textParserFactoryArgs' is specified, this class must define a constructor that accepts a String[] to receive the optional configuration parameters.
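The scanner itself runs on the JVM, so the following is only a Python analogue of the constructor contract described above, not the scanner's code. With no arguments, the class is built with a parameterless constructor. With arguments, they are passed as one list, mirroring a single String[] parameter. `instantiate_factory` is a hypothetical name.

```python
import importlib
from typing import Optional, Sequence


def instantiate_factory(class_name: str, args: Optional[Sequence[str]] = None):
    """Hypothetical analogue: load a class by fully qualified name and construct it.

    No args (or an empty list, the documented default) -> parameterless constructor.
    Args given -> a constructor receiving them as ONE list, like a Java String[].
    """
    module_name, _, simple_name = class_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), simple_name)
    if args:
        return cls(list(args))
    return cls()
```

In Java, the equivalent lookup is usually written with reflection, for example `Class.forName(name).getConstructor(String[].class)`.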
textParserFactoryArgs
Arguments for 'textParserFactoryClass' constructor.
| Type | Default | Category |
|---|---|---|
| List | [] | Parser |
Specifies arguments to pass to the parser factory instance constructor. This option only applies if 'textParserFactoryClass' is set.
textScanFieldValues
Controls whether to capture Jackson parser field values.
| Type | Default | Category |
|---|---|---|
| Boolean | false | Parser |
Controls whether a Jackson parser's VALUE_STRING tokens (field values) are scanned for symbol values in addition to FIELD_NAME tokens (field names). When disabled, only FIELD_NAME tokens are scanned.
This option only applies if 'textParserFactoryClass' is set.
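To show the difference this flag makes, here is a sketch that uses Python's `json` module as a stand-in for a Jackson token stream. Dict keys play the role of FIELD_NAME tokens and string values play VALUE_STRING. The real scanner reads Jackson tokens, so treat this only as an analogy. `collect_symbols` is a hypothetical helper.

```python
import json


def collect_symbols(document: str, scan_field_values: bool) -> set[str]:
    """Collect field names, and optionally string field values, from JSON text."""
    found: set[str] = set()

    def walk(node) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                found.add(key)  # FIELD_NAME analogue: always captured
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str) and scan_field_values:
            found.add(node)  # VALUE_STRING analogue: only when enabled
        # numbers, booleans and null are never captured

    walk(json.loads(document))
    return found
```

For `{"level": "INFO", "ctx": {"thread": "main", "retries": 3}}`, disabling the flag yields only the keys. Enabling it also adds `INFO` and `main`.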
Text
textMaxLines
Max number of lines to scan.
| Type | Default | Category |
|---|---|---|
| Number | 0 | Text |
Controls the maximum number of lines to scan for symbol values from the input file. This option is useful when scanning existing log files as 'templates' for parsing future logs from a similar input stream.
textLineOffset
Line number from which to start scan.
| Type | Default | Category |
|---|---|---|
| Number | 0 | Text |
Specifies the line number from which to start scanning for symbols. This option is useful when scanning a specific portion of a text file.
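A minimal sketch of how lineOffset and maxLines could combine to select a window of lines. It assumes the offset is 0-based and that a maxLines of 0 (the documented default) means "no limit". This page confirms neither assumption, and `line_window` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator


def line_window(lines: Iterable[str], line_offset: int = 0, max_lines: int = 0) -> Iterator[str]:
    """Yield lines starting at line_offset, at most max_lines of them.

    Assumptions: line_offset is 0-based; max_lines <= 0 means no limit.
    """
    stop = None if max_lines <= 0 else line_offset + max_lines
    return islice(lines, line_offset, stop)
```

With the default configuration above (`lineOffset: 0`, `maxLines: 500`), only the first 500 lines of a sample log would be considered.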
textAllowDigits
Controls whether to capture tokens containing numeric characters as symbol tokens.
| Type | Default | Category |
|---|---|---|
| Boolean | false | Text |
Controls whether tokens containing numeric characters (e.g., 0-9) are accepted as symbol tokens. Because alphanumeric combinations tend to have high cardinality (e.g., GUID, trace_id), adding them to symbol units is generally not advised unless they are known to be constant or low-cardinality values.
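The digit rule can be sketched as a simple token filter. It checks only the ASCII digits 0-9 named above. `filter_digits` is a hypothetical helper, not part of the scanner's API.

```python
DIGITS = set("0123456789")


def filter_digits(tokens, allow_digits: bool = False) -> list[str]:
    """Drop tokens containing any ASCII digit unless allow_digits is True."""
    if allow_digits:
        return list(tokens)
    return [tok for tok in tokens if not DIGITS.intersection(tok)]
```

With the default `allowDigits: false`, tokens such as `8080` or `worker2` are discarded, while `INFO` and `trace_id` survive.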
textMinLength
Min character length for a token to be considered a symbol value.
| Type | Default | Category |
|---|---|---|
| Number | 0 | Text |
Sets the minimum character length a token must have to constitute a symbol value. Very short tokens (e.g., len < 3) have a high probability of being dynamic values with high cardinality and, as such, should not be captured as symbol values.
textMaxLength
Max character length for a token to be considered a symbol value.
| Type | Default | Category |
|---|---|---|
| Number | 0 | Text |
Sets the maximum character length a token may have to constitute a symbol value. Very long tokens (e.g., len > 100) have a high probability of being dynamic values with high cardinality and, as such, should not be captured as symbol values.
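A minimal sketch of the length bounds, assuming both bounds are inclusive and that 0 (the documented default) disables a bound. `within_length` is a hypothetical helper.

```python
def within_length(token: str, min_length: int = 0, max_length: int = 0) -> bool:
    """Apply minLength/maxLength bounds; a bound of 0 is treated as disabled (assumption)."""
    if min_length > 0 and len(token) < min_length:
        return False
    if max_length > 0 and len(token) > max_length:
        return False
    return True
```

Under the default text configuration (`minLength: 2`, `maxLength: 50`), `ok` is kept, while a single-character token or a 51-character identifier is rejected.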
This module is defined in text/module.yaml.