Input Scanner
Scan input source code/binary files to produce symbol files from a variety of programming languages, text and binary formats using configurable scanner modules.
Scanner modules parse the contents of a target input file to generate an AST from which to capture symbol values and the context in which they appear (e.g., class, function, printout).
The run pipeline utilizes symbol files to transform input log/trace events into typed TenXObjects.
Once scanner modules complete, the compile pipeline links symbol files into a single symbol library file for use at run time.
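As a rough illustration of the scanning step described above, the sketch below uses Python's standard `ast` module to collect string constants together with the class/function context in which they appear. It is not the CPython AST Scanner itself: the function name `scan_symbols` and the `(value, context)` output shape are assumptions made for this example.

```python
import ast


def scan_symbols(source):
    """Return (symbol_value, context) pairs for string constants in Python source.

    Hypothetical helper: the real scanner's output format is not shown in this document.
    """
    tree = ast.parse(source)
    found = []

    def visit(node, context):
        # Entering a class or function extends the context path.
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            context = context + [node.name]
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            found.append((node.value, ".".join(context) or "<module>"))
        for child in ast.iter_child_nodes(node):
            visit(child, context)

    visit(tree, [])
    return found


SAMPLE = '''
class Conn:
    def open(self, host):
        log.error("could not connect to %s", host)
'''

print(scan_symbols(SAMPLE))
```

Each captured value keeps the dotted path of its enclosing class and function, which mirrors the idea of recording the context in which a symbol appears.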
Output
Scanners emit information captured from each input into a respective symbol unit (.json) file. The path of each output symbol unit within outputSymbolFolder matches its folder structure within inputPaths.
For example, when launching a compile pipeline with inputPaths set to ~/dev and outputSymbolFolder set to ~/symbols, for a foo.js file located in:
~/dev/app/foo.js
The respective output symbol unit file is:
~/symbols/app/foo.js.10x.json
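The path mapping above can be sketched in a few lines. This is a minimal illustration of the mirroring rule only; the function name `symbol_unit_path` is hypothetical, and the `.10x.json` suffix is taken from the example above.

```python
from pathlib import PurePosixPath

SYMBOL_UNIT_SUFFIX = ".10x.json"  # suffix taken from the example above


def symbol_unit_path(input_root, output_symbol_folder, input_file):
    """Mirror input_file's location under input_root into output_symbol_folder."""
    # relative_to raises ValueError if input_file is not under input_root.
    relative = PurePosixPath(input_file).relative_to(PurePosixPath(input_root))
    target = PurePosixPath(output_symbol_folder) / relative
    return str(target.with_name(target.name + SYMBOL_UNIT_SUFFIX))


print(symbol_unit_path("~/dev", "~/symbols", "~/dev/app/foo.js"))
# prints: ~/symbols/app/foo.js.10x.json
```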
Parallel Processing
The compile pipeline supports running multiple instances of the compiler app in parallel by using shared file locks to safely reuse existing symbol unit files residing in the output path. The underlying storage system must support advisory locks for parallel execution to be enabled.
A separate scheduled task can periodically link symbol unit files into a single symbol library artifact used by edge/cloud apps at runtime.
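The lock-based reuse described in this section can be sketched with POSIX advisory locks. This is only an illustration of the general technique, not the compiler's implementation: the helper names are hypothetical, and the sketch assumes a Unix-like system where Python's `fcntl.flock` is available.

```python
import fcntl
import json


def read_symbol_unit(path):
    """Read an existing symbol unit under a shared (read) advisory lock."""
    with open(path, "r", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_SH)  # many readers may hold this at once
        try:
            return json.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def write_symbol_unit(path, unit):
    """Write a symbol unit under an exclusive advisory lock."""
    # "a+" creates the file if needed without truncating before the lock is held.
    with open(path, "a+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until readers and writers release
        try:
            f.truncate(0)
            json.dump(unit, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Advisory locks only coordinate processes that also take the lock, which is why the storage system must support them for parallel runs to be safe.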
Modules
- ANTLR Scanner: Extract symbols from any programming language using the ANTLR framework.
- JavaParser Scanner: Scan Java source code for symbol values using the JavaParser library.
- CPython AST Scanner: Scan Python source code for symbol values.
- JVM Bytecode Scanner: Scan JVM .class files for symbol values.
- Scalameta Scanner: Scan Scala source code for symbol values using the Scalameta library.
- Executable Scanners: Launch a target process from which to read symbol values via its stdout.
- Text File Scanners: Scan text/binary files for symbol values using the Jackson library.
- Regex Pattern Scanners: Extract symbol values from files using regular expressions.
- Archive Scanner: Scan compressed archives for enclosed files containing symbol values.
- Symbol Unit Scanner: Discover existing symbol unit files to reuse.
- Log Method/stream Definitions: Define logger method and output stream names.
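To give a feel for the regex-based approach listed above, here is a minimal sketch that pulls double-quoted string literals out of a text file's contents. The pattern and the function name `scan_with_regex` are assumptions for illustration; the actual Regex Pattern Scanners are configured separately and may work differently.

```python
import re

# Matches a double-quoted literal, allowing backslash-escaped characters inside.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def scan_with_regex(text, pattern=STRING_LITERAL):
    """Return the first capture group of every match as a candidate symbol value."""
    return [match.group(1) for match in pattern.finditer(text)]


print(scan_with_regex('log.info("user logged in"); retry("backing off")'))
```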
Configuration
To configure the Input Scanner unit, edit these settings:
Scan
Configure the Input Scanner to capture symbol values from source code/binary files and link to an output symbol library.
Below is the default configuration from: scanners/config.yaml.
{
  "type" : "object",
  "properties" : {
    "tenx" : {
      "type" : "string"
    },
    "inputPaths" : {
      "type" : [
        "array"
      ],
      "markdownDescription" : "Input folders to scan\n\nProvides a list of folders that are traversed in search of source code (e.g. .js, .cpp), binary (e.g., .so, .jar, .zip) and text (e.g., .json, .xml) files to capture symbol values from and output to symbol unit files. Input folders may contain files and folders pulled from [GitHub](https://doc.log10x.com/compile/pull/github/) or [Artifactory](https://doc.log10x.com/compile/artifactory).",
      "items" : {
        "type" : "string"
      }
    },
    "outputSymbolFolder" : {
      "type" : [
        "string",
        "null"
      ],
      "markdownDescription" : "Location of output symbol folder\n\nSpecifies the root folder in which to write output symbol unit files. This compiler [scan phase](https://doc.log10x.com/compile/scan/) mirrors the structure of the [inputPaths](https://doc.log10x.com/compile/scan/#inputpaths) argument, placing each output symbol unit in the same relative path under [outputSymbolFolder](https://doc.log10x.com/compile/scan/#outputsymbolfolder) as its source/binary input file under its respective `inputPaths` folder entry.  For example, if [inputPaths](https://doc.log10x.com/compile/scan/#inputpaths) is `~/dev` and it contains the file `~/dev/app/foo.js`, and this value is `~/symbols`, the respective output symbol unit file is: `~/symbols/app/foo.js.10x.json`."
    },
    "scan" : {
      "type" : "object",
      "additionalProperties" : false,
      "properties" : {
        "force" : {
          "type" : [
            "boolean",
            "string"
          ],
          "markdownDescription" : "Control whether to scan input files even if a matching existing symbol file exists\n\nControls whether to reuse matching symbol files within [inputPaths](https://doc.log10x.com/compile/scan/#inputpaths). This option is primarily useful when debugging a new scanner module configuration (e.g., ANTLR grammar)  in conjunction with the [scanner debugging](https://doc.log10x.com/compile/scan) options. (Accepts boolean or string with $= prefix for runtime evaluation) (Default: false)",
          "default" : false
        },
        "inProcess" : {
          "type" : [
            "boolean",
            "string"
          ],
          "markdownDescription" : "Controls whether to scan source/binary input files using sub-process(es)\n\nThe scanner can launch [subprocesses](https://doc.log10x.com/compile/scan/#subprocess_1) to capture [symbol](https://doc.log10x.com/run/transform/structure/#symbols) information from specific target source/binary input files.   Generating input file [ASTs](https://en.wikipedia.org/wiki/Abstract_syntax_tree) to scan for symbols values for complex syntaxes can be a lengthy, resource-consuming operation, which in the case of  [ANTLR syntaxes](https://tomassetti.me/improving-the-performance-of-an-antlr-parser/) may timeout or run out of memory.     Process [isolation](https://en.wikipedia.org/wiki/Process_isolation) allows for terminating scan operations that timeout for specific files, while allowing others to proceed. (Accepts boolean or string with $= prefix for runtime evaluation) (Default: false)",
          "default" : false
        },
        "threadPoolSize" : {
          "type" : [
            "string",
            "null"
          ],
          "markdownDescription" : "Maximum number of threads used to concurrently process input source code/binary files\n\nControls the maximum number of threads to process input source code/binary files concurrently: - If the value is between 0 and 1, the value is treated as a fraction of available cores (e.g., 0.5 = use up to 50% of available cores).   - If the value is >= 1, the value sets a fixed number of threads (e.g., 10 = 10 threads).   - If the value is 1, allocate a single thread to process input files. (Default: 0.5)",
          "default" : "0.5"
        },
        "unitTimeout" : {
          "type" : [
            "string",
            "null"
          ],
          "markdownDescription" : "Timeout interval for scanning a source code /binary input file\n\nSets the timeout interval when scanning a source code /binary input file before dropping it. Set to null to ignore. (Default: 20s)",
          "default" : "20s"
        },
        "operationTimeout" : {
          "type" : [
            "string",
            "null"
          ],
          "markdownDescription" : "Timeout interval for the entire scanning operation\n\nSets the timeout interval for the entire scanning operation before terminating it. Set to null to ignore. (Default: 10m)",
          "default" : "10m"
        },
        "debug" : {
          "type" : "object",
          "additionalProperties" : false,
          "properties" : {
            "origins" : {
              "type" : [
                "array",
                "null"
              ],
              "markdownDescription" : "List of input file names for which to log information\n\nSpecifies a list of input file names (e.g., my.scala, foo.java) to log information relating to which symbols were captured/skipped. Specify '*' for all files.",
              "items" : {
                "type" : "string"
              }
            },
            "symbols" : {
              "type" : [
                "array",
                "null"
              ],
              "markdownDescription" : "List of node types within a scanner AST tree for which to log information\n\nList of nodes within an input AST tree, such as class and method names(e.g., 'MyClass', 'foo') for which to log information relating to the context in which they were captured/skipped as symbols.  Specify '*' for all nodes.",
              "items" : {
                "type" : "string"
              }
            },
            "loggerName" : {
              "type" : [
                "string",
                "null"
              ],
              "markdownDescription" : "Debug logger name\n\nSpecifies the log4j logger for logging values for symbols/files matching 'debugOrigins' and 'debugSymbols'. (Default: [consoleOut](https://github.com/log-10x/config/blob/main/log4j2.yaml#L66))",
              "default" : "[consoleOut](https://github.com/log-10x/config/blob/main/log4j2.yaml#L66)"
            }
          }
        }
      }
    },
    "printOutput" : {
      "type" : [
        "boolean",
        "string"
      ],
      "markdownDescription" : "Controls whether to emit output scan operations progress to the console\n\nControls whether information relating to symbol units scanning is emitted to the stdout stream. This value should be set to true when the current 10x process is  spawned by a parent 10x 'compile' pipeline to report its progress. (Accepts boolean or string with $= prefix for runtime evaluation) (Default: false)",
      "default" : false
    },
    "unitsToScanPerBatch" : {
      "type" : [
        "number",
        "string"
      ],
      "markdownDescription" : "Maximum number of input files to scan per sub-process.\n\nSets the maximum number of source/binary input files to scan by a single 'compile' sub-process. Set to 0 for unlimited. (Accepts number or string with $= prefix for runtime evaluation) (Default: 40)",
      "default" : 40
    },
    "tokenDelims" : {
      "type" : [
        "string",
        "null"
      ],
      "markdownDescription" : "Token delimiters\n\nDefines which characters break a string of event characters into tokens to classify as symbols or variables. (Default: <>|. ][:-+/*=\\\\,{}_();'\\\"$\\t\\n@)",
      "default" : "<>|. ][:-+/*=\\\\,{}_();'\\\"$\\t\\n@"
    },
    "fileExtStringFormats" : {
      "type" : [
        "array",
        "null"
      ],
      "markdownDescription" : "String format specifier characters\n\nControls which characters identify the start of a [format specifier](https://en.wikipedia.org/wiki/Printf#Format_specifier) for each programming language extension in the form of <ext>:<formatPrefix> (e.g., `.java:%`).  It is a standard programming practice to use string formats to log messages in the form of:  ``` java log.error(\"could not connect to {} with status {}\", host, status); ```  When an TenXTemplate tokenizer processing an event identifies a format prefix (e.g., `%`) it skips it and appends subsequent symbol tokens (e.g., `with status`) to the symbol sequence preceding them (e.g., `could not connect to`).",
      "items" : {
        "type" : "string"
      }
    },
    "maxSymbol" : {
      "type" : "object",
      "additionalProperties" : false,
      "properties" : {
        "unitSectionSize" : {
          "type" : [
            "number",
            "string"
          ],
          "markdownDescription" : "Maximum number of symbol tokens to store in a single symbol unit file\n\nControls the maximum number of symbol tokens to store in a symbol unit before splitting it. This option supports large source code /binary input files that may have thousands of text symbols to avoid  creating sizeable individual symbol unit files.         Set to null to ignore. (Accepts number or string with $= prefix for runtime evaluation) (Default: 500)",
          "default" : 500
        },
        "unitsPerToken" : {
          "type" : [
            "number",
            "string"
          ],
          "markdownDescription" : "Number of symbol units to retrieve when searching for the origin of a TenXTemplate symbol sequence\n\nControls the maximum number of symbol units to load from the pipeline's symbol library when searching for the origin of a specific [TenXTemplate](https://doc.log10x.com/run/template/) symbol.   The [symbolSequence](https://doc.log10x.com/api/js/#TenXObject+symbolSequence) function queries the source code/binary origin of symbol sequences in a target TenXTemplate to identify the logical [message](https://doc.log10x.com/run/initialize/message/) portion of target app/infra events vs. variable and context information (e.g., user, host, severity).   Since a [symbol](https://doc.log10x.com/run/transform/structure/#symbols) value may appear in hundreds or more locations across a code base,  limiting the number of symbol units to load is necessary to reduce memory consumption.   A second reason to place an upper limit on the number of units to load from the symbol library is that the more frequent a symbol is, the lower the probability of selecting the correct origin. (Accepts number or string with $= prefix for runtime evaluation) (Default: 128)",
          "default" : 128
        }
      }
    }
  },
  "required" : [
    "inputPaths"
  ],
  "additionalProperties" : false
}
# 🔟❎ 'compile' symbol scanner config
# Configure input and output options for the compile pipeline.
# To learn more see https://doc.log10x.com/compile/scan
tenx: compile
# ========================== Configure I/O Paths ==============================
# 'inputPaths' lists folders on disk containing source code and binary files to scan.
# To learn more see https://doc.log10x.com/compile/scan/#inputpaths
inputPaths:
- $=path("data/compile/sources", "") # Default to none ("") if source folder not found
# - <path-to-input-files> # uncomment to specify additional input paths
# 'outputSymbolFolder' specifies the folder for output symbol unit files.
# To learn more see https://doc.log10x.com/compile/scan/#outputsymbolfolder
outputSymbolFolder: $=TenXEnv.get("TENX_OUTPUT_SYMBOL_FOLDER", path("data/shared/symbols", "<tenx.io.tmpdir>")) # Default to 10x temp dir
Advanced
Below is the default configuration from: advanced/config.yaml.
# 🔟❎ 'compile' symbol scanner advanced config
# Configure input and output options for the compile pipeline.
# To learn more see https://doc.log10x.com/compile/scan/
# Set the 10x pipeline to 'compile'
tenx: compile
inputPaths: []
# ============================ Advanced options ===============================
# ----------------------------- Tokenizer options -----------------------------
# Tokenization options provide control over how the symbol scanners parse source and config file input.
# To learn more https://doc.log10x.com/compile/scan/#tokenize
# 'tokenDelims' defines which characters break a string of event characters into tokens to
# classify as symbols or variables.
tokenDelims: "<>|. ][:-+/*=\\,{}_();'\"$\t\n@"
# 'fileExtStringFormats' controls which characters identify the start of a format specifier (https://en.wikipedia.org/wiki/Printf#Format_specifier).
fileExtStringFormats:
- .java:%
- .py:{
- .js:${
- .php:$
- .swift:\(
- .kt:$
- .rb:#{
- .ex:#{
- .c:%
- .cpp:%
- .cs:{
- .go:%
- .rs:{
- .ts:${
- .pl:$
- .sh:$
- .r:%
- .m:%
- .scala:$
maxSymbol:
  # 'unitsPerToken' controls the maximum number of symbol units to load from the
  # pipeline's symbol library when searching for the origin of a specific TenXTemplate symbol.
  unitsPerToken: 128
# ----------------------------- Subprocess options ----------------------------
# Subprocess options control whether scanners execute in the current 10x Engine instance
# or via a sub-process to provide better isolation over parsing of complex source code ASTs.
# To learn more https://doc.log10x.com/compile/scan/#subprocess_1
# 'unitsToScanPerBatch' sets the maximum number of input files to scan per subprocess instance.
# When this threshold is exceeded the current subprocess is terminated and a new one is spawned until the scan is complete.
unitsToScanPerBatch: 200
scan:
  # ----------------------------- Threading options ----------------------------
  # 'threadPoolSize' controls the maximum number of threads to concurrently
  # process input source/binary files. If the value is between 0 and 1, the value is treated
  # as a fraction of available cores (e.g., 0.5 = use up to 50% of available cores).
  # If the value is >= 1, the value sets a fixed number of threads (e.g., 10 = 10 threads).
  # If the value is 1, allocate a single thread to process input files.
  # To learn more see https://doc.log10x.com/compile/scan/#parallel
  threadPoolSize: "0.5"
  # 'scanUnitTimeout' sets the timeout interval when scanning a source/binary input
  # file before dropping it.
  unitTimeout: 30s
  # 'scanOperationTimeout' sets the timeout interval for the entire scanning operation
  # before terminating it.
  operationTimeout: 10m
  # -------------------------------- Debug Option -------------------------------
  # The settings below provide control over what information is logged
  # to console/file when scanning input source code/text/binary files for symbol values.
  # These options are primarily used when adding support for a new programming language
  # syntax via the 'ANTLR' scanner, text file formats via the Jackson 'text' scanner
  # or an external command using the 'executable' scanner.
  debug:
    # 'origins' lists names of input source/binary files
    # (e.g. .class, .js, .yaml, .exe) for which debug information
    # is printed to the console via the 'consoleOut' log4j logger.
    # Printed output is dependent on the scanner assigned to process the input file
    # and usually includes the AST of the input file from which tokens are selected.
    # This information can be used when debugging which symbols are extracted
    # from an input file in the context of an 'antlr', 'text' or 'executable' scanner.
    # Set to '*' to match all input files.
    origins: [
      # '*'
    ]
    # 'symbols' lists the values of nodes within the AST of a
    # target input file (e.g. MyClass, MyEnum, myFunc, ..) for which
    # debug information is printed to the console via the 'consoleOut' logger.
    # This includes source context (e.g. class, enum) and path within the AST
    # in which the node was detected. Set to '*' to match all AST nodes.
    symbols: [
      # '*'
    ]
  # 'inProcess' controls whether to scan source/binary input files from within the current
  # 10x process or launch a sequential series of subprocesses for better isolation.
  # inProcess: true
  # 'force' enables scanning of source/binary input files even if a matching
  # symbol unit was found in 'inputPaths'. This flag is useful when debugging
  # an ANTLR/text/exec scanner using 'debugOrigins' and 'debugSymbols'
  # to ensure a scan takes place even if a matching symbol unit is found.
  # force: false
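The threadPoolSize rules described in the configuration comments above can be sketched as follows. The source only states the fraction versus fixed-count distinction; the rounding behavior (floor, with a minimum of one thread), the rejection of non-positive values, and the function name `resolve_thread_pool_size` are assumptions made for this example.

```python
import math
import os


def resolve_thread_pool_size(value, available_cores=None):
    """Turn a threadPoolSize setting (e.g. "0.5" or "10") into a thread count."""
    cores = available_cores if available_cores is not None else (os.cpu_count() or 1)
    size = float(value)
    if size <= 0:
        raise ValueError("threadPoolSize must be positive")  # assumed validation
    if size < 1:
        # Fraction of available cores; never go below one thread (assumed).
        return max(1, math.floor(cores * size))
    # Values >= 1 are a fixed number of threads.
    return int(size)


print(resolve_thread_pool_size("0.5", available_cores=8))  # prints: 4
print(resolve_thread_pool_size("10", available_cores=8))   # prints: 10
```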
Options
Specify the options below to configure the Input Scanner:
| Name | Description | Category |
|---|---|---|
| inputPaths | Input folders to scan | Input |
| outputSymbolFolder | Location of output symbol folder | Output |
| scanForce | Control whether to scan input files even if a matching existing symbol file exists | Output |
| maxSymbolUnitSectionSize | Maximum number of symbol tokens to store in a single symbol unit file | Output |
| scanInProcess | Controls whether to scan source/binary input files using sub-process(es) | Subprocess |
| printOutput | Controls whether to emit output scan operations progress to the console | Subprocess |
| unitsToScanPerBatch | Maximum number of input files to scan per sub-process. | Subprocess |
| scanThreadPoolSize | Maximum number of threads used to concurrently process input source code/binary files | Parallel |
| scanUnitTimeout | Timeout interval for scanning a source code /binary input file | Parallel |
| scanOperationTimeout | Timeout interval for the entire scanning operation | Parallel |
| tokenDelims | Token delimiters | Tokenize |
| fileExtStringFormats | String format specifier characters | Tokenize |
| maxSymbolUnitsPerToken | Number of symbol units to retrieve when searching for the origin of a TenXTemplate symbol sequence | Tokenize |
| scanDebugOrigins | List of input file names for which to log information | Debug |
| scanDebugSymbols | List of node types within a scanner AST tree for which to log information | Debug |
| scanDebugLoggerName | Debug logger name | Debug |
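As an illustration of the tokenDelims option listed in the table above, the sketch below splits a string into tokens on a set of delimiter characters. The delimiter string is copied from the default advanced configuration; the function name `tokenize` and the choice to drop empty tokens are assumptions, and the classification of tokens as symbols or variables is not shown.

```python
# Default delimiters, copied from the advanced configuration.
DEFAULT_TOKEN_DELIMS = "<>|. ][:-+/*=\\,{}_();'\"$\t\n@"


def tokenize(text, delims=DEFAULT_TOKEN_DELIMS):
    """Split text into tokens at any delimiter character, dropping empty tokens."""
    delim_set = set(delims)
    tokens, current = [], []
    for ch in text:
        if ch in delim_set:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("could not connect to db-01:5432"))
# prints: ['could', 'not', 'connect', 'to', 'db', '01', '5432']
```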
Input
inputPaths
Input folders to scan.
| Type | Required | Category |
|---|---|---|
| List | ✔ | Input |
Provides a list of folders that are traversed in search of source code (e.g. .js, .cpp), binary (e.g., .so, .jar, .zip) and text (e.g., .json, .xml) files to capture symbol values from and output to symbol unit files. Input folders may contain files and folders pulled from GitHub or Artifactory.
Output
outputSymbolFolder
Location of output symbol folder.
| Type | Default | Category |
|---|---|---|
| File | | Output |
Specifies the root folder in which to write output symbol unit files.
This compiler scan phase mirrors the structure of the inputPaths argument, placing each
output symbol unit in the same relative path under outputSymbolFolder
as its source/binary input file under its respective inputPaths folder entry.
For example, if inputPaths is ~/dev and it contains the file ~/dev/app/foo.js,
and this value is ~/symbols, the respective output symbol unit file is: ~/symbols/app/foo.js.10x.json.
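The mirroring rule above can be sketched in a few lines of Python. This is only an illustration of the path arithmetic, not the scanner's implementation: the function name `symbol_unit_path` is hypothetical, and the `.10x.json` suffix is taken from the example above.

```python
from pathlib import PurePosixPath

# Suffix appended to the input file name, as seen in the documented example.
SYMBOL_UNIT_SUFFIX = ".10x.json"


def symbol_unit_path(input_root: str, input_file: str, output_root: str) -> PurePosixPath:
    """Mirror input_file's location relative to input_root under output_root."""
    # Path of the input file relative to its inputPaths entry, e.g. app/foo.js
    relative = PurePosixPath(input_file).relative_to(PurePosixPath(input_root))
    # Same relative folder under the output root, with the symbol unit suffix added
    return PurePosixPath(output_root) / relative.parent / (relative.name + SYMBOL_UNIT_SUFFIX)


print(symbol_unit_path("~/dev", "~/dev/app/foo.js", "~/symbols"))
```

The sketch treats paths purely as strings (no `~` expansion or file system access), which is enough to show how the relative folder structure carries over.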
scanForce
Controls whether to scan input files even if a matching symbol file already exists.
| Type | Default | Category |
|---|---|---|
| Boolean | false | Output |
Controls whether to rescan input files within inputPaths even when a matching symbol file already exists, instead of reusing that file. This option is primarily useful when debugging a new scanner module configuration (e.g., an ANTLR grammar) in conjunction with the scanner debugging options.
maxSymbolUnitSectionSize
Maximum number of symbol tokens to store in a single symbol unit file.
| Type | Default | Category |
|---|---|---|
| Number | 500 | Output |
Controls the maximum number of symbol tokens to store in a symbol unit before splitting it.
This option supports large source code/binary input files that may contain thousands of text symbols,
avoiding the creation of oversized individual symbol unit files.
Set to null to disable this limit.
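As a rough illustration of the splitting behavior, the sketch below chunks a list of symbol tokens into sections of at most 500 entries. The function name `split_into_sections` and the list-of-lists representation are assumptions made for this example; the actual layout of a symbol unit file is not described here.

```python
from typing import List, Optional


def split_into_sections(tokens: List[str], max_section_size: Optional[int] = 500) -> List[List[str]]:
    """Split tokens into consecutive sections of at most max_section_size items.

    A max_section_size of None mirrors the 'set to null' case: no splitting.
    """
    if max_section_size is None:
        return [list(tokens)]
    if max_section_size <= 0:
        raise ValueError("max_section_size must be positive or None")
    return [
        tokens[start:start + max_section_size]
        for start in range(0, len(tokens), max_section_size)
    ]


# 1200 hypothetical symbol tokens split with the default limit of 500
sections = split_into_sections([f"symbol{i}" for i in range(1200)])
print([len(section) for section in sections])
```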
Subprocess
scanInProcess
Controls whether to scan source/binary input files using sub-process(es).
| Type | Default | Category |
|---|---|---|
| Boolean | false | Subprocess |
The scanner can launch subprocesses to capture symbol information from specific target source/binary input files.
Generating input file ASTs to scan for symbol values can be a lengthy, resource-consuming operation for complex syntaxes, which in the case of ANTLR syntaxes may time out or run out of memory.
Process isolation allows for terminating scan operations that time out for specific files, while allowing others to proceed.
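The benefit of process isolation can be illustrated with Python's standard `subprocess` module. This is a generic sketch of the technique, not the 10x launcher: the helper `scan_in_subprocess` is hypothetical, and the two commands simply stand in for a scan that finishes quickly and one that hangs.

```python
import subprocess
import sys
from typing import List, Optional


def scan_in_subprocess(command: List[str], timeout_seconds: float) -> Optional[str]:
    """Run command in a child process; return its stdout, or None if it timed out."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a stuck scan
        # cannot block the files that are processed after it.
        return None
    return completed.stdout


# Stand-ins for a fast scan and a scan that never finishes in time
fast = [sys.executable, "-c", "print('symbols for foo.js')"]
slow = [sys.executable, "-c", "import time; time.sleep(60)"]

results = [scan_in_subprocess(command, timeout_seconds=5.0) for command in (fast, slow)]
print(results)
```

Only the slow command is dropped; the fast one still returns its output, which is the property the option description relies on.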
printOutput
Controls whether to emit scan operation progress to the console.
| Type | Default | Category |
|---|---|---|
| Boolean | false | Subprocess |
Controls whether information relating to symbol unit scanning is emitted to the stdout stream. Set this value to true when the current 10x process is spawned by a parent 10x 'compile' pipeline, so that it can report its progress.
unitsToScanPerBatch
Maximum number of input files to scan per sub-process.
| Type | Default | Category |
|---|---|---|
| Number | 40 | Subprocess |
Sets the maximum number of source/binary input files to scan by a single 'compile' sub-process.
Set to 0 for no limit.
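One way to picture this setting is as a batching step over the list of input files. The sketch below is an assumption about how such batching could work, with a hypothetical `batch_units` helper; it treats 0 as "no limit", matching the description above.

```python
from typing import List


def batch_units(input_files: List[str], units_per_batch: int = 40) -> List[List[str]]:
    """Group input files into batches, one batch per sub-process."""
    if units_per_batch < 0:
        raise ValueError("units_per_batch must be >= 0")
    if units_per_batch == 0:
        # 0 means unlimited: a single batch holds every input file
        return [list(input_files)] if input_files else []
    return [
        input_files[start:start + units_per_batch]
        for start in range(0, len(input_files), units_per_batch)
    ]


files = [f"src/file{i}.java" for i in range(100)]  # hypothetical input files
print([len(batch) for batch in batch_units(files)])
print([len(batch) for batch in batch_units(files, 0)])
```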
Parallel
scanThreadPoolSize
Maximum number of threads used to concurrently process input source code/binary files.
| Type | Default | Category |
|---|---|---|
| String | 0.5 | Parallel |
Controls the maximum number of threads to process input source code/binary files concurrently:
- If the value is greater than 0 and less than 1, it is treated as a fraction of the available cores (e.g., 0.5 = use up to 50% of available cores).
- If the value is >= 1, it sets a fixed number of threads (e.g., 10 = 10 threads).
- A value of 1 therefore allocates a single thread to process input files.
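The rules above can be expressed as a small resolution function. This is a sketch under assumptions: the name `resolve_pool_size` is hypothetical, and rounding a fractional result down (with a minimum of one thread) is a guess, since the rounding behavior is not specified here.

```python
import math
import os
from typing import Optional


def resolve_pool_size(value: str, available_cores: Optional[int] = None) -> int:
    """Translate a scanThreadPoolSize-style value into a thread count."""
    cores = available_cores if available_cores is not None else (os.cpu_count() or 1)
    number = float(value)
    if number <= 0:
        raise ValueError("thread pool size must be greater than 0")
    if number < 1:
        # Fraction of available cores; rounding down is an assumption,
        # and at least one thread is always allocated.
        return max(1, math.floor(cores * number))
    # Values >= 1 are a fixed thread count, so "1" means a single thread
    return int(number)


print(resolve_pool_size("0.5", available_cores=8))
print(resolve_pool_size("10", available_cores=8))
print(resolve_pool_size("1", available_cores=8))
```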
scanUnitTimeout
Timeout interval for scanning a source code/binary input file.
| Type | Default | Category |
|---|---|---|
| String | 20s | Parallel |
Sets the timeout interval for scanning a single source code/binary input file before dropping it. Set to null to ignore.
scanOperationTimeout
Timeout interval for the entire scanning operation.
| Type | Default | Category |
|---|---|---|
| String | 10m | Parallel |
Sets the timeout interval for the entire scanning operation before terminating it.
Set to null to ignore.
Tokenize
tokenDelims
Token delimiters.
| Type | Default | Category |
|---|---|---|
| String | \<>\|. ][:-+/*=\\,{}_();'\"$\t\n@ | Tokenize |
Defines which characters break a string of event characters into tokens to classify as symbols or variables.
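To make the effect of delimiters concrete, the sketch below splits a sample event on a small, illustrative subset of the default delimiter set. The `tokenize` function and the sample event are hypothetical, and whether the real tokenizer also keeps the delimiters themselves as tokens is not covered here; this version simply discards them.

```python
from typing import List


def tokenize(text: str, delims: str) -> List[str]:
    """Split text into tokens at any character found in delims."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in delims:
            # A delimiter closes the token collected so far (if any)
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# Illustrative subset of delimiters, not the full default value
SAMPLE_DELIMS = " .:=()"

print(tokenize("could not connect to host=10.0.0.1 (status:503)", SAMPLE_DELIMS))
```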
fileExtStringFormats
String format specifier characters.
| Type | Default | Category |
|---|---|---|
| List | [] | Tokenize |
Controls which characters identify the start of a format specifier
for each programming language file extension (in the form of .java:%).
It is standard programming practice to log messages using format strings, such as: could not connect to %s with status %d.
When a TenXTemplate tokenizer processing an event identifies a format prefix (e.g., %),
it skips it and appends subsequent symbol tokens (e.g., with status)
to the symbol sequence preceding them (e.g., could not connect to).
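The skip-and-append behavior can be sketched as follows. Everything here is illustrative: the helper names, the whitespace-based splitting, and the sample format string (including its `%s` and `%d` specifiers) are assumptions, and only the `.java:%` entry form comes from the description above.

```python
from typing import Dict, List


def parse_format_prefixes(entries: List[str]) -> Dict[str, str]:
    """Turn entries such as '.java:%' into {'.java': '%'}."""
    prefixes: Dict[str, str] = {}
    for entry in entries:
        extension, _, chars = entry.partition(":")
        prefixes[extension] = chars
    return prefixes


def symbol_sequence(message: str, prefix_chars: str) -> str:
    """Drop words that start with a format prefix and join the remaining symbols."""
    kept = [word for word in message.split() if word[0] not in prefix_chars]
    return " ".join(kept)


prefixes = parse_format_prefixes([".java:%"])
# Hypothetical format string as it might appear in a .java source file
print(symbol_sequence("could not connect to %s with status %d", prefixes[".java"]))
```

In this sketch the tokens after each specifier ("with status") end up in the same sequence as the tokens before it ("could not connect to"), which is the effect described above.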
maxSymbolUnitsPerToken
Number of symbol units to retrieve when searching for the origin of a TenXTemplate symbol sequence.
| Type | Default | Category |
|---|---|---|
| Number | 128 | Tokenize |
Controls the maximum number of symbol units to load from the pipeline's symbol library when searching for the origin of a specific TenXTemplate symbol.
The symbolSequence function queries the source code/binary origin of symbol sequences in a target TenXTemplate to distinguish the logical message portion of a target app/infra event from variable and context information (e.g., user, host, severity).
Since a symbol value may appear in hundreds or more locations across a code base, limiting the number of symbol units to load is necessary to reduce memory consumption.
A second reason for placing an upper limit on the number of units to load from the symbol library is that the more frequently a symbol appears, the lower the probability of selecting the correct origin.
Debug
scanDebugOrigins
List of input file names for which to log information.
| Type | Default | Category |
|---|---|---|
| List | [] | Debug |
Specifies a list of input file names (e.g., my.scala, foo.java) to log information relating to which symbols were captured/skipped.
Specify '*' for all files.
scanDebugSymbols
List of node types within a scanner AST for which to log information.
| Type | Default | Category |
|---|---|---|
| List | [] | Debug |
List of nodes within an input AST, such as class and method names (e.g., 'MyClass', 'foo'), for which to log information relating to the context in which they were captured/skipped as symbols.
Specify '*' for all nodes.
scanDebugLoggerName
Debug logger name.
| Type | Default | Category |
|---|---|---|
| String | consoleOut | Debug |
Specifies the log4j logger used to log values for symbols/files matching scanDebugOrigins and scanDebugSymbols.
This unit is defined in scan/unit.yaml.