
ANTLR Language

Defines language configurations for scanning symbols using ANTLR grammars.

Specify a grammar or parser to produce an AST for symbol extraction via configurable rules.

Options

Specify the options below to configure one or more ANTLR languages:

Name                   Description                               Category
antlrLang              Programming language name                 General
antlrFileExt           Language file extensions                  General
antlrRootRule          Syntax root rule                          General
antlrGrammarFile       ANTLR .g4 grammar file                    G4 grammar
antlrLexerFile         ANTLR .g4 lexer file                      G4 grammar
antlrParserClass       Java parser class name                    Compiled grammar
antlrLexerClass        Java lexer class name                     Compiled grammar
antlrTokenStreamClass  Java token stream class name              Compiled grammar
antlrCharStreamClass   ANTLR CharStream factory class name       Compiled grammar
antlrReserved          Language-specific keywords                Filter
antlrLineFilters       Regex patterns for skipping input lines   Filter
antlrMaxLineLength     Max line length to parse                  Filter

General

antlrLang

Programming language name.

Type Required Category
String Yes General

Defines the logical name of the programming language processed by this scanner (e.g., 'cpp', 'go').

antlrFileExt

Language file extensions.

Type Required Category
List Yes General

Defines the file extensions of source files to be scanned using this ANTLR language configuration (e.g., '.cpp', '.js').
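The following is a minimal sketch of how a file could be matched against antlrFileExt. The class FileExtMatcher and its method are hypothetical, and the sketch assumes a plain, case-sensitive suffix match on the file name; the actual matching rules of the scanner are not specified here.

```java
import java.util.List;

// Hypothetical helper, not part of the documented module: decides whether a
// file belongs to this ANTLR language based on its antlrFileExt list.
// Assumes a plain, case-sensitive suffix match on the file name.
final class FileExtMatcher {
    private FileExtMatcher() {}

    static boolean isScanned(String fileName, List<String> antlrFileExt) {
        // The file qualifies if its name ends with any configured extension.
        return antlrFileExt.stream().anyMatch(fileName::endsWith);
    }

    public static void main(String[] args) {
        List<String> exts = List.of(".cpp", ".h");
        System.out.println(isScanned("src/widget.cpp", exts)); // true
        System.out.println(isScanned("src/widget.js", exts));  // false
    }
}
```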

antlrRootRule

Syntax root rule.

Type Required Category
String Yes General

Provides the root (start) grammar rule of the ANTLR-generated parser class, used to obtain and iterate over the root of the AST. To learn more, see ANTLR start rules.
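ANTLR-generated parsers expose one public, no-argument method per parser rule, named after the rule, which parses that rule and returns its parse tree. The sketch below assumes the rule name in antlrRootRule is resolved to such a method by reflection; HypotheticalCppParser is a stand-in for a generated parser such as CPP14Parser, so the sketch runs with the JDK alone.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Stand-in for an ANTLR-generated parser (hypothetical). A real generated
// parser would return a ParserRuleContext subclass instead of a String.
class HypotheticalCppParser {
    public String translationUnit() {
        return "tree:translationUnit";
    }
}

// Hypothetical sketch: invokes the parser method named by antlrRootRule.
final class RootRuleInvoker {
    private RootRuleInvoker() {}

    static Object invokeRootRule(Object parser, String rootRule) {
        try {
            // Rule methods are public and take no arguments.
            Method rule = parser.getClass().getMethod(rootRule);
            return rule.invoke(parser);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("Unknown root rule: " + rootRule, e);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Failed to invoke root rule: " + rootRule, e);
        }
    }

    public static void main(String[] args) {
        Object tree = invokeRootRule(new HypotheticalCppParser(), "translationUnit");
        System.out.println(tree); // tree:translationUnit
    }
}
```

An unknown rule name fails fast with an IllegalArgumentException, which surfaces configuration mistakes before any source file is parsed.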

G4 grammar

antlrGrammarFile

ANTLR .g4 grammar file.

Type Default Category
String "" G4 grammar

Provides an ANTLR grammar file that is used directly to parse source files for the language. This value takes precedence over an 'antlrParserClass' value. For grammars that do not have language-specific extensions, this option enables parsing source files without first compiling the grammar into Java classes. If this value is not specified, 'antlrParserClass' must be set.

antlrLexerFile

ANTLR .g4 lexer file.

Type Default Category
String "" G4 grammar

Provides an optional ANTLR lexer file to use with 'antlrGrammarFile'.

Compiled grammar

antlrParserClass

Java parser class name.

Type Default Category
String "" Compiled grammar

Provides the fully qualified Java class name of the ANTLR-generated parser class used to parse the token stream produced from the input source. This option applies when the ANTLR grammar requires additional Java code to parse target sources, such as custom parser, lexer, or token stream classes.

To learn more, see https://ocw.mit.edu/ans7870/6/6.005/s16/classes/18-parser-generators/ and https://www.baeldung.com/java-antlr#1-prepare-a-grammar-file. For an example implementation, see com.log10x.antlr.generated.cpp.CPP14Parser.

If this value is not specified, 'antlrGrammarFile' must be set.
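The precedence between antlrGrammarFile and antlrParserClass can be sketched as follows. GrammarSourceResolver, Mode, and GrammarSource are hypothetical names, and the sketch assumes an empty string means "not specified", matching the "" default of both options.

```java
// Hypothetical sketch of the documented precedence rule between
// antlrGrammarFile and antlrParserClass. An empty string counts as
// "not specified", matching the "" default of both options.
final class GrammarSourceResolver {
    private GrammarSourceResolver() {}

    enum Mode { G4_GRAMMAR_FILE, COMPILED_PARSER_CLASS }

    record GrammarSource(Mode mode, String value) {}

    static GrammarSource resolve(String antlrGrammarFile, String antlrParserClass) {
        if (antlrGrammarFile != null && !antlrGrammarFile.isEmpty()) {
            // antlrGrammarFile takes precedence, even if a parser class is also set.
            return new GrammarSource(Mode.G4_GRAMMAR_FILE, antlrGrammarFile);
        }
        if (antlrParserClass != null && !antlrParserClass.isEmpty()) {
            return new GrammarSource(Mode.COMPILED_PARSER_CLASS, antlrParserClass);
        }
        // Neither option is set: the configuration is invalid.
        throw new IllegalArgumentException(
            "Either antlrGrammarFile or antlrParserClass must be set");
    }

    public static void main(String[] args) {
        System.out.println(resolve("MyLang.g4", "com.example.MyLangParser").mode()); // G4_GRAMMAR_FILE
        System.out.println(resolve("", "com.example.MyLangParser").mode());          // COMPILED_PARSER_CLASS
    }
}
```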

antlrLexerClass

Java lexer class name.

Type Default Category
String "" Compiled grammar

Provides the fully qualified Java class name of the ANTLR-generated lexer class used to read input symbols from a character stream. For an example implementation, see com.log10x.antlr.generated.cpp.CPP14Lexer.
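Class-name options such as antlrLexerClass and antlrParserClass are typically resolved with Class.forName. The following sketch assumes that approach; ClassOptionLoader is a hypothetical name. With the ANTLR runtime on the classpath, the expected type would be org.antlr.v4.runtime.Lexer (or Parser); to stay runnable with the JDK alone, the example uses CharSequence and java.lang.StringBuilder as stand-ins.

```java
// Hypothetical sketch: resolves a class-name option and checks it against an
// expected supertype, turning both failure modes into clear config errors.
final class ClassOptionLoader {
    private ClassOptionLoader() {}

    static <T> Class<? extends T> load(String fqcn, Class<T> expected) {
        try {
            // asSubclass throws ClassCastException if fqcn is not a subtype of expected.
            return Class.forName(fqcn).asSubclass(expected);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not found: " + fqcn, e);
        } catch (ClassCastException e) {
            throw new IllegalArgumentException(fqcn + " is not a " + expected.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for, e.g., "com.log10x.antlr.generated.cpp.CPP14Lexer" and Lexer.class.
        System.out.println(load("java.lang.StringBuilder", CharSequence.class).getName());
    }
}
```

ANTLR-generated lexers declare a constructor that takes a CharStream, and generated parsers one that takes a TokenStream, so a loaded class could then be instantiated through getConstructor(CharStream.class) or getConstructor(TokenStream.class).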

antlrTokenStreamClass

Java token stream class name.

Type Default Category
String "" Compiled grammar

Provides the optional fully qualified Java class name of a custom token stream that bridges the lexer and parser. This option supports ANTLR grammars that require a custom token stream.

The class must be a subclass of the ANTLR CommonTokenStream class.

The token stream class must define a constructor that receives an ANTLR TokenSource reference and a millisecond timeout:

public MyTokenStream(TokenSource tokenSource, long timeout) {...}   

If more than 'timeout' milliseconds elapse between the stream's instantiation and a subsequent call to its 'seek' or 'consume' method, the stream should throw an exception signaling the timeout, which halts scanning of the current source file.
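The timeout rule can be isolated into a small helper. The sketch below is JDK-only and uses hypothetical names (TimeoutGuard, ScanTimeoutException). In an actual token stream, the class would extend CommonTokenStream, create the guard in its (TokenSource, long) constructor, and call check() at the start of its consume() and seek(int) overrides before delegating to super. The clock is injectable so the behavior can be tested deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the documented timeout semantics: the clock starts at
// construction, and check() throws once more than timeoutMillis have elapsed.
final class TimeoutGuard {
    static final class ScanTimeoutException extends RuntimeException {
        ScanTimeoutException(String message) { super(message); }
    }

    private final long timeoutMillis;
    private final LongSupplier clockMillis;
    private final long startMillis;

    TimeoutGuard(long timeoutMillis, LongSupplier clockMillis) {
        this.timeoutMillis = timeoutMillis;
        this.clockMillis = clockMillis;
        this.startMillis = clockMillis.getAsLong(); // the moment of instantiation
    }

    TimeoutGuard(long timeoutMillis) {
        // Monotonic clock, so wall-clock adjustments cannot trigger false timeouts.
        this(timeoutMillis, () -> System.nanoTime() / 1_000_000L);
    }

    /** Intended to be called from consume() and seek(int) overrides. */
    void check() {
        long elapsed = clockMillis.getAsLong() - startMillis;
        if (elapsed > timeoutMillis) {
            throw new ScanTimeoutException(
                "Token stream timed out after " + elapsed + " ms (limit " + timeoutMillis + " ms)");
        }
    }

    public static void main(String[] args) {
        TimeoutGuard guard = new TimeoutGuard(60_000L);
        guard.check(); // well within the limit, so no exception
        System.out.println("within timeout");
    }
}
```

Throwing an unchecked exception matches the signatures of consume() and seek(int), which declare no checked exceptions.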

antlrCharStreamClass

ANTLR CharStream factory class name.

Type Default Category
String CharStreams Compiled grammar

Provides a fully qualified Java class name that acts as a factory for CharStream instances by declaring the following method:

public static CharStream fromStream(InputStream is) {...}

If not specified, defaults to CharStreams.fromStream.

This option provides a mechanism for preprocessing input code before it is parsed by the target ANTLR grammar's lexer and parser.
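As an illustration, the following sketch shows one possible preprocessing step: removing backslash-newline pairs, as the C/C++ line-splicing phase does. SplicingCharStreamFactory is a hypothetical name, and the sketch stays JDK-only by returning a String. An actual factory would declare the documented fromStream(InputStream) method and finish with CharStreams.fromString(preprocessed). IOException is wrapped in UncheckedIOException because the documented signature declares no checked exceptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical preprocessing step for a custom CharStream factory.
// A real factory would return CharStreams.fromString(preprocess(is)).
final class SplicingCharStreamFactory {
    private SplicingCharStreamFactory() {}

    static String preprocess(InputStream is) {
        try {
            String src = new String(is.readAllBytes(), StandardCharsets.UTF_8);
            // Join backslash-continued lines; handle CRLF before LF.
            return src.replace("\\\r\n", "").replace("\\\n", "");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
            "#define SUM 1 + \\\n 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(preprocess(in));
    }
}
```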

Filter

antlrReserved

Language-specific keywords.

Type Default Category
List [] Filter

Provides a list of reserved words in the target language (e.g., 'class', 'int') to skip over when scanning for symbol values.
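A minimal sketch of how antlrReserved could be applied to candidate symbol values follows. ReservedWordFilter is a hypothetical name, and the sketch assumes exact, case-sensitive comparison of token text against the reserved list.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: drops tokens whose text is a reserved word, keeping
// only candidate symbol values. Assumes exact, case-sensitive comparison.
final class ReservedWordFilter {
    private final Set<String> reserved;

    ReservedWordFilter(List<String> antlrReserved) {
        // A Set gives constant-time lookups per token.
        this.reserved = Set.copyOf(antlrReserved);
    }

    List<String> symbolValues(List<String> tokens) {
        return tokens.stream()
            .filter(token -> !reserved.contains(token))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ReservedWordFilter filter = new ReservedWordFilter(List.of("class", "int"));
        System.out.println(filter.symbolValues(List.of("class", "Widget", "int", "count"))); // [Widget, count]
    }
}
```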

antlrLineFilters

Regex patterns for skipping input lines.

Type Default Category
List [] Filter

Specifies a list of regex patterns for filtering input code lines. This option can be used to filter out comments as well as language constructs unsupported by the target grammar. A line that matches any pattern in the list is skipped.
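The sketch below shows one way antlrLineFilters could be evaluated. LineFilter is a hypothetical name, and it assumes a pattern "matches" a line when it is found anywhere in it (Matcher.find()), not only when it spans the whole line (Matcher.matches()); anchor a pattern with ^ to restrict it to the start of a line. The example patterns are illustrative.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: compiles the configured patterns once and reports
// whether a given input line should be skipped.
final class LineFilter {
    private final List<Pattern> patterns;

    LineFilter(List<String> antlrLineFilters) {
        this.patterns = antlrLineFilters.stream()
            .map(Pattern::compile)
            .collect(Collectors.toList());
    }

    /** True if any pattern is found anywhere in the line (assumption: find(), not matches()). */
    boolean isSkipped(String line) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative patterns: C++ line comments and #pragma directives.
        LineFilter filter = new LineFilter(List.of("^\\s*//", "^\\s*#pragma"));
        System.out.println(filter.isSkipped("  // a comment"));         // true
        System.out.println(filter.isSkipped("int x = 1; // trailing")); // false
    }
}
```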

antlrMaxLineLength

Max line length to parse.

Type Default Category
Number 0 Filter

Specifies the maximum number of characters an input line may contain for it to be parsed. This option provides a way to skip minified code (e.g., in .js files), which can slow down the ANTLR parser.
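A possible form of the length check is sketched below. MaxLineLengthFilter is a hypothetical name. The sketch assumes that the default value of 0 disables the limit, since a literal limit of zero characters would skip every non-empty line, and that length is counted in Java chars (UTF-16 code units).

```java
// Hypothetical sketch of the antlrMaxLineLength check. Assumes the default of 0
// disables the limit, and that length is counted in Java chars (UTF-16 units).
final class MaxLineLengthFilter {
    private MaxLineLengthFilter() {}

    static boolean isTooLong(String line, int antlrMaxLineLength) {
        return antlrMaxLineLength > 0 && line.length() > antlrMaxLineLength;
    }

    public static void main(String[] args) {
        String minified = "var a=1;".repeat(1000); // 8000 chars on one line
        System.out.println(isTooLong(minified, 2000)); // true
        System.out.println(isTooLong(minified, 0));    // false: limit disabled
    }
}
```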


This module is defined in langs/module.yaml.