Configuration Reference
Configuration Reference¶
This page documents all configuration types and their defaults across all languages.
DataAttribute¶
An XML-style attribute attached to an Element node.
Populated only for DataNodeKind.Element; always empty for KeyValue and
Sequence nodes.
| Field | Type | Default | Description |
|---|---|---|---|
name |
str |
— | Attribute name (e.g. "class", "href"). |
value |
str |
— | Attribute value as a raw string (quotes stripped). |
span |
Span |
— | Source span covering the entire name="value" attribute token. |
DataNode¶
A node in the hierarchical data tree produced by data-format extraction.
When ProcessConfig.data_extraction is
True, ProcessResult.data is populated with a root DataNode whose
children mirror the structure of the parsed file.
The kind field determines which other fields are meaningful:
kind |
key |
value |
attributes |
children |
|---|---|---|---|---|
KeyValue |
key / mapping key / index | leaf value | empty | nested map |
Element |
XML tag name | text content | XML attrs | child elements |
Sequence |
positional index ("0") |
leaf value | empty | sub-items |
| Field | Type | Default | Description |
|---|---|---|---|
kind |
DataNodeKind |
DataNodeKind.KEY_VALUE |
Whether this node is a key/value pair, XML element, or sequence item. |
key |
str \| None |
None |
Key, attribute name, tag name, or positional index ("0", "1", …). None at the document root. |
value |
str \| None |
None |
Leaf scalar value, if any. None for containers (objects, arrays, XML elements with child elements). |
attributes |
list\[DataAttribute\] |
\[\] |
Attributes on element-shape nodes (XML STag attributes). Empty for all other kinds. |
children |
list\[DataNode\] |
\[\] |
Children for nested containers and XML element bodies. |
span |
Span |
— | Source span covering this node in the original source file. |
Span¶
Byte and line/column range in source code.
Represents both byte offsets (for slicing) and human-readable line/column positions (for display and diagnostics).
| Field | Type | Default | Description |
|---|---|---|---|
start_byte |
int |
— | Inclusive start byte offset in the source. |
end_byte |
int |
— | Exclusive end byte offset in the source. |
start_line |
int |
— | Zero-indexed line number of the span's start. |
start_column |
int |
— | Zero-indexed column number of the span's start. |
end_line |
int |
— | Zero-indexed line number of the span's end. |
end_column |
int |
— | Zero-indexed column number of the span's end. |
ProcessResult¶
Complete analysis result from processing a source file.
Contains metrics, structural analysis, imports/exports, comments,
docstrings, symbols, diagnostics, and optionally chunked code segments.
Fields are populated based on the ProcessConfig flags.
| Field | Type | Default | Description |
|---|---|---|---|
language |
str |
— | The language name used to parse the source file. |
metrics |
FileMetrics |
— | File-level metrics (line counts, byte size, error count). |
structure |
list\[StructureItem\] |
\[\] |
Top-level structural items (functions, classes, etc.). |
imports |
list\[ImportInfo\] |
\[\] |
Import statements extracted from the source. |
exports |
list\[ExportInfo\] |
\[\] |
Export statements extracted from the source. |
comments |
list\[CommentInfo\] |
\[\] |
Comments extracted from the source. |
docstrings |
list\[DocstringInfo\] |
\[\] |
Docstrings extracted from the source. |
symbols |
list\[SymbolInfo\] |
\[\] |
Symbol definitions (variables, types, functions) extracted from the source. |
diagnostics |
list\[Diagnostic\] |
\[\] |
Parse diagnostics (syntax errors, missing nodes) from tree-sitter. |
chunks |
list\[CodeChunk\] |
\[\] |
Syntax-aware code chunks produced when chunking is enabled. |
data |
DataNode \| None |
None |
Hierarchical data tree extracted when config.data_extraction is True. Populated for supported data-format languages (JSON, YAML, TOML, properties, HCL, INI, XML, CSV, and more). None when data_extraction is False (the default) or when the language is not a recognised data format. See DataNode for the shape of the returned tree. |
FileMetrics¶
Aggregate metrics for a source file.
| Field | Type | Default | Description |
|---|---|---|---|
total_lines |
int |
— | Total number of lines (including blank and comment lines). |
code_lines |
int |
— | Number of lines containing non-blank, non-comment source code. |
comment_lines |
int |
— | Number of lines that are entirely comments. |
blank_lines |
int |
— | Number of blank (whitespace-only) lines. |
total_bytes |
int |
— | Total byte length of the source file. |
node_count |
int |
— | Total number of nodes in the syntax tree. |
error_count |
int |
— | Number of error nodes in the syntax tree (parse errors). |
max_depth |
int |
— | Maximum nesting depth reached in the syntax tree. |
StructureItem¶
A structural item (function, class, struct, etc.) in source code.
| Field | Type | Default | Description |
|---|---|---|---|
kind |
StructureKind |
StructureKind.FUNCTION |
The kind of structural item. |
name |
str \| None |
None |
The declared name of the item, if present. |
visibility |
str \| None |
None |
Visibility modifier (e.g., "pub", "public", "private"). |
span |
Span |
— | Source span covering the entire item declaration. |
children |
list\[StructureItem\] |
\[\] |
Nested structural items (e.g., methods within a class). |
decorators |
list\[str\] |
\[\] |
Decorator or attribute names applied to the item. |
doc_comment |
str \| None |
None |
Documentation comment attached to the item, if any. |
signature |
str \| None |
None |
Full signature text of the item (e.g., function parameters and return type). |
body_span |
Span \| None |
None |
Source span covering only the body of the item, if distinct from the declaration. |
CommentInfo¶
A comment extracted from source code.
| Field | Type | Default | Description |
|---|---|---|---|
text |
str |
— | The raw text content of the comment. |
kind |
CommentKind |
CommentKind.LINE |
The kind of comment (line, block, or doc). |
span |
Span |
— | Source span covering the comment. |
associated_node |
str \| None |
None |
Name of the syntax node this comment is directly associated with. |
DocstringInfo¶
A docstring extracted from source code.
| Field | Type | Default | Description |
|---|---|---|---|
text |
str |
— | The raw text of the docstring. |
format |
DocstringFormat |
DocstringFormat.PYTHON_TRIPLE_QUOTE |
The docstring format (Python, JSDoc, Rustdoc, etc.). |
span |
Span |
— | Source span covering the docstring. |
associated_item |
str \| None |
None |
Name of the item this docstring documents. |
parsed_sections |
list\[DocSection\] |
\[\] |
Parsed sections of the docstring (Args, Returns, Raises, etc.). |
DocSection¶
A section within a docstring (e.g., Args, Returns, Raises).
| Field | Type | Default | Description |
|---|---|---|---|
kind |
str |
— | Section kind (e.g., "args", "returns", "raises"). |
name |
str \| None |
None |
Parameter or return value name, if applicable. |
description |
str |
— | Description text for this section. |
ImportInfo¶
An import statement extracted from source code.
| Field | Type | Default | Description |
|---|---|---|---|
source |
str |
— | The module or path being imported from. |
items |
list\[str\] |
\[\] |
Specific names imported from the source module. |
alias |
str \| None |
None |
Alias assigned to the import (e.g., import numpy as np). |
is_wildcard |
bool |
— | Whether this is a wildcard import (e.g., import * or use foo.*). |
span |
Span |
— | Source span covering the import statement. |
ExportInfo¶
An export statement extracted from source code.
| Field | Type | Default | Description |
|---|---|---|---|
name |
str |
— | The exported name. |
kind |
ExportKind |
ExportKind.NAMED |
The kind of export (named, default, or re-export). |
span |
Span |
— | Source span covering the export statement. |
SymbolInfo¶
A symbol (variable, function, type, etc.) extracted from source code.
| Field | Type | Default | Description |
|---|---|---|---|
name |
str |
— | The name of the symbol. |
kind |
SymbolKind |
SymbolKind.VARIABLE |
The kind of symbol (variable, function, class, etc.). |
span |
Span |
— | Source span covering the symbol definition. |
type_annotation |
str \| None |
None |
Explicit type annotation, if present in the source. |
doc |
str \| None |
None |
Documentation comment associated with this symbol. |
Diagnostic¶
A diagnostic (syntax error, missing node, etc.) from parsing.
| Field | Type | Default | Description |
|---|---|---|---|
message |
str |
— | Human-readable description of the diagnostic. |
severity |
DiagnosticSeverity |
DiagnosticSeverity.ERROR |
Severity of the diagnostic. |
span |
Span |
— | Source span where the diagnostic was detected. |
CodeChunk¶
A chunk of source code with rich metadata.
| Field | Type | Default | Description |
|---|---|---|---|
content |
str |
— | The raw source text of this chunk. |
start_byte |
int |
— | Inclusive start byte offset of this chunk in the original source. |
end_byte |
int |
— | Exclusive end byte offset of this chunk in the original source. |
start_line |
int |
— | Zero-indexed start line of this chunk. |
end_line |
int |
— | Zero-indexed end line of this chunk. |
metadata |
ChunkContext |
— | Contextual metadata about this chunk. |
ChunkContext¶
Metadata for a single chunk of source code.
| Field | Type | Default | Description |
|---|---|---|---|
language |
str |
— | Language name used to parse this chunk. |
chunk_index |
int |
— | Zero-indexed position of this chunk within the file's chunk list. |
total_chunks |
int |
— | Total number of chunks the file was split into. |
node_types |
list\[str\] |
\[\] |
Tree-sitter node kinds that appear at the top level of this chunk. |
context_path |
list\[str\] |
\[\] |
Hierarchical path of enclosing structural items (e.g., \["MyClass", "my_method"\]). |
symbols_defined |
list\[str\] |
\[\] |
Names of symbols defined within this chunk. |
comments |
list\[CommentInfo\] |
\[\] |
Comments contained within this chunk. |
docstrings |
list\[DocstringInfo\] |
\[\] |
Docstrings contained within this chunk. |
has_error_nodes |
bool |
— | Whether this chunk contains any tree-sitter error nodes. |
PackConfig¶
Configuration for the tree-sitter language pack.
Controls cache directory and which languages to pre-download. Can be loaded from a TOML file, constructed programmatically, or passed as a dict/object from language bindings.
| Field | Type | Default | Description |
|---|---|---|---|
cache_dir |
str \| None |
None |
Override default cache directory. Default: ~/.cache/tree-sitter-language-pack/v{version}/libs/ |
languages |
list\[str\] \| None |
\[\] |
Languages to pre-download on init. Each entry is a language name (e.g. "python", "rust"). |
groups |
list\[str\] \| None |
\[\] |
Language groups to pre-download (e.g. "web", "systems", "scripting"). |
ProcessConfig¶
Configuration for the process() function.
Controls which analysis features are enabled and whether chunking is performed.
| Field | Type | Default | Description |
|---|---|---|---|
language |
str |
— | Language name (required). |
structure |
bool |
True |
Extract structural items (functions, classes, etc.). Default: true. |
imports |
bool |
True |
Extract import statements. Default: true. |
exports |
bool |
True |
Extract export statements. Default: true. |
comments |
bool |
False |
Extract comments. Default: false. |
docstrings |
bool |
False |
Extract docstrings. Default: false. |
symbols |
bool |
False |
Extract symbol definitions. Default: false. |
diagnostics |
bool |
False |
Include parse diagnostics. Default: false. |
chunk_max_size |
int \| None |
None |
Maximum chunk size in bytes. None disables chunking. |
data_extraction |
bool |
False |
Extract hierarchical key/value data tree from data-format files. Default: false. When True, ProcessResult.data is populated with a DataNode tree for supported languages: JSON, YAML, TOML, .properties, HCL/HOCON, INI, editorconfig, KDL, CUE, CSV, PSV, PO, nginx config, Caddy config, XML, and DTD. For languages outside this set the field is left as None. |
Enums¶
CommentKind¶
The kind of a comment found in source code.
Distinguishes between single-line comments, block (multi-line) comments, and documentation comments.
| Variant | Description |
|---|---|
Line |
A single-line comment (e.g., // ... or # ...). |
Block |
A block or multi-line comment using slash-star delimiters. |
Doc |
A documentation comment such as /// ... or slash-double-star block. |
DataNodeKind¶
The kind of a data node extracted from a data-format file.
Classifies each node in the hierarchical DataNode tree returned when
data_extraction is enabled on ProcessConfig.
Wire format (public JSON contract)¶
Unit variants serialize as a bare string ("KeyValue"). DO NOT add
#[serde(tag = "...")] or rename variants — every language binding has a
hand-written deserializer matching this exact shape, and any change breaks
all bindings' process() tests simultaneously.
Covered by tests/wire_format.rs.
| Variant | Description |
|---|---|
KeyValue |
A key/value pair or mapping (json/toml/properties/yaml/hcl/cue/kdl pair, or a wrapper "object"/"mapping" container). |
Element |
An XML element with a tag name in key and attributes in attributes. |
Sequence |
A positional sequence item (JSON array element, YAML block sequence item, CSV/PSV row or cell). |
DiagnosticSeverity¶
Severity level of a diagnostic produced during parsing.
Used to classify parse errors, warnings, and informational messages found in the syntax tree.
| Variant | Description |
|---|---|
Error |
A parse error (e.g., an ERROR or MISSING node in the tree). |
Warning |
A warning-level diagnostic. |
Info |
An informational diagnostic. |
DocstringFormat¶
The format of a docstring extracted from source code.
Identifies the docstring convention used, which varies by language
(e.g., Python triple-quoted strings, JSDoc, Rustdoc /// comments).
Wire format (public JSON contract)¶
Unit variants serialize as a bare string ("JSDoc"); the Other
variant serializes as a single-keyed object ({"Other": "rst"}). DO
NOT add #[serde(tag = "...")]. Covered by tests/wire_format.rs.
| Variant | Description |
|---|---|
PythonTripleQuote |
Python triple-quoted string docstring ("""..."""). |
JSDoc |
JavaScript/TypeScript JSDoc block comment (opens with two stars, closes with star-slash). |
Rustdoc |
Rust /// or //! doc comment. |
GoDoc |
Go doc comment (a comment block immediately preceding a declaration). |
JavaDoc |
Java Javadoc block comment (opens with two stars, closes with star-slash). |
Other |
A language-specific docstring format not covered by the standard variants. — Fields: _0: String |
ExportKind¶
The kind of an export statement found in source code.
Covers named exports, default exports, and re-exports from other modules.
| Variant | Description |
|---|---|
Named |
A named export (e.g., export { foo }). |
Default |
A default export (e.g., export default foo). |
ReExport |
A re-export from another module (e.g., export { foo } from 'bar'). |
StructureKind¶
The kind of structural item found in source code.
Categorizes top-level and nested declarations such as functions, classes,
structs, enums, traits, and more. Use Other for
language-specific constructs that do not fit a standard category.
Wire format (public JSON contract)¶
Unit variants serialize as a bare string ("Function"); the Other
variant serializes as a single-keyed object ({"Other": "macro"}). DO
NOT add #[serde(tag = "...")] or rename variants — every language
binding has a hand-written deserializer matching this exact shape, and
any change breaks all bindings' process() tests simultaneously.
Covered by tests/wire_format.rs.
| Variant | Description |
|---|---|
Function |
A free-standing or associated function. |
Method |
A method defined inside a class, struct, trait, or impl block. |
Class |
A class definition. |
Struct |
A struct definition. |
Interface |
An interface or protocol definition. |
Enum |
An enum definition. |
Module |
A module or package declaration. |
Trait |
A trait definition. |
Impl |
An impl block (Rust) or similar implementation block. |
Namespace |
A namespace declaration. |
Other |
A language-specific construct that does not fit any standard category. — Fields: _0: String |
SymbolKind¶
The kind of a symbol definition found in source code.
Categorizes symbol definitions such as variables, constants, functions, classes, types, interfaces, enums, and modules.
Wire format (public JSON contract)¶
Unit variants serialize as a bare string ("Function"); the Other
variant serializes as a single-keyed object ({"Other": "macro"}). DO
NOT add #[serde(tag = "...")]. Covered by tests/wire_format.rs.
| Variant | Description |
|---|---|
Variable |
A variable binding. |
Constant |
A constant (immutable binding). |
Function |
A function definition. |
Class |
A class definition. |
Type |
A type alias or typedef. |
Interface |
An interface definition. |
Enum |
An enum definition. |
Module |
A module declaration. |
Other |
A symbol kind not covered by the standard variants. — Fields: _0: String |