
Code Intelligence

The process function goes beyond raw syntax trees. It runs tree-sitter queries against the parsed AST to extract structured information useful for code analysis, search, documentation, and LLM ingestion.


ProcessConfig

All intelligence extraction is opt-in via ProcessConfig. Enable what you need:

Python:

from tree_sitter_language_pack import ProcessConfig

config = ProcessConfig(
    language="python",
    structure=True,    # functions, classes, methods
    imports=True,      # import statements
    exports=True,      # exported symbols
    comments=True,     # inline comments
    docstrings=True,   # docstring extraction
    symbols=True,      # all identifiers
    diagnostics=True,  # syntax errors / error nodes
    # chunk_max_size=1000  # uncomment to enable chunking
)
TypeScript:

import { process } from "@kreuzberg/tree-sitter-language-pack";

const result = await process(source, {
  language: "typescript",
  structure: true,
  imports: true,
  exports: true,
  comments: true,
  docstrings: true,
  symbols: true,
  diagnostics: true,
});
Rust:

use tree_sitter_language_pack::{process, ProcessConfig};

let config = ProcessConfig::new("rust").all();

let result = process(source, &config)?;

Use .all() in Rust or ProcessConfig.all("python") in Python to enable everything at once.


ProcessResult Fields

structure - Functions, Classes, and Methods

A list of named code constructs (functions, classes, methods, and similar) with their names, kinds, and line ranges, plus their docstrings when docstrings=True.

for item in result["structure"]:
    print(item["kind"])       # "function" | "class" | "method" | "interface" | ...
    print(item["name"])       # "greet"
    print(item["start_line"]) # 3
    print(item["end_line"])   # 6
    print(item["docstring"])  # "Greet a user by name."  (if docstrings=True)

Supported kinds vary by language:

| Kind | Languages |
| --- | --- |
| function | All languages |
| class | Python, JS/TS, Java, C#, Ruby, PHP, Kotlin, … |
| method | Same as class |
| interface | TypeScript, Java, C#, Go, Kotlin, … |
| struct | Rust, Go, C, C++, C#, … |
| impl | Rust |
| module | Elixir, Ruby, Rust, … |
| enum | Rust, Java, C#, TypeScript, Kotlin, … |
| trait | Rust |
| type_alias | TypeScript, Rust |
| decorator | Python, TypeScript |
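
The structure list is flat, but the line ranges let you recover nesting. As an illustration (a standalone sketch over hand-built sample items in the shape shown above, not the library's own output), an outline can be rendered by indenting items whose range falls inside another item's range:

```python
# Illustrative sample items shaped like ProcessResult["structure"].
structure = [
    {"kind": "class", "name": "FileCache", "start_line": 21, "end_line": 31},
    {"kind": "method", "name": "get", "start_line": 28, "end_line": 31},
    {"kind": "function", "name": "read_file", "start_line": 6, "end_line": 19},
]

def outline(items):
    """Render a simple outline, indenting items contained in another item's range."""
    lines = []
    for item in sorted(items, key=lambda i: i["start_line"]):
        # An item is nested if some other item's range strictly contains it.
        nested = any(
            other is not item
            and other["start_line"] <= item["start_line"]
            and item["end_line"] <= other["end_line"]
            for other in items
        )
        indent = "  " if nested else ""
        lines.append(f"{indent}{item['kind']} {item['name']}")
    return "\n".join(lines)

print(outline(structure))
# function read_file
# class FileCache
#   method get
```

The same containment check can be used to attribute methods to their enclosing class.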

imports - Import Statements

All import declarations with their source module and imported names.

for imp in result["imports"]:
    print(imp["source"])    # "os"  or  "pathlib"
    print(imp["names"])     # ["path", "getcwd"]  (empty = wildcard or bare import)
    print(imp["start_line"])

Example output as JSON:

[
  { "source": "os", "names": [], "start_line": 1 },
  { "source": "pathlib", "names": ["Path"], "start_line": 2 },
  { "source": "./utils", "names": ["readFile", "writeFile"], "start_line": 3 }
]
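
The source field makes it easy to separate package imports from relative ones. A minimal sketch over the sample data above:

```python
# Sample data in the shape shown above.
imports = [
    {"source": "os", "names": [], "start_line": 1},
    {"source": "pathlib", "names": ["Path"], "start_line": 2},
    {"source": "./utils", "names": ["readFile", "writeFile"], "start_line": 3},
]

# Relative imports start with "." in JS/TS-style module paths.
external = [i["source"] for i in imports if not i["source"].startswith(".")]
relative = [i["source"] for i in imports if i["source"].startswith(".")]
print(external)  # ['os', 'pathlib']
print(relative)  # ['./utils']
```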

exports - Exported Symbols

Symbols that are part of the module's public API.

for exp in result["exports"]:
    print(exp["name"])  # "readFile"
    print(exp["kind"])  # "function" | "class" | "const" | ...

!!! Note Export detection is language-specific. For Python, everything defined at module level counts as exported unless prefixed with _. For JavaScript/TypeScript, explicit export declarations determine what the module exposes.
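
To illustrate the Python rule described in the note (a standalone sketch, not the library's implementation):

```python
# Hypothetical module-level names from a parsed Python file.
module_level = ["read_file", "_helper", "FileCache", "__version__"]

# Names prefixed with "_" are treated as private; everything else is exported.
exported = [name for name in module_level if not name.startswith("_")]
print(exported)  # ['read_file', 'FileCache']
```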


comments - Inline Comments

All comments in the file with their text and location.

for comment in result["comments"]:
    print(comment["text"])       # "// TODO: handle edge case"
    print(comment["start_line"]) # 42
    print(comment["is_block"])   # False
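
One common use is pulling action items out of a file. A small sketch over hand-built sample comments:

```python
# Illustrative sample items shaped like ProcessResult["comments"].
comments = [
    {"text": "/* module overview */", "start_line": 1, "is_block": True},
    {"text": "// TODO: handle edge case", "start_line": 42, "is_block": False},
]

# Collect TODO/FIXME markers with their locations.
todos = [c for c in comments if "TODO" in c["text"] or "FIXME" in c["text"]]
for c in todos:
    print(f"line {c['start_line']}: {c['text']}")
```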

docstrings - Documentation Strings

Docstrings appear under their parent construct in structure. When docstrings=True, each structure item gains a docstring field:

func = result["structure"][0]
print(func["docstring"])
# "Read and return the contents of a file.\n\nArgs:\n    path: Path to the file."

Docstring extraction understands language-specific conventions:

| Language | Convention |
| --- | --- |
| Python | """...""" triple-quoted string immediately after def/class |
| Rust | /// or //! doc comments above item |
| JavaScript/TypeScript | /** ... */ JSDoc block above function |
| Java | /** ... */ Javadoc block above method/class |
| Ruby | # ... lines immediately above def/class |
| Go | // FuncName ... comment block above func |
| Elixir | @doc "..." or @moduledoc "..." |
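
Whatever the source convention, the extracted docstring is plain text, so the usual summary/body split applies. A sketch using the docstring text from the earlier example:

```python
docstring = (
    "Read and return the contents of a file.\n"
    "\n"
    "Args:\n"
    "    path: Path to the file."
)

# The first paragraph is the one-line summary; the rest is the body.
summary, _, body = docstring.partition("\n\n")
print(summary)  # Read and return the contents of a file.
```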

symbols - All Identifiers

A deduplicated list of all identifiers referenced in the file, useful for search indexing.

print(result["symbols"])
# ["os", "Path", "read_file", "FileManager", "base_dir", "get", ...]

diagnostics - Syntax Errors

Tree-sitter produces partial trees for malformed code, marking error nodes. diagnostics surfaces these:

for error in result["diagnostics"]:
    print(error["message"])    # "Unexpected token"
    print(error["start_line"])
    print(error["start_col"])

!!! Tip A non-empty diagnostics list does not mean the file is unparsable — tree-sitter recovers and continues. Use it to detect broken syntax rather than to gate parsing.
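
Following that advice, diagnostics can be reported as warnings while the partial results are still used. A minimal sketch:

```python
def report_diagnostics(diagnostics):
    """Print warnings for syntax errors; return True when the file parsed cleanly."""
    for err in diagnostics:
        print(f"warning: {err['message']} at {err['start_line']}:{err['start_col']}")
    return not diagnostics

clean = report_diagnostics(
    [{"message": "Unexpected token", "start_line": 7, "start_col": 12}]
)
print(clean)  # False
```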


chunks - Syntax-Aware Splits

When chunk_max_size > 0, the chunks field contains the file split into byte-budget segments. See Chunking for LLMs for full documentation.

for chunk in result["chunks"]:
    print(chunk["content"])      # the source code text
    print(chunk["start_byte"])   # start byte offset
    print(chunk["end_byte"])     # end byte offset
    print(chunk["start_line"])   # first line of chunk
    print(chunk["end_line"])     # last line of chunk
    print(chunk["node_types"])   # ["function_definition", "class_definition"]

metrics - File-Level Statistics

Basic metrics about the file:

m = result["metrics"]
print(m["total_lines"])       # 120
print(m["code_lines"])        # 95   (non-blank, non-comment lines)
print(m["comment_lines"])     # 18
print(m["blank_lines"])       # 7
print(m["max_depth"])         # maximum nesting depth of the syntax tree

Full Example

from tree_sitter_language_pack import process, ProcessConfig

source = '''
import os
from pathlib import Path
from typing import Optional

def read_file(path: str, encoding: str = "utf-8") -> Optional[str]:
    """Read and return the contents of a file.

    Args:
        path: Path to the file to read.
        encoding: File encoding. Defaults to utf-8.

    Returns:
        File contents, or None if the file doesn't exist.
    """
    p = Path(path)
    if not p.exists():
        return None
    return p.read_text(encoding=encoding)

class FileCache:
    """In-memory cache for file contents."""

    def __init__(self, root: str):
        self._root = root
        self._cache: dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        if name not in self._cache:
            self._cache[name] = read_file(os.path.join(self._root, name))
        return self._cache[name]
'''

config = ProcessConfig(
    language="python",
    structure=True,
    imports=True,
    docstrings=True,
    comments=True,
    diagnostics=True,
)
result = process(source, config)

# Structure
for item in result["structure"]:
    print(f"{item['kind']:12} {item['name']:20} lines {item['start_line']}-{item['end_line']}")

# Output:
# function     read_file            lines 6-19
# class        FileCache            lines 21-31
# method       __init__             lines 24-26
# method       get                  lines 28-31

# Imports
for imp in result["imports"]:
    if imp["names"]:
        print(f"from {imp['source']} import {', '.join(imp['names'])}")
    else:
        print(f"import {imp['source']}")

# Output:
# import os
# from pathlib import Path
# from typing import Optional

# Docstrings
func = result["structure"][0]
print(f"\n{func['name']} docstring:\n{func['docstring']}")

# Metrics
m = result["metrics"]
print(f"\nLines: {m['total_lines']} total, {m['code_lines']} code, {m['comment_lines']} comments")

Custom Queries

Custom query extraction helpers are not part of the v1.6.2 public API. In Rust, call get_parser() and use the tree-sitter query APIs directly when the built-in process() fields are not enough.