Code Intelligence
The process function goes beyond raw syntax trees. It parses source, then the Rust core walks the AST to extract structured information useful for code analysis, search, documentation, and LLM ingestion. Bundled query helpers return query source strings; arbitrary query execution is left to host-language tree-sitter APIs.
ProcessConfig
All intelligence extraction is opt-in via ProcessConfig. Enable what you need:
from tree_sitter_language_pack import ProcessConfig
config = ProcessConfig(
    language="python",
    structure=True,    # functions, classes, methods
    imports=True,      # import statements
    exports=True,      # exported symbols
    comments=True,     # inline comments
    docstrings=True,   # docstring extraction
    symbols=True,      # all identifiers
    diagnostics=True,  # syntax errors / error nodes
    # chunk_max_size=1000  # uncomment to enable chunking
)
Use .all() in Rust or ProcessConfig.all("python") in Python to enable everything at once.
ProcessResult Fields
structure - Functions, Classes, and Methods
A list of code constructs (functions, classes, methods, and similar) with their names, kinds, line ranges, and optionally their docstrings.
for item in result.structure:
    print(item.kind)        # "function" | "class" | "method" | "interface" | ...
    print(item.name)        # "greet"
    print(item.start_line)  # 3
    print(item.end_line)    # 6
    print(item.docstring)   # "Greet a user by name." (if docstrings=True)
Supported kinds vary by language:
| Kind | Languages |
|---|---|
| function | All languages |
| class | Python, JS/TS, Java, C#, Ruby, PHP, Kotlin, … |
| method | Same as class |
| interface | TypeScript, Java, C#, Go, Kotlin, … |
| struct | Rust, Go, C, C++, C#, … |
| impl | Rust |
| module | Elixir, Ruby, Rust, … |
| enum | Rust, Java, C#, TypeScript, Kotlin, … |
| trait | Rust |
| type_alias | TypeScript, Rust |
| decorator | Python, TypeScript |
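One way to consume structure is to group items by kind, for example to build a file outline. The sketch below is illustrative only: StructureItem is a hypothetical stand-in for the objects process() returns, and it assumes nothing beyond the kind, name, start_line, and end_line fields shown above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StructureItem:
    """Hypothetical stand-in mirroring the fields shown above."""
    kind: str
    name: str
    start_line: int
    end_line: int


def group_by_kind(items):
    """Map each kind to the names of its items, in source order."""
    groups = defaultdict(list)
    for item in sorted(items, key=lambda i: i.start_line):
        groups[item.kind].append(item.name)
    return dict(groups)


# Sample data, deliberately out of order to show the sort.
items = [
    StructureItem("class", "FileCache", 22, 33),
    StructureItem("function", "read_file", 6, 20),
    StructureItem("method", "get", 30, 33),
    StructureItem("method", "__init__", 26, 28),
]
outline = group_by_kind(items)
print(outline)
```

Sorting by start_line first keeps each group in the order the constructs appear in the file, which tends to matter for outlines and tables of contents.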
imports - Import Statements
All import declarations with their source module and imported names.
for imp in result.imports:
    print(imp.source)  # "os" or "pathlib"
    print(imp.names)   # ["path", "getcwd"] (empty = wildcard or bare import)
    print(imp.start_line)
Example output as JSON:
[
{ "source": "os", "names": [], "start_line": 1 },
{ "source": "pathlib", "names": ["Path"], "start_line": 2 },
{ "source": "./utils", "names": ["readFile", "writeFile"], "start_line": 3 }
]
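Records in this shape are easy to post-process, for instance to separate local files from packages when building a dependency graph. The following is a minimal sketch that works on the JSON above; the split_imports helper and the leading-dot heuristic for "relative" sources are assumptions made for illustration, not behavior of the library.

```python
import json

# The example records from above, as a JSON string.
raw = """
[
  { "source": "os", "names": [], "start_line": 1 },
  { "source": "pathlib", "names": ["Path"], "start_line": 2 },
  { "source": "./utils", "names": ["readFile", "writeFile"], "start_line": 3 }
]
"""


def split_imports(records):
    """Partition import sources into relative paths and package/module names.

    Assumption: a source starting with "." is a relative import.
    """
    relative, external = [], []
    for rec in records:
        target = relative if rec["source"].startswith(".") else external
        target.append(rec["source"])
    return relative, external


relative, external = split_imports(json.loads(raw))
print("relative:", relative)
print("external:", external)
```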
exports - Exported Symbols
Symbols that are part of the module's public API.
for exp in result.exports:
    print(exp.name)  # "readFile"
    print(exp.kind)  # "function" | "class" | "const" | ...
!!! Note
    Export detection is language-specific. For Python, everything defined at module level counts as exported unless prefixed with _. For JavaScript/TypeScript, explicit export declarations determine what the module exposes.
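The Python rule described in the note can be written as a one-line predicate. This is only a sketch of the stated convention: is_public and the sample names are hypothetical, and how the library treats special cases such as dunder names or `__all__` is not covered in this section.

```python
def is_public(name: str) -> bool:
    """Underscore rule: a module-level name counts as exported
    unless it starts with "_"."""
    return not name.startswith("_")


# Hypothetical module-level names, for illustration only.
module_level_names = ["read_file", "FileCache", "_normalize", "__version__"]
exported = [n for n in module_level_names if is_public(n)]
print(exported)
```

Under this literal reading, a dunder name such as `__version__` is excluded as well, because it also begins with an underscore.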
comments - Inline Comments
All comments in the file with their text and location.
for comment in result.comments:
    print(comment.text)        # "// TODO: handle edge case"
    print(comment.start_line)  # 42
    print(comment.is_block)    # False
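A common use of the comment list is collecting task markers. The sketch below scans comment text for TODO and FIXME; the Comment dataclass is a hypothetical stand-in that carries only the three fields shown above, and the marker pattern is an illustrative choice.

```python
import re
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical stand-in mirroring the fields shown above."""
    text: str
    start_line: int
    is_block: bool


# Marker word, optional colon/whitespace, then the rest of the line.
TASK_MARKER = re.compile(r"\b(TODO|FIXME)\b[:\s]*(.*)")


def find_tasks(comments):
    """Return (line, marker, message) for every comment with a task marker."""
    tasks = []
    for c in comments:
        match = TASK_MARKER.search(c.text)
        if match:
            tasks.append((c.start_line, match.group(1), match.group(2).strip()))
    return tasks


comments = [
    Comment("// TODO: handle edge case", 42, False),
    Comment("/* License header */", 1, True),
    Comment("# FIXME retry on timeout", 57, False),
]
tasks = find_tasks(comments)
print(tasks)
```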
docstrings - Documentation Strings
Docstrings appear under their parent construct in structure. When docstrings=True, each structure item gains a docstring field:
func = result.structure[0]
print(func.docstring)
# "Read and return the contents of a file.\n\nArgs:\n path: Path to the file."
Docstring extraction understands language-specific conventions:
| Language | Convention |
|---|---|
| Python | """...""" triple-quoted string immediately after def/class |
| Rust | /// or //! doc comments above item |
| JavaScript/TypeScript | /** ... */ JSDoc block above function |
| Java | /** ... */ Javadoc block above method/class |
| Ruby | # ... lines immediately above def/class |
| Go | // FuncName ... comment block above func |
| Elixir | @doc "..." or @moduledoc "..." |
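For search results or tooltips, often only the first line of a docstring is needed. A minimal sketch using the standard library's inspect.cleandoc; summary_line is a hypothetical helper, and it assumes a missing docstring shows up as None or an empty string.

```python
import inspect


def summary_line(docstring):
    """Return the first line of a docstring, or "" when there is none."""
    if not docstring:
        return ""
    # cleandoc strips leading blank lines and normalizes indentation.
    cleaned = inspect.cleandoc(docstring)
    return cleaned.split("\n", 1)[0].strip()


doc = "Read and return the contents of a file.\n\nArgs:\n    path: Path to the file."
print(summary_line(doc))
```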
symbols - All Identifiers
A deduplicated list of all identifiers referenced in the file, useful for search indexing.
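As an illustration of the search-indexing use case, the sketch below builds an inverted index from identifier to the files that mention it. It assumes symbols can be treated as a list of plain strings, which this section does not spell out; the file names and the build_index helper are hypothetical.

```python
from collections import defaultdict


def build_index(symbols_by_file):
    """Map each identifier to the set of files that reference it."""
    index = defaultdict(set)
    for path, symbols in symbols_by_file.items():
        for symbol in symbols:
            index[symbol].add(path)
    return index


# Hypothetical per-file symbol lists, e.g. collected from result.symbols.
symbols_by_file = {
    "cache.py": ["FileCache", "read_file", "os"],
    "io_utils.py": ["read_file", "Path"],
}
index = build_index(symbols_by_file)
print(sorted(index["read_file"]))
```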
diagnostics - Syntax Errors
Tree-sitter produces partial trees for malformed code, marking error nodes. diagnostics surfaces these:
for error in result.diagnostics:
    print(error.message)  # "Unexpected token"
    print(error.start_line)
    print(error.start_col)
!!! Tip
    A non-empty diagnostics list does not mean the file is unparsable: tree-sitter recovers and continues. Use it to detect broken syntax rather than to gate parsing.
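Diagnostics can be rendered in the familiar path:line:col format used by compilers and linters. This is a sketch under assumptions: Diagnostic is a hypothetical stand-in with the three fields shown above, and whether line and column values are 0-based or 1-based is not stated here, so they are printed unchanged.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    """Hypothetical stand-in mirroring the fields shown above."""
    message: str
    start_line: int
    start_col: int


def format_diagnostics(path, diagnostics):
    """Render diagnostics as 'path:line:col: message', sorted by position."""
    ordered = sorted(diagnostics, key=lambda d: (d.start_line, d.start_col))
    return [f"{path}:{d.start_line}:{d.start_col}: {d.message}" for d in ordered]


report = format_diagnostics(
    "app.py",
    [Diagnostic("Unexpected token", 12, 4), Diagnostic("Unexpected token", 3, 17)],
)
print("\n".join(report))
```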
chunks - Syntax-Aware Splits
When chunk_max_size > 0, the chunks field contains the file split into byte-budget segments. See Chunking for LLMs for full documentation.
for chunk in result.chunks:
    print(chunk.content)     # the source code text
    print(chunk.start_byte)  # start byte offset
    print(chunk.end_byte)    # end byte offset
    print(chunk.start_line)  # first line of chunk
    print(chunk.end_line)    # last line of chunk
    print(chunk.node_types)  # ["function_definition", "class_definition"]
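Byte offsets differ from Python string indices as soon as the source contains non-ASCII characters. The sketch below shows the difference, assuming start_byte and end_byte index into the UTF-8 encoding of the source and that end_byte is exclusive (the usual tree-sitter convention, though not stated in this section). The sample source and the slice_bytes helper are illustrative only.

```python
# "é" is one character but two bytes in UTF-8.
source = 'name = "café"\nprint(name)\n'
data = source.encode("utf-8")


def slice_bytes(data, start_byte, end_byte):
    """Recover text from byte offsets (end exclusive)."""
    return data[start_byte:end_byte].decode("utf-8")


# Byte offset just past the first newline.
first_line_end = data.index(b"\n") + 1

by_bytes = slice_bytes(data, 0, first_line_end)
by_chars = source[0:first_line_end]  # wrong: treats a byte offset as a char index

print(repr(by_bytes))
print(repr(by_chars))
```

Slicing the encoded bytes returns exactly the first line, while slicing the str with the same numbers overshoots by one character.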
metrics - File-Level Statistics
Basic metrics about the file:
m = result.metrics
print(m.total_lines) # 120
print(m.code_lines) # 95 (non-blank, non-comment lines)
print(m.comment_lines) # 18
print(m.blank_lines) # 7
print(m.max_depth) # maximum nesting depth of the syntax tree
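These counts combine into simple derived measures. As a sketch, comment density can be computed as comment lines over all non-blank lines; the Metrics dataclass is a hypothetical stand-in holding the sample values above, and the density formula is one possible definition rather than something the library provides.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical stand-in mirroring the line-count fields shown above."""
    total_lines: int
    code_lines: int
    comment_lines: int
    blank_lines: int


def comment_density(m):
    """Share of non-blank lines that are comments (0.0 for an empty file)."""
    non_blank = m.code_lines + m.comment_lines
    return m.comment_lines / non_blank if non_blank else 0.0


m = Metrics(total_lines=120, code_lines=95, comment_lines=18, blank_lines=7)
density = comment_density(m)
print(f"{density:.1%}")
```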
Full Example
from tree_sitter_language_pack import process, ProcessConfig
source = '''
import os
from pathlib import Path
from typing import Optional


def read_file(path: str, encoding: str = "utf-8") -> Optional[str]:
    """Read and return the contents of a file.

    Args:
        path: Path to the file to read.
        encoding: File encoding. Defaults to utf-8.

    Returns:
        File contents, or None if the file doesn't exist.
    """
    p = Path(path)
    if not p.exists():
        return None
    return p.read_text(encoding=encoding)


class FileCache:
    """In-memory cache for file contents."""

    def __init__(self, root: str):
        self._root = root
        self._cache: dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        if name not in self._cache:
            self._cache[name] = read_file(os.path.join(self._root, name))
        return self._cache[name]
'''
config = ProcessConfig(
    language="python",
    structure=True,
    imports=True,
    docstrings=True,
    comments=True,
    diagnostics=True,
)
result = process(source, config)
# Structure
for item in result.structure:
    print(f"{item.kind:12} {item.name:20} lines {item.start_line}-{item.end_line}")
# Output:
# function     read_file            lines 6-20
# class        FileCache            lines 22-33
# method       __init__             lines 26-28
# method       get                  lines 30-33
# Imports
for imp in result.imports:
    names = ", ".join(imp.names) or "*"
    print(f"from {imp.source} import {names}")
# Output:
# from os import *
# from pathlib import Path
# from typing import Optional
# Docstrings
func = result.structure[0]
print(f"\n{func.name} docstring:\n{func.docstring}")
# Metrics
m = result.metrics
print(f"\nLines: {m.total_lines} total, {m.code_lines} code, {m.comment_lines} comments")
Custom Queries
Custom query execution helpers are not part of the v1.9 public API. Use get_highlights_query, get_injections_query, get_locals_query, or get_tags_query (available as of v1.9) to retrieve the bundled query source. When the process() fields are not enough, run that source through the host-language tree-sitter query APIs or walk the AST manually.