# Code Intelligence

The `process` function goes beyond raw syntax trees. It runs tree-sitter queries against the parsed AST to extract structured information useful for code analysis, search, documentation, and LLM ingestion.

## The `ProcessConfig`

All intelligence extraction is opt-in via `ProcessConfig`. Enable only what you need:
```python
from tree_sitter_language_pack import ProcessConfig

config = ProcessConfig(
    language="python",
    structure=True,     # functions, classes, methods
    imports=True,       # import statements
    exports=True,       # exported symbols
    comments=True,      # inline comments
    docstrings=True,    # docstring extraction
    symbols=True,       # all identifiers
    diagnostics=True,   # syntax errors / error nodes
    chunk_max_size=0,   # 0 = no chunking
)
```
Use `.all()` (Rust) or `ProcessConfig(language=..., all=True)` (Python) to enable everything at once.
## ProcessResult Fields

### `structure` — Functions, Classes, and Methods
A list of top-level code constructs with their names, kinds, ranges, and optionally their docstrings.
```python
for item in result["structure"]:
    print(item["kind"])        # "function" | "class" | "method" | "interface" | ...
    print(item["name"])        # "greet"
    print(item["start_line"])  # 3
    print(item["end_line"])    # 6
    print(item["docstring"])   # "Greet a user by name." (if docstrings=True)
```
**Supported kinds** vary by language:
| Kind | Languages |
|------|-----------|
| `function` | All languages |
| `class` | Python, JS/TS, Java, C#, Ruby, PHP, Kotlin, … |
| `method` | Same as class |
| `interface` | TypeScript, Java, C#, Go, Kotlin, … |
| `struct` | Rust, Go, C, C++, C#, … |
| `impl` | Rust |
| `module` | Elixir, Ruby, Rust, … |
| `enum` | Rust, Java, C#, TypeScript, Kotlin, … |
| `trait` | Rust |
| `type_alias` | TypeScript, Rust |
| `decorator` | Python, TypeScript |
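The `kind` field makes it straightforward to render a file outline. A minimal sketch, using a hand-written `structure` list that mirrors the fields above (sample data, not real `process` output):

```python
# Hypothetical `structure` records shaped like the documented fields.
structure = [
    {"kind": "class", "name": "FileCache", "start_line": 22, "end_line": 33},
    {"kind": "method", "name": "__init__", "start_line": 26, "end_line": 28},
    {"kind": "method", "name": "get", "start_line": 30, "end_line": 33},
]

def outline(items):
    """Render 'kind name (start-end)' lines, indenting methods under classes."""
    lines = []
    for item in items:
        indent = "    " if item["kind"] == "method" else ""
        lines.append(
            f"{indent}{item['kind']} {item['name']} "
            f"({item['start_line']}-{item['end_line']})"
        )
    return lines

print("\n".join(outline(structure)))
# class FileCache (22-33)
#     method __init__ (26-28)
#     method get (30-33)
```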
### `imports` — Import Statements
All import declarations with their source module and imported names.
```python
for imp in result["imports"]:
    print(imp["source"])      # "os" or "pathlib"
    print(imp["names"])       # ["path", "getcwd"] (empty = wildcard or bare import)
    print(imp["start_line"])
```
```json
[
  { "source": "os", "names": [], "start_line": 1 },
  { "source": "pathlib", "names": ["Path"], "start_line": 2 },
  { "source": "./utils", "names": ["readFile", "writeFile"], "start_line": 3 }
]
```
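One common use of these records is separating local modules from external packages. A sketch over the sample data above, using the leading-dot convention for relative sources:

```python
# Hypothetical import records copied from the JSON example above.
imports = [
    {"source": "os", "names": [], "start_line": 1},
    {"source": "pathlib", "names": ["Path"], "start_line": 2},
    {"source": "./utils", "names": ["readFile", "writeFile"], "start_line": 3},
]

# Relative sources (leading ".") are local modules; the rest are packages.
local = [imp["source"] for imp in imports if imp["source"].startswith(".")]
external = [imp["source"] for imp in imports if not imp["source"].startswith(".")]
print(local)     # ['./utils']
print(external)  # ['os', 'pathlib']
```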
### `exports` — Exported Symbols
Symbols that are part of the module's public API.
```python
for exp in result["exports"]:
    print(exp["name"])  # "readFile"
    print(exp["kind"])  # "function" | "class" | "const" | ...
```
!!! note

    Export detection is language-specific. For Python, everything defined at module level is considered exported unless prefixed with `_`. For JavaScript/TypeScript, only explicit `export` declarations are included.
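The Python rule can be sketched in plain Python over a hypothetical list of module-level names:

```python
# Hypothetical module-level names; keep everything not prefixed with "_",
# matching the Python export convention described above.
module_names = ["read_file", "FileCache", "_internal_helper", "__version__"]
exported = [name for name in module_names if not name.startswith("_")]
print(exported)  # ['read_file', 'FileCache']
```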
### `comments` — Inline Comments
All comments in the file with their text and location.
```python
for comment in result["comments"]:
    print(comment["text"])        # "// TODO: handle edge case"
    print(comment["start_line"])  # 42
    print(comment["is_block"])    # False
```
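A typical consumer of this field is a TODO/FIXME scanner. A sketch over hand-written comment records shaped like the fields above:

```python
# Hypothetical comment records mirroring the documented fields.
comments = [
    {"text": "// TODO: handle edge case", "start_line": 42, "is_block": False},
    {"text": "/* module overview */", "start_line": 1, "is_block": True},
    {"text": "// FIXME: off-by-one", "start_line": 88, "is_block": False},
]

markers = ("TODO", "FIXME")
todos = [
    (c["start_line"], c["text"])
    for c in comments
    if any(marker in c["text"] for marker in markers)
]
print(todos)  # [(42, '// TODO: handle edge case'), (88, '// FIXME: off-by-one')]
```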
### `docstrings` — Documentation Strings
Docstrings are attached to their parent construct in `structure`. When `docstrings=True`, each `structure` item gains a `docstring` field:
```python
func = result["structure"][0]
print(func["docstring"])
# "Read and return the contents of a file.\n\nArgs:\n path: Path to the file."
```
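For previews or search snippets you often want only the one-line summary. A sketch, assuming the docstring follows the PEP 257 convention of a summary line separated from the rest by a blank line:

```python
# Hypothetical docstring matching the example above; the summary is the
# text before the first blank line (PEP 257 convention).
docstring = "Read and return the contents of a file.\n\nArgs:\n    path: Path to the file."
summary = docstring.split("\n\n", 1)[0]
print(summary)  # Read and return the contents of a file.
```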
Docstring extraction understands language-specific conventions:
| Language | Convention |
|----------|-----------|
| Python | `"""..."""` triple-quoted string immediately after `def`/`class` |
| Rust | `///` or `//!` doc comments above item |
| JavaScript/TypeScript | `/** ... */` JSDoc block above function |
| Java | `/** ... */` Javadoc block above method/class |
| Ruby | `# ...` lines immediately above `def`/`class` |
| Go | `// FuncName ...` comment block above func |
| Elixir | `@doc "..."` or `@moduledoc "..."` |
### `symbols` — All Identifiers
A deduplicated list of all identifiers referenced in the file, useful for search indexing.
```python
print(result["symbols"])
# ["os", "Path", "read_file", "FileManager", "base_dir", "get", ...]
```
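For search indexing, the symbol lists from many files can be folded into a small inverted index. A sketch with hypothetical per-file `symbols` output:

```python
from collections import defaultdict

# Hypothetical per-file symbol lists, as `symbols` would give for each file.
symbols_by_file = {
    "cache.py": ["os", "Path", "FileCache", "read_file"],
    "io.py": ["Path", "read_file"],
}

# Inverted index: symbol -> set of files that reference it.
index = defaultdict(set)
for path, symbols in symbols_by_file.items():
    for symbol in symbols:
        index[symbol].add(path)

print(sorted(index["read_file"]))  # ['cache.py', 'io.py']
print(sorted(index["FileCache"]))  # ['cache.py']
```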
### `diagnostics` — Syntax Errors
Tree-sitter produces partial trees for invalid code, marking error nodes. `diagnostics` surfaces these:
```python
for error in result["diagnostics"]:
print(error["message"]) # "Unexpected token"
print(error["start_line"])
print(error["start_col"])
```text
!!! tip

    A non-empty `diagnostics` list does not mean the file is unparsable — tree-sitter recovers and continues. Use it to detect broken syntax rather than to gate parsing.
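Following that advice, a pipeline might summarize diagnostics instead of rejecting the file. A sketch over hypothetical diagnostic records:

```python
# Hypothetical diagnostics; report them, but keep processing the file.
diagnostics = [
    {"message": "Unexpected token", "start_line": 14, "start_col": 8},
    {"message": "Unexpected token", "start_line": 30, "start_col": 0},
]

if diagnostics:
    first = diagnostics[0]
    report = f"{len(diagnostics)} syntax issue(s), first at line {first['start_line']}"
else:
    report = "clean parse"
print(report)  # 2 syntax issue(s), first at line 14
```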
### `chunks` — Syntax-Aware Splits
When `chunk_max_size > 0`, the `chunks` field contains the file split into token-budget segments. See [Chunking for LLMs](../guides/chunking.md) for full documentation.
```python
for chunk in result["chunks"]:
print(chunk["content"]) # the source code text
print(chunk["start_line"]) # first line of chunk
print(chunk["end_line"]) # last line of chunk
print(chunk["token_count"]) # estimated token count
print(chunk["node_types"]) # ["function_definition", "class_definition"]
```text
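The `token_count` field is what makes chunks easy to pack into an LLM prompt. A greedy-packing sketch over hand-written chunks (sample data, not real `process` output):

```python
# Hypothetical chunks; greedily pack them into a prompt under a token budget.
chunks = [
    {"content": "def read_file(path): ...", "start_line": 6, "end_line": 20, "token_count": 40},
    {"content": "class FileCache: ...", "start_line": 22, "end_line": 33, "token_count": 55},
]

budget = 80
used = 0
sections = []
for chunk in chunks:
    if used + chunk["token_count"] > budget:
        break  # stop once the next chunk would overflow the budget
    sections.append(f"# lines {chunk['start_line']}-{chunk['end_line']}\n{chunk['content']}")
    used += chunk["token_count"]

print(used)           # 40
print(len(sections))  # 1
```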
### `metrics` — File-Level Statistics
Basic metrics about the file:
```python
m = result["metrics"]
print(m["total_lines"]) # 120
print(m["code_lines"]) # 95 (non-blank, non-comment lines)
print(m["comment_lines"]) # 18
print(m["blank_lines"]) # 7
print(m["complexity"]) # cyclomatic complexity estimate (if supported)
```
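These counts compose into simple derived figures, such as comment density. A sketch using hypothetical metrics matching the fields above:

```python
# Hypothetical metrics record mirroring the documented fields.
m = {"total_lines": 120, "code_lines": 95, "comment_lines": 18, "blank_lines": 7}

# Comment density: comment lines as a share of non-blank lines.
density = m["comment_lines"] / (m["code_lines"] + m["comment_lines"])
print(f"{density:.1%}")  # 15.9%
```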
## Full Example
```python
from tree_sitter_language_pack import process, ProcessConfig

source = '''
import os
from pathlib import Path
from typing import Optional

def read_file(path: str, encoding: str = "utf-8") -> Optional[str]:
    """Read and return the contents of a file.

    Args:
        path: Path to the file to read.
        encoding: File encoding. Defaults to utf-8.

    Returns:
        File contents, or None if the file doesn't exist.
    """

    p = Path(path)
    if not p.exists():
        return None
    return p.read_text(encoding=encoding)

class FileCache:
    """In-memory cache for file contents."""


    def __init__(self, root: str):
        self._root = root
        self._cache: dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        if name not in self._cache:
            self._cache[name] = read_file(os.path.join(self._root, name))
        return self._cache[name]
'''

config = ProcessConfig(
    language="python",
    structure=True,
    imports=True,
    docstrings=True,
    comments=True,
    diagnostics=True,
)
result = process(source, config)

# Structure
for item in result["structure"]:
    print(f"{item['kind']:12} {item['name']:20} lines {item['start_line']}-{item['end_line']}")
# Output:
# function     read_file            lines 6-20
# class        FileCache            lines 22-33
# method       __init__             lines 26-28
# method       get                  lines 30-33

# Imports
for imp in result["imports"]:
    names = ", ".join(imp["names"]) or "*"
    print(f"from {imp['source']} import {names}")
# Output:
# from os import *
# from pathlib import Path
# from typing import Optional

# Docstrings
func = result["structure"][0]
print(f"\n{func['name']} docstring:\n{func['docstring']}")

# Metrics
m = result["metrics"]
print(f"\nLines: {m['total_lines']} total, {m['code_lines']} code, {m['comment_lines']} comments")
```