# Quick Start
This guide walks you from install to parsing, code intelligence, and LLM chunking.
## 1. Install
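Install the Python package from PyPI — the same command the Docker and CI snippets later in this guide use:

```shell
pip install tree-sitter-language-pack
```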
!!! tip "Other ecosystems"
    Go, Java, Ruby, Elixir, PHP, and WebAssembly are also supported. See Installation for the full list.
## 2. Download Parsers
Parsers download automatically on first use. For production, CI, Docker, or offline environments, pre-download them.
### Specific languages
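To fetch only the parsers your project uses, pass a list of language names to `download` — the same helper the CI snippet later in this guide calls:

```python
from tree_sitter_language_pack import download

# Fetch parsers for just these languages; names match those accepted by get_parser().
download(["python", "javascript", "rust"])
```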
### All 306 languages
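To fetch every parser at once, use `download_all` — the helper the Dockerfile below invokes. This is the simplest option for offline or air-gapped machines, at the cost of a larger download:

```python
from tree_sitter_language_pack import download_all

# Fetch all 306 parsers into the local cache.
download_all()
```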
### By language group
Groups bundle related languages: `web`, `systems`, `scripting`, `data`, `jvm`, `functional`.
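This guide does not show the exact call for group downloads. One plausible shape — assuming `download` also accepts a `groups` keyword, which is a hypothetical, unconfirmed signature — would be:

```python
from tree_sitter_language_pack import download

# Hypothetical: the group-download API is not shown in this guide;
# group names mirror those accepted in language-pack.toml.
download(groups=["web", "systems"])
```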
### Docker and CI
Pre-download parsers during your build to avoid runtime network calls:
```dockerfile
FROM python:3.12-slim

RUN pip install tree-sitter-language-pack

# Pre-download at build time — no network needed at runtime
RUN python -c "from tree_sitter_language_pack import download_all; download_all()"
```
```yaml
- name: Install and pre-download parsers
  run: |
    pip install tree-sitter-language-pack
    python -c "from tree_sitter_language_pack import download; download(['python', 'javascript', 'rust'])"
```
### Configuration file
Declare which languages your project needs in a `language-pack.toml`:

```toml
languages = ["python", "javascript", "rust", "go"]
# groups = ["web", "systems"]
# cache_dir = "/tmp/parsers"
```
Then download everything declared in the config:
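The exact config-aware command is not shown here. As one sketch, you can read `language-pack.toml` with the standard library's `tomllib` and hand the declared languages to the `download` helper used in the CI example:

```python
import tomllib  # Python 3.11+

from tree_sitter_language_pack import download

# Read the project's language-pack.toml (created in the step above).
with open("language-pack.toml", "rb") as f:
    config = tomllib.load(f)

# Pass the declared languages to download(); group handling, if supported,
# would follow the same pattern.
download(config["languages"])
```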
!!! info "Cache location"
    Parsers cache to `~/.cache/tree-sitter-language-pack/` on Linux/macOS and `%LOCALAPPDATA%\tree-sitter-language-pack\` on Windows. Override with `cache_dir` in `language-pack.toml` or the programmatic API. See Download Model for full details.
## 3. Parse Code
Build a concrete syntax tree from source code.
```python
from tree_sitter_language_pack import get_parser

parser = get_parser("python")

source = b"""
def greet(name: str) -> str:
    return f"Hello, {name}!"

result = greet("world")
"""

tree = parser.parse(source)
root = tree.root_node

print(root.type)          # module
print(root.child_count)   # 2
print(root.sexp()[:120])  # S-expression preview
```
```javascript
import { parseString, treeRootNodeType, treeRootChildCount } from "@kreuzberg/tree-sitter-language-pack";

const source = `
function greet(name) {
  return \`Hello, \${name}!\`;
}

greet("world");
`;

const tree = parseString("javascript", source);

console.log(treeRootNodeType(tree));   // program
console.log(treeRootChildCount(tree)); // 2
```
```rust
use tree_sitter_language_pack::get_parser;

fn main() -> anyhow::Result<()> {
    let mut parser = get_parser("rust")?;

    let source = r#"
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}
"#;

    let tree = parser.parse(source, None).unwrap();
    let root = tree.root_node();

    println!("{}", root.kind());        // source_file
    println!("{}", root.child_count()); // 1
    println!("{}", root.to_sexp());

    Ok(())
}
```
## 4. Extract Code Intelligence
Go beyond the raw syntax tree. Extract functions, classes, imports, docstrings, and more with `process`.
```python
from tree_sitter_language_pack import process, ProcessConfig

source = """
import os
from pathlib import Path

def read_file(path: str) -> str:
    \"\"\"Read and return the contents of a file.\"\"\"
    return Path(path).read_text()

class FileManager:
    def __init__(self, base_dir: str):
        self.base_dir = base_dir

    def get(self, name: str) -> str:
        return read_file(os.path.join(self.base_dir, name))
"""

config = ProcessConfig(
    language="python",
    structure=True,   # functions and classes
    imports=True,     # import statements
    comments=True,    # inline comments
    docstrings=True,  # docstring extraction
)

result = process(source, config)

print(f"Imports: {[i['name'] for i in result['imports']]}")
print(f"Symbols: {[s['name'] for s in result['structure']]}")
print(f"Docstring: {result['structure'][0]['docstring']}")
```
```typescript
import { process } from "@kreuzberg/tree-sitter-language-pack";

const source = `
import fs from "fs";
import { join } from "path";

/**
 * Read and return the contents of a file.
 */
function readFile(path: string): string {
  return fs.readFileSync(path, "utf8");
}

class FileManager {
  constructor(private baseDir: string) {}

  get(name: string): string {
    return readFile(join(this.baseDir, name));
  }
}
`;

const result = await process(source, {
  language: "typescript",
  structure: true,
  imports: true,
  docstrings: true,
});

console.log("Imports:", result.imports.map(i => i.name));
console.log("Symbols:", result.structure.map(s => s.name));
```
```rust
use tree_sitter_language_pack::{process, ProcessConfig};

fn main() -> anyhow::Result<()> {
    let source = r#"
use std::fs;
use std::path::Path;

/// Read and return the contents of a file.
fn read_file(path: &str) -> String {
    fs::read_to_string(path).unwrap()
}

struct FileManager {
    base_dir: String,
}
"#;

    let mut config = ProcessConfig::new("rust");
    config.structure = true;
    config.imports = true;
    config.docstrings = true;

    let result = process(source, &config)?;

    println!("Imports: {:?}", result.imports.iter().map(|i| &i.name).collect::<Vec<_>>());
    println!("Symbols: {:?}", result.structure.iter().map(|s| &s.name).collect::<Vec<_>>());

    Ok(())
}
```
## 5. Run Extraction Queries
Use `extract` to run custom tree-sitter queries and get structured results with captured text and metadata.
```python
import tree_sitter_language_pack as tslp

source = """
def greet(name: str) -> str:
    return f"Hello, {name}!"

def farewell(name: str) -> str:
    return f"Goodbye, {name}!"
"""

result = tslp.extract(source, {
    "language": "python",
    "patterns": {
        "functions": {
            "query": "(function_definition name: (identifier) @name)",
            "capture_output": "Text",
        }
    }
})

for match in result["results"]["functions"]["matches"]:
    print(match["captures"][0]["text"])
# greet
# farewell
```
## 6. Chunk for LLMs
Split code at natural boundaries so language models receive coherent, complete units — ideal for embedding pipelines and context windows.
```python
from tree_sitter_language_pack import process, ProcessConfig

with open("large_module.py") as f:
    source = f.read()

config = ProcessConfig(
    language="python",
    chunk_max_size=1500,  # max bytes per chunk
    structure=True,
)

result = process(source, config)

for i, chunk in enumerate(result["chunks"]):
    print(f"Chunk {i}: lines {chunk['start_line']}-{chunk['end_line']} "
          f"({chunk['end_byte'] - chunk['start_byte']} bytes)")
```
```typescript
import { process } from "@kreuzberg/tree-sitter-language-pack";
import { readFileSync } from "fs";

const source = readFileSync("large_module.ts", "utf8");

const result = await process(source, {
  language: "typescript",
  chunkMaxSize: 1500,
  structure: true,
});

result.chunks.forEach((chunk, i) => {
  console.log(`Chunk ${i}: lines ${chunk.startLine}-${chunk.endLine} (${chunk.endByte - chunk.startByte} bytes)`);
});
```
You now have the full workflow: install, download parsers, parse, extract intelligence, run queries, and chunk for LLMs. Go further with the following guides:

- Parsing guide — syntax trees, error handling, and incremental parsing
- Configuration — `language-pack.toml` and advanced options
- API Reference — full API docs for every binding