Download Model
Tree-sitter-language-pack does not bundle parser binaries into the package. Instead, the pack fetches parsers on first use and caches them locally. This keeps install sizes small and gives you control over which languages are available.
How It Works
sequenceDiagram
    participant App
    participant Core as ts-pack-core
    participant Cache as Local Cache
    participant Remote as GitHub Releases
    App->>Core: get_parser("python")
    Core->>Cache: is "python" cached?
    alt cached
        Cache-->>Core: python.so
        Core-->>App: Parser
    else not cached
        Core->>Remote: GET parsers.json
        Remote-->>Core: manifest with download URL
        Core->>Remote: GET python-linux-x64.so
        Remote-->>Core: binary bytes
        Core->>Cache: write python.so
        Cache-->>Core: python.so
        Core-->>App: Parser
    end
The flow in detail:
- Your code calls `get_parser("python")` (or `get_language`, or `process`).
- The core checks the local cache directory for the parser binary.
- If not cached, it fetches `parsers.json` from GitHub releases to find the correct download URL for the current platform.
- The binary is downloaded and written to the cache directory.
- The process opens the binary via `dlopen`/`LoadLibrary` and resolves the parser symbol.
- On later calls, the cached binary is served directly, with no network access.
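The steps above amount to a cache-or-fetch lookup. The following is a minimal sketch of that pattern, not the library's implementation: `resolve_parser_path`, the `fetch` callback, and the `.part` temporary suffix are hypothetical names chosen for illustration, and the sketch leaves out platform-specific file names and the loading step.

```python
from pathlib import Path
from typing import Callable


def resolve_parser_path(
    language: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes],
    suffix: str = ".so",
) -> Path:
    """Return the path of the cached parser binary, fetching it once if absent.

    `fetch` is a hypothetical callback standing in for the manifest lookup
    and download; it receives the language name and returns the binary bytes.
    """
    target = cache_dir / f"{language}{suffix}"
    if target.exists():
        # Cache hit: nothing to download.
        return target

    # Cache miss: download, write to a temporary file, then rename it so a
    # partially written binary is never visible under the final name.
    data = fetch(language)
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(target)
    return target
```

Writing to a temporary name and renaming is one way to keep an interrupted download from leaving a truncated file in the cache; whether the library does the same is not stated here.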
Cache Directory
The default cache location is platform-specific:
| Platform | Default Path |
|---|---|
| Linux | $XDG_CACHE_HOME/tree-sitter-language-pack or ~/.cache/tree-sitter-language-pack |
| macOS | ~/Library/Caches/tree-sitter-language-pack |
| Windows | %LOCALAPPDATA%\tree-sitter-language-pack |
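The table can be read as a small lookup. The sketch below mirrors it using only the standard library; `default_cache_dir` and its parameters are hypothetical and exist only to illustrate the rules, so the function the library actually uses may differ in details such as how an empty `XDG_CACHE_HOME` is handled.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping

APP_DIR = "tree-sitter-language-pack"


def default_cache_dir(system: str, env: Mapping[str, str], home: str) -> str:
    """Mirror the table: `system` is a value as returned by platform.system()."""
    if system == "Linux":
        # Prefer $XDG_CACHE_HOME, fall back to ~/.cache (assumption: an empty
        # variable is treated the same as an unset one).
        base = env.get("XDG_CACHE_HOME") or str(PurePosixPath(home) / ".cache")
        return str(PurePosixPath(base) / APP_DIR)
    if system == "Darwin":
        return str(PurePosixPath(home) / "Library" / "Caches" / APP_DIR)
    if system == "Windows":
        return str(PureWindowsPath(env["LOCALAPPDATA"]) / APP_DIR)
    raise ValueError(f"unsupported platform: {system!r}")


if __name__ == "__main__":
    import os
    import platform
    from pathlib import Path

    print(default_cache_dir(platform.system(), os.environ, str(Path.home())))
```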
You can override it programmatically:
Parser Manifest
The manifest is a JSON file (parsers.json) hosted on each GitHub release. It maps language names to platform-specific download URLs:
{
  "version": "1.0.0",
  "languages": {
    "python": {
      "linux-x64": "https://github.com/.../python-linux-x64.so",
      "linux-arm64": "https://github.com/.../python-linux-arm64.so",
      "macos-x64": "https://github.com/.../python-macos-x64.dylib",
      "macos-arm64": "https://github.com/.../python-macos-arm64.dylib",
      "windows-x64": "https://github.com/.../python-windows-x64.dll"
    }
  }
}
The manifest is cached locally alongside the parser binaries and is refreshed on version upgrades.
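To illustrate how a manifest of this shape can be queried, here is a hedged sketch that derives a platform key such as `linux-x64` and looks up the matching URL. The helper names `platform_key` and `download_url` and the architecture aliases are assumptions made for the example; only the key format and the JSON layout come from the manifest shown above.

```python
import platform
from typing import Optional

# Assumed mapping from platform.system() / platform.machine() values to the
# key fragments used in the manifest example.
_OS = {"Linux": "linux", "Darwin": "macos", "Windows": "windows"}
_ARCH = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def platform_key(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Build a manifest key like 'linux-x64' for the given or current platform."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    try:
        return f"{_OS[system]}-{_ARCH[machine]}"
    except KeyError as exc:
        raise RuntimeError(f"unsupported platform: {system}/{machine}") from exc


def download_url(manifest: dict, language: str, key: str) -> str:
    """Return the download URL for `language` on platform `key`."""
    try:
        return manifest["languages"][language][key]
    except KeyError as exc:
        raise LookupError(f"no {key} build of {language!r} in manifest") from exc
```

Calling `download_url(manifest, "python", platform_key())` on a parsed copy of the manifest would return the entry for the machine running the code, or raise `LookupError` if that language or platform is not listed.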
Pre-Downloading Parsers
For production, CI, or offline environments, download parsers explicitly rather than relying on auto-download at runtime.
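One way to structure an explicit pre-download step is to compute which required languages are missing and request only those. The sketch below is illustrative: `ensure_parsers` is a hypothetical helper, and the download function is passed in as a parameter so the logic can be exercised without network access.

```python
from typing import Callable, Iterable, List


def ensure_parsers(
    required: Iterable[str],
    downloaded: Iterable[str],
    download: Callable[[List[str]], object],
) -> List[str]:
    """Download the required languages that are not cached yet.

    Returns the sorted list of languages that were requested from `download`.
    """
    missing = sorted(set(required) - set(downloaded))
    if missing:
        download(missing)
    return missing


# Wiring it to the library, assuming download() accepts a list of names as in
# the Dockerfile example on this page and downloaded_languages() returns the
# locally cached names:
#
#   from tree_sitter_language_pack import download, downloaded_languages
#   ensure_parsers(["python", "rust"], downloaded_languages(), download)
```

Running such a step at build or deploy time means a missing parser shows up as a failed build rather than as a network request on the first `get_parser` call in production.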
Inspecting the Cache
from tree_sitter_language_pack import downloaded_languages, cache_dir, manifest_languages
# Languages available locally (no network needed)
local = downloaded_languages()
print(f"{len(local)} parsers cached at {cache_dir()}")
# All languages in the remote manifest
remote = manifest_languages()
missing = set(remote) - set(local)
print(f"{len(missing)} not yet downloaded")
Cleaning the Cache
Docker and CI
For containerized deployments, pre-download parsers during the build stage so the container needs no network access at runtime.
FROM python:3.12-slim
RUN pip install tree-sitter-language-pack
# Pre-download the parsers your application uses
RUN python -c "from tree_sitter_language_pack import download; download(['python', 'javascript', 'rust'])"
COPY . /app
WORKDIR /app
CMD ["python", "app.py"]
For CI pipelines, cache the parser directory between runs:
- name: Cache tree-sitter parsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/tree-sitter-language-pack
    key: tslp-parsers-${{ hashFiles('requirements.txt') }}
Configuration File
For projects that always use the same set of languages, create a language-pack.toml in the project root:
languages = ["python", "javascript", "typescript", "rust", "go"]
cache_dir = ".cache/parsers" # optional: project-local cache
Then download everything declared:
See Configuration for the full file format and discovery rules.