ADR-05: Parser Pooling Disabled

Date	Author	Repos
2025-12-23	@KubrickCode	core

Status: Accepted

Context

Problem Statement

Tree-sitter parsers were initially pooled using sync.Pool for performance optimization. However, this caused intermittent test failures that were difficult to reproduce and diagnose.

Root Cause

When a context is cancelled during ParseCtx() execution:

Tree-sitter sets an internal cancellation flag
This flag is not reset when the parser is returned to the pool
Subsequent reuse of that parser fails with an "operation limit was hit" error

Impact

Flaky tests in CI/CD pipelines
Non-deterministic behavior in production
Debugging complexity due to intermittent nature

Strategic Question

How should we handle tree-sitter parser lifecycle to ensure reliability while maintaining acceptable performance?

Decision

Disable parser pooling. Create a fresh parser for each use while caching language grammars via sync.Once.

This approach:

Eliminates the cancellation flag bug completely
Preserves the main performance optimization (grammar caching)
Trades ~10µs per-parse overhead for guaranteed reliability

Options Considered

Option A: Fresh Parser Per-Use (Selected)

Create a new parser for each parse operation.

Pros:

Guaranteed reliability: No state leakage between parse operations
Simple implementation: No pool management complexity
Predictable behavior: Each parse is independent

Cons:

Per-parse overhead: ~10µs allocation cost per file
More GC pressure: Fresh allocations increase garbage collection work

Option B: Fix Tree-sitter Bug Upstream

Contribute a fix to the tree-sitter C library.

Pros:

Addresses root cause
Benefits entire tree-sitter ecosystem

Cons:

External dependency: Fix timeline not under our control
Maintenance burden: Must track upstream changes
Uncertain acceptance: PR may not be accepted or may take months

Option C: Manual Flag Reset

Implement a workaround that resets parser state before reuse.

Pros:

Preserves pooling performance benefits

Cons:

Fragile: Depends on internal tree-sitter implementation details
Maintenance risk: May break with tree-sitter updates
Incomplete: May not address all edge cases

Implementation Details

Current Architecture

pkg/parser/tspool/
├── pool.go         # Parser creation, language grammar caching
└── pool_test.go    # Concurrency tests (race detection)

Parser Creation

A fresh parser is created for each use:

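// Get returns a new parser configured for lang.
// The caller owns the parser and must call Close on it.
func Get(lang domain.Language) *sitter.Parser {
    initLanguages() // loads all grammars on the first call only
    parser := sitter.NewParser()
    parser.SetLanguage(GetLanguage(lang))
    return parser
}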

Language Grammar Caching

Expensive grammar initialization is still cached via sync.Once:

var (
    goLang   *sitter.Language
    jsLang   *sitter.Language
    // ... all supported languages
    langOnce sync.Once
)

func initLanguages() {
    langOnce.Do(func() {
        goLang = golang.GetLanguage()
        jsLang = javascript.GetLanguage()
        // ...
    })
}

Rationale: Grammar initialization involves C FFI calls and memory allocation. sync.Once ensures thread-safe single initialization while deferring the cost until first use.
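The sketch below reproduces the same sync.Once pattern outside tree-sitter so that it runs on its own. `grammar`, `loadGrammars` and `initCalls` are hypothetical stand-ins for `*sitter.Language` and the `golang.GetLanguage()` / `javascript.GetLanguage()` calls. The counter exists only to show that 100 concurrent callers trigger exactly one initialization.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// grammar is a hypothetical stand-in for *sitter.Language.
type grammar struct{ name string }

var (
	goGrammar *grammar
	jsGrammar *grammar
	langOnce  sync.Once
	initCalls atomic.Int32 // counts how often the expensive path runs
)

// loadGrammars simulates the expensive cgo-backed grammar initialization.
func loadGrammars() {
	initCalls.Add(1)
	goGrammar = &grammar{name: "go"}
	jsGrammar = &grammar{name: "javascript"}
}

func initLanguages() { langOnce.Do(loadGrammars) }

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			initLanguages()
			// Safe to read: Do returns only after loadGrammars has finished.
			_ = goGrammar.name
		}()
	}
	wg.Wait()
	fmt.Println(initCalls.Load()) // 1
}
```

Callers that arrive while initialization is in progress block inside `Do` until it completes. The code therefore never needs a nil check on the cached grammars.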

Parse Helper

The Parse function provides a clean API with guaranteed cleanup:

func Parse(ctx context.Context, lang domain.Language, source []byte) (*sitter.Tree, error) {
    parser := Get(lang)
    defer parser.Close()

    tree, err := parser.ParseCtx(ctx, nil, source)
    if err != nil {
        return nil, fmt.Errorf("parse %s failed: %w", lang, err)
    }
    return tree, nil
}

Performance Impact

Operation	Approx. cost	Status
Parser allocation	~10µs/parse	Acceptable
Language grammar init	~1-5ms	Cached once
Query compilation	~1-5ms	Cached once
Query execution	~0.1-1ms/file	Optimized

Net Impact: Grammar and query caching provide a 10-50x speedup for repeated operations. The ~10µs per-parse overhead is negligible compared to typical file I/O latency.

Consequences

Positive

Test Stability
- No more flaky tests from parser state leakage
- Deterministic CI/CD pipeline behavior
Code Simplicity
- No pool management code to maintain
- Clear ownership semantics (caller creates, caller closes)
Debugging Ease
- Each parse operation is isolated
- No cross-contamination between operations

Negative

Per-Parse Overhead
- ~10µs allocation per file
- Mitigation: Acceptable for core library use case
Increased GC Pressure
- More short-lived allocations
- Mitigation: Grammar caching keeps most allocations long-lived

Constraints on Future Changes

Pooling cannot be re-enabled without an upstream tree-sitter fix
Performance optimization efforts must focus on query caching, not parser reuse
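If query compilation is cached per language and pattern, one possible shape is a map of sync.Once entries. This extends the sync.Once idea already used for grammars. It is a sketch only: `compiledQuery` and `compile` are hypothetical placeholders for the bindings' query type and constructor. Whether a real compiled query can be shared across goroutines should be confirmed against the bindings in use.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// compiledQuery is a hypothetical stand-in for a compiled tree-sitter query.
type compiledQuery struct{ lang, pattern string }

var compileCalls atomic.Int32

// compile simulates the expensive (~1-5ms) query compilation step.
func compile(lang, pattern string) *compiledQuery {
	compileCalls.Add(1)
	return &compiledQuery{lang: lang, pattern: pattern}
}

type cacheKey struct{ lang, pattern string }

type entry struct {
	once sync.Once
	q    *compiledQuery
}

// queryCache compiles each (language, pattern) pair at most once and is
// safe for concurrent use.
type queryCache struct {
	mu      sync.Mutex
	entries map[cacheKey]*entry
}

func (c *queryCache) Get(lang, pattern string) *compiledQuery {
	k := cacheKey{lang, pattern}
	c.mu.Lock()
	if c.entries == nil {
		c.entries = make(map[cacheKey]*entry)
	}
	e, ok := c.entries[k]
	if !ok {
		e = &entry{}
		c.entries[k] = e
	}
	c.mu.Unlock()
	// Compile outside the lock so different keys can compile in parallel.
	e.once.Do(func() { e.q = compile(lang, pattern) })
	return e.q
}

func main() {
	var cache queryCache
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.Get("go", "(function_declaration) @fn")
		}()
	}
	wg.Wait()
	cache.Get("javascript", "(function_declaration) @fn")
	fmt.Println(compileCalls.Load()) // 2
}
```

The mutex is held only for the map lookup, and sync.Once guards compilation. As a result, a slow compile for one key does not block lookups for other keys.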

Related ADRs

ADR-03: Tree-sitter as AST Parsing Engine - Why tree-sitter was chosen

References

smacker/go-tree-sitter - Go bindings used
Tree-sitter Documentation - Official documentation
