API Reference

Detailed technical specification for the AutoChunks core engine.


AutoChunker

The primary controller for the optimization search and deployment lifecycle.

class AutoChunker:
    def __init__(
        self,
        mode: str = "light",
        embedding_provider: str = "hashing",
        embedding_model_or_path: str = "BAAI/bge-small-en-v1.5",
        eval_config: Optional[EvalConfig] = None,
        # ...
    ):

Key Methods

optimize()

Runs the multi-objective search tournament across document configurations.

  • Parameters:
    • documents (str | List[Dict]): Directory path or list of loaded document objects.
    • on_progress (Callable[[str, int], None]): Callback for real-time status updates and telemetry.
    • sweep_params (Dict): Overrides for the hyperparameter ranges searched (chunk sizes, overlaps).
  • Returns: Tuple[Plan, Dict] — The optimized Plan object and a comprehensive Report dictionary.
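
As a rough illustration of the callback contract, the sketch below builds an `on_progress` handler matching `Callable[[str, int], None]` together with a `sweep_params` override. The meaning of the integer argument (treated here as a percentage), the `chunk_sizes` and `overlaps` keys, and the `fake_optimize` driver are all assumptions made for illustration; none of them is confirmed by this reference.

```python
from typing import Callable, Dict, List, Tuple

# Collected (message, value) pairs; a real handler might log or update a UI.
events: List[Tuple[str, int]] = []


def on_progress(message: str, value: int) -> None:
    """Matches Callable[[str, int], None]; `value` is assumed to be a percentage."""
    events.append((message, value))
    print(f"[{value:3d}%] {message}")


# Hypothetical override keys; the real key names are not given in this reference.
sweep_params: Dict[str, List[int]] = {
    "chunk_sizes": [256, 512],
    "overlaps": [0, 64],
}


def fake_optimize(
    sweep: Dict[str, List[int]],
    callback: Callable[[str, int], None],
) -> int:
    """Stand-in for AutoChunker.optimize(): walks the sweep grid and reports progress."""
    grid = [(s, o) for s in sweep["chunk_sizes"] for o in sweep["overlaps"]]
    for i, (size, overlap) in enumerate(grid, start=1):
        callback(f"size={size} overlap={overlap}", 100 * i // len(grid))
    return len(grid)


n_candidates = fake_optimize(sweep_params, on_progress)
```

With the real class, the same handler and dictionary would presumably be passed by keyword using the parameter names listed above (`on_progress`, `sweep_params`).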

Plan

Represents a portable, serialized optimization result.

Attributes

  • generator_pipeline: The target chunking strategy name and validated hyperparameters.
  • metrics: The expected performance profile (nDCG, MRR, Recall) recorded during optimization.
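
To make the idea of a portable, serialized result concrete, here is a minimal sketch of a plan-like object that round-trips through JSON. `PlanSketch`, its `to_json`/`from_json` helpers, the `fixed_length` strategy name, and the dictionary layout are hypothetical; only the two attribute names come from this reference, and the metric values are placeholders rather than measured results.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class PlanSketch:
    """Illustrative stand-in for Plan; only the two documented attributes are modeled."""

    generator_pipeline: Dict[str, Any] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PlanSketch":
        return cls(**json.loads(text))


plan = PlanSketch(
    # Hypothetical strategy name and hyperparameters.
    generator_pipeline={"name": "fixed_length", "params": {"size": 512, "overlap": 64}},
    # Placeholder values, not real results.
    metrics={"ndcg": 0.0, "mrr": 0.0, "recall": 0.0},
)
restored = PlanSketch.from_json(plan.to_json())
```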

Methods

apply(docs_path, chunker)

Executes the strategy on a new corpus.

  • Returns: List[Dict], a standard RAG-ready list of chunks with metadata.
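
The exact chunk schema is not specified here. As a sketch of what a list of chunks with metadata could look like, the example below applies a simple fixed-size character window with overlap. The `chunk_fixed` helper and the `id`, `doc_id`, `text`, and `meta` keys are assumptions for illustration, not the library's actual output format.

```python
from typing import Dict, List


def chunk_fixed(doc_id: str, text: str, size: int, overlap: int) -> List[Dict]:
    """Split text into character windows of `size`, each sharing `overlap` chars with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[Dict] = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        chunks.append({
            "id": f"{doc_id}#{len(chunks)}",
            "doc_id": doc_id,
            "text": piece,
            "meta": {"start": start, "end": start + len(piece)},
        })
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = chunk_fixed("doc-1", "abcdefghij", size=4, overlap=2)
```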


EvalHarness

The vectorized evaluation and simulation engine.

Methods

evaluate(chunks, qa)

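Performs the vectorized retrieval simulation, scoring candidates with batched matrix operations rather than a per-query loop. The cost still grows with the number of queries and chunks, so it is not constant-time.

  • Performance Note: Leverages BLAS-optimized matrix multiplication via NumPy for high-throughput evaluation of large candidate sets.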
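
A vectorized retrieval simulation of this kind can be sketched with plain NumPy: normalize the embeddings, score every query against every chunk in one matrix multiplication, then derive ranking metrics from the top-k indices. The `evaluate_sketch` function, its array-based signature, and the one-relevant-chunk-per-query assumption are illustrative only and do not reflect the real `evaluate(chunks, qa)` interface.

```python
import numpy as np


def evaluate_sketch(chunk_vecs: np.ndarray, query_vecs: np.ndarray,
                    gold: np.ndarray, k: int = 3) -> dict:
    """Score all queries against all chunks at once and report Recall@k and MRR@k.

    chunk_vecs: (n_chunks, dim); query_vecs: (n_queries, dim);
    gold: (n_queries,) index of the single relevant chunk per query (an assumption).
    """
    # L2-normalize so the dot product equals cosine similarity.
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)

    scores = q @ c.T  # one (n_queries, n_chunks) matrix multiplication
    k = min(k, scores.shape[1])
    topk = np.argsort(-scores, axis=1)[:, :k]  # best-first chunk indices per query

    hits = topk == gold[:, None]      # (n_queries, k) boolean matrix
    found = hits.any(axis=1)          # was the relevant chunk retrieved at all?
    rank = hits.argmax(axis=1) + 1    # 1-based rank; only meaningful where found
    rr = np.where(found, 1.0 / rank, 0.0)
    return {"recall": float(found.mean()), "mrr": float(rr.mean())}


# Toy data: three 2-D chunk embeddings and two queries.
chunk_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
query_vecs = np.array([[1.0, 0.1], [0.2, 1.0]])
gold = np.array([0, 0])  # both queries treat chunk 0 as the relevant one

result = evaluate_sketch(chunk_vecs, query_vecs, gold, k=2)
```

The matrix product costs on the order of n_queries × n_chunks × dim operations; the gain from vectorization is that this work runs inside a single optimized call instead of a Python-level loop.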