API Reference
Detailed technical specification for the AutoChunks core engine.
AutoChunker
The primary controller for the optimization search and deployment lifecycle.
class AutoChunker:
    def __init__(
        self,
        mode: str = "light",
        embedding_provider: str = "hashing",
        embedding_model_or_path: str = "BAAI/bge-small-en-v1.5",
        eval_config: Optional[EvalConfig] = None,
        # ...
    ):
Key Methods
optimize()
Runs the multi-objective search tournament across document configurations.
- Parameters:
  - documents (str | List[Dict]): Directory path or list of loaded document objects.
  - on_progress (Callable[[str, int], None]): Callback for real-time status updates and telemetry.
  - sweep_params (Dict): Overrides for hyperparameter ranges (sizes, overlaps).
- Returns:
  - Tuple[Plan, Dict]: The optimized Plan object and a comprehensive Report dictionary.
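The `on_progress` and `sweep_params` arguments can be sketched as follows. This is a minimal illustration under assumptions: the reference only gives the callback type `Callable[[str, int], None]`, so the two arguments are treated here as a status message and an integer progress value, and the `sweep_params` keys `chunk_sizes` and `overlaps` are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple


def make_progress_recorder() -> Tuple[Callable[[str, int], None], List[Tuple[str, int]]]:
    """Build a callback matching Callable[[str, int], None] plus its event log."""
    events: List[Tuple[str, int]] = []

    def on_progress(message: str, value: int) -> None:
        # Assumption: the first argument is a status message and the second an
        # integer progress value; the reference does not define their meaning.
        events.append((message, value))
        print(f"[{value:>3}] {message}")

    return on_progress, events


# Hypothetical override keys; the names accepted by optimize() may differ.
sweep_params: Dict[str, List[int]] = {
    "chunk_sizes": [256, 512, 1024],
    "overlaps": [0, 64],
}

on_progress, events = make_progress_recorder()
on_progress("loading documents", 10)  # simulate one update from the optimizer
```

With a real `AutoChunker` instance, both objects would presumably be passed as keyword arguments to `optimize()` alongside `documents`, and the returned tuple unpacked into the `Plan` and the report dictionary.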
Plan
Represents a portable, serialized optimization result.
Attributes
* generator_pipeline: The target chunking strategy name and validated hyperparameters.
* metrics: The expected performance profile (nDCG, MRR, Recall) recorded during optimization.
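For readers unfamiliar with the three metrics named above, the sketch below computes each one for a single query using their standard binary-relevance definitions. The function names and the sample chunk IDs are illustrative only; how AutoChunks computes or aggregates these values internally is not specified in this reference.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Illustrative data: chunk IDs in retrieved order, and the relevant set.
ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}

print(recall_at_k(ranked, relevant, k=3))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, k=4))
```

MRR is the mean of `reciprocal_rank` over all evaluation queries; recall and nDCG are likewise averaged per query.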
Methods
apply(docs_path, chunker)
Executes the strategy on a new corpus.
* Returns: List[Dict], a standard RAG-ready list of chunks with metadata.
EvalHarness
The vectorized evaluation and simulation engine.
Methods
evaluate(chunks, qa)
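Performs the vectorized retrieval simulation. The O(1) figure is best read as the number of Python-level scoring calls (all queries are scored against all chunks at once), not the arithmetic cost, which still grows with the number of queries and chunks.
* Performance Note: Leverages BLAS-optimized matrix multiplication via NumPy for high-throughput evaluation of large candidate sets.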
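The idea of scoring every query against every chunk in one matrix multiplication can be sketched with NumPy as below. This is not the `EvalHarness` implementation: the function name, the cosine-similarity scoring, and the assumption of exactly one relevant chunk per query are all choices made for the example. The arithmetic cost of the product is proportional to Q × C × d for Q queries, C chunks, and embedding dimension d.

```python
import numpy as np


def simulate_retrieval(query_vecs: np.ndarray, chunk_vecs: np.ndarray,
                       gold: np.ndarray, k: int) -> dict:
    """Score all queries against all chunks with one matrix multiplication.

    query_vecs: (Q, d) query embeddings; chunk_vecs: (C, d) chunk embeddings;
    gold: (Q,) index of the single relevant chunk per query (an assumption).
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    # Assumes no all-zero embedding rows.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

    scores = q @ c.T  # (Q, C) similarity matrix, computed in one call

    # 1-based rank of the gold chunk: one plus the number of chunks that
    # score strictly higher than it for the same query.
    gold_scores = scores[np.arange(len(gold)), gold]
    ranks = 1 + (scores > gold_scores[:, None]).sum(axis=1)

    return {
        "recall_at_k": float((ranks <= k).mean()),
        "mrr": float((1.0 / ranks).mean()),
    }


# Toy data: three chunks and two queries in a 2-dimensional embedding space.
chunk_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_vecs = np.array([[2.0, 0.1], [0.1, 3.0]])
gold = np.array([0, 2])

result = simulate_retrieval(query_vecs, chunk_vecs, gold, k=1)
print(result)
```

In the toy data the first query ranks its gold chunk first and the second ranks it second, so recall@1 is 0.5 and MRR is 0.75.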