Vector AI Search Modes
- exact: full candidate scoring and stable quality baseline.
- ivf: candidate pruning for higher throughput with recall tradeoff.
- auto: policy-based mode selection when workload behavior changes.
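The trade-off between the exact and ivf modes can be sketched with a generic inverted-file (IVF) index. This is a minimal illustration of the general technique, not AIONBD's implementation: the function names, the `nprobe` parameter, and the choice of the first few vectors as centroids are all assumptions made for the example.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_topk(vectors, query, k):
    """Exact mode: score every stored vector and keep the k closest."""
    order = sorted(range(len(vectors)), key=lambda i: (l2(vectors[i], query), i))
    return order[:k]


def build_ivf(vectors, centroids):
    """Assign each vector index to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_topk(vectors, centroids, lists, query, k, nprobe):
    """IVF mode: score only vectors in the nprobe lists closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: (l2(vectors[i], query), i))
    return candidates[:k]


def recall(approx, truth):
    """Fraction of the exact results that the approximate search also returned."""
    return len(set(approx) & set(truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(0)
    vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
    centroids = vectors[:16]  # hypothetical shortcut; real indexes usually train centroids
    lists = build_ivf(vectors, centroids)
    query = [rng.random() for _ in range(8)]
    truth = exact_topk(vectors, query, 10)
    for nprobe in (1, 4, 16):
        approx = ivf_topk(vectors, centroids, lists, query, 10, nprobe)
        print(nprobe, recall(approx, truth))
```

Probing fewer lists scores fewer candidates per query, which is where the throughput gain comes from, and it can miss true neighbors that were assigned to an unprobed list. Probing every list reproduces the exact result.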
Batch Query Strategy
For throughput-heavy Vector AI Search, use /search/topk/batch. Keep single-query endpoints for latency-sensitive, low-volume integration paths.
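As a rough sketch of the batching idea, the helper below groups query vectors into fixed-size chunks and serializes one request body per chunk, each intended for /search/topk/batch. The body shape (the "queries", "vector", and "k" fields) and the batch size are assumptions made for illustration; the actual request schema should be taken from the API reference.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_bodies(queries, k, batch_size):
    """Return one JSON body per batch; field names here are hypothetical."""
    bodies = []
    for chunk in chunked(queries, batch_size):
        payload = {"queries": [{"vector": q, "k": k} for q in chunk]}
        bodies.append(json.dumps(payload))
    return bodies


if __name__ == "__main__":
    queries = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
    for body in build_batch_bodies(queries, k=3, batch_size=2):
        # Each body would be sent as one POST to /search/topk/batch.
        print(len(body), "bytes")
```

Sending fewer, larger requests amortizes per-request overhead, while the batch size keeps each body bounded.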
Runtime Guards for Stability
- Set AIONBD_MAX_TOPK_LIMIT to control fanout cost.
- Set AIONBD_MAX_CONCURRENCY based on p95/p99 behavior.
- Set AIONBD_MAX_BODY_BYTES to prevent request spikes.
- Track fallback and cache metrics to tune IVF behavior.
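One way to reason about these guards is a client-side preflight check that reads the same variable names and rejects requests that would exceed them. The sketch below is only an illustration: the fallback values are made up, and how the server itself parses and enforces these variables is not described here.

```python
import os

# Hypothetical fallbacks for illustration only; they are not AIONBD defaults.
ASSUMED_DEFAULTS = {
    "AIONBD_MAX_TOPK_LIMIT": 100,
    "AIONBD_MAX_BODY_BYTES": 1_048_576,
}


def read_limit(name, env=None):
    """Read a positive integer limit from the environment, or use the assumed fallback."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return ASSUMED_DEFAULTS[name]
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


def check_request(k, body_bytes, env=None):
    """Return a list of human-readable violations; an empty list means the request fits."""
    violations = []
    max_k = read_limit("AIONBD_MAX_TOPK_LIMIT", env)
    max_body = read_limit("AIONBD_MAX_BODY_BYTES", env)
    if k > max_k:
        violations.append(f"k={k} exceeds AIONBD_MAX_TOPK_LIMIT={max_k}")
    if body_bytes > max_body:
        violations.append(f"body of {body_bytes} bytes exceeds AIONBD_MAX_BODY_BYTES={max_body}")
    return violations


if __name__ == "__main__":
    print(check_request(k=500, body_bytes=2048, env={"AIONBD_MAX_TOPK_LIMIT": "50"}))
```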
Practical Tuning Workflow
- Start with the default safety-first durability profile.
- Run benchmark and soak baselines with production-like data shapes.
- Tune read path knobs, then write path knobs, then runtime limits.
- Roll back quickly if the 5xx ratio or p95/p99 latency materially regresses.
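The rollback step in this workflow can be made mechanical by comparing a candidate run against the recorded baseline. The sketch below assumes what "materially regress" means: the thresholds (an absolute 5xx increase of 0.5 percentage points, a 20% latency increase) and the `Snapshot` type are hypothetical and should be replaced with values agreed for the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Summary metrics from one benchmark or soak run."""
    error_5xx_ratio: float  # fraction of responses with a 5xx status, 0.0 to 1.0
    p95_ms: float
    p99_ms: float


def should_roll_back(baseline, candidate,
                     max_5xx_increase=0.005, max_latency_ratio=1.2):
    """Return True if the candidate regresses past the assumed thresholds."""
    if candidate.error_5xx_ratio - baseline.error_5xx_ratio > max_5xx_increase:
        return True
    if candidate.p95_ms > baseline.p95_ms * max_latency_ratio:
        return True
    if candidate.p99_ms > baseline.p99_ms * max_latency_ratio:
        return True
    return False


if __name__ == "__main__":
    baseline = Snapshot(error_5xx_ratio=0.001, p95_ms=40.0, p99_ms=90.0)
    candidate = Snapshot(error_5xx_ratio=0.002, p95_ms=55.0, p99_ms=95.0)
    print(should_roll_back(baseline, candidate))
```

Applying the same check after each tuning change (read path, write path, runtime limits) keeps a regression attributable to a single knob.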
Reference: performance_tuning.md