AIONBD Blog

Vector AI Search at the Edge: Exact vs IVF

This post covers how to choose a Vector AI Search strategy for edge deployments: how the exact, IVF, and auto modes behave under production constraints, and when each one fits.

Vector AI Search Modes

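  • exact: scores every candidate vector. It has the highest per-query cost, but it returns the true top-k and serves as a stable quality baseline.
  • ivf: prunes the candidate set using an inverted-file partitioning, trading some recall for higher throughput.
  • auto: selects a mode by policy, so search behavior can adapt as the workload changes.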
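To make the exact/IVF tradeoff concrete, here is a minimal pure-Python sketch of the general techniques, not AIONBD's implementation. It assumes squared Euclidean distance, picks IVF centroids by sampling data points (a crude stand-in for k-means training), and uses a hypothetical `exact_below` size threshold as the auto policy. `nprobe`, the number of inverted lists scanned per query, is the recall/throughput knob.

```python
import heapq
import random

def sq_dist(a, b):
    # Squared Euclidean distance: smaller means more similar.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_topk(query, vectors, k):
    # Exact mode: score every stored vector and keep the k closest ids.
    scored = ((sq_dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]

def build_ivf(vectors, n_lists, seed=0):
    # Sample n_lists vectors as centroids (a crude stand-in for k-means),
    # then put each vector in the inverted list of its nearest centroid.
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), n_lists)]
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)
    return centroids, lists

def ivf_topk(query, vectors, centroids, lists, k, nprobe):
    # IVF mode: score only the vectors in the nprobe lists nearest the query.
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    scored = ((sq_dist(query, vectors[i]), i) for c in order[:nprobe] for i in lists[c])
    return [i for _, i in heapq.nsmallest(k, scored)]

def auto_topk(query, vectors, index, k, nprobe, exact_below=10_000):
    # Hypothetical auto policy: small collections are cheap to scan exactly.
    if len(vectors) < exact_below:
        return exact_topk(query, vectors, k)
    centroids, lists = index
    return ivf_topk(query, vectors, centroids, lists, k, nprobe)

def recall_at_k(approx, exact):
    # Fraction of the exact top-k that the approximate result recovered.
    return len(set(approx) & set(exact)) / len(exact)

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]
centroids, lists = build_ivf(data, n_lists=16)
exact = exact_topk(query, data, k=10)
approx = ivf_topk(query, data, centroids, lists, k=10, nprobe=2)
print(f"recall@10 with nprobe=2: {recall_at_k(approx, exact):.2f}")
```

Raising `nprobe` scans more lists and pushes recall toward 1.0; with `nprobe` equal to the number of lists, IVF degenerates to exact search at exact-search cost.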

Batch Query Strategy

For throughput-heavy Vector AI Search workloads, send queries in bulk through /search/topk/batch. Keep the single-query endpoints for low-volume integration paths where per-request latency matters most.
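A client feeding /search/topk/batch has to decide how many queries go into each request. The sketch below groups queries under both a count cap and a serialized-size cap, then builds (without sending) one POST per batch. The JSON shape (`queries`, `k`), the cap values, and the base URL are assumptions for illustration; check the actual batch endpoint schema before relying on them.

```python
import json
import urllib.request

def encode(batch, k):
    # Hypothetical payload shape; the real batch schema may differ.
    return json.dumps({"queries": batch, "k": k}).encode("utf-8")

def make_batches(queries, k, max_queries=64, max_body_bytes=1_000_000):
    # Greedily fill each batch until adding a query would break either cap.
    # A single query larger than max_body_bytes still gets its own batch.
    batches, current = [], []
    for q in queries:
        candidate = current + [q]
        if current and (len(candidate) > max_queries
                        or len(encode(candidate, k)) > max_body_bytes):
            batches.append(current)
            current = [q]
        else:
            current = candidate
    if current:
        batches.append(current)
    return [encode(b, k) for b in batches]

def batch_requests(base_url, payloads):
    # Build (but do not send) one POST per payload; base_url is a placeholder.
    return [
        urllib.request.Request(
            f"{base_url}/search/topk/batch",
            data=p,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for p in payloads
    ]

queries = [[0.1 * i, 0.2, 0.3] for i in range(150)]
payloads = make_batches(queries, k=10, max_queries=64)
requests_to_send = batch_requests("http://localhost:8080", payloads)
```

Sizing `max_body_bytes` below the server-side body limit keeps batches from being rejected as oversized.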

Runtime Guards for Stability

  • Set AIONBD_MAX_TOPK_LIMIT to cap the top-k fanout cost of each query.
  • Set AIONBD_MAX_CONCURRENCY based on observed p95/p99 latency.
  • Set AIONBD_MAX_BODY_BYTES to cap request body size and reject oversized payloads.
  • Track fallback and cache metrics to tune IVF behavior.
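Guards like these usually follow a common admission-control pattern. This sketch shows one way such limits can be enforced: it reads the three environment variables, rejects oversized bodies and excessive top-k, and sheds load with a non-blocking semaphore. The default values, the reject-rather-than-clamp choice, and the status codes are assumptions, not documented AIONBD behavior.

```python
import os
import threading

def read_limit(name, default):
    # Integer limit from the environment; the defaults are placeholders.
    return int(os.environ.get(name, default))

class Guards:
    def __init__(self):
        self.max_topk = read_limit("AIONBD_MAX_TOPK_LIMIT", 1000)
        self.max_body = read_limit("AIONBD_MAX_BODY_BYTES", 1_048_576)
        # Non-blocking acquire in run() means excess requests are shed, not queued.
        self.slots = threading.BoundedSemaphore(read_limit("AIONBD_MAX_CONCURRENCY", 32))

    def check(self, body, topk):
        # Returns an HTTP status for a rejected request, or None if admitted.
        if len(body) > self.max_body:
            return 413  # Payload Too Large
        if topk > self.max_topk:
            return 400  # assumption: reject rather than clamp
        return None

    def run(self, handler):
        # Runs handler inside a concurrency slot; returns (status, result).
        if not self.slots.acquire(blocking=False):
            return 503, None
        try:
            return 200, handler()
        finally:
            self.slots.release()

guards = Guards()
print(guards.check(b'{"k": 10}', topk=10))
print(guards.run(lambda: "search result"))
```

Rejecting immediately instead of queueing is one design choice: it trades some 503 responses for bounded tail latency when the concurrency cap is reached.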

Practical Tuning Workflow

  1. Start with the default safety-first durability profile.
  2. Record benchmark and soak-test baselines using production-like data shapes.
  3. Tune read-path knobs first, then write-path knobs, then runtime limits.
  4. Roll back quickly if the 5xx ratio or p95/p99 latency regresses materially.
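The rollback step needs a concrete definition of "materially regress". Here is a minimal sketch that compares a tuned run against the baseline, assuming nearest-rank percentiles and hypothetical tolerances (10% p95/p99 growth, or 0.5 percentage points more 5xx responses) that you would replace with your own SLOs.

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the value at rank ceil(p * n / 100) in sorted order.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(latencies_ms, status_codes):
    # Tail latency plus the share of responses in the 5xx range.
    errors = sum(1 for s in status_codes if 500 <= s <= 599)
    return {
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "err5xx": errors / len(status_codes),
    }

def should_roll_back(baseline, candidate, latency_slack=1.10, err_slack=0.005):
    # Hypothetical tolerances: >10% p95/p99 growth or +0.5 points of 5xx ratio.
    if candidate["err5xx"] > baseline["err5xx"] + err_slack:
        return True
    return any(candidate[q] > baseline[q] * latency_slack for q in ("p95", "p99"))

baseline = summarize([10, 12, 11, 13, 40] * 20, [200] * 100)
tuned = summarize([9, 11, 10, 12, 60] * 20, [200] * 99 + [503])
print(should_roll_back(baseline, tuned))
```

Nearest-rank is only one percentile definition; whichever you use, compute baseline and candidate the same way so the comparison is fair.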

Reference: performance_tuning.md