Vector Databases: The Hidden Engine Behind Modern AI Applications
Discover how vector databases power the next generation of AI, enabling smarter search, personalized recommendations, and context-aware chatbots.

You probably don't need one yet
Most of the "you need a vector database" advice you'll read online is selling you infrastructure you don't have a problem for. I've shipped maybe a dozen AI features for clients, and on more than half of them the right answer was a Postgres extension and 40 lines of code. Sometimes it was a SQLite file. Reach for a managed vector DB when you've actually measured a reason to, not because a blog post told you Pinecone is "production-grade."
So before any of that, here's the one moment where a vector database earns its keep: you have content that people search by meaning, not by keyword, and there's enough of it that brute-forcing every comparison on every query gets slow. That's it. If you don't have both halves of that sentence, keep reading anyway, because the rest of this explains how to tell.
Embeddings, briefly
A traditional database is great at exact matching. Search "red running shoes" and it finds rows containing those words. But a customer who types "crimson sneakers" gets nothing, even though they want the same product. Keyword search has no idea those mean the same thing.
Embeddings fix that. An embedding model turns content into a vector: a list of floats, usually a few hundred to a couple thousand of them. text-embedding-3-small, say, does this for text, and other models handle images or audio. The trick is that similar meanings land near each other in that space. "Crimson sneakers" and "red running shoes" end up as nearby vectors even though they share zero words.
To find matches, you measure how close two vectors are, usually with cosine similarity, and grab the closest ones. That's the whole game: embed your data, embed the query, return the nearest neighbors.
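Here is a minimal sketch of that comparison in plain Python. The 3-dimensional vectors are invented stand-ins for real embeddings, which have hundreds or thousands of dimensions. The sketch also shows cosine distance, which is just 1 minus the similarity. That distance is what pgvector's `<=>` operator returns, which is why the pgvector query further down orders by `<=>` ascending and computes `1 - distance` to get a similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product over the product of the lengths: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Smaller means closer. This is the quantity pgvector's <=> operator returns."""
    return 1.0 - cosine_similarity(a, b)


# Invented 3-D stand-ins for real embeddings, purely for illustration.
red_running_shoes = [0.9, 0.1, 0.2]
crimson_sneakers = [0.8, 0.2, 0.25]
tax_invoice = [0.1, 0.9, 0.1]

print(cosine_similarity(red_running_shoes, crimson_sneakers))  # high: similar meaning
print(cosine_similarity(red_running_shoes, tax_invoice))       # low: unrelated
```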
A vector database is just storage plus an index built for that one question — "what's closest to this vector?" — at scale. The "at scale" part is the only reason the index exists. With a few thousand vectors you can compute every distance on every query and never notice. The math doesn't change; the bookkeeping does.
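To make "compute every distance on every query" concrete, here is a sketch of exact (brute-force) nearest-neighbor search in plain Python. It normalizes each vector once when it is added, so cosine similarity reduces to a dot product at query time. Then it scores the query against every stored vector. The `BruteForceIndex` class and the toy data are illustrative and do not come from any library.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]


class BruteForceIndex:
    """Exact k-nearest-neighbor search: compares the query to every stored vector."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        # Normalize once at insert time instead of on every query.
        self.ids.append(doc_id)
        self.vectors.append(normalize(vector))

    def search(self, query: list[float], k: int = 5) -> list[tuple[str, float]]:
        q = normalize(query)
        scored = (
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in zip(self.ids, self.vectors)
        )
        # nlargest keeps only the top k instead of sorting every score.
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


index = BruteForceIndex()
index.add("red-running-shoes", [0.9, 0.1, 0.2])
index.add("floral-sundress", [0.2, 0.8, 0.3])
index.add("tax-invoice", [0.1, 0.1, 0.9])
print(index.search([0.8, 0.2, 0.25], k=2))
```

Each query costs one dot product per stored vector, so the work grows linearly with the corpus. That is unnoticeable at a few thousand vectors, and it is exactly the cost an index like HNSW exists to avoid.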
Why everyone's talking about them now
RAG. Retrieval-Augmented Generation is what put vector databases on every roadmap.
An LLM knows the public internet up to its training cutoff. It does not know your client's Shopify inventory from this morning, their internal runbooks, or a given customer's order history. You can't retrain a frontier model every time a row changes, and fine-tuning is the wrong tool for "keep facts current."
RAG sidesteps it:
- Embed your knowledge base, store the vectors.
- At query time, embed the user's question and pull the most relevant chunks.
- Hand those chunks plus the question to the LLM as context.
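The third step can be sketched as a small function. It assumes retrieval has already returned the top chunks ordered best-first. `build_rag_prompt` and its instruction wording are illustrative, not a fixed recipe. The character budget is also a crude stand-in for counting tokens with your model's tokenizer.

```python
def build_rag_prompt(question: str, chunks: list[str], max_context_chars: int = 6000) -> str:
    """Combine retrieved chunks and the user's question into one prompt string.

    `chunks` is assumed to be ordered best-first (e.g. from a nearest-neighbor
    search); lower-ranked chunks are dropped once the character budget is spent.
    """
    context_parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk.strip()}"  # numbered so answers can cite sources
        if used + len(block) > max_context_chars:
            break
        context_parts.append(block)
        used += len(block)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_rag_prompt(
    "How do I process a refund?",
    ["<refund policy chunk from your store>", "<related FAQ chunk>"],
)
print(prompt)
```

Telling the model to admit when the context lacks an answer is what keeps retrieved text, not invention, in charge of the reply.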
The model answers from your data instead of guessing, which is most of how you cut hallucinations on a support bot. I went deep on the moving parts of this in implementing RAG for a custom AI knowledge base — the retrieval quality matters far more than which database you pick, and that's the part people skip.
Where I've actually used this
Semantic search on a store. Customer types "summer vibes dress," the product copy says "floral yellow sundress," and keyword search whiffs. Vector search connects the intent to the product. This is the clearest win, but mind the trap: pure vector search can be too loose and surface vaguely-related junk. In practice I usually run hybrid — keyword (BM25) and vector together — and merge the scores. Vector-only search looks magic in a demo and frustrates real shoppers who typed an exact SKU.
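The post doesn't prescribe a formula for merging keyword and vector results. One widely used option, swapped in here as an illustration, is reciprocal rank fusion (RRF). RRF combines ranks rather than raw scores, so you never have to put BM25 scores and cosine similarities on the same scale. The constant `k = 60` is the value commonly used with RRF, not something tuned for this example. The product IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several best-first ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A larger k flattens the bonus for top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["SKU-1042", "sundress-yellow", "sandal-tan"]        # e.g. from BM25
vector_hits = ["sundress-yellow", "sundress-floral", "SKU-1042"]     # e.g. from pgvector
print(reciprocal_rank_fusion([keyword_hits, vector_hits])[:3])
```

A document that ranks well in both lists rises to the top, and an exact SKU hit from the keyword side still surfaces even if the vectors barely noticed it.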
Recommendations. Store a user's behavior as a vector, then retrieve semantically similar products. That matches on features and style more closely than a rigid category tree can. It's useful, but it's a recommender, not a search box: treat it as one and evaluate it like one.
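"Store a user's behavior as a vector" can be done in several ways. One simple approach, assumed here, is to average the embeddings of the products a user has viewed or bought, then rank the rest of the catalog against that profile vector. The function names and the toy catalog are hypothetical. At real scale you would send the profile vector through the vector index instead of looping in Python.

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(history_ids: list[str], catalog: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank unseen products by similarity to the average of the user's history."""
    profile = mean_vector([catalog[pid] for pid in history_ids])
    seen = set(history_ids)
    candidates = [(pid, cosine(profile, vec)) for pid, vec in catalog.items() if pid not in seen]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Toy embeddings, invented for illustration.
catalog = {
    "red-shoes": [1.0, 0.0, 0.0],
    "crimson-sneakers": [0.9, 0.1, 0.0],
    "trail-runners": [0.7, 0.3, 0.1],
    "wool-scarf": [0.0, 0.0, 1.0],
}
print(recommend(["red-shoes"], catalog, k=2))
```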
Support chatbots. A user asks "how do I process a refund?" and you retrieve the actual policy chunk before the model writes a word. The LLM phrases the answer; the vector store makes sure it's your policy and not a plausible invention. If you're wiring one of these up end to end, I walked through the full build in how to build an AI chatbot for your website.
The options, and when each one's the right call
- pgvector (Postgres extension). Start here if you're already on Postgres, which for me is most projects. You get vector similarity search inside the database that already holds your users, orders, and metadata, so you can filter by `user_id` or `in_stock` in the same `WHERE` clause as your similarity search. No second system to deploy, back up, or keep in sync. It comfortably handles hundreds of thousands of vectors, often into the low millions with an HNSW index. For the vast majority of web apps, this is the answer and you can stop shopping.
- Qdrant. When I do outgrow pgvector, this is usually where I go first. Open source, fast, sane filtering, runs in a container you control. Good middle ground between "extension on my existing DB" and "someone else's cloud."
- Weaviate. Open source, strong multi-modal story (text + images), built-in vectorization modules if you want the DB to call the embedding model for you.
- Milvus. Built for genuinely large scale. If you're indexing tens of millions of vectors and up, it's a serious option. Most projects never get here.
- Pinecone. Fully managed, very little to operate, scales without you thinking about it. Fast to start.
An aside on reaching for Pinecone
Here's the unpopular take: a lot of teams stand up Pinecone for a dataset that would fit in pgvector with room to spare. You've now got a second piece of infrastructure, a separate bill, an extra network hop on every query, and — the part that actually bites — your vectors live in one system while the metadata you need to filter on lives in Postgres. So you either duplicate metadata into Pinecone and fight to keep it in sync, or you do a two-step dance: query Pinecone for IDs, then query Postgres for the rows. Both are annoying, and both are work you invented.
If you can run SELECT ... ORDER BY embedding <=> $1 LIMIT 10 against a table that already has your data and your filters, do that. Move to a dedicated vector DB when you've measured pgvector falling over — query latency climbing under load, index size outgrowing the box — not because a managed service felt more "real." I've watched the managed-first instinct add a month of integration work to projects that needed an afternoon.
That said, when you genuinely have the scale or you don't want to run anything yourself, managed is a fine choice. The mistake isn't using Pinecone. It's using it before you have the problem it solves.
A word on HNSW
When people say a vector DB is "indexed," they almost always mean HNSW (Hierarchical Navigable Small World graphs). It's the default in pgvector, Qdrant, and most of the others, and one thing about it is worth knowing: it's approximate. HNSW trades a little recall for a lot of speed. You might not get the literal top-10 nearest neighbors; you get a very likely top 10, fast.
For search and recommendations, that tradeoff is usually invisible and well worth it. The knobs that matter:
- `m`: connections per node. Higher means better recall and a bigger index.
- `ef_construction`: effort at build time. Higher means a better graph and slower inserts.
- `ef_search`: effort at query time. Higher means better recall and slower queries.
You don't tune these on day one. You tune them when recall or latency becomes a measured problem, and not before.
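Measuring recall takes only a small set of real queries. Run each one through the HNSW index and through an exact scan, then compare the two result lists. In pgvector, one way to get the exact side is to disable index scans for that session so the planner falls back to a sequential scan. Re-running with a higher `hnsw.ef_search` then shows what that knob buys you. The helpers below are a hypothetical sketch that only compares ID lists you have already collected.

```python
def recall_at_k(approx_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k])
    # If the corpus has fewer than k rows, compare against what exists.
    return len(truth & found) / len(truth) if truth else 1.0


def mean_recall(pairs: list[tuple[list[str], list[str]]], k: int) -> float:
    """Average recall@k over (approximate_ids, exact_ids) pairs, one per test query."""
    if not pairs:
        raise ValueError("need at least one query")
    return sum(recall_at_k(approx, exact, k) for approx, exact in pairs) / len(pairs)


# Each pair: IDs from the HNSW query, IDs from an exact scan of the same query.
print(mean_recall([(["a", "b", "x"], ["a", "b", "c"]), (["d", "e", "f"], ["d", "e", "f"])], k=3))
```

If mean recall stays high at the default settings, leave the knobs alone. That's the "measured problem" test in practice.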
The actual workflow
End to end it's four steps, and none of them are hard:
- Chunk. Split long docs into segments. Chunking strategy affects retrieval quality more than your database choice does, so spend your time here.
- Embed. Run each chunk through an embedding API (OpenAI, Cohere, or a local model).
- Upsert. Store the vector alongside its metadata.
- Query. Embed the incoming query, run a nearest-neighbor search, return the top matches.
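The post doesn't pin down a chunking strategy. The simplest reasonable baseline, sketched here, is fixed-size word windows with a small overlap, so text near a boundary appears in both neighboring chunks. The `chunk_words` name and the 200/40 defaults are placeholders, not recommendations. For structured docs, splitting on headings or paragraphs often retrieves better.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    chunks: list[str] = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_words(doc, size=4, overlap=1))
```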
Here's the whole thing in pgvector, which is honestly all most apps need:
```sql
-- one-time setup
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  metadata jsonb,
  embedding vector(1536) -- matches text-embedding-3-small
);

-- HNSW index for fast approximate search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
```

```python
import openai
import psycopg


def embed(text: str) -> list[float]:
    # Must be the same model that produced the stored vectors.
    resp = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return resp.data[0].embedding


def search(conn: psycopg.Connection, query: str, k: int = 5):
    q = embed(query)
    with conn.cursor() as cur:
        # <=> is cosine distance in pgvector; smaller is closer,
        # so 1 - distance gives a similarity where larger is closer.
        # With an HNSW index the WHERE filter runs after the index scan,
        # so a very selective filter can return fewer than k rows.
        cur.execute(
            """
            SELECT content, metadata,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            WHERE (metadata->>'in_stock')::bool = true
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (q, q, k),
        )
        return cur.fetchall()
```

Note the WHERE clause filtering on in_stock in the same query as the similarity search. That's the pgvector advantage in one line: your vectors and your business filters live together, with no syncing and no two-step lookup against a separate store.
If you want to see this kind of retrieval wired into full agent workflows — tools, multi-step reasoning, the works — the awesome LLM apps, AI agents and RAG guide has runnable examples worth stealing from.
What actually breaks in production
The demo works. Here's the short list of what goes sideways once real traffic hits it:
- Embedding model mismatch. Your stored vectors and your query vectors must come from the same model. Swap the embedding model and forget to re-embed everything, and every search silently returns garbage. No error, just bad results. Pin the model version.
- Dimension mismatch. `text-embedding-3-small` returns 1536 dimensions by default, and your column is declared for one fixed size. Change models, change dimensions, and inserts fail. At least that one gets caught fast.
- Pure-vector retrieval being too fuzzy. Already mentioned, worth repeating. If exact-match queries matter (SKUs, names, error codes), add keyword search alongside the vectors.
- Re-embedding cost on updates. Every time source content changes, that chunk needs a fresh embedding, which is an API call and a bill. Batch it. Don't re-embed on every tiny edit.
- Latency from too many dimensions or too-high `ef_search`. If queries feel slow, measure before you blame the database. Usually it's a knob, not the engine.
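A few lines at the write path can make several of these failures loud instead of silent. The sketch below rests on some assumptions. `embed_batch` is a hypothetical callable you would implement on top of your provider; OpenAI's embeddings endpoint accepts a list of inputs, so one request can cover a whole batch. `EMBEDDING_MODEL` and `EMBEDDING_DIM` are a hypothetical single place to pin the model. Storing the model name with each row (say, in `metadata`) also lets you find stale vectors after a model change.

```python
from collections.abc import Callable, Iterator, Sequence

# Hypothetical single source of truth for the index's vector space.
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 1536  # must match the vector(1536) column

# embed_batch(model, texts) -> one vector per text, in the same order.
EmbedBatch = Callable[[str, list[str]], list[list[float]]]


def batched(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


def check_vector(vector: Sequence[float], model: str) -> None:
    """Refuse vectors from the wrong model or with the wrong dimension."""
    if model != EMBEDDING_MODEL:
        raise ValueError(f"embedded with {model!r}, index expects {EMBEDDING_MODEL!r}")
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"got {len(vector)} dims, column expects {EMBEDDING_DIM}")


def embed_all(
    texts: Sequence[str],
    embed_batch: EmbedBatch,
    model: str = EMBEDDING_MODEL,
    batch_size: int = 100,
) -> list[list[float]]:
    """Embed texts with one call per batch, validating every vector before it is stored."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        result = embed_batch(model, batch)
        if len(result) != len(batch):
            raise ValueError("embedding call returned the wrong number of vectors")
        for vec in result:
            check_vector(vec, model)
        vectors.extend(result)
    return vectors
```

Calling `check_vector` on the query path too means a half-finished model swap fails with an error instead of quietly returning unrelated results.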
Before you call a retrieval feature done: search for something you know is in the corpus and confirm it comes back ranked first. Search for something that isn't there and confirm you don't get confident nonsense. Those two checks catch most of the embarrassing failures.
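Those two checks are easy to keep as an automated test. This is a sketch that assumes `search` is a thin wrapper around your own query returning `(id, similarity)` pairs best-first. The pgvector `search` function in this post returns `(content, metadata, similarity)` rows, so you would map those. The `min_similarity = 0.5` cutoff is a hypothetical placeholder. Similarity ranges vary by embedding model, so calibrate it on your own data.

```python
from collections.abc import Callable

# search(query, k) -> [(doc_id, similarity), ...] ordered best-first.
SearchFn = Callable[[str, int], list[tuple[str, float]]]


def retrieval_smoke_test(
    search: SearchFn,
    known_query: str,
    expected_id: str,
    absent_query: str,
    min_similarity: float = 0.5,
) -> list[str]:
    """Run the two sanity checks and return a list of problems (empty means pass)."""
    problems: list[str] = []

    # Check 1: something we know is in the corpus should rank first.
    hits = search(known_query, 5)
    if not hits or hits[0][0] != expected_id:
        top = hits[0][0] if hits else None
        problems.append(f"{known_query!r}: expected {expected_id!r} first, got {top!r}")

    # Check 2: something absent should not come back with a confident score.
    junk = search(absent_query, 5)
    if junk and junk[0][1] >= min_similarity:
        problems.append(
            f"{absent_query!r}: top hit {junk[0][0]!r} scored {junk[0][1]:.2f}, "
            f"at or above the {min_similarity} cutoff"
        )
    return problems
```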
FAQ
Do I really need a vector database, or is Postgres enough? For most apps, pgvector inside your existing Postgres is enough — into the hundreds of thousands of vectors, often more. Move to a dedicated system when you've measured it falling over, not before.
Is pgvector slower than Pinecone? At small to medium scale you won't notice a difference that matters, and pgvector saves you a network hop and a sync problem because your data and filters already live there. At very large scale a purpose-built engine pulls ahead. Benchmark with your data and load before deciding; don't trust a generic number, including mine.
What embedding model should I use? text-embedding-3-small is a sensible, cheap default and where I start. Whatever you pick, pin the version and use the exact same model for storing and querying.
HNSW or exact search? Exact (flat) search is fine up to a few thousand vectors and returns the true nearest neighbors. Past that, HNSW's approximate results are faster and the lost recall is invisible for search and recommendations.
How do I improve bad search results? Look at chunking and retrieval quality before touching the database. Better chunks and hybrid keyword-plus-vector search fix more problems than swapping vendors ever will.
Want this built for you instead of DIY?
I'm Karan — a Top Rated Plus Shopify Expert ($300K+ earned, 100% Job Success). If you'd rather hand this to someone who's done it hundreds of times, let's talk.


