RAG and Vector Databases
RAG (retrieval-augmented generation) grounds LLM responses in retrieved, real-world data, which reduces hallucinations and enables knowledge-intensive applications. This section covers the full RAG stack: vector storage, retrieval engines, graph RAG, document parsing, and embedding models.

Vector Databases
| Name | Description | Links |
| --- | --- | --- |
| Milvus | Cloud-native, distributed vector database for billion-scale similarity search. Most scalable open-source vector DB. 44k+ stars. | milvus.io |
| Qdrant | High-performance vector search engine in Rust. Dense + sparse vectors, hybrid search, payload filtering, quantization. 30k+ stars. | qdrant.tech |
| Weaviate | AI-native vector database with GraphQL API, built-in vectorization, hybrid search, and generative search modules. 16k+ stars. | weaviate.io |
| Chroma | AI-native open-source embedding database. Extremely easy to start, built-in embedding functions. 16k+ stars. | trychroma.com |
| pgvector | PostgreSQL extension for vector similarity search. HNSW + IVFFlat indexes. Game-changer for Postgres users. 14k+ stars. | GitHub |
| LanceDB | Serverless vector database built on Lance columnar format. Embedded, multimodal, full-text + vector hybrid search. | lancedb.com |
| Meilisearch | Lightning-fast search engine API with AI-powered hybrid search (full-text + vector). 57k+ stars. | meilisearch.com |
| Vespa | Yahoo's battle-tested big data serving engine. ANN + lexical hybrid, tensor expressions, complex ML ranking. | vespa.ai |
| Pinecone | Managed vector database for high-performance vector search at scale. | Docs |
| Zilliz | Cloud-native vector database service (managed Milvus). | zilliz.com |
| Turbopuffer | Serverless vector database using object storage (S3) for cost-efficient large-scale search. | Docs |
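
At their core, the databases above answer nearest-neighbor queries over embedding vectors. The following is a minimal brute-force sketch in plain Python, assuming toy 3-dimensional vectors and hypothetical document IDs (`doc-a`, `doc-b`, `doc-c`). It is not any listed product's API; production systems replace the full scan with approximate indexes such as HNSW or IVFFlat.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: document ID -> embedding.
toy_index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

A full scan costs one similarity computation per stored vector, which is why the billion-scale systems in the table rely on approximate indexes instead.
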
Retrieval Engines
| Name | Description | Links |
| --- | --- | --- |
| RAGFlow | Leading open-source RAG engine with vision-based document parsing (tables, figures, layouts). Multi-recall: vector + full-text + knowledge graph. 78k+ stars. | ragflow.io |
| AnythingLLM | All-in-one AI app with RAG, agents, MCP support. Use any LLM, any document, any vector database. Privacy-first. 58k+ stars. | useanything.com |
| Kotaemon | Open-source RAG document QA tool with chat UI, graph RAG, multi-modal support, and citation highlighting. 22k+ stars. | GitHub |
| Pathway LLM App | Ready-to-run cloud templates for RAG and AI pipelines with live data sync (Sharepoint, S3, Kafka, Postgres). 60k+ stars. | pathway.com |
| Cognee | Memory management for AI agents and apps. Builds knowledge graphs from documents for reasoning-based RAG. 15k+ stars. | cognee.ai |
| R2R (SciPhi) | Production RAG engine with hybrid search, knowledge graph building (Neo4j), ingestion pipelines, and REST API. | Docs |
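
Several engines above combine results from more than one retriever (vector, full-text, knowledge graph). One common way to merge such ranked lists is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique, not a description of how any listed engine fuses results; the document IDs and the `k=60` constant are assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from two retrievers for the same query.
vector_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF uses only rank positions, so it needs no score normalization between retrievers whose raw scores live on different scales.
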
Graph RAG
| Name | Description | Links |
| --- | --- | --- |
| Microsoft GraphRAG | Landmark Graph-based RAG: extracts knowledge graphs from documents, performs global queries via community summaries. 32k+ stars. | Paper, Docs |
| LightRAG | Faster, simpler alternative to GraphRAG. Dual-level retrieval (entity + thematic) with incremental knowledge graph updates. 15k+ stars. | GitHub |
| PageIndex | Document index for vectorless, reasoning-based RAG. Bypasses traditional embedding for reasoning-first retrieval. 25k+ stars. | GitHub |
| Neo4j GraphRAG | Official Neo4j library for GraphRAG pipelines. KG construction, hybrid retrieval (vector + Cypher). | Docs |
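
Graph-based retrieval typically expands from entities mentioned in a query to related entities in a knowledge graph. The sketch below shows only that expansion step, as a bounded breadth-first search. The graph, the entity names, and the `neighbors_within` helper are hypothetical, and none of the libraries above are claimed to work this way internally (Microsoft GraphRAG, for instance, also relies on community summaries).

```python
from collections import deque


def neighbors_within(graph, start, max_hops):
    """Return {entity: hop_distance} for entities reachable within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Hypothetical entity graph: entity -> directly related entities.
toy_graph = {
    "acme-corp": ["widget-x", "jane-doe"],
    "widget-x": ["patent-123"],
    "jane-doe": [],
    "patent-123": ["court-case-9"],
}

hops = neighbors_within(toy_graph, "acme-corp", max_hops=2)
print(hops)
```

Text attached to the returned entities would then be passed to the LLM as context; the hop limit keeps that context bounded.
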
Document Parsing for RAG
| Name | Description | Links |
| --- | --- | --- |
| Docling | IBM's fast document conversion: PDFs with layout understanding, table recognition (TableFormer), figure extraction. 20k+ stars. | Docs |
| Unstructured | Document parsing ETL for RAG. Extracts from PDFs, DOCX, HTML, images, and 30+ formats. 10k+ stars. | unstructured.io |
| PaddleOCR | Turn any PDF or image into structured data for AI. Supports 100+ languages. 75k+ stars. | GitHub |
Advanced RAG Techniques
| Technique | Description | Links |
| --- | --- | --- |
| Self-RAG | LLMs that reflect on retrieval decisions using special tokens. Adaptive retrieval: retrieve only when needed. | Paper, Code |
| CRAG (Corrective RAG) | Adds a retrieval evaluator and a correction mechanism. Falls back to web search if retrieved docs are irrelevant. | Paper |
| ColPali (Visual RAG) | Indexes document pages as images using vision-language models, so retrieval does not depend on OCR output. | Paper, Code |
| HippoRAG | Neurobiologically-inspired RAG using knowledge graphs mimicking hippocampal memory for multi-hop reasoning. | Code |
| RAG Techniques Guide | Comprehensive tutorial repository showcasing 20+ advanced RAG techniques with notebooks. 27k+ stars. | GitHub |
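
The corrective pattern described for CRAG (score the retrieved documents, fall back to web search when none are relevant) can be sketched in a few lines. This is a simplified two-way illustration under stated assumptions, not the paper's implementation: the word-overlap evaluator, the `0.5` threshold, and all `toy_*` functions are hypothetical stand-ins for a trained evaluator, a real retriever, and a real search API.

```python
def corrective_retrieve(query, retrieve, evaluate, web_search, threshold=0.5):
    """Keep retrieved docs that pass the evaluator; otherwise fall back."""
    docs = retrieve(query)
    relevant = [doc for doc in docs if evaluate(query, doc) >= threshold]
    if relevant:
        return relevant, "retrieved"
    return web_search(query), "web_search"


def toy_retrieve(query):
    # Hypothetical retriever that always returns the same two documents.
    return ["pgvector adds vector search to postgresql", "bananas are yellow"]


def toy_evaluate(query, doc):
    # Hypothetical evaluator: fraction of query words found in the document.
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def toy_web_search(query):
    # Hypothetical stand-in for a web search call.
    return [f"web result for: {query}"]


docs, source = corrective_retrieve(
    "vector search postgresql", toy_retrieve, toy_evaluate, toy_web_search
)
fallback_docs, fallback_source = corrective_retrieve(
    "quantum gravity", toy_retrieve, toy_evaluate, toy_web_search
)
print(source, docs)
print(fallback_source, fallback_docs)
```

Passing the retriever, evaluator, and search function as arguments keeps the control flow separate from whichever components are actually used.
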
Embedding Models (2025-2026)
| Model | Provider | Key Notes | Links |
| --- | --- | --- | --- |
| text-embedding-3-large | OpenAI | 3072 dims (Matryoshka), strong general performance. | Docs |
| embed-v4.0 | Cohere | Multimodal (text+image), 128K context, 100+ languages. | Docs |
| voyage-3 | Voyage AI | Top MTEB scores (esp. code/finance/law), 32K context. Recommended in Anthropic's documentation. | Docs |
| NV-Embed-v2 | NVIDIA | #1 MTEB overall at release. Decoder-only LLM architecture. 4096 dims. | HuggingFace |
| nomic-embed-text-v1.5 | Nomic AI | Fully open (weights + data + code). Matryoshka dimensions. 8K context. | HuggingFace |
| jina-embeddings-v3 | Jina AI | Task-specific LoRA adapters, Matryoshka, multilingual. | HuggingFace |
| bge-m3 | BAAI | Multilingual (100 langs), multi-functionality: dense + sparse + ColBERT. | HuggingFace |
| gte-Qwen2-7B-instruct | Alibaba | LLM-based embedding, top MTEB, instruction-tuned, 32K context. | HuggingFace |
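
Several models above are marked "Matryoshka", meaning their embeddings can be shortened by keeping only the leading dimensions. A minimal sketch of that truncation step, assuming a made-up 4-dimensional vector and a hypothetical `truncate_embedding` helper. Re-normalizing after truncation keeps cosine and dot-product scores comparable, and the trick is only meaningful for models trained with Matryoshka representation learning.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` values and rescale the result to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalize
    return [x / norm for x in head]


# Made-up embedding values, for illustration only.
full_embedding = [3.0, 4.0, 12.0, 84.0]

short_embedding = truncate_embedding(full_embedding, 2)
print(short_embedding)
```

Shorter vectors cut storage and search cost in the vector database, usually at some loss in retrieval quality.
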