NVIDIA’s new take on Enterprise RAG
Very few efforts to build RAG for enterprise scenarios are documented. This work from NVIDIA is a breath of fresh air!
The NVIDIA team builds 3 enterprise bots - Info, Help, Scout (last two in prod) to understand thoroughly various enterprise RAG issues. Most learnings are by now common, but there are some surprises.
👉 They decompose the space, introducing the FACTS framework.
❖ content Freshness,
❖ flexibility in RAG architecture ,
❖ Cost management,
❖ plan for Testing,
❖ Security of data/logs
Usual Stuff
❖ many control points. not 7 or 12 but 15!
❖ combine lexical and semantic search(dense and sparse embeddings)
❖ incorporate section headings in chunks
❖ sample RAG arch includes query decomposition, milvus for scale,
❖ guardrails to reduce risks, sensitive data filters
❗Some surprises
❖ e5 embedding finetuning didn't help much
❖ no need to (hard to?) choose one between a narrow vs a generalized enterprise bot.
❖ a dedicated company wide LLM gateway. cost management and auditing.
❖ prompt change testing. dont hear this mentioned too often.
What I liked most is that they are very candid about what worked, what failed and work-in-progress, without succumbing to the RAG hype.
arxiv.org/pdf/2407.07858
Read our full post on Enterprise RAG