Nishant Sinha on Substack

NVIDIA’s new take on Enterprise RAG

Very few efforts to build RAG for enterprise scenarios are documented. This work from NVIDIA is a breath of fresh air!

The NVIDIA team built three enterprise bots (Info, Help, and Scout; the last two are in production) to study enterprise RAG issues in depth. Most of the learnings are common knowledge by now, but there are some surprises.

👉 They decompose the space, introducing the FACTS framework.

❖ content Freshness,

❖ flexibility in RAG Architecture,

❖ Cost management,

❖ plan for Testing,

❖ Security of data/logs

Usual Stuff

❖ many control points: not 7 or 12, but 15!

❖ combine lexical and semantic search (sparse and dense embeddings)

❖ incorporate section headings in chunks

❖ sample RAG architecture includes query decomposition and Milvus for scale

❖ guardrails to reduce risks, sensitive data filters
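
Two of the points above (section headings in chunks, and combining lexical with semantic results) are easy to show in a few lines. This is only a rough sketch, not what the NVIDIA team ran: I am assuming reciprocal rank fusion as the way to merge the two result lists, and the function names, the `max_words` limit, and the constant `k = 60` are my own illustrative choices.

```python
from collections import defaultdict


def chunk_with_heading(heading: str, body: str, max_words: int = 50) -> list[str]:
    """Split a section body into word-bounded chunks, each prefixed with its heading."""
    words = body.split()
    return [
        f"{heading}\n{' '.join(words[i:i + max_words])}"
        for i in range(0, len(words), max_words)
    ]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1);
    ties are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a lexical (sparse) and a semantic (dense) retriever.
lexical_hits = ["a", "b", "c"]
semantic_hits = ["b", "c", "d"]
print(reciprocal_rank_fusion([lexical_hits, semantic_hits]))  # ['b', 'c', 'a', 'd']
```

Documents that both retrievers agree on ("b", "c") rise above documents only one of them found, which is the whole point of going hybrid.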

❗Some surprises

❖ fine-tuning the e5 embedding model didn't help much

❖ no need to (hard to?) choose between a narrow and a generalized enterprise bot

❖ a dedicated company-wide LLM gateway for cost management and auditing

❖ prompt change testing. I don't hear this mentioned too often.
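
To make the last two points concrete, here is a toy sketch of a gateway that every model call goes through, plus a prompt regression check that runs through it. Everything here is my own assumption for illustration: the `LLMGateway` and `prompt_regressions` names, the word count used as a stand-in for tokens, the made-up price, and the fake backend. The paper does not prescribe this design.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LLMGateway:
    """Single entry point for model calls: keeps an audit trail and a running cost."""
    backend: Callable[[str], str]       # hypothetical model client: prompt -> completion
    usd_per_1k_words: float = 0.01      # made-up price; words stand in for tokens
    audit_log: list[dict] = field(default_factory=list)
    total_cost_usd: float = 0.0

    def complete(self, team: str, prompt: str) -> str:
        answer = self.backend(prompt)
        words = len(prompt.split()) + len(answer.split())
        cost = words / 1000 * self.usd_per_1k_words
        self.total_cost_usd += cost
        self.audit_log.append(
            {"team": team, "prompt": prompt, "answer": answer, "cost_usd": cost}
        )
        return answer


def prompt_regressions(
    gateway: LLMGateway, template: str, cases: list[tuple[str, str]]
) -> list[str]:
    """Return the questions whose answer no longer contains the expected phrase."""
    failures = []
    for question, expected in cases:
        answer = gateway.complete("prompt-tests", template.format(question=question))
        if expected.lower() not in answer.lower():
            failures.append(question)
    return failures


def fake_backend(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    return "Paris" if "capital of France" in prompt else "I don't know"


gateway = LLMGateway(backend=fake_backend)
golden_cases = [("What is the capital of France?", "paris"), ("What is 2+2?", "4")]
print(prompt_regressions(gateway, "Answer briefly: {question}", golden_cases))
```

Rerunning the same golden cases whenever the template changes gives a cheap signal that a prompt edit broke something, and because the test traffic flows through the gateway it shows up in the same audit log and cost total as everything else.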

What I liked most is that they are very candid about what worked, what failed, and what is still a work in progress, without succumbing to the RAG hype.

arxiv.org/pdf/2407.07858

Read our full post on Enterprise RAG

OffNote Labs Newsletter

Lessons Building an Enterprise GenAI (RAG) Product

Jul 23

8:36 AM