NVIDIA’s new take on Enterprise RAG

Very few efforts to build RAG for enterprise scenarios are documented. This work from NVIDIA is a breath of fresh air!

The NVIDIA team built 3 enterprise bots (Info, Help, and Scout; the last two are in prod) to thoroughly understand the issues in enterprise RAG. Most of the learnings are common knowledge by now, but there are some surprises.

👉 They decompose the space, introducing the FACTS framework.

❖ content Freshness, 

❖ flexibility in RAG Architecture,

❖ Cost management, 

❖ plan for Testing, 

❖ Security of data/logs

Usual Stuff 

❖ many control points: not 7 or 12, but 15!

❖ combine lexical and semantic search (sparse and dense embeddings)

❖ incorporate section headings in chunks

❖ sample RAG arch includes query decomposition and Milvus for scale

❖ guardrails to reduce risks, sensitive data filters
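
To make the hybrid search and section-heading bullets above concrete, here is a minimal sketch. It is not the paper's implementation: `chunk_with_heading` and `reciprocal_rank_fusion` are hypothetical helper names, and reciprocal rank fusion (with the conventional constant k=60) is my own choice of fusion method, swapped in because the post does not say how the two rankings are merged.

```python
from collections import defaultdict


def chunk_with_heading(heading: str, body: str, max_words: int = 50) -> list[str]:
    """Split body into word-bounded chunks, each prefixed with its section heading."""
    words = body.split()
    return [
        f"{heading}\n{' '.join(words[i:i + max_words])}"
        for i in range(0, len(words), max_words)
    ]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids; a higher fused score ranks first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id for a deterministic tie-break.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Assumed inputs: ids returned by a lexical (sparse) and a semantic (dense) retriever.
lexical_hits = ["a", "b", "c"]
semantic_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([lexical_hits, semantic_hits])
```

Documents that both retrievers agree on ("b" and "c" here) float to the top, and every chunk carries its heading so the retriever sees the section context.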

Some surprises

❖ fine-tuning the e5 embedding model didn't help much

❖ no need to (hard to?) choose between a narrow and a generalized enterprise bot.

❖ a dedicated company-wide LLM gateway for cost management and auditing.

❖ prompt change testing. don't hear this mentioned too often.
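
The gateway and prompt change testing bullets above can be sketched together. This is a toy under stated assumptions, not NVIDIA's setup: `LLMGateway`, `prompt_regressions`, and `fake_model` are hypothetical names, the model is a stub, and whitespace word counts stand in for real token accounting.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LLMGateway:
    """Hypothetical single entry point for LLM calls: audits and meters usage."""
    model: Callable[[str], str]
    audit_log: list[dict] = field(default_factory=list)

    def complete(self, team: str, prompt: str) -> str:
        answer = self.model(prompt)
        # Word counts are a crude stand-in for real token counts.
        self.audit_log.append({
            "team": team,
            "prompt_words": len(prompt.split()),
            "answer_words": len(answer.split()),
        })
        return answer

    def words_by_team(self) -> dict[str, int]:
        """Total words (prompt + answer) per team, as a rough cost proxy."""
        totals: dict[str, int] = {}
        for entry in self.audit_log:
            used = entry["prompt_words"] + entry["answer_words"]
            totals[entry["team"]] = totals.get(entry["team"], 0) + used
        return totals


def prompt_regressions(gateway: LLMGateway, template: str,
                       golden_cases: list[tuple[str, str]]) -> list[str]:
    """Return the questions whose answer lacks the expected phrase under this template."""
    failures = []
    for question, expected in golden_cases:
        answer = gateway.complete("qa", template.format(question=question))
        if expected.lower() not in answer.lower():
            failures.append(question)
    return failures


def fake_model(prompt: str) -> str:
    """Stub model: only 'knows' the answer when the prompt asks it to cite sources."""
    if "cite" in prompt.lower():
        return "You get 20 vacation days [HR handbook]."
    return "I am not sure."


gateway = LLMGateway(fake_model)
golden = [("How many vacation days do I get?", "20 vacation days")]
old_failures = prompt_regressions(gateway, "Answer and cite sources. Question: {question}", golden)
new_failures = prompt_regressions(gateway, "Answer briefly. Question: {question}", golden)
```

Running the same golden questions against the old and new prompt shows which answers a prompt edit breaks before it ships, and because every call goes through the gateway, the audit log and per-team usage come for free.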

What I liked most is that they are very candid about what worked, what failed, and what is still a work in progress, without succumbing to the RAG hype.

arxiv.org/pdf/2407.07858

Read our full post on Enterprise RAG

Lessons Building an Enterprise GenAI (RAG) Product
Jul 23 at 8:36 AM