Compliance / ISO 9001

The Hardest Part of AI Isn't the AI

Afiz
Founder @ Samrian · 6 min read · 2026-01-31

Everyone is building RAG systems. Most are building them wrong.

They focus on the LLM (the "brain"), assuming a bigger brain is always better. But our research shows that the model is rarely the problem. The single biggest point of failure is the information you feed it. An LLM fed garbage will give you garbage, no matter how smart it is.

This is especially true in manufacturing, where data lives in dense, visual documents that were never meant for a machine to read. Here is what we've learned about building AI that can survive an audit.

The Page is More Important Than the Text

The most critical step in any RAG system is retrieval: finding the right page. Counter-intuitively, it is far better to give an LLM the correct page with minor text errors than a perfect transcript of the wrong page.

Our research proves this. We found that simply swapping a standard text-based retriever for a visual one recovered 70% of the answer accuracy that was lost to OCR errors. The LLM was the same. The text it used for the final answer was the same flawed OCR. The only thing that changed was its ability to find the right page first.

This leads to a simple, powerful conclusion: the quality of your retrieval system is the highest-leverage tool you have.

The Density Problem

Standard retrievers are blind. They rely on OCR to turn a visual document, like a schematic, into a simple string of text. This process is fundamentally lossy, and it fails in three critical ways:

1. Broken Tables
OCR reads left-to-right, top-to-bottom. But a table's meaning comes from its structure: rows and columns that create relationships. When OCR flattens a table into a text stream, it destroys the very thing that makes the data meaningful.

2. Ghost Trends
A line chart showing temperature drift over time contains critical information. But to a text-based system, it's invisible. The trend, the outlier, the pattern: all of it disappears because OCR can't "see" visual data.

3. Handwritten Logs
On the shop floor, operators still write notes by hand. Standard OCR either fails completely or produces garbled text that's worse than useless. The AI can't retrieve what it can't read.
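The table failure is easy to demonstrate. The toy batch-record table and lookup below are illustrative, not from any real system; they show how the same data that supports a structured query becomes positional guesswork once OCR flattens it into reading order:

```python
# Illustration: how OCR's left-to-right, top-to-bottom reading order
# destroys the row/column relationships that give a table its meaning.

# A simple batch-record table as structured data.
table = [
    ["Batch", "Temp (C)", "Pass"],
    ["A-101", "182", "Yes"],
    ["A-102", "241", "No"],
]

# Structured lookup: which batch failed?
header = table[0]
failed = [row for row in table[1:] if row[header.index("Pass")] == "No"]
print(failed)  # [['A-102', '241', 'No']]

# What OCR hands a text-based retriever: one flat string.
flattened = " ".join(cell for row in table for cell in row)
print(flattened)
# 'Batch Temp (C) Pass A-101 182 Yes A-102 241 No'
# The link between 'A-102', '241', and 'No' now exists only by
# position in the stream -- the structure that carried it is gone.
```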

[Figure: Standard AI vs Visual Retrieval. Text-based retrieval destroys the structure and context that manufacturing documents depend on; visual retrieval captures tables, charts, and handwritten notes in full.]

Visual Retrieval (ColPali)

A newer approach, multimodal retrieval, doesn't just read the text; it sees the page. It analyzes the pixel grid itself, understanding layout and context. The data is stunning: a visual retriever is ~12% more accurate than a system given a perfect text transcript of the same pages.

Why? Because in manufacturing, the layout is the data. A number's position in a column or a line's trend on a chart is a signal that text alone cannot capture.
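Under the hood, ColPali scores a page by late interaction: every query-token embedding is compared against every image-patch embedding of the page, each query token keeps its best match, and those maxima are summed (the "MaxSim" operator from ColBERT). A minimal sketch of that scoring step, with random vectors standing in for real model outputs (the embedding model itself is omitted):

```python
import numpy as np

def maxsim_score(query_emb, page_emb):
    """Late-interaction (MaxSim) score, as used by ColBERT/ColPali.

    query_emb: (num_query_tokens, dim) embeddings of the query tokens
    page_emb:  (num_patches, dim) embeddings of the page's image patches
    """
    sims = query_emb @ page_emb.T      # every token vs every patch
    best_per_token = sims.max(axis=1)  # each token keeps its best patch
    return best_per_token.sum()        # page score = sum over tokens

# Stand-in embeddings; a real system would get these from the model.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))
pages = [rng.normal(size=(1030, 128)) for _ in range(3)]

scores = [maxsim_score(query, p) for p in pages]
best_page = int(np.argmax(scores))
print(best_page, [round(s, 1) for s in scores])
```

Because scoring happens token-by-token against patches, a query term about one cell of a table can match the patch containing that cell, even when the surrounding text would drown it out in a single-vector embedding.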

More Is Often Worse

This obsession with signal quality extends to context size. The common assumption is that giving the LLM more documents (k) is better. It's not.

Research shows that an LLM's performance plateaus around k=10 documents and can even degrade beyond that. It's like asking a genius a question but forcing them to read ten irrelevant books before they answer. The noise drowns out the signal.

A better retriever doesn't just find the right page. It finds only the right pages.
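In practice this means capping and filtering what the retriever returns rather than passing everything downstream. A sketch of that selection step (the k=10 cap reflects the plateau described above; the page IDs, scores, and threshold are made up for illustration):

```python
def select_context(ranked_pages, k=10, min_score=0.15):
    """Keep at most k pages, and drop the low-confidence tail.

    ranked_pages: list of (page_id, score), sorted by score descending.
    """
    return [(pid, s) for pid, s in ranked_pages[:k] if s >= min_score]

# Retriever output: 15 candidates with relevance falling off quickly.
ranked = [(f"page-{i}", 1.0 / (i + 1)) for i in range(15)]

context = select_context(ranked, k=10, min_score=0.15)
print([pid for pid, _ in context])
# ['page-0', 'page-1', 'page-2', 'page-3', 'page-4', 'page-5']
```

Only six pages survive: the cap stops the context from growing past the plateau, and the threshold drops marginal pages that would add noise rather than signal.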

A Leaner Signal is a Truer Signal

This principle has a surprising and powerful side effect. By optimizing a retriever for efficiency (compressing its storage and speeding up its queries), we found it also made the final AI more truthful.

A faster, leaner retriever reduced the LLM's hallucination rate by 33% in a legal summarization task. A clean, precise signal isn't just about speed; it's about reliability. In manufacturing, where a hallucinated safety spec can be catastrophic, this is not a trivial improvement.
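One common way to make a retriever leaner is binary quantization: keep only the sign bit of each embedding dimension, cutting storage roughly 32x while preserving most of the ranking. The sketch below is illustrative only, not the specific optimization behind the 33% figure:

```python
import numpy as np

def binarize(embs):
    """Quantize float32 embeddings to packed sign bits (~32x smaller)."""
    return np.packbits(embs > 0, axis=-1)

def hamming_rank(query_bits, doc_bits):
    """Rank documents by Hamming distance to the query (lower = closer)."""
    xor = np.bitwise_xor(doc_bits, query_bits)
    dists = np.unpackbits(xor, axis=-1).sum(axis=-1)
    return np.argsort(dists)

# Stand-in corpus: 1000 random 256-dim embeddings; the query is a
# lightly perturbed copy of document 42, so 42 should rank first.
rng = np.random.default_rng(1)
docs = rng.normal(size=(1000, 256)).astype(np.float32)
query = docs[42] + 0.1 * rng.normal(size=256).astype(np.float32)

doc_bits = binarize(docs)
query_bits = binarize(query[None, :])

print(docs.nbytes, doc_bits.nbytes)           # 1024000 32000
print(hamming_rank(query_bits, doc_bits)[0])  # 42
```

The storage drops from 1,024,000 bytes to 32,000, and the true nearest neighbour still comes back first; the tighter index is also what makes the query fast enough to be precise.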

Built on Visual Retrieval

Samrian's RAG system uses multimodal retrieval to understand your manufacturing documents: tables, charts, and handwritten notes included.

Conclusion: Fix the Supply Chain

The industry's obsession with the LLM is a red herring. The hardest part of AI is the unglamorous, upstream work of forging a pristine information supply chain.

The next frontier of performance won't be won by those with the biggest models, but by those who master the physics of information itself.


Frequently Asked Questions

What is RAG in AI?

RAG (Retrieval Augmented Generation) is a technique where an AI system first retrieves relevant documents from a knowledge base and then uses those documents to generate accurate answers. The quality of retrieval directly impacts the quality of the final answer.
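The two stages can be sketched in a few lines. Here `retrieve` and `generate` are hypothetical stand-ins (a real system would plug in a vector store and an LLM API call):

```python
def answer(question, retrieve, generate, k=5):
    """Minimal RAG loop: retrieve relevant pages, then generate from them."""
    pages = retrieve(question, k=k)  # e.g. a vector-store query
    context = "\n\n".join(pages)
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
    return generate(prompt)          # e.g. an LLM API call

# Toy stand-ins so the sketch runs end to end.
kb = {"ISO 9001": "ISO 9001 sets requirements for a quality management system."}
toy_retrieve = lambda q, k: [v for key, v in kb.items() if key in q][:k]
toy_generate = lambda prompt: prompt.splitlines()[1]  # echo first context line

print(answer("What does ISO 9001 cover?", toy_retrieve, toy_generate))
# ISO 9001 sets requirements for a quality management system.
```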

Why is visual retrieval better than text-based retrieval?

Visual retrieval analyzes the actual layout and structure of documents, not just the text. In manufacturing documents with tables, charts, and diagrams, the spatial arrangement of information is critical context that text-only systems miss.

How many documents should I give my LLM?

Research shows that LLM performance plateaus around 10 documents and can degrade with more. Quality over quantity: it's better to retrieve fewer, highly relevant documents than to flood the context with noise.

What causes AI hallucinations in manufacturing?

Hallucinations often stem from poor retrieval quality. When the AI receives irrelevant or low-quality documents, it fills gaps with plausible-sounding but incorrect information. A precise retrieval system dramatically reduces this risk.