The limitation
If you use DeepResearch from OpenAI, Gemini, DeepSeek, or any similar product, you should know you have blind spots you probably aren’t aware of. AI-assisted search or research of any kind, whether it uses its context window for retrieval, Retrieval-Augmented Generation (RAG), or some advanced variation like GraphRAG, suffers from the same flaw: there is no way to know what it missed, unless, of course, you have done the research manually yourself and compared results, which would defeat the whole purpose of automating it.
So, if the cost of omission is disproportionate, we need to be extra careful when incorporating AI into search. This applies especially to industries like medicine, law, and pharmaceuticals, where missing a small piece of information can have disastrous consequences.
That said, it can also be argued that not even humans can claim to find results with 100% accuracy every time. This is true, but there are socioeconomic and technical differences between the nature and consequences of human and AI omissions, which I discuss below:
Differences between AI and humans
- Insurance
The partial cost of potential omissions can, in some cases, be offloaded to insurance companies. I am not aware of any insurance company offering hallucination insurance of any kind.
- Lack of Introspection
While our brains are as much of a black box as AI, we have a pretty good understanding of the limits of our own knowledge and we can express them. Current AI systems neither know their own limitations well nor can they express a level of certainty about each fact they think they know. This means we know when to ask other people for information or rely on external tools, unlike LLMs. For example, they will often give outdated information because they are not aware of the temporal nature of some facts.
- Jagged Intelligence
LLMs don’t natively understand other modalities of data like images or videos; I have written about this elsewhere. This means LLMs can fail surprisingly on some simple tasks, and we don’t understand their failure modes very well. Combined with the lack of introspection, this creates a dangerous recipe for confidently asserting incorrect facts.
- Regulation and Trust
When we visit a doctor, we cannot confirm whether their prognosis is correct. Of course, the cost of omissions is very high here as well, but somehow we trust doctors. Why? Because we know doctors have passed standardized exams in medicine, and thus we indirectly trust those who certified them. AI is still in its early stages and is not regulated like law or medicine, so it lacks these systems of trust and the regulatory bodies that maintain them.
A partial solution
Building Trust
If we can’t verify a system’s response every time, we need to build trust. Since search isn’t as complex as medicine or law, we can do this ourselves: create an internal benchmark of difficult questions that require reasoning over multiple documents to answer correctly, and write as many questions as we can so they are representative of the kinds of searches we normally run.
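For concreteness, here is a minimal sketch of how such a benchmark might be represented in Python. The question, document names, and required facts are invented for illustration; the point is only that each question records what a complete answer must contain.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkQuestion:
    question: str                # the query we would normally run
    source_documents: list[str]  # documents a complete answer must draw on
    required_facts: list[str]    # facts a correct answer must mention

# Illustrative entries only; real questions should mirror the searches we actually run.
BENCHMARK = [
    BenchmarkQuestion(
        question="Which of our 2023 supplier contracts contain an early-termination clause?",
        source_documents=["contract_acme_2023.pdf", "contract_globex_2023.pdf"],
        required_facts=["Acme clause 7.2", "Globex clause 12.1"],
    ),
]
```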
Now we have a way to test every AI research tool. Note that many tools may be disqualified because some questions require retrieving facts from document types they don’t support, but that’s to be expected.
Just be aware of one thing: AI is pretty good at memorization, so change your questions every few months if possible.
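Building on the sketch above, a rough harness could score each tool by how many of the required facts its answer actually contains. Here `run_research_tool` is a hypothetical placeholder for however a given product is invoked (API, CLI, or manual copy-paste), and the substring matching is a deliberately crude proxy for “what did it miss?”.

```python
def recall_of_required_facts(answer: str, q: BenchmarkQuestion) -> float:
    """Fraction of the required facts that appear verbatim in the tool's answer."""
    hits = sum(1 for fact in q.required_facts if fact.lower() in answer.lower())
    return hits / len(q.required_facts)

def evaluate_tool(run_research_tool, benchmark: list[BenchmarkQuestion]) -> float:
    """Average recall across the benchmark; low scores flag tools that silently omit facts."""
    scores = [recall_of_required_facts(run_research_tool(q.question), q) for q in benchmark]
    return sum(scores) / len(scores)
```

A tool that consistently scores high on this internal benchmark earns some of the trust we would otherwise have to grant blindly.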
Introducing Transparency
Constrain or transform the search space in some way so that it’s easier for humans to verify each step of the search. For example, it’s easier to verify a database query than a fact retrieved from a free-form document, though of course this is not always possible. Building on the trust idea above, any step of the search process that can’t be broken into smaller verifiable substeps has to be trusted, and benchmarked, on its own.
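As one illustration of constraining the search space, the sketch below routes a question through a structured database query whose intermediate outputs a human can inspect before the final summary. The `ask_llm` helper, the database, and the prompts are assumptions for the sake of the example, not part of any specific product.

```python
import sqlite3

def transparent_search(question: str, db_path: str, ask_llm):
    # Step 1: the model proposes a SQL query, small enough for a human to review by eye.
    sql = ask_llm(f"Write one SQLite SELECT statement that answers: {question}")
    print("Step 1 (verifiable): proposed SQL\n", sql)

    # Step 2: run the query; the raw rows are a second checkpoint a human can verify.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()
    print("Step 2 (verifiable): rows returned\n", rows)

    # Step 3: only the final summarization stays opaque, and it is grounded in rows
    # we have already inspected, so omissions are easier to spot.
    return ask_llm(f"Answer '{question}' using only these rows: {rows}")
```

The design choice is that the only unverifiable step is the last one, and its inputs have already passed through human eyes.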