Inside OpenClaw #1: Web Search Without Hallucination
Local LLMs fabricate search results — URLs, facts, entire sources. Here's how we solved this in OpenClaw: with architecture, not prompt engineering.
Ask a local LLM to search the web, and something interesting happens: it doesn’t tell you it can’t. Instead, it confidently generates search results — complete with URLs, snippets, and source attributions. The problem? They’re entirely made up.
This isn’t a minor nuisance. For an AI agent that’s supposed to retrieve real information, fabricated sources are a critical failure mode, and one of the hardest problems to solve when building on local models. It’s also the kind of challenge I tackle in my AI and automation consulting.
Why Prompt Engineering Isn’t Enough
The obvious first attempt is to tell the model not to hallucinate. Add instructions like “only return real URLs” or “if you don’t know, say so.” This helps — but only marginally. In our testing, prompt-level interventions reduced hallucinated search results by roughly 30%. That still leaves the majority of outputs unreliable.
The root cause isn’t a prompting problem. It’s an architectural one. The model doesn’t have access to the internet, so it fills the gap with plausible-sounding content drawn from its training data. No amount of instruction can fix that.
The Solution: Three Architectural Layers
In OpenClaw, we solved web search hallucination by removing the model from the search process entirely. The architecture enforces correctness through three layers:
1. Strict Tool Separation
The model never generates search results directly. Instead, it can only request a search by calling a defined tool. The actual search is executed by the Gateway, which queries real search APIs. The model receives real results — it doesn’t produce them.
This is the most important design decision. By making the model a consumer of search data rather than a producer, we eliminate the primary hallucination vector.
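The separation can be sketched in a few lines. This is an illustrative stand-in, not OpenClaw’s actual Gateway interface: the tool schema, the `handle_model_output` router, and the `fake_search` stub are all hypothetical names invented for this example.

```python
import json

# Hypothetical tool schema: the model may only *request* a search.
# It has no output channel through which to emit "results" of its own.
SEARCH_TOOL = {
    "name": "web_search",
    "description": "Request a web search. Results are supplied by the Gateway.",
    "parameters": {"query": {"type": "string"}},
}

def handle_model_output(message: dict, run_search) -> list[dict]:
    """Route one model message: only well-formed tool calls reach the backend.

    `run_search` is the Gateway-side function that queries a real search
    API. The model never sees or implements it; it only consumes results.
    """
    if message.get("type") != "tool_call" or message.get("name") != "web_search":
        # Plain text goes straight to the user; nothing here is a search result.
        return []
    args = json.loads(message["arguments"])
    return run_search(args["query"])  # real results from a real API

# A stubbed backend standing in for the Gateway's search API client.
def fake_search(query: str) -> list[dict]:
    return [{"url": "https://example.com", "title": f"Result for {query}"}]

results = handle_model_output(
    {"type": "tool_call", "name": "web_search",
     "arguments": json.dumps({"query": "local llm tool calling"})},
    fake_search,
)
```

The point of the sketch is the one-way data flow: the model produces a structured request, the Gateway produces the results, and nothing the model generates is ever presented to the user as a search result.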
2. Auto-Fetch: Real Content from Real Pages
Returning a list of URLs and snippets isn’t enough. Models will still hallucinate details about what a page “probably says.” To counter this, OpenClaw’s Gateway automatically fetches the actual page content from the top search results and includes it in the model’s context.
The model now reasons over real content, not imagined summaries. This dramatically improves factual accuracy in the final response.
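A minimal sketch of that expansion step, with the HTTP fetcher injected as a parameter so the example stays self-contained (the function and parameter names here are assumptions for illustration, not OpenClaw’s API):

```python
def build_context(results: list[dict], fetch_page, top_n: int = 3,
                  max_chars: int = 4000) -> str:
    """Expand search results with real page text before the model sees them.

    `fetch_page` is a Gateway-side HTTP fetcher (hypothetical name),
    injected here so the sketch is testable without network access.
    """
    blocks = []
    for result in results[:top_n]:
        # Truncate each page so a handful of sources fits the context budget.
        text = fetch_page(result["url"])[:max_chars]
        blocks.append(f"Source: {result['url']}\n{text}")
    return "\n\n".join(blocks)
```

Because each block is prefixed with the URL it came from, the model can attribute claims to a concrete source instead of guessing what a page “probably says.”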
3. Deduplication to Break Hallucination Loops
Even with real search data, we observed a subtle failure mode: the model would sometimes enter a loop where it repeatedly requested the same search with slight variations, gradually drifting back toward hallucinated content. Our deduplication layer detects and breaks these loops, ensuring the model moves forward with the information it already has.
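One simple way to implement such a layer is to normalize each query and remember what has already been searched; a repeat request gets the cached results back instead of a fresh search. This is a sketch of the idea, not OpenClaw’s exact deduplication logic:

```python
import re

class SearchDeduplicator:
    """Break repeat-search loops by normalizing and remembering queries."""

    def __init__(self):
        self._seen: dict[str, list] = {}

    @staticmethod
    def _normalize(query: str) -> str:
        # Collapse case, punctuation, and word order so near-duplicates
        # like "python asyncio tutorial" / "Tutorial: Python AsyncIO" match.
        words = re.findall(r"[a-z0-9]+", query.lower())
        return " ".join(sorted(words))

    def run(self, query: str, do_search):
        """Return (results, was_duplicate); only novel queries hit the backend."""
        key = self._normalize(query)
        if key in self._seen:
            # Duplicate: hand back the earlier results so the model must
            # move forward with what it already has.
            return self._seen[key], True
        results = do_search(query)
        self._seen[key] = results
        return results, False
```

Serving cached results for duplicates, rather than rejecting the call outright, keeps the tool contract intact: the model always gets a valid response, just never a second chance to drift on the same query.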
Model Size Matters
Not all models handle tool-based search equally well:
- Below 14B parameters: Unreliable. Models frequently ignore tool results and fall back to generating their own content.
- 14B to 24B: Borderline. Works in many cases but requires careful prompt design and monitoring.
- 24B and above: Stable. Models consistently use provided search data and respect tool boundaries.
We run Mistral Small 24B, which sits comfortably in the stable range for this task.
Temperature for Tool Calling
For tool-calling interactions, we keep the temperature between 0.1 and 0.3. Higher values increase creativity — but creativity is exactly what you don’t want when the model is deciding which tools to invoke and how to structure the call. Low temperature keeps tool interactions predictable and reliable.
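In practice this can be as simple as choosing sampling parameters per turn. The values below mirror the ranges described above; the function name and the non-tool temperature of 0.7 are illustrative assumptions:

```python
def sampling_params(expects_tool_call: bool) -> dict:
    """Pick per-turn sampling parameters (illustrative values).

    Tool-calling turns get low temperature so the structured call
    (tool name, argument JSON) stays predictable; free-form answers
    may sample higher for more natural prose.
    """
    return {
        "temperature": 0.2 if expects_tool_call else 0.7,
        "top_p": 0.9,
    }
```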
The Takeaway
Hallucination in local AI agents isn’t a prompting problem — it’s a systems design problem. The solution isn’t to ask the model to be more careful. It’s to build an architecture where the model physically cannot produce the wrong kind of output.
This is one of the core design principles behind OpenClaw. If you want to understand the broader architecture, see our articles on OpenClaw as a personal AI assistant and running a local AI agent on a single GPU. For details on how agents coordinate multi-step tasks, read about autonomous agent orchestration.
Next Step
Building AI systems that need to get facts right? I specialize in local AI architectures that are reliable, privacy-compliant, and production-ready.
→ Or read more first: AI in SMEs — Where It Actually Helps