Inside OpenClaw #1: Web Search Without Hallucination

Local LLMs fabricate search results — URLs, facts, entire sources. Here's how we solved this in OpenClaw: with architecture, not prompt engineering.

Ask a local LLM to search the web, and something interesting happens: it doesn’t tell you it can’t. Instead, it confidently generates search results — complete with URLs, snippets, and source attributions. The problem? They’re entirely made up.

This isn’t a minor nuisance. For an AI agent that’s supposed to retrieve real information, fabricated sources are a critical failure mode, and one of the hardest problems to solve when building on local models. It’s also exactly the kind of challenge I tackle in my AI and automation consulting.

Why Prompt Engineering Isn’t Enough

The obvious first attempt is to tell the model not to hallucinate. Add instructions like “only return real URLs” or “if you don’t know, say so.” This helps — but only marginally. In our testing, prompt-level interventions reduced hallucinated search results by roughly 30%. That still leaves the majority of outputs unreliable.

The root cause isn’t a prompting problem. It’s an architectural one. The model doesn’t have access to the internet, so it fills the gap with plausible-sounding content drawn from its training data. No amount of instruction can fix that.

The Solution: Three Architectural Layers

In OpenClaw, we solved web search hallucination by removing the model from the search process entirely. The architecture enforces correctness through three layers:

1. Strict Tool Separation

The model never generates search results directly. Instead, it can only request a search by calling a defined tool. The actual search is executed by the Gateway, which queries real search APIs. The model receives real results — it doesn’t produce them.

This is the most important design decision. By making the model a consumer of search data rather than a producer, we eliminate the primary hallucination vector.
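A minimal sketch of this separation, assuming a JSON tool-call protocol (the tool name, schema, and dispatch logic here are illustrative, not OpenClaw's actual code): the model's output is only ever parsed as a search *request*, and the results that enter the conversation come from the Gateway's backend.

```python
import json

# Hypothetical tool schema: the model may only request a search.
# It never emits result objects itself.
WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": "Search the web. Results are returned by the Gateway.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def gateway_dispatch(model_output: str, search_backend) -> list:
    """Parse the model's tool call and execute the real search.

    The model's text is treated purely as a request; any search results
    added to the context come from `search_backend`, never from the model.
    """
    call = json.loads(model_output)
    if call.get("name") != "web_search":
        raise ValueError(f"unknown tool: {call.get('name')}")
    query = call["arguments"]["query"]
    return search_backend(query)  # a real search API call in production

# Usage with a stubbed backend (a production Gateway would query a real API):
fake_backend = lambda q: [{"url": "https://example.com", "snippet": f"results for {q}"}]
results = gateway_dispatch(
    '{"name": "web_search", "arguments": {"query": "vLLM flags"}}',
    fake_backend,
)
```

Because the dispatcher rejects anything that isn't a well-formed call to a known tool, a model that tries to "answer" instead of searching simply produces an error, not a fabricated source list.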

2. Auto-Fetch: Real Content from Real Pages

Returning a list of URLs and snippets isn’t enough. Models will still hallucinate details about what a page “probably says.” To counter this, OpenClaw’s Gateway automatically fetches the actual page content from the top search results and includes it in the model’s context.

The model now reasons over real content, not imagined summaries. This dramatically improves factual accuracy in the final response.
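The enrichment step can be sketched like this (function names and limits are illustrative; in production `fetch_page` would be an HTTP GET plus HTML-to-text extraction, injected here so the sketch stays self-contained):

```python
def auto_fetch(results, fetch_page, top_n=3, max_chars=4000):
    """Attach real page text to the top search results.

    Each hit gains a `content` field holding actual fetched text,
    truncated to keep the model's context window manageable.
    """
    enriched = []
    for hit in results[:top_n]:
        try:
            text = fetch_page(hit["url"])[:max_chars]  # cap context cost
        except Exception:
            text = ""  # a failed fetch yields no content, never a guess
        enriched.append({**hit, "content": text})
    return enriched
```

Note the failure path: if a page can't be fetched, the model sees an empty `content` field rather than being left to imagine what the page "probably says".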

3. Deduplication to Break Hallucination Loops

Even with real search data, we observed a subtle failure mode: the model would sometimes enter a loop where it repeatedly requested the same search with slight variations, gradually drifting back toward hallucinated content. Our deduplication layer detects and breaks these loops, ensuring the model moves forward with the information it already has.
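One way to implement such a loop breaker, assuming a simple token-set normalization (OpenClaw's actual heuristic may differ): queries that differ only in casing, punctuation, or word order collapse to the same key and are rejected on repeat.

```python
import re

class SearchDeduplicator:
    """Reject repeated searches whose queries differ only trivially."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def _normalize(query):
        # Lowercase, drop punctuation, ignore word order: "vLLM flags?"
        # and "flags vLLM" collapse to the same key.
        tokens = re.findall(r"[a-z0-9]+", query.lower())
        return frozenset(tokens)

    def allow(self, query):
        key = self._normalize(query)
        if key in self._seen:
            return False  # break the loop; reuse results already in context
        self._seen.add(key)
        return True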

Model Size Matters

Not all models handle tool-based search equally well:

  • Below 14B parameters: Unreliable. Models frequently ignore tool results and fall back to generating their own content.
  • 14B to 24B: Borderline. Works in many cases but requires careful prompt design and monitoring.
  • 24B and above: Stable. Models consistently use provided search data and respect tool boundaries.

We run Mistral Small 24B, which sits comfortably in the stable range for this task.

Temperature for Tool Calling

For tool-calling interactions, we keep the temperature between 0.1 and 0.3. Higher values increase creativity — but creativity is exactly what you don’t want when the model is deciding which tools to invoke and how to structure the call. Low temperature keeps tool interactions predictable and reliable.
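In practice this means selecting sampling parameters per turn type. A minimal sketch (the freeform value of 0.7 is an illustrative assumption; only the 0.1–0.3 tool-calling band comes from our setup):

```python
def sampling_params(tool_turn: bool) -> dict:
    """Pick sampling settings depending on the turn type.

    Tool-calling turns get low temperature so the tool name and the
    JSON argument structure stay stable; the final natural-language
    answer can afford more variation.
    """
    if tool_turn:
        return {"temperature": 0.2, "top_p": 0.9}   # within the 0.1-0.3 band
    return {"temperature": 0.7, "top_p": 0.95}      # assumed freeform default
```

Splitting the settings this way lets the same model be deterministic where structure matters and natural where prose matters.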

The Takeaway

Hallucination in local AI agents isn’t a prompting problem — it’s a systems design problem. The solution isn’t to ask the model to be more careful. It’s to build an architecture where the model physically cannot produce the wrong kind of output.

This is one of the core design principles behind OpenClaw. If you want to understand the broader architecture, see our articles on OpenClaw as a personal AI assistant and running a local AI agent on a single GPU. For details on how agents coordinate multi-step tasks, read about autonomous agent orchestration.


Next Step

Building AI systems that need to get facts right? I specialize in local AI architectures that are reliable, privacy-compliant, and production-ready.

Book a free consultation

→ Or read more first: AI in SMEs — Where It Actually Helps

About the Author: René Pfisterer

10+ years in ERP integration, data migration, and process automation for mid-sized companies. Specialized in DATEV, SAP, and AI implementation.

Full profile →
