The hallucination problem in legal AI
Generative language models invent sources. In a legal context this is anything but harmless: a fabricated BGH decision in a brief, a non-existent case reference in client correspondence or a misquoted statutory provision can have consequences for professional standing, liability and reputation. General-purpose chatbots have this problem structurally: they generate language statistically, not from a verified knowledge base.
LEGALinhouse solves the problem on two levels: every legal statement must be traceable to an entry in the internal knowledge graph, and every citation is checked against the case law database before and after generation. This is not a filter on top of a generic model, but an architectural decision that runs through the entire system.
Background on the 2024 Stanford study, documented hallucinations in court filings worldwide, and why the problem is systemic in free AI assistants: When AI invents the law (ceaveo.com).
The legal knowledge graph
At the heart of LEGALinhouse is a self-built knowledge graph of German legal sources. It is the only source of truth the AI may draw on for legal questions.
The graph contains federal and state norms with their entry-into-force and repeal dates, as well as decisions of the highest German courts (BVerfG, BGH, BVerwG, BFH, BSG, BAG) and of the higher regional courts (OLG). Every decision is linked to the norms cited in its reasoning. This creates a citation network that the AI can traverse during research instead of guessing its way through full text.
In practice, this means that when the AI answers a question about terminating a works contract, it does not draw on model memory. It loads §§ 648 and 648a BGB plus the most-cited decisions of recent years and writes the answer against those sources. If a section is repealed or a decision is overturned, subsequent answers reflect this automatically.
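How temporal validity and the citation network determine what the AI may cite can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the LEGALinhouse schema: the classes `Norm`, `Decision` and `LegalGraph`, the ranking by citation in-degree, and all sample data (entry-into-force dates, the hypothetical `Norm X`, the references `REF-1` to `REF-3`) are invented for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Norm:
    norm_id: str                      # e.g. "BGB § 648a"
    in_force: date                    # entry-into-force date
    repealed: Optional[date] = None   # None while the norm is still in force

    def valid_on(self, day: date) -> bool:
        return self.in_force <= day and (self.repealed is None or day < self.repealed)


@dataclass(frozen=True)
class Decision:
    decision_id: str
    court: str
    decided: date
    reference: str
    cites_norms: tuple[str, ...]            # norms cited in the reasoning
    cites_decisions: tuple[str, ...] = ()   # edges of the decision citation network
    overturned: bool = False


@dataclass
class LegalGraph:
    norms: dict[str, Norm] = field(default_factory=dict)
    decisions: dict[str, Decision] = field(default_factory=dict)

    def add(self, item) -> None:
        if isinstance(item, Norm):
            self.norms[item.norm_id] = item
        else:
            self.decisions[item.decision_id] = item

    def sources_for(self, norm_ids: list[str], day: date) -> tuple[list[Norm], list[Decision]]:
        """Norms valid on `day` plus the non-overturned decisions that cite them."""
        norms = [self.norms[n] for n in norm_ids if n in self.norms and self.norms[n].valid_on(day)]
        valid = {n.norm_id for n in norms}
        decisions = [d for d in self.decisions.values()
                     if not d.overturned and valid.intersection(d.cites_norms)]
        # Rank by how often other decisions cite them (in-degree), then by recency
        cited_by = Counter(c for d in self.decisions.values() for c in d.cites_decisions)
        decisions.sort(key=lambda d: (cited_by[d.decision_id], d.decided), reverse=True)
        return norms, decisions


# Illustrative sample data; dates and references are placeholders
graph = LegalGraph()
graph.add(Norm("BGB § 648", date(2018, 1, 1)))
graph.add(Norm("BGB § 648a", date(2018, 1, 1)))
graph.add(Norm("Norm X (hypothetical)", date(2000, 1, 1), repealed=date(2020, 1, 1)))
graph.add(Decision("d1", "BGH", date(2020, 5, 1), "REF-1", ("BGB § 648a",)))
graph.add(Decision("d2", "OLG", date(2021, 3, 1), "REF-2", ("BGB § 648",), ("d1",)))
graph.add(Decision("d3", "BGH", date(2022, 7, 1), "REF-3", ("BGB § 648",), overturned=True))

norms, decisions = graph.sources_for(["BGB § 648", "BGB § 648a", "Norm X (hypothetical)"], date(2024, 1, 1))
print([n.norm_id for n in norms])          # ['BGB § 648', 'BGB § 648a']
print([d.decision_id for d in decisions])  # ['d1', 'd2']
```

Querying with a different `day`, or after marking a decision as overturned, changes the source set without touching the model. That is the behavior the paragraph above describes.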
Specialist routing
A generic prompt produces generic answers. LEGALinhouse routes every query to one of 19 German legal specialists: specialized system prompts, each with its own vocabulary, standard literature and retrieval configuration. Examples include labor law, tenancy law, corporate law, contract law, IT law, data protection law, tax law, criminal law and administrative law. Further specialist profiles cover international jurisdictions, bringing the total to over 140 profiles.
How the routing works
- Classification: A lightweight first step identifies the relevant legal area from the query and the matter context.
- Load specialist: The matching system prompt with vocabulary, retrieval rules and output schemas is activated.
- Source retrieval: The knowledge graph is queried for the norms and decisions relevant to this legal area.
- Generation: The answer is written against the loaded specialist context.
The advantage goes beyond language quality: the specialist knows which norms are relevant to a question, knows the typical structure of the argument, and distinguishes between case law and the prevailing opinion in the commentary literature.
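The four routing steps can be sketched as a small pipeline. The keyword classifier, the three `Specialist` entries, their prompts and their norm scopes are hypothetical placeholders. The text does not say how classification is implemented, and a real system would more likely use a trained classifier or a small model than substring matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specialist:
    name: str
    system_prompt: str
    keywords: frozenset[str]    # hypothetical vocabulary used for classification
    norm_ids: tuple[str, ...]   # hypothetical retrieval scope in the knowledge graph


SPECIALISTS = (
    Specialist("labor_law", "You are a specialist in German labor law.",
               frozenset({"dismissal", "employee", "works council"}), ("KSchG § 1",)),
    Specialist("tenancy_law", "You are a specialist in German tenancy law.",
               frozenset({"tenant", "landlord", "rent"}), ("BGB § 535",)),
    Specialist("contract_law", "You are a specialist in German contract law.",
               frozenset({"contract", "termination", "works contract"}), ("BGB § 648", "BGB § 648a")),
)
DEFAULT = SPECIALISTS[2]


def classify(query: str) -> Specialist:
    """Step 1: naive keyword overlap as a stand-in for a lightweight classifier."""
    text = query.lower()
    best = max(SPECIALISTS, key=lambda s: sum(kw in text for kw in s.keywords))
    return best if any(kw in text for kw in best.keywords) else DEFAULT


def route(query: str) -> dict:
    specialist = classify(query)              # 1. classification
    system_prompt = specialist.system_prompt  # 2. load specialist
    sources = list(specialist.norm_ids)       # 3. source retrieval (stubbed, no real graph query)
    # 4. generation: return the request a model call would receive
    return {"specialist": specialist.name, "system": system_prompt,
            "sources": sources, "user": query}


print(route("How do I terminate a works contract?")["sources"])  # ['BGB § 648', 'BGB § 648a']
```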
Agentic workflows
A pure chat AI answers questions; agentic AI acts. In LEGALinhouse this means the AI can autonomously chain multiple steps across a matter, within a controlled corridor and with a human in the loop at every handoff.
The steps are not hardcoded; the AI plans them from the facts of the case. But every step is logged traceably (which specialist, which sources, which model, which inputs), and every output is marked as a draft until an authorized employee approves it. There is no auto-send function for AI output.
This draft marking is not cosmetic; it is relevant under the RDG (Rechtsdienstleistungsgesetz, the German Legal Services Act). LEGALinhouse is a productivity tool, not a legal service. Professional responsibility stays with the human; the AI only shortens the path there.
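A minimal sketch of this logging and approval corridor, assuming hypothetical `Workflow`, `StepLog` and `Output` types. Every step is logged with specialist, sources, model and inputs, every output starts as a draft, and `release` refuses anything an authorized employee has not approved. The AI planning and the model call itself are deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"


@dataclass
class StepLog:
    specialist: str
    sources: list[str]
    model: str
    inputs: str


@dataclass
class Output:
    content: str
    status: Status = Status.DRAFT        # every AI output starts as a draft
    approved_by: Optional[str] = None


@dataclass
class Workflow:
    authorized: set[str]                 # hypothetical set of employees allowed to approve
    log: list[StepLog] = field(default_factory=list)

    def run_step(self, specialist: str, sources: list[str], model: str,
                 inputs: str, content: str) -> Output:
        """Record the step, then hand back its result as a draft."""
        self.log.append(StepLog(specialist, list(sources), model, inputs))
        return Output(content)

    def approve(self, output: Output, employee: str) -> None:
        if employee not in self.authorized:
            raise PermissionError(f"{employee} is not authorized to approve AI drafts")
        output.status = Status.APPROVED
        output.approved_by = employee

    def release(self, output: Output) -> str:
        """There is deliberately no auto-send: drafts are refused."""
        if output.status is not Status.APPROVED:
            raise RuntimeError("AI draft needs human approval before it leaves the system")
        return output.content
```

Modeling the gate as an exception in `release`, rather than as a UI hint, means no code path can forward an unapproved draft by accident.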
Case-citation protection (two-stage)
Fabricated citations are the most common hallucination failure in legal AI. We guard against them on two levels.
1. Generation-time constraint
Before the AI writes an answer, it loads a concrete list of decisions retrieved from the knowledge graph, and it may cite only from this list. Court, date and reference number come in structured form from the graph entry, not from the free text of the generation. This eliminates the main source of fabricated citations.
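One way to implement such a constraint is to let the model emit only an identifier from the retrieved list and fill in the citation details from the graph afterwards. The marker syntax `[[cite:ID]]`, the `GraphDecision` type, the output format and the sample data are assumptions made for this sketch.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GraphDecision:
    court: str
    decided: date
    reference: str


# Hypothetical marker the model is instructed to emit instead of a free-text citation
CITE_MARKER = re.compile(r"\[\[cite:([\w-]+)\]\]")


def render_citations(draft: str, retrieved: dict[str, GraphDecision]) -> str:
    """Replace citation markers with structured data from the graph entries.

    The model only chooses an ID; court, date and reference number are copied
    from `retrieved`, the list of decisions loaded before generation.
    """
    def substitute(match: re.Match) -> str:
        decision_id = match.group(1)
        if decision_id not in retrieved:
            # The model tried to cite outside the retrieved list: reject the draft
            raise ValueError(f"citation {decision_id!r} is not in the retrieved list")
        d = retrieved[decision_id]
        return f"{d.court}, decision of {d.decided:%d.%m.%Y}, {d.reference}"

    return CITE_MARKER.sub(substitute, draft)


retrieved = {"d1": GraphDecision("BGH", date(2020, 5, 1), "REF-1")}  # placeholder data
print(render_citations("See [[cite:d1]].", retrieved))  # See BGH, decision of 01.05.2020, REF-1.
```

Because the citation text is assembled from the graph entry, the model has no opportunity to misspell a reference number or shift a date.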
2. Post-hoc validation
After generation, the answer is scanned and every case reference in the text is verified against the case law database. If the combination of court, date and reference number does not match, the citation is flagged in the output and the employee is prompted to verify it manually. An unverifiable source is never emitted silently.
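The post-hoc check can be approximated with a pattern that extracts court, date and reference number and looks the triple up in the case law database. The regular expression below is a simplified assumption that covers only one common German citation style, and the in-memory `case_db` set and its entries are hypothetical stand-ins for the real database.

```python
import re
from datetime import date, datetime
from typing import Optional

# Simplified pattern: court, then a date DD.MM.YYYY, then a reference number
# such as "VII ZR 1/20". Real citation styles vary far more than this.
CITATION = re.compile(
    r"(?P<court>BVerfG|BGH|BVerwG|BFH|BSG|BAG|OLG \w+)"
    r"[^\d]{0,40}?"                          # e.g. ", Urteil vom "
    r"(?P<date>\d{1,2}\.\d{1,2}\.\d{4})"
    r"\W{0,5}"                               # separator, e.g. " – " or ", "
    r"(?P<ref>[IVXL\d]+ [A-Za-z]+ \d+/\d{2})"
)


def _key(match: re.Match) -> Optional[tuple[str, date, str]]:
    try:
        decided = datetime.strptime(match["date"], "%d.%m.%Y").date()
    except ValueError:                       # impossible date, e.g. 31.02.2020
        return None
    return match["court"], decided, match["ref"]


def unverified(text: str, case_db: set[tuple[str, date, str]]) -> list[str]:
    """Citations whose court-date-reference combination is not in the database."""
    return [m.group(0) for m in CITATION.finditer(text) if _key(m) not in case_db]


def flag_unverified(text: str, case_db: set[tuple[str, date, str]]) -> str:
    """Keep verified citations as they are; visibly flag every other one."""
    def mark(m: re.Match) -> str:
        if _key(m) in case_db:
            return m.group(0)
        return f"{m.group(0)} [UNVERIFIED: please check manually]"
    return CITATION.sub(mark, text)
```

With `case_db = {("BGH", date(2020, 5, 1), "VII ZR 1/20")}` (a hypothetical entry), a text citing both that decision and an unknown one gets a flag only after the unknown citation; `unverified` returns the list the employee is asked to check.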
Result
In practice, a case reference in a LEGALinhouse response either comes from the real case law database or is explicitly marked as unverified. A hallucinated citation therefore cannot pass as authentic.
3-phase anonymization
Before any client text reaches an AI model, it runs through a three-stage pipeline. The goal: the language model sees no identifying data, only placeholders. Personal data stays within EU infrastructure operated by us.
Phase 1 — Pseudonymization on our own servers in Germany
A detection layer on our own infrastructure in Germany replaces 15 entity types with consistent placeholders. Detected entities include personal names, companies, addresses, email addresses, phone numbers, IBANs and account data, dates of birth, tax IDs, case reference numbers, vehicle registration plates, URLs and others. The mapping between placeholders and original values stays with us in an ephemeral mapping table.
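Phase 1 can be illustrated with a pattern-based pseudonymizer that assigns consistent placeholders. This is a sketch under stated assumptions: it covers only three hypothetical recognizers (email addresses, German IBANs, German phone numbers) rather than all 15 entity types, and the placeholder format `[TYPE_N]` is invented here.

```python
import re

# Hypothetical regex recognizers for three of the 15 entity types; names and
# companies, for example, would need an NER model rather than patterns.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b"),   # German IBANs only
    "PHONE": re.compile(r"\+49[\d /-]{6,}\d"),
}


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected entities with consistent placeholders such as [EMAIL_1].

    Returns the pseudonymized text and the placeholder -> original mapping,
    which stays on our side and is never sent to the model.
    """
    mapping: dict[str, str] = {}   # placeholder -> original
    seen: dict[str, str] = {}      # original -> placeholder (keeps repeats consistent)
    counts: dict[str, int] = {}

    for entity_type, pattern in PATTERNS.items():
        def replace(match: re.Match, entity_type: str = entity_type) -> str:
            original = match.group(0)
            if original not in seen:
                counts[entity_type] = counts.get(entity_type, 0) + 1
                placeholder = f"[{entity_type}_{counts[entity_type]}]"
                seen[original] = placeholder
                mapping[placeholder] = original
            return seen[original]

        text = pattern.sub(replace, text)
    return text, mapping


masked, table = pseudonymize(
    "Contact max@example.com or max@example.com, "
    "IBAN DE89 3704 0044 0532 0130 00, phone +49 30 1234567."
)
print(masked)  # Contact [EMAIL_1] or [EMAIL_1], IBAN [IBAN_1], phone [PHONE_1].
```

Consistency matters: the same email address maps to the same placeholder every time, so the model can still reason about "the same person" without seeing who it is.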
Phase 2 — AI processing
The pseudonymized text is passed to an inference platform with a datacenter in the EU region Frankfurt. This platform does not use inputs for training and is fully subject to EU data protection law. The model sees only the pseudonymized text and cannot leak identifying client data, because it never sees it.
Phase 3 — Re-personalization
The AI response contains the same placeholders. They are translated back using the mapping table before the employee sees the output. The mapping table is then discarded.
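Phase 3 is the inverse lookup. A minimal sketch, assuming the `[TYPE_N]` placeholder format and a plain dictionary as the mapping table (both hypothetical):

```python
import re

PLACEHOLDER = re.compile(r"\[[A-Z][A-Z_]*_\d+\]")   # e.g. [EMAIL_1], [TAX_ID_2]


def repersonalize(ai_response: str, mapping: dict[str, str]) -> str:
    """Swap placeholders back to original values, then discard the mapping table.

    Placeholders without a mapping entry (for example one the model made up)
    are left untouched so they stay visible to the reviewing employee.
    """
    try:
        return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), ai_response)
    finally:
        mapping.clear()   # discard the table once the response has been translated
```

Note that `clear()` only removes the entries from the dictionary; it is not a secure memory wipe, which Python itself does not guarantee.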
GDPR consequences
- Personal data does not leave the EU.
- The AI platform is a processor but sees no PII.
- Even in a hypothetical breach of the inference platform, no identifying client data would be exposed.
- The audit trail documents every anonymization — who, when, what, how many entities.
AI-free import path
Some documents should never run through an AI at all, for example highly sensitive personnel files, whistleblower reports or criminal case files. For these, LEGALinhouse provides a fully AI-free import path: DOCX, TXT, EML, MSG, HTML and RTF files are parsed locally, indexed and filed into the matter without touching the inference pipeline. OCR for plain text extraction also runs without AI; only categorization would require AI, and it is skipped on this path.
The employee can see the document in the matter and read, forward and reference it, with the assurance that it has never been sent to an external model.
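The essential property of the AI-free path is that the only AI-dependent step is never invoked. A minimal sketch, assuming the text has already been extracted by a local parser or local OCR; `Matter`, `ImportedDocument`, `import_document` and the `categorize` callable are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ImportedDocument:
    filename: str
    category: Optional[str]   # stays None on the AI-free path
    ai_processed: bool


@dataclass
class Matter:
    matter_id: str
    documents: list[ImportedDocument] = field(default_factory=list)
    index: dict[str, set[str]] = field(default_factory=dict)   # word -> filenames


def import_document(matter: Matter, filename: str, text: str, ai_free: bool,
                    categorize: Optional[Callable[[str], str]] = None) -> ImportedDocument:
    """File a document whose text was already extracted locally (parser or OCR).

    `categorize` stands in for the only step that would use the inference
    pipeline; on the AI-free path it is never called.
    """
    category = None
    if not ai_free:
        if categorize is None:
            raise ValueError("the AI path requires a categorizer")
        category = categorize(text)
    doc = ImportedDocument(filename, category, ai_processed=not ai_free)
    matter.documents.append(doc)
    for word in set(text.lower().split()):   # naive local full-text index
        matter.index.setdefault(word, set()).add(filename)
    return doc
```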
Audit trail & EU AI Act
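The EU AI Act classifies certain legal AI applications as high-risk, notably systems intended to assist judicial authorities in researching and interpreting facts and the law. For high-risk systems it imposes requirements on logging, human oversight and transparency, which LEGALinhouse meets by design: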
- Full logging: Every AI operation (which specialist, which sources, which inputs, which model, token usage) is logged and visible in the tenant's audit log.
- Mandatory human approval: No AI output leaves the system without human review. Auto-send does not exist.
- Draft marking: AI-generated content is explicitly marked as an AI draft in the UI and the audit log until an authorized employee records approval.
- Explainability: For every AI answer, knowledge graph sources and the specialist used can be traced.
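An audit log supporting the logging, draft-marking and explainability points above might look like the following append-only JSON-lines sketch. The field names, the `AuditLog` class and the in-memory storage are assumptions; a production log would need persistent storage.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only audit log for one tenant, stored as JSON lines (in memory here)."""

    def __init__(self, tenant: str) -> None:
        self.tenant = tenant
        self._lines: list[str] = []

    def _append(self, **fields) -> None:
        entry = {"ts": datetime.now(timezone.utc).isoformat(), "tenant": self.tenant, **fields}
        self._lines.append(json.dumps(entry, ensure_ascii=False))

    def record_operation(self, answer_id: str, specialist: str, sources: list[str],
                         model: str, inputs: str, tokens: int) -> None:
        self._append(event="ai_operation", answer_id=answer_id, specialist=specialist,
                     sources=sources, model=model, inputs=inputs, tokens=tokens)

    def record_approval(self, answer_id: str, employee: str) -> None:
        self._append(event="approval", answer_id=answer_id, approved_by=employee)

    def explain(self, answer_id: str) -> dict:
        """Trace specialists and knowledge graph sources for one answer, plus its status."""
        entries = [json.loads(line) for line in self._lines]
        ops = [e for e in entries if e["answer_id"] == answer_id and e["event"] == "ai_operation"]
        approvals = [e for e in entries if e["answer_id"] == answer_id and e["event"] == "approval"]
        return {
            "specialists": sorted({e["specialist"] for e in ops}),
            "sources": sorted({s for e in ops for s in e["sources"]}),
            "tokens": sum(e["tokens"] for e in ops),
            "status": "approved" if approvals else "ai_draft",
            "approved_by": approvals[-1]["approved_by"] if approvals else None,
        }
```

Deriving the draft status from the log itself, rather than storing it separately, keeps the UI marking and the audit trail from drifting apart.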
Sovereignty & data location
The entire infrastructure is located in the EU. Client data is processed in a datacenter in Frankfurt. AI inference runs on a platform in the EU region Frankfurt: no data flows to third countries, and no sub-processing takes place outside the EU.
We name our infrastructure components by function rather than by vendor. Independence also means not being locked into a single hyperscaler or model provider. The architecture is built so that the inference platform and storage remain swappable, provided they are located in the EU.
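Keeping the inference platform swappable while enforcing the EU constraint can be expressed as a narrow interface plus a region check at configuration time. The `InferenceBackend` protocol, the region identifiers and `EchoBackend` are hypothetical; no real provider API is implied.

```python
from typing import Optional, Protocol


class InferenceBackend(Protocol):
    """Anything with a region and a completion method can serve as a backend."""
    region: str

    def complete(self, system: str, user: str) -> str: ...


class BackendRegistry:
    def __init__(self, allowed_regions: set[str]) -> None:
        self.allowed_regions = set(allowed_regions)
        self._active: Optional[InferenceBackend] = None

    def activate(self, backend: InferenceBackend) -> None:
        # Swapping is allowed, but only to a backend inside the allowed EU regions
        if backend.region not in self.allowed_regions:
            raise ValueError(f"region {backend.region!r} is not an allowed EU region")
        self._active = backend

    def complete(self, system: str, user: str) -> str:
        if self._active is None:
            raise RuntimeError("no inference backend configured")
        return self._active.complete(system, user)


class EchoBackend:
    """Hypothetical stand-in for a real inference platform client."""

    def __init__(self, region: str) -> None:
        self.region = region

    def complete(self, system: str, user: str) -> str:
        return f"[{self.region}] {user}"
```

A rejected activation leaves the previously active backend in place, so a misconfiguration cannot silently route traffic outside the EU.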
Ready for the practical side? Request beta access, or read more about a level playing field for the Mittelstand and about Security & Compliance.