AI Architecture

The hallucination problem in legal AI

Generative language models invent sources. In a legal context this is not harmless: a fabricated BGH decision in a brief, a non-existent case reference in client correspondence or a misquoted statutory norm can have professional, liability and reputational consequences. General-purpose chatbots have this problem structurally: they generate language statistically, not from a verified knowledge base.

LEGALinhouse solves the problem on two levels: every legal statement must be traceable to an entry in the internal knowledge graph, and every citation is checked against the case law database before and after generation. This is not a filter on top of a generic model, but an architectural decision that runs through the entire system.

Background on the 2024 Stanford study, documented hallucinations in court filings worldwide, and how systemic the problem is in free AI assistants: When AI invents the law (ceaveo.com).

The legal knowledge graph

At the heart of LEGALinhouse is a self-built knowledge graph of German legal sources. It is the only source of truth the AI may draw on for legal questions.

98,431 statutory norms
83,281 court decisions
256,085 judgment relations

The graph contains federal and state norms with entry-into-force and repeal dates, as well as decisions of the highest German courts (BVerfG, BGH, BVerwG, BFH, BSG, BAG) and the higher regional courts (OLG). Every decision is linked to the norms cited in its reasoning. This creates a citation network the AI can traverse during research instead of guessing from full text.

In practice, this means that when the AI answers a question on terminating a works contract, it does not draw on model memory. It loads §§ 648, 648a BGB plus the most cited decisions of recent years and writes the answer against those sources. If a provision is repealed or a decision is overturned, the answer changes automatically.
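A minimal sketch of this retrieval step, assuming a simple in-memory graph. `Norm`, `Decision` and `retrieve_sources` are hypothetical names, not the product's actual schema, and ranking by incoming citation count is an assumed proxy for "most cited decisions".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Norm:
    norm_id: str                          # e.g. "BGB § 648"
    in_force_from: date
    repealed_on: Optional[date] = None    # None while the norm is in force

    def in_force(self, on: date) -> bool:
        return self.in_force_from <= on and (self.repealed_on is None or on < self.repealed_on)


@dataclass(frozen=True)
class Decision:
    reference: str                        # case reference number from the graph entry
    cited_norms: frozenset                # norms cited in the decision's reasoning
    times_cited: int                      # incoming citations from other decisions
    overturned: bool = False


def retrieve_sources(norm_ids, norms, decisions, on, top_k=3):
    """Load the in-force norms plus the most-cited, still-valid decisions linked to them."""
    valid = [norms[n] for n in norm_ids if n in norms and norms[n].in_force(on)]
    valid_ids = {n.norm_id for n in valid}
    linked = [d for d in decisions if not d.overturned and d.cited_norms & valid_ids]
    linked.sort(key=lambda d: d.times_cited, reverse=True)
    return valid, linked[:top_k]
```

Because validity is evaluated against a date at query time, setting `repealed_on` on a norm or `overturned` on a decision changes the retrieved sources, and with them the answer, without touching any prompt or model.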

Norm analysis — sourced legal interpretation

Knowing a norm is one thing; knowing what it means in a specific case is another. From the knowledge graph, LEGALinhouse builds a structured, fully sourced interpretation for every norm, drawn solely from its own legal corpus (German primary law and case law) without licensed commentary literature. Nothing is freely generated; every statement points to a source held in the system.

Organised by the five methods of interpretation

Literal: the legal definitions contained in the norm text itself.
Systematic: the cross-references into and out of the norm, including constitutional and EU-law anchors.
Historical: the norm's version history.
Teleological: the legislative purpose, grounded in official, public-domain legislative materials. For EU regulations these are the recitals; for German law, the explanatory memoranda in the Bundestag printed papers. Coverage extends to many central codifications (including the BGB, VVG, FamFG, trademark law and social law) and is continually expanded.
Case law: the decisions that apply the norm.
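One way to enforce "every statement points to a source" is to reject any statement whose source ID is not held in the corpus. This sketch assumes flat string IDs for sources; `Statement`, `NormAnalysis` and `METHODS` are hypothetical names.

```python
from dataclasses import dataclass, field

# The five sections of an analysis, one per method of interpretation.
METHODS = ("literal", "systematic", "historical", "teleological", "case_law")


@dataclass(frozen=True)
class Statement:
    text: str
    source_id: str   # ID of a norm version, legislative material or decision in the corpus


@dataclass
class NormAnalysis:
    norm_id: str
    corpus: frozenset[str]                     # every source ID held in the system
    sections: dict[str, list[Statement]] = field(
        default_factory=lambda: {m: [] for m in METHODS})

    def add(self, method: str, statement: Statement) -> None:
        if method not in self.sections:
            raise ValueError(f"unknown method of interpretation: {method}")
        if statement.source_id not in self.corpus:
            # Statements without an in-corpus source are rejected outright.
            raise ValueError(f"source {statement.source_id!r} not in corpus")
        self.sections[method].append(statement)
```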

Honest, not overconfident. Every norm analysis carries a four-band confidence signal — from "settled case law" to "no settled case law". A norm with no case law in the corpus is flagged as such, rather than feigning a certainty that does not exist.
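A sketch of how such a four-band signal could be derived. Only the outer labels ("settled case law", "no settled case law") come from the description above; the two middle labels, the thresholds and the weighting towards the highest federal courts are assumptions for illustration.

```python
from enum import Enum


class Confidence(Enum):
    SETTLED = "settled case law"
    EMERGING = "emerging case law"       # hypothetical label
    ISOLATED = "isolated decisions"      # hypothetical label
    NONE = "no settled case law"


HIGHEST_COURTS = {"BVerfG", "BGH", "BVerwG", "BFH", "BSG", "BAG"}


def confidence_band(courts: list[str]) -> tuple[Confidence, bool]:
    """Map the courts of the (non-overruled) decisions applying a norm to a band.

    Returns the band and a flag that is True when the corpus holds no case
    law for the norm at all. All thresholds are illustrative assumptions.
    """
    n = len(courts)
    top = sum(1 for c in courts if c in HIGHEST_COURTS)
    if top >= 3:
        band = Confidence.SETTLED
    elif top >= 1 or n >= 3:
        band = Confidence.EMERGING
    elif n >= 1:
        band = Confidence.ISOLATED
    else:
        band = Confidence.NONE
    return band, n == 0
```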

Current. A relationship graph over the judgments shows which decisions cite, affirm, distinguish or overrule others — so no overruled case law appears as current law, and doctrinal splits are recognisable as such. Research combines full-text and semantic search, so even paraphrased questions find the right passages.
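A minimal sketch of the overruling check, assuming the relationship graph is stored as `(citing, relation, cited)` triples with the four relation types named above. The function names are hypothetical, and the sketch ignores harder cases such as partial overruling.

```python
from collections import defaultdict

RELATIONS = {"cites", "affirms", "distinguishes", "overrules"}


def overruling_index(edges):
    """Map each decision ID to the set of decisions that overrule it.

    `edges` are (citing, relation, cited) triples between decision IDs.
    """
    overruled_by = defaultdict(set)
    for citing, relation, cited in edges:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if relation == "overrules":
            overruled_by[cited].add(citing)
    return overruled_by


def current_law(decision_ids, edges):
    """Keep only decisions that no other decision overrules.

    "cites", "affirms" and "distinguishes" edges do not affect whether a
    decision counts as current.
    """
    overruled_by = overruling_index(edges)
    return [d for d in decision_ids if not overruled_by.get(d)]
```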

Norm analysis is used not only in research; it also grounds the AI drafts (letters, briefs, chat). It is a sourced draft for review, not a substitute for legal advice and, for purely doctrinal questions without case law, not a substitute for a commentary (see RDG and limits).

Assistant routing

A generic prompt produces generic answers. LEGALinhouse routes every query to one of 19 specialized German legal assistants: specialized system prompts, each with its own vocabulary, standard literature and retrieval configuration. Examples include labor law, tenancy law, corporate law, contract law, IT law, data protection law, tax law, criminal law and administrative law. In addition, there are assistant profiles for international jurisdictions, bringing the total to more than 140 profiles.

How the routing works

Classification: A lightweight first step identifies the relevant legal area from the query and the matter context.
Load assistant: The matching system prompt with vocabulary, retrieval rules and output schemas is activated.
Source retrieval: Knowledge-graph query for the norms and decisions relevant to this legal area.
Generation: The answer is written against the loaded assistant context.
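The four routing steps can be sketched as a small pipeline. Everything here is hypothetical: the registry holds three example assistants, and naive keyword matching stands in for the lightweight classification step, whose actual implementation is not described.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Assistant:
    area: str
    system_prompt: str
    retrieval_filter: str     # restricts knowledge-graph queries to this legal area


# Hypothetical three-entry registry (the product describes 19 German assistants).
ASSISTANTS = {
    "labor": Assistant("labor", "You assist with German labor law.", "area:labor"),
    "tenancy": Assistant("tenancy", "You assist with German tenancy law.", "area:tenancy"),
    "contract": Assistant("contract", "You assist with German contract law.", "area:contract"),
}

# Naive keyword matching as a stand-in for the lightweight classification step.
KEYWORDS = {
    "labor": {"dismissal", "employee", "employer"},
    "tenancy": {"landlord", "tenant", "rent"},
}


def classify(query: str, matter_area: Optional[str] = None) -> str:
    if matter_area in ASSISTANTS:           # the matter context already names the area
        return matter_area
    words = set(re.findall(r"[a-z]+", query.lower()))
    for area, keywords in KEYWORDS.items():
        if words & keywords:
            return area
    return "contract"                        # hypothetical fallback area


def route(query: str, matter_area: Optional[str],
          retrieve: Callable[[str, str], list],
          generate: Callable[[str, list, str], str]) -> str:
    assistant = ASSISTANTS[classify(query, matter_area)]      # steps 1-2: classify, load
    sources = retrieve(query, assistant.retrieval_filter)     # step 3: source retrieval
    return generate(assistant.system_prompt, sources, query)  # step 4: generation
```

Matching whole words rather than substrings matters even in a toy classifier: a substring test would find "rent" inside "current".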

The advantage is not only language quality — the assistant knows which norms are relevant for a question, knows the typical argument structure, and distinguishes between case law and prevailing opinion in commentary literature.

Agentic workflows

A pure chat AI answers questions. Agentic AI acts. In LEGALinhouse this means the AI can chain multiple steps across a matter autonomously, within a controlled corridor and with a human in the loop at every handoff.

Capture facts → Classification (legal area, process type) → Research in the knowledge graph → Draft: letter to the opposing party → Deadline suggestion into the calendar → Human reviews and approves

The steps are not hardcoded — the AI plans them from the facts. But every step is logged traceably (which assistant, which sources, which model, which inputs), and every output produced is marked as a draft until an authorized employee approves it. There is no auto-send function for AI output.
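A sketch of the step logging and draft marking described above, assuming the AI-generated plan arrives as a sequence of steps (planning itself is not shown). `MatterRun`, `StepLog` and `Output` are hypothetical names; the class deliberately has no send method.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class StepLog:
    step: str
    assistant: str
    model: str
    sources: tuple[str, ...]
    inputs: str
    at: str                              # UTC timestamp (ISO 8601)


@dataclass
class Output:
    step: str
    content: str
    status: str = "draft"                # every AI output starts as a draft


@dataclass
class MatterRun:
    log: list[StepLog] = field(default_factory=list)
    outputs: list[Output] = field(default_factory=list)

    def run_step(self, step: str, assistant: str, model: str, sources: list[str],
                 inputs: str, produce: Callable[[str, list[str]], str]) -> Output:
        """Run one planned step, record what went into it, file the result as a draft."""
        content = produce(inputs, sources)
        self.log.append(StepLog(step, assistant, model, tuple(sources), inputs,
                                datetime.now(timezone.utc).isoformat()))
        out = Output(step, content)
        self.outputs.append(out)
        return out
```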

This draft marking is not cosmetic; it is relevant under the RDG (Rechtsdienstleistungsgesetz, the German Legal Services Act). LEGALinhouse is a productivity tool, not a legal service. Professional responsibility stays with the human; the AI only shortens the path there.

Case-citation protection (two-stage)

Citations are the most common hallucination failure in legal AI. We prevent it on two levels.

1. Generation-time constraint

Before the AI writes an answer, it loads a concrete list of retrieved decisions from the knowledge graph, and it may cite only from this list. Court, date and reference number are taken in structured form from the graph entry, not from the generated free text. This eliminates the main source of fabricated citations.
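A sketch of the generation-time constraint, assuming the model is instructed to emit only a marker such as `[[D1]]` that points into the retrieved list. The marker syntax, `GraphDecision` and `render_citations` are hypothetical; the point is that the citation text is rendered from the graph entry, never copied from the model's output.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphDecision:
    court: str
    decided: str        # date as stored in the graph entry
    reference: str      # case reference number as stored in the graph entry


MARKER = re.compile(r"\[\[D(\d+)\]\]")


def render_citations(model_text: str, allowed: list[GraphDecision]) -> str:
    """Replace [[Dn]] markers with citation data taken from the graph entry.

    The model only ever emits a marker; court, date and reference number are
    filled in from `allowed`. A marker pointing outside the list is an error.
    """
    def fill(m: re.Match) -> str:
        i = int(m.group(1)) - 1
        if not 0 <= i < len(allowed):
            raise ValueError(f"citation {m.group(0)} not in retrieved list")
        d = allowed[i]
        return f"{d.court}, {d.decided}, {d.reference}"

    return MARKER.sub(fill, model_text)
```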

2. Post-hoc validation

After generation, the answer is scanned: every case reference appearing in the text is verified against the case law database. If the court-date-reference combination does not match, the citation is flagged in the output and the employee is prompted to verify manually. An unverifiable source is not silently emitted.
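A sketch of the post-hoc check, assuming a single simplified citation format (`court, DD.MM.YYYY, reference number`). Real citation styles vary far more; the regular expression and `check_citations` are hypothetical.

```python
import re

# Hypothetical simplified format, e.g. "<court>, 01.02.2021, <reference number>".
CITATION = re.compile(
    r"\b(BVerfG|BGH|BVerwG|BFH|BSG|BAG|OLG \w+), "
    r"(\d{2}\.\d{2}\.\d{4}), "
    r"((?:\d+|[IVX]+) [A-Za-z]+ \d+/\d{2})"
)


def check_citations(text: str, case_db: set[tuple[str, str, str]]):
    """Flag every citation whose court-date-reference triple is not in `case_db`.

    Returns the annotated text and the list of unverified citations.
    """
    unverified = []

    def mark(m: re.Match) -> str:
        if m.groups() in case_db:          # (court, date, reference) matches a real entry
            return m.group(0)
        unverified.append(m.group(0))
        return f"{m.group(0)} [UNVERIFIED]"

    return CITATION.sub(mark, text), unverified
```

Substituting in a single pass (rather than repeated `str.replace`) ensures a citation that appears twice is flagged once per occurrence, not twice.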

Result

In practice this means: a case reference in a LEGALinhouse response either comes from the real case law database — or it is explicitly marked as unverified. Hallucinated citations that appear authentic are structurally impossible.

3-phase pseudonymization

Before any client text reaches an AI model, it runs through a three-stage pipeline. The goal: the language model sees no identifying data, only placeholders. Personal data stays in the EU, on infrastructure we operate.

Phase 1 — Pseudonymization on our own servers in Germany

A detection layer on our own infrastructure in Germany replaces 17 entity types with consistent placeholders. Detected entities include personal names, companies, addresses, email addresses, phone numbers, IBAN and account data, dates of birth, tax IDs, case reference numbers, vehicle registration plates, URLs and others. The mapping between placeholders and original values is kept in an ephemeral mapping table on our side.
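A minimal sketch of consistent placeholder substitution, covering only two of the 17 entity types (email addresses and German IBANs) with regular expressions. Names, companies and addresses would need an entity-recognition model, which is out of scope here; the `Pseudonymizer` class and the `[TYPE_n]` placeholder format are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical regex subset of the entity types; real detection covers far more.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b"),   # German IBAN, 22 chars
}


@dataclass
class Pseudonymizer:
    mapping: dict[str, str] = field(default_factory=dict)    # placeholder -> original
    _reverse: dict[str, str] = field(default_factory=dict)   # original -> placeholder
    _counters: dict[str, int] = field(default_factory=dict)

    def _placeholder(self, kind: str, original: str) -> str:
        # The same original value always maps to the same placeholder.
        if original in self._reverse:
            return self._reverse[original]
        self._counters[kind] = self._counters.get(kind, 0) + 1
        ph = f"[{kind}_{self._counters[kind]}]"
        self.mapping[ph] = original
        self._reverse[original] = ph
        return ph

    def pseudonymize(self, text: str) -> str:
        """Replace every detected entity; only the returned text leaves our servers."""
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(lambda m: self._placeholder(kind, m.group(0)), text)
        return text
```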

Phase 2 — AI processing

The pseudonymized text is passed to an inference platform running in a datacenter in the Frankfurt EU region. The platform does not use inputs for training and is fully subject to EU data protection law. The model sees only the pseudonymized text and therefore cannot leak client data it has never received.

Phase 3 — Re-personalization

The AI response contains the same placeholders. Before the employee sees the output, they are replaced with the original values from the mapping table. The mapping table is then discarded.
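The reverse step can be sketched as a single regex pass over the response, assuming placeholders of the hypothetical form `[TYPE_n]`. A single pass avoids re-substituting inside text that has already been restored, and unknown placeholders are left visible rather than guessed.

```python
import re

PLACEHOLDER = re.compile(r"\[[A-Z]+_\d+\]")


def repersonalize(response: str, mapping: dict[str, str]) -> str:
    """Swap placeholders back to original values, then discard the mapping table."""
    restored = PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), response)
    mapping.clear()   # the table is emptied once the output has been restored
    return restored
```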

GDPR consequences

Personal data does not leave the EU.
The AI platform is a processor but sees no PII.
Even in a hypothetical breach of the inference platform, no client data would be affected.
The audit trail documents every pseudonymization — who, when, what, how many entities.

AI-free import path

Some documents should not run through an AI at all, for example highly sensitive personnel files, whistleblower reports or criminal case files. For these, LEGALinhouse provides a fully AI-free import path: DOCX, TXT, EML, MSG, HTML and RTF files are parsed locally, indexed and filed into the matter without touching the inference pipeline. OCR for plain text extraction also runs without AI; only categorization would require AI, and it is skipped on this path.
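A sketch of the AI-free path using only the Python standard library for TXT, HTML and EML. DOCX, MSG and RTF parsers as well as OCR are omitted; `import_document` and its `categorize` hook are hypothetical names. The only AI call sits behind `ai_free=False`.

```python
from dataclasses import dataclass
from email import message_from_bytes, policy
from html.parser import HTMLParser
from typing import Callable, Optional


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data):
        self.parts.append(data)


def parse_txt(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


def parse_html(data: bytes) -> str:
    collector = _TextCollector()
    collector.feed(data.decode("utf-8", errors="replace"))
    collector.close()
    return " ".join("".join(collector.parts).split())   # normalize whitespace


def parse_eml(data: bytes) -> str:
    msg = message_from_bytes(data, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content().strip() if body is not None else ""


# DOCX, MSG and RTF parsers are omitted in this sketch.
LOCAL_PARSERS = {"txt": parse_txt, "html": parse_html, "eml": parse_eml}


@dataclass
class ImportedDocument:
    name: str
    text: str
    category: Optional[str] = None     # stays None on the AI-free path


def import_document(name: str, data: bytes, ai_free: bool,
                    categorize: Optional[Callable[[str], str]] = None) -> ImportedDocument:
    ext = name.rsplit(".", 1)[-1].lower()
    if ext not in LOCAL_PARSERS:
        raise ValueError(f"no local parser for .{ext} in this sketch")
    doc = ImportedDocument(name, LOCAL_PARSERS[ext](data))   # parsing is purely local
    if not ai_free and categorize is not None:
        doc.category = categorize(doc.text)   # the only step that would involve AI
    return doc
```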

The employee sees the document in the matter, can read, forward and reference it — with the certainty that it has never gone into an external model.

Audit trail & EU AI Act

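The EU AI Act classifies certain legal AI applications as high-risk, notably systems that assist judicial authorities in researching and interpreting facts and the law (Annex III). It imposes requirements on logging, human oversight and transparency, which LEGALinhouse meets by design: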

Full logging: Every AI operation — which assistant, which sources, which inputs, which model, which tokens — is logged and visible in the tenant's audit log.
Mandatory human approval: No AI output leaves the system without human review. Auto-send does not exist.
Draft marking: AI-generated content is explicitly marked as AI draft in the UI and audit log until an authorized employee logs approval.
Explainability: For every AI answer, knowledge graph sources and the assistant used can be traced.
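A sketch of the approval gate, assuming a set of user IDs authorized to approve. `AIDraft`, `approve`, `release` and `ApprovalRequired` are hypothetical names; the essential property is that `release` is the only exit path and refuses unapproved drafts.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when AI output would leave the system without human approval."""


@dataclass
class AIDraft:
    content: str
    assistant: str
    sources: tuple[str, ...]
    status: str = "ai_draft"           # marked as AI draft until approved
    approved_by: Optional[str] = None


def approve(draft: AIDraft, user: str, authorized: set[str], audit_log: list) -> None:
    if user not in authorized:
        raise PermissionError(f"{user} may not approve AI drafts")
    draft.status, draft.approved_by = "approved", user
    audit_log.append({"event": "approve", "user": user,
                      "at": datetime.now(timezone.utc).isoformat()})


def release(draft: AIDraft, audit_log: list) -> str:
    """The only exit path; there is deliberately no automatic send."""
    if draft.status != "approved":
        raise ApprovalRequired("AI draft has not been approved")
    audit_log.append({"event": "release", "user": draft.approved_by,
                      "assistant": draft.assistant, "sources": draft.sources})
    return draft.content
```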

Sovereignty & data location

The entire infrastructure sits in the EU. Client data is processed in a datacenter in Frankfurt. AI inference runs on a platform in the EU region Frankfurt — no data flow to third countries, no sub-processing outside the EU.

We name our infrastructure components by function rather than by vendor. Independence also means not being locked in to a single hyperscaler or model provider. The architecture is built so that the inference platform and storage remain swappable, provided the replacements are also located in the EU.
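A sketch of how swappability and the EU constraint can coexist, assuming each backend declares a region label. `InferenceBackend`, `InferenceGateway`, `EchoBackend` and the region string `"eu-frankfurt"` are hypothetical, not actual vendor identifiers.

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical allow-list of EU region labels; extend it for further EU sites.
EU_REGIONS = {"eu-frankfurt"}


class InferenceBackend(Protocol):
    region: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for the example; a real one would call an EU-hosted platform."""
    region: str

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class InferenceGateway:
    """Callers depend only on this gateway, so the backend behind it can change."""

    def __init__(self, backend: InferenceBackend):
        self._backend = self._checked(backend)

    @staticmethod
    def _checked(backend: InferenceBackend) -> InferenceBackend:
        if backend.region not in EU_REGIONS:
            raise ValueError(f"backend region {backend.region!r} is outside the EU allow-list")
        return backend

    def swap(self, backend: InferenceBackend) -> None:
        self._backend = self._checked(backend)   # the old backend stays if the check fails

    def complete(self, prompt: str) -> str:
        return self._backend.complete(prompt)
```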

Ready for the practical side? Request beta access, or read more about a level playing field for the Mittelstand and about Security & Compliance.