Pseudonymization — no AI ever sees your data

The problem: AI and personal data

A legal query is full of personal data: names, addresses, bank details, case numbers. Send that text to an AI unchanged and the personal data leaves the organisation and ends up with an external model provider. That is problematic under data-protection law and, for a legal department, simply not acceptable.

LEGALinhouse solves this at the root: the language model never gets to see the personal data in the first place. It works exclusively with placeholders and therefore cannot disclose client data — it never saw it.

17 types of personal data

3 phases

0 plaintext values sent to the AI

Pseudonymization — and explicitly not anonymization

The real values are replaced with consistent placeholders: "Hans Bauer" becomes [PERSON_A], "ACME GmbH" becomes [UNTERNEHMEN_A], an IBAN becomes [IBAN_A]. Consistent means: the same name gets the same placeholder throughout the whole operation — so the AI can still follow who is dealing with whom, without knowing the real names.

At the end, the placeholders are translated back into the real values before the member of staff sees the answer. Precisely because this step is possible, this is pseudonymization within the meaning of Art. 4(5) GDPR — the mapping is reversible — and not anonymization (which would be final and irreversible). We label this correctly on purpose: pseudonymized data remains personal data and stays subject to the GDPR — which is why the mapping table is protected and never handed to the AI.
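What "reversible" means in practice can be shown with a small Python sketch. It assumes the mapping is available as a plain dictionary from placeholder to original value; the function name `restore` and the placeholder pattern are assumptions for illustration, not the product's actual code.

```python
import re

# Assumed placeholder shape: [KIND_SUFFIX], e.g. [PERSON_A] or [UNTERNEHMEN_A].
PLACEHOLDER = re.compile(r"\[[A-Z]+(?:_[A-Z]+)+\]")


def restore(text, mapping):
    """Translate placeholders back in a single pass.

    Placeholders that are not in the mapping are left untouched, so a token
    invented by the model never turns into a real value.
    """
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


mapping = {"[PERSON_A]": "Hans Bauer", "[UNTERNEHMEN_A]": "ACME GmbH"}
answer = "[PERSON_A] should send [UNTERNEHMEN_A] a reminder; [PERSON_B] is not involved."
restored = restore(answer, mapping)
# restored == "Hans Bauer should send ACME GmbH a reminder; [PERSON_B] is not involved."
```

Without the mapping the placeholders cannot be resolved, which is exactly why the mapping itself counts as the sensitive part.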

The three phases

Every text runs through the same three-stage pipeline before and after an AI is involved:

Text containing personal data → Phase 1: detect & replace (our own servers in Germany) → Phase 2: AI sees placeholders only (EU region Frankfurt) → Phase 3: translate back for the member of staff → Answer with real data — on screen only

What happens in each phase

Phase 1 — detect & replace. On our own infrastructure in Germany, personal data is detected and replaced with placeholders. Detection combines the known case contacts, pattern matching (e.g. IBAN, email, phone number) and AI-assisted name recognition. The placeholder ⇄ original mapping stays with us in the EU infrastructure.
Phase 2 — AI processing. Only the pseudonymized text goes to the language model (data centre in the EU region Frankfurt, inputs not used for training). The model sees placeholders only.
Phase 3 — translate back. The AI answer contains the same placeholders. They are translated back into the real values via the mapping before the member of staff sees the answer.
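The three phases can be sketched as a single Python function. This is a minimal sketch under stated assumptions: `run_with_ai` and `fake_model` are hypothetical names, the mapping is a plain dictionary from original value to placeholder, and the model call is a local stand-in rather than a real AI endpoint.

```python
def run_with_ai(text, originals_to_placeholders, call_model):
    """Hypothetical orchestration of the three phases."""
    # Phase 1: replace real values with placeholders (longest values first).
    outgoing = text
    for original in sorted(originals_to_placeholders, key=len, reverse=True):
        outgoing = outgoing.replace(original, originals_to_placeholders[original])

    # Defensive check before anything leaves: no known value may remain.
    if any(original in outgoing for original in originals_to_placeholders):
        raise ValueError("a known value is still present after replacement")

    # Phase 2: the model receives the pseudonymized text only.
    answer = call_model(outgoing)

    # Phase 3: translate placeholders back for the member of staff.
    for original, placeholder in originals_to_placeholders.items():
        answer = answer.replace(placeholder, original)
    return answer


seen_by_model = []


def fake_model(prompt):
    """Stand-in for the language model; records what it was given."""
    seen_by_model.append(prompt)
    return "Draft a reminder from [UNTERNEHMEN_A] to [PERSON_A]."


mapping = {"Hans Bauer": "[PERSON_A]", "ACME GmbH": "[UNTERNEHMEN_A]"}
final = run_with_ai("Hans Bauer has not paid ACME GmbH.", mapping, fake_model)
# final == "Draft a reminder from ACME GmbH to Hans Bauer."
# seen_by_model[0] == "[PERSON_A] has not paid [UNTERNEHMEN_A]."
```

The mapping is used before and after the model call but is never passed into it, which mirrors the separation described in the phases.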

What gets detected

17 types of personal data are currently detected and replaced, among them:

Detected data types (selection)

Personal names and company names
Addresses and postal code/city
Email addresses and phone numbers
IBAN, account and credit-card data
Tax IDs and VAT IDs
Dates of birth, social-security and health-insurance numbers
Case numbers and other numeric identifiers

Detection is deliberately cautious: when in doubt, a possible name is replaced rather than missed. Purely legal terms (such as "BGB" or "Hafenverordnung") are not mistaken for names.
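How the detection sources might be combined can be illustrated with a short Python sketch. Everything here is an assumption for illustration: the two regular expressions are simplified, the list of legal terms is a two-entry sample, and the "two capitalised words" rule is only a crude stand-in for the AI-assisted name recognition, which the sketch does not attempt to reproduce.

```python
import re

# Simplified patterns for illustration; real-world validation is stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"),
}
LEGAL_TERMS = {"BGB", "Hafenverordnung"}  # sample terms that must not be replaced
NAME_CANDIDATE = re.compile(r"\b[A-ZÄÖÜ][a-zäöüß]+ [A-ZÄÖÜ][a-zäöüß]+\b")


def detect(text, known_contacts):
    """Return (kind, value) pairs from contacts, patterns and a name heuristic."""
    found = []
    # 1. Known case contacts.
    for name in known_contacts:
        if name in text:
            found.append(("PERSON", name))
    # 2. Pattern matching.
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    # 3. Cautious name heuristic: when in doubt, treat the candidate as a name,
    #    unless it contains a known legal term.
    for match in NAME_CANDIDATE.finditer(text):
        candidate = match.group(0)
        if any(word in LEGAL_TERMS for word in candidate.split()):
            continue
        if ("PERSON", candidate) not in found:
            found.append(("PERSON", candidate))
    return found


text = (
    "The Hafenverordnung applies. Erika Muster (erika@example.org) "
    "pays Hans Bauer via DE89 3704 0044 0532 0130 00."
)
detected = detect(text, known_contacts=["Hans Bauer"])
```

In this example "Erika Muster" is picked up although she is not a known contact, while "The Hafenverordnung" is skipped because it contains a listed legal term.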

The consequences

Personal data does not leave the EU. Detection and replacement happen on our own servers in Germany; AI processing runs in the EU region Frankfurt.
The AI platform is a processor, but it sees no plaintext data. Even in a theoretical breach of the AI platform, no client data would be affected: none was ever there.
The mapping stays protected and separate. The mapping table (the "additional information" in the sense of the GDPR) lives in your client database, is never handed to the AI, and is deleted with the case.
Everything is logged. The audit trail documents every pseudonymization — who, when, how many data types.
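An audit record of this kind could look roughly like the following Python sketch. The field names, the per-type counts and the function `audit_entry` are assumptions for illustration; the point of the sketch is that such a record can document who, when and how much without storing any original value.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record: who, when, which case, counts per data type."""
    user: str
    timestamp: str
    case_id: str
    counts_by_type: dict


def audit_entry(user, case_id, entities, now=None):
    """Build an entry from detected (kind, value) pairs without keeping values."""
    now = now or datetime.now(timezone.utc)
    counts = {}
    for kind, _value in entities:
        counts[kind] = counts.get(kind, 0) + 1
    return AuditEntry(user, now.isoformat(), case_id, counts)


entities = [
    ("PERSON", "Hans Bauer"),
    ("PERSON", "Erika Muster"),
    ("IBAN", "DE89 3704 0044 0532 0130 00"),
]
entry = audit_entry(
    user="j.doe",
    case_id="case-001",
    entities=entities,
    now=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
# entry.counts_by_type == {"PERSON": 2, "IBAN": 1}
# entry.timestamp == "2024-01-02T03:04:05+00:00"
```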

Control & the AI-free path

Pseudonymization is reviewable per case: in a dedicated area, authorised staff see the detected pairs (original ⇄ placeholder) and can correct them — release a wrongly detected term or add an extra name, either just for the case or organisation-wide.
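The two kinds of correction and their two scopes can be sketched in Python as follows. The names `Overrides` and `apply_overrides` are hypothetical, and treating every added term as a PERSON is a simplification made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Overrides:
    """Hypothetical correction set, kept once per organisation and once per case."""
    released: set = field(default_factory=set)     # terms that must NOT be replaced
    extra_names: set = field(default_factory=set)  # terms that must ALSO be replaced


def apply_overrides(detected, org, case):
    """Combine organisation-wide and case-level corrections with detected pairs."""
    released = org.released | case.released
    extra = org.extra_names | case.extra_names
    # Drop wrongly detected terms that staff have released.
    kept = [(kind, value) for kind, value in detected if value not in released]
    # Add names that detection missed (simplification: all treated as PERSON).
    present = {value for _, value in kept}
    kept += [("PERSON", name) for name in sorted(extra)
             if name not in present and name not in released]
    return kept


detected = [("PERSON", "Hans Bauer"), ("PERSON", "Hafen Nord")]
org = Overrides(released={"Hafen Nord"})      # released organisation-wide
case = Overrides(extra_names={"E. Muster"})   # added for this case only
corrected = apply_overrides(detected, org, case)
# corrected == [("PERSON", "Hans Bauer"), ("PERSON", "E. Muster")]
```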

And anyone who does not want to use AI at all for a particular task can switch it off: LEGALinhouse offers a fully AI-free path in which no text is handed to a language model.

Want to go deeper? The technical detail — including infrastructure and data location — is in the AI Architecture. For the full picture on protection and compliance, see Security.