Upload & versioning

Documents can be uploaded by drag-and-drop into a matter, a contract or the central document repository. Each upload of a changed document is stored as a new version; earlier versions remain accessible, so the state that applied at any given time can be reconstructed. Deletions are soft deletes with recovery; nothing is silently removed.
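The versioning and soft-delete behavior described above can be pictured with a small in-memory model. This is a minimal sketch, not LEGALinhouse's actual data model: the class names `Document` and `Version` and the methods `add_version`, `version_at`, `soft_delete` and `restore` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Version:
    number: int
    content: bytes
    uploaded_at: datetime


@dataclass
class Document:
    # Hypothetical model: a document is a name plus an append-only version list.
    name: str
    versions: list[Version] = field(default_factory=list)
    deleted_at: datetime | None = None  # soft-delete marker; data stays in place

    def add_version(self, content: bytes, when: datetime) -> Version:
        """Append a new version; earlier versions are never overwritten."""
        version = Version(len(self.versions) + 1, content, when)
        self.versions.append(version)
        return version

    def version_at(self, when: datetime) -> Version | None:
        """Return the most recent version uploaded at or before `when`."""
        candidates = [v for v in self.versions if v.uploaded_at <= when]
        return max(candidates, key=lambda v: v.uploaded_at, default=None)

    def soft_delete(self, when: datetime) -> None:
        """Mark the document as deleted without removing any version."""
        self.deleted_at = when

    def restore(self) -> None:
        """Undo a soft delete."""
        self.deleted_at = None
```

Because versions are only ever appended and a delete only sets a marker, both "which state applied on a given date" and "recover a deleted document" reduce to simple lookups.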

Supported upload formats: PDF, DOCX, XLSX, PPTX, TXT, EML, MSG, HTML, RTF and common image formats. Storage location: EU data center in Frankfurt, encrypted at rest.

OCR

Scanned documents and image-only PDFs cannot be searched without text recognition. LEGALinhouse automatically adds an OCR text layer to uploaded documents, so that the full text becomes searchable and readable by the AI. OCR is enabled by default for every upload and runs in the background.

OCR itself is purely a text-extraction step: it reads what is on the page, nothing more. It does not send content to a language model.

AI categorization

So that a document does not just sit on the server but ends up in the right matter, with the right deadline and the right document type, the AI proposes a categorization:

  • Document type (contract, brief, authority letter, invoice, internal memo, ...)
  • Matter and contract assignment (based on detected reference numbers, names, dates)
  • Deadline detection (e.g. the objection period in an administrative decision)
  • Brief summary for the file

The suggestions land in the inbox, where an employee confirms them. The 3-phase anonymization runs before any AI processing.
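The ordering described above (anonymize first, then let the AI propose, then wait for a human to confirm) can be sketched as follows. Everything here is an assumption made for illustration: `Suggestion`, `Inbox` and `process_upload` are hypothetical names, and the `anonymize` and `categorize` callables are stand-ins, since the text does not describe how the 3-phase anonymization or the model call work internally.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Suggestion:
    # Hypothetical container for the four proposed fields listed above.
    document_id: str
    document_type: str
    matter: str | None
    deadline: str | None
    summary: str
    confirmed: bool = False


class Inbox:
    """Holds AI suggestions until an employee confirms them."""

    def __init__(self) -> None:
        self.pending: list[Suggestion] = []
        self.filed: list[Suggestion] = []

    def propose(self, suggestion: Suggestion) -> None:
        self.pending.append(suggestion)

    def confirm(self, document_id: str) -> Suggestion:
        """Move a pending suggestion to `filed`; nothing is applied before this."""
        for suggestion in self.pending:
            if suggestion.document_id == document_id:
                self.pending.remove(suggestion)
                suggestion.confirmed = True
                self.filed.append(suggestion)
                return suggestion
        raise KeyError(document_id)


def process_upload(
    document_id: str,
    text: str,
    anonymize: Callable[[str], str],
    categorize: Callable[[str, str], Suggestion],
    inbox: Inbox,
) -> Suggestion:
    # Anonymization runs first, so the AI step only ever sees anonymized text.
    safe_text = anonymize(text)
    suggestion = categorize(document_id, safe_text)
    inbox.propose(suggestion)
    return suggestion
```

Passing `anonymize` and `categorize` in as parameters keeps the ordering guarantee in one place: the categorization step has no way to reach the original text.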

AI-free import path

For highly sensitive documents (personnel files, whistleblower reports, criminal case files, M&A documents), LEGALinhouse offers a fully AI-free import path. The document is uploaded, the OCR text is extracted, and the document is filed in the matter and made searchable; the inference pipeline is never touched.

Which formats work AI-free?

DOCX, TXT, EML, MSG, HTML and RTF are parsed entirely locally, with no AI in the processing path. PDFs and images go through OCR (no AI), but the subsequent categorization is omitted in this path. The document appears in the matter without ever having touched an external model.

The selection is made by the tenant admin per document type or per upload source, which gives a safe default instead of a decision on every individual upload.
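The combination of format-based extraction and admin-defined policy can be sketched as a small routing function. This is only an illustration under assumptions: `TenantPolicy`, `ImportPlan`, `extraction_step` and `plan_import` are hypothetical names, the image suffixes are an assumed reading of "common image formats", and the document type is assumed to be known at upload time. Formats whose AI-free handling is not stated above (XLSX, PPTX) are deliberately rejected rather than guessed.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePath

# Formats the text says are parsed entirely locally.
LOCAL_PARSE = {".docx", ".txt", ".eml", ".msg", ".html", ".rtf"}
# PDFs and images go through OCR; the image suffixes are an assumption.
OCR_FORMATS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


@dataclass(frozen=True)
class TenantPolicy:
    # Hypothetical admin settings: which types and sources skip AI entirely.
    ai_free_document_types: frozenset[str]
    ai_free_sources: frozenset[str]


@dataclass(frozen=True)
class ImportPlan:
    extraction: str          # "local_parse" or "ocr"
    ai_categorization: bool  # False on the AI-free path


def extraction_step(filename: str) -> str:
    """Pick the text-extraction step from the file suffix."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in LOCAL_PARSE:
        return "local_parse"
    if suffix in OCR_FORMATS:
        return "ocr"
    raise ValueError(f"handling of {suffix!r} is not described in the text")


def plan_import(
    filename: str, document_type: str, source: str, policy: TenantPolicy
) -> ImportPlan:
    """Apply the tenant default; no per-upload decision is needed."""
    ai_free = (
        document_type in policy.ai_free_document_types
        or source in policy.ai_free_sources
    )
    return ImportPlan(extraction_step(filename), ai_categorization=not ai_free)
```

With a policy such as `TenantPolicy(frozenset({"personnel_file"}), frozenset({"whistleblower_portal"}))`, every matching upload is planned without the categorization step, regardless of who uploads it.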

All documents (including OCR-recognized scans) are full-text searchable — across all modules. A search for a case reference finds the authority letter, the dunning notice, the email attachment and the contract clause in the same result list. Filters by document type, matter, date, tenant and employee are available.
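A search combined with the filters named above might look like the following minimal sketch. It uses a plain substring match over an in-memory list purely for illustration; the actual search index is not described in the text, and `IndexedDocument`, `search` and all field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class IndexedDocument:
    # Hypothetical index entry; `text` includes OCR-recognized text for scans.
    doc_id: str
    text: str
    document_type: str
    matter: str
    tenant: str
    employee: str
    filed_on: date


def search(
    documents: Iterable[IndexedDocument],
    query: str,
    *,
    document_type: str | None = None,
    matter: str | None = None,
    tenant: str | None = None,
    employee: str | None = None,
    filed_from: date | None = None,
    filed_to: date | None = None,
) -> list[IndexedDocument]:
    """Case-insensitive full-text match, narrowed by any filters that are set."""
    needle = query.casefold()
    results = []
    for doc in documents:
        if needle not in doc.text.casefold():
            continue
        if document_type is not None and doc.document_type != document_type:
            continue
        if matter is not None and doc.matter != matter:
            continue
        if tenant is not None and doc.tenant != tenant:
            continue
        if employee is not None and doc.employee != employee:
            continue
        if filed_from is not None and doc.filed_on < filed_from:
            continue
        if filed_to is not None and doc.filed_on > filed_to:
            continue
        results.append(doc)
    return results
```

Without filters, a query for a case reference returns every document type that mentions it in one list; each filter only narrows that list.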