Juriscan
Context
MVP for automated data extraction from legal documents (writs of summons, enforcement orders, official reports) for a firm of judicial officers. OCR + AI interpretation, structured field extraction, export into the client's internal CRM. One month solo as a freelancer, from scoping to delivery.
My approach
The core is a two-stage pipeline. First, a layout-aware OCR pass turns each PDF into markdown; then an LLM reads that markdown and identifies the fields to extract (parties, dates, amounts, addresses) for each document type. Everything runs locally on GPU, with no cloud APIs, because the sensitivity of the data demands it. A Rust worker orchestrates the multithreading and keeps throughput acceptable on batches of PDFs.
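The second stage could be driven per document type roughly like this. This is a minimal sketch, not the project's code: the `DocumentType` enum, the field names and the prompt wording are all hypothetical. Each type declares the keys it expects, and the prompt asks the LLM to return exactly those keys as JSON from the OCR markdown.

```rust
/// Hypothetical document types; the real project's taxonomy may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocumentType {
    WritOfSummons,
    EnforcementOrder,
    OfficialReport,
}

impl DocumentType {
    /// Illustrative field list per type (parties, dates, amounts, addresses).
    fn fields(self) -> &'static [&'static str] {
        match self {
            DocumentType::WritOfSummons => &["claimant", "defendant", "defendant_address", "hearing_date"],
            DocumentType::EnforcementOrder => &["creditor", "debtor", "debtor_address", "order_date", "amount_due"],
            DocumentType::OfficialReport => &["parties", "address", "report_date", "amount"],
        }
    }
}

/// Builds the stage-two prompt: the expected keys followed by the OCR markdown.
fn build_extraction_prompt(doc_type: DocumentType, ocr_markdown: &str) -> String {
    let keys = doc_type.fields().join(", ");
    format!(
        "Extract data from this legal document (type: {doc_type:?}).\n\
         Return only a JSON object with exactly these keys: {keys}.\n\
         Use null for any field absent from the text; never guess a value.\n\n\
         Document (markdown produced by OCR):\n{ocr_markdown}"
    )
}

fn main() {
    let markdown = "...markdown from the OCR stage...";
    println!("{}", build_extraction_prompt(DocumentType::EnforcementOrder, markdown));
}
```

Keeping the expected keys in code, per document type, also gives a natural place to check the LLM's JSON against the schema before it reaches the editable form.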
- 100% self-hosted AI pipeline: DeepSeek-OCR for reading, DeepSeek-Coder-V2 for structuring, llama.cpp + CUDA for inference. Data never leaves the client's server.
- Rust worker for multithreading and orchestration: parallel OCR and predictable memory usage, where Node would have buckled under the volume.
- Vue 3 + shadcn-vue front end: upload, PDF preview on the left, editable extracted data on the right, and CSV export to the client's CRM.
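The CSV export in the last bullet mostly comes down to quoting values correctly, since party names and addresses routinely contain commas, quotes or line breaks. Below is a minimal sketch of RFC 4180-style quoting, assuming a flat row of string fields and a comma delimiter; the column names are hypothetical. It is written in Rust only to keep one language across examples; the project's export could just as well live in the Vue front end or the AdonisJS API.

```rust
/// Quotes a field per RFC 4180 when it contains a comma, a double quote or a line break;
/// embedded double quotes are doubled.
fn escape_csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Joins fields into one CSV record terminated by CRLF, as RFC 4180 specifies.
fn to_csv_row(fields: &[&str]) -> String {
    let cells: Vec<String> = fields.iter().map(|f| escape_csv_field(f)).collect();
    format!("{}\r\n", cells.join(","))
}

fn main() {
    // Hypothetical column names and placeholder values.
    let header = to_csv_row(&["debtor", "debtor_address", "amount_due"]);
    let row = to_csv_row(&["ACME \"Holdings\"", "1 Main St, Springfield", "1520.40"]);
    print!("{header}{row}");
}
```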
Stack & technical choices
AdonisJS 6 + PostgreSQL on the back end, for the API and persistence. Vue 3 + Tailwind + shadcn-vue on the front end. The centerpiece of the infrastructure is a Rust worker that calls DeepSeek-OCR and then DeepSeek-Coder-V2 (via llama.cpp) on a local GPU; it is multithreaded and deployed with Docker and the NVIDIA Container Toolkit. Everything is self-hosted: a firm of judicial officers can't send its documents to OpenAI.
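The worker's orchestration can be illustrated with the standard library alone: a fixed pool of threads pulls PDF paths from a shared queue and runs both stages on each document. This is a sketch under stated assumptions, not the project's implementation: `run_ocr`, `extract_fields` and `process_batch` are hypothetical names, the two stage functions are stubs standing in for the calls to DeepSeek-OCR and to DeepSeek-Coder-V2 through llama.cpp, and the real worker's crates and threading model are not described here.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Outcome = Result<String, String>;

/// Hypothetical stage 1 stub: the real call would run the OCR model and return markdown.
fn run_ocr(pdf_path: &str) -> Outcome {
    Ok(format!("# OCR output for {pdf_path}"))
}

/// Hypothetical stage 2 stub: the real call would send the markdown to the LLM and return JSON.
fn extract_fields(markdown: &str) -> Outcome {
    Ok(format!("{{\"source_chars\": {}}}", markdown.len()))
}

/// Processes a batch with at most `workers` threads; returns (path, outcome) pairs in input order.
fn process_batch(paths: Vec<String>, workers: usize) -> Vec<(String, Outcome)> {
    let total = paths.len();
    // Shared job queue; each job carries its index so results can be put back in order.
    let (job_tx, job_rx) = mpsc::channel::<(usize, String)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(usize, String, Outcome)>();

    for (i, path) in paths.into_iter().enumerate() {
        job_tx.send((i, path)).expect("receiver is still alive");
    }
    drop(job_tx); // closing the queue lets workers exit once it is drained

    let mut handles = Vec::new();
    for _ in 0..workers.max(1) {
        let job_rx = Arc::clone(&job_rx);
        let res_tx = res_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock guard is a temporary, released at the end of this statement.
            let job = job_rx.lock().unwrap().recv();
            let Ok((i, path)) = job else { break };
            let outcome = run_ocr(&path).and_then(|md| extract_fields(&md));
            res_tx.send((i, path, outcome)).expect("collector is still alive");
        }));
    }
    drop(res_tx); // the result loop below ends once every worker has dropped its sender

    let mut results: Vec<Option<(String, Outcome)>> = (0..total).map(|_| None).collect();
    for (i, path, outcome) in res_rx {
        results[i] = Some((path, outcome));
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    results.into_iter().map(|r| r.expect("every job reported")).collect()
}

fn main() {
    let batch: Vec<String> = (1..=5).map(|n| format!("doc_{n}.pdf")).collect();
    for (path, outcome) in process_batch(batch, 2) {
        println!("{path}: {outcome:?}");
    }
}
```

Capping `workers` is what bounds memory: only that many documents are in flight at once. The right value would depend on how many concurrent requests the local inference server and the GPU's memory can absorb.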
Outcome & takeaways
MVP delivered, currently in testing on the client side. On a personal level, this is my first real AI project, and the first one I built mostly by pair-programming with an AI coding assistant. A stack I didn't know (Rust, llama.cpp, local DeepSeek on CUDA), learned and shipped in one month: exactly the use case where the assistant speeds you up without degrading quality, as long as you stay in charge of the architecture.