AI Policy Parser

Insurance / AI Document Processing

95%

Automated extraction accuracy

2-stage

LLM pipeline (Haiku + Sonnet)

JSON

JSON output from unstructured PDFs

Overview

Built an AI-powered document processing system that automatically extracts structured data from insurance policy PDFs. The system uses a cost-optimized multi-model LLM pipeline: Claude 3 Haiku for fast document classification and section detection, and Claude 3.5 Sonnet for precise structured data extraction, deployed via AWS Bedrock.

Business Context

Insurance brokers at Insly needed to process policy documents from 23+ insurance providers, each with unique PDF formats. Manual data entry was error-prone and time-consuming, with each document requiring 20-40 minutes of manual work. The extracted data was needed to populate the CRM and enable policy comparison features.

Challenge

Insurance brokers spent hours manually extracting key data from policy PDF documents from dozens of insurers, each with different layouts and terminology.

  • Policy documents from 23+ insurers with completely different PDF layouts and structures
  • Complex insurance terminology requiring domain understanding for accurate extraction
  • Cost constraints requiring smart routing between expensive and cheap LLM models

Solution

Designed a multi-stage pipeline using a smaller, faster model (Claude 3 Haiku) for initial document classification and section detection, then routing relevant sections to the more capable Claude 3.5 Sonnet for precise structured extraction. This approach reduced API costs by 60% while maintaining high extraction accuracy.

  • Multi-model LLM pipeline: Haiku for detection, Sonnet for structured extraction
  • FastAPI service with async processing and Qdrant for document similarity
  • Elasticsearch indexing for extracted structured policy data

Approach & Methodology

Started with a proof-of-concept using a single LLM model, then identified cost as the main constraint at scale. Designed the multi-model pipeline to use cheap models for classification and expensive models only where needed. Iterated on prompts with domain experts to handle edge cases in Polish insurance terminology.

Implementation Details

Multi-Model LLM Routing

Implemented intelligent routing between Claude 3 Haiku (detection) and Claude 3.5 Sonnet (extraction) to optimize the cost-accuracy tradeoff. Haiku classifies document sections at low cost, Sonnet extracts structured data only from relevant sections.

FastAPI Async Processing Service

Built a high-throughput FastAPI service with async processing for batch PDF handling. Integrated Google Document AI for initial PDF text extraction and layout analysis before LLM processing.

Structured Data Storage & Search

Extracted structured policy data is indexed in Elasticsearch for full-text search, with Qdrant for semantic similarity matching to find related policy documents and precedents.

Key Decisions

  • AWS Bedrock over direct Anthropic API — unified billing, IAM security, and no API key management
  • Two-stage LLM pipeline — Haiku for speed/cost, Sonnet for accuracy on complex extractions
  • Google Document AI for PDF layout — better than raw pdfplumber for complex multi-column insurance documents

Tech Stack

Python FastAPI AWS Bedrock Claude 3.5 Sonnet Qdrant Elasticsearch Google Document AI

Related Services

The following services were utilized in this project to deliver successful outcomes.

Lessons Learned

  • Prompt engineering for structured JSON output requires extensive examples and explicit schema descriptions
  • Model routing based on document complexity reduces costs significantly without sacrificing accuracy
  • Human-in-the-loop validation for low-confidence extractions is essential for production reliability

Project Information

Timeline

4 months

Team

2 engineers (ML + backend)

Results

95%

Automated extraction accuracy

2-stage

LLM pipeline (Haiku + Sonnet)

JSON

JSON output from unstructured PDFs

Have a Similar Challenge?

Let's discuss how I can help your project succeed with proven architecture and AI solutions.