FinTech

AI Loan Document Classifier

When a loan application lands, this Python + n8n workflow uses GPT-4o Vision to extract the document type and every key field — then routes it to the correct checklist, flags any missing documents, and notifies the underwriter in under 60 seconds.

8 hrs/day saved on intake (vs. manual processing)

<60 sec processing time per document set

99% classification accuracy across doc types

0 missed documents (completeness check)

GPT-4o Vision · Python · FastAPI · n8n · Email

Typical build: 2–3 week sprint · Fixed price · Zero delivery risk

Live workflow — triggers on document upload
1. PDF Upload (Web form / API) → 2. GPT-4o Vision (Extract fields) → 3. Classifier (Route to checklist) → 4. Complete? (Doc inventory) → 5. Notify Underwriter (Slack / Email)

Trigger: Webhook upload

Avg runtime: <60 seconds

Error handling: Auto-retry ×3

The problem

Why loan teams still sort documents manually

Hours lost per application

Underwriters spend 60–90 minutes per application manually reviewing, labelling, and sorting uploaded documents before the actual credit analysis can begin. At high volume, this consumes entire workdays.

Missing docs delay closings

When checklists are worked through by hand, items get missed. A single missing document discovered late in the process can delay a loan closing by days, at a cost to both the borrower and the lender.

No audit trail on classification

Manual review leaves no structured record of who classified what, when, and with what confidence. This creates compliance exposure during regulatory exams and loan audits.

How it works

Every step, explained

This is the actual workflow Kovil AI builds and deploys — not a diagram. Here's what runs inside every node.

1. n8n Webhook

Document uploaded to intake portal via web form or API

A loan officer or borrower uploads documents to the intake portal. n8n's Webhook node receives the file payload and passes the base64-encoded document to the processing pipeline. Supported formats: PDF, JPG, PNG, TIFF. Max file size: 25MB. Files are stored temporarily in encrypted S3 storage during processing.

n8n Webhook · S3 Storage · Encrypted transit
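
As a rough sketch of the intake checks described in this step, the snippet below decodes a base64 payload and enforces the format and size limits. The function name, the exception class, and the reading of 25MB as 25 MiB are assumptions for illustration, not the deployed implementation.

```python
import base64
import binascii
from pathlib import PurePath

# Formats listed in this step; extension matching is an assumed approach.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".png", ".tiff"}
# The 25MB limit, interpreted here as 25 MiB (assumption).
MAX_BYTES = 25 * 1024 * 1024


class UploadRejected(ValueError):
    """Raised when an upload fails intake validation (hypothetical name)."""


def validate_upload(filename: str, payload_b64: str) -> bytes:
    """Decode a base64 payload and enforce the format and size limits."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"unsupported format: {suffix or 'none'}")
    try:
        raw = base64.b64decode(payload_b64, validate=True)
    except binascii.Error as exc:
        raise UploadRejected("payload is not valid base64") from exc
    if len(raw) > MAX_BYTES:
        raise UploadRejected(f"file is {len(raw)} bytes; limit is {MAX_BYTES}")
    return raw
```

Rejecting bad uploads before they reach the model keeps failures cheap and gives the uploader a specific reason.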
2. GPT-4o Vision

GPT-4o Vision identifies the document type and extracts all key fields

GPT-4o Vision receives the document image and runs a structured extraction prompt. Output JSON contains: document_type (W-2, bank statement, pay stub, tax return, etc.), confidence_score, and a fields object with all extracted values. The prompt is engineered to handle poor scan quality, handwritten notes, and multi-page documents.

OpenAI GPT-4o Vision · Structured JSON output · Multi-page support
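
To make the output shape concrete, here is a minimal sketch of parsing and checking the extraction JSON before it moves downstream. Only the three keys named in this step (document_type, confidence_score, fields) come from the workflow description; the `Extraction` class, the `parse_extraction` helper, and the 0-1 confidence scale are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Any

REQUIRED_KEYS = {"document_type", "confidence_score", "fields"}


@dataclass(frozen=True)
class Extraction:
    """Typed view of the model's JSON output (hypothetical structure)."""
    document_type: str
    confidence_score: float
    fields: dict[str, Any]


def parse_extraction(raw: str) -> Extraction:
    """Parse model output, raising ValueError if it is malformed."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model output missing keys: {sorted(missing)}")
    score = float(data["confidence_score"])
    if not 0.0 <= score <= 1.0:  # assumes a 0-1 scale
        raise ValueError(f"confidence_score out of range: {score}")
    if not isinstance(data["fields"], dict):
        raise ValueError("fields must be a JSON object")
    return Extraction(str(data["document_type"]), score, data["fields"])
```

Validating the structure at this boundary means a malformed model response fails loudly instead of reaching the classifier.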
3. Python Classifier

Document routed to the correct checklist based on type

A Python function maps each document_type to the corresponding loan checklist template stored in Airtable. For example, a W-2 maps to the employment verification checklist; a bank statement maps to the asset verification checklist. The classifier also validates that the extracted fields are present and within acceptable ranges (e.g. date ranges, income thresholds).

Python · Airtable · Business rules engine
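
The routing and validation described in this step could look roughly like the sketch below. The two mappings are the examples given above; the checklist identifiers, the `manual_review` fallback, the W-2 field names, and the two-year window are hypothetical stand-ins for rules that would live in Airtable.

```python
from datetime import date

# The two mappings named in this step; identifiers are assumed.
CHECKLIST_BY_DOC_TYPE = {
    "W-2": "employment_verification",
    "bank statement": "asset_verification",
}

# Hypothetical required fields for a W-2.
REQUIRED_W2_FIELDS = ("employer_name", "tax_year", "wages")


def route_document(document_type: str) -> str:
    """Map a document type to a checklist, falling back to manual review."""
    return CHECKLIST_BY_DOC_TYPE.get(document_type, "manual_review")


def validate_w2_fields(fields: dict, today: date) -> list[str]:
    """Return a list of problems; an empty list means the fields pass."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_W2_FIELDS
        if name not in fields
    ]
    if "tax_year" in fields:
        year = int(fields["tax_year"])
        if not today.year - 2 <= year <= today.year:
            problems.append("tax_year outside the accepted two-year window")
    if "wages" in fields and float(fields["wages"]) < 0:
        problems.append("wages must be non-negative")
    return problems
```

Returning a list of problems rather than raising on the first one lets the underwriter see every issue with a document at once.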
4. Completeness Check

System checks whether all required documents for this loan type are present

n8n queries the loan application record to determine the loan type (conventional, FHA, jumbo, HELOC). It then checks the current document inventory against the required checklist. Missing items are identified and stored as a structured list. If all documents are present, the workflow skips to the notification step immediately.

n8n Logic · Airtable lookup · Loan type rules
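
At its core, the completeness check is a set difference between the required checklist and the current inventory. The sketch below assumes an in-memory mapping; the document sets shown for each loan type are illustrative only, since the real checklists are stored in Airtable.

```python
# Hypothetical checklists; in the workflow these are looked up per loan type.
REQUIRED_DOCS = {
    "conventional": {"W-2", "pay stub", "bank statement", "tax return"},
    "FHA": {"W-2", "pay stub", "bank statement", "tax return",
            "government-issued ID"},
}


def missing_documents(loan_type: str, received: list[str]) -> list[str]:
    """Return required document types not yet in the inventory, sorted."""
    try:
        required = REQUIRED_DOCS[loan_type]
    except KeyError:
        raise ValueError(f"no checklist for loan type: {loan_type}") from None
    return sorted(required - set(received))


if __name__ == "__main__":
    print(missing_documents("conventional", ["W-2", "bank statement"]))
```

An empty result corresponds to the case where the workflow skips straight to the notification step.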
5. Flag Missing Docs

Missing documents are flagged and a request email is auto-drafted

For each missing document, GPT-4o drafts a plain-English explanation of why the document is needed and what exactly the borrower needs to provide. The email is personalised with the borrower's name and lists all missing items in a single communication — no repetitive back-and-forth.

GPT-4o · Personalised email · Batched requests
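
The batching idea can be sketched without the model call: every missing item goes into one message. Here `explain` is a hypothetical callable standing in for the GPT-4o explanation step, and the wording of the email is an assumption.

```python
from typing import Callable


def draft_missing_docs_email(
    borrower_name: str,
    missing: list[str],
    explain: Callable[[str], str],
) -> str:
    """Draft one email that lists every missing document with its reason."""
    if not missing:
        raise ValueError("nothing is missing; no email needed")
    lines = [
        f"Hi {borrower_name},",
        "",
        "To keep your application moving, we still need:",
        "",
    ]
    # One bullet per missing document, all in a single message.
    for doc in missing:
        lines.append(f"- {doc}: {explain(doc)}")
    lines += ["", "You can upload everything at once through the intake portal."]
    return "\n".join(lines)
```

Passing the explanation step in as a function keeps the template testable with a stub and lets the model call be swapped or retried separately.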
6. Underwriter Notification

Underwriter receives a structured summary notification

When all documents are received and classified, n8n sends a Slack message or email to the assigned underwriter. The notification includes: borrower name, loan type, document count, any low-confidence extractions flagged for manual review, and a direct link to the classified document bundle in the loan management system.

Slack API · Gmail · LMS integration
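
A minimal sketch of assembling that summary follows. It builds a payload with a single `text` key, the simplest message shape Slack incoming webhooks accept; the function name, the per-document dictionary keys, and the 0-1 confidence scale (with 0.85 mirroring the 85% review threshold mentioned in the FAQ) are assumptions.

```python
# The 85% manual-review threshold, on an assumed 0-1 scale.
LOW_CONFIDENCE_THRESHOLD = 0.85


def build_underwriter_summary(
    borrower_name: str,
    loan_type: str,
    documents: list[dict],
    bundle_url: str,
) -> dict:
    """Build a Slack-style message payload for a classified document set."""
    # Collect document types whose extraction fell below the threshold.
    flagged = [
        d["document_type"]
        for d in documents
        if d["confidence_score"] < LOW_CONFIDENCE_THRESHOLD
    ]
    lines = [
        f"Document set ready: {borrower_name} ({loan_type})",
        f"Documents classified: {len(documents)}",
        f"Flagged for manual review: {', '.join(flagged) if flagged else 'none'}",
        f"Bundle: {bundle_url}",
    ]
    return {"text": "\n".join(lines)}
```

Surfacing low-confidence items in the notification itself means the underwriter knows where to look before opening the bundle.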
7. Audit Log

Every action logged to a compliance audit trail

Every step (upload timestamp, GPT-4o extraction output, classification decision, completeness check result, and notification sent) is written to an immutable audit log in Airtable or a compliance database. Each record includes the model version used, confidence scores, and the identity of the operator responsible for processing, for regulatory audit purposes.

Immutable audit log · SOC 2 ready · Model version tracking
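
One way to picture a single audit record is sketched below. The field names and the use of SHA-256 for the input hash are assumptions; immutability itself is a property of the storage layer and is not shown here.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def build_audit_record(
    document: bytes,
    output: dict,
    model_version: str,
    operator: str,
    now: Optional[datetime] = None,
) -> dict:
    """Assemble one audit entry for a processed document (hypothetical schema)."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        # Hash of the raw input, so the log never stores the document itself.
        "input_sha256": hashlib.sha256(document).hexdigest(),
        "model_version": model_version,
        # Sorted keys give a stable serialisation for later comparison.
        "output_json": json.dumps(output, sort_keys=True),
        "confidence_score": output.get("confidence_score"),
        "processed_at": timestamp,
        "operator": operator,
    }
```

Storing a hash rather than the file lets an auditor confirm which input produced a given output even after the temporary copy has been deleted.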
Tech stack

Every tool in the workflow

GPT-4o Vision

Document extraction AI

Extracts document types and structured field data from PDFs and images. Handles low-quality scans, handwritten text, and multi-page documents.

Python / FastAPI

Classification engine

Maps extracted document types to loan checklists and validates field completeness. Runs business rules for each loan product type.

n8n

Workflow orchestration

Manages the full pipeline: webhook intake, API calls, conditional logic, retry handling, and all notifications.

Airtable

Document registry & checklists

Stores loan application records, required document checklists per loan type, and the classified document inventory.

AWS S3

Secure document storage

Encrypted temporary storage for documents during processing. Files are deleted 24 hours after classification.

Gmail / Slack

Notifications

Borrower email requests for missing documents; underwriter Slack alerts when a complete document set is ready for review.

What we build

A 2–3 week sprint. Production ready.

Kovil AI scopes, builds, tests and deploys this workflow end-to-end. You don't touch n8n until it's live and processing real applications.

  • n8n webhook intake configured for your document portal
  • GPT-4o Vision prompt engineered for your document types
  • Python classifier with your loan product checklists
  • Completeness logic for all loan types (conventional, FHA, jumbo, HELOC)
  • Encrypted S3 storage with 24-hour auto-deletion
  • Underwriter Slack + email notifications configured
  • Immutable audit log compliant with loan origination requirements
  • 2-week handover: runbook, credentials, support access
Sprint timeline: 2–3 weeks
Week 1: Scoping & access setup
  • API credentials setup
  • S3 bucket + encryption config
  • Document checklist mapping
Week 2: Build & test
  • GPT-4o Vision prompt engineering
  • Python classifier logic
  • n8n pipeline end-to-end test
Week 3: Deploy & handover
  • Production deployment
  • Audit logging activation
  • Documentation & runbook
FAQ

Common Questions

What document types can the classifier handle?

The standard build handles 15+ common mortgage and loan document types, including W-2s, 1099s, bank statements, pay stubs, tax returns, asset statements, property appraisals, insurance declarations, and government-issued ID. Additional document types can be added by extending the classification prompt and checklist mapping.

How accurate is GPT-4o Vision on poor-quality scans?

In testing across typical mortgage document scans, GPT-4o Vision achieves >95% field extraction accuracy. The workflow flags any extraction with a confidence score below 85% for manual underwriter review — ensuring no low-confidence data silently passes through.

Is the audit log sufficient for compliance purposes?

The audit log captures model version, input hash, output JSON, confidence scores, processing timestamp, and operator identity for every document processed. This satisfies typical loan origination audit requirements. We also support integration with your existing compliance logging infrastructure.

Can this work with our existing loan management system?

Yes. The workflow integrates with Encompass, Blend, BytePro, and other major LOS platforms. Where n8n has no ready-made connector for a system, we use API or webhook integration. We document all integration points during the scoping phase.

Ready to ship this in 3 weeks?

Book a 30-minute discovery call. We'll scope the classifier for your document types, loan products, and LOS integrations — fixed price, zero delivery risk.


Typical sprint: 2–3 weeks · Fixed-price · Fully managed delivery · Post-launch support included