When you upload an invoice to InvoiceFlow, what actually happens? Most AP automation tools treat extraction as a black box. We'd rather explain it — because understanding how it works helps you trust the output and catch the edge cases.
The problem with traditional OCR
Older invoice tools worked by learning the layout of each vendor's document. You'd "train" the system: vendor name is always in the top-left, total is always in the bottom-right. This worked until vendors changed their template, or a new supplier sent something unexpected.
The deeper problem: traditional OCR tells you what characters are on the page. It doesn't tell you what those characters mean.
A traditional system sees 7,290.00 at coordinates (450, 680). It has no idea that's the total amount on an invoice from Acme Supply Co. dated March 15. Turning that raw text into structured data — vendor name, invoice number, line items, total — requires a separate rules layer on top. And that rules layer needs to be maintained for every vendor, every format, every edge case.
How LLMs read invoices differently
InvoiceFlow uses a large language model (LLM) — specifically, Google Gemini 2.5 Flash as the primary extractor. LLMs read invoices the way a person does: by understanding the content and its meaning, regardless of where it appears on the page.
No training. No templates. No setup per vendor.
When Gemini processes an invoice, it reads the entire document as a visual scene. It identifies that Acme Supply Co. is the vendor because it understands the context — not because vendor names are always in a specific location. It extracts 7,290.00 as the total because it understands accounting documents, not because a rule told it to look at coordinates (450, 680).
This is why the same system handles a clean digital PDF from one vendor and a scanned, slightly-rotated document from another without any reconfiguration.
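By contrast, an LLM-based extractor is driven by a single structured-output request rather than per-vendor rules. The prompt wording and schema below are illustrative, not InvoiceFlow's actual prompt:

```python
# Sketch of the kind of structured-extraction request a vision LLM
# receives. The same prompt works for every vendor and layout.
import json

EXTRACTION_PROMPT = """You are an accounts-payable extraction assistant.
Read the attached invoice and return JSON with exactly these keys:
vendor_name, invoice_number, invoice_date, currency, subtotal, tax, total,
line_items (a list of {description, quantity, unit_price, amount}).
Use null for any field you cannot find. Return JSON only."""

# A well-formed model response parses straight into structured data,
# with no coordinate template in sight:
response_text = '{"vendor_name": "Acme Supply Co.", "total": 7290.00}'
record = json.loads(response_text)
```

The "rules" live in one prompt instead of a template per vendor, which is what makes zero-setup onboarding possible.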
The two-tier extraction pipeline
Not all invoices are equal. A clean, text-based PDF is straightforward. A scanned document with handwritten annotations and complex line items is harder. InvoiceFlow handles both with a two-tier approach.
Tier 1 — Fast extraction (Gemini 2.5 Flash)
Every invoice starts here. Gemini reads the document, extracts all standard AP fields, and returns structured data in under 200 milliseconds. For the vast majority of invoices, this is sufficient.
Tier 2 — High-accuracy fallback (Claude Sonnet)
If the Tier 1 result fails validation (more on that below), the invoice is automatically re-run through Claude Sonnet — a more capable model suited for edge cases that trip up faster extraction. This catches unusual layouts, complex tables, and documents with ambiguous structure.
The result of both tiers is the same: structured invoice data ready for your review queue.
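The control flow is simple enough to sketch. The function names below stand in for the Gemini and Claude calls and the rule engine; they are placeholders, not InvoiceFlow's API:

```python
from typing import Callable

def run_pipeline(
    document: bytes,
    extract_fast: Callable[[bytes], dict],
    extract_accurate: Callable[[bytes], dict],
    validate: Callable[[dict], list],
) -> dict:
    """Tier 1 first; escalate to Tier 2 only when validation fails."""
    data = extract_fast(document)
    if not validate(data):
        return {"data": data, "tier": 1, "flagged": []}
    # Tier 1 failed validation: re-run through the more capable model.
    data = extract_accurate(document)
    # Any rules still failing after Tier 2 get flagged for human review.
    return {"data": data, "tier": 2, "flagged": validate(data)}

# Stubs standing in for the model calls and the rule engine:
def fast(doc):      # returns a result whose amounts don't add up
    return {"subtotal": 90.0, "tax": 5.0, "total": 100.0}

def accurate(doc):  # returns a consistent result
    return {"subtotal": 90.0, "tax": 10.0, "total": 100.0}

def check(inv):
    ok = abs(inv["subtotal"] + inv["tax"] - inv["total"]) <= 0.01
    return [] if ok else ["SUMMATION_FAIL"]

outcome = run_pipeline(b"%PDF-...", fast, accurate, check)
# outcome["tier"] is 2: the inconsistent Tier 1 result triggered escalation
```

The point of the design is that escalation is automatic and data-driven: nobody decides which invoices are "hard" ahead of time.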
Validation — the step most tools skip
Extraction alone isn't enough. An LLM can confidently return a wrong number. That's why every extraction result goes through a rule-based validation layer before it reaches your review queue.
Six checks run on every invoice:
| Rule | What it checks |
|---|---|
| SUMMATION_FAIL | Subtotal + Tax ≠ Total (within rounding tolerance) |
| FUTURE_DATE | Invoice date is in the future |
| INVALID_CURRENCY | Currency code isn't a recognized ISO 4217 code |
| NEGATIVE_VALUE | Amount fields that shouldn't be negative are |
| LINE_ITEMS_SUMMATION_FAIL | Line item amounts don't sum to the subtotal |
| INVALID_LINE_ITEM | A line item is malformed or missing required fields |
If any of these fail on a Tier 1 result, the invoice is automatically escalated to Tier 2. If validation still fails after Tier 2, the specific fields are flagged for human review with low confidence.
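Several of these rules are plain arithmetic, which is exactly why they catch confident-but-wrong LLM output. Here is an illustrative implementation of a few of them; the field names, tolerance, and currency subset are assumptions, not InvoiceFlow's actual schema:

```python
# Illustrative versions of the validation rules in the table above.
from datetime import date

VALID_CURRENCIES = {"USD", "EUR", "GBP", "JPY", "CAD"}  # subset of ISO 4217

def validate_invoice(inv: dict, today: date, tolerance: float = 0.01) -> list:
    errors = []
    # SUMMATION_FAIL: subtotal + tax must equal total within tolerance
    if abs(inv["subtotal"] + inv["tax"] - inv["total"]) > tolerance:
        errors.append("SUMMATION_FAIL")
    # FUTURE_DATE: an invoice shouldn't be dated after today
    if inv["invoice_date"] > today:
        errors.append("FUTURE_DATE")
    # INVALID_CURRENCY: reject unrecognized currency codes
    if inv["currency"] not in VALID_CURRENCIES:
        errors.append("INVALID_CURRENCY")
    # NEGATIVE_VALUE: amount fields that shouldn't be negative
    if any(inv[f] < 0 for f in ("subtotal", "tax", "total")):
        errors.append("NEGATIVE_VALUE")
    # LINE_ITEMS_SUMMATION_FAIL: line items must sum to the subtotal
    if abs(sum(li["amount"] for li in inv["line_items"]) - inv["subtotal"]) > tolerance:
        errors.append("LINE_ITEMS_SUMMATION_FAIL")
    return errors

invoice = {
    "subtotal": 6750.0, "tax": 540.0, "total": 7290.0,
    "invoice_date": date(2024, 3, 15), "currency": "USD",
    "line_items": [{"amount": 6750.0}],
}
result = validate_invoice(invoice, today=date(2024, 4, 1))
# result is an empty list: this invoice passes all five checks shown
```

An empty error list means the result skips escalation; anything else routes the invoice to Tier 2.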
Research from Gennai found that implementing comprehensive validation prevents 60–80% of downstream errors — and that human review of flagged invoices costs far less than fixing errors discovered during month-end reconciliation.
How the system signals uncertainty
Here's what separates well-built extraction pipelines from fragile ones: whether they can fail silently.
Without a confidence signal, a document AI system looks identical whether it's certain or guessing. Both return an answer. Both get passed downstream. You only find out something was wrong when a payment doesn't match, or a reconciliation breaks weeks later.
InvoiceFlow assigns a confidence score (0–100) to every extracted field. The score is derived from three factors:
Which tier extracted it
Tier 1 results carry a slightly lower baseline confidence than Tier 2 results, reflecting the trade-off between speed and accuracy.
Validation errors
Fields involved in a failed check score lower. A SUMMATION_FAIL lowers confidence on both the subtotal and total fields, since the check can't tell which of the values is actually wrong.
Document quality signals
Scanned documents, rotated pages, low resolution, and multi-document PDFs all reduce confidence across the affected fields. These are the same factors that research identifies as the primary causes of extraction failure: poor image quality accounts for 35% of AI extraction failures, followed by unusual layouts at 25%.
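One way the three signals could combine into a per-field score is sketched below. The baselines and penalties are illustrative assumptions, not InvoiceFlow's actual numbers:

```python
# Combine the three factors into a 0-100 per-field confidence score.

def field_confidence(tier: int, failed_checks: int, quality_flags: int) -> int:
    """tier: which tier produced the field (1 or 2);
    failed_checks: validation failures touching this field;
    quality_flags: count of document-quality issues
    (scan, rotation, low resolution, multi-document PDF)."""
    base = 90 if tier == 1 else 95      # Tier 1 carries a lower baseline
    score = base - 20 * failed_checks - 10 * quality_flags
    return max(0, min(100, score))      # clamp to the 0-100 range
```

Under these made-up weights, a clean Tier 1 field scores 90 and gets pre-confirmed, while a Tier 2 field touched by one failed check on a rotated scan scores 65 and lands in the review queue.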
In your review queue, low-confidence fields are highlighted. High-confidence fields are pre-confirmed. You review what actually needs review — nothing more.
What this means in practice
AI invoice extraction achieves 95–99% accuracy on standard invoices (Precoro, 2024). That's strong — but it also means 1–5% of fields on standard invoices may need a correction. On complex invoices with many line items, that rate is higher.
The confidence scoring and validation layer exist for exactly this reason. The goal isn't to hide uncertainty — it's to surface it precisely, so you spend your attention where it matters.
A wrong invoice number flagged in the review queue costs you 10 seconds. The same error pushed silently to QuickBooks creates a reconciliation problem that costs you an hour.
A note on multi-document PDFs
Sometimes a single PDF contains multiple invoices. InvoiceFlow detects this automatically, processes each invoice separately, and flags the extracted records with lower confidence — since splitting multi-document PDFs is inherently more error-prone than processing single invoices.
If you regularly receive consolidated invoice PDFs from a vendor, splitting them before upload will give you cleaner results.
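The splitting step itself can be pictured as a simple grouping pass over pages. The marker used below is a stand-in heuristic; the real detector is more involved:

```python
# Heuristic sketch: a page carrying its own invoice-number header starts
# a new invoice; pages without one attach to the invoice in progress.

def split_into_invoices(pages: list) -> list:
    groups = []
    for text in pages:
        if "Invoice #" in text or not groups:
            groups.append([text])       # this page starts a new invoice
        else:
            groups[-1].append(text)     # continuation page
    return groups

batch = ["Invoice #A-101 ...", "page 2 of 2", "Invoice #A-102 ..."]
# split_into_invoices(batch) yields two groups, each extracted separately
```

Because any heuristic like this can misjudge a boundary, every record produced from a split PDF starts with reduced confidence, which is what routes it into your review queue.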