
How to Edit a Scanned Invoice PDF (2026)
A scanned invoice is a photo of paper — editing it requires OCR to convert the image to real text first. Here's the fastest path from scan to editable content.
A scanned invoice is a photograph of a paper document saved as a PDF. The text you see is pixels in an image — not actual text the computer can edit. Before you can change any content, you need OCR (Optical Character Recognition) to convert the image into editable text.
This guide covers how to run OCR on a scanned invoice and then edit the resulting document.
Important note: Editing a legitimate scanned invoice to change amounts, vendor names, or dates — for the purpose of misrepresenting financial transactions — is fraud. This guide is for legitimate use cases: correcting scanning errors, adding missing information to your own invoices, updating your own business records.
Step 1: Confirm the Invoice is Scanned (Not Native)
Before applying OCR, verify the invoice is actually scanned:
- Open the PDF and try to click and drag to select text
- If text highlights accurately = native PDF (skip OCR, go straight to editing)
- If nothing happens, or if selected text is garbled/wrong = scanned image, OCR needed
You can also check in Acrobat Pro: run Tools → Enhance Scans → Recognize Text. If Acrobat reports that the page already contains renderable text, the PDF has a text layer and is not a pure scan; if OCR runs normally, the document is a scan.
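The click-and-drag test can also be approximated in code when you have many files to sort. The sketch below is one possible heuristic, not a definitive detector: it assumes the third-party pypdf library for text extraction, and the function names and thresholds (min_chars, min_alnum_ratio) are illustrative choices made for this example.

```python
def classify_page_text(text, min_chars=20, min_alnum_ratio=0.6):
    """Classify one page's extracted text as 'native' or 'scanned'.

    The thresholds are illustrative assumptions, not tuned values.
    """
    stripped = "".join(text.split())  # drop all whitespace
    if len(stripped) < min_chars:
        return "scanned"  # little or no text layer
    alnum = sum(ch.isalnum() for ch in stripped)
    if alnum / len(stripped) < min_alnum_ratio:
        return "scanned"  # a text layer exists but looks garbled, so OCR is needed
    return "native"


def classify_pdf(pdf_path):
    """Return one 'native'/'scanned' label per page (requires pypdf)."""
    from pypdf import PdfReader  # lazy import; assumes `pip install pypdf`

    reader = PdfReader(pdf_path)
    return [classify_page_text(page.extract_text() or "") for page in reader.pages]
```

A page labeled "scanned" needs OCR before editing; a page labeled "native" can go straight to the editing steps.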
OCR Method 1: Adobe Acrobat Pro (Best Quality)
Acrobat Pro's Recognize Text feature is generally among the most accurate OCR options for invoice documents.
- Open the scanned invoice PDF in Acrobat Pro
- Go to Tools → Enhance Scans → Recognize Text → In This File
- Click Settings to configure:
- Language: select the invoice's language (English default)
- Output: "Searchable Image (Exact)" keeps the original image visible and adds an invisible text layer; "Editable Text and Images" converts the page into actual editable elements (more aggressive)
- For invoice editing: choose "Editable Text and Images" so you can modify the content
- Click Recognize Text
- Acrobat processes the document — this takes a few seconds per page
- Review the results: check that numbers (invoice totals, amounts, dates) were recognized correctly — OCR occasionally misreads similar-looking characters (0 vs O, 1 vs l vs I, 8 vs B)
After OCR: Use Tools → Edit PDF to click into text blocks and make changes.
OCR accuracy check: Pay special attention to:
- Currency amounts: $1,234.56 should not become $1,234,56 or $1234.56
- Invoice numbers: any string of digits should be checked character by character
- Dates: 01/06/2026 vs 06/01/2026 (verify that day, month, and year are in the right order)
- Company names: proper nouns often get mangled by OCR
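Some of these checks can be partly automated before a human review. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes US-style amounts (comma thousands separator, two decimals) and MM/DD/YYYY dates, so adjust both for other locales.

```python
import re
from datetime import datetime

# Strict US-style amount: optional $, comma-grouped thousands, two decimals.
AMOUNT_RE = re.compile(r"^\$?\d{1,3}(,\d{3})*\.\d{2}$")
# Letters that OCR commonly confuses with the digits 0, 1, and 8.
CONFUSABLE = set("OlIB")


def check_amount(value):
    """True if the string is a well-formed amount such as $1,234.56."""
    return bool(AMOUNT_RE.match(value.strip()))


def check_date(value, fmt="%m/%d/%Y"):
    """True if the string parses as a real calendar date in the given format."""
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False


def suspicious_chars(digits_field):
    """Return confusable letters found in a field that should be all digits."""
    return [ch for ch in digits_field if ch in CONFUSABLE]
```

A failed check does not prove the OCR is wrong; it only marks a field that deserves a character-by-character look against the original scan.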
OCR Method 2: Google Drive (Free)
Google Drive can perform OCR on uploaded scanned PDFs and convert them to editable Google Docs:
- Go to drive.google.com
- Upload the scanned invoice PDF
- Right-click the uploaded PDF → Open with → Google Docs
- Google runs OCR and opens the content as an editable document
- Edit the document (update amounts, dates, addresses)
- File → Download → PDF Document to save back as PDF
Quality: Google's OCR is good for clear scans in common languages. Tables in invoices may not convert cleanly — table borders often disappear, and column alignment may shift.
Best for: Simple invoices with clear print and standard layouts.
OCR Method 3: Python (Batch Processing)
For processing many scanned invoices programmatically:
import pytesseract
from pdf2image import convert_from_path


def ocr_invoice(pdf_path, output_txt_path):
    """Extract text from scanned invoice PDF using Tesseract OCR."""
    # Convert PDF pages to images (300 DPI is a common minimum for OCR)
    images = convert_from_path(pdf_path, dpi=300)
    full_text = []
    for i, image in enumerate(images):
        # Run OCR on each page
        text = pytesseract.image_to_string(image, lang='eng')
        full_text.append(f"--- Page {i+1} ---\n{text}")
    result = '\n'.join(full_text)
    # Write as UTF-8 so non-ASCII characters survive on every platform
    with open(output_txt_path, 'w', encoding='utf-8') as f:
        f.write(result)
    return result


# Usage
extracted_text = ocr_invoice("invoice_scan.pdf", "invoice_text.txt")
print(extracted_text)
Install: pip install pytesseract pdf2image Pillow pikepdf
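Also requires: Tesseract installed on the system (sudo apt-get install tesseract-ocr on Linux, brew install tesseract on Mac), plus Poppler, which pdf2image uses to render PDF pages (sudo apt-get install poppler-utils on Linux, brew install poppler on Mac)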
For modifying and re-exporting as PDF after OCR, use fpdf2 to generate a new PDF from the corrected text, or pikepdf to manipulate the pages and metadata of an existing PDF.
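The function above handles one file. To process a whole folder, a small wrapper can loop over the PDFs and record failures instead of stopping at the first bad scan. This is a sketch under stated assumptions: ocr_folder is a hypothetical helper, and it accepts any callable with the same (pdf_path, output_txt_path) signature as ocr_invoice.

```python
from pathlib import Path


def ocr_folder(input_dir, output_dir, ocr_func):
    """Run ocr_func on every PDF in input_dir; return (succeeded, failed).

    ocr_func is any callable taking (pdf_path, output_txt_path),
    for example the ocr_invoice function defined earlier.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    succeeded, failed = [], []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        target = out / (pdf.stem + ".txt")
        try:
            ocr_func(str(pdf), str(target))
            succeeded.append(pdf.name)
        except Exception as exc:  # one unreadable scan should not stop the batch
            failed.append((pdf.name, str(exc)))
    return succeeded, failed
```

Typical use would be ocr_folder("scans", "ocr_text", ocr_invoice), followed by a manual look at anything in the failed list.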
Editing the OCR'd Invoice
Adding or Correcting a Field Value
After OCR in Acrobat Pro:
- Tools → Edit PDF
- Click the text you want to change (amount, date, address)
- The text block selects — click inside to edit
- Type the correction
Matching invoice typography: Invoices typically use a consistent font (often Arial, Helvetica, Times New Roman, or a basic sans-serif). The Edit PDF panel on the right shows the current font — ensure new text matches.
If OCR created individual character blocks instead of word/line blocks: This happens with some fonts. Zoom in, select the entire line/word cluster, and note the font. Delete the cluster and use Add Text (in the Edit PDF toolbar) to add a fresh text block with the correct content in the right font.
Correcting an Invoice Total
Invoice totals, subtotals, and tax amounts are high-priority accuracy items. After OCR:
- Find the amount field in the Edit PDF view
- Click the text block containing the amount
- Edit to the correct value
- If the document has a visible total-box or highlighted area (background rectangle), verify the text sits correctly within it
For line-item amounts that need to add up to a total: edit each line item individually, then verify the total matches the sum. PDF editors don't auto-calculate — you need to set each value manually.
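Because the PDF editor will not do the arithmetic, it can help to check the sums outside the PDF before typing the values in. A minimal sketch follows, using Decimal to avoid floating-point rounding; the helper names are hypothetical and the parser assumes US-style amount strings.

```python
from decimal import Decimal


def parse_amount(text):
    """Convert a string like '$1,234.56' to an exact Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def check_total(line_items, stated_total, tax="0.00"):
    """Return (matches, computed_total) for the given amount strings."""
    subtotal = sum((parse_amount(x) for x in line_items), Decimal("0"))
    computed = subtotal + parse_amount(tax)
    return computed == parse_amount(stated_total), computed


# Usage
matches, computed = check_total(["$1,000.00", "$234.56"], "$1,234.56")
print(matches, computed)
```

If matches is False, compare computed against the total on the original paper invoice to find which field was misread or mistyped.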
Updating Company Information
If you're updating a vendor's address, phone, or email in a scanned invoice you received:
- Use OCR to make the document editable
- Find and edit the relevant address block
- Keep the same font and size to maintain visual consistency
If the company logo is in the invoice: logos in scanned PDFs are image elements, not text. You can't edit the logo text through OCR — logos remain as image pixels unless you replace the entire image element.
Improving Scan Quality Before OCR
OCR accuracy depends heavily on scan quality. If the original scan is poor, OCR results will be unreliable regardless of software.
Optimal scan settings for invoice OCR:
- Resolution: 300 DPI minimum, 600 DPI for best results
- Mode: Black and white or grayscale (not color if ink is standard black)
- Contrast: High — dark text on white background
- Deskew: Most scanners and phone scan apps auto-straighten; verify there's no rotation
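When a scan arrives without reliable DPI metadata, the resolution guideline can be estimated from the image's pixel dimensions and the physical paper size. The sketch below is plain arithmetic; the function names and the "best" / "ok" / "rescan" labels are hypothetical, and the defaults assume US Letter paper.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in=8.5, page_height_in=11.0):
    """Estimate scan resolution from image size and physical page size.

    Defaults assume US Letter; pass 8.27 and 11.69 for A4.
    """
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def scan_quality(pixel_width, pixel_height, page_width_in=8.5, page_height_in=11.0):
    """Label a scan against the 300 DPI minimum and 600 DPI target."""
    dpi = effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in)
    if dpi >= 600:
        return "best"
    if dpi >= 300:
        return "ok"
    return "rescan"
```

For example, a Letter page scanned at 2550 x 3300 pixels works out to 300 DPI, which just meets the minimum.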
Enhancing a poor scan before OCR in Acrobat Pro:
- Tools → Enhance Scans → Enhance
- Acrobat applies automatic deskew, despeckle, and contrast enhancement
- Then run Recognize Text on the enhanced version
Phone scanner apps (Google Drive, Microsoft Lens, iOS Notes) generally produce good quality scans for OCR because they automatically optimize for document readability. If you're creating the scan yourself, use one of these apps rather than a basic camera photo.
When OCR Doesn't Work Well
Some scanned invoices are difficult for OCR:
Handwritten invoices: Standard OCR engines (Tesseract, Adobe OCR) are trained on printed text. Handwriting recognition is a separate, harder problem. For handwritten invoices, manual transcription or specialized handwriting OCR tools may be needed.
Carbon copy / faint impressions: Low contrast makes OCR unreliable. Enhance contrast before OCR, or manually retype the key fields.
Stamps and watermarks: "PAID," "VOID," or security watermarks layered over text reduce OCR accuracy in those areas. OCR may misread characters hidden under stamps.
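Non-Latin scripts: Tesseract supports many languages including Arabic, Chinese, Hindi, and others, but accuracy varies. Language data is installed separately from the pytesseract package: on Debian/Ubuntu, install a package such as tesseract-ocr-ara, or download the matching .traineddata file into Tesseract's tessdata directory. Then pass the language code (for example lang='ara') to pytesseract.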
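Tesseract accepts several language codes joined with "+", which is useful for invoices that mix scripts. The sketch below builds that argument and fails early when a language is not installed; build_lang_arg is a hypothetical helper, and the commented usage assumes pytesseract and Tesseract are available.

```python
def build_lang_arg(wanted, installed):
    """Join Tesseract language codes with '+', failing early if one is missing."""
    missing = [code for code in wanted if code not in installed]
    if missing:
        raise ValueError(
            "Tesseract language data not installed: " + ", ".join(missing)
        )
    return "+".join(wanted)


# Usage (requires Tesseract and pytesseract):
#   import pytesseract
#   installed = pytesseract.get_languages(config="")
#   lang = build_lang_arg(["ara", "eng"], installed)
#   text = pytesseract.image_to_string(image, lang=lang)
```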
FAQ
Can I edit a scanned invoice in a PDF editor without running OCR first?
No — you can add text overlays on top of the scan, but you can't modify the existing scanned text. OCR converts the image to real text that editors can modify. If you just need to add an annotation or a new field (like a payment received stamp), you can do that without OCR by adding a text element on top.
After OCR, the text layout is wrong — numbers are in the wrong columns. What happened?
OCR sometimes misinterprets table structures, especially in complex invoice layouts with multiple columns. In Acrobat Pro's Edit PDF mode, individual text blocks may not align with the original columns. You can manually reposition text blocks by clicking and dragging them. For heavily misaligned tables, using Edit PDF → Add Text to type the corrected values in the right position may be faster than trying to reposition OCR-generated blocks.
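In the Python workflow, one way to recover table rows is to ignore the OCR engine's own block grouping and regroup words by their position on the page. The sketch below assumes a dict shaped like the output of pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT); rows_by_position and the y_tolerance default are illustrative assumptions, and the tolerance needs tuning per scan resolution.

```python
def rows_by_position(data, y_tolerance=10, min_conf=0):
    """Group OCR words into rows by vertical position, left to right.

    `data` needs the keys 'text', 'conf', 'top', and 'left', as produced by
    pytesseract.image_to_data with output_type=Output.DICT.
    """
    words = []
    for i, text in enumerate(data["text"]):
        # Skip empty boxes and words below the confidence threshold
        if text.strip() and float(data["conf"][i]) >= min_conf:
            words.append((data["top"][i], data["left"][i], text))
    words.sort()

    rows = []  # each entry: [top of first word in row, [(left, text), ...]]
    for top, left, text in words:
        if rows and abs(top - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([top, [(left, text)]])
    return [[t for _, t in sorted(items)] for _, items in rows]
```

Each returned row lists its words in left-to-right order, so a description, quantity, and amount that Tesseract placed in separate blocks end up on the same row again.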
Is it safe to upload a scanned invoice to an online PDF editor?
Invoices contain financial data, vendor names, amounts, and sometimes account numbers. For business invoices with sensitive financial information, prefer a local tool (Acrobat Pro, Python with Tesseract) over uploading to an online service. For non-sensitive invoices or your own templates, online tools are convenient.
Can I OCR a scanned invoice and import the data into my accounting software?
Yes. This is known as invoice data extraction or AP automation. Beyond simple OCR, tools like AWS Textract, Google Document AI, and Adobe's PDF Extract API are built for document structure and return structured data (vendor name, invoice number, line items, amounts) in JSON format that can be mapped into QuickBooks, Xero, or SAP. They are more sophisticated than general-purpose OCR and typically more accurate for invoice fields.


