Digitizing the messy bills: handwritten chits, faded thermal, and the petrol-pump pile
The clean printed invoices were never the problem. PDFs from a vendor's billing software, e-invoices with a QR code, those almost read themselves. The pile that ruins a weekend is the other one: the handwritten kirana chit, the petrol receipt printed on thermal paper that's already going grey, the conveyance bill someone scrawled in the back of an auto.
This is the guide for that pile. What the machine can do now, what it still can't, and how to set up the job so you're not retyping everything by hand.
Handwriting: better than you think, not as good as the sales pitch
Handwriting recognition crossed a real line in the last couple of years. I'll give you our own test rather than a vendor claim. We ran a handwritten invoice (a commercial invoice from Ghana, with dense handwriting, numbers and names) through the model we use, and it came back close to fully correct, with accuracy in the high 90s percent on the fields that matter. That's a genuinely hard document, and it largely worked.
So the headline is true: a good model will pull a name and a rupee figure off a creased, blue-ink chit that you'd squint at. What the headline leaves out is the failure mode. Handwriting recognition doesn't fail loudly. It doesn't return a blank or an error. It returns a confident, plausible, wrong number. A 7 that was a 1. A 3 that was an 8. The total looks fine until the GST doesn't tie out three bills later.
So treat handwriting like this: let the machine read it, then confirm the total and the GSTIN with your own eyes. Not every field. The two or three that cost money. A number you didn't actually look at is a number waiting to embarrass you in front of a client.
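The GSTIN is one field where the machine can help you check the machine, because the last character is a check character computed from the first fourteen. Below is a minimal Python sketch of that check. Two assumptions are mine, not from our tests: the layout (2-digit state code, 10-character PAN, entity code, a literal Z, check character) and the mod-36 check-character scheme as it is commonly described. The function names are hypothetical. It only tells you the string is internally consistent, not that it belongs to the vendor on the bill, so it narrows what you need to look at rather than replacing the look.

```python
import re

# Characters map to values 0..35 by their position in this alphabet.
_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Assumed layout: state code, PAN, entity code, literal 'Z', check character.
_GSTIN_SHAPE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def gstin_check_char(first14: str) -> str:
    """Compute the expected 15th character from the first 14 (assumed mod-36 scheme)."""
    total = 0
    for i, ch in enumerate(first14):
        # Weights alternate 1, 2, 1, 2, ... from the left.
        product = _ALPHABET.index(ch) * (1 if i % 2 == 0 else 2)
        # Fold the product back into one base-36 "digit sum".
        total += product // 36 + product % 36
    return _ALPHABET[(36 - total % 36) % 36]


def gstin_looks_valid(text: str) -> bool:
    """True if the OCR'd text has the GSTIN shape and a matching check character."""
    candidate = text.strip().upper()
    if not _GSTIN_SHAPE.fullmatch(candidate):
        return False
    return gstin_check_char(candidate[:14]) == candidate[14]
```

With a sample-format string such as 27AAPFU0939F1ZV the check passes. Swap the 3 for an 8, the kind of misread handwriting produces, and it fails, which is your cue to pull that chit out and read the number yourself.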
Thermal receipts: the time-bomb format
Thermal paper is the worst input in the whole pile, and it gets worse the longer the bill sits in a drawer. The print is made by heat, not ink, so it fades, and a fuel receipt that was crisp at the pump is grey and patchy by the time it reaches you in March.
A few things I've learned the hard way about thermal:
- Scan or photograph it the day it arrives if you can. The print only degrades. A receipt that's readable today may not be in two months.
- The fade hits the small print first. The big total usually survives; the GSTIN, the date, the small HSN line are the first casualties. That's the opposite of what you want, because the small print is exactly what a return needs.
- Glare is your enemy on thermal. The shiny surface throws back light and wipes out a whole band of the receipt in the photo. Shoot it in soft, even light, slightly off-axis so the glare misses the lens.
Where thermal does work, it works well. A clearly printed petrol receipt, HPCL-style, with the GSTIN still legible, reads cleanly. The trouble is always the faded one, and the fix there is partly the machine and mostly catching it early.
The petrol-pump pile, specifically
Fuel bills deserve their own paragraph because every CA with a client who has vehicles drowns in them. They're small, thermal, numerous, and they all look alike, which is exactly the kind of repetitive, low-value typing the machine should take off your plate. The pattern is consistent (pump name, GSTIN, date, litres, rate, amount), so once a tool reads one HPCL or IOCL receipt format cleanly, it reads the rest. The work is volume, not difficulty. That's the best possible case for automation: high count, low variation, fields you can check at a glance.
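Because the fields on a fuel receipt are related by simple arithmetic, a glance-check can be partly automated: litres times rate should land on the amount. Here is a small Python sketch of that idea. The function names, the dictionary keys and the one-rupee tolerance are all assumptions for illustration, and the figures in the usage note are made up.

```python
from decimal import Decimal, InvalidOperation


def fuel_amount_ties(litres: str, rate: str, amount: str,
                     tolerance: str = "1.00") -> bool:
    """True if litres x rate matches the printed amount within the tolerance."""
    try:
        expected = Decimal(litres) * Decimal(rate)
        difference = abs(expected - Decimal(amount))
    except InvalidOperation:
        # OCR returned something that is not a number: send it to review.
        return False
    return difference <= Decimal(tolerance)


def receipts_to_review(receipts):
    """Return the receipts whose arithmetic does not tie out."""
    return [
        r for r in receipts
        if not fuel_amount_ties(r["litres"], r["rate"], r["amount"])
    ]
```

For a receipt read as 12.50 litres at 96.40, an amount of 1205.00 ties out and an amount of 1705.00 does not. Only the second one needs your eyes, which is how a pile of a hundred receipts becomes a pile of five.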
Getting it into the shape you can book
Reading the bill is half the job. The other half is the output. A wall of recognized text helps nobody. You want a row per bill (vendor, GSTIN, date, taxable value, tax, total) in Excel, or a file you can import into Tally. If a tool reads beautifully and then hands you back unstructured text, it has done the easy half and left you the chore you started with.
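To make "a row per bill" concrete, here is a minimal Python sketch that turns extracted fields into a CSV you can open in Excel, with a review flag on any row where taxable value plus tax does not equal the total. The class name, column names, one-rupee tolerance and sample figures are assumptions for illustration. This is a plain CSV, not a Tally import format, which the sketch does not attempt to model.

```python
import csv
import io
from dataclasses import dataclass, asdict
from decimal import Decimal


@dataclass
class BillRow:
    vendor: str
    gstin: str
    date: str
    taxable_value: Decimal
    tax: Decimal
    total: Decimal

    def ties_out(self, tolerance: Decimal = Decimal("1.00")) -> bool:
        """True if taxable value + tax matches the total within the tolerance."""
        return abs(self.taxable_value + self.tax - self.total) <= tolerance


def rows_to_csv(rows) -> str:
    """Write one CSV line per bill, marking rows that need a human look."""
    fields = ["vendor", "gstin", "date", "taxable_value", "tax", "total",
              "needs_review"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        record = asdict(row)
        record["needs_review"] = "" if row.ties_out() else "CHECK"
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical rows: the second total was misread (1180 read as 1780).
rows = [
    BillRow("Sample Kirana", "27AAPFU0939F1ZV", "2024-03-05",
            Decimal("1000.00"), Decimal("180.00"), Decimal("1180.00")),
    BillRow("Sample Pump", "27AAPFU0939F1ZV", "2024-03-06",
            Decimal("1000.00"), Decimal("180.00"), Decimal("1780.00")),
]
print(rows_to_csv(rows))
```

Sorting or filtering on the needs_review column in Excel then gives you the short list of bills to check by eye before anything is booked.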
For a side-by-side of how different tools score on exactly these messy inputs, raw reading versus structured GST fields versus Tally-ready output, see our India OCR accuracy benchmark. And if your messy pile is also in a regional script, the Indian-language extraction guide covers the script-specific quirks on top of the handwriting ones.
One thing to weigh about whatever tool you reach for: this pile is someone else's confidential paperwork. The consumer versions of ChatGPT and Gemini may use what you paste to train their models unless you have changed the default settings, so dumping a client's bills into a chat box can quietly hand their data over. A weekend bag of a client's invoices is exactly what shouldn't end up in a model's training set.
The honest summary
The machine now reads handwritten and thermal bills well enough to take most of the typing off you. It does not read them well enough to trust blind, and on faded thermal it never will, because print that has faded off the paper is no longer there to read. So the workflow that actually holds up is: photograph early and in good light, let the model read the pile, and spend your attention on the GSTIN and the total of each bill rather than every character. Done that way, the weekend bag stops being a weekend.