Thermal receipt to Excel: why faded petrol and retail bills defeat generic OCR
The handwritten chit and the faded thermal receipt are the two inputs that ruin a reconciliation weekend, and the messy-bills guide covers both at a workflow level. This page goes one level deeper on thermal specifically, because thermal fails for a physical reason that handwriting doesn't, and once you understand the reason, the fix becomes obvious.
Thermal print isn't ink, and that's the whole problem
A thermal printer has no ink and no toner. The paper is coated with a chemical layer that turns dark when heated, and the print head just applies heat in the right pattern. Fast, cheap, no cartridge, which is why nearly every petrol pump, toll booth, kirana POS and parking machine in the country uses it.
The catch is that the image isn't permanent. Sunlight, friction, and contact with oils or plastics all push the print back toward blank, while heat darkens the unprinted coating around it; either way the contrast goes. A petrol receipt that was crisp at the pump goes grey in a glovebox over a summer. A retail bill stapled into a file and pressed against a plastic sleeve loses contrast where it touched the plastic. By the time a bag of these reaches you in March, half of them are ghosts of what was printed.
That's the part generic OCR was never built for. Most OCR assumes dark marks on a light background with decent contrast. Faded thermal gives it low contrast, uneven contrast across the same receipt, and a background that's drifting toward the same grey as the text. The engine isn't misreading a clear character; it's being handed characters that have physically half-disappeared.
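To make "low contrast, uneven across the same receipt" concrete, here is a minimal Python sketch that scores contrast band by band. It assumes the scan has already been loaded as rows of 0 to 255 grayscale values; the function names, the band count and the 0.35 threshold are illustrative assumptions, not settings taken from any OCR engine.

```python
from typing import List, Sequence


def band_contrast(pixels: Sequence[Sequence[int]], bands: int = 4) -> List[float]:
    """Return a 0..1 contrast score for each horizontal band of a grayscale image.

    The score is the spread between the 5th and 95th percentile pixel values
    divided by 255, so a few stray specks do not dominate the result.
    """
    if not pixels:
        return []
    bands = max(1, min(bands, len(pixels)))
    scores = []
    for b in range(bands):
        start = b * len(pixels) // bands
        stop = (b + 1) * len(pixels) // bands
        values = sorted(v for row in pixels[start:stop] for v in row)
        if not values:
            scores.append(0.0)
            continue
        lo = values[int(0.05 * (len(values) - 1))]
        hi = values[int(0.95 * (len(values) - 1))]
        scores.append((hi - lo) / 255)
    return scores


def faded_bands(pixels, bands=4, threshold=0.35):
    """Indices of bands whose contrast falls below the (assumed) threshold."""
    return [i for i, s in enumerate(band_contrast(pixels, bands)) if s < threshold]


# Synthetic example: a crisp top half (near-black on near-white) and a
# faded bottom half (mid-grey on slightly lighter grey).
crisp = [[20, 240] * 5 for _ in range(4)]
faded = [[150, 190] * 5 for _ in range(4)]
print(faded_bands(crisp + faded, bands=2))  # [1]
```

A per-band score matters more than one number for the whole image, because a receipt with a crisp header and a washed-out footer averages out to "fine" while the small print is already gone.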
Where the fade lands, and why it's the worst place
Thermal doesn't fade evenly, and it doesn't fade where you'd want it to. The fade hits the smallest, lightest print first.
On a fuel receipt the big total is printed large and bold, so it's the last thing to go. The GSTIN, the date, the HSN or SAC line, the litres-and-rate detail: all small, all the first casualties. So the cruel pattern is that the number you could re-derive (the total) survives, and the fields a return actually needs (the GSTIN, the tax detail) are the ones that go grey. An OCR engine reading a faded fuel bill will often hand you a confident total and a mangled or blank GSTIN, which is precisely backwards from what your filing needs.
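A mangled GSTIN can often be caught mechanically, because the number has a fixed 15-character shape and ends in a check character. The sketch below assumes the commonly documented structure (two-digit state code, ten-character PAN, entity code, a literal Z, check character) and the mod-36 checksum as it is usually described; the function names are hypothetical, and the sample GSTIN is a widely circulated example value, not one from a receipt discussed here.

```python
import re

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Assumed shape: state code, PAN (5 letters, 4 digits, 1 letter),
# entity code, literal "Z", check character.
GSTIN_SHAPE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def gstin_check_char(first14: str) -> str:
    """Compute the check character from the first 14 characters.

    Each character's base-36 value is weighted 1, 2, 1, 2, ... from the left;
    the quotient and remainder of each product (mod 36) are summed.
    """
    total = 0
    for i, ch in enumerate(first14):
        factor = 1 if i % 2 == 0 else 2
        product = ALPHABET.index(ch) * factor
        total += product // 36 + product % 36
    return ALPHABET[(36 - total % 36) % 36]


def gstin_looks_valid(text: str) -> bool:
    """True if the text has the GSTIN shape and a matching check character."""
    candidate = text.strip().upper()
    if not GSTIN_SHAPE.fullmatch(candidate):
        return False
    return gstin_check_char(candidate[:14]) == candidate[14]


print(gstin_looks_valid("27AAPFU0939F1ZV"))  # True
print(gstin_looks_valid("27AAPFU0989F1ZV"))  # False: one digit misread
```

Passing this check only means the string is internally consistent. It does not prove the GSTIN is registered or belongs to the pump on the receipt, so it catches misreads rather than replacing a look at the paper.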
This is why "it read the receipt" and "it read the receipt usefully" are different sentences for thermal.
What our own thermal test showed
I'll give you a measured observation rather than a slogan. A clearly printed thermal petrol receipt, captured while the print was still crisp, with the GSTIN legible, read cleanly through the engine we use: pump name, GSTIN, date, amount all came through and checked out by hand against the receipt. No drama.
The faded ones are the whole problem, and on those the honest result is conditional: if a human can still make out the GSTIN by tilting the receipt to the light, a good model usually can too; if the GSTIN has physically faded into the background, no model invents it back, and the right answer is to flag it for a manual look rather than trust a guess. That conditional is the real finding. Thermal accuracy is governed less by the model and more by how degraded the paper was when you captured it. For how this sits against other tools and inputs, including what we did and didn't test, see the accuracy benchmark.
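The "flag it rather than trust a guess" rule can be written down as a small triage step. This is a sketch under stated assumptions: the `ReadResult` record, the `gstin_confidence` field and the 0.9 cut-off are all hypothetical, and not every engine reports a per-field confidence at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadResult:
    receipt_id: str
    gstin: Optional[str]      # None when the engine returned nothing
    gstin_confidence: float   # 0..1, assumed to be reported by the engine
    amount: Optional[float]


def needs_manual_look(r: ReadResult, min_confidence: float = 0.9) -> List[str]:
    """Return the reasons a receipt should go to a human; empty means trust it."""
    reasons = []
    if not r.gstin:
        reasons.append("GSTIN blank")
    elif len(r.gstin.strip()) != 15:
        reasons.append("GSTIN is not 15 characters")
    elif r.gstin_confidence < min_confidence:
        reasons.append("GSTIN read with low confidence")
    if r.amount is None:
        reasons.append("amount missing")
    return reasons


crisp = ReadResult("r1", "27AAPFU0939F1ZV", 0.98, 2150.0)
ghost = ReadResult("r2", None, 0.0, 900.0)
print(needs_manual_look(crisp))  # []
print(needs_manual_look(ghost))  # ['GSTIN blank']
```

Returning reasons rather than a bare yes or no keeps the review queue readable: whoever picks up the receipt knows which field to tilt toward the light.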
The petrol-pump batch is the best case, if you capture early
Fuel bills are the highest-volume thermal pile for any CA with a client who runs vehicles, and they're also the easiest to automate well, because they're uniform. Pump name, GSTIN, date, litres, rate, amount, in a layout that barely changes between receipts from the same chain. Once a tool reads one HPCL or IOCL format cleanly, it reads the rest of that stack. The work is volume, not difficulty.
So the thing that moves the needle on thermal isn't a better OCR engine. It's a capture habit:
- Photograph or scan the day it arrives. The print only degrades. This single habit does more for thermal accuracy than any post-processing, because you're reading the bill while the chemistry is still intact.
- Kill the glare. The glossy thermal surface reflects light straight back and wipes out a band of the receipt. Use soft, even light and tilt the camera slightly off-axis so the reflection misses the lens. A flash on a thermal receipt is close to the worst thing you can do.
- Don't store them against plastic or in the heat. A glovebox, a laminated sleeve, a sunny windowsill: all of them eat the print. A plain envelope in a cool drawer keeps a receipt readable far longer.
Getting the thermal pile into a bookable shape
Reading is half the job; the output shape is the other half, and it matters more on thermal because the volume is so high. A wall of recognized text from two hundred fuel receipts helps nobody. You want a row per receipt (pump, GSTIN, date, amount, tax) landing in Excel, or a file you import into Tally, so the pile collapses into a sheet you scan down rather than a stack you retype.
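As a rough illustration of the row-per-receipt shape, the sketch below writes extracted fields to CSV, which Excel opens directly. It uses only Python's standard `csv` module; the column names follow the list above, while the dictionary keys and the sample values are made up for the example.

```python
import csv
import io
from typing import Dict, Iterable

COLUMNS = ["pump", "gstin", "date", "amount", "tax"]


def receipts_to_csv(receipts: Iterable[Dict[str, object]]) -> str:
    """Render one CSV row per receipt, in a fixed column order.

    Missing fields become empty cells; extra keys (raw text, debug info)
    are dropped so the sheet stays one clean row per receipt.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    for receipt in receipts:
        writer.writerow(receipt)
    return buffer.getvalue()


rows = [
    {"pump": "Example Fuels", "gstin": "27AAPFU0939F1ZV",
     "date": "2024-03-02", "amount": "2150.00", "tax": ""},
    # A faded receipt: the GSTIN is left blank for a manual look.
    {"pump": "Example Fuels", "date": "2024-03-05", "amount": "900.00"},
]
print(receipts_to_csv(rows))
```

To save it for Excel, write the returned string to a file opened with `newline=""` and `encoding="utf-8-sig"`. A blank GSTIN cell is deliberate: it shows up as a visible gap when you scan down the column, which is easier to catch than a plausible-looking guess. A Tally import needs its own file format, which this sketch does not cover.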
The workflow that holds up for thermal: capture early and glare-free, let the model read the uniform stack, and spend your eyes on the GSTIN of any receipt where the small print looks faded. The crisp ones you trust; the grey ones you check. That's the difference between a fuel-bill pile that takes an afternoon and one that takes a weekend.