Discussion
Unverified: What Practitioners Post About OCR, Agents, and Tables
bonsai_spool: Please write in your own words! I’m not inclined to read something if it consists of what you copy and pasted from Claude
obsidianbases1: Interesting complaint, because many might not share any of their ideas if it weren't for LLMs making it easy. Not everyone has the incentive to dedicate a day to producing writing worth publishing. But maybe they would if it took significantly less time. Even considering HN's no-LLMs-for-comments rule, which I mostly agree with, I think we would all lose if the same rule were applied to publishing in general.
curtisf: "I would rather read the prompt": https://claytonwramsey.com/blog/prompt/ (discussion: https://news.ycombinator.com/item?id=43888803). All of the output beyond the prompt contains, definitionally, essentially no useful information. Unless it's being used to translate from one human language to another, you're wasting your reader's time and energy in exchange for your own. If you have useful ideas, share them, and if you believe in the age of LLMs, be less afraid of them being unpolished and simply ask your readers to rely on their preferred tools to piece through it.
quinndupont: Very helpful analysis that confirms everything I’ve encountered. OCR remains a thorny issue. The author talks about professional workflows struggling with tables and such, but I’ve found it challenging to get clean copies of long documents (books). The hybrid workflow (layout then OCR) sounds promising.
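The hybrid workflow mentioned here (detect layout regions first, then OCR each region separately) can be sketched roughly as below. This is a toy illustration, not a real pipeline: `detect_regions` and `ocr_region` are hypothetical stand-ins for a layout model and an OCR engine, stubbed out with a pre-baked "page" dictionary.

```python
from dataclasses import dataclass

@dataclass
class Region:
    kind: str    # e.g. "paragraph", "table", "figure"
    bbox: tuple  # (x, y, width, height) on the page image

def detect_regions(page) -> list:
    """Hypothetical layout-analysis step; a real pipeline would run a vision/layout model here."""
    return page["regions"]  # stub: the toy page already lists its regions

def ocr_region(page, region) -> str:
    """Hypothetical per-region OCR step; tables could be routed to a table-aware model."""
    return page["text"][region.bbox]  # stub: look up pre-baked text for the region

def hybrid_ocr(page) -> list:
    # Layout first, then OCR each region, reading top-to-bottom, left-to-right.
    regions = sorted(detect_regions(page), key=lambda r: (r.bbox[1], r.bbox[0]))
    return [(r.kind, ocr_region(page, r)) for r in regions]

# Toy "page" standing in for a real scanned image.
page = {
    "regions": [Region("table", (0, 100, 200, 50)),
                Region("paragraph", (0, 0, 200, 90))],
    "text": {(0, 100, 200, 50): "row 1 | row 2",
             (0, 0, 200, 90): "Intro text."},
}
print(hybrid_ocr(page))
```

The point of the split is that each region can get a strategy suited to its kind: plain paragraphs go to a standard OCR engine, while tables or figures can be routed elsewhere, which is why the hybrid approach helps with the table problems the article describes.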
ChrisKnott: Is there a SOTA OCR model that prioritises failing in a debuggable way? What I want is an output that records which sections of the image contributed to each word/letter, preferably with per-word confidence levels and user-correctable identification information. I should be able to build a UI to say: no, this section is red-on-green vertically aligned Cyrillic characters; try again.
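The debuggable output described in that comment can be sketched as a data structure. This is purely illustrative: `OcrWord` and `flag_for_review` are hypothetical names, not the API of any real OCR library, and the sample words and confidences are made up.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str                 # recognised word
    bbox: tuple               # (x, y, width, height) of the contributing image region
    confidence: float         # per-word confidence in [0, 1]
    script_hint: str = "Latin"  # user-correctable identification info (script, orientation, ...)

def flag_for_review(words, threshold: float = 0.8):
    """Return the words a correction UI should surface, lowest confidence first."""
    return sorted((w for w in words if w.confidence < threshold),
                  key=lambda w: w.confidence)

words = [
    OcrWord("hello",   (10, 10, 40, 12), 0.97),
    OcrWord("m1stake", (55, 10, 50, 12), 0.42),
    OcrWord("maybe",  (110, 10, 45, 12), 0.78),
]
for w in flag_for_review(words):
    print(w.text, w.confidence, w.bbox)
```

Because each flagged word carries its source `bbox` and a correctable `script_hint`, a UI could highlight the exact image patch, let the user override the hint (e.g. "Cyrillic, vertical"), and re-run recognition on just that region.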
bobajeff: It's very surprising to me that the state-of-the-art tools for data entry and digitizing still require a lot of supervision. From the article, it's not that surprising that handwritten documents are harder for old-school OCR or AI, since those can be hard even for humans in some cases. But tables and varied layouts seem like low-hanging fruit for vision models.