The "curious" bit of Qiqqa was (and in some ways still is), at least from a user perspective, that it keeps retrying the text extraction/OCR business indefinitely when the entire workflow fails to deliver any words for a given page. I have forcibly stop-gapped this user-observed behaviour by injecting those "fake words" into the output when, at the end of everything we tried in that workflow, there still is nothing to report home. The premise is that once a page has failed every stage of the text extraction a.k.a. "textify/OCR" process, there's no reason to expect it to do better next time.

Of course this introduces another subtle error cause into the mix: sometimes tools fail to run due to external circumstances, in which case a later run might well have succeeded. There's no sanity check on the mupdf output, which for some very peculiar "obfuscated" PDFs can lead to very interesting results.

Anyway, that's about the list of causes I can come up with, in order of decreasing horribleness.
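
For illustration only, here is a minimal Python sketch of the stop-gap idea: run every extraction stage, and if none of them yields words, record placeholder words so the page is not queued again. This is not Qiqqa's actual code. The names `SENTINEL_WORDS`, `extract_page_words` and `needs_retry` are hypothetical, and so is the sentinel value itself, since the real "fake words" are not shown here. The `except OSError` branch is an assumption about what an "external circumstances" failure might look like.

```python
from typing import Callable, List, Optional

# Hypothetical placeholder; the real "fake words" are not given in the text.
SENTINEL_WORDS = ["<no-text-found>"]

# A stage takes a page number and returns the words it managed to extract.
Stage = Callable[[int], List[str]]


def extract_page_words(page: int, stages: List[Stage]) -> List[str]:
    """Try each stage in order; fall back to sentinel words if all come up empty."""
    for stage in stages:
        try:
            words = stage(page)
        except OSError:
            # Assumed shape of an external failure (tool missing, disk full, ...).
            # It is treated exactly like "no text found", which is where the
            # subtle error cause creeps in: a later run might have succeeded.
            words = []
        if words:
            return words
    # Every stage failed: record something so the page is not retried forever.
    return list(SENTINEL_WORDS)


def needs_retry(cached_words: Optional[List[str]]) -> bool:
    """A page is re-queued only when nothing at all was recorded for it."""
    return not cached_words
```

Because the sentinel counts as "something recorded", `needs_retry` returns `False` for such a page and the endless retry loop stops. The cost is that a page which only failed for a transient reason is never looked at again.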