Talk:Comparison of optical character recognition software
This article is rated List-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Text and/or other creative content from this version of Optical character recognition was copied or moved into List of optical character recognition software with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.
Text and/or other creative content from this version of OCR Software was copied or moved into List of optical character recognition software with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.
Text and/or other creative content from this version of OCR SDK was copied or moved into List of optical character recognition software with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.
Horribly biased table
As it is currently (2010.09.14), the majority of the table is tied up in various secondary concerns. While the operating platform is very important to a user, is it really worth dedicating 7 whole columns to it? While I am a big proponent of open source software (I use GNU/Linux on all boxen both at work and at home), I don't consider that it should be the primary feature (hence the first column that bears a feature) when evaluating which OCR to choose. Why is the programming language its own column? The real features do not begin until the end of the table, where at least the number of languages and number of fonts supported can be viewed as a differentiator between the software packages. However, both of these columns are horribly deficient, which ruins their utility (the number of fonts is listed once, beside three that say "any"). I can see that having the SDK is an important feature for some, but not for typical or casual users (which are the ones coming to this page for information). The only real feature column is the Notes column at the very end, which is also very inconsistent. In one entry it presents an excerpt of a magazine review. In another it repeats the marketing slogan ("Developed for ultimate accuracy" - Transym). In another it insults the product as "not entirely accurate" (as if any of the others are, or ever purport to be).
I do enjoy all the information already here, I am just suggesting that it could be presented in a better way. I still much prefer the current format over a simple list (like List_of_statistical_packages) as I like to sort by Linux to see the 8 (2 commercial) that I can use natively. However there are good examples such as List_of_computer_algebra_systems or Comparison_of_video_codecs. In the example of CAS, the information is split into three tables: general, functionality and OS support. I think that such a breakdown of the tables would prove very useful, making so much information accessible.
A future feature list could include
- images directly from scanner
- images from scanned files (each could list supported images pdf, png, tif)
- export support (searchable pdf, txt, html, .doc, .odf, .tex)
- layout analysis
- number of fonts for English (or maybe all lang.)
- number of languages
- handwritten documents
- spreadsheet support
- mathematical formulas (I know of only InftyReader that can, which needs to be added)
- barcode scanning
- SDK (and the language of the SDK)
Well just my 2 cents, as I think that the current version is not exceptionally useful (except to figure out what you can run on your linux box or OSX box). —Preceding unsigned comment added by 128.143.199.213 (talk) 16:51, 14 September 2010 (UTC)
- I agree that the multi-table format is going to be necessary to fit in all the information that is useful to have. It's certainly feasible to serve multiple audiences - casual users, software developers, OCR researchers, etc. I would also suggest separating "Developer" into a separate column, as I've seen done on a number of pages; that info is currently sneaking into the "Notes" column. The main challenge right now is for us to simply put in the time to research all the interesting aspects not currently covered or filled in. -- Beland (talk) 15:52, 20 May 2013 (UTC)
Incomplete List
Several vendors are excluded: Sakhr and Novodynamics. —Preceding unsigned comment added by 41.134.83.186 (talk) 14:14, 12 April 2011 (UTC)
Where is Adobe Acrobat, quite possibly the most commonly used commercial OCR program? Zerotalk 13:28, 7 August 2011 (UTC)
- Acrobat, Hyland's OnBase and Perceptive platforms, etc. Mike Moresi (talk) 19:15, 26 August 2022 (UTC)
The list is also missing Easy-OCR, http://code.google.com/p/easy-ocr/ 71.193.217.159 (talk) 03:32, 13 November 2011 (UTC)
- Looks like that's now at http://sourceforge.net/projects/easy-ocr/ -- Beland (talk) 21:24, 20 May 2013 (UTC)
- Also need to add: SanskritOCR, Chitrankan [1] [2]. -- Beland (talk) 20:22, 23 May 2013 (UTC)
- From [3]:
- fuzzyocr - spamassassin plugin to check image attachments
- libhocr0 - Hebrew OCR
Use two tables?
Maybe this list should be split into two tables: one showing software that contains an OCR engine, and another showing frontends, GUIs, etc. that use those engines. Jdc843 (talk) 08:03, 21 December 2012 (UTC)
That is exactly what I was going to say. For example, FreeOCR and OCRFeeder are just interfaces to the Tesseract engine. UserHuge (talk) 20:40, 1 January 2013 (UTC)
- Just interfaces? Hm. Well. I haven't used any of the programs in years so I don't know and won't try to verify that. I assume that you undid my bad change and put correct information there. Jdc843 (talk) 02:35, 6 February 2023 (UTC)
- You realize you are replying to a ten-year-old conversation, right? - Ahunt (talk) 02:37, 6 February 2023 (UTC)
Add cols
- mobile OSes: like Android (Text Fairy etc., Tesseract) or iOS?
- input: printed letters, handwriting, any alphabet (learns letters) — Preceding unsigned comment added by 77.8.87.46 (talk) 19:18, 8 June 2015 (UTC)
Table headings
What are "Founded Year" and "Release Year"? Founding of the company? Release of the first version or the latest version?
Inclusion criteria
Minimally, every entry needs to be verified. Reducing the list to only notable entries with their own articles is an option as well. Given the large number of non-notable entries currently, maybe we should include notable entries and entries from notable companies? --Ronz (talk) 15:41, 19 October 2017 (UTC)
- I agree, it would reduce the spamming issue in the tables. - Ahunt (talk) 21:57, 19 October 2017 (UTC)
- Done, plus I have added a hidden note that will be seen by editors before they add more non-notable OCR programs. This article continues to be subject to attempts to use it for WP:PROMOTION and WP:SPAM by purveyors of commercial OCR products. - Ahunt (talk) 11:01, 17 June 2022 (UTC)
Cloud Vision
Add Google Cloud Vision. Genetics4good (talk) 08:02, 6 July 2018 (UTC)