PC from zero: What is OCR?

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text.

OCR is a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus has shifted to implementing proven techniques. Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications that use true optical techniques survive, the term OCR has now been broadened to include digital image processing as well.

Early systems required training (the provision of known samples of each character) to read a specific font. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are even capable of reproducing formatted output that closely approximates the original scanned page including images, columns and other non-textual components.

[via Wikipedia]
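
To make the idea of "training" a little more concrete, here is a toy sketch in Python. It is not how any real OCR product works: the 3x3 character grids, the SAMPLES table and the function names train and recognize are all invented for this illustration. The program stores one known sample per character and then picks the stored sample that differs from the input in the fewest pixels.

```python
# Toy illustration only: each "glyph" is a 3x3 grid drawn with '#' and '.'.
# Real OCR engines work on scanned images and use far more robust methods.

def to_bits(glyph):
    """Flatten a list of row strings into a tuple of 0/1 pixels."""
    return tuple(1 if ch == "#" else 0 for row in glyph for ch in row)

def train(samples):
    """'Training' here just stores one known sample per character."""
    return {label: to_bits(glyph) for label, glyph in samples.items()}

def recognize(templates, glyph):
    """Return the label whose stored sample differs in the fewest pixels."""
    bits = to_bits(glyph)

    def distance(label):
        return sum(a != b for a, b in zip(templates[label], bits))

    return min(templates, key=distance)

# Hypothetical training samples for a made-up three-letter "font".
SAMPLES = {
    "I": ["###", ".#.", "###"],
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
}

templates = train(SAMPLES)
noisy_L = ["#..", "#.#", "###"]  # an 'L' with one stray pixel
print(recognize(templates, noisy_L))  # prints "L"
```

A system like this only recognizes the font it was shown, which is the limitation of the early systems described above. The "intelligent" systems avoid it by learning from many fonts at once.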

Here is an excellent online service: http://www.ocrterminal.com/

OCR Terminal is a free online optical character recognition service that lets you convert scanned images and PDFs into editable, text-searchable documents. It accurately preserves the formatting and layout of the original documents.


Thursday, May 7, 2009
