Optical character recognition

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.

Early OCR systems required calibration to read a specific font: they had to be programmed with images of each character and worked on one font at a time. "Intelligent" systems with high recognition accuracy for most fonts are now common. Some systems can reproduce formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.
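The font-specific approach described above is essentially template matching. The sketch below is a minimal illustration under stated assumptions, not a reconstruction of any historical machine or modern OCR engine. It stores one binary bitmap per character of a single toy font and labels an unknown glyph with the template whose pixels agree with it most often. The 5x5 glyphs and the names `TEMPLATES`, `agreement` and `recognize` are hypothetical.

```python
# Minimal template-matching sketch (hypothetical toy font, not a real system):
# each known character of ONE font is stored as a small binary bitmap, and an
# unknown glyph gets the label of the template it agrees with most often.

from typing import Dict, List

Bitmap = List[str]  # rows of '#' (ink) and '.' (paper), all the same width

# Hypothetical 5x5 templates for a single toy font (not any real typeface).
TEMPLATES: Dict[str, Bitmap] = {
    "I": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "#####"],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "O": [".###.",
          "#...#",
          "#...#",
          "#...#",
          ".###."],
}


def agreement(a: Bitmap, b: Bitmap) -> int:
    """Count pixel positions where two same-sized bitmaps agree."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("bitmaps must have identical dimensions")
    return sum(p == q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def recognize(glyph: Bitmap, templates: Dict[str, Bitmap] = TEMPLATES) -> str:
    """Return the label of the template that best matches the glyph."""
    return max(templates, key=lambda label: agreement(glyph, templates[label]))


if __name__ == "__main__":
    # A noisy "T": one stray ink pixel in the bottom-left corner.
    noisy_t = ["#####",
               "..#..",
               "..#..",
               "..#..",
               "#.#.."]
    print(recognize(noisy_t))  # prints: T
```

Because every template encodes the shapes of one font, a glyph set in a different typeface can easily score higher against the wrong template. This is why systems built this way had to be set up for one font at a time.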

History

In 1929 Gustav Tauschek obtained a patent on OCR in Germany, followed by Handel, who obtained a US patent on OCR in 1933 (U.S. Patent 1,915,993). In 1935 Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329). Tauschek's machine was a mechanical device that used templates and a photodetector.

In 1949, RCA engineers worked on the first primitive computer-type OCR, a device intended to help blind people, for the US Veterans Administration. Rather than only converting the printed characters to machine language, their device converted them to machine language and then spoke the letters aloud. It proved far too expensive and was not pursued after testing.[1]

In 1950, David H. Shepard, a cryptanalyst at the Armed Forces Security Agency in the United States, addressed the problem of converting printed messages into machine language for computer processing and built a machine to do this. The machine was reported in the Washington Daily News on 27 April 1951 and, after his U.S. Patent 2,663,758 was issued, in the New York Times on 26 December 1953. Shepard then founded Intelligent Machines Research Corporation (IMR), which went on to deliver the first several OCR systems in commercial operation anywhere in the world.

The first commercial system was installed at Reader's Digest in 1955. The second was sold to the Standard Oil Company to read credit card imprints for billing. Other systems sold by IMR in the late 1950s included a bill stub reader for the Ohio Bell Telephone Company and a page scanner for the United States Air Force that read typewritten messages and transmitted them by teletype. IBM and others later licensed Shepard's OCR patents.