Convert scanned PDF documents to text using Optical Character Recognition (OCR)
- 100+ Recognition Languages
- Multi Column Document Analysis
- 100% FREE, Unlimited Uploads, No Registration
OCR stands for Optical Character Recognition, a technology that recognizes text in images of scanned documents and photos. PDF stands for Portable Document Format, a format in which the document layout looks the same regardless of the operating system or hardware used to view it. A PDF document can contain text, images, hyperlinks, embedded fonts, videos, forms, and more. There are 3 types of PDF documents:
Editable PDF: The PDF is created digitally by software such as MS Word and consists of text and images. You can easily search, select, and edit the document using any PDF reader.
Scanned PDF: The PDF consists of images created either by scanning a paper document with a scanning device or from an image (jpg, png, tiff) captured by an imaging device such as a mobile phone or digital camera. You cannot search, select, or edit the document text unless you use an OCR service such as i2OCR.
Searchable PDF: The PDF consists of an image layer of a scanned document with a text layer under it, produced by applying an OCR service (such as i2OCR) to the image layer. You can search, select, and edit the document. Searchable PDFs are often saved as PDF/A, a standardized variant of PDF intended for long-term archiving, where "A" stands for archiving.
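The three types above can be told apart roughly by checking whether each page carries a text layer, an image, or both. The following is a minimal sketch of that heuristic, not part of i2OCR: the function names `classify_pdf` and `inspect_pdf` are hypothetical, and `inspect_pdf` assumes the third-party `pypdf` package is installed. Editable PDFs may also contain images, so the result is only a rough guess.

```python
from typing import Sequence


def classify_pdf(page_texts: Sequence[str], page_has_image: Sequence[bool]) -> str:
    """Heuristically label a PDF as 'editable', 'scanned', or 'searchable'.

    page_texts     -- extracted text of each page ('' if the page has none)
    page_has_image -- whether each page contains at least one image
    """
    has_text = any(text.strip() for text in page_texts)
    has_images = any(page_has_image)

    if not has_text:
        # Images without a text layer: the text can only be recovered by OCR.
        return "scanned" if has_images else "unknown"
    if has_images and all(page_has_image):
        # Every page has an image plus text: likely an OCR text layer
        # sitting under the scanned image layer.
        return "searchable"
    # Text is present and pages are not all image-backed.
    return "editable"


def inspect_pdf(path: str) -> str:
    """Read a PDF with pypdf (assumed installed) and classify it."""
    from pypdf import PdfReader  # third-party import, kept local on purpose

    reader = PdfReader(path)
    texts = [page.extract_text() or "" for page in reader.pages]
    images = [len(page.images) > 0 for page in reader.pages]
    return classify_pdf(texts, images)
```

A file that comes back as "scanned" is the kind that needs OCR before its text can be searched or selected.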
i2OCR converts PDF to text in 2 steps: first, it converts the PDF into images, then it recognizes the text of the selected image.
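The same two-step flow can be sketched locally with open-source tools. This is an illustration under stated assumptions, not i2OCR's actual implementation: the function name `pdf_page_to_text` and its parameters are hypothetical, and the default steps assume the third-party `pdf2image` (which needs Poppler) and `pytesseract` (which needs Tesseract) packages are installed.

```python
from typing import Any, Callable, Optional, Sequence


def pdf_page_to_text(
    pdf_path: str,
    page_index: int = 0,
    lang: str = "eng",
    rasterize: Optional[Callable[[str], Sequence[Any]]] = None,
    recognize: Optional[Callable[[Any, str], str]] = None,
) -> str:
    """Convert one page of a scanned PDF to text in two steps."""
    if rasterize is None:
        # Assumed default for step 1: pdf2image renders each page to an image.
        from pdf2image import convert_from_path

        def rasterize(path: str) -> Sequence[Any]:
            return convert_from_path(path, dpi=300)

    if recognize is None:
        # Assumed default for step 2: Tesseract OCR via pytesseract.
        import pytesseract

        def recognize(image: Any, language: str) -> str:
            return pytesseract.image_to_string(image, lang=language)

    # Step 1: convert the PDF into a list of page images.
    images = rasterize(pdf_path)
    if not 0 <= page_index < len(images):
        raise IndexError(f"page_index {page_index} out of range for {len(images)} page(s)")

    # Step 2: recognize the text of the selected page image.
    return recognize(images[page_index], lang)
```

For example, `pdf_page_to_text("scan.pdf", page_index=0, lang="eng")` would return the recognized text of the first page, where `scan.pdf` is a placeholder file name. Both steps are passed in as callables so that either one can be swapped for a different renderer or OCR engine.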