Deep Learning-Based OCR Solution in Python

Leverage docTR to perform accurate text extraction and recognition from images.

What Is the docTR API for Python?

docTR (Document Text Recognition) is an open-source, deep learning-based Optical Character Recognition (OCR) library for Python. It processes scanned documents, images, and PDFs in two stages: a detection model locates the words on each page, and a recognition model reads the text in each detected region. The output is organized as a hierarchy of pages, blocks, lines, and words, so the document's structure is kept alongside the extracted text.

docTR is widely used for document digitization, automated data extraction, and AI-powered text recognition applications. It supports multiple languages, handwriting recognition, and GPU acceleration for enhanced performance.

Key Features of docTR API

Advanced Deep Learning OCR: Uses neural networks for precise text detection and recognition.
Multi-Format Support: Works seamlessly with images, PDFs, and scanned documents.
Handwriting Recognition: Can be applied to handwritten text, with accuracy that depends on the model and on how closely the handwriting resembles its training data.
Multi-Language Recognition: Supports various languages and scripts.
Optimized for Speed: Efficient text extraction with GPU acceleration.
Preserves Document Layout: Retains structure during text recognition.
Scalable and Open Source: Free to use and actively maintained for continuous improvements.

Getting Started with docTR API

To install docTR, use the following pip command:

Install docTR


pip install python-doctr

GPU acceleration comes from the underlying deep learning backend rather than from docTR itself. To use a GPU, install a CUDA-enabled build of the backend that matches your system, for example PyTorch:

Install GPU dependencies


pip install torch torchvision
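
Before running OCR, it can help to confirm that a GPU is actually visible. The sketch below assumes the PyTorch backend; `pick_device` is a hypothetical helper written for this article, not part of docTR, and it falls back to the CPU when PyTorch is missing or no GPU is found.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency; absent on CPU-only setups
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"OCR would run on: {device}")

# Assumption: with the PyTorch backend the predictor behaves like a regular
# torch module, so it can be moved to the chosen device (not executed here):
# model = ocr_predictor(pretrained=True).to(device)
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch build most likely lacks CUDA support.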

Code Examples for Text Extraction Using docTR API

Below are several examples demonstrating text extraction from images and documents using docTR.

Example 1: Extracting Text from an Image

This example demonstrates how to load an image, apply OCR with docTR, and extract the text. The extracted text includes its position within the image, making it useful for structured document processing.

Extract Text from Image


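from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load one or more image files as a list of pages
doc = DocumentFile.from_images("sample.png")
# Build an end-to-end predictor (detection + recognition) with pretrained weights
model = ocr_predictor(pretrained=True)
result = model(doc)
# export() returns a nested dict of pages, blocks, lines, and words with coordinates
print(result.export())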
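
The exported dictionary stores word positions as relative coordinates between 0 and 1. The following sketch shows one way to flatten it into words with pixel bounding boxes. It assumes straight (non-rotated) pages, where each word's geometry is a pair of corner points, and that page dimensions are given as (height, width). `words_with_pixel_boxes` and `sample_export` are illustrative names; the sample is a hand-written stand-in that mirrors the export layout, not real OCR output.

```python
def words_with_pixel_boxes(export: dict) -> list:
    """Flatten an exported result into words with pixel-space boxes."""
    words = []
    for page in export["pages"]:
        height, width = page["dimensions"]  # assumed order: (height, width)
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    # Straight pages: ((xmin, ymin), (xmax, ymax)), relative
                    (xmin, ymin), (xmax, ymax) = word["geometry"]
                    words.append({
                        "page": page["page_idx"],
                        "text": word["value"],
                        "confidence": word["confidence"],
                        "box": (round(xmin * width), round(ymin * height),
                                round(xmax * width), round(ymax * height)),
                    })
    return words


# Hand-written stand-in for result.export() on a 1000 x 800 (h x w) page
sample_export = {
    "pages": [{
        "page_idx": 0,
        "dimensions": (1000, 800),
        "blocks": [{
            "lines": [{
                "words": [
                    {"value": "Invoice", "confidence": 0.98,
                     "geometry": ((0.25, 0.125), (0.5, 0.25))},
                    {"value": "Total", "confidence": 0.91,
                     "geometry": ((0.5, 0.5), (0.75, 0.625))},
                ],
            }],
        }],
    }],
}

words = words_with_pixel_boxes(sample_export)
for w in words:
    print(w["text"], w["box"])
```

In practice, `sample_export` would be replaced by the value of `result.export()` from the example above.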

Example 2: Processing a Multi-Page PDF Document

If you need to extract text from a PDF file containing multiple pages, docTR simplifies the process. The example below shows how to extract text from every page efficiently.

Extract Text from PDF


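from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Each page of the PDF is rendered to an image and becomes one entry in doc
doc = DocumentFile.from_pdf("sample.pdf")
model = ocr_predictor(pretrained=True)
# The predictor processes all pages in a single call
result = model(doc)
print(result.export())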
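
For multi-page documents, it is often more convenient to work with one plain-text string per page than with the full nested dictionary. The sketch below rebuilds per-page text from the exported structure by joining the words of each line. `page_texts` and `sample_export` are illustrative names, and the sample is a hand-written stand-in for `result.export()`.

```python
def page_texts(export: dict) -> list:
    """Return one plain-text string per page, one OCR line per text line."""
    texts = []
    for page in export["pages"]:
        lines = []
        for block in page["blocks"]:
            for line in block["lines"]:
                # Words in a line are joined with single spaces
                lines.append(" ".join(word["value"] for word in line["words"]))
        texts.append("\n".join(lines))
    return texts


# Hand-written stand-in for the export of a two-page PDF
sample_export = {
    "pages": [
        {"page_idx": 0, "blocks": [{"lines": [
            {"words": [{"value": "Quarterly"}, {"value": "Report"}]},
            {"words": [{"value": "Page"}, {"value": "one"}]},
        ]}]},
        {"page_idx": 1, "blocks": [{"lines": [
            {"words": [{"value": "Appendix"}]},
        ]}]},
    ],
}

texts = page_texts(sample_export)
for index, text in enumerate(texts):
    print(f"--- page {index} ---")
    print(text)
```

This keeps the page boundaries explicit, which is useful when each page has to be stored or indexed separately.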

Example 3: Recognizing Handwritten Text

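docTR can also be applied to handwritten text such as notes, forms, or historical documents. The pipeline is the same as for printed text; only the input changes. Results vary with handwriting style, because accuracy depends on how closely the input resembles the data the models were trained on. This example runs OCR on a scanned handwritten page (the file name handwritten.png is a placeholder).

Extract Handwritten Text


from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Placeholder path: replace with your own scan of a handwritten page
doc = DocumentFile.from_images("handwritten.png")
model = ocr_predictor(pretrained=True)
result = model(doc)
print(result.export())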
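
Recognition confidence can vary considerably on handwritten input, so it may be worth flagging uncertain words for manual review. The sketch below walks the exported dictionary and collects words whose confidence falls below a threshold. `low_confidence_words`, the 0.5 threshold, and `sample_export` are assumptions chosen for illustration, not docTR defaults or real OCR output.

```python
def low_confidence_words(export: dict, threshold: float = 0.5) -> list:
    """Collect (page index, text, confidence) for words below the threshold."""
    flagged = []
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    if word["confidence"] < threshold:
                        flagged.append(
                            (page["page_idx"], word["value"], word["confidence"])
                        )
    return flagged


# Hand-written stand-in for result.export() on a handwritten note
sample_export = {
    "pages": [{
        "page_idx": 0,
        "blocks": [{"lines": [{"words": [
            {"value": "Dear", "confidence": 0.93},
            {"value": "Jahn", "confidence": 0.41},
            {"value": "regards", "confidence": 0.48},
        ]}]}],
    }],
}

flagged = low_confidence_words(sample_export)
for page_idx, text, confidence in flagged:
    print(f"page {page_idx}: '{text}' ({confidence:.2f})")
```

A suitable threshold depends on the documents at hand and is best tuned on a small set of manually checked pages.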

Conclusion

docTR API is a powerful deep learning-based OCR solution that simplifies text extraction from images, PDFs, and handwritten documents. It ensures high accuracy while preserving document structure, making it a valuable tool for AI-driven document processing, automation, and data extraction.

Whether you're working on document digitization, automated data entry, or AI-based text recognition, docTR provides a flexible and efficient solution tailored to your needs.