Deep Learning-Based OCR Solution in Python
Leverage docTR to perform accurate text extraction and recognition from images.
What is docTR API for Python?
docTR (Document Text Recognition) is an open-source, deep learning-based Optical Character Recognition (OCR) library for Python, developed by Mindee. It works in two stages: a text detection model locates words on the page, and a text recognition model reads each cropped word. It accepts scanned documents, images, and PDFs as input, and its output keeps the document's structure as a hierarchy of pages, blocks, lines, and words.
docTR is used for document digitization, automated data extraction, and other text recognition applications. It ships with pretrained models, can run on a GPU through its deep learning backend, and can be fine-tuned for other languages or for handwriting, where accuracy depends on the model and the training data.
Key Features of docTR API
- Advanced Deep Learning OCR: Uses neural networks for precise text detection and recognition.
- Multi-Format Support: Works seamlessly with images, PDFs, and scanned documents.
- Handwriting Recognition: Can process handwritten text, though accuracy depends on the model and is generally lower than on printed text.
- Multi-Language Recognition: Supports various languages and scripts.
- Optimized for Speed: Efficient text extraction with GPU acceleration.
- Preserves Document Layout: Retains structure during text recognition.
- Scalable and Open Source: Free to use and actively maintained for continuous improvements.
Getting Started with docTR API
To install docTR, use the following pip command:
Install docTR
pip install python-doctr
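GPU acceleration comes from the deep learning backend that docTR runs on. To use a GPU, install a CUDA-enabled build of that backend, for example PyTorch (choose the build that matches your CUDA version from the PyTorch installation instructions):
Install GPU dependencies
pip install torch torchvision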
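As a quick check after installation, the sketch below picks a device string depending on whether a CUDA-capable PyTorch build is present. The helper name pick_device is hypothetical, and the commented-out line assumes docTR's PyTorch backend, where the predictor behaves like a regular torch module.

```python
def pick_device():
    """Return "cuda" if a CUDA-enabled PyTorch is usable, otherwise "cpu"."""
    try:
        import torch  # only needed for the GPU check
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"OCR will run on: {device}")

# Assumption: with the PyTorch backend, the predictor can be moved like any
# torch module, for example:
#   model = ocr_predictor(pretrained=True).to(device)
```

If this prints "cpu" on a machine with a GPU, the installed PyTorch build most likely lacks CUDA support.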
Code Examples for Text Extraction Using docTR API
Below are several examples demonstrating text extraction from images and documents using docTR.
Example 1: Extracting Text from an Image
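This example loads an image, runs OCR with docTR, and prints the result. The export() method returns a nested dictionary of pages, blocks, lines, and words; each word carries its text, a confidence score, and its position as coordinates relative to the page size, which makes the output useful for structured document processing.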
Extract Text from Image
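from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load one or more images as a list of pages
doc = DocumentFile.from_images("sample.png")
# Build the detection + recognition pipeline with pretrained weights
model = ocr_predictor(pretrained=True)
# Run OCR on the pages
result = model(doc)
# Export pages, blocks, lines, and words as a nested dictionary
print(result.export())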
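The dictionary returned by result.export() can be walked with plain Python. The following is a minimal sketch that converts each word's relative coordinates to pixel boxes. It assumes the export layout described above (pages, blocks, lines, words), page dimensions stored as (height, width), and straight two-point boxes; the helper name words_with_pixel_boxes and the sample dictionary are illustrative stand-ins, not part of docTR.

```python
def words_with_pixel_boxes(exported):
    """Yield (text, confidence, (x0, y0, x1, y1)) per word, box in pixels."""
    for page in exported["pages"]:
        height, width = page["dimensions"]  # assumed order: (height, width)
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    # Assumed straight box: ((xmin, ymin), (xmax, ymax)), relative
                    (x0, y0), (x1, y1) = word["geometry"]
                    box = (round(x0 * width), round(y0 * height),
                           round(x1 * width), round(y1 * height))
                    yield word["value"], word["confidence"], box


# Hand-written stand-in for result.export(), so the sketch runs without a model
sample_export = {
    "pages": [{
        "page_idx": 0,
        "dimensions": (1000, 800),
        "blocks": [{
            "lines": [{
                "words": [{
                    "value": "Invoice",
                    "confidence": 0.98,
                    "geometry": ((0.125, 0.25), (0.5, 0.375)),
                }],
            }],
        }],
    }],
}

words = list(words_with_pixel_boxes(sample_export))
print(words)  # [('Invoice', 0.98, (100, 250, 400, 375))]
```

If the predictor is configured for rotated pages, the geometry may be a four-point polygon instead, and the unpacking above would need to change.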
Example 2: Processing a Multi-Page PDF Document
To extract text from a multi-page PDF, load it with DocumentFile.from_pdf, which renders each page to an image. The predictor then processes all pages in one call, and the result contains one entry per page.
Extract Text from PDF
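from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Render every page of the PDF to an image
doc = DocumentFile.from_pdf("sample.pdf")
model = ocr_predictor(pretrained=True)
# Run OCR on all pages in a single call
result = model(doc)
# The exported dictionary has one entry per page under "pages"
print(result.export())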
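For multi-page documents it is often handy to get one plain-text string per page. The sketch below rebuilds that from the exported dictionary, assuming the same pages, blocks, lines, words layout; page_texts is a hypothetical helper and the sample data stands in for a real result.export() call.

```python
def page_texts(exported):
    """Return a list with one string per page, one text line per OCR line."""
    texts = []
    for page in exported["pages"]:
        lines = []
        for block in page["blocks"]:
            for line in block["lines"]:
                # Join the words of each detected line with single spaces
                lines.append(" ".join(word["value"] for word in line["words"]))
        texts.append("\n".join(lines))
    return texts


# Hand-written stand-in for the export of a two-page PDF
sample_export = {
    "pages": [
        {"page_idx": 0, "blocks": [{"lines": [
            {"words": [{"value": "Hello"}, {"value": "world"}]},
            {"words": [{"value": "Second"}, {"value": "line"}]},
        ]}]},
        {"page_idx": 1, "blocks": [{"lines": [
            {"words": [{"value": "Page"}, {"value": "two"}]},
        ]}]},
    ],
}

texts = page_texts(sample_export)
for number, text in enumerate(texts, start=1):
    print(f"--- page {number} ---")
    print(text)
```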
Example 3: Recognizing Handwritten Text
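docTR can also be applied to handwritten text, such as notes, forms, or historical documents, although results vary with the handwriting and the model used. This example runs the same pipeline on a scanned image of handwriting; the file name is a placeholder for your own sample.
Extract Handwritten Text
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# "handwritten.png" is a placeholder for your own scanned handwriting sample
doc = DocumentFile.from_images("handwritten.png")
model = ocr_predictor(pretrained=True)
result = model(doc)
# Each exported word includes a confidence score worth checking on handwriting
print(result.export())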
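Because handwriting is harder to read than print, it can help to flag uncertain words for manual review. The sketch below does this using the per-word confidence in the exported dictionary. The helper low_confidence_words and the 0.5 threshold are illustrative assumptions rather than docTR features or recommended values.

```python
def low_confidence_words(exported, threshold=0.5):
    """Return (page_idx, text, confidence) for words scored below threshold."""
    flagged = []
    for page in exported["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    if word["confidence"] < threshold:
                        flagged.append(
                            (page["page_idx"], word["value"], word["confidence"])
                        )
    return flagged


# Hand-written stand-in for result.export() on a handwritten note
sample_export = {
    "pages": [{
        "page_idx": 0,
        "blocks": [{"lines": [{"words": [
            {"value": "Team", "confidence": 0.95},
            {"value": "m3eting", "confidence": 0.31},
        ]}]}],
    }],
}

flagged = low_confidence_words(sample_export)
print(flagged)  # [(0, 'm3eting', 0.31)]
```

A suitable threshold depends on the documents and on how costly a missed error is, so it is best chosen by checking a few samples by hand.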
Conclusion
docTR is a deep learning-based OCR library that simplifies text extraction from images, PDFs, and handwritten documents. Its structured output keeps the position of every word along with the page layout, making it a useful tool for document processing, automation, and data extraction.
Whether you're working on document digitization, automated data entry, or AI-based text recognition, docTR provides a flexible and efficient solution tailored to your needs.