Open Source Python PDF Annotation Library
Try this Free & Open Source Python library for adding and extracting annotations from PDF documents.
What is pypdf?
pypdf is a free and open-source Python library known for its diverse set of features for handling PDF documents in a Python environment. It is useful for many kinds of PDF manipulation, but this review focuses on its annotation-related features.
Notable features of pypdf related to annotations include:
- Adding Shape Annotations: We can draw shapes such as lines, rectangles, ellipses and polygons on specific areas of PDF pages as annotations.
- Adding Text Annotations: We can add text annotations to specific positions of PDF pages.
- Adding Link Annotations: We can also add link annotations (such as hyperlinks) to PDF documents.
- Extracting Annotations: We can iterate over and extract information about all annotations in a PDF document using the pypdf library.
Getting Started with pypdf
You need Python 3.6.0 or higher to install and use pypdf. Newer pypdf releases require a more recent Python version, so check the requirements of the release you install. First install Python, then use the commands below to create a virtual environment and install pypdf with pip.
Linux
python3 -m venv venv
source venv/bin/activate
pip install pypdf
macOS
python3 -m venv venv
source venv/bin/activate
pip install pypdf
Windows
python -m venv venv
venv\Scripts\activate.bat
pip install pypdf
Add Rectangle Annotation to PDF
We can add rectangle annotations to PDF documents using the pypdf library. We use the Rectangle class from the pypdf.annotations module to define the rectangle, then call the add_annotation method of the PdfWriter class to add it to the PDF.
See the code snippet below for details:
Output
In the screenshot below, you can see that a rectangle has been added to annotate the words "Open Source":
Add Text Annotation to PDF
We create text annotations using the Text class from the pypdf.annotations module. We then use the add_annotation method of the PdfWriter class to add the annotation to the PDF. The text annotation appears as an icon that expands to show the text when clicked. See the code snippet below for details:
Output
As the screencast below shows, the code above adds an icon to the PDF at the specified position, and clicking the icon displays the text annotation:
Add Link Annotation to PDF
Link annotations are created using the Link class from the pypdf.annotations module. However, a link annotation on its own only adds the link; it is not visible on the page. To address this, we add a rectangle over the same area using the Rectangle class from the pypdf.annotations module, as explained earlier. This way, the user can see where the link annotation is located. See the code snippet below for a better understanding:
Output
As we can see in the output, the rectangle serves as an area that, when clicked, redirects the user to the specified link.
Extract Annotations from PDF
We can extract annotations from a PDF using the pypdf library. We iterate through the annotations on each PDF page and use the get_object method to get each annotation object. We then extract the relevant information from that object. See the code snippet below for details:
Output
As the screenshot below shows, the program returns the annotation type and the coordinates of each annotation in the PDF document:
Conclusion
pypdf lets Python developers add different types of annotations to PDFs and read essential information about existing annotations, such as their type and location. This makes it a practical choice for tasks that involve adding annotations or extracting data about them.