Open Source Python PDF Metadata Library
Free & open source Python library to read and update metadata of PDF documents.
What is pypdf?
pypdf is a versatile open source Python library for PDF manipulation. It handles tasks such as parsing, splitting and merging PDF documents, but this product review focuses only on its PDF metadata management features.
The following are the main features of pypdf related to metadata:
- Read PDF Metadata: You can read properties (such as author, creator, producer, title, subject and keywords) of PDF documents using pypdf.
- Update PDF Metadata: You can also update metadata of PDF documents using pypdf.
Getting Started with pypdf
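You need Python version 3.6.0 or higher to install and use pypdf; newer pypdf releases require a more recent Python version, so check the project's PyPI page for the current minimum. First install Python, then use the commands below to create a virtual environment and install pypdf into it using pip.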
Linux
python3 -m venv venv
source venv/bin/activate
pip install pypdf
macOS
python3 -m venv venv
source venv/bin/activate
pip install pypdf
Windows
python -m venv venv
venv\Scripts\activate.bat
pip install pypdf
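Once the virtual environment is active, a short script can confirm the setup. The sketch below is only an illustration: the helper name environment_report is hypothetical, and the minimum version is the one stated in this review rather than a value read from pypdf itself.

```python
import importlib.util
import sys

# Minimum Python version as stated in this review (an assumption, not read from pypdf).
MIN_PYTHON = (3, 6)


def environment_report(min_python=MIN_PYTHON):
    """Report whether the interpreter is new enough and whether pypdf is importable."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec returns None when the package is not installed in this environment.
        "pypdf_installed": importlib.util.find_spec("pypdf") is not None,
    }


print(environment_report())
```

If pypdf_installed is reported as False, the virtual environment is probably not activated or the pip install step did not run inside it.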
Reading Metadata of PDF
We can read the metadata of a PDF document using the pypdf library. The metadata is available from the metadata property of the PdfReader class, which returns the entries of the document's information dictionary (author, title and so on), or None if the document has none.
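Reading the metadata can be sketched as follows. This is a minimal example under stated assumptions: the helper names summarize_info and read_pdf_metadata and the file name example.pdf are hypothetical, and the properties are looked up by their Info dictionary keys (such as /Author), which the object returned by the metadata property supports because it behaves like a dictionary.

```python
from typing import Mapping, Optional

# Standard keys of a PDF's document information dictionary.
INFO_KEYS = ("/Title", "/Author", "/Subject", "/Keywords", "/Creator", "/Producer")


def summarize_info(info: Optional[Mapping]) -> dict:
    """Turn a metadata mapping into a plain dict; missing entries become None."""
    result = {}
    for key in INFO_KEYS:
        name = key.lstrip("/").lower()  # "/Author" -> "author"
        present = info is not None and key in info
        result[name] = str(info[key]) if present else None
    return result


def read_pdf_metadata(path: str) -> dict:
    """Read the standard metadata properties of the PDF at `path`."""
    # Imported here so summarize_info stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # reader.metadata is None when the document carries no metadata.
    return summarize_info(reader.metadata)


# Example call (hypothetical file name):
# print(read_pdf_metadata("example.pdf"))
```

Keeping the formatting step separate from the file access makes it easy to try the helper on a plain dictionary before pointing it at a real document.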
Output
[Screenshot: metadata of the provided PDF file]
Updating Metadata of PDF
We can also update the metadata of a PDF document, such as author, producer, subject and title, using the pypdf library. To do so, we pass a dictionary containing the metadata entries to the add_metadata method of the PdfWriter class and then write the document to a file.
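The update step can be sketched as follows. This is an illustration under stated assumptions rather than a definitive recipe: the helper names build_info and update_pdf_metadata and the file names are hypothetical, and the sketch copies the pages and any existing metadata into a new PdfWriter before adding the new entries.

```python
def build_info(title=None, author=None, subject=None, producer=None):
    """Map keyword arguments to Info dictionary keys, skipping unset values."""
    fields = {
        "/Title": title,
        "/Author": author,
        "/Subject": subject,
        "/Producer": producer,
    }
    return {key: value for key, value in fields.items() if value is not None}


def update_pdf_metadata(src, dst, **fields):
    """Copy the PDF at `src` to `dst` with the given metadata fields updated."""
    # Imported here so build_info stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()

    # Copy every page of the source document into the writer.
    for page in reader.pages:
        writer.add_page(page)

    # Carry over the existing metadata first, then apply the new values on top.
    if reader.metadata is not None:
        writer.add_metadata(reader.metadata)
    writer.add_metadata(build_info(**fields))

    with open(dst, "wb") as handle:
        writer.write(handle)


# Example call (hypothetical file names):
# update_pdf_metadata("example.pdf", "updated.pdf", author="Jane Doe", title="Report")
```

Writing to a separate output file leaves the original document untouched, which is useful while checking that the new metadata looks right.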
Conclusion
In conclusion, pypdf is a capable Python library for reading and updating the metadata of PDF documents. Reading requires only the metadata property of PdfReader, and updating requires only the add_metadata method of PdfWriter.