Open Source Python PDF Metadata Library

Free & open source Python library to read and update metadata of PDF documents.

What is pypdf?

pypdf is a versatile open source Python library for PDF manipulation. It handles tasks such as parsing, splitting and merging PDF files, but this product review focuses only on its PDF metadata management features.

The main metadata-related features of pypdf are:

Read PDF Metadata: You can read properties (such as author, creator, producer, title, subject and keywords) of PDF documents using pypdf.
Update PDF Metadata: You can also update metadata of PDF documents using pypdf.

Getting Started with pypdf

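You need Python 3 to install and use pypdf. Python 3.6.0 was the minimum for older releases; newer pypdf releases require a more recent Python version, so check the project's documentation for the current minimum. After installing Python, use the commands below to create a virtual environment and install pypdf with pip.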

Linux


python3 -m venv venv
source venv/bin/activate
pip install pypdf

MacOS


python3 -m venv venv
source venv/bin/activate
pip install pypdf

Windows


python -m venv venv
venv\Scripts\activate.bat
pip install pypdf

Reading Metadata of PDF

To read the metadata of a PDF document with pypdf, open the file with the PdfReader class and access its metadata property, which exposes fields such as the title, author, subject, creator and producer.
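As a minimal sketch of the step described above, assuming pypdf is installed: PdfReader and its metadata property are the pypdf names mentioned in the text, while the helper names clean_metadata and read_pdf_metadata are hypothetical and chosen only for illustration. The metadata property can be None when a document carries no information dictionary, so the sketch handles that case explicitly.

```python
def clean_metadata(meta):
    """Turn a metadata mapping with PDF-style keys ("/Title") into a plain dict."""
    if meta is None:  # a PDF without an information dictionary has no metadata
        return {}
    # Index by key so pypdf can resolve each value, then strip the leading slash.
    return {key.lstrip("/"): str(meta[key]) for key in meta}


def read_pdf_metadata(path):
    """Open the PDF at `path` and return its metadata as a plain dict."""
    # Imported here so clean_metadata stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return clean_metadata(reader.metadata)
```

For example, print(read_pdf_metadata("example.pdf")), where example.pdf is a placeholder file name, would print a dictionary with keys such as Title, Author and Producer, depending on what the file contains. The same fields are also available as attributes, for instance reader.metadata.title and reader.metadata.author.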

Output

[Screenshot: metadata of the provided PDF file]

Updating Metadata of PDF

We can also update the metadata of a PDF document, such as its author, producer, subject and title, using pypdf. To do so, pass a dictionary of metadata fields to the add_metadata method of the PdfWriter class and then write the document to a file.
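The update step can be sketched as follows, again assuming pypdf is installed. PdfWriter and add_metadata are the pypdf names mentioned in the text; the helper names build_info and update_pdf_metadata, and their keyword arguments, are hypothetical. The sketch copies the pages and any existing metadata into a new writer, then applies the new fields, whose keys use PDF names such as "/Title" and "/Author".

```python
def build_info(title=None, author=None, subject=None, producer=None):
    """Build the dict for PdfWriter.add_metadata, skipping fields left as None."""
    fields = {
        "/Title": title,
        "/Author": author,
        "/Subject": subject,
        "/Producer": producer,
    }
    return {key: value for key, value in fields.items() if value is not None}


def update_pdf_metadata(src, dst, **fields):
    """Copy the PDF at `src` to `dst`, overriding the given metadata fields."""
    # Imported here so build_info stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:  # copy every page into the new document
        writer.add_page(page)
    if reader.metadata is not None:  # keep the existing metadata first
        writer.add_metadata(reader.metadata)
    writer.add_metadata(build_info(**fields))  # new values override old ones
    with open(dst, "wb") as fh:
        writer.write(fh)
```

A call such as update_pdf_metadata("example.pdf", "example-updated.pdf", title="Report", author="Jane"), with placeholder file names, would write a copy of the document with the new title and author. Writing to a separate output file leaves the source PDF untouched if something goes wrong.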

Conclusion

In conclusion, pypdf is a capable Python library for reading and updating the metadata of PDF documents. Reading goes through the PdfReader class and its metadata property, and updating goes through the PdfWriter class and its add_metadata method.