Open Source Python Metadata Library for PDF Documents
A free, open source Python library for reading, editing, and updating the metadata of PDF files.
What is PikePDF for Python?
PikePDF is a modern Python library tailored for seamless PDF manipulation, with powerful capabilities for working specifically with metadata. Built on the robust QPDF library, PikePDF allows developers to easily add, edit, and remove metadata from PDF files, making it an essential tool for organizing and enriching document information. Whether you need to update titles, authors, subject fields, or custom metadata entries, PikePDF provides a Pythonic and intuitive API for managing these details programmatically. It also supports handling embedded metadata for enhanced document classification and searchability, ensuring compliance with workflows that rely heavily on detailed document descriptions. With its focus on reliability and performance, PikePDF is ideal for automating metadata management tasks in document processing systems or enhancing metadata-driven PDF workflows.
Features of PikePDF API
The PikePDF API for Python has a rich set of features for working with PDF documents and their metadata. Some of these features are listed below.
- PDF Manipulation: Merge, split, rotate, and reorder pages within PDF files.
- Metadata Handling: Add, edit, or remove metadata to enhance PDF organization and information.
- Encryption and Security: Encrypt PDFs with passwords, unlock secured PDFs, and manage security settings.
- Repair Corrupt Files: Detect and fix issues in damaged or corrupt PDF documents.
- PDF/A Conversion: Convert PDFs to PDF/A format for long-term archival and compliance.
- Embedded Font Support: Handle embedded fonts for text consistency and compatibility.
- Performance-Oriented: Optimized for fast and reliable operations with large or complex PDFs.
- Based on QPDF: Leverages the powerful QPDF library for advanced PDF manipulation capabilities.
- Open Source: Free to use and actively maintained by the developer community.
Advantages of using PikePDF API
PikePDF API for Python has the following advantages:
- Metadata Management: Easily add, edit, or remove metadata to enhance PDF organization and searchability.
- PDF/A Support: Convert PDFs to archival formats while preserving or updating metadata.
- Corruption Handling: Repair and restore damaged PDF files without losing metadata.
- Encryption and Security: Manage password protection and encryption while maintaining metadata integrity.
- Custom Metadata: Add custom fields to tailor PDF metadata for specific workflows or business requirements.
- High Performance: Optimized for fast and efficient processing of large and complex PDF files.
- Open Source: Free and actively maintained, offering a reliable and cost-effective solution.
- Based on QPDF: Leverages the powerful features of QPDF for advanced PDF and metadata operations.
Getting Started with PikePDF API for Python
To use PikePDF in your Python applications, you need Python 3.9 or later installed on your system. Install Python first, then use the command below to install PikePDF with pip, preferably inside a virtual environment.
pip install pikepdf
Working with PikePDF API for Python - Examples
You can use PikePDF to read, write, and update the metadata of PDF files. The API provides easy-to-use methods for working with PDF files from within your Python applications.
Read Metadata Information of a File using PikePDF API for Python
Reading metadata from a PDF file with PikePDF is straightforward: open the document and iterate over its document information dictionary, which holds fields such as the title and author.
Output
Printing each key and value of the document information dictionary produces output similar to the following:
PDF Metadata:
/Title: Sample PDF Document
/Author: John Doe
/Subject: Example Usage
/Producer: Adobe PDF Library
/CreationDate: D:20241226093000Z
If no metadata information is available in the file, the output will be empty.
Write Metadata Information to a PDF File using PikePDF API for Python
PikePDF can also write or update the metadata of a PDF file. You can modify existing metadata fields or add new ones by assigning values to the document information dictionary and then saving the file.
Here are some common standard fields you can update:
- Title: The title of the document.
- Author: The author of the document.
- Subject: The subject or topic of the document.
- Keywords: Keywords associated with the document for search purposes.
- Creator: The application that created the document.
- Producer: The software that generated the PDF.
- CreationDate: The date the document was created.
- ModDate: The date the document was last modified.
Conclusion for PikePDF API
PikePDF is a powerful and user-friendly Python library that simplifies the handling of PDF files, especially for metadata management. Built on the robust QPDF library, it offers seamless capabilities to read, write, and update metadata fields, enabling developers to organize, enrich, and customize PDF documents effectively. In addition to metadata operations, PikePDF excels at tasks like repairing corrupt PDFs, managing encryption, and converting files to PDF/A format, making it a versatile tool for a wide range of PDF-related workflows. Its open-source nature, active maintenance, and Pythonic API make it an excellent choice for developers looking for a reliable and efficient solution for PDF processing and metadata management. Whether you’re automating document workflows, ensuring compliance with archival standards, or enhancing PDF metadata for searchability, PikePDF provides the tools you need to work with PDFs effortlessly.