Open Source Python PDF Merger Library
Try this user-friendly, open-source Python library that allows you to effortlessly split, join, rotate, swap, and delete pages, making it a versatile tool for your PDF document needs.
What is PyMuPDF?
PyMuPDF, historically imported under the module name fitz, is an open-source Python library that provides a comprehensive set of tools for working with PDF files. With PyMuPDF, users can open PDFs, extract text and images, change page properties such as rotation and cropping, create new PDF documents, and convert PDF pages to images.
PyMuPDF supports many features, but this review focuses on the library's PDF splitting, merging, and page management features. Its extraction and parsing capabilities are evaluated in depth in a separate review.
Getting Started with PyMuPDF
You need Python 3.8.0 or higher to install and use PyMuPDF. Install Python first, then use the commands below to create a virtual environment and install PyMuPDF into it with pip.
Linux
python -m venv pymupdf-venv
. pymupdf-venv/bin/activate
pip install pymupdf
macOS
python -m venv pymupdf-venv
. pymupdf-venv/bin/activate
pip install pymupdf
Windows
python -m venv pymupdf-venv
.\pymupdf-venv\Scripts\activate
pip install pymupdf
Join Multiple PDFs into One
Using the PyMuPDF library, we can combine multiple PDFs into a single PDF file in Python. To join two PDF documents, append the pages of one to the other with the insert_pdf method and save the result as a new document.
Split PDF into Multiple Files
It is also possible to split a PDF document into multiple PDFs in Python using the PyMuPDF library. To split off the first two pages of a document, copy them into a new, empty document and save it as a separate PDF.
Rotate PDF Pages
We can also rotate the pages of a PDF file using the PyMuPDF library. The set_rotation method of a page sets its rotation, which must be a multiple of 90 degrees.
Output
After the code runs, each page of the document is rotated by 90 degrees.
Delete PDF Pages
PyMuPDF can also be used to delete pages from a PDF file. The delete_page method takes a zero-based page number, so passing 1 removes the second page of the input document.
Output
The modified PDF file no longer contains the second page.
Conclusion
PyMuPDF is strong at merging and page manipulation within PDF documents. Its flexibility and efficiency in rotating, cropping, resizing, and deleting pages make it a robust choice for PDF modification tasks, and its ability to merge multiple PDF documents with a single method call is a notable advantage.
However, its relatively complex API may present a learning curve for newcomers, and there might be some limitations in handling extremely large or complex PDFs, which could impact performance. Nonetheless, its extensive capabilities in these areas make it a valuable tool for those seeking precise control over PDF content.