Open Source Python Metadata Library
Free & open source Python library to read and extract metadata from files.
What is Hachoir-metadata API for Python?
hachoir-metadata is a Python library that is part of the broader Hachoir project, designed for parsing and extracting metadata from a wide variety of file types. It provides tools to read metadata without needing to decompress or fully decode the files, making it lightweight and efficient for basic metadata inspection tasks.
Features of hachoir-metadata API
hachoir-metadata offers the following features:
- File Type Support: Works with many file formats, including images, videos, audio files, archives, and documents.
- Metadata Extraction: Extracts basic metadata such as file size, creation date, modification date, and more format-specific properties (e.g., EXIF for images, codecs for videos, etc.).
- Read-Only Operations: Focuses on reading and inspecting metadata without modifying the original file.
- File Type Agnostic: Automatically detects file types and extracts metadata accordingly.
- Integration: Can be integrated into Python applications for use in workflows like content organization, digital forensics, and archival systems.
Modes of hachoir-metadata API
hachoir-metadata has three modes:
- classic mode: extracts metadata; use --level=LEVEL to limit how much information is displayed (this limits the display, not the extraction)
- --type: shows the file format and the most important information on one line
- --mime: displays only the file's MIME type
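The three modes can also be driven from a Python script. The following is a minimal sketch, assuming the hachoir-metadata command is available on your PATH after installation; the helper names build_command and run_hachoir_metadata are hypothetical and exist only for this illustration.

```python
import subprocess


def build_command(path, mode="classic", level=None):
    """Build the argument list for one of the three hachoir-metadata modes.

    Hypothetical helper: only the flags named above (--level, --type, --mime)
    are used.
    """
    cmd = ["hachoir-metadata"]
    if mode == "classic":
        if level is not None:
            cmd.append(f"--level={level}")  # limits what is displayed
    elif mode == "type":
        cmd.append("--type")
    elif mode == "mime":
        cmd.append("--mime")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd.append(path)
    return cmd


def run_hachoir_metadata(path, mode="classic", level=None):
    """Run the command line tool and return whatever it prints."""
    result = subprocess.run(
        build_command(path, mode, level),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool reports failure
    )
    return result.stdout
```

Keeping the command construction separate from the subprocess call makes it easy to inspect the exact arguments before anything is executed.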
Getting Started with Hachoir API for Python
To use the Hachoir API for Python, you need Python 3.6 or later and the Hachoir package. First install Python, then install Hachoir with pip, preferably inside a virtual environment:
pip install hachoir
Alternatively, install from source:
1. Check out the source code from the GitHub repository:
git clone git://github.com/vstinner/hachoir.git
2. Run setup.py to install the module:
python setup.py install [--user|--prefix=]
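After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The sketch below is one way to do that with the standard library; it assumes Python 3.8 or later (where importlib.metadata is available), and installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package="hachoir"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("hachoir is not installed in this environment")
else:
    print(f"hachoir {version} is installed")
```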
Working with hachoir-metadata API for Python - Examples
hachoir-metadata API for Python lets you read the metadata information from media file types. With just a few lines of code, you can develop powerful applications that can read metadata information from different file formats. The following code samples show how the hachoir-metadata API can be used in Python applications.
hachoir-metadata can read metadata from a variety of file formats, including images such as BMP and JPEG, audio files, videos, and archives. In Python, you create a parser for the file with createParser and pass it to the extractMetadata function, which returns the metadata that Hachoir was able to read. The sample output in the next section comes from an MP4 video file.
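A minimal sketch of this workflow is shown here. It uses createParser, extractMetadata, and exportPlaintext as they appear in Hachoir's own usage examples; the helper names read_metadata_lines and lines_to_dict, and the file name sample.mp4 mentioned afterwards, are assumptions made for illustration.

```python
def read_metadata_lines(path):
    """Return Hachoir's plain-text metadata lines for `path` (empty list on failure)."""
    # Imported here so lines_to_dict below stays usable without hachoir installed.
    from hachoir.parser import createParser
    from hachoir.metadata import extractMetadata

    parser = createParser(path)
    if parser is None:
        return []  # Hachoir could not recognise the file format
    with parser:
        metadata = extractMetadata(parser)
    if metadata is None:
        return []  # the file was parsed but no metadata was extracted
    return metadata.exportPlaintext() or []


def lines_to_dict(lines):
    """Turn '- Key: value' lines into a dict mapping each key to a list of values."""
    result = {}
    for line in lines:
        if not line.startswith("- "):
            continue  # skip headers such as 'Metadata:'
        key, _, value = line[2:].partition(": ")
        result.setdefault(key, []).append(value)
    return result
```

Calling read_metadata_lines("sample.mp4") and printing each returned line gives a plain-text report, and lines_to_dict groups repeated keys such as Comment into a single list so they can be looked up by name.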
Output
When you run the extraction on an MP4 video file, the output will be similar to the following (depending on the information available in your sample file):
Metadata:
- Duration: 1 min 56 sec 261 ms
- Image width: 1280 pixels
- Image height: 720 pixels
- Creation date: 1904-01-01 00:00:00
- Last modification: 1904-01-01 00:00:00
- Comment: Play speed: 100.0%
- Comment: User volume: 100.0%
- MIME type: video/mp4
- Endianness: Big endian
Conclusion
The hachoir-metadata API offers a lightweight way to extract metadata from a wide variety of file formats, which makes it a good fit for Python developers working in fields like digital forensics, content management, and data analysis. Because it only reads files and never modifies them, the original data stays intact, and its Python interface is straightforward to integrate into applications and workflows. With support for diverse file types and metadata properties, hachoir-metadata is a versatile choice for quick metadata inspection in both personal and professional projects.