Helps to find structured metadata from a given file.

Project description

Introduction

file-metadata is a Python package that analyzes files and extracts useful metadata from them.

Installation

Before installing file-metadata, a few system dependencies need to be installed. On Ubuntu, they can be installed with:

$ sudo apt-get install perl openjdk-7-jre python-dev pkg-config \
> libfreetype6-dev libpng12-dev liblapack-dev libblas-dev gfortran \
> cmake libboost-python-dev libzbar-dev
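After the apt-get step, it can help to confirm that the command-line tools from these packages are actually on the PATH. The sketch below does that with the standard library's shutil.which(). The mapping from executable name to apt package is an assumption made for this example; file-metadata does not define or check such a list itself.

```python
import shutil

# Hypothetical mapping from executable name to the apt package that provides it.
# This list is an illustrative assumption; file-metadata does not define it.
EXPECTED_TOOLS = {
    "perl": "perl",
    "java": "openjdk-7-jre",
    "cmake": "cmake",
    "pkg-config": "pkg-config",
    "gfortran": "gfortran",
}


def missing_tools(tools=EXPECTED_TOOLS):
    """Return {executable: apt_package} for every executable not found on PATH."""
    return {exe: pkg for exe, pkg in tools.items() if shutil.which(exe) is None}


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        for exe, pkg in sorted(missing.items()):
            print(f"missing {exe!r} (try: sudo apt-get install {pkg})")
    else:
        print("all expected tools found")
```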

Next, use pip to install the library. To install the latest stable version, use:

$ pip install file-metadata

To install development builds (pre-releases built from the master branch of the GitHub repo), use:

$ pip install --pre file-metadata
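To check which release pip actually installed (a stable release or a pre-release), the installed distribution's version can be read with the standard library's importlib.metadata, which requires Python 3.8 or newer. This is a generic sketch rather than a feature of file-metadata; it assumes the distribution name is the same one passed to pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="file-metadata"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version())
```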

Usage

To use the package, you first need a file, which can be any media file.

As an example, let us first download a QR code image from Wikimedia Commons:

$ wget https://upload.wikimedia.org/wikipedia/commons/5/5b/Qrcode_wikipedia.jpg -O qrcode.jpg

Now, let us create a file object from it:

>>> from file_metadata.generic_file import GenericFile
>>> qr = GenericFile.create('qrcode.jpg')

Notice that GenericFile.create() automatically picks the most suitable class for analyzing the file. In this case, it detects that the file is an image and uses the ImageFile class:

>>> qr.__class__.__name__
'ImageFile'
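How GenericFile.create() chooses the class is not described here. As a rough mental model only, a factory of this kind can guess the file's MIME type and map its top-level type (image, audio, ...) to a specialised subclass. The sketch below uses the standard library's mimetypes module, which guesses from the file extension alone, and hypothetical class names; it is not file-metadata's implementation.

```python
import mimetypes


class SketchGenericFile:
    """Hypothetical stand-in for a generic file analyzer."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def create(cls, path):
        # Guess the MIME type from the file name only (no file access).
        mime, _encoding = mimetypes.guess_type(path)
        top_level = mime.split("/", 1)[0] if mime else None
        # Use the registered subclass for this top-level type, if any;
        # otherwise fall back to the class create() was called on.
        subclass = _SUBCLASSES_BY_TOP_LEVEL.get(top_level, cls)
        return subclass(path)


class SketchImageFile(SketchGenericFile):
    """Hypothetical image-specific analyzer."""


class SketchAudioFile(SketchGenericFile):
    """Hypothetical audio-specific analyzer."""


_SUBCLASSES_BY_TOP_LEVEL = {
    "image": SketchImageFile,
    "audio": SketchAudioFile,
}

print(type(SketchGenericFile.create("qrcode.jpg")).__name__)  # SketchImageFile
print(type(SketchGenericFile.create("notes.unknownext")).__name__)  # SketchGenericFile
```

Extension-based guessing is the simplest possible choice; a real analyzer would more likely inspect the file contents, since extensions can be missing or wrong.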

To see which analysis routines are supported for the file, check help(qr). All methods whose names begin with analyze_ perform an analysis. Since our example is a QR code, let us use analyze_barcode_zxing():

>>> qr.analyze_barcode_zxing()
{'zxing:Barcodes': [{'data': 'http://www.wikipedia.com',
   'format': 'QR_CODE',
   'points': [(50.0, 316.0), (50.0, 52.0), (314.0, 52.0), (278.0, 280.0)],
   'raw_data': 'http://www.wikipedia.com'}]}

This tells us where the barcode is located in the image (points), the data it encodes (http://www.wikipedia.com), and its format (QR_CODE).
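The result is a plain dictionary, so it can be post-processed directly. The sketch below takes the output shown above verbatim and derives an axis-aligned bounding box from the reported points. Note that for QR codes ZXing typically reports the centres of the finder and alignment patterns rather than the outer corners, so this box is an approximation that sits slightly inside the printed code; the bounding_box() helper is illustrative and not part of file-metadata.

```python
# Result copied from the analyze_barcode_zxing() example above.
result = {'zxing:Barcodes': [{'data': 'http://www.wikipedia.com',
                              'format': 'QR_CODE',
                              'points': [(50.0, 316.0), (50.0, 52.0),
                                         (314.0, 52.0), (278.0, 280.0)],
                              'raw_data': 'http://www.wikipedia.com'}]}


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) enclosing the given (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# An image may contain several barcodes, so iterate over the list.
for barcode in result.get('zxing:Barcodes', []):
    print(barcode['format'], barcode['data'], bounding_box(barcode['points']))
# QR_CODE http://www.wikipedia.com (50.0, 52.0, 314.0, 316.0)
```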

Similarly, to check the MIME type, use the analyze_mimetype() routine:

>>> qr.analyze_mimetype()
{'File:MIMEType': 'image/jpeg'}
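How analyze_mimetype() determines the type is not stated here. For comparison, the sketch below shows content sniffing in its simplest form: matching the first bytes of the file against a few well-known "magic number" signatures instead of trusting the extension. The signature table is deliberately tiny, and sniff_mimetype() and sniff_file() are hypothetical helpers, not file-metadata APIs; only the output key mirrors the example above.

```python
# A few well-known file signatures ("magic numbers") and their MIME types.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mimetype(header):
    """Return a MIME type for the leading bytes of a file, or a generic fallback."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"


def sniff_file(path):
    """Read only the first bytes; no signature above is longer than 8 bytes."""
    with open(path, "rb") as f:
        return {"File:MIMEType": sniff_mimetype(f.read(16))}
```

For a JPEG such as qrcode.jpg, sniff_file('qrcode.jpg') should yield {'File:MIMEType': 'image/jpeg'} even if the file were renamed without an extension.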

To run all the analysis routines on the file at once, use the analyze() method, which returns their merged results:

>>> qr.analyze()
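A rough sketch of what analyze() does, assuming, as described above, that it calls every analyze_ routine and merges the returned dictionaries. The alphabetical call order and the "later key wins" conflict rule are choices made for this sketch, not documented behaviour of file-metadata. DemoFile is a hypothetical stand-in so the example runs without the library installed; analysis_routines() also gives a shorter overview of the available routines than help(qr).

```python
import inspect


def analysis_routines(obj):
    """Return the sorted names of obj's callable members starting with 'analyze_'."""
    return sorted(
        name
        for name, _member in inspect.getmembers(obj, callable)
        if name.startswith("analyze_")
    )


def run_all(obj):
    """Call every analyze_* routine and merge the returned dicts into one.

    If two routines return the same key, the routine whose name sorts later wins.
    """
    merged = {}
    for name in analysis_routines(obj):
        merged.update(getattr(obj, name)())
    return merged


class DemoFile:
    """Hypothetical stand-in with two analysis routines."""

    def analyze_mimetype(self):
        return {"File:MIMEType": "image/jpeg"}

    def analyze_size(self):
        return {"File:Size": 1234}


print(analysis_routines(DemoFile()))  # ['analyze_mimetype', 'analyze_size']
print(run_all(DemoFile()))  # {'File:MIMEType': 'image/jpeg', 'File:Size': 1234}
```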

Development

Testing

To run the tests, first install the test dependencies:

$ pip install -r test-requirements.txt

and then execute:

$ python -m pytest

Docker

To pull the latest Docker image, use:

$ docker pull pywikibotcatfiles/file-metadata

For more information about this image and its history, see pywikibotcatfiles/file-metadata on Docker Hub. The image is rebuilt whenever changes are pushed to the pywikibot-catfiles/docker-file-metadata GitHub repo or the pywikibot-catfiles/file-metadata GitHub repo; the builds are triggered through the Travis CI API.

Credits

This package is derived from pywikibot-compat, specifically from the script catimages.py, which can be found at pywikibot-compat/catimages.py. catimages.py was created by DrTrigon, who is the original author of this package.

LICENSE

This code is released under the MIT License. Note that some files or content may have been copied from other sources and carry their own licenses. The dependencies used to generate the databases also have their own licenses.

Download files

Source Distribution

file-metadata-0.2.0.tar.gz (631.1 kB)

File hashes

Hashes for file-metadata-0.2.0.tar.gz:

Algorithm     Hash digest
SHA256        96a67e8b16f898a0acfaef3948467e94fda0db5dcd99f2a5c9f89766d6097306
MD5           4d545e1375e020a73cca147f092fac19
BLAKE2b-256   230111b9557430c7a53c330e1ea20b05fb785c19169113b6b487456db37175ba
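To check a downloaded copy of the archive against these digests, hash it locally and compare. The sketch below uses only hashlib; PyPI's BLAKE2b-256 digest corresponds to BLAKE2b with a 32-byte digest. The helper names are illustrative, and the local file name is assumed to match the one listed above.

```python
import hashlib

# Digests published above for file-metadata-0.2.0.tar.gz.
EXPECTED = {
    "sha256": "96a67e8b16f898a0acfaef3948467e94fda0db5dcd99f2a5c9f89766d6097306",
    "blake2b-256": "230111b9557430c7a53c330e1ea20b05fb785c19169113b6b487456db37175ba",
}


def _new_hasher(algorithm):
    if algorithm == "sha256":
        return hashlib.sha256()
    if algorithm == "blake2b-256":
        # BLAKE2b truncated to a 32-byte (256-bit) digest.
        return hashlib.blake2b(digest_size=32)
    raise ValueError(f"unsupported algorithm: {algorithm}")


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large archives need not fit in memory."""
    hasher = _new_hasher(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, algorithm="sha256"):
    """Return True if the file's digest matches the published one."""
    return file_digest(path, algorithm) == EXPECTED[algorithm]
```

For example, verify("file-metadata-0.2.0.tar.gz") should return True for an intact download. pip can also enforce this itself in hash-checking mode (--require-hashes, with a --hash=sha256:<digest> option on the requirement line in a requirements file).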
