A package to add metadata tags to objects saved in s3
AWS S3 Metadata Tagger
The S3 Metadata tagger adds information in the form of metadata to files saved in S3.
To do this, the central handler takes a file location and a metadata extracting function.
It first checks whether the file already contains the requested information via a
If it does not, it downloads the file, invokes the extraction function, and adds the metadata to
the S3 object with an in-place COPY, MetadataDirective="REPLACE" operation.
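The in-place update described above can be sketched with the `head_object` and `copy_object` calls of a boto3 S3 client. This is a minimal sketch, not the package's actual implementation: the function name `add_metadata_in_place` is hypothetical, using a HEAD request to read the existing metadata is an assumption, and the client is passed in (for example `boto3.client("s3")`) so that the sketch itself needs no imports.

```python
def add_metadata_in_place(s3_client, bucket, key, new_metadata):
    """Merge new_metadata into an object's user metadata via an in-place copy.

    s3_client is assumed to behave like boto3.client("s3").
    The function name and signature are hypothetical.
    """
    # Read the user-defined metadata currently stored on the object.
    head = s3_client.head_object(Bucket=bucket, Key=key)
    merged = {**head.get("Metadata", {}), **new_metadata}

    # Copying an object onto itself with MetadataDirective="REPLACE"
    # rewrites its metadata server-side; the body is not uploaded again.
    s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=merged,
        MetadataDirective="REPLACE",
        # REPLACE also resets system metadata such as Content-Type,
        # so the existing value is carried over explicitly.
        ContentType=head.get("ContentType", "binary/octet-stream"),
    )
    return merged
```

Merging with the existing metadata matters because REPLACE discards every user-defined key that is not passed again.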
This package comes with two optional variants for metadata extraction:
- pdf: for determining the number of pages in a pdf
- picture: for determining the dimensions of an image
The entrypoint into the tagger is the
It expects an
object_tagger.S3ObjectPath(key, bucket) and a
object_tagger.MetadataHandler(already_tagged, extraction_function, versioning_tag) object as its parameters.
The parameters of the
MetadataHandler are as follows:
already_tagged: a function which receives the metadata tags already present on the object, and returns a boolean indicating whether the object should be tagged.
extraction_function: a function receiving the path to the downloaded object and returning a string -> string dictionary containing the metadata to add to the object.
versioning_tag: a string -> string dictionary containing further tags to add to the S3 object, which can be used, for example, for tag versioning.
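As an illustration of the three parameters, the following sketch defines plain Python values that fit the descriptions above. Everything in it is hypothetical: the `line-count` key, the function names, and the version value are made up for the example. Which boolean means "skip" is also an assumption; this sketch follows the parameter name and returns True when the tag is already present.

```python
from pathlib import Path

TAG_KEY = "line-count"  # hypothetical metadata key


def already_tagged(existing_metadata):
    # Receives the metadata already present on the object.
    # Assumption: True means the tag exists and no work is needed.
    return TAG_KEY in existing_metadata


def extract_line_count(local_path):
    # Receives the path of the downloaded object and returns a
    # string -> string dictionary, as S3 metadata values are strings.
    with Path(local_path).open("rb") as handle:
        count = sum(1 for _ in handle)
    return {TAG_KEY: str(count)}


# Extra tags added alongside the extracted ones (hypothetical value).
versioning_tag = {"tagger-version": "1"}
```

These three values are what would be passed to object_tagger.MetadataHandler(already_tagged, extraction_function, versioning_tag).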
The function tries to extract the metadata and add it to the object up to three times. On success, the added metadata is returned; on failure, an exception is raised.
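The retry behaviour can be pictured with a small helper. This is a sketch under assumptions, not the package's code: `run_with_retries` is a hypothetical name, and the exception type raised after the last attempt is chosen for the example only.

```python
def run_with_retries(operation, attempts=3):
    """Call operation() until it succeeds or the attempts are used up."""
    last_error = None
    for _ in range(attempts):
        try:
            # On success, hand back whatever the operation returned,
            # for instance the metadata that was added.
            return operation()
        except Exception as error:
            # Remember the failure and try again.
            last_error = error
    # Every attempt failed: surface the last error as the cause.
    raise RuntimeError(f"tagging failed after {attempts} attempts") from last_error
```

Chaining the last error with `from` keeps the original failure visible in the traceback.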
For an example, see the service in the examples directory, which uses this library to automatically tag PDFs uploaded to S3 via AWS Lambda.
contains the higher-level orchestration:
object_tagger.py contains all the logic for checking whether the file has already been tagged, downloading it, invoking the metadata script, creating the tag object, and adding it to the S3 resource.
The metadata scripts are stored in their respective folders
The pdf tagger uses PyPDF2 to determine the number of pages in a PDF.
Install with the
[pdf] extra option.
Using Pillow, the script gets the
height of the passed image.
Install with the
[picture] extra option.
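A minimal sketch of such an extraction function with Pillow might look as follows. It is not the package's own script: the function name `extract_dimensions` and the metadata keys `width` and `height` are assumptions made for the example.

```python
def extract_dimensions(local_path):
    # Imported inside the function so that only the picture variant
    # requires Pillow to be installed.
    from PIL import Image

    # Image.size is a (width, height) tuple in pixels.
    with Image.open(local_path) as image:
        width, height = image.size

    # S3 metadata values are strings, so the numbers are converted.
    return {"width": str(width), "height": str(height)}
```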
picture_tagger come with unittests.
There is also an integration test in
tests/test_object_tagger.py, which expects
a localstack instance to run in the background.
Furthermore, the following environment variables need to be set:
LOCALSTACK_S3_ENDPOINT_URL=http://localhost:4566
AWS_ACCESS_KEY_ID=test
AWS_SECRET_ACCESS_KEY=test