Matrix Content Scanner

A web service for scanning media hosted on a Matrix media repository.

Installation

This project requires libmagic to be installed on the system. On Debian/Ubuntu:

sudo apt install libmagic1

Then, preferably in a virtual environment, install the Matrix Content Scanner:

pip install matrix-content-scanner
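
If the scanner later fails to start with a libmagic-related error, it can help to check whether the shared library is discoverable from Python at all. The snippet below is a minimal sketch using only the standard library; the helper name `libmagic_available` is made up for illustration and is not part of the Matrix Content Scanner. Note that `find_library` relies on platform tools such as `ldconfig`, so a negative result in a stripped-down container is not conclusive.

```python
from ctypes.util import find_library


def libmagic_available() -> bool:
    """Return True if the platform's library lookup can locate libmagic.

    Hypothetical helper for illustration only; it does not load the
    library, it only asks the system where a library named "magic" lives.
    """
    return find_library("magic") is not None


if __name__ == "__main__":
    if libmagic_available():
        print("libmagic found")
    else:
        print("libmagic not found; on Debian/Ubuntu try: sudo apt install libmagic1")
```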

Usage

Copy and edit the sample configuration file. Each key is documented in this file.

Then run the content scanner (from within your virtual environment if one was created):

python -m matrix_content_scanner.mcs -c CONFIG_FILE

Where CONFIG_FILE is the path to your configuration file.

Docker

This project provides a Docker image to run it, published as vectorim/matrix-content-scanner.

To use it, copy the sample configuration file into a dedicated directory, edit it according to your requirements, and then mount this directory as /data in the container. Do not forget to also publish the port that the content scanner's Web server is configured to listen on.

For example, assuming the port for the Web server is 8080:

docker run -p 8080:8080 -v /path/to/your/config/directory:/data vectorim/matrix-content-scanner

API

See the API documentation for information about how clients are expected to interact with the Matrix Content Scanner.

Migrating from the legacy Matrix Content Scanner

Because it uses the same APIs and Olm pickle format as the legacy Matrix Content Scanner, this project can be used as a drop-in replacement. The only change (apart from the deployment instructions) is the configuration format:

  • the server section is renamed web
  • scan.tempDirectory is renamed scan.temp_directory
  • scan.baseUrl is renamed download.base_homeserver_url (and becomes optional)
  • scan.doNotCacheExitCodes is renamed result_cache.exit_codes_to_ignore
  • scan.directDownload is removed. Files are always downloaded directly when download.base_homeserver_url is absent from the configuration file; when it is set, files are always downloaded from the configured server.
  • proxy is renamed download.proxy
  • middleware.encryptedBody.pickleKey is renamed crypto.pickle_key
  • middleware.encryptedBody.picklePath is renamed crypto.pickle_path
  • acceptedMimeType is renamed scan.allowed_mimetypes
  • requestHeader is renamed download.additional_headers and turned into a dictionary.

Note that the formats of the cryptographic pickle file and key are compatible between this project and the legacy Matrix Content Scanner. If no file exists at the configured pickle path, one will be created automatically.
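
The renames listed above are mechanical, so they can be sketched as a small transformation over an already-parsed configuration. The following is an illustrative sketch, not a tool shipped with this project: the function names (`migrate`, `pop_path`, `set_path`) are hypothetical, it operates on plain dictionaries (load and dump the YAML with whatever library you prefer), and it assumes that keys not mentioned in the list keep their names. Because the list only says that requestHeader is "turned into a dictionary" without describing the legacy format, the sketch does not attempt that conversion and hands the old value back for manual handling.

```python
import copy

# Legacy dotted path -> new dotted path, as listed in the migration notes.
RENAMES = {
    "server": "web",
    "scan.tempDirectory": "scan.temp_directory",
    "scan.baseUrl": "download.base_homeserver_url",
    "scan.doNotCacheExitCodes": "result_cache.exit_codes_to_ignore",
    "proxy": "download.proxy",
    "middleware.encryptedBody.pickleKey": "crypto.pickle_key",
    "middleware.encryptedBody.picklePath": "crypto.pickle_path",
    "acceptedMimeType": "scan.allowed_mimetypes",
}
# Options that no longer exist in the new format.
REMOVED = ["scan.directDownload"]
# Options whose value changes shape; left for manual conversion (assumption:
# the legacy value format is not described, so it is not guessed here).
MANUAL = ["requestHeader"]

_MISSING = object()


def pop_path(config, dotted):
    """Remove and return the value at a dotted path, or _MISSING if absent.

    Parent sections left empty by the removal are deleted as well.
    """
    keys = dotted.split(".")
    parents = []
    node = config
    for key in keys[:-1]:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        parents.append((node, key))
        node = node[key]
    if not isinstance(node, dict) or keys[-1] not in node:
        return _MISSING
    value = node.pop(keys[-1])
    for parent, key in reversed(parents):
        if parent[key] == {}:
            del parent[key]
        else:
            break
    return value


def set_path(config, dotted, value):
    """Set a value at a dotted path, creating intermediate sections."""
    keys = dotted.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def migrate(legacy):
    """Return (new_config, manual) without modifying the input.

    `manual` maps legacy paths to values that still need converting by hand.
    """
    new = copy.deepcopy(legacy)
    for path in REMOVED:
        pop_path(new, path)
    for old, renamed in RENAMES.items():
        value = pop_path(new, old)
        if value is not _MISSING:
            set_path(new, renamed, value)
    manual = {}
    for path in MANUAL:
        value = pop_path(new, path)
        if value is not _MISSING:
            manual[path] = value
    return new, manual
```

One behaviour to keep in mind: the sketch simply drops scan.directDownload. If your legacy configuration enabled direct download while also setting scan.baseUrl, you would likely want to delete download.base_homeserver_url from the result by hand, since its presence now decides where files are fetched from.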

Development

In a virtual environment with poetry (>=1.8.3) installed, run:

poetry install

To run the unit tests, you can use:

tox -e py

To run the linters and mypy type checker, use ./scripts-dev/lint.sh.

Releasing

The exact steps for releasing will vary, but this is an approach taken by the Synapse developers (assuming a Unix-like shell):

  1. Set a shell variable to the version you are releasing (this just makes subsequent steps easier):

    version=X.Y.Z
    
  2. Update setup.cfg so that the version is correct.

  3. Stage the changed files and commit.

    git add -u
    git commit -m v$version -n
    
  4. Push your changes.

    git push
    
  5. When ready, create a signed tag for the release:

    git tag -s v$version
    

    Base the tag message on the changelog.

  6. Push the tag.

    git push origin tag v$version
    
  7. Create a release, based on the tag you just pushed, on GitHub or GitLab.

  8. Create a source distribution and upload it to PyPI:

    python -m build
    twine upload dist/matrix_content_scanner-$version*
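
Because step 2 is a manual edit, it is easy to tag a release whose packaged version does not match the tag. The following is a minimal sketch of a pre-tag sanity check, assuming the version is declared literally as `version = X.Y.Z` under the `[metadata]` section of setup.cfg (the standard setuptools layout, but an assumption about this repository). The function names are hypothetical, and the check will not work if the version is declared indirectly, for example with an `attr:` directive.

```python
import configparser


def setup_cfg_version(path="setup.cfg"):
    """Read the literal version string from the [metadata] section.

    Assumes a declarative setuptools configuration; raises if the file,
    the section, or the option is missing.
    """
    # Interpolation is disabled so that '%' characters elsewhere in the
    # file are read as-is instead of being treated as substitutions.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(path)
    return parser.get("metadata", "version")


def version_matches(expected, path="setup.cfg"):
    """Return True if setup.cfg declares exactly the expected version."""
    return setup_cfg_version(path) == expected
```

Calling `version_matches("X.Y.Z")` from the repository root, with the value you assigned to `version` in step 1, before creating the tag in step 5 would catch a forgotten update.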
    
