Simple file collector - compress/serve/send/anonymize files

filecollector

Service for collecting and processing files (with hooks)

Features

  • collect files and compress them (on command)
  • anonymization
  • run custom scripts on output file / processed files
  • start/stop simple fileserver (on collect output location)

Requirements

  • python 2.7+ / python 3.5+
  • pip

Installation

pip install filecollector

Usage

filecollector currently has two main components: the collector and the server. The collector collects and anonymizes the files and runs hook scripts on them. The server is only a browser for the collected files.

Before starting, create a YAML configuration file for the collector. This configuration file is the only input that filecollector requires.

Start the collector

filecollector collector start --config filecollector.yaml -p /my/pid/dir

Start the server

filecollector server start --config filecollector.yaml -p /my/pid/dir

Configuration

Configuration example

server:
    port: 1999
    folder: "../example/files" 
collector:
    files:
    - path: "example/example*.txt"
      label: "example"
    rules:
    - pattern:  \d{4}[^\w]\d{4}[^\w]\d{4}[^\w]\d{4}
      replacement: "[REDACTED]"
    processFileScript: example/scripts/process_file.sh
    compress: true
    useFullPath: true
    outputScript: example/scripts/output_file.sh
    processFilesFolderScript: example/scripts/tmp_folder.sh
    deleteProcessedTemplateFiles: true
    outputLocation: "example/files"

Configuration options

server

The server block contains the configuration for the filecollector server component.

server.port

Port that will be used by the filecollector server.

server.folder

The folder that is served by the file server.
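To make the two server options concrete, here is a minimal sketch of a file browser built only on the Python 3.7+ standard library. It is not the package's implementation; the function name `make_server` and the `host` parameter are hypothetical, and only `port` and `folder` correspond to options described above.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(folder, port=1999, host="127.0.0.1"):
    """Create an HTTP server that lists and serves the files under `folder`.

    `folder` plays the role of server.folder and `port` of server.port;
    `host` is an assumption added for this sketch.
    """
    # The `directory` keyword of SimpleHTTPRequestHandler exists since Python 3.7
    handler = functools.partial(SimpleHTTPRequestHandler, directory=folder)
    return HTTPServer((host, port), handler)
```

Calling `make_server("example/files").serve_forever()` would then serve that folder until the process is interrupted.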

collector

The collector block contains the configuration for the filecollector collector component.

collector.files

List of files to collect, each with a path and a label (see the path and label fields in the configuration example). The path can contain wildcards.
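The wildcard behaviour can be illustrated with Python's `glob` module. This is only a sketch under the assumption that wildcards follow shell-style glob rules; `expand_files` is a hypothetical helper, not part of filecollector.

```python
import glob
import os
import tempfile


def expand_files(entries):
    """Expand each {'path', 'label'} entry into (matched_file, label) pairs."""
    matches = []
    for entry in entries:
        # glob.glob returns the existing paths that match the wildcard pattern
        for found in sorted(glob.glob(entry["path"])):
            matches.append((found, entry["label"]))
    return matches


# Demonstration in a throwaway directory
demo_dir = tempfile.mkdtemp()
for name in ("example1.txt", "example2.txt", "other.log"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write("demo\n")

pairs = expand_files(
    [{"path": os.path.join(demo_dir, "example*.txt"), "label": "example"}]
)
```

Here `pairs` holds the two `example*.txt` files, each tagged with the label `example`, while `other.log` is not matched.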

collector.rules

List of anonymization rules that are applied to the input files. Each rule has a pattern field (the expression to match) and a replacement field (the text that replaces each match).
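As an illustration, the rule from the configuration example can be tried out with Python's `re` module. This sketch assumes the patterns are treated as Python regular expressions and applied as plain substitutions, which the description does not state explicitly; `anonymize` is a hypothetical helper.

```python
import re

# The rule from the configuration example: four groups of four digits,
# separated by single non-word characters (for example a card number).
RULES = [
    {"pattern": r"\d{4}[^\w]\d{4}[^\w]\d{4}[^\w]\d{4}", "replacement": "[REDACTED]"},
]


def anonymize(text, rules=RULES):
    """Apply every rule in order and return the anonymized text."""
    for rule in rules:
        text = re.sub(rule["pattern"], rule["replacement"], text)
    return text


masked = anonymize("card: 1234-5678-9012-3456, ok")
```

Note that the pattern requires a separator between the digit groups, so a run of sixteen digits without separators is left unchanged.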

collector.compress

Whether the output folder is compressed at the end of the file collection. The default value is true.

collector.outputLocation

Output location (directory), where the processed file(s) will be stored.

collector.useFullPath

Use the full path for processed files (inside outputLocation). This can be useful when, because of wildcard patterns, files from different folders share the same base file name. Default value is true.
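The name clash that this option avoids can be sketched as follows. The mapping shown is an assumption for illustration (POSIX paths, hypothetical helper `destination_for`); the layout that filecollector actually produces may differ.

```python
import os


def destination_for(source_path, output_location, use_full_path=True):
    """Return where a collected file could be placed under output_location."""
    if use_full_path:
        # Keep the source directory structure; strip the leading separator
        # so os.path.join does not discard output_location.
        relative = os.path.abspath(source_path).lstrip(os.sep)
    else:
        # Only the base name is kept, so files from different folders can clash
        relative = os.path.basename(source_path)
    return os.path.join(output_location, relative)
```

With the full path, `/var/log/a/app.log` and `/var/log/b/app.log` end up in different places; with only the base name, both would map to the same `app.log`.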

collector.processFileScript

Script that runs against each processed file. It receives the filename and the label of the processed file.

collector.processFilesFolderScript

Script that runs once after the files are collected. It gets the folder name (where the files are processed) as an input.

collector.outputScript

Script that runs once with the compressed output file name as an input.
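The three hook options can be summarized as the command lines they would produce. This is a sketch only: it assumes that the inputs are passed as positional arguments and that the hooks run in the order per file, per folder, per output, neither of which is confirmed above. `hook_commands` is a hypothetical helper.

```python
def hook_commands(config, processed_files, work_folder, output_file):
    """Build one command line (as an argument list) per hook invocation."""
    commands = []
    script = config.get("processFileScript")
    if script:
        # Runs once per processed file, with its filename and label
        for filename, label in processed_files:
            commands.append([script, filename, label])
    script = config.get("processFilesFolderScript")
    if script:
        # Runs once, with the folder where the files were processed
        commands.append([script, work_folder])
    script = config.get("outputScript")
    if script:
        # Runs once, with the compressed output file name
        commands.append([script, output_file])
    return commands
```

Each returned list could be handed to `subprocess.call`; hooks that are missing from the configuration are simply skipped.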

collector.deleteProcessedTemplateFiles

After the files are collected and compressed, the collected files are deleted. Disabling this can be useful if the compress option is disabled. Default value is true.
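How the compress and delete options interact can be sketched with `shutil`. The archive format (zip here) and the helper name `finish_collection` are assumptions made for this illustration, not details taken from filecollector.

```python
import os
import shutil
import tempfile


def finish_collection(collected_dir, archive_base, compress=True, delete_collected=True):
    """Optionally archive collected_dir, then optionally delete it."""
    archive = None
    if compress:
        # make_archive appends the extension and returns the archive path
        archive = shutil.make_archive(archive_base, "zip", root_dir=collected_dir)
    if delete_collected:
        # With compress=False this would discard the only copy of the files
        shutil.rmtree(collected_dir)
    return archive


# Demonstration in a throwaway directory
work = tempfile.mkdtemp()
collected = os.path.join(work, "collected")
os.makedirs(collected)
with open(os.path.join(collected, "example1.txt"), "w") as handle:
    handle.write("demo\n")

archive = finish_collection(collected, os.path.join(work, "output"))
```

After this call only the archive remains. With compression turned off, deleting the collected files would leave nothing behind, which is why the delete option is worth disabling in that case.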

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

