A pipeline-based file processing library.

These details have not been verified by PyPI

Project links

Project description

filerohr

filerohr is a pipeline-based file processing library and CLI tool.

Users can configure a custom processing pipeline to suit their needs using freely interchangeable tasks.

filerohr comes with a number of built-in tasks that specialize in audio processing, all based on ffmpeg and ffprobe. Adding new task definitions is relatively easy and only requires some knowledge of Python.

NOTE: filerohr currently is a proof-of-concept and doesn’t have any tests yet.

filerohr’s name is a wordplay on the german Fallrohr (literally droppipe).

CLI

filerohr comes with a CLI that can be executed with uv run python -m filerohr

Quick start:

# validate a pipeline config
uv run python -m filerohr validate-config my-pipeline.yaml

# Show available jobs
uv run python -m filerohr list-jobs

# import a file
uv run python -m filerohr import-file --config my-pipeline.yaml ~/Music/song.mp3

Configuration

filerohr itself is mostly configured through environment variables. The pipeline configuration is YAML and is documented below.

Environment variables

Environment variable configuration options include:

TZ : The local timezone (e.g. Europe/Vienna)

FILEROHR_FFMPEG_BIN : Path to the ffmpeg binary

FILEROHR_FFPROBE_BIN : Path to the ffprobe binary

FILEROHR_DATA_DIR : Path to the data directory

FILEROHR_TMP_DIR : Path to the temporary directory

FILEROHR_TASK_MODULES : comma-separated list of Python modules to import as task definitions

FILEROHR_PIPELINE_CONFIG_DIR : Path to the pipeline configuration file directory

Most of these variables will be set to sensible defaults in the upcoming container image.

Pipeline configuration

The following pipeline configuration is based on filerohr’s built-in tasks:

# You can give pipelines a name.
# That makes it easy to reference them in an API or with the CLI.
name: audio
# Pipelines can be marked as default (but only one).
use_as_default: true
jobs:
  # Downloads the file if a remote URL was provided.
  # Skips them otherwise.
  - job: download_file
    match_content_type: ["audio/*", "video/*", "pass_unset"]
    max_download_size_mib: 25
  # Logs the file size.
  - job: log_file_size
  # Sanitizes the file as audio/video.
  # Automatically skips the file if it is not audio/video.
  - job: sanitize_av
  # Extracts all audio streams as separate files.
  - job: extract_audio
  # Extract metadata from the audio files.
  - job: extract_audio_metadata
  # Normalize the audio file with the podcast preset.
  - job: normalize_audio
    preset: podcast
  # Converts the audio file (if necessary).
  # Only allow opus and flac and convert to flac as needed.
  - job: convert_audio
    allowed_formats: ["opus", "flac"]
    fallback_format: flac
  # Now that the last job that could have changed the audio format is finished,
  # we can extract the mime type from the audio file.
  - job: extract_mime_type
  # Log the file size again.
  # This time it will include the reduction in file size in percent.
  - job: log_file_size
  # Hash the file content with SHA3 512.
  # This will be re-used in store_by_content_hash.
  - job: hash_file
    alg: sha3_512
  # Store the files by their content hash.
  - job: store_by_content_hash
    storage_dir: /home/you/data/audio/by-hash
    # Emit the stored file in the hash-based directory as a pipeline result.
    emit: true
  # Additionally, store by upload date, but only symlink to files in hash storage.
  - job: store_by_creation_date
    storage_dir: /home/you/data/audio/by-hash
    symlink: true
    # Do not emit but keep the file in the upload-date-based directory.
    keep: true

Not that you can include jobs multiple times. This can be helpful e.g., if you do file size logging, before and after converting audio streams.

Built-in tasks

convert_audio

Ensures an audio file matches the allowed formats.

Audio files that don’t match the formats will be converted to the fallback format.

Note: This job must be placed after an extract_audio job.

Configuration options:

skip: `boolean`, default: `False`

allowed_formats: `string[]`, required

List of allowed audio formats. Use 'any' to allow any audio format.

Examples:

['vorbis', 'flac', 'opus']

fallback_format: `string`, default: `'flac'`

The format to fallback to if the detected format is not allowed. ffmpeg often has different encoders for the same audio format. Some of these encoders are experimental, e.g. you probably want to uselibopus over opus. If you select an experimental encoder, you might have to add ['-strict', '-2'] to fallback_format_encoder_args to avoid errors.

fallback_format_ext: `string | null`, default: `null`

The file extension of the fallback format. This is automatically inferred for the most common audio formats.

fallback_format_encoder_args: `string[]`, required

Additional arguments passed to ffmpeg when encoding to the fallback format.

discard_remote

Stops the pipeline if the current file is not a local file.

Configuration options:

skip: `boolean`, default: `False`

download_file

Downloads files from remote sources.

When called with a local file path, the job will simply be skipped.

Configuration options:

skip: `boolean`, default: `False`

storage_dir: `string | null`, default: `null`

Base directory to store downloaded files in.

max_download_size_mib: `number`, default: `1024`

Maximum size allowed for downloaded files in MiB (megabytes). Use 0 for no limit.

allow_streams: `boolean`, default: `False`

Whether to download files from sources that are streaming responses and do not have a fixed file size.

allowed_protocols: `('http' | 'https')[]`, default: `['http', 'https']`

List of allowed protocols to download files from.

timeout_seconds: `integer`, default: `600`

Timeout in seconds for downloading.

follow_redirects: `boolean`, default: `True`

Follow redirects when downloading.

match_content_type: `(string | 'pass_unset')[] | null`, default: `null`

List of mime types to check. Supports glob patterns like audio/*. Add pass_unset to the list if you want this check to pass in case no Content-Type was given in the server’s response headers. Setting match_content_type to null will allow all content types.

download_chunk_size_mib: `number`, default: `1`

Size of chunks to download in MiB (megabytes).

extract_audio

Extracts all audio streams in the job’s file and saves them as separate files.

In case more than one stream is found, this job will spawn a subsequent job for each stream file.

If you only want to extract a single stream, set count: 1 in the job configuration.

Configuration options:

skip: `boolean`, default: `False`

count: `integer`, default: `0`

Maximum number of audio streams to extract. This is useful for files that contain multiple audio streams. Extracts all audio streams if number is 0.

extract_audio_metadata

Extracts metadata, like duration, artist, title, album from the job’s file.

Configuration options:

skip: `boolean`, default: `False`

extract_mime_type

Extracts the mime type of the job’s file.

Configuration options:

skip: `boolean`, default: `False`

ffmpeg

Run a custom ffmpeg command with args specified in the job configuration. Ensure that you include {input_file} and {output_file} placeholders.

Configuration options:

skip: `boolean`, default: `False`

args: `string[]`, required

FFMPEG command line arguments. Must contain {input_file} and {output_file} as literal placeholder strings for actual file paths.

Examples:

['-i', '{input_file}', '-vn', '{output_file}']

output_format: `string`, required

Output file format. Must be a valid file extension understood by ffmpeg.

Examples:

'.mp3'
'.flac'

hash_file

Hash the file content.

Configuration options:

skip: `boolean`, default: `False`

alg: `'blake2b' | 'blake2s' | 'md5' | 'sha1' | 'sha224' | 'sha256' | 'sha384' | 'sha3_224' | 'sha3_256' | 'sha3_384' | 'sha3_512' | 'sha512' | 'shake_128' | 'shake_256'`, default: `'sha256'`

Hash algorithm to use.

keep_file

Keep the current job file and do not delete it after the pipeline finishes.

This can be helpful to debug intermediate job output.

Configuration options:

skip: `boolean`, default: `False`

log_file_size

Logs the file size in MiB.

If used multiple times in the same pipeline config, it will also display the change in file size compared to the reference.

Configuration options:

skip: `boolean`, default: `False`

force_update_reference: `boolean`, default: `False`

If set to true, the reference used to calculate changes in the file size between jobs will be updated. Implicitly true on first call.

normalize_audio

Normalizes the audio stream with ffmpeg-normalize.

If no additional options are given, the podcast preset will be used.

Note: This job must be placed after an extract_audio job.

Configuration options:

skip: `boolean`, default: `False`

preset: `'music' | 'podcast' | 'streaming-video' | null`, default: `null`

The audio normalization preset to use. See https://slhck.info/ffmpeg- normalize/usage/presets/ for available presets. Mutually exclusive with args.

args: `string[]`, required

ffmpeg-normalize command line arguments. See https://slhck.info/ffmpeg-normalize/usage/cli-options/ for available options. Mutually exclusive with preset.

sanitize_av

Sanitizes the job file as an audio/video stream. This performs a simple copy operation for all streams in the job’s file and ensures that the resulting file is playable.

If the file is not an audio file, it will be skipped by default.

Configuration options:

skip: `boolean`, default: `False`

error_handling: `'ignore' | 'ignore_minor'`, default: `'ignore'`

How to handle errors during file sanitization of broken files. ignore will ignore all but critical errors in the audio and corresponds to ffmpeg’s -err_detect ignore_err. The resulting file will play, but may not be pleasant to listen to. ignore_minor will let minor errors pass and corresponds to ffmpeg’s -err_detect careful.

Examples:

'ignore'
'ignore_minor'

skip_invalid: `boolean`, default: `True`

Silently skip files that do not contain audio.

store_by_content_hash

Stores the job file in a directory-tree based on the file’s content hash.

The generated directory structure looks like this:

audio_dir/
  8d/
    a9/
      8da9bce68a6aebdcba325cf21402c78c1628c9da1278b817a600cdd92b720653.flac
      8da9fc4939da378a720ba1ba310d3d7a1a85e44b79cd4c68bfc4bd3081f01062.flac
  e0/
    0a/
      e00afc4939da378a720ba1ba310d3d7a1a85e44b79cd4c68bfc4bd3081f01062.flac

Configuration options:

skip: `boolean`, default: `False`

storage_dir: `string | null`, default: `null`

Base directory to store files in.

symlink: `boolean`, default: `False`

If set to true, a symlink is created to the file from the previous job. In that case, the previous job must create the file in a persistent storage location.

relative: `boolean`, default: `True`

If set to true, the symlink will be relative to the storage path.

keep: `boolean`, default: `False`

If set to true, the file is kept after the pipeline finishes.

emit: `boolean`, default: `False`

If set to true, the file is emitted as a pipeline result. Implies keep.

alg: `'blake2b' | 'blake2s' | 'md5' | 'sha1' | 'sha224' | 'sha256' | 'sha384' | 'sha3_224' | 'sha3_256' | 'sha3_384' | 'sha3_512' | 'sha512' | 'shake_128' | 'shake_256' | null`, default: `null`

Hash algorithm to use. If unset the job will try to re-use an existing file hash calculated in a hash_file job. Otherwise, a new hash will be calculated and stored along with the file. In case the file already has a hash but uses a different algorithm, a new hash will be calculated for storage but the hash on the file will be kept.

levels: `integer`, default: `2`

Number of directory levels that should be created for stored files.

chars_per_level: `integer`, default: `2`

Number of hash-characters per level.

store_by_creation_date

Stores the job file in a directory-tree based on the creation date.

The generated directory structure looks like this:

audio_dir/
  2025/
    11/
      20/
        filename2.flac
        filename3.flac
    10/
      16/
        filename1.flac

Configuration options:

skip: `boolean`, default: `False`

storage_dir: `string | null`, default: `null`

Base directory to store files in.

symlink: `boolean`, default: `False`

If set to true, a symlink is created to the file from the previous job. In that case, the previous job must create the file in a persistent storage location.

relative: `boolean`, default: `True`

If set to true, the symlink will be relative to the storage path.

keep: `boolean`, default: `False`

If set to true, the file is kept after the pipeline finishes.

emit: `boolean`, default: `False`

If set to true, the file is emitted as a pipeline result. Implies keep.

month: `boolean`, default: `True`

Include month in storage subdirectories.

day: `boolean`, default: `True`

Include day in storage subdirectories.

Custom tasks

You can define custom tasks by creating a python module/file. After that you need to ensure set the FILEROHR_TASK_MODULES environment variable to the name of your module. If you have created multiple modules, you can separate them by a comma.

You can find examples for tasks in the my-custom-tasks.py file or in filerohr’s own filerohr/tasks/ directory.

When implementing a custom task, there are three important guidelines:

DO NOT block the loop.

All code must run asynchronously, also known as cooperative multitasking. filerohr includes the aiofiles and pebble libraries and some additional helpers in filerohr.utils. Use these to do blocking IO or CPU-intensive work.
Call job.next() when you are done with the task.

That is, if you want the pipeline to move on. If you don’t call job.next() the pipeline will stop after your job finished. Sometimes a job might do that intentionally, but most of the time you don’t want that.

You may also call job.next() multiple times, if you want to enqueue multiple follow-up jobs. This may be something you do, when you create multiple new files in your job (like filerohr.tasks.av.extract_audio does).
Don’t modify the job.file directly.

Instead, call job.next(file.clone(path=...)).

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.8.0

Feb 25, 2026

0.7.2

Feb 24, 2026

0.7.1

Feb 24, 2026

0.7.0

Feb 24, 2026

This version

0.6.0

Feb 21, 2026

0.5.2

Feb 19, 2026

0.5.1

Feb 19, 2026

0.5.0

Feb 19, 2026

0.4.0

Feb 18, 2026

0.3.0

Feb 17, 2026

0.2.2

Feb 8, 2026

0.2.1

Feb 8, 2026

0.2.0

Feb 8, 2026

0.1.0

Jan 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

filerohr-0.6.0.tar.gz (65.8 kB view details)

Uploaded Feb 21, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

filerohr-0.6.0-py3-none-any.whl (45.3 kB view details)

Uploaded Feb 21, 2026 Python 3

File details

Details for the file filerohr-0.6.0.tar.gz.

File metadata

Download URL: filerohr-0.6.0.tar.gz
Upload date: Feb 21, 2026
Size: 65.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Arch Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for filerohr-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`785d5dc69c298e409fba6ad3ff12066abc333f064297e5e3fa7576e3c8eb154b`
MD5	`7bfb2c7211da383e85a06bb0224f6895`
BLAKE2b-256	`099cb277edd8afec7bb7899a571cb474ae608bb664ea3f12c09d937788764e46`

See more details on using hashes here.

File details

Details for the file filerohr-0.6.0-py3-none-any.whl.

File metadata

Download URL: filerohr-0.6.0-py3-none-any.whl
Upload date: Feb 21, 2026
Size: 45.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Arch Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for filerohr-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e3822442dabfed0c1a4494dbd00aa95491b94bf3f0a811f465c95dab25bc4cda`
MD5	`0ee05c821ec517169da6a711ca258d68`
BLAKE2b-256	`1c7745f4312889f5db9d055e25304bfe2600625bac73be16b16773c3b3381322`

See more details on using hashes here.

filerohr 0.6.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

filerohr

CLI

Configuration

Environment variables

Pipeline configuration

Built-in tasks

convert_audio

Configuration options:

skip: boolean, default: False

allowed_formats: string[], required

fallback_format: string, default: 'flac'

fallback_format_ext: string | null, default: null

fallback_format_encoder_args: string[], required

discard_remote

Configuration options:

skip: boolean, default: False

download_file

Configuration options:

skip: boolean, default: False

storage_dir: string | null, default: null

max_download_size_mib: number, default: 1024

allow_streams: boolean, default: False

allowed_protocols: ('http' | 'https')[], default: ['http', 'https']

timeout_seconds: integer, default: 600

follow_redirects: boolean, default: True

match_content_type: (string | 'pass_unset')[] | null, default: null

download_chunk_size_mib: number, default: 1

extract_audio

Configuration options:

skip: boolean, default: False

count: integer, default: 0

extract_audio_metadata

Configuration options:

skip: boolean, default: False

extract_mime_type

Configuration options:

skip: boolean, default: False

ffmpeg

Configuration options:

skip: boolean, default: False

args: string[], required

output_format: string, required

hash_file

Configuration options:

skip: boolean, default: False

alg: 'blake2b' | 'blake2s' | 'md5' | 'sha1' | 'sha224' | 'sha256' | 'sha384' | 'sha3_224' | 'sha3_256' | 'sha3_384' | 'sha3_512' | 'sha512' | 'shake_128' | 'shake_256', default: 'sha256'

keep_file

Configuration options:

skip: boolean, default: False

log_file_size

Configuration options:

skip: boolean, default: False

force_update_reference: boolean, default: False

normalize_audio

Configuration options:

skip: boolean, default: False

preset: 'music' | 'podcast' | 'streaming-video' | null, default: null

args: string[], required

sanitize_av

Configuration options:

skip: boolean, default: False

error_handling: 'ignore' | 'ignore_minor', default: 'ignore'

skip_invalid: boolean, default: True

store_by_content_hash

Configuration options:

skip: boolean, default: False

storage_dir: string | null, default: null

symlink: boolean, default: False

relative: boolean, default: True

keep: boolean, default: False

emit: boolean, default: False

alg: 'blake2b' | 'blake2s' | 'md5' | 'sha1' | 'sha224' | 'sha256' | 'sha384' | 'sha3_224' | 'sha3_256' | 'sha3_384' | 'sha3_512' | 'sha512' | 'shake_128' | 'shake_256' | null, default: null

skip: `boolean`, default: `False`

allowed_formats: `string[]`, required

fallback_format: `string`, default: `'flac'`

fallback_format_ext: `string | null`, default: `null`

fallback_format_encoder_args: `string[]`, required

skip: `boolean`, default: `False`

skip: `boolean`, default: `False`

storage_dir: `string | null`, default: `null`

max_download_size_mib: `number`, default: `1024`

allow_streams: `boolean`, default: `False`

allowed_protocols: `('http' | 'https')[]`, default: `['http', 'https']`

timeout_seconds: `integer`, default: `600`

follow_redirects: `boolean`, default: `True`

match_content_type: `(string | 'pass_unset')[] | null`, default: `null`

download_chunk_size_mib: `number`, default: `1`

skip: `boolean`, default: `False`

count: `integer`, default: `0`

skip: `boolean`, default: `False`

skip: `boolean`, default: `False`

skip: `boolean`, default: `False`

args: `string[]`, required

output_format: `string`, required

skip: `boolean`, default: `False`

alg: `'blake2b' | 'blake2s' | 'md5' | 'sha1' | 'sha224' | 'sha256' | 'sha384' | 'sha3_224' | 'sha3_256' | 'sha3_384' | 'sha3_512' | 'sha512' | 'shake_128' | 'shake_256'`, default: `'sha256'`

skip: `boolean`, default: `False`

skip: `boolean`, default: `False`

force_update_reference: `boolean`, default: `False`

skip: `boolean`, default: `False`

preset: `'music' | 'podcast' | 'streaming-video' | null`, default: `null`

args: `string[]`, required

skip: `boolean`, default: `False`

error_handling: `'ignore' | 'ignore_minor'`, default: `'ignore'`

skip_invalid: `boolean`, default: `True`

skip: `boolean`, default: `False`

storage_dir: `string | null`, default: `null`

symlink: `boolean`, default: `False`

relative: `boolean`, default: `True`

keep: `boolean`, default: `False`

emit: `boolean`, default: `False`

alg: `'blake2b' | 'blake2s' | 'md5' | 'sha1' | 'sha224' | 'sha256' | 'sha384' | 'sha3_224' | 'sha3_256' | 'sha3_384' | 'sha3_512' | 'sha512' | 'shake_128' | 'shake_256' | null`, default: `null`

levels: `integer`, default: `2`

chars_per_level: `integer`, default: `2`

skip: `boolean`, default: `False`

storage_dir: `string | null`, default: `null`

symlink: `boolean`, default: `False`

relative: `boolean`, default: `True`

keep: `boolean`, default: `False`

emit: `boolean`, default: `False`

month: `boolean`, default: `True`

day: `boolean`, default: `True`