Default parser which can be used by Paperless-ngx if there is no other suitable parser found for a given mime type.

Development Status
- 5 - Production/Stable
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Default parser for Paperless-ngx

This is a default parser which can be used by Paperless-ngx if there is no other suitable parser found for a given mime type.

It can archive documents of any mime type defined in /etc/mime.types. For every document consumed by this parser, the original file is archived, and both a PDF and a thumbnail are generated.

If the parsed file has a known encoding, its content is read and stored in the document's content metadata, and a PDF showing this content is generated. Otherwise, the content metadata is left empty and a PDF containing the following note is generated:

This document was archived by a default parser for Paperless-ngx. 

original file name: $file_name
mime type: $mime_type

Download original file to work with it.
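
The `$file_name` and `$mime_type` placeholders in the note use the same syntax as Python's `string.Template`. The following is a minimal sketch of how such a note could be filled in, assuming that mechanism; it is not taken from the parser's source, and `render_note` is a hypothetical helper name.

```python
from string import Template

# Note text as quoted above; $file_name and $mime_type are the placeholders.
NOTE_TEMPLATE = Template(
    "This document was archived by a default parser for Paperless-ngx.\n"
    "\n"
    "original file name: $file_name\n"
    "mime type: $mime_type\n"
    "\n"
    "Download original file to work with it."
)


def render_note(file_name: str, mime_type: str) -> str:
    """Fill in the placeholders of the fallback note (hypothetical helper)."""
    return NOTE_TEMPLATE.substitute(file_name=file_name, mime_type=mime_type)


if __name__ == "__main__":
    print(render_note("firmware.bin", "application/octet-stream"))
```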

Prerequisites

This parser requires Gotenberg to be configured for Paperless-ngx.

Installation

Install from PyPI:

pip install paperlessngx-default-parser

For Docker-based installations, use custom container initialization as described here: https://docs.paperless-ngx.com/advanced_usage/#custom-container-initialization

Place a script with the following content in the directory for your container initialization scripts and make it executable:
```
#!/bin/bash
pip install paperlessngx-default-parser
```
Add this parser to the PAPERLESS_APPS environment variable, e.g. in your paperless.conf: PAPERLESS_APPS="paperlessngx-default-parser.apps.DefaultParserConfig"

FAQ

Error: File type {mime-type} not supported

Paperless-ngx uses magic numbers to identify the mime type of a file which should be consumed/archived.

On the other hand, Paperless-ngx currently requires a custom parser to declare a dictionary of the mime types it supports, with one default extension per mime type. See also "Support for arbitrary binary files? #805" for a proposal to change this behaviour.

This default parser registers itself for all mime types defined in /etc/mime.types. It uses the first file extension defined in /etc/mime.types for a given mime type as the default extension for this mime type - or an empty string, if there is no extension defined at all.
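
The registration rule described above (first listed extension wins, empty string if none) can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual code: `parse_mime_types` is a hypothetical name, and storing extensions with a leading dot is an assumption.

```python
def parse_mime_types(text: str) -> dict[str, str]:
    """Map each mime type to its first listed extension, or "" if none.

    Expects the usual mime.types layout: one mime type per line, followed
    by zero or more whitespace-separated extensions; '#' starts a comment.
    """
    mapping: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        mime_type, *extensions = line.split()
        # Assumption: extensions are stored with a leading dot.
        default_ext = f".{extensions[0]}" if extensions else ""
        # Keep the first entry if a mime type is listed more than once.
        mapping.setdefault(mime_type, default_ext)
    return mapping


SAMPLE = """\
# sample excerpt in mime.types format
application/pdf             pdf
text/plain                  txt asc text
application/x-no-extension
"""

if __name__ == "__main__":
    for mime_type, ext in parse_mime_types(SAMPLE).items():
        print(f"{mime_type!r} -> {ext!r}")
```

To try it against a real system file, pass the content of /etc/mime.types, for example `parse_mime_types(open("/etc/mime.types").read())`.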

Since the magic numbers database and /etc/mime.types don't have to be - and in fact are not - in sync, the following situation might occur:

Paperless-ngx identifies - by using magic numbers - a mime type which is not listed in /etc/mime.types. This results in the error File type {mime-type} not supported because the default parser could not register itself for this mime type.

Solution: Add the missing mime type to /etc/mime.types.

Error: Not consuming file {filepath}: Unknown file extension.

At the moment, Paperless-ngx handles files differently depending on whether they are imported via the consumption directory or via the UI.

When importing a file via the UI, Paperless-ngx only determines the mime type of the file using magic numbers and checks whether a parser is registered for this mime type.

When importing a file through the consumption directory, an additional check is performed first:

Paperless-ngx collects all file extensions for the given mime type by looking at

/etc/mime.types and
the default extension a parser for this mime type declares.

A file in the consumption directory is then only consumed if its file extension matches one of these extensions.

For example:

Consider a file test.yaml which has mime type text/plain.

Importing it via the UI successfully archives the document. Importing the same document via the consumption directory leads to the error Not consuming file /usr/src/paperless/consume/test.yaml: Unknown file extension.

Solution: Either import the file via the UI or add the unknown file extension to the file extensions for this mime type in /etc/mime.types.
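
The consumption directory check described in this section can be modelled with a simplified sketch. It follows the description in this FAQ entry rather than the Paperless-ngx source; `is_consumable`, its parameters, and the extension lists are hypothetical.

```python
from pathlib import PurePath


def is_consumable(
    file_name: str,
    mime_type: str,
    mime_types_db: dict[str, list[str]],
    parser_default_ext: str,
) -> bool:
    """Return True if the file extension is known for the given mime type.

    mime_types_db stands in for the extensions listed in /etc/mime.types;
    parser_default_ext is the default extension the parser declares.
    """
    allowed = set(mime_types_db.get(mime_type, []))
    if parser_default_ext:
        allowed.add(parser_default_ext)
    return PurePath(file_name).suffix.lower() in allowed


if __name__ == "__main__":
    # Hypothetical excerpt: text/plain without a .yaml extension.
    db = {"text/plain": [".txt", ".asc", ".text"]}
    print(is_consumable("test.yaml", "text/plain", db, ".txt"))

    # After adding the extension, as suggested in the solution above.
    db["text/plain"].append(".yaml")
    print(is_consumable("test.yaml", "text/plain", db, ".txt"))
```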

File extension when downloading original file

At the moment Paperless-ngx uses a default extension per mime type when downloading an original file.

For example: files of mime type application/octet-stream will get file extension .bin, those with mime-type text/plain will get extension .txt when downloaded.

Solution: To open a downloaded file with the program it originates from, you may have to change its file extension manually.
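
If many downloaded files need their extension changed, the renaming can be scripted. A minimal sketch using only the standard library; `restore_extension` is a hypothetical helper, and the original extension has to be known by the caller because it is not part of the downloaded file name.

```python
from pathlib import PurePath


def restore_extension(downloaded_name: str, original_extension: str) -> str:
    """Return the file name with its extension replaced.

    original_extension must start with a dot, e.g. ".yaml".
    """
    return str(PurePath(downloaded_name).with_suffix(original_extension))


if __name__ == "__main__":
    # A text/plain document downloaded as .txt that was originally a YAML file.
    print(restore_extension("test.txt", ".yaml"))
```

To rename a file on disk, the result can be passed to `pathlib.Path.rename`.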

How to modify /etc/mime.types used by Paperless-ngx

For example:

Add missing mime types using add_missing_mime_types.sh (see examples there)
Create your own custom container initialization script to add/modify mime types.
Use your own mime.types file and bind it to /etc/mime.types
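
As an illustration of the kind of change these options make, the sketch below adds a mime type, or extra extensions for an existing one, to content in mime.types format. It is not the add_missing_mime_types.sh script mentioned above; `add_mime_type` is a hypothetical helper that works on a string, so writing the result back to /etc/mime.types (which usually requires root inside the container) is left to the caller.

```python
def add_mime_type(text: str, mime_type: str, extensions: list[str]) -> str:
    """Return mime.types content with the given extensions registered.

    If the mime type already has a line, missing extensions are appended
    to it; otherwise a new line is added at the end. Extensions are given
    without a leading dot, as in mime.types.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#"):
            continue  # leave comment lines untouched
        fields = line.split()
        if fields and fields[0] == mime_type:
            missing = [ext for ext in extensions if ext not in fields[1:]]
            if missing:
                lines[i] = line.rstrip() + " " + " ".join(missing)
            return "\n".join(lines) + "\n"
    lines.append(f"{mime_type}\t{' '.join(extensions)}".rstrip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    current = "text/plain\ttxt asc\n"
    print(add_mime_type(current, "text/plain", ["yaml"]), end="")
```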

Release history

This version

2.0.0

Mar 3, 2026

Download files

Source Distribution

paperlessngx_default_parser-2.0.0.tar.gz (18.6 kB view details)

Uploaded Mar 3, 2026 Source

Built Distribution

paperlessngx_default_parser-2.0.0-py3-none-any.whl (19.0 kB view details)

Uploaded Mar 3, 2026 Python 3

File details

Details for the file paperlessngx_default_parser-2.0.0.tar.gz.

File metadata

Download URL: paperlessngx_default_parser-2.0.0.tar.gz
Upload date: Mar 3, 2026
Size: 18.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for paperlessngx_default_parser-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`9aa8d75c66e1994cdb588d31db1ea150e003aeb0ef7059e5b239f6e3400dcb13`
MD5	`b9c1c2e81891af06dadb24bdbb0a4f92`
BLAKE2b-256	`8aa21c22d664c4b686ba7c8f37a540e87cb3534192dcce687d01f1e5cdff2957`

File details

Details for the file paperlessngx_default_parser-2.0.0-py3-none-any.whl.

File metadata

Download URL: paperlessngx_default_parser-2.0.0-py3-none-any.whl
Upload date: Mar 3, 2026
Size: 19.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for paperlessngx_default_parser-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0bea625d5fc8144245f0b6c2816edd61f1c133aad81faab9df305f1dc916d49d`
MD5	`a9361a300b1463c3ca51c1227821f4ca`
BLAKE2b-256	`28fc225bcaf271c5e518fe38f5a0b9d159b8015d5dbbb892b4b2c92cb430648e`
