Default parser for Paperless-ngx
This is a default parser which can be used by Paperless-ngx if there is no other suitable parser found for a given mime type.
It can archive documents of any mime type defined in /etc/mime.types. For every document consumed by this parser, the original file is archived and both a PDF and a thumbnail are generated.
If a file with a known encoding is parsed, its content is read and stored in the document's content metadata, and a PDF showing this content is generated. Otherwise, the content metadata is left empty and a PDF containing the following note is generated:
This document was archived by a default parser for Paperless-ngx.
original file name: $file_name
mime type: $mime_type
Download original file to work with it.
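The two cases described above can be sketched in Python as follows. This is only an illustration, not the parser's actual code: the function name `extract_content` is hypothetical, and "known encoding" is approximated here by trying a fixed list of encodings, which may differ from how the parser really detects encodings.

```python
from string import Template

# Note text as quoted above; $file_name and $mime_type are placeholders.
NOTE = Template(
    "This document was archived by a default parser for Paperless-ngx.\n"
    "original file name: $file_name\n"
    "mime type: $mime_type\n"
    "Download original file to work with it."
)


def extract_content(data: bytes, file_name: str, mime_type: str,
                    encodings=("utf-8",)):
    """Return (content, pdf_text) for the raw bytes of a consumed file.

    Assumption: a file has a "known encoding" if one of the given
    encodings decodes it without errors.
    """
    for encoding in encodings:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue
        # Known encoding: content metadata and PDF both show the text.
        return text, text
    # Unknown encoding: empty content, PDF shows the placeholder note.
    return "", NOTE.substitute(file_name=file_name, mime_type=mime_type)
```

For a plain text file the first element holds the file's text; for undecodable binary data it is empty and the second element holds the filled-in note.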
Prerequisites
This parser requires Gotenberg to be configured for Paperless-ngx.
Installation
- Install using PyPI:

      pip install paperlessngx-default-parser

  For Docker-based installations, use custom container initialization as described here: https://docs.paperless-ngx.com/advanced_usage/#custom-container-initialization

  Place a script with the following content in the directory for your container initialization scripts and make it executable:

      #!/bin/bash
      pip install paperlessngx-default-parser

- Add this parser to the PAPERLESS_APPS environment variable, e.g. in your paperless.conf:

      PAPERLESS_APPS="paperlessngx-default-parser.apps.DefaultParserConfig"
FAQ
Error: File type {mime-type} not supported
Paperless-ngx uses magic numbers to identify the mime type of a file which should be consumed/archived.
On the other hand, Paperless-ngx currently requires a custom parser to declare a dictionary of the mime types it supports, with one default extension per mime type; see also "Support for arbitrary binary files? #805" for a proposal to change this behaviour.
This default parser registers itself for all mime types defined in /etc/mime.types. For each mime type, it uses the first file extension listed in /etc/mime.types as the default extension, or an empty string if no extension is defined at all.
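The mapping described above can be sketched as follows. This is a minimal illustration, not the parser's actual code: the function name `default_extensions` is hypothetical, and the leading dot on extensions and the "first definition wins" rule for repeated mime types are assumptions.

```python
def default_extensions(mime_types_text: str) -> dict[str, str]:
    """Map each mime type to its first listed extension ("" if none)."""
    mapping: dict[str, str] = {}
    for line in mime_types_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mime_type, *extensions = line.split()
        # Assumption: keep the first definition if a type appears twice.
        mapping.setdefault(mime_type, "." + extensions[0] if extensions else "")
    return mapping


sample = """\
# mime type        extensions
text/plain         txt asc text
application/x-empty
"""
print(default_extensions(sample))
```

To apply this to a real system, the text of /etc/mime.types could be read with `pathlib.Path("/etc/mime.types").read_text()` and passed in.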
Since the magic numbers database and /etc/mime.types don't have to be - and in fact are not - in sync, the following situation might occur:
Paperless-ngx identifies - by using magic numbers - a mime type which is not listed in /etc/mime.types. This results in the error File type {mime-type} not supported because the default parser could not register itself for this mime type.
Solution: Add the missing mime type to /etc/mime.types.
Error: Not consuming file {filepath}: Unknown file extension.
At the moment, Paperless-ngx handles files differently depending on whether they are imported via the consumption directory or via the UI.
When importing a file via the UI, Paperless-ngx only checks the mime type of the file using magic numbers and then checks whether a parser is registered for this mime type.
When importing a file through the consumption directory, an additional check is done first:
Paperless-ngx collects all file extensions for the given mime type by looking at
- /etc/mime.types and
- the default extension a parser for this mime type declares.
A file in the consumption directory is then only consumed if its file extension matches one of these extensions.
For example:
Given a file test.yaml with mime type text/plain, importing via the UI successfully archives the document. Importing the same file via the consumption directory leads to the error Not consuming file /usr/src/paperless/consume/test.yaml: Unknown file extension.
Solution: Either import the file via UI or add the unknown file extension to the file extensions for this mime type in /etc/mime.types.
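The additional check for the consumption directory can be sketched like this. It is an illustration of the behaviour described above, not code from Paperless-ngx: the function name `is_consumable` and its parameters are hypothetical, and extensions are assumed to be written with a leading dot.

```python
from pathlib import PurePosixPath


def is_consumable(file_name: str, mime_types_extensions: set[str],
                  parser_default_extension: str) -> bool:
    """Check a file's extension against the extensions known for its mime type.

    mime_types_extensions: extensions listed in /etc/mime.types for the
    detected mime type; parser_default_extension: the default extension the
    registered parser declares for that mime type ("" if none).
    """
    allowed = set(mime_types_extensions)
    if parser_default_extension:
        allowed.add(parser_default_extension)
    return PurePosixPath(file_name).suffix in allowed


# text/plain as it might be listed in /etc/mime.types (assumed values)
text_plain = {".txt", ".asc", ".text"}
print(is_consumable("test.yaml", text_plain, ".txt"))
print(is_consumable("test.yaml", text_plain | {".yaml"}, ".txt"))
```

The first call models the failing case from the example; the second models the situation after the extension has been added to /etc/mime.types.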
File extension when downloading original file
At the moment Paperless-ngx uses a default extension per mime type when downloading an original file.
For example: files of mime type application/octet-stream will get file extension .bin, those with mime-type text/plain will get extension .txt when downloaded.
Solution: To use a downloaded file with the program it originates from, you may have to change its file extension manually.
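If many files need to be renamed, the change of extension can be scripted. The following is a small sketch; the helper name `with_original_extension` is hypothetical, and it only computes the new name rather than touching any file.

```python
from pathlib import PurePosixPath


def with_original_extension(downloaded_name: str, original_extension: str) -> str:
    """Return the downloaded file name with its extension replaced."""
    return str(PurePosixPath(downloaded_name).with_suffix(original_extension))


print(with_original_extension("test.txt", ".yaml"))
```

To rename a file on disk, the result could be passed to `pathlib.Path.rename`.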
How to modify /etc/mime.types used by Paperless-ngx
For example:
- Add missing mime types using add_missing_mime_types.sh (see examples there)
- Create your own custom container initialization script to add/modify mime types.
- Use your own mime.types file and bind it to /etc/mime.types
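As an illustration of the kind of change such a script or custom file would make, the following sketch adds a mime type or an extra extension to the text of a mime.types file. The function name `add_mime_entry` is hypothetical and unrelated to add_missing_mime_types.sh; the sketch assumes that entries have no inline comments.

```python
def add_mime_entry(mime_types_text: str, mime_type: str, extension: str = "") -> str:
    """Return mime.types content in which mime_type (and extension) is present."""
    extension = extension.lstrip(".")
    lines = mime_types_text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        fields = line.split()
        if fields and fields[0] == mime_type:
            # Mime type already listed: append the extension if it is new.
            if extension and extension not in fields[1:]:
                lines[i] = line.rstrip() + " " + extension
            return "\n".join(lines) + "\n"
    # Mime type not listed yet: add a new entry at the end.
    lines.append(f"{mime_type} {extension}".rstrip())
    return "\n".join(lines) + "\n"


print(add_mime_entry("text/plain txt\n", "text/plain", ".yaml"))
```

The returned text could then be written to a custom mime.types file that is bound to /etc/mime.types in the container.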