Aniparse
A probability-based anime filename parser for Python.
Aniparse parses anime video filenames into structured metadata. Unlike regex-based approaches, it uses a scoring engine in which confidence accumulates from multiple signals (position, context, keywords, and patterns), so it copes with the wide variety of fansub naming conventions.
Based on the C++ library Anitomy, redesigned from the ground up in v2.
Installation
pip install aniparse
Usage
```python
import aniparse

aniparse.parse('[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv')
```

```python
{
    'file_name': '[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv',
    'audio_term': ['FLAC'],
    'file_extension': 'mkv',
    'file_checksum': '1234ABCD',
    'video_resolution': [{'video_height': 720, 'video_width': 1280}],
    'release_version': ['2'],
    'release_group': ['TaigaSubs'],
    'series': [{
        'title': 'Toradora!',
        'year': [{'number': 2008}],
        'episode': [{'number': 1, 'release_version': '2', 'title': 'Tiger and Dragon'}],
    }],
    'video_term': ['H.264'],
}
```
The parse function returns a dict with all identified metadata, or None if the input is empty.
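Because missing values are omitted and empty input yields None, code that reads the result should not assume any key exists. A minimal sketch, written against the dict shape shown above so it runs without aniparse installed; first_episode_number is a hypothetical helper, not part of aniparse.

```python
from typing import Any, Optional


def first_episode_number(result: Optional[dict[str, Any]]) -> Optional[int]:
    """Return the first episode number found across the series entries, if any.

    `result` is assumed to have the shape returned by aniparse.parse:
    a dict, or None when the input was empty.
    """
    if result is None:  # parse() returns None for empty input
        return None
    # .get() everywhere: keys without a value are left out of the output.
    for series in result.get('series', []):
        for episode in series.get('episode', []):
            if 'number' in episode:
                return episode['number']
    return None


# Output copied from the Toradora example above, trimmed to the relevant keys.
example = {
    'file_name': '[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv',
    'series': [{
        'title': 'Toradora!',
        'episode': [{'number': 1, 'release_version': '2', 'title': 'Tiger and Dragon'}],
    }],
}

print(first_episode_number(example))  # 1
print(first_episode_number(None))     # None
```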
Alternative titles
The pipe character (|) is a first-class separator. Each segment after the first becomes an alternative series entry:
```python
aniparse.parse('[TROLLORANGE] Hell Girl Season 4 (CR WEB-DL 1080p x264 AAC) | Hell Girl: Fourth Twilight')
```

```python
{
    'file_name': '[TROLLORANGE] Hell Girl Season 4 (CR WEB-DL 1080p x264 AAC) | Hell Girl: Fourth Twilight',
    'audio_term': ['AAC'],
    'video_resolution': [{'video_height': 1080, 'scan_method': 'p'}],
    'release_group': ['TROLLORANGE'],
    'release_information': ['CR'],
    'series': [
        {'title': 'Hell Girl', 'season': [{'number': 4}]},
        {'title': 'Hell Girl: Fourth Twilight'},
    ],
    'source': ['WEB-DL'],
    'video_term': ['x264'],
}
```
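The segmentation rule by itself can be illustrated with a plain split on | followed by whitespace stripping. This is a simplified sketch, not aniparse's tokenizer: in the real output the first segment is further parsed into the title "Hell Girl", season 4, and the metadata terms. split_alternatives is a hypothetical name.

```python
def split_alternatives(name: str) -> list[str]:
    """Split a filename on '|' into a primary segment plus alternative titles.

    Empty segments (e.g. from a trailing '|') are dropped.
    """
    return [segment.strip() for segment in name.split('|') if segment.strip()]


primary, *alternatives = split_alternatives(
    '[TROLLORANGE] Hell Girl Season 4 (CR WEB-DL 1080p x264 AAC) | Hell Girl: Fourth Twilight'
)
print(primary)       # [TROLLORANGE] Hell Girl Season 4 (CR WEB-DL 1080p x264 AAC)
print(alternatives)  # ['Hell Girl: Fourth Twilight']
```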
Path and folder context
```python
# Parse from a full path
aniparse.parse('', path='/anime/Toradora/[Group] Toradora! - 01.mkv')

# Or pass folder separately
aniparse.parse('[Group] Toradora! - 01.mkv', folder='/anime/Toradora')
```

```python
# aniparse.parse('[Group] Toradora! - 01.mkv', folder='/anime/Toradora')
{
    'file_name': '[Group] Toradora! - 01.mkv',
    'file_extension': 'mkv',
    'release_group': ['Group'],
    'series': [
        {'title': 'Toradora!', 'episode': [{'number': 1}]},
    ],
    'folder_name': '/anime/Toradora',
}
```

```python
# aniparse.parse('01 - surge.mkv', path='series s1+s2+s3/season1/01 - surge.mkv')
{
    'file_name': '01 - surge.mkv',
    'file_extension': 'mkv',
    'series': [{
        'episode': [{'number': 1, 'title': 'surge'}],
        'title': 'series',
        'season': [{'number': 1}],
    }],
    'folder_name': 'series s1+s2+s3/season1',
}
```
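In the second example, file_name and folder_name line up with a plain split of the path at its last separator. The sketch below reproduces that split with the standard library's pathlib.PurePosixPath; it only shows how the two fields relate and says nothing about how aniparse extracts the title and season from the folder. split_path is a hypothetical helper, and PurePosixPath only understands "/" separators, so Windows-style paths would need PureWindowsPath instead.

```python
from pathlib import PurePosixPath


def split_path(path: str) -> tuple[str, str]:
    """Split a POSIX-style path into (folder, file name)."""
    p = PurePosixPath(path)
    return str(p.parent), p.name


folder, name = split_path('series s1+s2+s3/season1/01 - surge.mkv')
print(folder)  # series s1+s2+s3/season1
print(name)    # 01 - surge.mkv
```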
Custom instance
For repeated parsing with custom settings:
```python
from aniparse import Aniparse, ParserConfig

parser = Aniparse(config=ParserConfig(fuzzy=True))
result = parser.parse('[Group] Title - 01 [1080p].mkv')
```
Custom keywords
Provide your own WordListManager to extend or replace the built-in keyword lists:
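```python
from aniparse import Aniparse, WordListManager

# my_wordlist_manager: a WordListManager configured with your own keyword lists
parser = Aniparse(wordlist_provider=my_wordlist_manager)
result = parser.parse('[Group] Title - 01 [1080p].mkv')
```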
Debug mode
Pass debug=True to include the token scoring breakdown in the output:
```python
aniparse.parse('[Group] Title - 01 [1080p].mkv', debug=True)
```
Output structure
The output is a dict with these top-level keys:

| Key | Type | Description |
|---|---|---|
| file_name | str | Original input filename |
| file_extension | str | File extension |
| file_checksum | str | CRC32 checksum (e.g. 1234ABCD) |
| file_index | int | File index number |
| series | list[SeriesInfo] | Series metadata (title, episodes, seasons, etc.) |
| audio_term | list[str] | Audio codec terms (FLAC, AAC, etc.) |
| video_term | list[str] | Video codec terms (H.264, x265, etc.) |
| video_resolution | list[VideoResolution] | Resolution info (height, width, scan method) |
| source | list[str] | Source terms (Blu-ray, WEB-DL, etc.) |
| release_group | list[str] | Release group names |
| release_information | list[str] | Release info (BATCH, REMASTER, etc.) |
| release_version | list[str] | Version strings |
| language | list[str] | Language tags |
| subs_term | list[str] | Subtitle terms (Subbed, Hardsub, etc.) |
| device_compatibility | list[str] | Device compatibility tags |
| folder_name | str | Folder path, when path or folder is passed to parse |
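For type-checked code, the table can be mirrored as a TypedDict. This is a hypothetical annotation written from the table and the examples (aniparse is not documented here as exporting such types); total=False reflects that keys without a value are omitted, and SeriesInfo entries are left as plain dicts.

```python
from typing import Any, TypedDict


class VideoResolution(TypedDict, total=False):
    # Keys observed in the examples above.
    video_height: int
    video_width: int
    scan_method: str


class AniparseResult(TypedDict, total=False):
    # total=False: aniparse leaves out keys that have no value.
    file_name: str
    file_extension: str
    file_checksum: str
    file_index: int
    series: list[dict[str, Any]]  # SeriesInfo entries, described below
    audio_term: list[str]
    video_term: list[str]
    video_resolution: list[VideoResolution]
    source: list[str]
    release_group: list[str]
    release_information: list[str]
    release_version: list[str]
    language: list[str]
    subs_term: list[str]
    device_compatibility: list[str]
    folder_name: str  # seen in the path/folder examples


# The folder example from above, annotated with the sketch type.
example: AniparseResult = {
    'file_name': '[Group] Toradora! - 01.mkv',
    'file_extension': 'mkv',
    'release_group': ['Group'],
    'series': [{'title': 'Toradora!', 'episode': [{'number': 1}]}],
    'folder_name': '/anime/Toradora',
}
```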
Each SeriesInfo contains:
| Key | Type | Description |
|---|---|---|
| title | str | Series title |
| type | str | Series type (OVA, Movie, TV, Special, etc.) |
| year | list[Sequence] | Year(s) |
| season | list[Sequence] | Season number(s) |
| episode | list[Sequence] | Episode number(s), with optional title, release_version, part |
| volume | list[Sequence] | Volume number(s) |
| content_type | list[Sequence] | Content type (NCOP, NCED, PV, etc.) with optional identifier |
Episode, season, and volume entries support ranges (start/end), totals (number and total, for "X of Y" patterns), and alternatives.
Only keys with a value are included; None values are omitted.
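The examples above only show single-number entries, so the exact layout of a range entry is an assumption here: the sketch treats {'start': a, 'end': b} as an inclusive range and {'number': n} (optionally with 'total') as a single episode, following the description. episode_numbers is a hypothetical helper for flattening such entries; check real output before relying on these key names.

```python
from typing import Any


def episode_numbers(entries: list[dict[str, Any]]) -> list[int]:
    """Flatten Sequence entries into a sorted list of distinct episode numbers.

    Assumed shapes (inferred from the README's description, not confirmed):
      {'number': 5}                -> single episode
      {'number': 3, 'total': 12}   -> "3 of 12"; only the number is used
      {'start': 1, 'end': 4}       -> inclusive range 1..4
    """
    numbers: set[int] = set()
    for entry in entries:
        if 'number' in entry:
            numbers.add(int(entry['number']))
        elif 'start' in entry and 'end' in entry:
            numbers.update(range(int(entry['start']), int(entry['end']) + 1))
    return sorted(numbers)


print(episode_numbers([{'start': 1, 'end': 4}, {'number': 6, 'total': 12}]))
# [1, 2, 3, 4, 6]
```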
Configuration
ParserConfig options:
| Attribute | Type | Default | Description |
|---|---|---|---|
| year_min | int | 1900 | Minimum valid year |
| year_max | int | 2099 | Maximum valid year |
| range_total | set[str] | {"of"} | Connectors for "X of Y" patterns |
| range_separator | set[str] | {"-", "~", "&", "+"} | Range delimiters |
| fuzzy | bool | False | Enable fuzzy keyword matching |
| fuzzy_threshold | float | 0.8 | Fuzzy match threshold |
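To make range_total, range_separator, and fuzzy_threshold concrete, here is a toy illustration using the defaults from the table. It is not aniparse's implementation: parse_numbered, fuzzy_match, and their return shapes are hypothetical, and difflib's SequenceMatcher ratio is a stand-in, since the similarity measure aniparse uses for fuzzy matching is not documented here.

```python
import re
from difflib import SequenceMatcher

RANGE_SEPARATOR = {"-", "~", "&", "+"}  # ParserConfig.range_separator default
RANGE_TOTAL = {"of"}                    # ParserConfig.range_total default
FUZZY_THRESHOLD = 0.8                   # ParserConfig.fuzzy_threshold default


def parse_numbered(text: str) -> dict[str, int]:
    """Read 'A<sep>B' as a range and 'A <connector> B' as a total.

    Returns {'number': n}, {'start': a, 'end': b}, or {'number': n, 'total': t}.
    """
    text = text.strip()
    # "X of Y": the connector must be a separate whitespace-delimited word.
    parts = text.split()
    if (len(parts) == 3 and parts[1].lower() in RANGE_TOTAL
            and parts[0].isdigit() and parts[2].isdigit()):
        return {'number': int(parts[0]), 'total': int(parts[2])}
    # "A-B", "A~B", ...: any configured separator character between two numbers.
    seps = ''.join(re.escape(s) for s in sorted(RANGE_SEPARATOR))
    match = re.fullmatch(rf'(\d+)\s*[{seps}]\s*(\d+)', text)
    if match:
        return {'start': int(match.group(1)), 'end': int(match.group(2))}
    if text.isdigit():
        return {'number': int(text)}
    raise ValueError(f'not a numbered token: {text!r}')


def fuzzy_match(token: str, keyword: str, threshold: float = FUZZY_THRESHOLD) -> bool:
    """Case-insensitive similarity check against a keyword (difflib stand-in)."""
    return SequenceMatcher(None, token.lower(), keyword.lower()).ratio() >= threshold


print(parse_numbered('01-12'))             # {'start': 1, 'end': 12}
print(parse_numbered('3 of 12'))           # {'number': 3, 'total': 12}
print(fuzzy_match('Blu-Ray', 'BluRay'))    # True
```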
How does it work?
Aniparse processes filenames through a six-stage pipeline:
- Tokenize — Split input into tokens, detecting brackets, delimiters, and text boundaries
- Identify — Match tokens against keyword lists, assigning initial possibilities with base scores
- Expand — Pattern-based rules add new possibilities (checksums, numbers, years, titles, etc.)
- Score — Context-aware rules adjust confidence based on position, neighbors, brackets, and structural zones
- Resolve — Pick the winning possibility per token based on highest score
- Compose — Assemble tokens into the final metadata dict
This approach avoids hardcoded rules like "first bracket = release group". Instead, each token accumulates evidence from multiple signals, and the highest-confidence interpretation wins.
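The tokenize, score, and resolve steps can be illustrated with a toy model. The labels, weights, and rules below are invented for illustration and are not aniparse's; the point is that a signal such as "leading bracket" only adds evidence, so a leading [1080p] still resolves to a resolution rather than a release group.

```python
import re
from collections import defaultdict

# A bracketed group, or a run of characters that are not whitespace, brackets, or '_'.
TOKEN_RE = re.compile(r'\[([^\]]*)\]|([^\s\[\]_]+)')
DELIMITERS = {'-', '~', '|'}


def tokenize(name: str) -> list[tuple[str, bool]]:
    """Return (text, in_brackets) pairs, dropping stand-alone delimiters."""
    tokens = []
    for m in TOKEN_RE.finditer(name):
        if m.group(1) is not None:
            if m.group(1).strip():
                tokens.append((m.group(1).strip(), True))
        elif m.group(2) not in DELIMITERS:
            tokens.append((m.group(2), False))
    return tokens


def score(tokens: list[tuple[str, bool]]) -> list[dict[str, float]]:
    """Accumulate evidence per label for each token (weights are made up)."""
    all_scores = []
    for i, (text, bracketed) in enumerate(tokens):
        s: dict[str, float] = defaultdict(float)
        s['title'] += 0.1 if bracketed else 0.5           # weak default
        if re.fullmatch(r'\d{3,4}p', text, re.IGNORECASE):
            s['video_resolution'] += 3.0                   # pattern signal
        if text.isdigit():
            s['episode'] += 1.0                            # a bare number
            if not bracketed:
                s['episode'] += 1.0                        # context: outside brackets
            if 1900 <= int(text) <= 2099:
                s['year'] += 2.5                           # inside the default year window
        if bracketed and i == 0:
            s['release_group'] += 1.5                      # position: leading bracket
        all_scores.append(s)
    return all_scores


def resolve(tokens: list[tuple[str, bool]]) -> list[tuple[str, str]]:
    """Pick the highest-scoring label for every token."""
    return [(text, max(s, key=s.get)) for (text, _), s in zip(tokens, score(tokens))]


print(resolve(tokenize('[Group] Title - 01 [1080p]')))
# [('Group', 'release_group'), ('Title', 'title'), ('01', 'episode'), ('1080p', 'video_resolution')]
```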
Why use Aniparse?
Anime filenames are notoriously inconsistent:
- Element order varies between groups
- Brackets and parentheses may be metadata containers or part of the title
- Multiple delimiter styles coexist in a single filename
- Numbers are ambiguous (episode? season? year? resolution?)
Regex-based parsers can't cover the combinatorial explosion of conventions. Aniparse's scoring approach handles tens of thousands of filenames with high accuracy.
Known limitations
- Single-letter "E" episode prefix can be too aggressive in brackets
- Number-dash-number in titles (e.g., 009-1) may be parsed as episode ranges
- CJK language descriptors may be included in the title
- Parenthesized alternative series after metadata may not be detected
License
Aniparse is licensed under Mozilla Public License 2.0.