Skip to main content

A probability-based anime filename parser

Project description

Aniparse

A probability-based anime filename parser for Python.

Aniparse parses anime video filenames into structured metadata. Unlike regex-based approaches, it uses a scoring engine where confidence accumulates from multiple signals — position, context, keywords, and patterns — so it handles the wild variety of fansub naming conventions gracefully.

Based on the C++ library Anitomy, redesigned from the ground up in v2.

Installation

pip install aniparse

Usage

import aniparse

aniparse.parse('[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv')
{
    'file_name': '[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv',
    'audio_term': ['FLAC'],
    'file_extension': 'mkv',
    'file_checksum': '1234ABCD',
    'video_resolution': [{'video_height': 720, 'video_width': 1280}],
    'release_version': ['2'],
    'release_group': ['TaigaSubs'],
    'series': [{
        'title': 'Toradora!',
        'year': [{'number': 2008}],
        'episode': [{'number': 1, 'release_version': '2', 'title': 'Tiger and Dragon'}],
    }],
    'video_term': ['H.264'],
}

The parse function returns a dict with all identified metadata, or None if the input is empty.

Alternative titles

Pipe | is a first-class separator. Each segment after the first becomes an alternative series entry:

aniparse.parse('[TROLLORANGE] Hell Girl Season 4 (CR WEB-DL 1080p x264 AAC) | Hell Girl: Fourth Twilight')
{
    'file_name': '[TROLLORANGE] Hell Girl Season 4 (CR WEB-DL 1080p x264 AAC) | Hell Girl: Fourth Twilight',
    'audio_term': ['AAC'],
    'video_resolution': [{'video_height': 1080, 'scan_method': 'p'}],
    'release_group': ['TROLLORANGE'],
    'release_information': ['CR'],
    'series': [
        {'title': 'Hell Girl', 'season': [{'number': 4}]},
        {'title': 'Hell Girl: Fourth Twilight'},
    ],
    'source': ['WEB-DL'],
    'video_term': ['x264'],
}

Path and folder context

# Parse from a full path
aniparse.parse('', path='/anime/Toradora/[Group] Toradora! - 01.mkv')

# Or pass folder separately
aniparse.parse('[Group] Toradora! - 01.mkv', folder='/anime/Toradora')
# aniparse.parse('[Group] Toradora! - 01.mkv', folder='/anime/Toradora')
{
    'file_name': '[Group] Toradora! - 01.mkv',
    'file_extension': 'mkv',
    'release_group': ['Group'],
    'series': [
        {'title': 'Toradora!', 'episode': [{'number': 1}]},
    ],
    'folder_name': '/anime/Toradora',
}

# aniparse.parse('01 - surge.mkv', path='series s1+s2+s3/season1/01 - surge.mkv')
{
    'file_name': '01 - surge.mkv',
    'file_extension': 'mkv',
    'series': [{
        'episode': [{'number': 1, 'title': 'surge'}],
        'title': 'series',
        'season': [{'number': 1}],
    }],
    'folder_name': 'series s1+s2+s3/season1',
}

Custom instance

For repeated parsing with custom settings:

from aniparse import Aniparse, ParserConfig

parser = Aniparse(config=ParserConfig(fuzzy=True))
result = parser.parse('[Group] Title - 01 [1080p].mkv')

Custom keywords

Provide your own WordListManager to extend or replace the built-in keyword lists:

from aniparse import Aniparse, WordListManager

parser = Aniparse(wordlist_provider=my_wordlist_manager)
result = parser.parse(filename)

Debug mode

Pass debug=True to include the token scoring breakdown in the output:

aniparse.parse(filename, debug=True)

Output structure

The output is a flat dict with these top-level keys:

Key Type Description
file_name str Original input filename
file_extension str File extension
file_checksum str CRC32 checksum (e.g. 1234ABCD)
file_index int File index number
series list[SeriesInfo] Series metadata (title, episodes, seasons, etc.)
audio_term list[str] Audio codec terms (FLAC, AAC, etc.)
video_term list[str] Video codec terms (H.264, x265, etc.)
video_resolution list[VideoResolution] Resolution info (height, width, scan method)
source list[str] Source terms (Blu-ray, WEB-DL, etc.)
release_group list[str] Release group names
release_information list[str] Release info (BATCH, REMASTER, etc.)
release_version list[str] Version strings
language list[str] Language tags
subs_term list[str] Subtitle terms (Subbed, Hardsub, etc.)
device_compatibility list[str] Device compatibility tags

Each SeriesInfo contains:

Key Type Description
title str Series title
type str Series type (OVA, Movie, TV, Special, etc.)
year list[Sequence] Year(s)
season list[Sequence] Season number(s)
episode list[Sequence] Episode number(s), with optional title, release_version, part
volume list[Sequence] Volume number(s)
content_type list[Sequence] Content type (NCOP, NCED, PV, etc.) with optional identifier

Episode/season/volume entries support ranges (start/end), totals (number, total for "X of Y"), and alternatives.

Only present keys are included — None values are omitted.

Configuration

ParserConfig options:

Attribute Type Default Description
year_min int 1900 Minimum valid year
year_max int 2099 Maximum valid year
range_total set[str] {"of"} Connectors for "X of Y" patterns
range_separator set[str] {"-", "~", "&", "+"} Range delimiters
fuzzy bool False Enable fuzzy keyword matching
fuzzy_threshold float 0.8 Fuzzy match threshold

How does it work?

Aniparse processes filenames through a six-stage pipeline:

  1. Tokenize — Split input into tokens, detecting brackets, delimiters, and text boundaries
  2. Identify — Match tokens against keyword lists, assigning initial possibilities with base scores
  3. Expand — Pattern-based rules add new possibilities (checksums, numbers, years, titles, etc.)
  4. Score — Context-aware rules adjust confidence based on position, neighbors, brackets, and structural zones
  5. Resolve — Pick the winning possibility per token based on highest score
  6. Compose — Assemble tokens into the final metadata dict

This approach avoids hardcoded rules like "first bracket = release group". Instead, each token accumulates evidence from multiple signals, and the highest-confidence interpretation wins.

Why use Aniparse?

Anime filenames are notoriously inconsistent:

  • Element order varies between groups
  • Brackets and parentheses may be metadata containers or part of the title
  • Multiple delimiter styles coexist in a single filename
  • Numbers are ambiguous (episode? season? year? resolution?)

Regex-based parsers can't cover the combinatorial explosion of conventions. Aniparse's scoring approach handles tens of thousands of filenames with high accuracy.

Known limitations

  • Single-letter "E" episode prefix can be too aggressive in brackets
  • Number-dash-number in titles (e.g., 009-1) may be parsed as episode ranges
  • CJK language descriptors may be included in the title
  • Parenthesized alternative series after metadata may not be detected

License

Aniparse is licensed under Mozilla Public License 2.0.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aniparse-2.0.0.tar.gz (88.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

aniparse-2.0.0-py3-none-any.whl (86.5 kB view details)

Uploaded Python 3

File details

Details for the file aniparse-2.0.0.tar.gz.

File metadata

  • Download URL: aniparse-2.0.0.tar.gz
  • Upload date:
  • Size: 88.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for aniparse-2.0.0.tar.gz
Algorithm Hash digest
SHA256 5630844597904415b5f0f9aa20d627f4eaa72599db0b89cdb044b4edbe47e801
MD5 6652297b90a2b8b96d186fac724c7a28
BLAKE2b-256 6e78593ec9b12630da151bf5e70bda6754d21445c11ed9799529bdda1e6569df

See more details on using hashes here.

Provenance

The following attestation bundles were made for aniparse-2.0.0.tar.gz:

Publisher: python-publish.yml on MeGaNeKoS/Aniparse

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aniparse-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: aniparse-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 86.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for aniparse-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2ded80e9d0975a2bbd4bb099cf4a42cf098fad60bc363d170bbe42dc38978756
MD5 b77f4b0bd8c7407892a9f7932dc8fe40
BLAKE2b-256 5bd87e2bc8e830226f60093a6aec65e45db77ffae2e6c786c34a99b95e0c78c0

See more details on using hashes here.

Provenance

The following attestation bundles were made for aniparse-2.0.0-py3-none-any.whl:

Publisher: python-publish.yml on MeGaNeKoS/Aniparse

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page