An anime video filename parser

Aniparse

Aniparse is a Python library for parsing anime video filenames. It is simple to use and is based on the C++ library Anitomy <https://github.com/erengy/anitomy>, with many improvements.

Example

The following filenames

[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv
Toradora! S01E03-Your Song.mkv

can be parsed using the following code:

import aniparse

aniparse.parse('[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv')
{
    'anime_title': 'Toradora!',
    'anime_year': 2008,
    'audio_term': 'FLAC',
    'episode_number': 1,
    'episode_title': 'Tiger and Dragon',
    'file_checksum': '1234ABCD',
    'file_extension': 'mkv',
    'file_name': '[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv',
    'release_group': 'TaigaSubs',
    'release_version': 2,
    'video_resolution': '1280x720',
    'video_term': 'H.264'
}

aniparse.parse("Toradora! S01E03-Your Song.mkv")
{
    'anime_season': 1,
    'anime_season_prefix': 'S',
    'anime_title': 'Toradora!',
    'episode_number': 3,
    'episode_prefix': 'E',
    'episode_title': 'Your Song',
    'file_extension': 'mkv',
    'file_name': 'Toradora! S01E03-Your Song.mkv'
}

The parse function receives a string and returns a dictionary containing all the elements it found. It can also receive parsing options (explained below) and a keyword_manager.
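
Since the result is a plain dictionary, elements that were not found are simply absent, at least in the two outputs above. The following is a minimal sketch of consuming such a result. `describe` is a hypothetical helper and not part of Aniparse, and the dictionary is copied from the second example above rather than produced by a live call.

```python
def describe(parsed):
    """Build a short label from a parse result dictionary (hypothetical helper)."""
    parts = [parsed.get('anime_title', 'Unknown title')]
    season = parsed.get('anime_season')
    episode = parsed.get('episode_number')
    if season is not None and episode is not None:
        parts.append('S{:02d}E{:02d}'.format(int(season), int(episode)))
    elif episode is not None:
        parts.append('E{:02d}'.format(int(episode)))
    # Missing elements are missing keys, so test for presence before use.
    if 'episode_title' in parsed:
        parts.append(parsed['episode_title'])
    return ' - '.join(parts)


# Copied from the README output for 'Toradora! S01E03-Your Song.mkv'.
result = {
    'anime_season': 1,
    'anime_season_prefix': 'S',
    'anime_title': 'Toradora!',
    'episode_number': 3,
    'episode_prefix': 'E',
    'episode_title': 'Your Song',
    'file_extension': 'mkv',
    'file_name': 'Toradora! S01E03-Your Song.mkv',
}

print(describe(result))  # Toradora! - S01E03 - Your Song
```

Using `dict.get` and `in` keeps the helper safe for filenames where the season or the episode title could not be identified.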

How does it work?

Suppose that we're working on the following filename:

"Aim_For_The_Top!_Gunbuster-ep1.BD(H264.FLAC.10bit)[KAA][69ECCDCF].mkv"

The filename is first stripped of its extension and split into groups. Groups are determined by the position of brackets:

"Aim_For_The_Top!_Gunbuster-ep1.BD", "H264.FLAC.10bit", "KAA", "69ECCDCF"

Each group is then split into tokens. In our current example, the delimiter for the enclosed group is ., while the words in other groups are separated by _:

"Aim", "For", "The", "Top!", "Gunbuster-ep1", "BD", "H264", "FLAC", "10bit", "KAA", "69ECCDCF"

Note: the brackets and delimiters are also stored as tokens, with the categories Bracket and Delimiter respectively. Each token also remembers whether it is enclosed or not.
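
The grouping and tokenizing steps can be sketched as follows. This is an illustration under simplifying assumptions, not Aniparse's actual tokenizer: the `Token` tuple, the `tokenize` function, the three-character delimiter set, and the choice to mark bracket tokens as enclosed are all made up for this example, and every delimiter is applied uniformly to every group.

```python
import os
from collections import namedtuple

# Hypothetical token record: category, text, and whether it sits inside brackets.
Token = namedtuple('Token', ['category', 'content', 'enclosed'])

OPENING = '[({'
CLOSING = '])}'


def tokenize(filename, delimiters=' _.'):
    """Strip the extension, then emit Bracket, Delimiter and Unknown tokens."""
    stem, _extension = os.path.splitext(filename)
    tokens = []
    depth = 0   # current bracket nesting level
    word = ''   # characters collected for the current word

    def flush():
        nonlocal word
        if word:
            tokens.append(Token('Unknown', word, depth > 0))
            word = ''

    for ch in stem:
        if ch in OPENING:
            flush()
            depth += 1
            tokens.append(Token('Bracket', ch, True))
        elif ch in CLOSING:
            flush()  # flush first so the word is still counted as enclosed
            tokens.append(Token('Bracket', ch, True))
            depth = max(depth - 1, 0)
        elif ch in delimiters:
            flush()
            tokens.append(Token('Delimiter', ch, depth > 0))
        else:
            word += ch
    flush()
    return tokens


tokens = tokenize('Aim_For_The_Top!_Gunbuster-ep1.BD(H264.FLAC.10bit)[KAA][69ECCDCF].mkv')
words = [t.content for t in tokens if t.category == 'Unknown']
enclosed_words = [t.content for t in tokens if t.category == 'Unknown' and t.enclosed]
print(words)
print(enclosed_words)
```

For the example filename, `words` matches the token list shown above, and `enclosed_words` contains only the tokens that came from bracketed groups.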

Once the tokenizer is done, the parser comes into effect. First, all tokens are compared against a set of known keywords. In this case, the tokens BD, H264, FLAC, 10bit, and 69ECCDCF are recognized as keywords, and are assigned the categories Source, VideoTerm, AudioTerm, VideoResolution, and FileChecksum respectively.
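
A rough sketch of this keyword pass is shown here. The tiny `KEYWORDS` table and the `classify` function are hypothetical and cover only this example. Detecting the checksum with an eight-digit hexadecimal pattern is an assumption made for the sketch, not a description of how Aniparse recognizes it.

```python
import re

# Hypothetical keyword table, limited to the tokens of the running example.
KEYWORDS = {
    'BD': 'Source',
    'H264': 'VideoTerm',
    'FLAC': 'AudioTerm',
    '10BIT': 'VideoResolution',
}
# Assumption: a checksum is exactly eight hexadecimal digits (CRC32 style).
CHECKSUM = re.compile(r'[0-9A-Fa-f]{8}')


def classify(word):
    """Return the category of a token, or 'Unknown' if it is not a keyword."""
    category = KEYWORDS.get(word.upper())
    if category:
        return category
    if CHECKSUM.fullmatch(word):
        return 'FileChecksum'
    return 'Unknown'


words = ['Aim', 'For', 'The', 'Top!', 'Gunbuster-ep1', 'BD',
         'H264', 'FLAC', '10bit', 'KAA', '69ECCDCF']
unknown = [w for w in words if classify(w) == 'Unknown']
print(unknown)  # ['Aim', 'For', 'The', 'Top!', 'Gunbuster-ep1', 'KAA']
```

The tokens left in `unknown` are the ones the later steps still have to identify.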

"Aim", "For", "The", "Top!", "Gunbuster-ep1", "KAA"

The next step is to look for the episode number. Each token that contains a number is analyzed. Here, Gunbuster-ep1 contains a number, but it doesn't match the episode number pattern. In this case, the token is checked against the buggy dash pattern, so Gunbuster-ep1 is split into Gunbuster and ep1. The parser then checks again, and ep1 is recognized as an episode number. The category EpisodeNumber is assigned to it and the changes are saved.
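
The dash fallback can be illustrated with a short sketch. The `EPISODE` pattern and the `find_episode` function are assumptions for this example only. Aniparse's real episode patterns are more extensive and are not reproduced here.

```python
import re

# Hypothetical episode pattern: an optional-dot prefix followed by 1-4 digits.
EPISODE = re.compile(r'(?:ep|episode|e)\.?(\d{1,4})', re.IGNORECASE)


def find_episode(token):
    """Return (remaining_text, episode_number), or (token, None) if no match."""
    match = EPISODE.fullmatch(token)
    if match:
        return '', int(match.group(1))
    # Fallback: split on the last dash and test the right-hand part again.
    head, dash, tail = token.rpartition('-')
    if dash:
        match = EPISODE.fullmatch(tail)
        if match:
            return head, int(match.group(1))
    return token, None


print(find_episode('Gunbuster-ep1'))  # ('Gunbuster', 1)
print(find_episode('Top!'))           # ('Top!', None)
```

Splitting on the last dash only is a design choice of the sketch: it keeps dashes that belong to the title intact when the episode marker is glued to the end of the token.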

"Aim", "For", "The", "Top!", "Gunbuster", "KAA"

The next step is to look for the anime title. The parser tries to find unknown tokens that come before the episode number and are not inside brackets. In this case, Aim, For, The, Top!, and Gunbuster are unknown tokens and are not inside brackets, so they are assigned to the AnimeTitle category.

"KAA"

The next step is to look for the release group. The parser tries to find unknown tokens that come after the episode number and are inside brackets. In this case, KAA is an unknown token inside brackets, so it is assigned to the ReleaseGroup category.

The next step is to look for the episode title. The parser tries to find unknown tokens that come after the episode number and are not inside brackets. In this case, no unknown tokens are left, so the episode title is left empty.

Lastly, the parser tries to assign any remaining unknown tokens to a category, or to Others if they are not recognized.
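
The three positional rules for the anime title, release group, and episode title can be summarized in a small sketch. `assign_remaining` and the tuple layout are hypothetical. The real parser likely applies more conditions (for example, stopping at the first identified token), so treat this as a simplification.

```python
def assign_remaining(tokens, episode_index):
    """Apply the positional rules to (content, category, enclosed) tuples."""
    before = tokens[:episode_index]
    after = tokens[episode_index + 1:]

    def pick(candidates, enclosed):
        # Join the unknown tokens whose enclosed flag matches the rule.
        return ' '.join(content for content, category, is_enclosed in candidates
                        if category == 'Unknown' and is_enclosed == enclosed)

    return {
        'anime_title': pick(before, enclosed=False),    # before episode, not enclosed
        'release_group': pick(after, enclosed=True),    # after episode, enclosed
        'episode_title': pick(after, enclosed=False),   # after episode, not enclosed
    }


# Token state of the running example once the episode number has been found.
tokens = [
    ('Aim', 'Unknown', False), ('For', 'Unknown', False), ('The', 'Unknown', False),
    ('Top!', 'Unknown', False), ('Gunbuster', 'Unknown', False),
    ('ep1', 'EpisodeNumber', False), ('BD', 'Source', False),
    ('H264', 'VideoTerm', True), ('FLAC', 'AudioTerm', True),
    ('10bit', 'VideoResolution', True), ('KAA', 'Unknown', True),
    ('69ECCDCF', 'FileChecksum', True),
]
found = assign_remaining(tokens, episode_index=5)
print(found)
```

For the example, the title is "Aim For The Top! Gunbuster", the release group is "KAA", and the episode title is an empty string.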

Why should I use it?

Anime video files are commonly named in a format where the anime title is followed by the episode number, and all the technical details are enclosed within brackets. However, fansub groups tend to use their own naming conventions, and the problem is more complicated than it first appears:

  • Element order is not always the same.
  • Technical information is not guaranteed to be enclosed.
  • Brackets and parentheses may be grouping symbols or a part of the anime/episode title.
  • Space and underscore are not the only delimiters in use.
  • A single filename may contain multiple delimiters.

There are so many cases to cover that it's simply not possible to parse all filenames solely with regular expressions. Aniparse tries a different approach, and it succeeds: It's able to parse tens of thousands of filenames, with great accuracy.

Are there any exceptions?

Yes, unfortunately. Aniparse fails to identify the anime title and episode number on rare occasions, mostly due to bad naming conventions. See the examples below.

Arigatou.Shuffle!.Ep08.[x264.AAC][D6E43829].mkv

Here, Aniparse would report that this file is the 8th episode of Arigatou Shuffle!, where Arigatou is actually the name of the fansub group.

Spice and Wolf 2

Is this the 2nd episode of Spice and Wolf, or a batch release of Spice and Wolf 2? Without any text after the number, there's no way to know. It's up to you to consider both cases.

Suggestions to fansub groups

Please consider abiding by these simple rules before deciding on your naming convention:

  • Don't enclose anime title, episode number and episode title within brackets. Enclose everything else, including the name of your group.
  • Don't use parentheses to enclose release information; use square brackets instead. Parentheses should only be used if they are a part of the anime/episode title.
  • Don't use multiple delimiters in a single filename. If possible, stick with either space or underscore.
  • Use a separator (e.g. a dash) between anime title and episode number. There are anime titles that end with a number, which creates ambiguity.
  • Indicate the episode interval in batch releases.

Installation

To install Aniparse, simply use pip:

pip install aniparse

Or download the source code and run the following inside its folder:

python setup.py install

Options

The parse function can receive the options parameter. E.g.:

import aniparse

aniparse_options = {'allowed_delimiters': ' '}
aniparse.parse('DRAMAtical Murder Episode 1 - Data_01_Login', options=aniparse_options)
{
    'anime_title': 'DRAMAtical Murder',
    'episode_prefix': 'Episode',
    'episode_number': '1',
    'episode_title': 'Data_01_Login',
    'file_name': 'DRAMAtical Murder Episode 1 - Data_01_Login'
}

If the default options had been used, the parser would have considered _ as a delimiter and replaced it with space in the episode title.
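
The effect of allowed_delimiters on a title can be mimicked with a few lines of plain Python. `normalize` is a hypothetical stand-in for illustration and does not call Aniparse. Only the default delimiter string is taken from the options table.

```python
def normalize(text, allowed_delimiters=' _.&+,|'):
    """Replace every allowed delimiter with a space and collapse repeats."""
    cleaned = ''.join(' ' if ch in allowed_delimiters else ch for ch in text)
    return ' '.join(cleaned.split())


print(normalize('Data_01_Login'))                          # Data 01 Login
print(normalize('Data_01_Login', allowed_delimiters=' '))  # Data_01_Login
```

With only the space allowed as a delimiter, underscores are treated as ordinary characters and survive in the episode title.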

The options contain the following attributes:

Attribute name        Type             Default value  Description
allowed_delimiters    string           ' _.&+,|'      The characters to be considered as delimiters.
check_title_enclosed  boolean          True           Look for the anime title inside brackets if no title is found.
eps_lower_than_alt    boolean          True           Set the episode number to the lowest value and the alt to the highest.
ignored_dash          boolean          True           Whether dashes in the anime/episode title should be ignored.
ignored_strings       list of strings  []             A list of strings to be removed from the filename during parsing.
keep_delimiters       boolean          False          Whether delimiters should be kept in the anime/episode title.
max_extension_length  integer          4              Maximum extension length.
title_before_episode  boolean          True           Whether the anime title should come before the episode number.
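
When several options are overridden at once, it can help to validate the names before passing the dictionary to parse. The sketch below does this against the defaults listed in the table. `DEFAULT_OPTIONS` and `build_options` are hypothetical names for this example, and how Aniparse itself merges partial options is not shown in this document.

```python
import copy

# Defaults transcribed from the options table above.
DEFAULT_OPTIONS = {
    'allowed_delimiters': ' _.&+,|',
    'check_title_enclosed': True,
    'eps_lower_than_alt': True,
    'ignored_dash': True,
    'ignored_strings': [],
    'keep_delimiters': False,
    'max_extension_length': 4,
    'title_before_episode': True,
}


def build_options(overrides=None):
    """Return a full options dictionary, rejecting misspelled option names."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError('Unknown option(s): ' + ', '.join(sorted(unknown)))
    merged = copy.deepcopy(DEFAULT_OPTIONS)  # keep the shared defaults untouched
    merged.update(overrides)
    return merged


options = build_options({'allowed_delimiters': ' ', 'keep_delimiters': True})
print(options['allowed_delimiters'] == ' ', options['max_extension_length'])
```

Rejecting unknown names up front catches typos such as `allowed_delimiter`, which would otherwise be easy to miss.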

License

Aniparse is licensed under Mozilla Public License 2.0.
