
Simple-minded facilities for inferring media information from filenames. This module consists mostly of lexical functions for extracting information from strings or for constructing media filenames from metadata, plus a few classes such as `EpisodeInfo` and `SeriesEpisodeInfo` for common descriptions.

Project description


Latest release 20260531: Bugfix a parse return value, small doc updates.

The default filename parsing rules are based on my personal convention, which is to name media files as:

series_name--episode_info--title--source--etc-etc.ext

where the components are:

  • series_name: the programme series name downcased and with whitespace replaced by dashes; in the case of standalone items like movies this is often the studio.
  • episode_info: a structured field with episode information, where N is a number: sN is a series/season, eN is an episode number within the season, xN is an "extra" (additional material supplied with the season), and so on.
  • title: the episode title downcased and with whitespace replaced by dashes.
  • source: the source of the media.
  • ext: filename extension such as mp4.
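A minimal sketch of splitting a filename in this convention into its components, assuming every part is present and in the order shown. The helper name split_media_filename is hypothetical and is not part of cs.mediainfo, whose parse_name and pathname_info functions do the real parsing.

```python
from pathlib import PurePath

def split_media_filename(filename: str, sep: str = "--") -> dict:
    """Hypothetical helper: split series--episode--title--source--etc.ext
    into named components by position."""
    path = PurePath(filename)
    stem, ext = path.stem, path.suffix.lstrip(".")
    parts = stem.split(sep)
    names = ("series_name", "episode_info", "title", "source")
    info = dict(zip(names, parts))
    # anything after the fourth part is kept as the trailing "etc" parts
    info["etc"] = parts[len(names):]
    info["ext"] = ext
    return info

print(split_media_filename("doctor-who--s01e02--the-end-of-the-world--bbc--1080p.mp4"))
```

Because it assigns components purely by position, this sketch would mislabel a movie filename that has no episode_info part; a real parser has to recognise what each part contains.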

As you may imagine, as a rule I dislike mixed-case filenames and filenames with embedded whitespace. I also like a media filename to contain enough information to identify the file contents in a compact, human-readable form.

Short summary:

  • EpisodeDatumDefn: An EpisodeInfo marker definition with the following components: name, the marker name, such as "series" or "episode"; prefix, the stub used in a filename, such as "s" or "e"; re, a regular expression to match the prefix and some digits.
  • EpisodeInfo: Trite class for episodic information, used to store, match or transcribe series/season, episode, etc values.
  • main: Main command line running some test code.
  • parse_name: Parse the descriptive part of a filename (the portion remaining after stripping the file extension) and yield (part,fields) for each part as delineated by sep.
  • part_to_title: Convert a filename part into a title string.
  • pathname_info: Parse information from the basename of a file pathname. Return a mapping of field => values in the order parsed.
  • scrub_title: Strip redundant text from the start of an episode title.
  • SeriesEpisodeInfo: Episode information from a TV series episode.
  • title_to_part: Convert a title string into a filename part. This is lossy; the part_to_title function cannot completely reverse this.

Module contents:

  • class EpisodeDatumDefn(EpisodeDatumDefn): An EpisodeInfo marker definition with the following components:
    • name: the marker name, such as "series" or "episode"
    • prefix: the stub used in a filename, such as "s" or "e"
    • re: a regular expression to match the prefix and some digits

EpisodeDatumDefn.parse(self, s, offset=0): Parse an episode datum from a string, returning the value and the new offset. Raise ValueError if the string doesn't match this definition.

Parameters:

  • s: the string
  • offset: parse offset, default 0
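A rough sketch of a marker definition with the (name, prefix, re) components and a parse(s, offset) method returning (value, new_offset). The class name DatumDefn and its make() constructor are hypothetical, and the assumption that the value is the integer following the prefix is mine, not taken from the module.

```python
import re
from collections import namedtuple

class DatumDefn(namedtuple("DatumDefn", "name prefix re")):
    """Hypothetical stand-in for EpisodeDatumDefn."""

    __slots__ = ()

    @classmethod
    def make(cls, name: str, prefix: str) -> "DatumDefn":
        # the regex matches the prefix and captures the digits after it
        return cls(name, prefix, re.compile(re.escape(prefix) + r"(\d+)"))

    def parse(self, s: str, offset: int = 0):
        """Return (value, new_offset); raise ValueError if nothing matches at offset."""
        m = self.re.match(s, offset)
        if m is None:
            raise ValueError(f"{self.name}: no {self.prefix!r} marker at offset {offset} of {s!r}")
        return int(m.group(1)), m.end()

SERIES = DatumDefn.make("series", "s")
EPISODE = DatumDefn.make("episode", "e")
value, offset = SERIES.parse("s02e05")
print(value, EPISODE.parse("s02e05", offset))
```

Returning the new offset lets callers chain definitions along one string, as the usage lines do for "s02e05".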

  • class EpisodeInfo(types.SimpleNamespace): Trite class for episodic information, used to store, match or transcribe series/season, episode, etc values.

EpisodeInfo.__getitem__(self, name): We can look up values by name.

EpisodeInfo.as_dict(self): Return the episode info as a dict.

EpisodeInfo.as_tags(self, prefix=None): Generator yielding the episode info as Tags.

EpisodeInfo.from_filename_part(s, offset=0): Factory to return an EpisodeInfo from a filename episode field.

Parameters:

  • s: the string containing the episode information
  • offset: the start of the episode information, default 0

The episode information must extend to the end of the string, because the factory returns only the information and not a new offset. See the parse_filename_part class method for the core parse.

EpisodeInfo.get(self, name, default=None): Look up value by name with default.

EpisodeInfo.parse_filename_part(s, offset=0): Parse episode information from a string, returning the matched fields and the new offset.

Parameters:

  • s: the string containing the episode information
  • offset: the starting offset of the information, default 0
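The split between a parse that returns (fields, new_offset) and a factory that insists the markers run to the end of the string can be sketched as below. The function names and the marker table (built from the sN/eN/xN convention described earlier) are hypothetical; the real EpisodeInfo methods return an EpisodeInfo rather than a plain dict.

```python
import re

# Hypothetical marker table from the sN/eN/xN convention; the real
# EpisodeInfo definitions may differ.
MARKERS = tuple(
    (name, re.compile(re.escape(prefix) + r"(\d+)"))
    for name, prefix in (("series", "s"), ("episode", "e"), ("extra", "x"))
)

def parse_episode_part(s, offset=0):
    """Parse consecutive markers such as 's02e05' starting at offset.
    Return (fields, new_offset), stopping at the first unrecognised text."""
    fields = {}
    while offset < len(s):
        for name, marker_re in MARKERS:
            m = marker_re.match(s, offset)
            if m:
                fields[name] = int(m.group(1))
                offset = m.end()
                break
        else:
            break  # nothing matched at this offset
    return fields, offset

def episode_from_part(s, offset=0):
    """Factory in the style of from_filename_part: the markers must
    extend to the end of s, so only the fields are returned."""
    fields, end = parse_episode_part(s, offset)
    if end != len(s):
        raise ValueError(f"unparsed text after episode info: {s[end:]!r}")
    return fields

print(parse_episode_part("s02e05-rose"))
print(episode_from_part("s01x3"))
```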

EpisodeInfo.season: .season property, synonym for .series
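A simplified analogue of the lookup behaviour described for EpisodeInfo: subscript access, get() with a default, as_dict(), and a read-only .season synonym for .series. The class name MiniEpisodeInfo is hypothetical, and raising KeyError for a missing subscript is an assumption.

```python
from types import SimpleNamespace

class MiniEpisodeInfo(SimpleNamespace):
    """Hypothetical simplified analogue of EpisodeInfo."""

    def __getitem__(self, name):
        # subscript access maps onto attribute access
        try:
            return getattr(self, name)
        except AttributeError:
            raise KeyError(name) from None

    def get(self, name, default=None):
        return getattr(self, name, default)

    def as_dict(self):
        # only the stored attributes; the .season property is not included
        return dict(vars(self))

    @property
    def season(self):
        # .season is a synonym for .series
        return self.series

ep = MiniEpisodeInfo(series=2, episode=5)
print(ep["episode"], ep.get("extra"), ep.season, ep.as_dict())
```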

  • main(argv=None): Main command line running some test code.

  • parse_name(name, sep='--'): Parse the descriptive part of a filename (the portion remaining after stripping the file extension) and yield (part,fields) for each part as delineated by sep.

  • part_to_title(part): Convert a filename part into a title string.

    Example:

    >>> part_to_title('episode-name')
    'Episode Name'
    
  • pathname_info(pathname): Parse information from the basename of a file pathname. Return a mapping of field => values in the order parsed.
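The relationship between parse_name and pathname_info can be sketched as follows: strip the directory and extension, split on sep, yield (part, fields) for each part, and merge the fields into one mapping in parse order. The recognisers here (only sN/eN/xN marker parts yield fields) and the function names are assumptions for illustration; the module's real rules are not described in this text.

```python
import os
import re

# Hypothetical recognisers; the real module's rules may differ.
MARKER_RE = re.compile(r"([sex])(\d+)")
EPISODE_PART_RE = re.compile(r"(?:[sex]\d+)+")
MARKER_NAMES = {"s": "series", "e": "episode", "x": "extra"}

def parse_name_sketch(name, sep="--"):
    """Yield (part, fields) for each sep-delimited part of name."""
    for part in name.split(sep):
        if EPISODE_PART_RE.fullmatch(part):
            fields = {MARKER_NAMES[k]: int(n) for k, n in MARKER_RE.findall(part)}
        else:
            fields = {}
        yield part, fields

def pathname_info_sketch(pathname):
    """Merge the fields from each part of the basename, in parse order."""
    name, _ext = os.path.splitext(os.path.basename(pathname))
    info = {}
    for _part, fields in parse_name_sketch(name):
        info.update(fields)  # dicts preserve insertion order, i.e. parse order
    return info

print(list(parse_name_sketch("doctor-who--s01e02--rose")))
print(pathname_info_sketch("/media/tv/doctor-who--s01e02--rose.mp4"))
```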

  • scrub_title(title: str, *, season=None, episode=None) -> str: Strip redundant text from the start of an episode title.

    I frequently get "title" strings with leading season/episode information. This function cleans up these strings to return the unadorned title.
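A sketch of this kind of cleanup, assuming a few common leading forms ("s01e02 - ", "1x02: ", "Episode 2. "). The function name, the prefix patterns, and the rule that a supplied season or episode number must agree with the prefix before it is stripped are all assumptions, not the module's documented behaviour.

```python
import re

# Hypothetical leading-prefix patterns; the real scrub_title may recognise others.
PREFIX_RES = (
    re.compile(r"s(?P<season>\d+)\s*e(?P<episode>\d+)", re.I),  # s01e02
    re.compile(r"(?P<season>\d+)x(?P<episode>\d+)", re.I),      # 1x02
    re.compile(r"episode\s+(?P<episode>\d+)", re.I),            # Episode 2
)
# separator between the prefix and the title: " - ", ": ", ". " or plain whitespace
SEP_RE = re.compile(r"\s*[-:.]\s*|\s+")

def scrub_title_sketch(title, *, season=None, episode=None):
    """Return title with a leading season/episode prefix removed."""
    for prefix_re in PREFIX_RES:
        m = prefix_re.match(title)
        if m is None:
            continue
        found = m.groupdict()
        # assumption: when numbers are supplied, only strip a prefix that agrees
        found_season = found.get("season")
        if season is not None and found_season is not None and int(found_season) != season:
            continue
        if episode is not None and int(found["episode"]) != episode:
            continue
        sep = SEP_RE.match(title, m.end())
        if sep:
            return title[sep.end():]
    return title

print(scrub_title_sketch("s01e02 - Rose"))
print(scrub_title_sketch("Episode 2. Rose"))
```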

  • class SeriesEpisodeInfo(cs.deco.Promotable): Episode information from a TV series episode.

SeriesEpisodeInfo.as_dict(self): Return the non-None values as a dict. Note that this uses dataclasses.asdict() and as such is a deep copy.
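To illustrate the deep-copy point, here is a sketch of an as_dict() built on dataclasses.asdict() that drops None values. The dataclass EpisodeSketch and its field names are hypothetical stand-ins for SeriesEpisodeInfo, whose actual fields are not listed here.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class EpisodeSketch:
    """Hypothetical dataclass standing in for SeriesEpisodeInfo."""
    series: Optional[str] = None
    season: Optional[int] = None
    episode: Optional[int] = None
    episode_title: Optional[str] = None
    tags: list = field(default_factory=list)

    def as_dict(self):
        # asdict() recursively copies the fields; then drop the None values
        return {k: v for k, v in asdict(self).items() if v is not None}

ep = EpisodeSketch(series="Doctor Who", season=1, episode=2, tags=["bbc"])
d = ep.as_dict()
d["tags"].append("edited")  # modifies the copy only
print(d, ep.tags)
```

Because asdict() builds new lists and dicts for nested values, mutating the returned dict does not affect the original object.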

SeriesEpisodeInfo.from_str(episode_title: str, series=None): Infer a SeriesEpisodeInfo from an episode title.

This recognises the common 'sSSeEE - Episode Title' format and variants such as 'Series Name - sSSeEE - Episode Title' or 'sSSeEE - Episode Title - Part: One'.
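One way to recognise these three formats is a single regular expression with optional leading series and trailing part groups. This is a sketch under assumptions: the names EpisodeGuess and episode_from_str are hypothetical, splitting "Part: One" into a separate field is my choice, and the series argument is treated as a fallback when the title itself names no series.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class EpisodeGuess:
    series: Optional[str]
    season: int
    episode: int
    title: str
    part: Optional[str] = None

EPISODE_TITLE_RE = re.compile(
    r"^(?:(?P<series>.+?)\s+-\s+)?"         # optional "Series Name - "
    r"s(?P<season>\d+)e(?P<episode>\d+)"     # sSSeEE
    r"\s+-\s+(?P<title>.+?)"                 # " - Episode Title"
    r"(?:\s+-\s+Part:\s*(?P<part>.+))?$",    # optional " - Part: One"
    re.I,
)

def episode_from_str(episode_title, series=None):
    m = EPISODE_TITLE_RE.match(episode_title)
    if m is None:
        raise ValueError(f"unrecognised episode title: {episode_title!r}")
    return EpisodeGuess(
        series=m.group("series") or series,  # fall back to the supplied series
        season=int(m.group("season")),
        episode=int(m.group("episode")),
        title=m.group("title"),
        part=m.group("part"),
    )

print(episode_from_str("Doctor Who - s01e02 - Rose"))
print(episode_from_str("s02e05 - The Daleks - Part: One", series="Doctor Who"))
```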

  • title_to_part(title): Convert a title string into a filename part. This is lossy; the part_to_title function cannot completely reverse this.

    Example:

    >>> title_to_part('Episode Name')
    'episode-name'
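The lossiness is easy to see with a pair of minimal conversions, assuming title_to_part just downcases and replaces whitespace with dashes, and part_to_title turns dashes back into spaces and capitalises each word. The _sketch names are hypothetical; the real functions may handle more cases.

```python
def title_to_part_sketch(title: str) -> str:
    # downcase and replace each run of whitespace with a single dash
    return "-".join(title.lower().split())

def part_to_title_sketch(part: str) -> str:
    # dashes back to spaces, then capitalise each word
    return " ".join(word.capitalize() for word in part.split("-") if word)

part = title_to_part_sketch("The X-Files")
print(part, "->", part_to_title_sketch(part))
```

The original dash in "X-Files" and any mixed case cannot be told apart from what the conversion produced, so the round trip yields "The X Files".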
    

Release Log

Release 20260531: Bugfix a parse return value, small doc updates.

Release 20240519: Initial PyPI release, particularly for SeriesEpisodeInfo which I use in cs.app.playon.



Download files


Source Distribution

cs_mediainfo-20260531.tar.gz (6.1 kB)

Uploaded Source

Built Distribution


cs_mediainfo-20260531-py2.py3-none-any.whl (7.7 kB)

Uploaded Python 2, Python 3

File details

Details for the file cs_mediainfo-20260531.tar.gz.

File metadata

  • Download URL: cs_mediainfo-20260531.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for cs_mediainfo-20260531.tar.gz
Algorithm Hash digest
SHA256 a1a5c99883066ace72bbc657d7f2bc32d0c8d151e540c3714d5f4e678cec402c
MD5 28635c390c8cbdcc52c92913fee2da1f
BLAKE2b-256 1af810e7dcf73c9b857be92b80ae99e0edb27902fbbc1ef7ba6db4209f07e2b7


File details

Details for the file cs_mediainfo-20260531-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for cs_mediainfo-20260531-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 7737c786eb3c72cfd2ba1dce51496121773d3af47bdb765f62de6c42d6612c62
MD5 f64dc5f8194b17a23548dccc59788060
BLAKE2b-256 68b92fb900c54f3c2c0aa67f5ccff3ef529a8f60a80dee14b4d36a439b9308a1

