GuessIt - a library for guessing information from video files.
GuessIt
GuessIt is a Python library that extracts as much information as possible from a video file.
It has a very powerful filename matcher that can guess a lot of metadata about a video from its filename alone. The matcher works with both movies and TV show episodes.
For example, GuessIt can do the following:
$ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi"
For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi
GuessIt found: {
    [1.00] "mimetype": "video/x-msvideo",
    [0.80] "episodeNumber": 3,
    [0.80] "videoCodec": "XviD",
    [1.00] "container": "avi",
    [1.00] "format": "HDTV",
    [0.70] "series": "Treme",
    [0.50] "title": "Right Place, Wrong Time",
    [0.80] "releaseGroup": "NoTV",
    [0.80] "season": 1,
    [1.00] "type": "episode"
}
Install
Installing GuessIt is simple with pip:
$ pip install guessit
or, with easy_install:
$ easy_install guessit
But, you really shouldn’t do that.
You can now launch a demo:
$ guessit -d
and guess your own filename:
$ guessit "Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv"
For: Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv
GuessIt found: {
    [1.00] "mimetype": "video/x-matroska",
    [1.00] "episodeNumber": 8,
    [0.30] "container": "mkv",
    [1.00] "format": "BluRay",
    [0.70] "series": "Breaking Bad",
    [1.00] "releaseGroup": "KoTuWa",
    [1.00] "screenSize": "720p",
    [1.00] "season": 5,
    [1.00] "type": "episode"
}
Filename matcher
The filename matcher uses regular expressions and tree splitting to guess values from the input filename.
It is able to find many properties, like title, year, series, episodeNumber, season, videoCodec, screenSize, language. Guessed values are cleaned up and given in a readable format which may not match the raw filename. For example:
DVDSCR will be guessed as format = DVD + other = Screener.
1920x1080 will be guessed as screenSize = 1080p.
DD5.1 will be guessed as audioCodec = DolbyDigital + audioChannels = 5.1.
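The clean-up described above can be pictured with a minimal sketch. This is not GuessIt's actual rule set: the `RULES` table and the `normalize_token` helper are hypothetical names introduced only to show how a raw token could map to readable property values, using the three examples given above.

```python
import re

# Hypothetical rule table, NOT GuessIt's real patterns: each entry pairs a
# regular expression with a function that builds the cleaned-up properties.
RULES = [
    (re.compile(r"^DVDSCR$", re.IGNORECASE),
     lambda m: {"format": "DVD", "other": "Screener"}),
    (re.compile(r"^(\d{3,4})x(\d{3,4})$"),
     lambda m: {"screenSize": m.group(2) + "p"}),
    (re.compile(r"^DD(\d\.\d)$", re.IGNORECASE),
     lambda m: {"audioCodec": "DolbyDigital", "audioChannels": m.group(1)}),
]


def normalize_token(token):
    """Return readable properties for one raw filename token, or {} if unknown."""
    for pattern, build in RULES:
        match = pattern.match(token)
        if match:
            return build(match)
    return {}


for raw in ("DVDSCR", "1920x1080", "DD5.1"):
    print(raw, "->", normalize_token(raw))
```

The real matcher handles far more patterns and also attaches a confidence to each value, which this sketch leaves out.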
Here’s the exhaustive list of properties that guessit can find:
Main properties
type
Type of the file.
unknown, movie, episode, moviesubtitle, episodesubtitle
title
Title of movie or episode.
container
Container of the file.
3g2, wmv, webm, mp4, avi, mp4a, mpeg, sub, mka, m4v, ts, mkv, ra, rm, wma, ass, mpg, ram, 3gp, ogv, mov, ogm, asf, divx, ogg, ssa, qt, idx, nfo, wav, flv, 3gp2, iso, mk2, srt
date
Date found in filename.
year
Year of movie (or episode).
releaseGroup
Name of (non)scene group that released the file.
website
Name of website contained in the filename.
Episode properties
series
Name of series.
season
Season number.
episodeNumber
Episode number.
episodeList
List of episode numbers if several were found.
note: If several are found, episodeNumber is the first item of this list.
seasonList
List of season numbers if several were found.
note: If several are found, season is the first item of this list.
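Because the single-value property is always the first item of the matching list, client code can treat both cases uniformly. The helper below is a small sketch; `episode_numbers` is a hypothetical name, and the dictionaries are written by hand rather than produced by GuessIt.

```python
def episode_numbers(guess):
    """Return every episode number found in a guess as a list (possibly empty)."""
    if "episodeList" in guess:
        return list(guess["episodeList"])
    if "episodeNumber" in guess:
        return [guess["episodeNumber"]]
    return []


# Hand-written guesses, shaped like the properties documented above.
single = {"series": "Treme", "season": 1, "episodeNumber": 3}
double = {"season": 1, "episodeNumber": 1, "episodeList": [1, 2]}

print(episode_numbers(single))
print(episode_numbers(double))
```

The same approach works for season and seasonList.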
episodeCount
Total number of episodes.
seasonCount
Total number of seasons.
episodeDetails
Some details about the episode.
Bonus Oav Ova Omake Extras Unaired Special Pilot
episodeFormat
Episode format of the series.
Minisode
part
Part number of the episode.
version
Version of the episode.
In the anime fansub scene, new versions are released with the tag <episode>v[0-9].
Video properties
format
Format of the initial source.
HDTV WEB-DL TV VOD BluRay DVD WEBRip Workprint Telecine VHS DVB Telesync HD-DVD PPV Cam
screenSize
Resolution of video.
720p 1080p 1080i <width>x<height> 4K 360p 368p 480p 576p 900p
videoCodec
Codec used for video.
h264 h265 DivX XviD Real Mpeg2
videoProfile
Codec profile used for video.
8bit 10bit HP BP MP XP Hi422P Hi444PP
videoApi
API used for the video.
DXVA
Audio properties
audioChannels
Number of channels for audio.
1.0 2.0 5.1 7.1
audioCodec
Codec used for audio.
DTS TrueHD DolbyDigital AAC AC3 MP3 Flac
audioProfile
The codec profile used for audio.
LC HQ HD HE HDMA
Localization properties
country
Country(ies) of content. Often found in series, Shameless (US) for instance.
[<babelfish.Country>] (This class equals name and iso code)
language
Language(s) of the audio soundtrack.
[<babelfish.Language>] (This class equals name and iso code)
subtitleLanguage
Language(s) of the subtitles.
[<babelfish.Language>] (This class equals name and iso code)
Other properties
bonusNumber
Bonus number.
bonusTitle
Bonus title.
cdNumber
CD number.
cdNumberTotal
Total number of CDs.
crc32
CRC32 of the file.
idNumber
Volume identifier (UUID).
edition
Edition of the movie.
Special Edition, Collector Edition, Director's cut, Criterion Edition, Deluxe Edition
filmNumber
Film number of this movie.
filmSeries
Film series of this movie.
other
Any other property will appear under this property.
Fansub, HR, HQ, Netflix, Screener, Unrated, HD, 3D, SyncFix, Bonus, WideScreen, Fastsub, R5, AudioFix, DDC, Trailer, Complete, Limited, Classic, Proper, DualAudio, LiNE
Other features
GuessIt can also compute a whole lot of hashes from a file: all the ones available in Python's hashlib module (md5, sha1, …), the Media Player Classic hash that is used (amongst others) by OpenSubtitles and SMPlayer, and the ed2k hash.
If you have the ‘guess-language’ Python package installed, GuessIt can also analyze a subtitle file’s contents and detect which language it is written in.
If you have the ‘enzyme’ Python package installed, GuessIt can also detect the properties from the actual video file metadata.
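To give an idea of what these hashes involve, here is a standalone sketch that does not use GuessIt at all. `hashlib_digest` and `mpc_hash` are hypothetical helper names. The second function follows the commonly published description of the Media Player Classic / OpenSubtitles hash (file size plus the 64-bit sums of the first and last 64 KiB); whether GuessIt computes it in exactly this way is an assumption.

```python
import hashlib
import os
import struct

CHUNK = 64 * 1024  # the MPC hash reads 64 KiB from each end of the file


def hashlib_digest(path, algorithm="md5"):
    """Hash a file with any hashlib algorithm, reading it chunk by chunk."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def mpc_hash(path):
    """File size plus the sums of the first and last 64 KiB, modulo 2**64."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK:
        raise ValueError("file is too small for this hash")
    total = size
    with open(path, "rb") as handle:
        for offset in (0, size - CHUNK):
            handle.seek(offset)
            data = handle.read(CHUNK)
            # Interpret the chunk as little-endian unsigned 64-bit integers.
            total += sum(struct.unpack("<%dQ" % (CHUNK // 8), data))
    return "%016x" % (total & 0xFFFFFFFFFFFFFFFF)
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte video files.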
Usage
guessit can be used from the command line:
$ guessit
Usage: guessit [options] file1 [file2...]

Options:
  -h, --help
      show this help message and exit
  -P SHOW_PROPERTY, --show-property=SHOW_PROPERTY
      Display the value of a single property (title, series, videoCodec, year, type ...)

  Naming:
    -t TYPE, --type=TYPE
        The suggested file type: movie, episode. If undefined, type will be guessed.
    -n, --name-only
        Parse files as name only. Disable folder parsing, extension parsing, and file content analysis.
    -c, --split-camel
        Split camel case part of filename.
    -Y, --date-year-first
        If short date is found, consider the first digits as the year.
    -D, --date-day-first
        If short date is found, consider the second digits as the day.
    -E, --episode-prefer-number
        Guess "serie.213.avi" as the episodeNumber 213. Without this option, it will be guessed as season 2, episodeNumber 13
    -L ALLOWED_LANGUAGES, --allowed-languages=ALLOWED_LANGUAGES
        List of allowed languages. Separate languages codes with ";"
    -C ALLOWED_COUNTRIES, --allowed-countries=ALLOWED_COUNTRIES
        List of allowed countries. Separate country codes with ";"
    -S EXPECTED_SERIES, --expected-series=EXPECTED_SERIES
        List of expected series to parse. Separate series names with ";"
    -T EXPECTED_TITLE, --expected-title=EXPECTED_TITLE
        List of expected titles to parse. Separate title names with ";"
    -G EXPECTED_GROUP, --expected-group=EXPECTED_GROUP
        List of expected groups to parse. Separate group names with ";"
    --disabled-transformers=DISABLED_TRANSFORMERS
        List of transformers to disable. Separate transformers names with ";"

  Output:
    -v, --verbose
        Display debug output
    -a, --advanced
        Display advanced information for filename guesses, as json output
    -y, --yaml
        Display information for filename guesses as yaml output (like unit-test)
    -f INPUT_FILE, --input-file=INPUT_FILE
        Read filenames from an input file.
    -d, --demo
        Run a few builtin tests instead of analyzing a file

  Information:
    -p, --properties
        Display properties that can be guessed.
    -V, --values
        Display property values that can be guessed.
    -s, --transformers
        Display transformers that can be used.
    --version
        Display the guessit version.

  guessit.io:
    -b, --bug
        Submit a wrong detection to the guessit.io service

  Other features:
    -i INFO, --info=INFO
        The desired information type: filename, video, hash_mpc or a hash from python's hashlib module, such as hash_md5, hash_sha1, ...; or a list of any of them, comma-separated
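The effect of the -E, --episode-prefer-number option can be illustrated with a toy function. This is only a sketch of the behaviour described in the help text for a bare number such as the 213 in "serie.213.avi"; `split_number` is a hypothetical helper, and GuessIt's real logic is more involved.

```python
def split_number(number, prefer_episode_number=False):
    """Interpret a bare number found in a filename, e.g. 213 in "serie.213.avi"."""
    if prefer_episode_number or number < 100:
        # With -E (or when there are too few digits), keep the whole number.
        return {"episodeNumber": number}
    # Otherwise the last two digits are the episode, the rest is the season.
    season, episode = divmod(number, 100)
    return {"season": season, "episodeNumber": episode}


print(split_number(213))
print(split_number(213, prefer_episode_number=True))
```

The first call yields season 2, episodeNumber 13, and the second yields episodeNumber 213, matching the two outcomes the help text describes.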
It can also be used as a Python module:
>>> from guessit import guess_file_info
>>> guess_file_info('Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi')
{u'mimetype': 'video/x-msvideo',
 u'episodeNumber': 3,
 u'videoCodec': u'XviD',
 u'container': u'avi',
 u'format': u'HDTV',
 u'series': u'Treme',
 u'title': u'Right Place, Wrong Time',
 u'releaseGroup': u'NoTV',
 u'season': 1,
 u'type': u'episode'}
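Since the result behaves like a dictionary, it is easy to post-process. The sketch below builds a short display name from a guess; `describe` is a hypothetical helper, and the dictionary is written out by hand (with values copied from the Treme example) so that the snippet runs without GuessIt installed.

```python
def describe(guess):
    """Build a short display name from a guess dictionary."""
    if guess.get("type") == "episode":
        label = "%s S%02dE%02d" % (
            guess.get("series", "?"),
            guess.get("season", 0),
            guess.get("episodeNumber", 0),
        )
        if guess.get("title"):
            label += " - " + guess["title"]
        return label
    # Movies and unknown types: title, plus the year when it was guessed.
    label = guess.get("title", "?")
    if guess.get("year"):
        label += " (%d)" % guess["year"]
    return label


# Hand-written stand-in for a guess_file_info() result.
guess = {
    "type": "episode",
    "series": "Treme",
    "season": 1,
    "episodeNumber": 3,
    "title": "Right Place, Wrong Time",
}
print(describe(guess))  # prints: Treme S01E03 - Right Place, Wrong Time
```

Using `.get()` throughout matters because any property may be missing when GuessIt could not find it in the filename.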
Support
The project website for GuessIt is hosted at ReadTheDocs. There you will also find the User guide and Developer documentation.
This project is hosted on GitHub: https://github.com/wackou/guessit
Please report issues and/or feature requests via the bug tracker.
You can also report issues using the command-line tool:
$ guessit --bug "filename.that.fails.avi"
Contribute
GuessIt is under active development, and contributions are more than welcome!
Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug. There is a Contributor Friendly tag for issues that should be ideal for people who are not very familiar with the codebase yet.
Fork the repository on GitHub to start making your changes to the master branch (or branch off of it).
Write a test which shows that the bug was fixed or that the feature works as expected.
Send a pull request and bug the maintainer until it gets merged and published. :)
License
GuessIt is licensed under the LGPLv3 license.
History
0.8.1 (unreleased)
Nothing changed yet.
0.8 (2014-07-06)
New web service that lets you use GuessIt just by sending a POST request to the http://guessit.io/guess URL
Command-line util can now report bugs to the http://guessit.io/bugs service by specifying the -b or --bug flag
GuessIt can now use the Enzyme Python package to read metadata from the actual video file instead of the filename
Finished transition to babelfish.Language and babelfish.Country
New property: duration which returns the duration of the video in seconds. This requires the Enzyme package to work
New property: fileSize which returns the size of the file in bytes
Renamed property special to episodeDetails
Added support for Python 3.4
Optimization and bugfixes
0.7.1 (2014-03-03)
New property “special”: values can be trailer, pilot, unaired
New options for the guessit cmdline util: -y, --yaml outputs the result in yaml format and -n, --name-only analyzes the input as simple text (instead of filename)
Added properties formatters and validators
Removed support for python 3.2
A healthy amount of code cleanup/refactoring and fixes :)
0.7 (2014-01-29)
New plugin API that allows registering custom patterns / transformers
Uses Babelfish for language and country detection
Added Quality API to rate file quality from guessed property values
Better and more accurate overall detection
Added roman and word numeral detection
Added ‘videoProfile’ and ‘audioProfile’ property
Moved boolean properties to ‘other’ property value (‘is3D’ became ‘other’ = ‘3D’)
Added more possible values for various properties.
Added command line option to list available properties and values
Fixes for Python3 support
0.6.2 (2013-11-08)
Added support for nfo files
GuessIt can now output advanced information as json (‘-a’ on the command line)
Better language detection
Added new property: ‘is3D’
0.6.1 (2013-09-18)
New property “idNumber” that tries to identify a hash value or a serial number
The usual bugfixes
0.6 (2013-07-16)
Better packaging: unittests and doc included in source tarball
Fixes everywhere: unicode, release group detection, language detection, …
A few speed optimizations
0.5.4 (2013-02-11)
guessit can be installed as a system wide script (thanks @dplarson)
Enhanced logging facilities
Fixes for episode number and country detection
0.5.3 (2012-11-01)
GuessIt can now optionally act as a wrapper around the ‘guess-language’ python module, and thus provide detection of the natural language in which a body of text is written
Lots of fixes everywhere, mostly for properties and release group detection
0.5.2 (2012-10-02)
Much improved auto-detection of filetype
Fixed some issues with the detection of release groups
0.5.1 (2012-09-23)
now detects ‘country’ property; also detect ‘year’ property for series
more patterns and bugfixes
0.5 (2012-07-29)
Python3 compatibility
the usual assortment of bugfixes
0.4.2 (2012-05-19)
added Language.tmdb language code property for TheMovieDB
added ability to recognize list of episodes
bugfixes for Language.__nonzero__ and episode regexps
0.4.1 (2012-05-12)
bugfixes for unicode, paths on Windows, autodetection, and language issues
0.4 (2012-04-28)
much improved language detection, now also detect language variants
supports more video filetypes (thanks to Rob McMullen)
0.3.1 (2012-03-15)
fixed package installation from PyPI
better imports for the transformations (thanks Diaoul!)
some small language fixes
0.3 (2012-03-12)
fix to recognize 1080p format (thanks to Jonathan Lauwers)
0.3b2 (2012-03-02)
fixed the package installation
0.3b1 (2012-03-01)
refactored quite a bit, code is much cleaner now
fixed quite a few tests
re-vamped the documentation, wrote some more
0.2 (2011-05-27)
new parser/matcher completely replaced the old one
quite a few more unittests and fixes
0.2b1 (2011-05-20)
brand new parser/matcher that is much more flexible and powerful
lots of cleaning and a bunch of unittests
0.1 (2011-05-10)
fixed a few minor issues & heuristics
0.1b2 (2011-03-12)
Added PyPI trove classifiers
fixed version number in setup.py
0.1b1 (2011-03-12)
first pre-release version; imported from Smewt with a few enhancements already in there.