Skip to main content

GuessIt - a library for guessing information from video files.

Project description

GuessIt
=======

.. image:: http://img.shields.io/pypi/v/guessit.svg
:target: https://pypi.python.org/pypi/guessit
:alt: Latest Version

.. image:: http://img.shields.io/badge/license-LGPLv3-blue.svg
:target: https://pypi.python.org/pypi/guessit
:alt: License

.. image:: http://img.shields.io/travis/wackou/guessit.svg
:target: http://travis-ci.org/wackou/guessit
:alt: Build Status

.. image:: http://img.shields.io/coveralls/wackou/guessit.svg
:target: https://coveralls.io/r/wackou/guessit?branch=master
:alt: Coveralls


GuessIt is a python library that extracts as much information as
possible from a video file.

It has a very powerful filename matcher that allows to guess a lot of
metadata from a video using its filename only. This matcher works with
both movies and tv shows episodes.

For example, GuessIt can do the following::

$ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi"
For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi
GuessIt found: {
[1.00] "mimetype": "video/x-msvideo",
[0.80] "episodeNumber": 3,
[0.80] "videoCodec": "XviD",
[1.00] "container": "avi",
[1.00] "format": "HDTV",
[0.70] "series": "Treme",
[0.50] "title": "Right Place, Wrong Time",
[0.80] "releaseGroup": "NoTV",
[0.80] "season": 1,
[1.00] "type": "episode"
}


Install
-------

Installing GuessIt is simple with `pip <http://www.pip-installer.org/>`_::

$ pip install guessit

or, with `easy_install <http://pypi.python.org/pypi/setuptools>`_::

$ easy_install guessit

But, you really `shouldn't do that <http://www.pip-installer.org/en/latest/other-tools.html#pip-compared-to-easy-install>`_.

You can now launch a demo::

$ guessit -d

and guess your own filename::

$ guessit "Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv"
For: Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv
GuessIt found: {
[1.00] "mimetype": "video/x-matroska",
[1.00] "episodeNumber": 8,
[0.30] "container": "mkv",
[1.00] "format": "BluRay",
[0.70] "series": "Breaking Bad",
[1.00] "releaseGroup": "KoTuWa",
[1.00] "screenSize": "720p",
[1.00] "season": 5,
[1.00] "type": "episode"
}



Filename matcher
----------------

The filename matcher is based on pattern matching and is able to recognize many properties from the filename,
like ``title``, ``year``, ``series``, ``episodeNumber``, ``seasonNumber``,
``videoCodec``, ``screenSize``, ``language``. Guessed values are cleaned up and given in a readable format
which may not match exactly the raw filename.

The full list of available properties can be seen in the
`main documentation <http://guessit.readthedocs.org/en/latest/user/properties.html>`_.


Other features
--------------

GuessIt also allows you to compute a whole lof of hashes from a file,
namely all the ones you can find in the hashlib python module (md5,
sha1, ...), but also the Media Player Classic hash that is used (amongst
others) by OpenSubtitles and SMPlayer, as well as the ed2k hash.

If you have the 'guess-language' python package installed, GuessIt can also
analyze a subtitle file's contents and detect which language it is written in.

If you have the 'enzyme' python package installed, GuessIt can also detect the
properties from the actual video file metadata.


Usage
-----

guessit can be use from command line::

$ guessit
usage: guessit [-h] [-t TYPE] [-n] [-c] [-X DISABLED_TRANSFORMERS] [-v]
[-P SHOW_PROPERTY] [-u] [-a] [-y] [-f INPUT_FILE] [-d] [-p]
[-V] [-s] [--version] [-b] [-i INFO] [-S EXPECTED_SERIES]
[-T EXPECTED_TITLE] [-Y] [-D] [-L ALLOWED_LANGUAGES] [-E]
[-C ALLOWED_COUNTRIES] [-G EXPECTED_GROUP]
[filename [filename ...]]

positional arguments:
filename Filename or release name to guess

optional arguments:
-h, --help show this help message and exit

Naming:
-t TYPE, --type TYPE The suggested file type: movie, episode. If undefined,
type will be guessed.
-n, --name-only Parse files as name only. Disable folder parsing,
extension parsing, and file content analysis.
-c, --split-camel Split camel case part of filename.
-X DISABLED_TRANSFORMERS, --disabled-transformer DISABLED_TRANSFORMERS
Transformer to disable (can be used multiple time)
-S EXPECTED_SERIES, --expected-series EXPECTED_SERIES
Expected series to parse (can be used multiple times)
-T EXPECTED_TITLE, --expected-title EXPECTED_TITLE
Expected title (can be used multiple times)
-Y, --date-year-first
If short date is found, consider the first digits as
the year.
-D, --date-day-first If short date is found, consider the second digits as
the day.
-L ALLOWED_LANGUAGES, --allowed-languages ALLOWED_LANGUAGES
Allowed language (can be used multiple times)
-E, --episode-prefer-number
Guess "serie.213.avi" as the episodeNumber 213.
Without this option, it will be guessed as season 2,
episodeNumber 13
-C ALLOWED_COUNTRIES, --allowed-country ALLOWED_COUNTRIES
Allowed country (can be used multiple times)
-G EXPECTED_GROUP, --expected-group EXPECTED_GROUP
Expected release group (can be used multiple times)

Output:
-v, --verbose Display debug output
-P SHOW_PROPERTY, --show-property SHOW_PROPERTY
Display the value of a single property (title, series,
videoCodec, year, type ...)
-u, --unidentified Display the unidentified parts.
-a, --advanced Display advanced information for filename guesses, as
json output
-y, --yaml Display information for filename guesses as yaml
output (like unit-test)
-f INPUT_FILE, --input-file INPUT_FILE
Read filenames from an input file.
-d, --demo Run a few builtin tests instead of analyzing a file

Information:
-p, --properties Display properties that can be guessed.
-V, --values Display property values that can be guessed.
-s, --transformers Display transformers that can be used.
--version Display the guessit version.

guessit.io:
-b, --bug Submit a wrong detection to the guessit.io service

Other features:
-i INFO, --info INFO The desired information type: filename, video,
hash_mpc or a hash from python's hashlib module, such
as hash_md5, hash_sha1, ...; or a list of any of them,
comma-separated


It can also be used as a python module::

>>> from guessit import guess_file_info
>>> guess_file_info('Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi')
{u'mimetype': 'video/x-msvideo', u'episodeNumber': 3, u'videoCodec': u'XviD', u'container': u'avi', u'format': u'HDTV', u'series': u'Treme', u'title': u'Right Place, Wrong Time', u'releaseGroup': u'NoTV', u'season': 1, u'type': u'episode'}


Support
-------

The project website for GuessIt is hosted at `ReadTheDocs <http://guessit.readthedocs.org/>`_.
There you will also find the User guide and Developer documentation.

This project is hosted on GitHub: `<https://github.com/wackou/guessit>`_

Please report issues and/or feature requests via the `bug tracker <https://github.com/wackou/guessit/issues>`_.

You can also report issues using the command-line tool::

$ guessit --bug "filename.that.fails.avi"


Contribute
----------

GuessIt is under active development, and contributions are more than welcome!

#. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
There is a Contributor Friendly tag for issues that should be ideal for people who are not very
familiar with the codebase yet.
#. Fork `the repository`_ on Github to start making your changes to the **master**
branch (or branch off of it).
#. Write a test which shows that the bug was fixed or that the feature works as expected.
#. Send a pull request and bug the maintainer until it gets merged and published. :)

.. _the repository: https://github.com/wackou/guessit

License
-------

GuessIt is licensed under the `LGPLv3 license <http://www.gnu.org/licenses/lgpl.html>`_.


History
=======

0.10.4 (2015-08-19)
-------------------
* Added ``LD``/``MD`` properties
* Added better support for ``episodeList``
* Added more rules for filetype autodetection
* Added support for ``episodeList`` on weak episode patterns
* Added ``partList`` property (list for ``part`` property)
* Added vob to supported file extensions
* Added more ignore words to language detection
* Added string options support for API methods (will be parsed like command-line)
* Added better subtitle detection (prefix priority over suffix)
* Fixed ``version`` property no detected when detached from ``episodeNumber``
* Fixed ``releaseGroup`` property no detected when prefixed by ``screenSize``
* Fixed single digit detected as an ``episodeNumber``
* Fixed an internal issue in matcher causing absolute and relative group spans confusion
* Fixed an internal issue in properties container causing invalid ordering of found patterns
* Fixed raw value for some properties (--advanced)
* Use pytest as test runner
* Remove support for python 2.6


0.10.3 (2015-04-04)
-------------------

* Fix issues related to unicode encoding/decoding
* Fix possible crashes in guess_video_rexps
* Fix invalid guess result when crc32 contains 6 digits than can be parsed as a date


0.10.2 (2015-03-08)
-------------------

* Use common words to resolve conflicts on strings
* Bump babelfish version
* Fix setuptools deprecation warning
* Package argparse dependency only if python<2.7


0.10.1 (2015-01-05)
-------------------

* Avoid word Stay to be recognized as AY subtitle
* Fixed exception when no unidentified leaves remains
* Avoid usage of deprecated EntryPoint.load() require argument
* Fixed invalid raw data for some properties (title, series and maybe others)


0.10.0 (2014-12-27)
-------------------
* Fixed exception when serie title starts with Ep
* Fixed exception when trying to parse a full length country name
* Removed deprecated optparse module, replaced by argparse


0.9.4 (2014-11-10)
------------------

* Fixed exception when filename contains multiple languages ISO codes
* Fixed transformers initialization logging
* Fixed possible exception in language transformer
* Added more words to common english words


0.9.3 (2014-09-14)
------------------

* Added ``Preair`` and ``Remux`` to ``other`` property
* Better detection of ``audioProfile`` = ``HD`` / ``HDMA`` for ``audioCodec`` = ``DTS``
* Better detection of ``format``` = ``BluRay`` (when followed by Rip)
* Recognize ``RC`` as ``R5``
* Recognize ``WEB-HD```and ``ẀEB`` as ``WEB-DL``


0.9.2 (2014-09-13)
------------------

* Added support of option registration on transformers
* Better detection of ``releaseGroup`` when using ``expected-series`` or ``expected-title`` option
* Better ``audioChannel`` = ``5.1`` / ``7.1`` guessing (``6ch``, ``8ch``)
* Fixed usage not showing when invalid options were passed
* Added ``PAL``, ``SECAM`` and ``NTSC`` to ``other`` possible values
* Recognize DVD-9 and DVD-5 as ``format`` = ``DVD`` property


0.9.1 (2014-09-06)
------------------

* Added ``--unidentified`` option to display unidentified parts of the filename
This option affects command line only - From API `unidentified` properties will
always be grabbed regardless this settings
* Better guessing of ``releaseGroup`` property
* Added ``mHD`` and ``HDLight`` to ``other properties``
* Better guessing of ``format`` = ``DVD`` property (DVD-R pattern)
* Some ``info`` logs changed to ``debug`` for quiet integration
* Small fixes


0.9.0 (2014-09-05)
------------------

* Better auto-detection of anime episodes, containing a ``crc32`` or a digits ``episodeNumber``.
* Better listing of options on ``guessit -h``
* Added ``--allowed-countries`` and ``--allowed-languages`` to avoid two or three
letters words to be guessed as ``country`` or ``language``
* Added ``--disabled-transformers`` option to disable transformer plugin at runtime.
* Added ``--episode-prefer-number`` option, for ``guess -t episode 'serie.123.avi'``
to return ``episodeNumber`` = ``123`` instead of ``season`` = ``1`` + ``episodeNumber`` = 23``
* Added ``--split-camel`` option (now disabled by default)
* Added ``episodeCount`` and ``seasonCount`` properties (x-of-n notation)
* Added ``--date-year-first``` and ``--date-day-first`` options
* Added ``--expected-title``, ``--expected-series`` and ``--expected-groups``
to help finding values when those properties are known
* Added ``10bit`` value to ``videoProfile``
* Added ``--show-property`` option to only show a single property
* Added ``--input-file`` option to parse a list of
* Added ``--version`` option
* Added ``ass```to subtitle extensions
* Added ``Fansub`` value for ``other`` property
* Added more date formats support with ``dateutil`` dependency
* Added customizable ``clean_function`` (API)
* Added ``default_options`` (API)
* Fixed ``--yaml`` option to support ``language`` and ``country``
* Fixed ``transformers.add_transformer()`` function (API)


0.8 (2014-07-06)
----------------

* New webservice that allows to use GuessIt just by sending a POST request to
the http://guessit.io/guess url
* Command-line util can now report bugs to the http://guessit.io/bugs service
by specifying the ``-b`` or ``--bug`` flag
* GuessIt can now use the Enzyme python package to detect metadata out of the
actual video file metadata instead of the filename
* Finished transition to ``babelfish.Language`` and ``babelfish.Country``
* New property: ``duration`` which returns the duration of the video in seconds
This requires the Enzyme package to work
* New property: ``fileSize`` which returns the size of the file in bytes
* Renamed property ``special`` to ``episodeDetails``
* Added support for Python 3.4
* Optimization and bugfixes


0.7.1 (2014-03-03)
------------------

* New property "special": values can be trailer, pilot, unaired
* New options for the guessit cmdline util: ``-y``, ``--yaml`` outputs the
result in yaml format and ``-n``, ``--name-only`` analyzes the input as simple
text (instead of filename)
* Added properties formatters and validators
* Removed support for python 3.2
* A healthy amount of code cleanup/refactoring and fixes :)


0.7 (2014-01-29)
----------------

* New plugin API that allows to register custom patterns / transformers
* Uses Babelfish for language and country detection
* Added Quality API to rate file quality from guessed property values
* Better and more accurate overall detection
* Added roman and word numeral detection
* Added 'videoProfile' and 'audioProfile' property
* Moved boolean properties to 'other' property value ('is3D' became 'other' = '3D')
* Added more possible values for various properties.
* Added command line option to list available properties and values
* Fixes for Python3 support


0.6.2 (2013-11-08)
------------------

* Added support for nfo files
* GuessIt can now output advanced information as json ('-a' on the command line)
* Better language detection
* Added new property: 'is3D'


0.6.1 (2013-09-18)
------------------

* New property "idNumber" that tries to identify a hash value or a
serial number
* The usual bugfixes


0.6 (2013-07-16)
----------------

* Better packaging: unittests and doc included in source tarball
* Fixes everywhere: unicode, release group detection, language detection, ...
* A few speed optimizations


0.5.4 (2013-02-11)
------------------

* guessit can be installed as a system wide script (thanks @dplarson)
* Enhanced logging facilities
* Fixes for episode number and country detection


0.5.3 (2012-11-01)
------------------

* GuessIt can now optionally act as a wrapper around the 'guess-language' python
module, and thus provide detection of the natural language in which a body of
text is written

* Lots of fixes everywhere, mostly for properties and release group detection


0.5.2 (2012-10-02)
------------------

* Much improved auto-detection of filetype
* Fixed some issues with the detection of release groups


0.5.1 (2012-09-23)
------------------

* now detects 'country' property; also detect 'year' property for series
* more patterns and bugfixes


0.5 (2012-07-29)
----------------

* Python3 compatibility
* the usual assortment of bugfixes


0.4.2 (2012-05-19)
------------------

* added Language.tmdb language code property for TheMovieDB
* added ability to recognize list of episodes
* bugfixes for Language.__nonzero__ and episode regexps


0.4.1 (2012-05-12)
------------------

* bugfixes for unicode, paths on Windows, autodetection, and language issues


0.4 (2012-04-28)
----------------

* much improved language detection, now also detect language variants
* supports more video filetypes (thanks to Rob McMullen)


0.3.1 (2012-03-15)
------------------

* fixed package installation from PyPI
* better imports for the transformations (thanks Diaoul!)
* some small language fixes

0.3 (2012-03-12)
----------------

* fix to recognize 1080p format (thanks to Jonathan Lauwers)

0.3b2 (2012-03-02)
------------------

* fixed the package installation

0.3b1 (2012-03-01)
------------------

* refactored quite a bit, code is much cleaner now
* fixed quite a few tests
* re-vamped the documentation, wrote some more

0.2 (2011-05-27)
----------------

* new parser/matcher completely replaced the old one
* quite a few more unittests and fixes


0.2b1 (2011-05-20)
------------------

* brand new parser/matcher that is much more flexible and powerful
* lots of cleaning and a bunch of unittests


0.1 (2011-05-10)
----------------

* fixed a few minor issues & heuristics


0.1b2 (2011-03-12)
------------------

* Added PyPI trove classifiers
* fixed version number in setup.py


0.1b1 (2011-03-12)
------------------

* first pre-release version; imported from Smewt with a few enhancements already
in there.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

guessit-0.10.4.tar.gz (146.3 kB view hashes)

Uploaded source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page