Lassie
======

.. image:: https://img.shields.io/pypi/v/lassie.svg?style=flat-square
   :target: https://pypi.python.org/pypi/lassie

.. image:: https://img.shields.io/travis/michaelhelmick/lassie.svg?style=flat-square
   :target: https://travis-ci.org/michaelhelmick/lassie

.. image:: https://img.shields.io/coveralls/michaelhelmick/lassie/master.svg?style=flat-square
   :target: https://coveralls.io/r/michaelhelmick/lassie?branch=master

.. image:: https://img.shields.io/badge/Say%20Thanks!-:)-1EAEDB.svg?style=flat-square
   :target: https://saythanks.io/to/michaelhelmick

Lassie is a Python library for retrieving basic content from websites.

.. image:: https://i.imgur.com/QrvNfAX.gif

Usage
-----

.. code-block:: python

    >>> import lassie
    >>> lassie.fetch('http://www.youtube.com/watch?v=dQw4w9WgXcQ')
    {
        'description': u'Music video by Rick Astley performing Never Gonna Give You Up. YouTube view counts pre-VEVO: 2,573,462 (C) 1987 PWL',
        'videos': [{
            'src': u'http://www.youtube.com/v/dQw4w9WgXcQ?autohide=1&version=3',
            'height': 480,
            'type': u'application/x-shockwave-flash',
            'width': 640
        }, {
            'src': u'https://www.youtube.com/embed/dQw4w9WgXcQ',
            'height': 480,
            'width': 640
        }],
        'title': u'Rick Astley - Never Gonna Give You Up',
        'url': u'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
        'keywords': [u'Rick', u'Astley', u'Sony', u'BMG', u'Music', u'UK', u'Pop'],
        'images': [{
            'src': u'http://i1.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg?feature=og',
            'type': u'og:image'
        }, {
            'src': u'http://i1.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg',
            'type': u'twitter:image'
        }, {
            'src': u'http://s.ytimg.com/yts/img/favicon-vfldLzJxy.ico',
            'type': u'favicon'
        }, {
            'src': u'http://s.ytimg.com/yts/img/favicon_32-vflWoMFGx.png',
            'type': u'favicon'
        }],
        'locale': u'en_US'
    }
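
The returned value is a plain dictionary, so it can be post-processed with ordinary Python. The following is a minimal sketch of building a link preview from it. It assumes only the keys shown in the output above; the sample ``result`` is trimmed from that output so the snippet runs without a network request, and ``pick_image`` is a hypothetical helper for illustration, not part of Lassie's API.

.. code-block:: python

    # Sample data trimmed from the fetch() output above.
    result = {
        'title': 'Rick Astley - Never Gonna Give You Up',
        'url': 'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
        'images': [
            {'src': 'http://i1.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg?feature=og',
             'type': 'og:image'},
            {'src': 'http://i1.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg',
             'type': 'twitter:image'},
            {'src': 'http://s.ytimg.com/yts/img/favicon-vfldLzJxy.ico',
             'type': 'favicon'},
        ],
    }


    def pick_image(data, preferred=('og:image', 'twitter:image', 'favicon')):
        """Hypothetical helper: return the src of the first image whose
        type matches the preferred order, or None if nothing matches."""
        images = data.get('images') or []
        for wanted in preferred:
            for image in images:
                if image.get('type') == wanted:
                    return image.get('src')
        return None


    # Use .get() so that a page without a title or images does not raise.
    preview = {
        'title': result.get('title'),
        'url': result.get('url'),
        'image': pick_image(result),
    }

The preference order here is a design choice for the sketch: Open Graph images tend to be intended for previews, while favicons are a last resort.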

Install
-------

Install Lassie via `pip <http://www.pip-installer.org/>`_:

.. code-block:: bash

    $ pip install lassie

or, with `easy_install <http://pypi.python.org/pypi/setuptools>`_:

.. code-block:: bash

    $ easy_install lassie

But, hey... `that's up to you <http://www.pip-installer.org/en/latest/other-tools.html#pip-compared-to-easy-install>`_.

Documentation
-------------

Documentation can be found here: https://lassie.readthedocs.org/



.. :changelog:

History
-------

0.11.6 (2018-05-24)
+++++++++++++++++++
- Fix issue where AMP images were a list of dictionaries but were being identified as an object.

0.11.5 (2017-12-27)
+++++++++++++++++++
- Pin requests==2.18.4

0.11.4 (2017-11-01)
+++++++++++++++++++
- Always get oembed AND html data.

0.11.3 (2017-11-01)
+++++++++++++++++++
- Fix filters.oembed module once lassie is packaged.

0.11.0 (2017-11-01)
+++++++++++++++++++
- Add support for OEmbed providers (YouTube)

0.10.1 (2017-06-02)
+++++++++++++++++++
- Remove owl emoji from README.rst so installs on Windows don't fail.

0.10.0 (2017-02-03)
+++++++++++++++++++
- Fix issue where a website may have malformed HTML and no <html> tag causing soup.html to be None (#60)
- Updated beautifulsoup4 to 4.5.3
- Update html5lib to 1.0b10

0.9.0 (2017-01-29)
++++++++++++++++++
- Added a default fake user agent to use instead of using python-requests/version (some websites will mark certain user agents as bot attempts)
- Updated requests to 2.13.0

0.8.7 (2016-12-21)
++++++++++++++++++
- Fix Python 3 support
- Handle empty AMP image lists

0.8.6 (2016-11-17)
++++++++++++++++++
- Handle AMP image list of strings vs list of objects

0.8.5 (2016-11-03)
++++++++++++++++++
- Handle AMP data that is contained in a list
- Retrieve videos and thumbnails (as images) from AMP VideoObjects

0.8.4 (2016-11-01)
++++++++++++++++++
- Fix issue where AMP images could be lists inside an object

0.8.3 (2016-10-21)
++++++++++++++++++
- Fix issue where some returned keys (e.g. description) would not be retrieved if the key already existed with an empty value

0.8.2 (2016-09-26)
++++++++++++++++++
- Fix issue where AMP images could be images and not objects

0.8.1 (2016-09-26)
++++++++++++++++++
- Add support for AMP "description" attribute
- Fix issue where an error would be thrown if width/height of an image weren't strings
- Fix duplicate AMP title request, should have been url

0.8.0 (2016-09-26)
++++++++++++++++++
- Add support for links that use AMP

0.7.2 (2016-08-01)
++++++++++++++++++
- Add `status_code` to response dictionary (for "file-like" responses, as well)

0.7.1 (2016-07-27)
++++++++++++++++++
- Add support for open graph `site_name`


0.7.0 (2016-07-01)
++++++++++++++++++
- Add `status_code` to response dictionary


0.6.2 (2015-11-11)
++++++++++++++++++
- Pinned `requests` library to version 2.8.1
- Pinned `beautifulsoup4` library to version 4.4.1
- Add Python 3.5 to Travis CI build matrix (officially support 3.5)


0.6.1 (2015-10-30)
++++++++++++++++++
- Catch and raise `LassieError` on HEAD requests when `handle_file_content` is passed to the Lassie API
- Pinned `requests` library to version 2.8.0


0.6.0 (2015-08-19)
++++++++++++++++++
- Support for secure url image and videos from Open Graph
- Simplified `merge_settings` and data updating internally


0.5.3 (2015-07-02)
++++++++++++++++++
- Handle when a website doesn't set a value on the "keywords" meta tag


0.5.2 (2015-04-16)
++++++++++++++++++
- Updated `requests` and `beautifulsoup4` library versions


0.5.1 (2014-08-05)
++++++++++++++++++
- Fix issue where headers didn't always have 'Content-Type' key


0.5.0 (2014-06-23)
++++++++++++++++++
- Added ability to `fetch` links that are image files (jpg, gif, png, bmp)
- Renamed `_retreive_content` to `_retrieve_content` because I evidently don't know how to spell correctly


0.4.0 (2013-09-30)
++++++++++++++++++
- Updated `requests` and `beautifulsoup4` library versions
- Added support for manipulating the request, see Advanced Usage docs
- Fixed issue where `lassie.fetch` would break if the page had no title
- Lassie is now more lenient when it comes to width and height values of images (now accepts integers (100) or integers with px (100px))
- Image URLs for all images are now absolute

0.3.0 (2013-08-15)
++++++++++++++++++

- Added support for `locale` to be returned. If `lang` is specified in the `html` tag and it normalizes to an actual locale, it will be added to the returned data.
- Fixed bug where height was not being returned for body images
- Added test coverage, we're 100% covered! :D


0.2.1 (2013-08-13)
++++++++++++++++++

- Remove spaces from the returned keywords list
- Fixed issue where favicon was not being retrieved
- Fixed priority for class level vs method level params


0.2.0 (2013-08-06)
++++++++++++++++++

- Fix package error when importing


0.1.0 (2013-08-05)
++++++++++++++++++

- Initial Release
