
Simple image scraping in Python

Project description

Configurable, extensible image scraping for Python. Inspired by the design and internals of Kenneth Reitz’ Requests library.

>>> from snatch import snatch
>>> images = snatch('http://octodex.github.com/pythocat/')
>>> images.extensions
[u'png']
>>> images[1]
<Image ["pythocat.png"]>
>>> images[1].url
u'http://octodex.github.com/images/pythocat.png'
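Scraping like this amounts to fetching a page, finding its <img> tags and resolving each src attribute into an absolute URL (as with pythocat.png above). The following is a minimal sketch of that extraction step, written for Python 3 with only the standard library. It illustrates the general technique and is not Snatch's internals; `ImgSrcParser` and `extract_image_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Hypothetical helper: collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value may be None for bare attributes.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, base_url):
    """Return absolute image URLs found in `html`, resolved against `base_url`."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, src) for src in parser.sources]


page = '<p><img src="/images/pythocat.png" alt="Pythocat"></p>'
print(extract_image_urls(page, "http://octodex.github.com/pythocat/"))
# ['http://octodex.github.com/images/pythocat.png']
```

Self-closing tags such as `<img src="x.png"/>` are handled too, because HTMLParser's default `handle_startendtag` forwards them to `handle_starttag`.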

Easily usable, easily configurable:

>>> url = 'url/with/54/images'
>>> snatch(url)
<ImageList [54]>

# reduce your results by extension:
>>> _.with_extension('gif')
<ImageList [2]>

# or, more explicitly, limit results by extension in the initial API call:
>>> snatch(url, with_extension=('gif',))
<ImageList [2]>
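Filtering by extension only needs each image URL's file suffix. The sketch below shows one way `with_extension`-style filtering could work on plain URL strings. `extension_of` and `with_extension` are hypothetical stand-ins, not Snatch's API. Taking the extension from the URL path and ignoring any query string is our own assumption. To mirror both calls above, the function accepts either a single string or a tuple.

```python
import posixpath
from urllib.parse import urlparse


def extension_of(url):
    """Hypothetical helper: lowercase extension of the URL's path, without the dot."""
    path = urlparse(url).path  # drops any ?query or #fragment
    return posixpath.splitext(path)[1][1:].lower()


def with_extension(urls, extensions):
    """Keep only URLs whose extension is in `extensions` (a str or a tuple of str)."""
    if isinstance(extensions, str):
        extensions = (extensions,)
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [url for url in urls if extension_of(url) in wanted]


urls = [
    "http://example.com/a.png",
    "http://example.com/b.GIF",
    "http://example.com/c.gif?size=large",
]
print(with_extension(urls, "gif"))
# ['http://example.com/b.GIF', 'http://example.com/c.gif?size=large']
```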

It’s also easy to hook your own filters or operations into Snatch’s callback system. For example, to keep only images wider than 250 px:

import requests
from io import BytesIO
from PIL import Image  # Pillow
from snatch import snatch

def wider_than_250(images):
    """'complete' callback: keep only images wider than 250 px."""
    def is_wide(image):
        # Download and measure the image only if its width is not yet known
        if image.width is None:
            res = requests.get(image.url)
            img = Image.open(BytesIO(res.content))
            image.width = img.size[0]
        return image.width > 250
    return [image for image in images if is_wide(image)]

url = 'http://octodex.github.com/pythocat/'
callbacks = {'complete': wider_than_250}
images = snatch(url, callbacks=callbacks)
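Requests, which the project names as its inspiration, has a similar hooks mechanism: each registered function receives the data, and a non-None return value replaces it. The following is a minimal sketch of a dispatcher in that style. It assumes, as the example above suggests, that a callback takes the image list and returns a filtered list. `dispatch_callbacks` is a hypothetical name, and this is not Snatch's actual code.

```python
def dispatch_callbacks(event, callbacks, data):
    """Hypothetical dispatcher: run every callback registered for `event`.

    `callbacks` maps event names to a callable or a list of callables.
    Each callback receives the current data; a None return leaves it unchanged.
    """
    registered = (callbacks or {}).get(event, [])
    if callable(registered):
        registered = [registered]
    for callback in registered:
        result = callback(data)
        if result is not None:
            data = result
    return data


def only_png(images):
    return [name for name in images if name.endswith(".png")]


def log_count(images):
    print("%d image(s) kept" % len(images))  # returns None, so data is unchanged


callbacks = {"complete": [only_png, log_count]}
result = dispatch_callbacks("complete", callbacks, ["a.png", "b.gif", "c.png"])
# prints "2 image(s) kept"; result is ['a.png', 'c.png']
```

Callbacks run in list order, so filters can be chained, and side-effect-only hooks such as logging can return nothing.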

Downloading every image found at a URL is even simpler:

import os
import requests
from snatch import snatch

directory = 'snatched-images'

if not os.path.exists(directory):
    os.mkdir(directory)

for image in snatch('http://octodex.github.com/pythocat/'):
    contents = requests.get(image.url).content
    # Write in binary mode: image data is bytes, not text
    path = os.path.join(directory, image.filename)
    with open(path, 'wb') as image_file:
        image_file.write(contents)
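The loop above relies on Snatch supplying `image.filename`. If you only have bare URLs, you can derive a local name from the last path segment before writing the bytes. The sketch below does this with two hypothetical helpers, `filename_from_url` and `save_image`, which are not part of Snatch. The bytes are passed in rather than fetched so the example runs offline.

```python
import os
import posixpath
import tempfile
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="image"):
    """Hypothetical helper: use the last URL path segment as a file name."""
    name = posixpath.basename(unquote(urlparse(url).path))
    # Fall back to `default` for empty names and directory references like ".."
    if name in ("", ".", ".."):
        return default
    return name


def save_image(directory, url, content):
    """Hypothetical helper: write raw image bytes into `directory`, creating it if needed."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename_from_url(url))
    with open(path, "wb") as image_file:  # binary mode: image data is bytes
        image_file.write(content)
    return path


# Usage with placeholder bytes instead of a real download:
target = os.path.join(tempfile.mkdtemp(), "snatched-images")
saved = save_image(target, "http://octodex.github.com/images/pythocat.png", b"\x89PNG")
print(saved)
```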

Release History

0.1.0 (2013-10-12)

  • Initial write/scaffold, lots to fix/improve upon


Download files


Source Distribution

snatch-0.1.0.tar.gz (6.7 kB)

File details

Details for the file snatch-0.1.0.tar.gz.

File metadata

  • Download URL: snatch-0.1.0.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for snatch-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        32e7e86b14de2064ee9860c4a99caa99a2a471fd5dc8201de83717d630f94aeb
MD5           da988461a3cb4b5761b51bf9b0ce76d9
BLAKE2b-256   25109d44219c75316c268b334b3cf9becc673d811c61b04cca59d622e49684eb
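To check a downloaded snatch-0.1.0.tar.gz against the SHA256 digest listed for it, a short standard-library sketch could look like the following. The function names `sha256_of_file` and `verify` are our own and not part of any packaging tool.

```python
import hashlib

# SHA256 digest published for snatch-0.1.0.tar.gz
EXPECTED_SHA256 = "32e7e86b14de2064ee9860c4a99caa99a2a471fd5dc8201de83717d630f94aeb"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 matches `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify("snatch-0.1.0.tar.gz")` should return True for an intact download in the current directory.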

