webscrapetools

A basic but fast, persistent and threadsafe caching system

Project description

This package lets you retrieve pages from the Internet efficiently by caching the results of requests.

Basic commands

Importing required modules first:

from webscrapetools import urlcaching

Initializing the cache:

urlcaching.set_cache_path('.wst_cache')

The _expiry_days_ option sets the cache expiry period; the default is 10 days.

Setting the cache path is a required step: otherwise, responses to URL calls are simply not cached. Cache data is stored in the specified folder, which is created on the fly if it does not exist, so re-using the same path makes the cache persistent. The following command empties the cache, making sure we start with no prior data:

urlcaching.empty_cache()
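
To make the persistence and expiry behaviour concrete, here is a minimal standard-library sketch of a folder-backed cache. It is not the webscrapetools implementation: the class name `FolderCache`, its methods, the SHA-256 file naming and the use of file modification times for expiry are all assumptions made for illustration.

```python
import hashlib
import os
import shutil
import tempfile
import time


class FolderCache:
    """Hypothetical folder-backed cache; not the webscrapetools internals."""

    def __init__(self, path, expiry_days=10):
        self.path = path
        self.expiry_seconds = expiry_days * 24 * 3600
        # Create the folder on the fly if it does not exist
        os.makedirs(self.path, exist_ok=True)

    def _file_for(self, key):
        # Assumption: one file per key, named after the SHA-256 of the key
        digest = hashlib.sha256(key.encode('utf-8')).hexdigest()
        return os.path.join(self.path, digest)

    def put(self, key, content):
        with open(self._file_for(key), 'w', encoding='utf-8') as cache_file:
            cache_file.write(content)

    def get(self, key):
        """Returns the cached content, or None if missing or expired."""
        filename = self._file_for(key)
        if not os.path.exists(filename):
            return None
        age = time.time() - os.path.getmtime(filename)
        if age > self.expiry_seconds:
            os.remove(filename)
            return None
        with open(filename, 'r', encoding='utf-8') as cache_file:
            return cache_file.read()

    def empty(self):
        # Remove every entry, then recreate the empty folder
        shutil.rmtree(self.path)
        os.makedirs(self.path)


demo_path = os.path.join(tempfile.mkdtemp(), '.wst_cache_demo')
cache = FolderCache(demo_path)
cache.put('http://www.google.com', '<html>cached page</html>')

# A second instance pointing at the same folder sees the same data
same_cache = FolderCache(demo_path)
persisted = same_cache.get('http://www.google.com')

# After emptying, the entry is gone for every instance using that folder
same_cache.empty()
after_empty = cache.get('http://www.google.com')
```

Because the data lives in files rather than in memory, any later run that points at the same folder finds the earlier entries, which is the sense in which re-using the same path makes a cache persistent.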

Opening a URL with the following command stores the response content behind the scenes, so that subsequent calls do not hit the network:

urlcaching.open_url('http://www.google.com')
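
The idea of serving repeated calls from the cache, safely across threads, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the library's code: `cached_open`, `fake_fetch` and the in-memory dictionary are hypothetical names, and a single global lock is only one of several possible locking strategies.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_cache = {}
_cache_lock = threading.Lock()
network_hits = []


def fake_fetch(url):
    # Stand-in for a real HTTP request; records each "network" access
    network_hits.append(url)
    return 'content of {}'.format(url)


def cached_open(url, fetch=fake_fetch):
    """Returns the cached content for url, fetching it only on a miss."""
    # Holding the lock during the fetch keeps the sketch simple: concurrent
    # callers wait instead of fetching the same URL twice.
    with _cache_lock:
        if url not in _cache:
            _cache[url] = fetch(url)
        return _cache[url]


# Eight concurrent requests for the same URL
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(cached_open, ['http://www.google.com'] * 8))

print(len(network_hits))  # 1
```

A single lock serializes all fetches, including those for unrelated URLs. That trade-off keeps the example short; per-key locking would allow more concurrency at the cost of extra bookkeeping.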

Full example

from webscrapetools import urlcaching
import time

# Initializing the cache
urlcaching.set_cache_path('.wst_cache')

# Making sure we start from scratch
urlcaching.empty_cache()

# Demo with 5 identical calls... only the first one is delayed, all others are hitting the cache
count_calls = 1
while count_calls <= 5:
    start_time = time.time()
    urlcaching.open_url('http://deelay.me/5000/http://www.google.com')
    duration = time.time() - start_time
    print('duration for call {}: {:0.2f}'.format(count_calls, duration))
    count_calls += 1

# Cleaning up
urlcaching.empty_cache()

The code above outputs the following:

duration for call 1: 6.74
duration for call 2: 0.00
duration for call 3: 0.00
duration for call 4: 0.00
duration for call 5: 0.00

Example plugging in a custom client

The framework lets you customize the way you access the web. It is therefore possible, for example, to drive a browser via Selenium.

from webscrapetools import urlcaching
urlcaching.set_cache_path('./output/tests', max_node_files=10, rebalancing_limit=100)

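# Client factory: this dummy example needs no real client
def dummy_client():
    return None

# Receives the client and the key, returns a (content, key) pair
def dummy_call(_, key):
    return '{:d}'.format(int(key)) * int(key), key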

# simulating calls using the dummy client
keys = ('{:05d}'.format(count) for count in range(500))
for key in keys:
    urlcaching.open_url(key, init_client_func=dummy_client, call_client_func=dummy_call)

urlcaching.empty_cache()

Project details

Release history

Version	Release date
0.5.5	Oct 25, 2019 (this version)
0.5.4	Oct 22, 2019
0.5.3	Oct 22, 2019
0.5.1	Oct 21, 2019
0.4.6	Sep 7, 2019
0.4.5	Sep 6, 2019
0.4.4	Sep 4, 2019
0.4.3	Mar 21, 2019
0.4.2	Mar 21, 2019
0.4.1	Mar 17, 2019
0.4.0	Jul 5, 2017
0.3	Oct 18, 2016
0.2	Oct 5, 2016
0.1	Oct 5, 2016

Download files

Source Distribution

webscrapetools-0.5.5.tar.gz (11.8 kB)

Uploaded Oct 25, 2019 Source

Built Distribution

webscrapetools-0.5.5-py3-none-any.whl (13.6 kB)

Uploaded Oct 25, 2019 Python 3

File details

Details for the file webscrapetools-0.5.5.tar.gz.

File metadata

Download URL: webscrapetools-0.5.5.tar.gz
Upload date: Oct 25, 2019
Size: 11.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/28.8.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.6.0

File hashes

Hashes for webscrapetools-0.5.5.tar.gz
Algorithm	Hash digest
SHA256	`b8ba08b4a351796c7283e4c773a571f1984af9bf066bd3f738e489370e0f73c3`
MD5	`801217f626418ea6e3bdc7d0246543c8`
BLAKE2b-256	`a549f8f2070ac51fbfc3b0627ff31d19b199cb56c21b40c988997d4caa971761`

File details

Details for the file webscrapetools-0.5.5-py3-none-any.whl.

File metadata

Download URL: webscrapetools-0.5.5-py3-none-any.whl
Upload date: Oct 25, 2019
Size: 13.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/28.8.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.6.0

File hashes

Hashes for webscrapetools-0.5.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae3daa8fb7073a4a7a50567d41991e502069e87330cb4e3c23a2f7ebcb845de9`
MD5	`088015edb671a4a44f4456458cc86391`
BLAKE2b-256	`c7ddce5df3605f05e8396bf2dc183fffaf0d40ae6235c8393be31936fadffc19`
