Pick up failed list processing without re-processing

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

PickMeUp - Pick up failed list processing without re-processing

PickMeUp lets you resume the processing of a list of elements at the point where it failed.

elements = [1,2,3,4,5,6,7,8]
with PickMeUp(elements, name="element_processing") as l:
    for e in l:
        result = process(e)
        save(result)

Suppose process fails for elements divisible by 4: the script crashes at element 4. You now have to fix process and re-run the script. Normally, you would re-process 1, 2, and 3, even though you already have results for those and process might be expensive. PickMeUp knows that the processing failed at 4 and only iterates over 4, 5, 6, 7, and 8, saving you from re-processing.

Without PickMeUp

Process 1,2,3
Fix process
Process 1,2,3,4,5,6,7,8

Showcase GIF: Without PickMeUp

With PickMeUp

Process 1,2,3
Fix process
Process 4,5,6,7,8

Showcase GIF: With PickMeUp

Installation

pip install pickmeup

Usage

Create a context from your list and a unique name, then iterate over the object the context yields instead of over the original list:

with PickMeUp(yourList, "some_unique_name") as l:
    for e in l:
        # Do stuff with e
        pass

Example Use-Case: Scraping

Imagine you want to crawl content from a website. You have a list of 50 content pages, and you want to extract the src of one specific image from each page. You wrote the extractor (identifying the important image, extracting the src) by looking at the first 2 pages. It seems that the important image can be identified with the id=important-image:

import requests
from bs4 import BeautifulSoup


def scrape(url: str) -> str:
    """Returns the html of the given `url`"""
    resp = requests.get(url)
    # .text is the decoded body (str); .content would be raw bytes
    return resp.text

def extract_important_image_src(html: str) -> str:
    """Finds the important image in the given `html` and
    returns its `src` attribute"""
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(html, "html.parser")
    return soup.find('img', {'id': 'important-image'})['src']

URLS = ["example.com/content/1", "example.com/content/2", ...] 
for url in URLS:
    html = scrape(url)
    src = extract_important_image_src(html)
    print(f"{src} extracted!")

However, it just so happens that half of the content pages follow an old design, and those pages mark their image with id=important. Your extractor breaks: soup.find returns None because there is no element with id=important-image, so the subscript ['src'] fails:

> "example.com/content/1 extracted!"
> ...
> "example.com/content/25 extracted!"
> TypeError: 'NoneType' object is not subscriptable

So you enhance extract_important_image_src to handle the old design. But now you have to either scrape and parse the first 25 pages again or skip them manually. Instead, if you used PickMeUp:

URLS = ["example.com/content/1", "example.com/content/2", ...]
with PickMeUp(URLS, name="url_processing") as l:
    for url in l:
        html = scrape(url)
        src = extract_important_image_src(html)
        print(f"{src} extracted!")

You could just update extract_important_image_src and re-run your script. It will pick up the scraping at the element that failed, and proceed as if nothing happened. If there is another design change somewhere, the processing will fail again. But again, you can fix the issue and re-run the script, without re-processing all the URLs that worked before!
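The text does not show the updated extractor. As a rough sketch of what handling both designs could look like, the version below accepts either id and raises a clear error when neither is present. It is written against the standard library's html.parser rather than BeautifulSoup so that it runs without third-party packages; the names IMAGE_IDS and _ImageSrcFinder are made up for this illustration, and the two ids are the ones named in the text above.

```python
from html.parser import HTMLParser
from typing import Optional

# Ids taken from the text: "important-image" (new design), "important" (old design).
IMAGE_IDS = ("important-image", "important")


class _ImageSrcFinder(HTMLParser):
    """Hypothetical helper: remembers the src of the first matching <img>."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        attributes = dict(attrs)
        if self.src is None and tag == "img" and attributes.get("id") in IMAGE_IDS:
            self.src = attributes.get("src")


def extract_important_image_src(html: str) -> str:
    """Returns the src of the important image for either page design."""
    finder = _ImageSrcFinder()
    finder.feed(html)
    if finder.src is None:
        # Fail loudly so an unknown third design stops the run with a clear message.
        raise ValueError("no image with a known id found")
    return finder.src
```

Raising an explicit error instead of returning None keeps the failure at the element that caused it, which is the point where a resumed run would start again.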

Caveats

PickMeUp creates state for your processing by dumping the remaining elements to disk. This comes with a few caveats:

- If your elements are very large objects, the state will take up a lot of disk space.
- If you pass a generator whose elements are expensive to create, PickMeUp will take the time to create all of them up front and dump them to disk.
- Changing the list elements between script runs without clearing the state results in undefined behavior!
- List elements have to be serializable!
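To make these caveats more concrete, here is a minimal sketch of how a "dump the remaining elements to disk" context could work. It is not PickMeUp's actual implementation: the function name resumable, the state_dir parameter, the .pickle file name, and the choice of pickle are all assumptions made for this illustration.

```python
import os
import pickle
import tempfile
from contextlib import contextmanager


@contextmanager
def resumable(elements, name, state_dir=None):
    """Hypothetical stand-in for a resumable list context (not PickMeUp itself)."""
    state_dir = state_dir or tempfile.gettempdir()
    path = os.path.join(state_dir, f"{name}.pickle")

    if os.path.exists(path):
        # A previous run left state behind: ignore `elements` and resume.
        with open(path, "rb") as f:
            remaining = pickle.load(f)
    else:
        # list() consumes a generator completely, creating every element now.
        remaining = list(elements)

    def iterate():
        while remaining:
            yield remaining[0]
            # Reached only when the caller asks for the next element,
            # i.e. the loop body for the current one finished without raising.
            remaining.pop(0)

    try:
        yield iterate()
    finally:
        if remaining:
            # Crash or early exit: persist what is left (elements must be picklable).
            with open(path, "wb") as f:
                pickle.dump(remaining, f)
        elif os.path.exists(path):
            # Everything was processed: clear the state.
            os.remove(path)
```

In this sketch the saved state fully replaces the list passed on the next run, which is why changing the elements between runs without clearing the state gives surprising results, and the call to list() is where an expensive generator would be exhausted up front.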


Release history

- 0.0.3 (this version): Feb 19, 2023
- 0.0.2: Feb 19, 2023

Download files


Source Distribution

pickmeup-0.0.3.tar.gz (7.0 kB)

Uploaded Feb 19, 2023 Source

Built Distribution

pickmeup-0.0.3-py3-none-any.whl (6.6 kB)

Uploaded Feb 19, 2023 Python 3

File details

Details for the file pickmeup-0.0.3.tar.gz.

File metadata

Download URL: pickmeup-0.0.3.tar.gz
Upload date: Feb 19, 2023
Size: 7.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for pickmeup-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`55ebf3fc7fd94997d614d852c360a8c5605261e9923c7d64843d129177b955ae`
MD5	`a4cc36042a69d7edef8c86d566d3cb6d`
BLAKE2b-256	`6fa9a81c825b1e01fe9712c059e1bc5be2e8b03937fbbad1a968b3da5a1bc7f5`


File details

Details for the file pickmeup-0.0.3-py3-none-any.whl.

File metadata

Download URL: pickmeup-0.0.3-py3-none-any.whl
Upload date: Feb 19, 2023
Size: 6.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for pickmeup-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ccbdc9415020402fa8ed6ba5c81d869c2ec478ba8b9629f0a0fcd2f352565cec`
MD5	`06983ed3816cde0dad46a61d44ff5256`
BLAKE2b-256	`55260ca6c414080271b8c2b85bf009206958340106db267c54871f1640d5e054`
