differential filter
alpha-filter
When scraping, it is often useful to reduce the number of requests sent to the server. For example, a script may collect product links from paginated listings every day and then parse each product page separately. If some of those products were already parsed on an earlier run, there is no reason to fetch them again. alpha-filter filters out the ads that have already been seen and returns only the new ones. In the usage example, it also reports previously stored URLs that no longer appear in the current list.
Getting started
pip install alpha-filter
Usage
>>> from alphafilter import filter_ads, mark_as_processed, is_processed
>>> first_parsing_urls = ["https://www.example.com/1", "https://www.example.com/2"]
>>> new, old = filter_ads(first_parsing_urls)
>>> new
["https://www.example.com/1", "https://www.example.com/2"]
>>> old
[]
>>> second_parsing_urls = first_parsing_urls  # the second run sees the same URLs as the first
>>> new, old = filter_ads(second_parsing_urls)
>>> new
[]
>>> old
[]
>>> third_parsing_urls = ["https://www.example.com/2", "https://www.example.com/3"]
>>> new, old = filter_ads(third_parsing_urls)
>>> new
["https://www.example.com/3"]
>>> old
["https://www.example.com/1"]
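The three runs above suggest a simple set-difference model: `new` holds URLs not seen on the previous run, and `old` holds stored URLs missing from the current list. The sketch below reproduces that behavior in memory. It is only an illustration inferred from the example output, not the package's actual implementation; the helper name `diff_urls` and the assumption that the stored set is replaced by the current list after each run are hypothetical.

```python
def diff_urls(stored, current):
    """Return (new, old, updated_store) for one parsing run.

    Hypothetical helper: models the behavior seen in the example,
    assuming the store is replaced by the current URLs after each run.
    """
    current_set = set(current)
    # URLs in the current list that were not stored before (order kept, duplicates dropped).
    new = [u for u in dict.fromkeys(current) if u not in stored]
    # URLs that were stored before but are absent from the current list.
    old = sorted(stored - current_set)
    return new, old, current_set


stored = set()

first = ["https://www.example.com/1", "https://www.example.com/2"]
new1, old1, stored = diff_urls(stored, first)

# Same URLs again: nothing is new and nothing has disappeared.
new2, old2, stored = diff_urls(stored, first)

third = ["https://www.example.com/2", "https://www.example.com/3"]
new3, old3, stored = diff_urls(stored, third)

print(new1, old1)
print(new2, old2)
print(new3, old3)
```

Under this model the second run returns two empty lists, and the third run reports `/3` as new and `/1` as old, matching the session above.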
You can also mark URLs as processed and check that status later:
>>> urls_for_mark = ["https://www.example.com/2", "https://www.example.com/3"]
>>> mark_as_processed(urls_for_mark)
>>> is_processed("https://www.example.com/2")
True
>>> is_processed("https://www.example.com/4")
False
URLs are stored in a fast SQLite database. The database file ('ads.db') is created in the root directory.
Warning: this package has no protection against SQL injection. Do not use it with input that comes from an external interface.
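If URLs from untrusted sources must be tracked, one option is to keep a separate store that uses parameterized queries. The sketch below uses Python's standard `sqlite3` module with `?` placeholders, so URL text is never interpolated into the SQL string. The table name `processed` and the helpers `open_store`, `mark_urls`, and `was_marked` are hypothetical and are not part of alpha-filter.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical URL store; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (url TEXT PRIMARY KEY)")
    return conn


def mark_urls(conn, urls):
    """Record URLs as processed; already stored URLs are ignored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO processed (url) VALUES (?)",
            [(u,) for u in urls],
        )


def was_marked(conn, url):
    """Return True if the URL has been recorded as processed."""
    row = conn.execute(
        "SELECT 1 FROM processed WHERE url = ?", (url,)
    ).fetchone()
    return row is not None


conn = open_store()
mark_urls(conn, ["https://www.example.com/2", "https://www.example.com/3"])

# A hostile-looking string is stored as plain data, not executed as SQL.
hostile = "x'; DROP TABLE processed; --"
mark_urls(conn, [hostile])

print(was_marked(conn, "https://www.example.com/2"))
print(was_marked(conn, hostile))
print(was_marked(conn, "https://www.example.com/4"))
```

Because the values are bound as parameters, the quote characters in the hostile string have no effect on the statement, and the table remains intact.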