differential filter

Project description

Requires Python 3.6 or later.

alpha-filter

When scraping, it is often useful to reduce the number of requests sent to the server. For example, suppose a script collects product links from paginated listings every day and then parses each product page separately. If some of those products were already parsed on an earlier run, there is no reason to parse them again. alpha-filter filters out the ads that have already been seen and returns only the new ones.

Getting started

pip install alpha-filter

Usage

>>> from alphafilter import filter_ads, mark_as_processed, is_processed

>>> first_parsing_urls = ["https://www.example.com/1", "https://www.example.com/2"]
>>> new, old = filter_ads(first_parsing_urls)
>>> new
["https://www.example.com/1", "https://www.example.com/2"]
>>> old
[]

>>> second_parsing_urls = first_parsing_urls  # the second run returns the same URLs as the first

>>> new, old = filter_ads(second_parsing_urls)
>>> new
[]
>>> old
[]

>>> third_parsing_urls = ["https://www.example.com/2", "https://www.example.com/3"]

>>> new, old = filter_ads(third_parsing_urls)
>>> new
["https://www.example.com/3"]
>>> old
["https://www.example.com/1"]
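
The three runs above behave like a set difference against the previous listing: `new` holds URLs that were not there before, and `old` holds URLs that have disappeared. The following is a minimal in-memory sketch of that behavior, not the package's implementation. The class name `SeenStore` is hypothetical, and the assumption that the store mirrors the latest listing after each call is inferred only from the outputs shown.

```python
class SeenStore:
    """Hypothetical in-memory stand-in for the URL store (assumption)."""

    def __init__(self):
        self._seen = set()

    def filter_ads(self, urls):
        current = set(urls)
        # New: present now, absent from the previous listing (input order kept).
        new = [u for u in dict.fromkeys(urls) if u not in self._seen]
        # Old: present in the previous listing, absent now.
        old = sorted(self._seen - current)
        # Assumption: the store then mirrors the current listing.
        self._seen = current
        return new, old


store = SeenStore()
first = ["https://www.example.com/1", "https://www.example.com/2"]
third = ["https://www.example.com/2", "https://www.example.com/3"]

run1 = store.filter_ads(first)   # everything is new
run2 = store.filter_ads(first)   # nothing changed
run3 = store.filter_ads(third)   # /3 appeared, /1 disappeared
```

Under these assumptions, `run1`, `run2` and `run3` reproduce the `new` and `old` values of the three calls in the session above.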

You can also mark URLs as processed and check later whether a given URL has been marked

>>> urls_for_mark = ["https://www.example.com/2", "https://www.example.com/3"]
>>> mark_as_processed(urls_for_mark)
>>> is_processed("https://www.example.com/2")
True
>>> is_processed("https://www.example.com/4")
False

URLs are stored in a SQLite database. The database file ('ads.db') is created in the root directory.
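
To illustrate why a file-backed SQLite store lets results survive between runs, here is a small sketch built on the standard `sqlite3` module. It is not the package's code: the table name `urls`, the helper names `open_store`, `remember` and `is_known`, and the temporary file path are all assumptions made for the example.

```python
import os
import sqlite3
import tempfile

# Hypothetical file location for this sketch; the package itself uses 'ads.db'.
db_path = os.path.join(tempfile.mkdtemp(), "ads_sketch.db")


def open_store(path):
    conn = sqlite3.connect(path)
    # The PRIMARY KEY keeps each URL unique and indexes lookups.
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    return conn


def remember(conn, urls):
    # INSERT OR IGNORE skips URLs that are already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)", [(u,) for u in urls]
    )
    conn.commit()


def is_known(conn, url):
    row = conn.execute("SELECT 1 FROM urls WHERE url = ?", (url,)).fetchone()
    return row is not None


# First run: store two URLs, then close the connection.
conn = open_store(db_path)
remember(conn, ["https://www.example.com/1", "https://www.example.com/2"])
conn.close()

# Later run: the URLs are still there because they live in the file.
conn = open_store(db_path)
known = is_known(conn, "https://www.example.com/1")
unknown = is_known(conn, "https://www.example.com/9")
conn.close()
```

After the second connection is opened, `known` is `True` and `unknown` is `False`, which is the property a scraper needs when it runs once a day.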

Warning: this package has no protection against SQL injection. Do not use it behind an external interface that accepts untrusted input.
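
For readers unfamiliar with the risk, the sketch below shows the general class of problem with the standard `sqlite3` module. It does not reproduce the package's queries, which are not shown here; the table, the function names `unsafe_lookup` and `safe_lookup`, and the sample input are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT PRIMARY KEY)")
conn.execute("INSERT INTO urls (url) VALUES (?)", ("https://www.example.com/1",))


def unsafe_lookup(url):
    # Vulnerable: the input is pasted directly into the SQL text.
    query = "SELECT COUNT(*) FROM urls WHERE url = '%s'" % url
    return conn.execute(query).fetchone()[0] > 0


def safe_lookup(url):
    # Bound parameter: the input is treated purely as data.
    query = "SELECT COUNT(*) FROM urls WHERE url = ?"
    return conn.execute(query, (url,)).fetchone()[0] > 0


# A crafted string that is not a stored URL.
malicious = "nope' OR '1'='1"
unsafe_result = unsafe_lookup(malicious)  # the injected OR clause matches every row
safe_result = safe_lookup(malicious)      # no stored URL equals this string
```

The crafted string makes the string-formatted query report a match even though no such URL is stored, while the parameterized query does not. If URLs reach the package from users or other untrusted sources, validate them first.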


Source Distribution

alpha-filter-0.9.2.tar.gz (4.0 kB view details)

Uploaded Source

File details

Details for the file alpha-filter-0.9.2.tar.gz.

File metadata

  • Download URL: alpha-filter-0.9.2.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.0

File hashes

Hashes for alpha-filter-0.9.2.tar.gz
Algorithm Hash digest
SHA256 8c3208d7451f459bfbb17eb8a7006f8286c1130eec1d2aae78c0d38dfa09a973
MD5 ba628cd11ba6e645f1bd2d0f7855bf1e
BLAKE2b-256 82c3f29fa2d774ec1e4b37c2ad4ea152f5d5594f42e30d81d6cf60f891d8a977
