Parser for Adblock Plus rules

adblockparser

adblockparser is a package for working with Adblock Plus filter rules. It can parse Adblock Plus filters and match URLs against them.

Installation

pip install adblockparser

Python 2.7 and Python 3.3+ are supported.

If you plan to use this library with a large number of filters, installing the pyre2 library is highly recommended: the speedup for a list of default EasyList filters can be greater than 1000x.

pip install 're2 >= 0.2.21'

Note that the pyre2 library requires the C++ re2 library to be installed. On OS X you can get it using homebrew (brew install re2).

Usage

To learn about Adblock Plus filter syntax, see the Adblock Plus filter documentation. Basic usage takes three steps:

  1. Get filter rules somewhere: write them manually, read lines from a file downloaded from EasyList, etc.:

    >>> raw_rules = [
    ...     "||ads.example.com^",
    ...     "@@||ads.example.com/notbanner^$~script",
    ... ]
  2. Create AdblockRules instance from rule strings:

    >>> from adblockparser import AdblockRules
    >>> rules = AdblockRules(raw_rules)
  3. Use this instance to check whether a URL should be blocked:

    >>> rules.should_block("http://ads.example.com")
    True

    Rules with options are ignored unless you pass a dict with option values:

    >>> rules.should_block("http://ads.example.com/notbanner")
    True
    >>> rules.should_block("http://ads.example.com/notbanner", {'script': False})
    False
    >>> rules.should_block("http://ads.example.com/notbanner", {'script': True})
    True

Consult the Adblock Plus docs for a description of the available options. Options let you write filters that depend on external information not available in the URL itself.
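As an illustration of where such external information might come from, here is a minimal Python 3 sketch that derives an options dict from the URL of the page making the request. The helper name build_options and its arguments are hypothetical and not part of adblockparser. The third-party check is a deliberate simplification: it compares full hostnames, so two subdomains of the same site count as third-party to each other.

```python
from urllib.parse import urlsplit


def build_options(request_url, page_url, resource_type=None):
    """Build an options dict for should_block() (hypothetical helper).

    request_url   -- URL of the resource being requested
    page_url      -- URL of the page that triggered the request
    resource_type -- optional option name such as 'script' or 'image'
    """
    request_host = urlsplit(request_url).hostname or ""
    page_host = urlsplit(page_url).hostname or ""

    options = {
        # 'domain' describes the page, not the requested resource.
        "domain": page_host,
        # Assumption: any hostname difference is treated as third-party.
        "third-party": request_host != page_host,
    }
    if resource_type is not None:
        options[resource_type] = True
    return options


options = build_options(
    "http://ads.example.com/banner.js",
    "http://www.mystartpage.com/index.html",
    resource_type="script",
)
print(options)
```

The resulting dict can then be passed as the second argument to rules.should_block().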

Performance

Regex engines

The AdblockRules class creates one huge regex to match filters that don’t use options. The pyre2 library handles such regexes better than the stdlib’s re module. If you have pyre2 installed, AdblockRules should work faster, and the speedup can be dramatic: more than 1000x in some cases.

Sometimes pyre2 prints something like re2/dfa.cc:459: DFA out of memory: prog size 270515 mem 1713850 to stderr. Give re2 library more memory to fix that:

>>> rules = AdblockRules(raw_rules, use_re2=True, max_mem=512*1024*1024)  # doctest: +SKIP

Make sure you are not using re2 0.2.20 installed from PyPI; that version doesn’t work.
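If the same code has to run both with and without pyre2, one option is to pass the re2-specific arguments only when the module can be imported. The sketch below is an assumption about how you might structure this; re2_kwargs is a hypothetical helper, and a successful import does not prove that the installed version is a compatible one.

```python
def re2_kwargs(max_mem_mb=512):
    """Return extra keyword arguments for AdblockRules (hypothetical helper)."""
    try:
        import re2  # noqa: F401  (only checking that the module is importable)
    except ImportError:
        # No re2 available: pass nothing extra and keep the library defaults.
        return {}
    # use_re2 and max_mem are the constructor arguments shown above.
    return {"use_re2": True, "max_mem": max_mem_mb * 1024 * 1024}


kwargs = re2_kwargs()
print(kwargs)

# With adblockparser installed, the result could be used like this:
#     rules = AdblockRules(raw_rules, **kwargs)
```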

Parsing rules with options

Rules that have options are currently matched in a loop, one by one. They are also checked for compatibility with the options passed by the user: for example, if the user didn’t pass the ‘script’ option (with a True or False value), all rules involving script are discarded.

This is slow if you have thousands of such rules. To make it faster, explicitly list all options you want to support in the AdblockRules constructor, disable skipping of unsupported rules, and always pass a dict with all options to the should_block method:

>>> rules = AdblockRules(
...    raw_rules,
...    supported_options=['script', 'domain'],
...    skip_unsupported_rules=False
... )
>>> options = {'script': False, 'domain': 'www.mystartpage.com'}
>>> rules.should_block("http://ads.example.com/notbanner", options)
False

This way rules with unsupported options will be filtered once, when AdblockRules instance is created.
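Because this setup relies on every call passing all supported options, it can help to build the dict in one place. The following is a minimal sketch under stated assumptions: complete_options and SUPPORTED_OPTIONS are hypothetical names, and defaulting unspecified flags to False is a choice made here, not something adblockparser does for you.

```python
# Same option names as passed to supported_options in the constructor.
SUPPORTED_OPTIONS = ("script", "domain")


def complete_options(domain, **flags):
    """Return a dict containing every supported option (hypothetical helper)."""
    unknown = set(flags) - set(SUPPORTED_OPTIONS)
    if unknown:
        raise ValueError("unsupported options: %s" % ", ".join(sorted(unknown)))

    # Assumption: flags that the caller does not mention default to False.
    options = {name: False for name in SUPPORTED_OPTIONS if name != "domain"}
    options.update(flags)
    options["domain"] = domain
    return options


print(complete_options("www.mystartpage.com"))
print(complete_options("www.mystartpage.com", script=True))
```

Rejecting unknown names early keeps the dict in sync with the list given to the constructor.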

Limitations

There are some known limitations of the current implementation:

  • element hiding rules are ignored;

  • matching URLs against a large number of filters can be slow-ish, especially if pyre2 is not installed and many filter options are enabled;

  • match-case filter option is not properly supported (it is ignored);

  • document filter option is not properly supported;

  • rules are not validated before parsing, so invalid rules may raise inconsistent exceptions or silently work incorrectly.

It is possible to remove all these limitations. Pull requests are welcome if you want to make it happen sooner!
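Until then, a rough pre-check of a filter list can show how many of its lines fall under these limitations. The sketch below is not part of adblockparser and does not validate rules; classify_rule is a hypothetical helper that sorts lines by well-known Adblock Plus markers ("!" for comments, "##" and "#@#" for element hiding, "@@" for exceptions).

```python
def classify_rule(line):
    """Roughly classify one filter line (hypothetical helper, no validation)."""
    text = line.strip()
    if not text:
        return "blank"
    if text.startswith("!") or text.startswith("[Adblock"):
        return "comment"  # comments and the list header line
    if "##" in text or "#@#" in text:
        return "element-hiding"
    if text.startswith("@@"):
        return "exception"
    return "blocking"


lines = [
    "[Adblock Plus 2.0]",
    "! Title: example list",
    "",
    "||ads.example.com^",
    "@@||ads.example.com/notbanner^$~script",
    "example.com##.banner",
]

# Keep only the lines that describe URL matching.
url_rules = [
    line.strip() for line in lines
    if classify_rule(line) in ("blocking", "exception")
]
print(url_rules)
# ['||ads.example.com^', '@@||ads.example.com/notbanner^$~script']
```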

Contributing

In order to run tests, install tox and type

tox

from the source checkout.

The license is MIT.

Changes

0.6 (2016-09-10)

0.5 (2016-03-04)

  • Fixed an issue with blank lines in filter files (thanks https://github.com/skrypka);

  • fixed an issue with applying rules with ‘domain’ option when domain doesn’t have a dot (e.g. ‘localhost’);

  • Python 2.6 and Python 3.2 support is dropped; adblockparser likely still works in these interpreters, but this is no longer checked by tests.

0.4 (2015-03-29)

0.3 (2014-07-11)

  • Switch to setuptools;

  • better __repr__ for AdblockRule;

  • Python 3.4 support is confirmed;

  • testing improvements.

0.2 (2014-03-20)

This release provides much faster AdblockRules.should_block() method for rules without options and rules with ‘domain’ option.

  • better combined regex for option-less rules that makes re2 library always use DFA without falling back to NFA;

  • an index for rules with domains;

  • params method arguments are renamed to options for consistency.

0.1.1 (2014-03-11)

By default AdblockRules autodetects re2 library and uses it if a compatible version is detected.

0.1 (2014-03-03)

Initial release.
