Extract price and currency from a raw string
Project description
price-parser is a small library for extracting price and currency from raw text strings.
Features:
robust price amount and currency symbol extraction
zero-effort handling of thousand and decimal separators
The main use case is parsing prices extracted from web pages. For example, you can write a CSS/XPath selector which targets an element with a price, and then use this library for cleaning it up, instead of writing custom site-specific regex or Python code.
License is BSD 3-clause.
Installation
pip install price-parser
price-parser requires Python 3.6+.
Usage
Basic usage
>>> from price_parser import Price
>>> price = Price.fromstring("22,90 €")
>>> price
Price(amount=Decimal('22.90'), currency='€')
>>> price.amount  # numeric price amount
Decimal('22.90')
>>> price.currency  # currency symbol, as appears in the string
'€'
>>> price.amount_text  # price amount, as appears in the string
'22,90'
>>> price.amount_float  # price amount as float, not Decimal
22.9
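Because amount is a Decimal, arithmetic on parsed prices stays exact, which is usually what you want for money. The snippet below is a minimal sketch using only the standard library; the Decimal literal is simply the value of price.amount from the session above, and the quantity is made up for illustration.

```python
from decimal import Decimal

# Same value as price.amount in the session above.
amount = Decimal("22.90")

# Hypothetical quantity, for illustration only.
quantity = 3

# Decimal arithmetic keeps the two decimal places exactly.
total = amount * quantity
print(total)  # 68.70

# Binary floats cannot represent most decimal fractions exactly,
# so equality checks on float sums can fail where Decimal succeeds.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
print(0.1 + 0.2 == 0.3)  # False
```

Use amount_float only when an approximate value is acceptable, for example for plotting or rough statistics.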
If you prefer, price_parser.parse_price is an alias for Price.fromstring; the two do the same thing:
>>> from price_parser import parse_price
>>> parse_price("22,90 €")
Price(amount=Decimal('22.90'), currency='€')
The library has extensive tests (900+ real-world examples of price strings). Some of the supported cases are described below.
Supported cases
Unclean price strings with various currencies are supported; thousand separators and decimal separators are handled:
>>> Price.fromstring("Price: $119.00")
Price(amount=Decimal('119.00'), currency='$')
>>> Price.fromstring("15 130 Р")
Price(amount=Decimal('15130'), currency='Р')
>>> Price.fromstring("151,200 تومان")
Price(amount=Decimal('151200'), currency='تومان')
>>> Price.fromstring("Rp 1.550.000")
Price(amount=Decimal('1550000'), currency='Rp')
>>> Price.fromstring("Běžná cena 75 990,00 Kč")
Price(amount=Decimal('75990.00'), currency='Kč')
The euro sign is sometimes used as a decimal separator in the wild:
>>> Price.fromstring("1,235€ 99")
Price(amount=Decimal('1235.99'), currency='€')
>>> Price.fromstring("99 € 95 €")
Price(amount=Decimal('99'), currency='€')
>>> Price.fromstring("35€ 999")
Price(amount=Decimal('35'), currency='€')
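To make the euro-as-separator idea concrete, here is a deliberately simplified sketch. It is not the library's implementation: the regular expression and the function name euro_separated_amount are assumptions invented for this illustration. It only recognizes the narrow shape "digits, euro sign, exactly two digits, end of string".

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern: a whole part (digits, optional commas as thousand
# separators), a euro sign, then exactly two digits and nothing else.
_EURO_AS_SEPARATOR = re.compile(r"^\s*(\d[\d,]*)\s*€\s*(\d{2})\s*$")


def euro_separated_amount(text: str) -> Optional[Decimal]:
    """Return the amount if the euro sign splits whole and fractional parts."""
    match = _EURO_AS_SEPARATOR.match(text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")  # drop thousand separators
    cents = match.group(2)
    return Decimal(f"{whole}.{cents}")


print(euro_separated_amount("1,235€ 99"))  # 1235.99
print(euro_separated_amount("99 € 95 €"))  # None
print(euro_separated_amount("35€ 999"))    # None
```

The last two strings do not fit the pattern (a second euro sign, or three trailing digits), so the sketch returns None for them. The library itself still extracts the leading amount in those cases, as the session above shows.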
Some special cases are handled:
>>> Price.fromstring("Free")
Price(amount=Decimal('0'), currency=None)
When the price or the currency can't be extracted, the corresponding attribute is set to None:
>>> Price.fromstring("")
Price(amount=None, currency=None)
>>> Price.fromstring("Foo")
Price(amount=None, currency=None)
>>> Price.fromstring("50% OFF")
Price(amount=None, currency=None)
>>> Price.fromstring("50")
Price(amount=Decimal('50'), currency=None)
>>> Price.fromstring("R$")
Price(amount=None, currency='R$')
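Since either attribute can be None, calling code usually needs to check both. The sketch below shows one way to do that. ParsedPrice is a hypothetical stand-in with the same amount and currency attributes as Price, used only so the example runs without the library installed; describe is likewise an illustrative helper, not part of price-parser.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class ParsedPrice(NamedTuple):
    """Hypothetical stand-in exposing the same two attributes as Price."""
    amount: Optional[Decimal]
    currency: Optional[str]


def describe(price) -> str:
    """Format any object that has .amount and .currency attributes."""
    # Compare with None explicitly: a "Free" price has amount Decimal('0'),
    # which is falsy, so `if not price.amount` would wrongly discard it.
    if price.amount is None:
        return "no price found"
    if price.currency is None:
        return f"{price.amount} (currency unknown)"
    return f"{price.amount} {price.currency}"


print(describe(ParsedPrice(Decimal("119.00"), "$")))  # 119.00 $
print(describe(ParsedPrice(Decimal("0"), None)))      # 0 (currency unknown)
print(describe(ParsedPrice(None, "R$")))              # no price found
```

The same function should work unchanged on a real Price object, because it relies only on the amount and currency attributes.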
Currency hints
The currency_hint argument lets you pass a text string which may (or may not) contain currency information. This feature is most useful for automated price extraction.
>>> Price.fromstring("34.99", currency_hint="руб. (шт)")
Price(amount=Decimal('34.99'), currency='руб.')
Note that a currency mentioned in the main price string may be preferred over the currency specified in the currency_hint argument; it depends on the currency symbols found there. If you know the correct currency, you can set it directly:
>>> price = Price.fromstring("1 000")
>>> price.currency = 'EUR'
>>> price
Price(amount=Decimal('1000'), currency='EUR')
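If you only want to set the currency when none was detected, a small helper can wrap the assignment shown above. This is a sketch under stated assumptions: apply_default_currency is a hypothetical helper, and ParsedPrice is a stand-in dataclass with the same mutable amount and currency attributes as Price, so the example runs without the library installed.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class ParsedPrice:
    """Hypothetical stand-in with the same mutable attributes as Price."""
    amount: Optional[Decimal]
    currency: Optional[str]


def apply_default_currency(price, default: str):
    """Set price.currency to `default` only if no currency was extracted."""
    if price.currency is None:
        price.currency = default
    return price


# No currency detected: the default is applied.
missing = apply_default_currency(ParsedPrice(Decimal("1000"), None), "EUR")
print(missing.currency)  # EUR

# A currency was detected: it is left untouched.
found = apply_default_currency(ParsedPrice(Decimal("119.00"), "$"), "EUR")
print(found.currency)  # $
```

Unlike unconditional assignment, this keeps whatever currency the parser found and falls back to your known default only when nothing was detected.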
Contributing
Source code: https://github.com/scrapinghub/price-parser
Issue tracker: https://github.com/scrapinghub/price-parser/issues
Use tox to run tests with different Python versions:
tox
The command above also runs type checks; we use mypy.
Changes
0.2.1 (2019-04-19)
23 additional currency symbols are added;
A$ alias for Australian Dollar is added.
0.2 (2019-04-12)
Added support for currencies replaced by euro.
0.1.1 (2019-04-12)
Minor packaging fixes.
0.1 (2019-04-12)
Initial release.