
Pyap is an MIT Licensed text processing library, written in Python, for detecting and parsing addresses. Currently it supports USA, Canadian and British addresses. This is a fork maintained by Beauhurst.

Project description

Pyap is an MIT Licensed text processing library, written in Python, for detecting and parsing addresses. Currently it supports US 🇺🇸, Canadian 🇨🇦 and British 🇬🇧 addresses.

This fork is maintained by [Beauhurst](https://github.com/Beauhurst).

>>> import pyap
>>> test_address = """
...     Lorem ipsum
...     225 E. John Carpenter Freeway,
...     Suite 1500 Irving, Texas 75062
...     Dorem sit amet
...     """
>>> addresses = pyap.parse(test_address, country='US')
>>> for address in addresses:
...     # shows found address
...     print(address)
...     # shows address parts
...     print(address.dict())
...

Installation

To install Pyap, run:

$ pip install pyap_beauhurst

About

This library was created because I couldn’t find a reliable, open-source solution for detecting addresses on web pages when writing my web crawler. Existing solutions have drawbacks when it comes to processing really large amounts of data quickly: you either have to buy proprietary software, use third-party pay-per-use services, or use address detection that is slow and unsuitable for real-time processing.

Pyap is an alternative to all of these methods. It is fast because it is based on regular expressions, and it can find addresses in text in real time with low error rates.

Future work

  • Add rules for parsing FR addresses

Typical workflow

Use Pyap as a first step when you need to detect addresses in a text and don’t know for sure whether the text contains any.

For the highest accuracy, Pyap’s results can be re-verified using a geocoding process.
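The re-verification step could look roughly like the sketch below. It is only an illustration: `reverify`, `fake_geocoder` and `KNOWN_ADDRESSES` are hypothetical names that are not part of Pyap, the candidate list is written by hand to stand in for detection output, and the "geocoder" is a stub lookup rather than a real geocoding service.

```python
from typing import Callable, Iterable, List


def reverify(candidates: Iterable[str],
             is_resolvable: Callable[[str], bool]) -> List[str]:
    """Keep only the candidate addresses that the geocoder can resolve."""
    return [candidate for candidate in candidates if is_resolvable(candidate)]


# Stand-in for a real geocoding lookup (hypothetical; swap in your own client).
KNOWN_ADDRESSES = {
    "225 E. John Carpenter Freeway, Suite 1500 Irving, Texas 75062",
}


def fake_geocoder(address: str) -> bool:
    """Pretend to geocode: report whether the address is in the stub table."""
    return address in KNOWN_ADDRESSES


# Hand-written candidates standing in for the strings a detector returned.
candidates = [
    "225 E. John Carpenter Freeway, Suite 1500 Irving, Texas 75062",
    "1 SPIRITUAL HEALER DR SHARIF NSAMBU SPECIALISING IN",
]

verified = reverify(candidates, fake_geocoder)
print(verified)
```

In real use, the candidates would presumably come from something like `[str(address) for address in pyap.parse(text, country='US')]`, and `fake_geocoder` would be replaced by a call to whichever geocoding service you use.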

Limitations

Because Pyap is based on regular expressions, it provides fast results. This is also a limitation, because the regular expressions intentionally use little context to detect an address.

In other words, to detect a US address, the library doesn’t use a list of US cities or a list of typical street names. It looks for a pattern that is most likely to be an address.

For example, the string below would be detected as a valid address: “1 SPIRITUAL HEALER DR SHARIF NSAMBU SPECIALISING IN”

This happens because the string has all the components of a valid address: street number “1”, street name “SPIRITUAL HEALER” followed by a street identifier “DR” (Drive), city “SHARIF NSAMBU SPECIALISING” and a state name abbreviation “IN” (Indiana).
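To make the mechanism concrete, here is a deliberately simplified pattern that shows the same kind of false positive. It is not Pyap's actual regular expression: `TOY_US_ADDRESS`, its group names, and the short lists of street types and states are assumptions chosen only for this illustration.

```python
import re

# Toy pattern, NOT Pyap's real one: a number, some uppercase words, a street
# type, some more uppercase words, and a two-letter state abbreviation.
TOY_US_ADDRESS = re.compile(
    r"""
    (?P<street_number>\d+)\s+
    (?P<street_name>[A-Z][A-Z ]*?)\s+
    (?P<street_type>DR|ST|AVE|RD|BLVD)\s+
    (?P<city>[A-Z][A-Z ]*?)\s+
    (?P<state>IN|TX|CA|NY)\b
    """,
    re.VERBOSE,
)

text = "1 SPIRITUAL HEALER DR SHARIF NSAMBU SPECIALISING IN"
match = TOY_US_ADDRESS.search(text)

# The pattern only checks the shape of the text, so the advert matches.
if match:
    for part, value in match.groupdict().items():
        print(f"{part}: {value}")
```

Because the pattern has no knowledge of real cities or street names, any text with this shape is accepted, which is the trade-off described above.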

The good news is that such errors are quite rare.

