Project Description

Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. It currently supports US and Canadian addresses.

>>> import pyap
>>> test_address = """
...     Lorem ipsum
...     225 E. John Carpenter Freeway,
...     Suite 1500 Irving, Texas 75062
...     Dorem sit amet
...     """
>>> addresses = pyap.parse(test_address, country='US')
>>> for address in addresses:
...     # shows found address
...     print(address)
...     # shows address parts
...     print(address.as_dict())
...

Installation

To install Pyap, simply:

$ pip install pyap

About

I created this library because I couldn’t find a reliable open-source solution for detecting addresses on web pages while writing my web crawler. Existing solutions have drawbacks when you need to process very large amounts of data quickly: you either have to buy proprietary software, use third-party pay-per-use services, or rely on address detection that is slow and unsuitable for real-time processing.

Pyap is an alternative to all of these methods. Because it is based on regular expressions, it is fast and can find addresses in text in real time with low error rates.

Future work

  • Add rules for parsing UK addresses
  • Add rules for parsing FR addresses

Typical workflow

Use Pyap as a first step when you need to detect addresses in a text and don’t know for sure whether the text contains any.

For the highest accuracy, Pyap results can be re-verified with a geocoding process.
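
The detect-then-verify workflow described above could look roughly like the sketch below. It rests on assumptions: `keep_geocodable` and `fake_geocode` are hypothetical names introduced for illustration, and the stand-in geocoder only looks up one hard-coded string and returns placeholder coordinates. A real setup would call whichever geocoding service you use.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Coordinates = Tuple[float, float]


def keep_geocodable(
    candidates: Iterable[object],
    geocode: Callable[[str], Optional[Coordinates]],
) -> List[Tuple[str, Coordinates]]:
    """Keep only the candidate addresses that the geocoder can resolve."""
    verified = []
    for candidate in candidates:
        # Collapse line breaks and repeated spaces before geocoding.
        text = " ".join(str(candidate).split())
        location = geocode(text)
        if location is not None:
            verified.append((text, location))
    return verified


def fake_geocode(address: str) -> Optional[Coordinates]:
    """Hypothetical stand-in geocoder; the coordinates are placeholders."""
    known = {
        "225 E. John Carpenter Freeway, Suite 1500 Irving, Texas 75062": (0.0, 0.0),
    }
    return known.get(address)


# Candidates as a detector might return them, including one false positive.
candidates = [
    "225 E. John Carpenter Freeway,\n    Suite 1500 Irving, Texas 75062",
    "1 SPIRITUAL HEALER DR SHARIF NSAMBU SPECIALISING IN",
]
verified = keep_geocodable(candidates, fake_geocode)
for text, location in verified:
    print(text, location)
```

With Pyap, the candidates would presumably be the items returned by `pyap.parse(text, country='US')`, since printing each item shows the found address.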

Limitations

Because Pyap is based on regular expressions, it provides fast results. This is also a limitation, because the regular expressions intentionally use little context to detect an address.

In other words, to detect a US address, the library doesn’t use a list of US cities or a list of typical street names. It looks for a pattern that is most likely to be an address.

For example, the string below would be detected as a valid address: “1 SPIRITUAL HEALER DR SHARIF NSAMBU SPECIALISING IN”

This happens because the string has all the components of a valid address: street number “1”, street name “SPIRITUAL HEALER” followed by a street identifier “DR” (Drive), city “SHARIF NSAMBU SPECIALISING” and a state name abbreviation “IN” (Indiana).
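
To see why a purely pattern-based matcher accepts such a string, here is a deliberately simplified sketch. The expression `TOY_US_ADDRESS` is a hypothetical toy written for this illustration and is not the expression Pyap uses. It only knows a handful of street types and state abbreviations.

```python
import re

# Toy pattern for illustration only; NOT Pyap's actual regular expression.
TOY_US_ADDRESS = re.compile(
    r"""
    (?P<street_number>\d+)\s+
    (?P<street_name>[A-Z]+(?:\s[A-Z]+)*?)\s+   # one or more upper-case words
    (?P<street_type>DR|ST|AVE|RD|BLVD)\s+      # small sample of street types
    (?P<city>[A-Z]+(?:\s[A-Z]+)*?)\s+          # any upper-case words pass as a city
    (?P<region>IN|TX|CA|NY)\b                  # small sample of state codes
    """,
    re.VERBOSE,
)

text = "1 SPIRITUAL HEALER DR SHARIF NSAMBU SPECIALISING IN"
match = TOY_US_ADDRESS.search(text)

# The pattern has no notion of real cities or street names, so it matches.
parts = match.groupdict() if match else {}
for name, value in parts.items():
    print(f"{name}: {value}")
```

Because nothing in the pattern checks the city against a list of real places, any run of words in the right position is accepted.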

The good news is that such errors are quite rare.

Release History

0.1.0

0.1

Download Files

File Name          Size     File Type  Upload Date
pyap-0.1.0.tar.gz  20.3 kB  Source     Apr 15, 2015
