Project Description

A script to extract US-style street addresses from a text file

$ address_extractor
1600 Pennsylvania Ave NW, Washington, DC 20500 ^D
1 lines in input
,1600 Pennsylvania Ave NW,Washington DC 20500
$ address_extractor -o output.csv input.csv
4361 lines in input
*snip*
11 lines unable to be parsed
$ ls
output.csv

address_extractor takes text or a text file containing address-like data, one address per line, and parses it into a uniform format with the usaddress package.

Installation

This package is available from PyPI via pip:

pip install address_extractor

This installs the Python module as well as a command-line script named address_extractor.

Command-line Usage

address_extractor [-h] [-o OUTPUT] [--remove-post-zip] [input]

positional arguments:
  input                 the input file. Defaults to stdin.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        the output file. Defaults to stdout.
  --remove-post-zip, -r
                        when scanning the input lines, remove everything after
                        a sequence of 5 digits followed by a comma. The
                        parsing library used by this script chokes on
                        addresses containing this kind of information, often a
                        county name.
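
The effect of --remove-post-zip can be illustrated with a short regular expression. This is only a sketch of the behaviour described in the help text, not the script's actual code; the names POST_ZIP and remove_post_zip are made up for the example.

```python
import re

# Five digits, then a comma, then whatever follows on the same line.
# (Assumed pattern; the real script may match differently.)
POST_ZIP = re.compile(r"(\d{5}),.*")


def remove_post_zip(line):
    """Keep the five digits and drop the comma and everything after it."""
    return POST_ZIP.sub(r"\1", line, count=1)


if __name__ == "__main__":
    print(remove_post_zip("4 Main St, Springfield, IL 62704, Sangamon County"))
```

Note that a pattern this simple would also cut a line at a five-digit house number that happens to be followed by a comma, which may be one reason the option is off by default.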

Lines that could not be parsed are printed to stderr. They can be saved to a file with standard shell redirection:

$ address_extractor -o good_addresses.csv has_some_bad_addresses.txt 2> bad_addresses.txt
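
The overall flow (one address per line, parsed rows written to the output, unparseable lines echoed to stderr) might look roughly like the sketch below. It uses only the standard library; extract and toy_parse are hypothetical names, and toy_parse is a stand-in for the real usaddress-based parsing, which is not reproduced here.

```python
import csv
import io


def extract(lines, parse, out, err):
    """Write one CSV row per parsed line and echo failures to err.

    parse is any callable that returns a list of fields for a line or
    raises ValueError. Returns (lines_seen, lines_failed).
    """
    writer = csv.writer(out)
    total = failed = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped in this sketch
        total += 1
        try:
            writer.writerow(parse(line))
        except ValueError:
            failed += 1
            print(line, file=err)
    return total, failed


def toy_parse(line):
    """Stand-in parser: expects 'street, city, state zip'."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3:
        raise ValueError(line)
    return parts


if __name__ == "__main__":
    out, err = io.StringIO(), io.StringIO()
    print(extract(["1 Main St, Springfield, IL 62704\n", "garbage\n"],
                  toy_parse, out, err))
```

Keeping the good rows and the rejected lines on separate streams is what makes the 2> redirection above work.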

Usage as a Module

address_extractor can be used as a Python module:

>>> import address_extractor
>>> address_extractor.main(input=input_file_object, output=output_file_object, remove_post_zip=a_bool)

There are some small issues with this implementation:

  • main() closes the file objects it is given, even when they are sys.stdin or sys.stdout, so those streams cannot be used again afterwards.
  • Lines that fail to parse are still printed to sys.stderr, which may not be expected when calling the module from other code.
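
Both issues can probably be worked around from the calling side. The helpers below are a sketch using only the standard library; duplicate_stream and call_capturing_stderr are hypothetical names, and the stderr capture assumes the package looks up sys.stderr at call time rather than holding its own reference.

```python
import contextlib
import io
import os


def duplicate_stream(stream, mode):
    """Open a second file object on a copy of stream's file descriptor.

    Closing the returned object closes only the copy, so the original
    stream (for example sys.stdout) stays usable afterwards.
    """
    return os.fdopen(os.dup(stream.fileno()), mode)


def call_capturing_stderr(func, *args, **kwargs):
    """Call func and return (result, text written to sys.stderr meanwhile)."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()
```

With these, a call such as call_capturing_stderr(address_extractor.main, input=duplicate_stream(sys.stdin, "r"), output=duplicate_stream(sys.stdout, "w"), remove_post_zip=False) should leave the real sys.stdin and sys.stdout open and hand back the unparsed lines as a string. This has not been tested against the package itself.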

Versions and Stability

This package is uploaded as a 0.1.0 release. There are no tests and little error checking. It originated as a quick-'n'-dirty script, and I decided to release it as a package to gain familiarity with that process.

Issues, comments, and pull requests are welcome at the GitHub page!

Release History

0.1.0.post1

This version

0.1.0

Download Files

File Name                                        Size     Version   File Type   Upload Date
address_extractor-0.1.0.post1-py3-none-any.whl   4.7 kB   py3       Wheel       Oct 31, 2015
address_extractor-0.1.0.post1.tar.gz             4.9 kB             Source      Oct 31, 2015
