pandas-usaddress

The usaddress library made easy with Pandas.

Also supports standardizing addresses to meet USPS standards.

Installation

pip install pandas-usaddress

Usage

Basic Parsing

import pandas as pd
import pandas_usaddress

#load dataframe
df = pd.read_csv('test_file.csv')

#parse the address column with usaddress
df = pandas_usaddress.tag(df, ['address_field'])

#send output to csv
df.to_csv('parsed_output.csv')


#------------------------------additional details------------------------------

#The output fields are identical to those produced by usaddress.

Parsing with Address Standardization

import pandas as pd
import pandas_usaddress

#load dataframe
df = pd.read_csv('test_file.csv')

#parse the address column, condensing the output fields and standardizing the values
df = pandas_usaddress.tag(df, ['address_field'], granularity='medium', standardize=True)

#send output to csv
df.to_csv('parsed_output.csv')


#------------------------------additional details------------------------------

#The standard output for usaddress has a lot of fields. The granularity parameter
#allows you to condense the results you get back for different types of analysis.
#See the parameter documentation below for all granularity options.

#Addresses are often unstandardized: the same address can appear as 123 1st ST,
#123 First Street, and so on. This causes problems for analyses such as aggregation
#and record matching. The standardize parameter attempts to convert the address
#to US Postal Service (USPS) standards.
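
To make the idea of standardization concrete, here is a minimal sketch in plain Python. It is not how pandas-usaddress implements `standardize=True`; the function name `standardize_street` and the two small lookup tables are assumptions made for illustration, and the ordinal mapping in particular is a toy rule rather than a USPS requirement.

```python
import re

# Illustrative lookup tables only; these are NOT the tables pandas-usaddress uses.
# The suffix abbreviations follow the common USPS forms (STREET -> ST, and so on).
SUFFIXES = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "ROAD": "RD",
    "DRIVE": "DR",
}
# Hypothetical ordinal mapping, included only so the variants below collapse.
ORDINALS = {"FIRST": "1ST", "SECOND": "2ND", "THIRD": "3RD"}


def standardize_street(address: str) -> str:
    """Uppercase, drop punctuation, and swap known words for abbreviations."""
    cleaned = re.sub(r"[^\w\s]", "", address.upper())
    words = cleaned.split()
    return " ".join(SUFFIXES.get(w, ORDINALS.get(w, w)) for w in words)


variants = ["123 1st ST", "123 First Street", "123 first st."]
standardized = [standardize_street(v) for v in variants]
print(standardized)
```

All three variants collapse to the same string, which is what makes aggregation and record matching reliable. A word-by-word swap like this would also rewrite a street whose name happens to be "Street" or "Drive"; standardizing after parsing lets the substitution be limited to the component it belongs to.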

Parsing Multiple Address Fields

import pandas as pd
import pandas_usaddress

#load dataframe
df = pd.read_csv('test_file.csv')

#concatenate the listed fields, then parse the result into a single standardized address line
df = pandas_usaddress.tag(df, ['street1', 'street2', 'city', 'state'], granularity='single', standardize=True)

#send output to csv
df.to_csv('parsed_output.csv')


#------------------------------additional details------------------------------

#You can also use pandas-usaddress to concatenate and parse multiple address fields.
#This is helpful when you are working with two datasets that have different
#field names and you want the output field names standardized at a specific level of
#granularity. It is common, for instance, for one dataset to combine address
#lines 1 and 2 into a single field while another keeps them separate.

#You will help the parser do its job if you list the fields in approximately the
#same order that you would write them on an envelope.

#In this instance, we are taking multiple address fields and converting them into a
#single address line. That's fine to do!
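
The envelope-order advice can be sketched for a single record as below. This is only an illustration of the concatenation idea under stated assumptions: `join_address_fields` is a hypothetical helper, not part of pandas-usaddress, and the source does not say how the library joins fields internally or how it treats missing values.

```python
def join_address_fields(record, fields):
    """Join the non-empty values of `fields`, in the given order, with single spaces."""
    parts = []
    for name in fields:
        value = record.get(name)
        # Skip missing values. NaN (how pandas marks a missing cell) is the
        # only value that is not equal to itself.
        if value is None or value != value:
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


# Envelope order: street lines first, then city, then state.
envelope_order = ["street1", "street2", "city", "state"]

record = {"street1": "123 Main St", "street2": "Apt 4", "city": "Springfield", "state": "IL"}
full = join_address_fields(record, envelope_order)

# A record with a missing second line and a blank city still joins cleanly.
sparse_record = {"street1": "123 Main St", "street2": float("nan"), "city": " ", "state": "IL"}
sparse = join_address_fields(sparse_record, envelope_order)

print(full)
print(sparse)
```

Skipping empty and missing values keeps stray spaces and literal "nan" strings out of the text handed to the parser.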
