Replace village names and commonly-misspelled Connecticut town names with real town/city names.

# CT Name Cleaner

Resolve village names, colloquial place names, and common misspellings of Connecticut town names to their official town names.

This is based on an R package of the same name by my colleague Andrew Ba Tran.

This installs a command line script, ctclean, as well as a library particularly meant for use within Jupyter notebooks.

by Jake Kara, jake@jakekara.com

### Latest version

0.10.1

### Installation

pip install ctnamecleaner

### Command line util

Usage:

    $ ctclean NewPreston
    WASHINGTON
    $ ctclean "New Preston"
    WASHINGTON

When nothing is found, return None:

    $ ctclean NotGonnaFindItsVille
    None

Set a custom value to return on error with the --error or -e flag:

    $ ctclean NotGonnaFindItsVille --error "Ruh Roh"
    Ruh Roh
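The fallback behavior shown above can be sketched in plain Python. This is only an illustration, not the package's code: `TOWN_LOOKUP` and `normalize` are hypothetical stand-ins, and the normalization rule (uppercase, whitespace removed) is an assumption inferred from the fact that `NewPreston` and `"New Preston"` both resolve to `WASHINGTON`.

```python
# Hypothetical one-row stand-in for the lookup table bundled with the package.
TOWN_LOOKUP = {"NEWPRESTON": "WASHINGTON"}


def normalize(raw_name):
    """Uppercase and drop whitespace (an assumed matching rule)."""
    return "".join(raw_name.split()).upper()


def clean(raw_name, error=None):
    """Return the official town name, or `error` when nothing matches."""
    return TOWN_LOOKUP.get(normalize(raw_name), error)


print(clean("NewPreston"))
print(clean("New Preston"))
print(clean("NotGonnaFindItsVille"))
print(clean("NotGonnaFindItsVille", error="Ruh Roh"))
```

The `error` argument plays the same role as the `--error` flag: it replaces the default `None` whenever no match is found.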

### Use with Pandas dataframes

See HELP.txt in this directory and the notebook in the demo/ folder of this repo for an example of translating an entire column with the clean(), clean_col() and clean_dataframe() methods. clean_dataframe() uses pandas' DataFrame.join() method, so it's faster than applying clean() to every value with a lambda function yourself.
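To make the join-versus-lambda comparison concrete, here is a minimal pandas sketch that does not use the package itself. The two-row `lookup_df` is a hypothetical stand-in for the bundled spreadsheet (it borrows the default column names `name` and `real.town.name` from the constructor), and matching on uppercased names is an assumption.

```python
import pandas as pd

# Hypothetical lookup table; the real one ships with the package.
lookup_df = pd.DataFrame({
    "name": ["NEW PRESTON", "STORRS"],
    "real.town.name": ["WASHINGTON", "MANSFIELD"],
})

towns = pd.DataFrame({
    "town": ["New Preston", "Storrs", "NotGonnaFindItsVille"],
    "count": [3, 5, 1],
})

# Approach 1: look up each value one at a time with a lambda.
mapping = dict(zip(lookup_df["name"], lookup_df["real.town.name"]))
by_apply = towns["town"].apply(lambda t: mapping.get(t.upper()))

# Approach 2: one vectorized join against the lookup table.
keyed = towns.assign(name=towns["town"].str.upper())
by_join = keyed.join(lookup_df.set_index("name"), on="name")["real.town.name"]

print(by_apply)
print(by_join)
```

Both approaches give the same matches; unmatched rows come back as missing values. On a large column the join does the matching in a single pass instead of calling a Python function once per row.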

### Extending with other data

Not in CT? Want to map other things? Just make a spreadsheet and put it anywhere, online or locally, that Pandas .read_csv() can open, and then use the constructor to customize the lookup class.

>>> l = lookup.Lookup(csv_url="http://path/to/your/sheet",
                      raw_name_col="something",
                      clean_name_col="something_else")
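For reference, the spreadsheet only needs two columns: one with the raw names and one with the clean names. The sketch below shows that shape using the standard library instead of the package. The `load_mapping` helper and the sample rows are hypothetical; the column names match the constructor call above.

```python
import csv
import io

# Illustrative sheet: a raw-name column and a clean-name column.
SHEET = """something,something_else
Astoria,Queens
Williamsburg,Brooklyn
"""


def load_mapping(csv_text, raw_name_col, clean_name_col):
    """Build a raw-name -> clean-name dict from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[raw_name_col]: row[clean_name_col] for row in reader}


mapping = load_mapping(SHEET,
                       raw_name_col="something",
                       clean_name_col="something_else")

print(mapping.get("Astoria"))
print(mapping.get("Hoboken"))
```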

### Contents of HELP.txt

Below this point is auto documentation from the lookup class generated from help.py:

Help on module ctlookup.lookup in ctlookup:

NAME

ctlookup.lookup - Main module for CT Name Cleaner

FILE

/Applications/MAMP/htdocs/tdev/pyctnamecleaner/package/ctlookup/lookup.py

CLASSES

Lookup

class Lookup
Lookup class for CT place names, or any other DF for that matter

Methods defined here:

__init__(self, raw_name_col='name', clean_name_col='real.town.name', csv_url=None, use_inet_csv=False)
Constructor for Lookup

No need to use parameters unless you are specifying a different
source URL.

Parameters
----------
raw_name_col : string, optional
The name of the column with input names, like “New Preston”

Only use if you’re using a different source spreadsheet.

clean_name_col : string, optional
The name of the column with output names, like “Washington”

Only use if you’re using a different source spreadsheet.

csv_url : string, optional
A valid local file or remote url to use as an alternative
source spreadsheet.

use_inet_csv : boolean, optional
Force a reload of the spreadsheet from the web to reflect any
new additions since it was bundled with this python package.

Defaults to False. The list doesn’t change too much anymore.

clean(self, raw_name, error=None)
Get a clean place name (e.g. input “New Preston” and get
“Washington”)

Parameters
----------
raw_name : string
The input name of the place, such as a village or a
common misspelling of a town name

error : obj, optional
The default to return if no match is found

Defaults to None

Returns
-------
String or the value of None (or anything specified with the error
parameter) if no match is found

clean_col(self, series, error=None)
Clean a Pandas Series of place names

Parameters
----------
series : Pandas Series
A series containing place names that need to be cleaned

error : obj, optional
Value to use if no match is found for a given place.

Defaults to None

Notes
-----
Meant as a less opinionated version of clean_dataframe

clean_dataframe(self, df, town_col, error=None)
Clean an entire column of place names

Parameters
----------

df : Pandas DataFrame
DataFrame containing the column to clean

town_col : valid column label
Label of column containing town names to clean

error : obj, optional
Default value to use when no match is found.

Defaults to None

Notes
-----
I plan to deprecate this but leave it in place for
backward-compatibility. Use clean_col instead.
