Replace village names and commonly-misspelled Connecticut town names with real town/city names.
# CT Name Cleaner
Resolve village names, colloquial Connecticut town names, and common misspellings of Connecticut town names to their official town names.
This is based on an R package of the same name by my colleague Andrew Ba Tran.
This installs a command line script, ctclean, as well as a library particularly meant for use within Jupyter notebooks.
by Jake Kara, firstname.lastname@example.org
### Latest version
pip install ctnamecleaner
### Command line util

    $ ctclean NewPreston
    WASHINGTON
    $ ctclean "New Preston"
    WASHINGTON

When nothing is found, it returns None:

    $ ctclean NotGonnaFindItsVille
    None

Set a custom value to return on error with the --error or -e flag:

    $ ctclean NotGonnaFindItsVille --error "Ruh Roh"
    Ruh Roh

### Use with Pandas dataframes
See HELP.txt in this directory and the notebook in the demo/ folder of this repo for an example of translating an entire column with the clean(), clean_col() and clean_dataframe() methods. clean_dataframe() uses pandas' DataFrame.join() method, so it's faster than applying the clean() method with a lambda function yourself.
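
To illustrate the difference between a per-row lookup and a join, here is a minimal sketch in plain pandas. It does not call this package: the tiny `lookup_df` table, the `place` column, the `_key` helper column and the upper-casing of names before matching are all assumptions made for the example. Only the default column names (`name` and `real.town.name`) come from HELP.txt.

```python
import pandas as pd

# Hypothetical stand-in for the lookup sheet. The column names follow the
# defaults listed in HELP.txt; the rows are made up for this example.
lookup_df = pd.DataFrame({
    "name": ["NEW PRESTON", "ROWAYTON", "COS COB"],
    "real.town.name": ["WASHINGTON", "NORWALK", "GREENWICH"],
})

# Data to clean. "place" is an assumed column label.
df = pd.DataFrame({"place": ["New Preston", "Cos Cob", "NotGonnaFindItsVille"]})

ERROR = "Ruh Roh"  # value to use when no match is found

# Approach 1: look up each value one at a time with a lambda.
mapping = dict(zip(lookup_df["name"], lookup_df["real.town.name"]))
via_apply = df["place"].apply(lambda p: mapping.get(p.upper(), ERROR))

# Approach 2: one vectorized join against the lookup table.
# Upper-casing the key is an assumption about how names are matched.
keyed = df.assign(_key=df["place"].str.upper())
joined = keyed.join(lookup_df.set_index("name"), on="_key")
via_join = joined["real.town.name"].fillna(ERROR)

print(via_apply.tolist())
print(via_join.tolist())
```

Both approaches give the same cleaned values. The join does the matching in a single pandas operation rather than calling a Python function once per row, which is where the speed difference on large columns tends to come from.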
### Extending with other data
Not in CT? Want to map other things? Just make a spreadsheet and put it anywhere, online or locally, that Pandas .read_csv() can open, and then use the constructor to customize the lookup class.
>>> l = lookup.Lookup(csv_url="http://path/to/your/sheet", raw_name_col="something", clean_name_col="something_else")
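
A custom sheet only needs two columns: one with the raw names and one with the clean names. The sketch below shows roughly what such a sheet could look like and how the two column names drive the lookup. It uses only the standard library and is not the package's code: the sheet contents, the column names `nickname` and `official`, the helper `build_lookup` and the case-insensitive matching are all hypothetical.

```python
import csv
import io

# Hypothetical sheet contents. In practice this would be a CSV file or URL.
SHEET = """nickname,official
Big Apple,New York
NYC,New York
Philly,Philadelphia
"""

def build_lookup(csv_text, raw_name_col, clean_name_col):
    """Map each raw name to its clean name (illustrative helper only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[raw_name_col].strip().upper(): row[clean_name_col].strip()
        for row in reader
    }

def clean(table, raw_name, error=None):
    """Return the clean name, or `error` when no match is found."""
    return table.get(raw_name.strip().upper(), error)

table = build_lookup(SHEET, raw_name_col="nickname", clean_name_col="official")
print(clean(table, "nyc"))
print(clean(table, "Gotham", error="Ruh Roh"))
```

With a sheet shaped like this, the matching constructor call would pass `raw_name_col="nickname"` and `clean_name_col="official"` along with the sheet's location in `csv_url`.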
### Contents of HELP.txt
Below this point is auto documentation from the lookup class generated from help.py:
Help on module ctlookup.lookup in ctlookup:

- ctlookup.lookup - Main module for CT Name Cleaner
- class Lookup

        Lookup class for CT place names, or any other DF for that matter

        Methods defined here:

        __init__(self, raw_name_col='name', clean_name_col='real.town.name', csv_url=None, use_inet_csv=False)
            Constructor for Lookup

            No need to use parameters unless you are specifying a different
            source URL.

            Parameters
            ----------
            raw_name_col : string, optional
                The name of the column with input names, like "New Preston"
                Only use if you're using a different source spreadsheet.
            clean_name_col : string, optional
                The name of the column with out names, like "Washington"
                Only use if you're using a different source spreadsheet.
            csv_url : string, optional
                A valid local file or remote url to use as an alternative
                source spreadsheet.
            use_inet_csv : boolean, optional
                Force a reload of the spreadsheet from the web to reflect any
                new additions since it was bundled with this python package.
                Defaults to False. The list doesn't change too much anymore.

        clean(self, raw_name, error=None)
            Get a clean place name (e.g. input "New Preston" and get
            "Washington")

            Parameters
            ----------
            raw_name : string
                The input name of the place, such as a village or a
                common misspelling of a town name
            error : obj, optional
                The default to return if no match is found
                Defaults to None

            Returns
            -------
            String or the value of None (or anything specified with the error
            parameter) if no match is found

        clean_col(self, series, error=None)
            Clean a Pandas Series of place names

            Parameters
            ----------
            series : Pandas Series
                A series containing place names that need to be cleaned
            error : obj, optional
                Value to use if no match is found for a given place.
                Defaults to None

            Notes
            -----
            Meant as a less opinionated version of clean_dataframe

        clean_dataframe(self, df, town_col, error=None)
            Clean an entire column of place names

            Parameters
            ----------
            df : Pandas DataFrame
                Dataframe containing to clean
            town_col : valid column label
                Label of column containing town names to clean
            error : obj, optional
                Default value to use when no match is found.
                Defaults to None

            Notes
            -----
            I plan to deprecate this but leave it in place for
            backward-compatibility. Use clean_col instead.