Replace village names and commonly misspelled Connecticut town names with real town/city names.
# CT Name Cleaner
Resolve village and colloquial Connecticut town names, as well as common misspellings of Connecticut town names, to their official town names.
This is based on an R package of the same name by my colleague Andrew Ba Tran.
This installs a command line script, ctclean, as well as a library particularly meant for use within Jupyter notebooks.
by Jake Kara, jake@jakekara.com
### Latest version
0.10.1
### Installation
pip install ctnamecleaner
### Command line util
Usage:
$ ctclean NewPreston
WASHINGTON
$ ctclean "New Preston"
WASHINGTON
When no match is found, ctclean prints None:
$ ctclean NotGonnaFindItsVille
None
Set a custom value to return when no match is found with the --error or -e flag:
$ ctclean NotGonnaFindItsVille --error "Ruh Roh"
Ruh Roh
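To make the fallback behavior concrete, here is a minimal, hypothetical sketch of a ctclean-style command in plain Python. It is not the package's source: `SAMPLE_LOOKUP`, `clean_name()` and `main()` are illustrative names, and the two-row table stands in for the bundled spreadsheet. It only mirrors the behavior shown above: a match prints the official name, no match prints None, and -e/--error swaps in a custom value.

```python
import argparse

# Hypothetical stand-in for the bundled spreadsheet: each variant spelling
# is its own row, mapped to the official town name.
SAMPLE_LOOKUP = {
    "NewPreston": "WASHINGTON",
    "New Preston": "WASHINGTON",
}


def clean_name(raw_name, error=None):
    """Return the official name for raw_name, or `error` when there is no match."""
    return SAMPLE_LOOKUP.get(raw_name, error)


def main(argv=None):
    """Parse a name plus an optional -e/--error fallback; print and return the result."""
    parser = argparse.ArgumentParser(prog="ctclean-sketch")
    parser.add_argument("name", help="village, colloquial, or misspelled town name")
    parser.add_argument("-e", "--error", default=None,
                        help="value to print when no match is found")
    args = parser.parse_args(argv)
    # str() turns a missing match (None) into the text "None", as the CLI prints.
    result = str(clean_name(args.name, error=args.error))
    print(result)
    return result
```

Because main() accepts an argument list, it can be exercised without a shell: main(["NotGonnaFindItsVille", "-e", "Ruh Roh"]) prints and returns "Ruh Roh".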
### Use with Pandas dataframes
See HELP.txt in this directory and the notebook in the demo/ folder of this repo for examples of cleaning an entire column with the clean(), clean_col() and clean_dataframe() methods. clean_dataframe() uses pandas' DataFrame.join() method, so it's faster than applying clean() to each value yourself with a lambda function.
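The idea behind the join can be illustrated without pandas. The sketch below is a stdlib-only left hash join, not the package's code: it indexes the lookup table once (build phase), then probes that index for every row (probe phase), keeping every input row and filling unmatched ones with `error`. The function name `left_join_clean()` and the list-of-dicts row format are assumptions for illustration; the default column names `name` and `real.town.name` are the constructor defaults documented in HELP.txt.

```python
# Hypothetical lookup rows in the shape of the source spreadsheet.
LOOKUP_ROWS = [
    {"name": "New Preston", "real.town.name": "Washington"},
    {"name": "NewPreston", "real.town.name": "Washington"},
]


def left_join_clean(rows, town_col, lookup_rows, raw_name_col="name",
                    clean_name_col="real.town.name", error=None):
    """Left hash join of rows[town_col] against the lookup table.

    Every input row is kept; rows without a match get `error`.
    Input rows are copied, not modified.
    """
    # Build phase: one pass over the lookup table (later duplicates win).
    index = {r[raw_name_col]: r[clean_name_col] for r in lookup_rows}
    out = []
    # Probe phase: one dictionary lookup per input row.
    for row in rows:
        cleaned = dict(row)
        cleaned[clean_name_col] = index.get(row[town_col], error)
        out.append(cleaned)
    return out


rows = [{"town": "New Preston"}, {"town": "NotGonnaFindItsVille"}]
for row in left_join_clean(rows, "town", LOOKUP_ROWS):
    print(row)
# {'town': 'New Preston', 'real.town.name': 'Washington'}
# {'town': 'NotGonnaFindItsVille', 'real.town.name': None}
```

Building the index costs one pass over the lookup table, after which each row is a single dictionary probe.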
### Extending with other data
Not in CT? Want to map other things? Just make a spreadsheet with a column of raw names and a column of clean names, put it anywhere (online or local) that pandas' read_csv() can open, and pass its location and column names to the Lookup constructor.
>>> from ctlookup import lookup
>>> l = lookup.Lookup(csv_url="http://path/to/your/sheet", raw_name_col="something", clean_name_col="something_else")
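All the spreadsheet needs is a header row containing your raw and clean column names. As a hedged, stdlib-only sketch (per the text above, the package itself reads sheets through pandas' read_csv()), the hypothetical `build_lookup()` below parses such a sheet from a string and, as a choice of this sketch, fails early if either named column is missing. The sheet rows are made-up sample data; the column names `something` and `something_else` come from the constructor call above.

```python
import csv
import io

# Made-up sample sheet; the headers match the raw_name_col and
# clean_name_col values passed to Lookup(...) above.
SHEET = """something,something_else
Bklyn,Brooklyn
Brooklin,Brooklyn
"""


def build_lookup(csv_text, raw_name_col, clean_name_col):
    """Hypothetical helper: map each raw name in the sheet to its clean name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Accessing fieldnames reads the header row.
    missing = {raw_name_col, clean_name_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"sheet is missing column(s): {sorted(missing)}")
    return {row[raw_name_col]: row[clean_name_col] for row in reader}


table = build_lookup(SHEET, "something", "something_else")
print(table.get("Brooklin"))           # Brooklyn
print(table.get("Queens", "Ruh Roh"))  # Ruh Roh
```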
### Contents of HELP.txt
Below this point is auto-generated documentation for the Lookup class, produced by help.py:
Help on module ctlookup.lookup in ctlookup:
- NAME
ctlookup.lookup - Main module for CT Name Cleaner
- FILE
/Applications/MAMP/htdocs/tdev/pyctnamecleaner/package/ctlookup/lookup.py
- CLASSES
Lookup
- class Lookup
- Lookup class for CT place names, or any other DataFrame for that matter.

  Methods defined here:

  __init__(self, raw_name_col='name', clean_name_col='real.town.name', csv_url=None, use_inet_csv=False)
      Constructor for Lookup. No need to use parameters unless you are specifying a different source URL.

      Parameters
      ----------
      raw_name_col : string, optional
          The name of the column with input names, like "New Preston". Only use if you're using a different source spreadsheet.
      clean_name_col : string, optional
          The name of the column with output names, like "Washington". Only use if you're using a different source spreadsheet.
      csv_url : string, optional
          A valid local file or remote URL to use as an alternative source spreadsheet.
      use_inet_csv : boolean, optional
          Force a reload of the spreadsheet from the web to reflect any new additions since it was bundled with this Python package. Defaults to False. The list doesn't change much anymore.

  clean(self, raw_name, error=None)
      Get a clean place name (e.g. input "New Preston" and get "Washington").

      Parameters
      ----------
      raw_name : string
          The input name of the place, such as a village or a common misspelling of a town name.
      error : obj, optional
          The value to return if no match is found. Defaults to None.

      Returns
      -------
      String, or None (or whatever was specified with the error parameter) if no match is found.

  clean_col(self, series, error=None)
      Clean a pandas Series of place names.

      Parameters
      ----------
      series : pandas Series
          A series containing place names that need to be cleaned.
      error : obj, optional
          Value to use if no match is found for a given place. Defaults to None.

      Notes
      -----
      Meant as a less opinionated version of clean_dataframe.

  clean_dataframe(self, df, town_col, error=None)
      Clean an entire column of place names.

      Parameters
      ----------
      df : pandas DataFrame
          DataFrame containing the column to clean.
      town_col : valid column label
          Label of the column containing town names to clean.
      error : obj, optional
          Default value to use when no match is found. Defaults to None.

      Notes
      -----
      I plan to deprecate this but leave it in place for backward compatibility. Use clean_col instead.
### File details
Source distribution: ctnamecleaner-0.10.1.tar.gz
File metadata
- Download URL: ctnamecleaner-0.10.1.tar.gz
- Size: 13.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f14b09b5e9d1ff20a854c12e130d4d8a4bb6f0fc23de6e2105f6b315c6ff2c8 |
| MD5 | 9065a428d0731eff2ecf148c523c8dfb |
| BLAKE2b-256 | 93196ea8bdcb804bb861aeb88685233cc24ee4934fc1bdcbc4df682596c35434 |