ctnamecleaner

Replace village names and commonly-misspelled Connecticut town names with real town/city names.

Project description

# CT Name Cleaner

Resolve village names, colloquial Connecticut town names and common misspellings of Connecticut town names to their official town names.

This is based on an R package of the same name by my colleague Andrew Ba Tran.

This installs a command line script, ctclean, as well as a library particularly meant for use within Jupyter notebooks.

by Jake Kara, jake@jakekara.com

### Latest version

0.10.1

### Installation

pip install ctnamecleaner

### Command line util

Usage:

    $ ctclean NewPreston
    WASHINGTON
    $ ctclean "New Preston"
    WASHINGTON

When nothing is found, ctclean prints None:

    $ ctclean NotGonnaFindItsVille
    None

Set a custom value to return on error with the --error or -e flag:

    $ ctclean NotGonnaFindItsVille --error "Ruh Roh"
    Ruh Roh
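The flag behavior shown above can be sketched with argparse. This is not the package's actual script: the `TOWNS` table, the `normalize` helper and the `ctclean-sketch` program name are hypothetical, and the assumption that matching ignores case and spaces is only inferred from the `NewPreston` and `"New Preston"` examples.

```python
import argparse

# Hypothetical stand-in for the bundled lookup table (not the package's data).
TOWNS = {"newpreston": "WASHINGTON", "storrs": "MANSFIELD"}


def normalize(name):
    # Assumption: matching ignores case and whitespace.
    return "".join(name.split()).lower()


def build_parser():
    parser = argparse.ArgumentParser(prog="ctclean-sketch")
    parser.add_argument("name", help="village, nickname or misspelled town name")
    parser.add_argument("-e", "--error", default=None,
                        help="value to print when no match is found")
    return parser


def run(argv):
    # Return the text the command would print for the given arguments.
    args = build_parser().parse_args(argv)
    return str(TOWNS.get(normalize(args.name), args.error))


if __name__ == "__main__":
    print(run(["NotGonnaFindItsVille", "--error", "Ruh Roh"]))
```

Keeping the lookup in `run()` and the printing in the `__main__` guard makes the fallback logic easy to check without spawning a shell.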

### Use with Pandas dataframes

See HELP.txt in this directory and the Notebook in the demo/ folder in this repo for an example of translating an entire column with the clean(), clean_col() and clean_dataframe() methods. clean_dataframe() uses pandas' DataFrame.join() method, so it's faster than applying the clean() method with a lambda function yourself.
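To make the join-versus-lambda distinction concrete, here is a minimal sketch in plain pandas. It does not call the package: `lookup_df`, its rows and the lowercase `key` column are assumptions made for illustration, with only the default column names `name` and `real.town.name` taken from HELP.txt.

```python
import pandas as pd

# Hypothetical lookup table; the package bundles its own spreadsheet.
lookup_df = pd.DataFrame({
    "name": ["new preston", "storrs", "noank"],
    "real.town.name": ["Washington", "Mansfield", "Groton"],
})

df = pd.DataFrame({
    "place": ["New Preston", "Storrs", "Atlantis"],
    "count": [3, 5, 1],
})

# Approach 1: one Python-level lookup per row, applied with a lambda.
mapping = dict(zip(lookup_df["name"], lookup_df["real.town.name"]))
via_lambda = df["place"].apply(lambda p: mapping.get(p.lower()))

# Approach 2: a single left join against the lookup table's index.
keyed = lookup_df.set_index("name")
via_join = (
    df.assign(key=df["place"].str.lower())
      .join(keyed, on="key")["real.town.name"]
)

print(via_join.fillna("no match").tolist())
```

Both approaches give the same towns, with a missing value for the unmatched row; the join does the matching in one vectorized step instead of once per row.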

### Extending with other data

Not in CT? Want to map other things? Just make a spreadsheet and put it anywhere, online or locally, that Pandas .read_csv() can open, and then use the constructor to customize the lookup class.

>>> from ctlookup import lookup
>>> l = lookup.Lookup(csv_url="http://path/to/your/sheet",
                      raw_name_col="something",
                      clean_name_col="something_else")
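As a rough illustration of what such a sheet might look like, the sketch below builds a two-column CSV in memory and reads it with pandas. The column names `neighborhood` and `borough` and the rows are made up for this example; the only requirement taken from the text above is one column of raw names and one of clean names.

```python
import io

import pandas as pd

# Hypothetical sheet: one column of raw names, one of the names they map to.
csv_text = """neighborhood,borough
Harlem,Manhattan
Astoria,Queens
Park Slope,Brooklyn
"""

# pandas.read_csv accepts a local path, a URL or a file-like object like this buffer.
sheet = pd.read_csv(io.StringIO(csv_text))

raw_name_col = "neighborhood"   # would be passed as raw_name_col
clean_name_col = "borough"      # would be passed as clean_name_col

# Check up front that both columns exist before relying on them.
missing = {raw_name_col, clean_name_col} - set(sheet.columns)
if missing:
    raise ValueError(f"sheet is missing columns: {sorted(missing)}")

mapping = dict(zip(sheet[raw_name_col], sheet[clean_name_col]))
print(mapping["Astoria"])
```

With the package installed, the same file location and column names would go to the constructor as csv_url, raw_name_col and clean_name_col.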

### Contents of HELP.txt

Below this point is documentation for the lookup class, auto-generated by help.py:

    Help on module ctlookup.lookup in ctlookup:

    NAME
        ctlookup.lookup - Main module for CT Name Cleaner

    FILE
        /Applications/MAMP/htdocs/tdev/pyctnamecleaner/package/ctlookup/lookup.py

    CLASSES
        Lookup

        class Lookup
            Lookup class for CT place names, or any other DF for that matter

            Methods defined here:

            __init__(self, raw_name_col='name', clean_name_col='real.town.name', csv_url=None, use_inet_csv=False)
                Constructor for Lookup

                No need to use parameters unless you are specifying a different
                source URL.

                Parameters
                ----------
                raw_name_col : string, optional
                    The name of the column with input names, like "New Preston"
                    Only use if you're using a different source spreadsheet.
                clean_name_col : string, optional
                    The name of the column with output names, like "Washington"
                    Only use if you're using a different source spreadsheet.
                csv_url : string, optional
                    A valid local file or remote url to use as an alternative
                    source spreadsheet.
                use_inet_csv : boolean, optional
                    Force a reload of the spreadsheet from the web to reflect any
                    new additions since it was bundled with this python package.
                    Defaults to False. The list doesn't change too much anymore.

            clean(self, raw_name, error=None)
                Get a clean place name (e.g. input "New Preston" and get
                "Washington")

                Parameters
                ----------
                raw_name : string
                    The input name of the place, such as a village or a
                    common misspelling of a town name
                error : obj, optional
                    The default to return if no match is found
                    Defaults to None

                Returns
                -------
                String or the value of None (or anything specified with the error
                parameter) if no match is found

            clean_col(self, series, error=None)
                Clean a Pandas Series of place names

                Parameters
                ----------
                series : Pandas Series
                    A series containing place names that need to be cleaned
                error : obj, optional
                    Value to use if no match is found for a given place.
                    Defaults to None

                Notes
                -----
                Meant as a less opinionated version of clean_dataframe

            clean_dataframe(self, df, town_col, error=None)
                Clean an entire column of place names

                Parameters
                ----------
                df : Pandas DataFrame
                    Dataframe containing the column to clean
                town_col : valid column label
                    Label of column containing town names to clean
                error : obj, optional
                    Default value to use when no match is found.
                    Defaults to None

                Notes
                -----
                I plan to deprecate this but leave it in place for
                backward-compatibility. Use clean_col instead.

Project details

Release history

| Version | Release date |
| --- | --- |
| 0.10.1 | Mar 3, 2018 |
| 0.10 | Sep 2, 2016 |
| 0.9 | Sep 2, 2016 |
| 0.8 | Aug 29, 2016 |
| 0.7 | Aug 29, 2016 |
| 0.6 | Aug 29, 2016 |
| 0.5 | Aug 26, 2016 |
| 0.4 | Aug 25, 2016 |
| 0.3 | Aug 25, 2016 |
| 0.2 | Aug 25, 2016 |
| 0.1 | Aug 25, 2016 |

Download files


Source Distribution


Download URL: ctnamecleaner-0.10.1.tar.gz
Upload date: Mar 3, 2018
Size: 13.4 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for ctnamecleaner-0.10.1.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `3f14b09b5e9d1ff20a854c12e130d4d8a4bb6f0fc23de6e2105f6b315c6ff2c8` |
| MD5 | `9065a428d0731eff2ecf148c523c8dfb` |
| BLAKE2b-256 | `93196ea8bdcb804bb861aeb88685233cc24ee4934fc1bdcbc4df682596c35434` |

