Python library to process company names

cleanco - clean organization names

What is it / what does it do?

This is a Python package that processes company names, providing cleaned versions of the names by stripping away terms indicating organization type (such as "Ltd." or "Corp").

Using a database of organization type terms, it also provides a utility to deduce the type of organization, in terms of US/UK business entity types (e.g. "limited liability company" or "non-profit").

Finally, the system uses the term information to suggest countries the organization could be established in. For example, the term "Oy" in a company name suggests it is established in Finland, whereas "Ltd" could mean the UK, the US, or a number of other countries.
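
To make the idea concrete, here is a minimal sketch of a term-to-country lookup. It does not use cleanco: the table COUNTRY_TERMS and the function suggest_countries are hypothetical names made up for this illustration, and the real term data is much larger.

```python
# Illustrative only: a tiny, hand-written term table (NOT cleanco's data).
COUNTRY_TERMS = {
    "Finland": ["oy", "oyj"],
    "United Kingdom": ["ltd", "plc"],
    "United States of America": ["ltd", "llc", "inc"],
}


def suggest_countries(name):
    """Return the countries whose terms appear as a word in the name."""
    # Normalize each word: drop surrounding dots and commas, lowercase it.
    words = {word.strip(".,").lower() for word in name.split()}
    return [
        country
        for country, terms in COUNTRY_TERMS.items()
        if words & set(terms)
    ]


print(suggest_countries("Nokia Oy"))   # ['Finland']
print(suggest_countries("Acme Ltd."))  # ['United Kingdom', 'United States of America']
```

A term such as "Ltd" appears under several countries, so the result is a list of candidates rather than a single answer.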

How do I install it?

Just use 'pip install cleanco' if you have pip installed (as most systems do). Or download the zip distribution from this site, unzip it and then:

  • Mac: cd into it, and enter sudo python setup.py install along with your system password.
  • Windows: Same thing but without sudo.

How does it work?

Let's look at some sample code. To get the base name of a business without legal suffix:

>>> from cleanco import prepare_terms, basename
>>> business_name = "Some Big Pharma, LLC"
>>> terms = prepare_terms()
>>> basename(business_name, terms, prefix=False, middle=False, suffix=True)
'Some Big Pharma'

Note that a name sometimes has two or more suffixes in a row. The cleanco term data covers many of these combinations, but you may want to run basename() twice, just in case.
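
One way to handle stacked suffixes is to repeat the cleaning step until the name stops changing. The sketch below assumes nothing about cleanco internals: strip_suffix_once is a toy stand-in for basename(), and SUFFIXES and clean_until_stable are hypothetical names used only for this illustration.

```python
# Toy suffix list for the illustration (NOT cleanco's term data).
SUFFIXES = ("llc", "ltd", "inc")


def strip_suffix_once(name, suffixes=SUFFIXES):
    """Remove at most one trailing legal term (toy stand-in for basename())."""
    cleaned = name.rstrip(" ,.")
    lowered = cleaned.lower()
    for suffix in suffixes:
        if lowered.endswith(" " + suffix):
            # Cut the suffix, then any separator left in front of it.
            return cleaned[: -len(suffix)].rstrip(" ,.")
    return cleaned


def clean_until_stable(name, clean, max_passes=5):
    """Apply clean() repeatedly until the result no longer changes."""
    for _ in range(max_passes):
        cleaned = clean(name)
        if cleaned == name:
            break
        name = cleaned
    return name


print(clean_until_stable("Some Big Pharma, LLC Ltd.", strip_suffix_once))
# Some Big Pharma
```

With cleanco itself, the same loop could presumably be driven by passing a small wrapper around basename(name, terms, ...) as the clean argument.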

To get the business type:

>>> from cleanco import typesources, matches
>>> classification_sources = typesources()
>>> matches("Some Big Pharma, LLC", classification_sources)
['Limited Liability Company']

To get the possible countries of jurisdiction:

>>> from cleanco import countrysources, matches
>>> classification_sources = countrysources()
>>> matches("Some Big Pharma, LLC", classification_sources)
['United States of America', 'Philippines']

The legacy (versions < 2.0) way can still be used, too, but will eventually be discontinued:

Import the utility class:

>>> from cleanco import cleanco

Prepare a string of a company name that you want to process:

>>> business_name = "Some Big Pharma, LLC"

Throw it into the instance:

>>> x = cleanco(business_name)

You can now get the company types:

>>> x.type()
['Limited Liability Company']

...the possible countries...

>>> x.country()
['United States of America', 'Philippines']

...and a clean version of the company name.

>>> x.clean_name()
'Some Big Pharma'

Are there bugs?

See the issue tracker. If you find a bug or have an enhancement suggestion or a question, please file an issue and provide a PR if you can. For example, some of the company suffixes may be incorrect, or some suffixes may be missing.

To run tests, simply install the package and run python setup.py test. To run tests on multiple Python versions, install tox and run it (see the provided tox.ini).

Special thanks to:


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cleanco-2.0.1.tar.gz (7.7 kB, source)

Built Distributions

  • cleanco-2.0.1-py3.7.egg (19.6 kB, egg)
  • cleanco-2.0.1-py3-none-any.whl (10.3 kB, Python 3)

