Python library to process company names

Project description

cleanco - clean organization names

What is it / what does it do?

This is a Python package that processes company names, providing cleaned versions of the names by stripping away terms indicating organization type (such as "Ltd." or "Corp").

Using a database of organization type terms, it also provides a utility to deduce the type of organization, in terms of US/UK business entity types (e.g. "limited liability company" or "non-profit").

Finally, the package uses the term information to suggest countries the organization could be established in. For example, the term "Oy" in a company name suggests it is established in Finland, whereas "Ltd" in a company name could mean the UK, the US or a number of other countries.
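
To make the idea of term-based country suggestions concrete, here is a minimal sketch. It is not cleanco's implementation: the two-entry table and the function name suggest_countries are made up for illustration, and the lookup only checks the last word of the name.

```python
import re

# Toy term table, for illustration only. cleanco ships its own, much larger
# term data; these two entries just mirror the "Oy" and "Ltd" examples.
TERM_COUNTRIES = {
    "oy": ["Finland"],
    "ltd": ["United Kingdom", "United States of America"],
}


def suggest_countries(name):
    """Return countries suggested by the last word of the name (hypothetical helper)."""
    # Keep alphabetic runs only, so "Ltd." and "Ltd" are treated the same.
    words = re.findall(r"[^\W\d_]+", name.lower())
    if not words:
        return []
    return list(TERM_COUNTRIES.get(words[-1], []))


print(suggest_countries("Example Oy"))
print(suggest_countries("Acme Ltd."))
```

A single term can map to several countries, which is why the result is a list of candidates rather than one answer.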

How do I install it?

Just use 'pip install cleanco' if you have pip installed (as most systems do). Or download the zip distribution from this site, unzip it and then:

  • Mac: cd into it, and enter sudo python setup.py install along with your system password.
  • Windows: Same thing but without sudo.

How does it work?

Let's look at some sample code. To get the base name of a business without legal suffix:

>>> from cleanco import basename
>>> business_name = "Some Big Pharma, LLC"
>>> basename(business_name)
'Some Big Pharma'

Note that sometimes a name may have e.g. two different suffixes after one another. The cleanco term data covers many of these, but you may want to run basename() twice on the name, just in case.
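
One way to handle stacked suffixes is to repeat the cleaning step until the name stops changing, instead of running it a fixed two times. The sketch below assumes nothing about cleanco internals: clean_until_stable and strip_one_suffix are hypothetical helpers, and strip_one_suffix is only a stand-in so the example runs without cleanco installed.

```python
def clean_until_stable(name, clean, max_rounds=5):
    """Apply clean() repeatedly until the result no longer changes."""
    for _ in range(max_rounds):
        cleaned = clean(name)
        if cleaned == name:
            break
        name = cleaned
    return name


def strip_one_suffix(name):
    """Stand-in cleaner (hypothetical): drops at most one trailing legal term."""
    parts = name.rstrip(" ,.").rsplit(None, 1)
    if len(parts) == 2 and parts[1].rstrip(".").lower() in {"ltd", "llc", "inc"}:
        return parts[0].rstrip(" ,.")
    return name


print(clean_until_stable("Some Big Pharma, LLC Ltd.", strip_one_suffix))
```

With cleanco installed, the same helper should work by passing basename as the clean argument, i.e. clean_until_stable(name, basename). The max_rounds cap is a safety limit chosen for this sketch, not a cleanco setting.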

If you want to use your own terms, see custom_basename(), which also provides other ways to adjust how the base name is produced.

To get the business type:

>>> from cleanco import typesources, matches
>>> classification_sources = typesources()
>>> matches("Some Big Pharma, LLC", classification_sources)
['Limited Liability Company']

To get the possible countries of jurisdiction:

>>> from cleanco import countrysources, matches
>>> classification_sources = countrysources()
>>> matches("Some Big Pharma, LLC", classification_sources)
['United States of America', 'Philippines']

Are there bugs?

See the issue tracker. If you find a bug or have an enhancement suggestion or a question, please file an issue and provide a PR if you can. For example, some of the company suffixes may be incorrect, or some suffixes may be missing.

To run tests, simply install the package and run python setup.py test. To run tests on multiple Python versions, install tox and run it (see the provided tox.ini).

Special thanks to:

Download files

Source Distribution

cleanco-2.2.tar.gz (11.1 kB view details)

Uploaded Source

Built Distribution

cleanco-2.2-py3-none-any.whl (11.1 kB view details)

Uploaded Python 3

File details

Details for the file cleanco-2.2.tar.gz.

File metadata

  • Download URL: cleanco-2.2.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.0 setuptools/58.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.0

File hashes

Hashes for cleanco-2.2.tar.gz
Algorithm Hash digest
SHA256 f16717234c1936866233283efc1387702dada7103fee8a6e712ca7caee2d182d
MD5 ccbf101c689cab48ad20a13f1660c50a
BLAKE2b-256 bbce4fbdf24370ff15faa47e41e3fe9b3e9b59799a6556535c8ba7998eab5c95

File details

Details for the file cleanco-2.2-py3-none-any.whl.

File metadata

  • Download URL: cleanco-2.2-py3-none-any.whl
  • Upload date:
  • Size: 11.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.0 setuptools/58.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.0

File hashes

Hashes for cleanco-2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0364fef77eea7a0db582e5b61435c59d86724c18fe0ebc5cb12ad37500f7183f
MD5 95f555318fd4e81d54a6a3db34cde4cb
BLAKE2b-256 8e0ef73be2f247f78cc2cab8ccd57518376bf3f5f195242d1bdd5baf4cc5daa1
