
A package to scrape Companies House data

Project description

Coho Spider

This is a scraper written by @haaroon and @rachchan that can be used to scrape data from Companies House.

Requirements

  • Python >= 3.7
  • Scrapy >= 2.8.0
  • Companies House Developer Hub Account/REST API Key
  • List of Company Numbers from Companies House to scrape (optional)

A default list of company numbers is included in case you do not have your own, but it may be out of date.

You can get an API key by registering an application on the Companies House Developer Hub.
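The runners take your key as key=..., so you never build requests yourself. If you want to check a key before starting a long scrape, though, it helps to know that the Companies House Public Data API uses HTTP Basic authentication with the API key as the username and an empty password. The sketch below builds such a request with the standard library only. API_BASE and build_request are illustrative names, not part of cohospider, and the code only constructs the request without sending it.

```python
import base64
import urllib.request

# Base URL of the Companies House Public Data API (REST).
API_BASE = "https://api.company-information.service.gov.uk"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request.

    Companies House uses HTTP Basic auth: the API key is the username
    and the password is left empty, hence the trailing colon.
    """
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Illustrative usage: the PSC list endpoint for company 00000006.
    req = build_request(
        "/company/00000006/persons-with-significant-control",
        "INSERT_API_KEY_HERE",
    )
    print(req.full_url)
    print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; a 401 Unauthorized response means the key was rejected.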

You can also contact Companies House directly; their support and developer teams are very helpful and can give you direct access to bulk read-only data.

Installation Guide

First, open up a new terminal and install our scraper.

pip install cohospider

Next, open the Python interpreter of your choice, pick a spider and run the commands for it below.

Scraping Persons With Significant Control

To obtain JSON data on a company's Persons With Significant Control, run the following:

from spiders import CohoPscSpiderRun

With the default company data:

psc_runner = CohoPscSpiderRun(key="INSERT_API_KEY_HERE")

Or with your own company data (pass company numbers as strings, e.g. "00000006"):

psc_runner = CohoPscSpiderRun(key="INSERT_API_KEY_HERE", company_numbers=["COMPANY_NUMBER_1", "COMPANY_NUMBER_2"])

Then start the spider:

psc_runner.start()
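Company numbers are easiest to handle as strings: Companies House numbers are eight characters long, purely numeric ones keep their leading zeros (00000006) and others carry a letter prefix, for example SC for companies registered in Scotland. Lists exported from spreadsheets often lose those zeros. The helper below is a hypothetical sketch, not part of cohospider, that tidies a raw list before you pass it as company_numbers; it assumes the runners accept company numbers as strings.

```python
import re

# Hypothetical helper, not part of cohospider.
# Assumption: a company number is 8 characters of A-Z/0-9; purely numeric
# numbers are zero-padded (6 -> "00000006"), others have a letter prefix
# such as "SC" for companies registered in Scotland.
_COMPANY_NUMBER = re.compile(r"[A-Z0-9]{8}")


def normalise_company_numbers(raw_numbers):
    """Return company numbers as upper-case, zero-padded, de-duplicated strings.

    Order of first appearance is preserved. Raises ValueError on anything
    that does not look like a company number.
    """
    cleaned = []
    seen = set()
    for raw in raw_numbers:
        number = str(raw).strip().upper()
        if number.isdigit():
            number = number.zfill(8)  # restore leading zeros lost in spreadsheets or ints
        if not _COMPANY_NUMBER.fullmatch(number):
            raise ValueError(f"not a valid company number: {raw!r}")
        if number not in seen:
            seen.add(number)
            cleaned.append(number)
    return cleaned
```

For example, normalise_company_numbers([6, "sc123456", "00000006"]) returns ["00000006", "SC123456"], which you could then pass as company_numbers.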

The output will follow this JSON format: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/resources/list?v=latest
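A PSC list response keeps its entries in an items array, where each entry has fields such as name, kind, natures_of_control and, for people or entities that no longer have control, ceased_on. As a minimal sketch, assuming each JSON document the spider saves is one such response, the snippet below lists the PSCs that are still active. active_pscs is a hypothetical helper and the sample data is invented for illustration.

```python
import json


def active_pscs(psc_list):
    """Yield (name, natures_of_control) for every PSC without a ceased_on date.

    psc_list is one parsed response in the Companies House PSC list format.
    """
    for item in psc_list.get("items", []):
        if item.get("ceased_on"):
            continue  # this person or entity no longer has control
        yield item.get("name"), item.get("natures_of_control", [])


# Invented sample data in the same shape as a PSC list response.
sample = json.loads("""
{
  "items": [
    {"name": "Ms Jane Example",
     "kind": "individual-person-with-significant-control",
     "natures_of_control": ["ownership-of-shares-75-to-100-percent"]},
    {"name": "Old Holdings Ltd",
     "kind": "corporate-entity-person-with-significant-control",
     "natures_of_control": ["voting-rights-25-to-50-percent"],
     "ceased_on": "2020-01-01"}
  ]
}
""")

for name, natures in active_pscs(sample):
    print(name, natures)
```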

Scraping Directors

To obtain JSON data on a company's Directors (officers), run the following:

from spiders import CohoOfficerSpiderRun

With the default company data:

officer_runner = CohoOfficerSpiderRun(key="INSERT_API_KEY_HERE")

Or with your own company data (pass company numbers as strings, e.g. "00000006"):

officer_runner = CohoOfficerSpiderRun(key="INSERT_API_KEY_HERE", company_numbers=["COMPANY_NUMBER_1", "COMPANY_NUMBER_2"])

Then start the spider:

officer_runner.start()

The output will follow this JSON format: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/resources/officerlist?v=latest
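Despite the section title, the officer list format covers every officer role, so secretaries and (for LLPs) members appear alongside directors, and former officers carry a resigned_on date. The sketch below, assuming each saved JSON document is one officer list response, keeps only current directors and counts officers by role. current_directors, role_counts and the sample data are hypothetical.

```python
from collections import Counter


def current_directors(officer_list):
    """Names of officers whose role is "director" and who have not resigned.

    Corporate directors use the separate role "corporate-director" and are
    not included here.
    """
    return [
        item.get("name")
        for item in officer_list.get("items", [])
        if item.get("officer_role") == "director" and not item.get("resigned_on")
    ]


def role_counts(officer_list):
    """Count all officers, current and former, by officer_role."""
    return Counter(item.get("officer_role") for item in officer_list.get("items", []))


# Invented sample data in the same shape as an officer list response.
sample = {
    "items": [
        {"name": "EXAMPLE, Jane", "officer_role": "director",
         "appointed_on": "2019-05-01"},
        {"name": "EXAMPLE, John", "officer_role": "director",
         "appointed_on": "2015-02-01", "resigned_on": "2019-04-30"},
        {"name": "SAMPLE SECRETARIES LTD", "officer_role": "corporate-secretary",
         "appointed_on": "2015-02-01"},
    ]
}

print(current_directors(sample))
print(role_counts(sample))
```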

Scraping for our example notebook

If you are following our example notebook/blog on Companies House in Raphtory, you will need to use our barbara-spider:

from spiders import BarbaraSpiderRun
barbara_runner = BarbaraSpiderRun(key="INSERT_API_KEY_HERE")
barbara_runner.start()

The output will follow this JSON format: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/resources/officerlist?v=latest

All of these runners write their JSON output to a data folder in your root directory, ready to be used in Raphtory for analysis.
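To pick the results up in Python before handing them to Raphtory, you can read the whole folder at once. This sketch assumes the runners write ordinary JSON documents with a .json extension somewhere under data; since the exact file layout is not described here, it walks the folder recursively. load_json_folder is a hypothetical helper. If the files turn out to be JSON Lines, parse them line by line with json.loads instead.

```python
import json
from pathlib import Path


def load_json_folder(folder="data"):
    """Parse every *.json file under folder (searched recursively).

    Returns a dict mapping each file's path, relative to folder, to its
    parsed contents. Assumes each file holds one standard JSON document.
    """
    root = Path(folder)
    records = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records[path.relative_to(root).as_posix()] = json.load(fh)
    return records
```

Calling load_json_folder() from your root directory gives you a dict of parsed documents keyed by file path, which you can then feed into your Raphtory loading code.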

License

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Download files


Source Distribution

cohospider-0.0.1.tar.gz (59.3 kB)

Uploaded Source

Built Distribution

cohospider-0.0.1-py3-none-any.whl (60.3 kB)

Uploaded Python 3

File details

Details for the file cohospider-0.0.1.tar.gz.

File metadata

  • Download URL: cohospider-0.0.1.tar.gz
  • Upload date:
  • Size: 59.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for cohospider-0.0.1.tar.gz
Algorithm Hash digest
SHA256 4979b1de56f889090d9dc382bbceac9273a820b24fb2f7075a9bd6e1b3c06258
MD5 165b35536e10279d9d92f6015f883e67
BLAKE2b-256 d6c0251f4417f24bb70b8fe536f3d76dd552b58deaf2812f167fff1a92d6c135


File details

Details for the file cohospider-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: cohospider-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 60.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for cohospider-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bceb332d5f7d8f979672f3e18124331eaaae1b60c5932c277aa3d1faad48003e
MD5 43298d34cdfee20e70bd6978f3c0db00
BLAKE2b-256 dcdda3091142c533b70e44ae44f8d0df06d119c5787474c7a14631f40d178b59
