A package to scrape Companies House data
Project description
Coho Spider
This is a scraper written by @haaroon and @rachchan that can be used to scrape data from Companies House.
Requirements
- Python >= 3.7
- Scrapy >= 2.8.0
- Companies House Developer Hub Account/REST API Key
- List of Company Numbers from Companies House to scrape (optional)
We have included a list of companies if you do not have any, but it may not be up to date.
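If you are preparing your own list, it may help to tidy the company numbers first. The sketch below is not part of cohospider; `normalise_company_number` is a hypothetical helper. It assumes the scraper expects the usual 8-character Companies House form (numeric values zero-padded, or a two-letter prefix such as SC followed by six digits). Less common formats are rejected rather than guessed at.

```python
import re


def normalise_company_number(raw: str) -> str:
    """Return an 8-character, upper-case company number.

    Hypothetical helper: handles purely numeric numbers and numbers with
    a prefix of up to two letters. Anything else raises ValueError.
    """
    value = re.sub(r"\s+", "", str(raw)).upper()
    match = re.fullmatch(r"([A-Z]{0,2})(\d+)", value)
    if match is None:
        raise ValueError(f"Unrecognised company number: {raw!r}")
    prefix, digits = match.groups()
    # Pad the numeric part so that prefix + digits is 8 characters long.
    padded = prefix + digits.zfill(8 - len(prefix))
    if len(padded) != 8:
        raise ValueError(f"Company number too long: {raw!r}")
    return padded


if __name__ == "__main__":
    company_numbers = [normalise_company_number(n) for n in ["6", " sc123 "]]
    print(company_numbers)
```

Keeping the numbers as strings matters because leading zeros are significant and would be lost if the values were stored as integers.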
You can get an API key from the Companies House website.
You can also contact Companies House directly. Their support and developer teams are extremely friendly and can give you direct access to bulk read-only data.
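For background, the Companies House public REST API authenticates with HTTP Basic auth, using the API key as the username and an empty password. The sketch below shows how such a header and the two relevant endpoint URLs can be built by hand. The function names are illustrative, and this is not necessarily how cohospider sends its requests internally.

```python
import base64

# Base URL of the Companies House public data API.
API_BASE = "https://api.company-information.service.gov.uk"


def basic_auth_header(api_key: str) -> dict:
    """Build an HTTP Basic auth header: key as username, blank password."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def psc_url(company_number: str) -> str:
    """URL of the Persons With Significant Control list for a company."""
    return f"{API_BASE}/company/{company_number}/persons-with-significant-control"


def officers_url(company_number: str) -> str:
    """URL of the officer list for a company."""
    return f"{API_BASE}/company/{company_number}/officers"


if __name__ == "__main__":
    # No request is sent here; this only prints what would be used.
    print(psc_url("00000006"))
    print(sorted(basic_auth_header("INSERT_API_KEY_HERE")))
```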
Installation Guide
First, open up a new terminal and install our scraper.
pip install cohospider
Next, open your Python terminal of choice, pick a spider to use, and enter the commands for it below.
Scraping Persons With Significant Control
If you would like to obtain JSON data on a company's Persons With Significant Control, run the following commands:
from spiders import CohoPscSpiderRun
With default Company data
psc_runner = CohoPscSpiderRun(key="INSERT_API_KEY_HERE")
OR with your own company data
psc_runner = CohoPscSpiderRun(key="INSERT_API_KEY_HERE", company_numbers=["COMPANY_NUMBER_1", "COMPANY_NUMBER_2"])
psc_runner.start()
The output will follow this JSON format: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/resources/list?v=latest
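As a rough illustration of working with that format, the sketch below filters a PSC list response down to the entries that are still active. The sample record is made up, uses only a few fields from the public specification (`items`, `name`, `kind`, `natures_of_control`, `notified_on`, `ceased_on`), and assumes each scraped record has the shape of the list resource. `active_pscs` is a hypothetical helper, not part of the package.

```python
import json

# Made-up sample shaped like a PSC list response (only a few fields shown).
SAMPLE_PSC_RESPONSE = """
{
  "items": [
    {
      "name": "Ms Jane Example",
      "kind": "individual-person-with-significant-control",
      "natures_of_control": ["ownership-of-shares-75-to-100-percent"],
      "notified_on": "2016-04-06"
    },
    {
      "name": "Example Holdings Ltd",
      "kind": "corporate-entity-person-with-significant-control",
      "natures_of_control": ["voting-rights-25-to-50-percent"],
      "notified_on": "2017-01-01",
      "ceased_on": "2019-06-30"
    }
  ],
  "active_count": 1,
  "ceased_count": 1,
  "total_results": 2
}
"""


def active_pscs(response: dict) -> list:
    """Return the PSC entries that have no 'ceased_on' date."""
    return [item for item in response.get("items", []) if "ceased_on" not in item]


if __name__ == "__main__":
    response = json.loads(SAMPLE_PSC_RESPONSE)
    for psc in active_pscs(response):
        print(psc["name"], psc["natures_of_control"])
```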
Scraping Directors
If you would like to obtain JSON data on a company's Directors, run the following commands:
from spiders import CohoOfficerSpiderRun
With default Company data
officer_runner = CohoOfficerSpiderRun(key="INSERT_API_KEY_HERE")
OR with your own company data
officer_runner = CohoOfficerSpiderRun(key="INSERT_API_KEY_HERE", company_numbers=["COMPANY_NUMBER_1", "COMPANY_NUMBER_2"])
officer_runner.start()
The output will follow this JSON format: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/resources/officerlist?v=latest
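To give a feel for the officer list format, the sketch below picks out the currently serving directors from a response. The sample data is invented and uses only a few fields from the public specification (`items`, `name`, `officer_role`, `appointed_on`, `resigned_on`). It assumes each scraped record has the shape of the officer list resource, and `current_directors` is a hypothetical helper.

```python
import json

# Made-up sample shaped like an officer list response (only a few fields shown).
SAMPLE_OFFICER_RESPONSE = """
{
  "items": [
    {"name": "EXAMPLE, Alice", "officer_role": "director",
     "appointed_on": "2015-03-01"},
    {"name": "SAMPLE, Bob", "officer_role": "secretary",
     "appointed_on": "2015-03-01"},
    {"name": "FORMER, Carol", "officer_role": "director",
     "appointed_on": "2010-01-01", "resigned_on": "2018-12-31"}
  ]
}
"""


def current_directors(response: dict) -> list:
    """Return names of officers who are directors and have not resigned."""
    return [
        item["name"]
        for item in response.get("items", [])
        if item.get("officer_role") == "director" and "resigned_on" not in item
    ]


if __name__ == "__main__":
    print(current_directors(json.loads(SAMPLE_OFFICER_RESPONSE)))
```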
Scraping for our example notebook
If you are following our example notebook/blog on Companies House in Raphtory, you will need to use our barbara-spider:
from spiders import BarbaraSpiderRun
barbara_runner = BarbaraSpiderRun(key="INSERT_API_KEY_HERE")
barbara_runner.start()
The output will follow this JSON format: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/resources/officerlist?v=latest
All these runners produce a data folder in your root directory, where you can find all your JSON data, ready to be used in Raphtory for analysis.
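Before loading the results into Raphtory, you may want to check what was written. The sketch below collects every `.json` file under the data folder into a dictionary. The exact file names and layout inside the folder are not described here, so this assumes only that each file holds a single JSON document; `load_json_files` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def load_json_files(data_dir="data") -> dict:
    """Load every *.json file under data_dir, keyed by its relative path.

    Assumes each file contains one JSON document. Returns an empty dict
    if the folder does not exist.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return {}
    records = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records[path.relative_to(root).as_posix()] = json.load(handle)
    return records


if __name__ == "__main__":
    loaded = load_json_files("data")
    print(f"Loaded {len(loaded)} JSON file(s)")
```

If the scraper turns out to write one JSON object per line instead, each file would need to be read line by line.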
License
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
File details
Details for the file cohospider-0.0.1.tar.gz.
File metadata
- Download URL: cohospider-0.0.1.tar.gz
- Upload date:
- Size: 59.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4979b1de56f889090d9dc382bbceac9273a820b24fb2f7075a9bd6e1b3c06258
MD5 | 165b35536e10279d9d92f6015f883e67
BLAKE2b-256 | d6c0251f4417f24bb70b8fe536f3d76dd552b58deaf2812f167fff1a92d6c135
File details
Details for the file cohospider-0.0.1-py3-none-any.whl.
File metadata
- Download URL: cohospider-0.0.1-py3-none-any.whl
- Upload date:
- Size: 60.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | bceb332d5f7d8f979672f3e18124331eaaae1b60c5932c277aa3d1faad48003e
MD5 | 43298d34cdfee20e70bd6978f3c0db00
BLAKE2b-256 | dcdda3091142c533b70e44ae44f8d0df06d119c5787474c7a14631f40d178b59