Scrape data from allabolag.se.
Project description
This is a scraper for collecting data from allabolag.se. It has no formal relationship with the site.
It is written and maintained for Newsworthy, but could possibly come in handy for other people as well.
Installing
pip install allabolag
Example usage
from allabolag import Company
company = Company("559071-2807")
# show all available data about the company in a raw...
print(company.raw_data)
# ...or cleaned format
print(company.data)
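Before sending a request, it can be handy to check that an identifier is a plausible Swedish organisation number. The sketch below is not part of allabolag; `is_valid_orgnr` is a hypothetical helper that assumes the common `NNNNNN-NNNN` (or plain 10-digit) form and verifies the last digit with the Luhn checksum. Whether the library performs any validation of its own is not stated in this README.

```python
import re


def is_valid_orgnr(value):
    """Hypothetical helper: check the format and Luhn check digit of an org number."""
    # Accept "NNNNNN-NNNN" or ten consecutive digits.
    if not re.fullmatch(r"\d{6}-?\d{4}", value):
        return False
    digits = [int(ch) for ch in value if ch.isdigit()]
    total = 0
    for index, digit in enumerate(digits):
        if index % 2 == 0:  # the 1st, 3rd, 5th, ... digits are doubled
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(is_valid_orgnr("559071-2807"))  # True
```

A check like this only catches typos; a number that passes can still be missing from the site.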
You can also iterate over the list of recent liquidations.
from allabolag import iter_liquidated_companies
for company in iter_liquidated_companies(until="2019-06-01"):
    print(company)
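If you only need a handful of results, you can cap the iteration instead of walking the whole list. This is a minimal sketch using `itertools.islice` from the standard library, assuming `iter_liquidated_companies` yields results one at a time (as its name suggests). `take` and `fake_liquidations` are hypothetical names; the latter is a stand-in generator so the snippet runs without network access.

```python
from itertools import islice


def take(iterable, limit):
    """Return at most `limit` items from any iterable as a list."""
    return list(islice(iterable, limit))


def fake_liquidations():
    """Hypothetical stand-in for iter_liquidated_companies(); yields dummy records."""
    n = 0
    while True:
        n += 1
        yield {"example_id": n}


# In real use, pass iter_liquidated_companies(until="2019-06-01") instead.
first_three = take(fake_liquidations(), 3)
print(first_three)
```

Because `islice` stops pulling from the iterator once the limit is reached, no further items are requested from the underlying source.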
Use AWS API Gateway to rotate IP addresses
from allabolag import AWSGatewayRequestClient, Company, iter_liquidated_companies
# send requests through AWS API Gateway so that the outgoing IP address rotates
request_client = AWSGatewayRequestClient()
company = Company("559071-2807", request_client=request_client)
for company in iter_liquidated_companies(until="2019-06-01", request_client=request_client):
    print(company)
Developing
To run tests:
python3 -m pytest
Deployment
To deploy a new version to PyPI:
Update Changelog below.
Update version in setup.py
Build: python3 setup.py sdist bdist_wheel
Upload: python3 -m twine upload dist/allabolag-X.Y.Z*
…assuming you have Twine installed (pip install twine) and configured.
Changelog
0.9.0
First version of an updated scraper that handles the new site structure, released in October 2024. Some data (such as “Händelser”, events) is still missing. “Topplistor” (top lists) is also not supported.
NB: The data structure in .raw_data and .data is different from before.
0.8.0
Handle Koncernredovisning (consolidated accounts)
Make RequestClient Python 3.8 compatible
0.7.1
Update request client to use an initialized client instance rather than the class
0.7.0
Add AWSGatewayRequestClient to enable requests through rotating IP addresses with AWS API Gateway
0.6.1
Bug fix: Actually use headers in requests.
0.6.0
Add headers to requests
Minor dependency updates
Use logger for debugging
0.5.1
Fix return type for Company.liquidation
0.5.0
Add Company.liquidation
0.4.1
Remove debug output
Don’t crash when we reach the end of a list
0.4.0
Add option to start from page N
Add custom exception for missing company
0.3.1
Add cache for company data
0.3.0
Add Company.remarks (a list of remarks, e.g. “Konkurs”, bankruptcy)
0.2.1
Make iter_list() more generic by accepting the whole URL fragment
0.2.0
Add iter_list() function
0.1.7
Bug fix: Add encoding for Python 2.7
0.1.6
Fixes bug when company has remark about Svensk Handels Varningslistan
0.1.5
Make Python 2.7 compatible.
0.1.4
Updating _iter_liquidate_companies to handle rebuilt site.
0.1.3
Bug fixes
0.1.0
First version
File details
Details for the file allabolag-0.9.0.tar.gz.
File metadata
- Download URL: allabolag-0.9.0.tar.gz
- Upload date:
- Size: 8.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3a98044afb3f534bd71f62607461144c112563ac0c5af58b1623895c25aca049
MD5 | d758de5fbdfa29c34e74108d3f442654
BLAKE2b-256 | 6df3a3976f235cdcfd4fdb808572695e395d71b8b2cac83912b5c997c19858b7
File details
Details for the file allabolag-0.9.0-py3-none-any.whl.
File metadata
- Download URL: allabolag-0.9.0-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0564d6a74643f34ecbcedd5422e1fa2a7d833be7878a09543318a4593e6d4e03
MD5 | 4a019cb472470418e3607fa0bac6909d
BLAKE2b-256 | 77d36c3240047290c02b5a588b4c2e01163f5db6e982b60b33972e9e938ff863
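A downloaded file can be checked against the published SHA256 digest with the standard library alone. The following is a sketch, not part of allabolag; `sha256_of_file` and `matches_published_digest` are hypothetical helper names, and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_digest` with the path to the downloaded `allabolag-0.9.0.tar.gz` and the SHA256 value listed for that file should return `True` for an intact download.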