
Scrape data from allabolag.se.

Project description

This is a scraper for collecting data from allabolag.se. It has no formal relationship with the site.

It is written and maintained for Newsworthy, but may be useful to others as well.

Installing

pip install allabolag

Example usage

from allabolag import Company

company = Company("559071-2807")

# show all available data about the company in a raw...
print(company.raw_data)

# ...or cleaned format
print(company.data)
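The example above passes the organisation number as "NNNNNN-NNNN". If your input comes from spreadsheets or user forms, it may help to normalise and sanity-check it before making a request. The sketch below is not part of allabolag; `normalize_org_number` is a hypothetical helper, and it assumes the usual rule that the last digit of a Swedish organisation number is a Luhn check digit.

```python
import re


def normalize_org_number(value: str) -> str:
    """Return the number as 'NNNNNN-NNNN', or raise ValueError.

    Hypothetical helper, not part of the allabolag package.
    """
    # drop spaces and hyphens so "5590712807" and "559071-2807" are equivalent
    digits = re.sub(r"[\s-]", "", value)
    if not re.fullmatch(r"\d{10}", digits):
        raise ValueError(f"expected 10 digits, got {value!r}")

    # Luhn check: double every other digit starting with the first,
    # subtract 9 from results above 9, and require a sum divisible by 10
    total = 0
    for index, char in enumerate(digits):
        n = int(char)
        if index % 2 == 0:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    if total % 10 != 0:
        raise ValueError(f"checksum failed for {value!r}")

    return f"{digits[:6]}-{digits[6:]}"


print(normalize_org_number("5590712807"))
```

The returned string can then be passed to Company(...). Rejecting malformed numbers locally avoids spending a request on input that cannot match any company.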

You can also iterate over the list of recent liquidations.

from allabolag import iter_liquidated_companies

for company in iter_liquidated_companies(until="2019-06-01"):
    print(company)
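When trying out the iterator it can be useful to cap the number of results and pause between them. The sketch below is one way to do that with the standard library. `throttled` is a hypothetical helper, not part of allabolag, and `fake_liquidations` is a stand-in generator so the example runs without network access. It assumes the real iterator fetches results lazily, which is not confirmed here.

```python
import time
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], limit: int, delay: float = 1.0) -> Iterator[T]:
    """Yield at most `limit` items, sleeping `delay` seconds between them."""
    for index, item in enumerate(islice(items, limit)):
        if index > 0:
            time.sleep(delay)
        yield item


def fake_liquidations():
    """Stand-in for iter_liquidated_companies(); yields dummy records forever."""
    n = 0
    while True:
        n += 1
        yield {"id": n}


# delay=0 keeps the demo fast; use a positive delay against the real site
for company in throttled(fake_liquidations(), limit=3, delay=0):
    print(company)
```

With the real package you would pass iter_liquidated_companies(until="2019-06-01") in place of fake_liquidations().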

Use AWS API Gateway to rotate IP addresses

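from allabolag import AWSGatewayRequestClient, Company, iter_liquidated_companies

# send requests through AWS API Gateway to rotate IP addresses
request_client = AWSGatewayRequestClient()
company = Company("559071-2807", request_client=request_client)

# the same client can be passed to the list iterator
for company in iter_liquidated_companies(until="2019-06-01", request_client=request_client):
    print(company)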

Developing

To run tests:

python3 -m pytest

Deployment

To deploy a new version to PyPI:

  1. Update Changelog below.

  2. Update version in setup.py

  3. Build: python3 setup.py sdist bdist_wheel

  4. Upload: python3 -m twine upload dist/allabolag-X.Y.Z*

…assuming you have Twine installed (pip install twine) and configured.

Changelog

  • 0.9.0

    • First version of an updated scraper that handles the new site structure, released in October 2024. Some data (such as “Händelser”) is still missing. “Topplistor” is also not supported.

    • NB: The data structure in .raw_data and .data differs from earlier versions.

  • 0.8.0

    • Handle Koncernredovisning

    • Make RequestClient Python 3.8 compatible

  • 0.7.1

    • Update request client to use an initialized client instance rather than the class

  • 0.7.0

    • Add AWSGatewayRequestClient to enable requests through rotating IP addresses with AWS API Gateway

  • 0.6.1

    • Bug fix: Actually use headers in requests.

  • 0.6.0

    • Add headers to requests

    • Minor dependency updates

    • Use logger for debugging

  • 0.5.1

    • Fix return type for Company.liquidation

  • 0.5.0

    • Add Company.liquidation

  • 0.4.1

    • Remove debug output

    • Don’t crash when we reach the end of a list

  • 0.4.0

    • Add option to start from page N

    • Add custom exception for missing company

  • 0.3.1

    • Add cache for company data

  • 0.3.0

    • Add Company.remarks (a list of remarks, e.g. “Konkurs”)

  • 0.2.1

    • Make iter_list() more generic by accepting the whole URL fragment

  • 0.2.0

    • Add iter_list() function

  • 0.1.7

    • Bug fix: Add encoding for Python 2.7

  • 0.1.6

    • Fixes bug when company has remark about Svensk Handels Varningslistan

  • 0.1.5

    • Make Python 2.7 compatible.

  • 0.1.4

    • Updating _iter_liquidate_companies to handle rebuilt site.

  • 0.1.3

    • Bug fixes

  • 0.1.0

    • First version

