Scrape data from allabolag.se.

Project description

This is a scraper for collecting data from allabolag.se. It has no formal relationship with the site.

It is written and maintained for Newsworthy, but may be useful to others as well.

Installing

pip install allabolag

Example usage

from allabolag import Company

company = Company("559071-2807")

# show all available data about the company in a raw...
print(company.raw_data)

# ...or cleaned format
print(company.data)
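Before handing an identifier to Company, it can help to check that it looks like a valid organisation number. The following is a minimal sketch, not part of the allabolag package: normalize_org_number is a hypothetical helper, and it assumes the ten-digit format shown above with a Luhn check digit, which Swedish organisation numbers normally carry.

```python
import re

# Ten digits, optionally with a hyphen after the sixth (assumed input formats)
ORG_NUMBER_RE = re.compile(r"[0-9]{6}-?[0-9]{4}")


def normalize_org_number(value):
    """Return the number as NNNNNN-NNNN, or raise ValueError.

    Hypothetical helper for illustration; not provided by allabolag.
    """
    value = value.strip()
    if not ORG_NUMBER_RE.fullmatch(value):
        raise ValueError(f"Not an organisation number: {value!r}")
    digits = value.replace("-", "")

    # Luhn checksum: for a ten-digit number, double every other digit
    # starting with the first, and subtract 9 from any result above 9.
    total = 0
    for index, char in enumerate(digits):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    if total % 10 != 0:
        raise ValueError(f"Checksum failed for: {value!r}")

    return f"{digits[:6]}-{digits[6:]}"


print(normalize_org_number("5590712807"))
```

The normalized string can then be passed to Company in the hyphenated form used in the example above.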

You can also iterate over the list of recent liquidations:

from allabolag import iter_liquidated_companies

for company in iter_liquidated_companies(until="2019-06-01"):
  print(company)
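The until value above is a fixed date string. As a rough sketch, it could instead be computed relative to today, and the number of results capped while experimenting. The helpers until_days_back and take are hypothetical names, and the sketch assumes that until accepts a YYYY-MM-DD string (as in the example) and that iter_liquidated_companies yields results lazily, as its name suggests.

```python
from datetime import date, timedelta
from itertools import islice


def until_days_back(days, today=None):
    """Return the date `days` before `today` as a YYYY-MM-DD string."""
    if today is None:
        today = date.today()
    return (today - timedelta(days=days)).isoformat()


def take(iterable, limit):
    """Return at most `limit` items from any iterable."""
    return list(islice(iterable, limit))


# With allabolag installed, usage might look like this:
#   from allabolag import iter_liquidated_companies
#   recent = iter_liquidated_companies(until=until_days_back(30))
#   for company in take(recent, 10):
#       print(company)
```

Capping the number of items keeps test runs short and limits the number of requests sent to the site.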

Use AWS API Gateway to rotate IP addresses

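from allabolag import Company, iter_liquidated_companies
from allabolag.request_client import AWSGatewayRequestClient

# Company is given the client class...
company = Company("559071-2807", RequestClient=AWSGatewayRequestClient)

# ...while iter_liquidated_companies is given a client instance
for company in iter_liquidated_companies(until="2019-06-01", request_client=AWSGatewayRequestClient()):
  print(company)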

Developing

To run tests:

python3 -m pytest

Deployment

To deploy a new version to PyPI:

  1. Update the changelog below.

  2. Update the version in setup.py.

  3. Build: python3 setup.py sdist bdist_wheel

  4. Upload: python3 -m twine upload dist/allabolag-X.Y.Z*

…assuming you have Twine installed (pip install twine) and configured.
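The build and upload steps could also be scripted. The sketch below is one possible approach, not the maintainers' release tooling: build_command, upload_command and release are hypothetical helpers, and the dist directory name is taken from the upload command above. The file pattern is expanded in Python so that only files for the requested version are uploaded.

```python
import glob
import os
import subprocess
import sys


def build_command():
    """Command for the build step: setup.py sdist bdist_wheel."""
    return [sys.executable, "setup.py", "sdist", "bdist_wheel"]


def upload_command(version, dist_dir="dist"):
    """Command for the upload step, listing the files for one version."""
    pattern = os.path.join(glob.escape(dist_dir), f"allabolag-{version}*")
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No built files match {pattern}")
    return [sys.executable, "-m", "twine", "upload"] + files


def release(version):
    """Build, then upload; stop at the first failing step."""
    subprocess.run(build_command(), check=True)
    subprocess.run(upload_command(version), check=True)
```

Calling release("X.Y.Z") from the project root, with the real version in place of X.Y.Z, would run both steps after the changelog and setup.py have been updated by hand.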

Changelog

  • 0.7 - Add AWSGatewayRequestClient to enable requests through rotating IP addresses with AWS API Gateway

  • 0.6.1 - Bug fix: Actually use header in requests.

  • 0.6.0 - Add headers to request; minor dependency updates; use logger for debugging

  • 0.5.1 - Fix return type for Company.liquidation

  • 0.5.0 - Add Company.liquidation

  • 0.4.1 - Remove debug output; don’t crash when we reach the end of a list

  • 0.4.0 - Add option to start from page N; add custom exception for missing company

  • 0.3.1 - Add cache for company data

  • 0.3.0 - Add Company.remarks (a list of remarks, e.g. “Konkurs”)

  • 0.2.1 - Make iter_list() more generic by accepting the whole URL fragment

  • 0.2.0 - Add iter_list() function

  • 0.1.7

    • Bug fix: Add encoding for Python 2.7

  • 0.1.6

    • Fixes bug when company has remark about Svensk Handels Varningslistan

  • 0.1.5

    • Make Python 2.7 compatible.

  • 0.1.4

    • Updating _iter_liquidate_companies to handle rebuilt site.

  • 0.1.3

    • Bug fixes

  • 0.1.0

    • First version
