Declarative web parsers

Soup Stars

Soup Stars is a framework for building web parsers with Python. It aims to make building, deploying, and scheduling web parsers easier by reducing the setup you need to get started.

Quickstart

pip install soupstars

The client is also available as a Docker image.

docker pull soupstars/client

Building a parser

Create a new parser with the soupstars create command, which starts from a template parser.

soupstars create -m myparser.py

Parsers are simple Python modules.

cat myparser.py

Notice that the only setup required is the special parse decorator and a variable named url that holds the address of the web page you want to parse.

from soupstars import parse

url = "https://corbettanalytics.com/"

@parse
def h1(soup):
    return soup.h1.text

Run the parser to check that it works correctly.

soupstars run -m myparser.py

Use soupstars --help to see a full list of available commands.

More documentation is available here.

Development

Start the Docker services.

docker-compose up -d

Set up the containers.

docker-compose exec web flask s3 mb soupstars-archive
docker-compose exec web flask db upgrade
docker-compose exec web flask seed schedules
docker-compose exec web flask seed plans
docker-compose exec web flask seed user
docker-compose exec web flask seed parsers

Run the tests.

docker-compose run --rm client pytest -vs

Releasing

New tags that pass on CI are automatically pushed to Docker Hub.

Deploying to PyPI requires running the following commands manually.

pip3 install twine
python3 setup.py sdist bdist_wheel
twine upload dist/*
