Declarative web parsers

Soup Stars

Soup Stars is a framework for building web parsers with Python. It is designed to make building, deploying, and scheduling web parsers easier by simplifying what you need to get started.

Quickstart

pip install soupstars

The client is also available as a Docker image.

docker pull soupstars/client

Building a parser

Create a new parser with the soupstars create command, which generates the module from a template parser.

soupstars create -m myparser.py

Parsers are simple Python modules.

cat myparser.py

Notice that the only setup required is the special parse decorator and a variable named url that holds the address of the web page you want to parse.

from soupstars import parse

url = "https://corbettanalytics.com/"

@parse
def h1(soup):
    return soup.h1.text

To check that the parser works, run it from the command line.

soupstars run -m myparser.py
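If you want to trigger the same command from a Python script, for example from your own scheduling code, one possible approach is to shell out with subprocess. This is a sketch under assumptions: the helpers `build_run_command` and `run_parser` are hypothetical and not part of soupstars, and the sketch assumes the soupstars executable is on your PATH.

```python
import subprocess


def build_run_command(module_path):
    """Return the argument list for `soupstars run -m <module_path>`."""
    return ["soupstars", "run", "-m", module_path]


def run_parser(module_path):
    """Run the parser module through the CLI and capture its output.

    Raises FileNotFoundError if the soupstars executable is not installed.
    """
    return subprocess.run(
        build_run_command(module_path),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode yourself instead of raising
    )


command = build_run_command("myparser.py")
print(" ".join(command))

# Usage (requires soupstars to be installed):
# completed = run_parser("myparser.py")
# print(completed.returncode, completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the module path contains spaces.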

Use soupstars --help to see a full list of available commands.

More documentation is available here.

Development

Create a virtual environment with Python 3.6.

virtualenv venv --python=python3.6

Install the package in development mode.

venv/bin/pip3 install --requirement requirements.txt
venv/bin/pip3 install --editable .

Run the tests and the linter.

venv/bin/pytest -v
venv/bin/flake8 soupstars examples

Releasing

New tags that pass CI are automatically published to PyPI and Docker Hub.
