Declarative web parsers

Soup Stars

Soup Stars is a framework for building web parsers with Python. It is designed to make building, deploying, and scheduling web parsers easier by simplifying what you need to get started.

Quickstart

pip install soupstars

The client is also available as a docker image.

docker pull soupstars/client

Building a parser

Create a new parser using the soupstars command. The create command will use a template parser.

soupstars create -m myparser.py

Parsers are simple Python modules.

cat myparser.py

Notice that the only setup required is the special parse decorator and a variable named url for the web page you want to parse.

from soupstars import parse

url = "https://corbettanalytics.com/"

@parse
def h1(soup):
    return soup.h1.text
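
To illustrate the declarative pattern above, here is a minimal standard-library sketch of how a decorator can mark functions so that a runner finds and calls them. It is not Soup Stars' implementation: the `parse` stand-in, the `_is_parser` attribute, the `FirstH1` helper, and `run` are all hypothetical names, and the functions receive a raw HTML string rather than the `soup` object a real parser is given.

```python
from html.parser import HTMLParser


def parse(func):
    # Hypothetical stand-in for the real decorator: it only tags the function.
    func._is_parser = True
    return func


class FirstH1(HTMLParser):
    """Collects the text of the first <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # Only the first <h1> is recorded; later ones are ignored.
        if tag == "h1" and self.text is None:
            self._inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "h1":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data


@parse
def h1(html):
    finder = FirstH1()
    finder.feed(html)
    return finder.text


def helper(html):
    # Not decorated, so the runner below skips it.
    return "ignored"


def run(namespace, html):
    """Call every tagged function and return {function name: result}."""
    return {
        name: func(html)
        for name, func in namespace.items()
        if callable(func) and getattr(func, "_is_parser", False)
    }


results = run(dict(globals()), "<html><body><h1>Hello</h1><p>Body</p></body></html>")
print(results)
```

Only the decorated function contributes to the output, which is why a parser module needs nothing beyond the decorator and the url variable.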

You can test that the parser functions correctly.

soupstars run -m myparser.py

Use soupstars --help to see a full list of available commands.

More documentation is available here.

Development

Create a virtual environment with Python 3.6.

virtualenv venv --python=python3.6

Install the package in development mode.

venv/bin/pip3 install --requirement requirements.txt
venv/bin/pip3 install --editable .

Run the tests.

venv/bin/pytest -v
venv/bin/flake8 soupstars examples

Releasing

New tags that pass on CI will automatically be pushed to PyPI and docker hub.
