Skip to main content

Declarative web parsers

Project description

Soup Stars

Build Status

Version Python

Soup Stars is a framework for building web parsers with Python. It is designed to make building, deploying, and scheduling web parsers easier by simplifying what you need to get started.

Quickstart

pip install soupstars

The client is also available as a docker image.

docker pull soupstars/client

Building a parser

Create a new parser using the soupstars command. The create command will use a template parser.

soupstars create -m myparser.py

Parsers are simple python modules.

cat myparser.py

Notice that the only set up required is the special parse decorator and a variable named url for the web page you want to parse.

from soupstars import parse

url = "https://corbettanalytics.com/"

@parse
def h1(soup):
    return soup.h1.text

You can test that the parser functions correctly.

soupstars run -m myparser.py

Use soupstars --help to see a full list of available commands.

More documentation is available here.

Development

Start the docker services.

docker-compose up -d

Run the tests.

docker-compose run --rm client pytest -vs

Releasing

New tags that pass on CI will automatically be pushed to docker hub.

To deploy to PyPI requires manually running the following commands.

pip3 install twine
python3 setup.py sdist bdist_wheel
twine upload dist/*

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

soupstars-2.11.15.tar.gz (14.6 kB view hashes)

Uploaded Source

Built Distribution

soupstars-2.11.15-py3-none-any.whl (8.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page