Declarative web parsers

Soup Stars

Soup Stars is a framework for building web parsers with Python. It is designed to make building, deploying, and scheduling web parsers easier by simplifying what you need to get started.

Quickstart

Installation

Install it with pip.

pip install soupstars

The client is also available as a Docker image.

docker pull soupstars/client

Building a parser

Create a new parser with the soupstars create command, which generates the module from a template.

soupstars create -m myparser.py

Parsers are simple Python modules.

cat myparser.py

Notice that the only setup required is the parse decorator and a variable named url that holds the address of the web page you want to parse.

from soupstars import parse

url = "https://corbettanalytics.com/"

@parse
def h1(soup):
    return soup.h1.text

You can test that the parser functions correctly.

soupstars test -m myparser.py

The output is a JSON object.

{
  "data": {
    "h1": "Level up your analytics"
  },
  "errors": {},
  "status": 200,
  "url": "https://corbettanalytics.com/"
}
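
If you want to consume this output from a script, something like the following sketch may help. It uses only the standard library and the sample output above. The check_result helper is a hypothetical name, and treating a non-empty "errors" object or a "status" other than 200 as a failure is an assumption, not documented behavior.

```python
import json

# Sample output copied from the `soupstars test` example above.
raw = """
{
  "data": {
    "h1": "Level up your analytics"
  },
  "errors": {},
  "status": 200,
  "url": "https://corbettanalytics.com/"
}
"""


def check_result(result):
    """Return the parsed data, or raise if the run looks unsuccessful.

    Assumption: non-empty "errors" or a "status" other than 200 means failure.
    """
    if result["errors"]:
        raise ValueError(f"parser errors: {result['errors']}")
    if result["status"] != 200:
        raise ValueError(f"unexpected status: {result['status']}")
    return result["data"]


data = check_result(json.loads(raw))
print(data["h1"])
```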

Use soupstars --help to see a full list of available commands.

Usage: soupstars [OPTIONS] COMMAND [ARGS]...

  CLI to interact with SoupStars cloud.

Options:
  --help  Show this message and exit.

Commands:
  config    Print the configuration used by the client
  create    Create a new parser from a template
  debug     Open a python prompt with a parser result
  health    Print the status of the SoupStars api
  login     Log in with an existing email
  ls        Show the parsers uploaded to SoupStars cloud
  pull      Pull a parser from SoupStars cloud into a local module
  push      Push a parser to SoupStars cloud
  register  Register a new account on SoupStars cloud
  results   Print results of a parser
  run       Run a parser on SoupStars cloud
  show      Show the contents of a parser on SoupStars cloud
  test      Test running a parser locally
  whoami    Print the email address of the current user

Deploying to soupstars.cloud

You can deploy your parsers to be run on our service.

Use the CLI to create an account. You'll be prompted for a username and password.

soupstars register

Upload your parser.

soupstars push -m myparser.py

You can now run the parser from our service.

soupstars run -m myparser.py

The output is a JSON object. The result of the parser is nested under the outer "data" key, alongside metadata about the run.

{
  "data": {
    "data": {
      "h1": "Level up your analytics"
    },
    "errors": {},
    "status": 200,
    "url": "https://corbettanalytics.com/"
  },
  "id": "2bd746f6-ae14-4057-8af8-f92aa5d304ca",
  "parser_id": "cb25aa3b-375d-4d55-966b-99cfef6e4015",
  "status_code": 200,
  "user_id": 1
}
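
Note that the local test result appears nested under an outer "data" key here. The sketch below separates the run metadata from the inner parser result, using only the standard library and the sample output above. The unwrap_run helper is a hypothetical name, and the field meanings are inferred from the sample rather than from API documentation.

```python
import json

# Sample output copied from the `soupstars run` example above.
raw = """
{
  "data": {
    "data": {
      "h1": "Level up your analytics"
    },
    "errors": {},
    "status": 200,
    "url": "https://corbettanalytics.com/"
  },
  "id": "2bd746f6-ae14-4057-8af8-f92aa5d304ca",
  "parser_id": "cb25aa3b-375d-4d55-966b-99cfef6e4015",
  "status_code": 200,
  "user_id": 1
}
"""


def unwrap_run(envelope):
    """Split a run response into (metadata, inner parser result)."""
    inner = envelope["data"]
    meta = {key: value for key, value in envelope.items() if key != "data"}
    return meta, inner


meta, inner = unwrap_run(json.loads(raw))
print(meta["parser_id"])
print(inner["data"]["h1"])
```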

Development

Create a virtual environment with Python 3.6.

virtualenv venv --python=python3.6

Install the package in development mode.

venv/bin/pip3 install --requirement requirements.txt
venv/bin/pip3 install --editable .

Run the tests.

venv/bin/pytest -v
venv/bin/flake8 soupstars examples

Releasing

New tags that pass CI are automatically pushed to PyPI and Docker Hub.

