Declarative web parsers

Soup Stars

Soup Stars is a framework for building web parsers with Python. It is designed to make building, deploying, and scheduling web parsers easier by simplifying what you need to get started.

Quickstart

pip install soupstars

Creating a parser

To create a new parser, run soupstars create in a terminal and supply the name of a Python module.

soupstars create myparser.py

Soup Stars will use a template parser to help you get started. This example creates a parser that extracts headlines from articles on the New York Times website.

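# Assumes soupstars exports both decorators used below.
from soupstars import follow, parse

url = "https://www.nytimes.com"

# Follow only www.nytimes.com links whose URL contains a date such as 2019/08/01.
@follow
def follow(url):
    return (url.domain == "www.nytimes.com") and url.match(r"\d{4}/\d{2}/\d{2}")

# Extract the headline from the page's first <h1> element.
@parse
def h1(soup):
    return soup.h1.text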
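
To see which links a rule like the one above would accept, here is a minimal standalone sketch that uses only the Python standard library. It is not Soup Stars code: should_follow and DATE_PATH are hypothetical names, and the sketch assumes that url.domain corresponds to the URL's hostname and that url.match behaves like a regular expression search on the URL path, neither of which is confirmed here.

```python
import re
from urllib.parse import urlparse

# Date-style path segment such as 2019/08/01 (hypothetical helper name).
DATE_PATH = re.compile(r"\d{4}/\d{2}/\d{2}")


def should_follow(link: str) -> bool:
    """Return True for www.nytimes.com links whose path contains a date."""
    parsed = urlparse(link)
    on_site = parsed.hostname == "www.nytimes.com"
    has_date = DATE_PATH.search(parsed.path) is not None
    return on_site and has_date


if __name__ == "__main__":
    for link in (
        "https://www.nytimes.com/2019/08/01/world/example.html",
        "https://www.nytimes.com/section/world",
        "https://example.com/2019/08/01/story.html",
    ):
        print(link, should_follow(link))
```

Only the first link passes both checks; the second has no date in its path and the third is on a different host.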

To check that the parser works, run it.

soupstars run myparser

Use soupstars --help to see a full list of available commands.

More documentation is available here.

Development

Start the Docker services.

docker-compose up -d

Set up the containers.

docker-compose exec web flask s3 mb soupstars-archive
docker-compose exec web flask db upgrade
docker-compose exec web flask seed schedules
docker-compose exec web flask seed plans
docker-compose exec web flask seed user
docker-compose exec web flask seed parsers

Run the tests.

docker-compose run --rm client pytest -vs

Releasing

New tags that pass CI are pushed to Docker Hub automatically.

Deploying to PyPI requires running the following commands manually.

pip3 install twine
python3 setup.py sdist bdist_wheel
twine upload dist/*

Download files

Source Distribution

soupstars-2.11.24.tar.gz (17.6 kB)

Uploaded Source

Built Distribution

soupstars-2.11.24-py3-none-any.whl (23.4 kB)

Uploaded Python 3

File details

Details for the file soupstars-2.11.24.tar.gz.

File metadata

  • Download URL: soupstars-2.11.24.tar.gz
  • Upload date:
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4

File hashes

Hashes for soupstars-2.11.24.tar.gz
Algorithm Hash digest
SHA256 d04b0d4c3abe4b501f3486193fed5e018c6ade06786585d10151d47cf86cb161
MD5 85a406f94c96088e15fb1ca469eeee27
BLAKE2b-256 a2dff2596017373660acdf183646b5f9fe06ea71b82dce8eb58866f3dc4c7b65
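
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's hashlib; the helper names sha256_of and matches_published_hash are hypothetical, and it assumes the source distribution has been saved locally under its published file name.

```python
import hashlib

# SHA256 digest published above for soupstars-2.11.24.tar.gz.
EXPECTED_SHA256 = "d04b0d4c3abe4b501f3486193fed5e018c6ade06786585d10151d47cf86cb161"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Example (assumes the file is in the current directory):
# matches_published_hash("soupstars-2.11.24.tar.gz")
```

Reading in chunks keeps memory use constant regardless of file size. To check the wheel, pass its own published digest as expected.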

File details

Details for the file soupstars-2.11.24-py3-none-any.whl.

File metadata

  • Download URL: soupstars-2.11.24-py3-none-any.whl
  • Upload date:
  • Size: 23.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4

File hashes

Hashes for soupstars-2.11.24-py3-none-any.whl
Algorithm Hash digest
SHA256 694c07bb78f6a6133f108fcc8ec725b197903a91bb350c3bb23db958a62caef0
MD5 d5f1d53b0f19ed4b7ecbb0a919a6dd74
BLAKE2b-256 4f34a3b8c07f1b968b643b45e6cc9abe8f7330ff2d61ff299fffb389b37d08fc
