Declarative web parsers

Soup Stars

Soup Stars is a framework for building web parsers with Python. It is designed to make building, deploying, and scheduling web parsers easier by simplifying what you need to get started.

Quickstart

pip install soupstars

Creating a parser

Create a new parser by running soupstars create in a terminal and supplying the name of a Python module.

soupstars create myparser.py

Soup Stars will use a template parser to help you get started. This example creates a parser that extracts headlines from articles on the New York Times website.

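from soupstars import follow, parse

# Starting URL for the parser.
url = "https://www.nytimes.com"

# Follow only nytimes.com links that contain a date such as 2019/08/01.
@follow
def follow(url):
    return (url.domain == "www.nytimes.com") and (url.match(r"\d{4}\/\d{2}\/\d{2}"))

# Extract the headline from the page's first <h1> element.
@parse
def h1(soup):
    return soup.h1.text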
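
To see what the two template functions are doing, it can help to try the same logic outside the framework. The following is a minimal sketch using only the Python standard library. The names should_follow, FirstH1, and first_h1 are hypothetical and are not part of the Soup Stars API. It also assumes that url.match looks for the date pattern anywhere in the URL path, which the template does not confirm, and it uses html.parser as a stand-in for the soup object.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; not part of soupstars.
DATE_PATH = re.compile(r"\d{4}/\d{2}/\d{2}")


def should_follow(link: str) -> bool:
    """Return True for www.nytimes.com links whose path contains YYYY/MM/DD."""
    parts = urlparse(link)
    # Assumption: the pattern may appear anywhere in the path.
    return parts.netloc == "www.nytimes.com" and DATE_PATH.search(parts.path) is not None


class FirstH1(HTMLParser):
    """Collect the text of the first <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Only the first <h1> is of interest.
        if tag == "h1" and not self.done:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            self.done = True

    def handle_data(self, data):
        # Text inside nested tags (for example <em>) is collected as well.
        if self.in_h1:
            self.chunks.append(data)


def first_h1(html: str) -> str:
    """Return the text of the first <h1>, or an empty string if there is none."""
    parser = FirstH1()
    parser.feed(html)
    return "".join(parser.chunks)
```

Calling should_follow("https://www.nytimes.com/section/world") returns False because the path has no date segment, while a dated article URL on the same domain returns True. Unlike soup.h1.text, first_h1 returns an empty string instead of raising an error when the page has no h1 element.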

Run the parser to check that it works correctly.

soupstars run myparser

Use soupstars --help to see a full list of available commands.

More documentation is available here.

Development

Start the Docker services.

docker-compose up -d

Set up the containers.

docker-compose exec web flask s3 mb soupstars-archive
docker-compose exec web flask db upgrade
docker-compose exec web flask seed schedules
docker-compose exec web flask seed plans
docker-compose exec web flask seed user
docker-compose exec web flask seed parsers

Run the tests.

docker-compose run --rm client pytest -vs

Releasing

New tags that pass on CI are automatically pushed to Docker Hub.

Deploying to PyPI requires running the following commands manually.

pip3 install twine
python3 setup.py sdist bdist_wheel
twine upload dist/*
