Easy-to-build HTML parsers

Project description

Soupstars

Soupstars makes it easy to build website parsers.

from soupstars import HttpParser, parse

class FacebookParser(HttpParser):

    DEFAULT_HOST = "https://www.facebook.com"

    @parse
    def title(soup):
        """
        >>> expected()
        'connect with friends and the world around you'
        """

        return soup.find('h2').text.strip()

fb = FacebookParser("/")
fb.json()  # {"title": "connect with friends and the world around you"}
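The parser class above relies on the parse decorator to turn each decorated function into a key of the json() output. The sketch below is a hypothetical, standard-library-only illustration of that field-collecting pattern, not soupstars' actual implementation: parse, MiniParser, first_text and ExampleParser are illustrative names, and it uses html.parser instead of BeautifulSoup so it runs without extra dependencies.

```python
import json
from html.parser import HTMLParser


def parse(func):
    """Hypothetical stand-in for @parse: mark a function as an output field."""
    func.is_parse_field = True
    return func


class _FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def first_text(html, tag):
    """Return the text of the first <tag> element, or '' if there is none."""
    finder = _FirstTagText(tag)
    finder.feed(html)
    finder.close()
    return "".join(finder.parts)


class MiniParser:
    """Hypothetical base class: runs every @parse field against stored HTML."""

    def __init__(self, html):
        self.html = html

    def fields(self):
        # Walk the class hierarchy so subclasses inherit and override fields.
        found = {}
        for klass in reversed(type(self).__mro__):
            for name, value in vars(klass).items():
                if getattr(value, "is_parse_field", False):
                    found[name] = value
        return found

    def to_dict(self):
        # Each field function receives the document, like `soup` in the README.
        return {name: func(self.html) for name, func in self.fields().items()}

    def json(self):
        return json.dumps(self.to_dict())


class ExampleParser(MiniParser):
    @parse
    def title(html):
        # As in the README, the field takes the document rather than self.
        return first_text(html, "h2").strip()


page = ExampleParser("<html><body><h2>  Hello, world  </h2></body></html>")
print(page.json())  # {"title": "Hello, world"}
```

Walking the MRO lets a subclass inherit its parent's fields and override any of them by reusing the function name.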

Installation

The easiest way to get started is to install with pip.

pip install soupstars

You can play around with one of the prebuilt parsers directly.

>>> from soupstars.examples import NytimesArticleParser
>>> article = NytimesArticleParser("2019/01/09/us/politics/government-shutdown-trump-senate.html")
>>> article['title']  # Trump storms out of white house meeting with democrats
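FacebookParser is created from a bare "/" and NytimesArticleParser from a relative article path, which suggests each path is resolved against the parser's DEFAULT_HOST. Assuming that is what happens (this README does not say so, and the NYTimes host below is an assumption), the standard library's urljoin covers it in one call; build_url is a hypothetical helper name.

```python
from urllib.parse import urljoin


def build_url(default_host, path):
    """Hypothetical helper: resolve a parser argument against a host.

    Relative paths are joined to the host; absolute URLs pass through unchanged.
    """
    return urljoin(default_host, path)


# Assumed host for NytimesArticleParser; not confirmed by this README.
print(build_url("https://www.nytimes.com",
                "2019/01/09/us/politics/government-shutdown-trump-senate.html"))
print(build_url("https://www.facebook.com", "/"))  # https://www.facebook.com/
```

One caveat of urljoin: if the host carries a path without a trailing slash, such as https://example.com/news, a relative argument replaces the last segment, so host values are safest as bare origins.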

If you have Docker installed, you can run a web API that serves the parsers. Clone this repository and start the containers.

docker-compose up

Parsers are served at /parse/{parser_package}/{parser_module}, and any JSON data sent in the request body is used to initialize the parser.

curl -X GET 0.0.0.0:5000/parse/nytimes/article \
  -H "Content-Type: application/json" \
  -d '{"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"}'

  {
    "data": {
      "authors": "By Michael Tackett",
      "published_at": "Jan. 10, 2019",
      "title": "Trump, Heading to the Border, Suggests He Will Declare an Emergency to Fund the Wall"
    }
  }
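The same request can be built from Python with only the standard library. This is a sketch, not a soupstars client: make_parse_request is a hypothetical helper, and the /parse/{package}/{module} route and port 5000 are taken from the curl example above. The request is only constructed here; sending it needs the docker-compose containers running.

```python
import json
import urllib.request


def make_parse_request(base_url, package, module, payload):
    """Hypothetical helper: build (but do not send) a parse request.

    Mirrors the curl example: a GET to /parse/{package}/{module} with a JSON body.
    Some proxies drop GET bodies, so this relies on talking to the API directly.
    """
    url = f"{base_url}/parse/{package}/{module}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = make_parse_request(
    "http://0.0.0.0:5000",
    "nytimes",
    "article",
    {"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"},
)

# With the containers running, send it and read the parsed fields:
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)["data"]
```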

To integrate the parsers with an existing Flask app, register soupstars_blueprint on it.

from flask import Flask
from soupstars import soupstars_blueprint

def create_app():
    app = Flask(__name__)
    # Expose the soupstars parser routes on this app
    app.register_blueprint(soupstars_blueprint)
    return app

Developing

Make sure docker-compose is installed, then start the containers in the background and check that they are running.

docker-compose up -d
docker-compose ps

Tests should be run from inside the container.

docker-compose run --rm test

Download files

Source Distribution

soupstars-0.3.1.tar.gz (13.4 kB view details)

File details

Details for the file soupstars-0.3.1.tar.gz.

File metadata

  • Download URL: soupstars-0.3.1.tar.gz
  • Upload date:
  • Size: 13.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.29.0 CPython/2.7.15

File hashes

Hashes for soupstars-0.3.1.tar.gz
Algorithm     Hash digest
SHA256        5bc069258c8b926e60807d4d49acf7247b932d4366c1fff98168b2fa475acfe7
MD5           680c2cef3b19ea152d470113112f55b3
BLAKE2b-256   e7660b41462bd58b1103a502ab8798a3ac6377d0e8a0e057d54279978b0d6e2c
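To check a downloaded archive against the SHA256 digest listed for soupstars-0.3.1.tar.gz, hash the file locally and compare. A minimal sketch using only hashlib; sha256_hex and matches_published_hash are hypothetical helper names.

```python
import hashlib
import io


def sha256_hex(fileobj, chunk_size=65536):
    """Hex SHA256 of a binary file object, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """True if the file at `path` has the expected SHA256 digest."""
    with open(path, "rb") as fh:
        return sha256_hex(fh) == expected_hex.lower()


# SHA256 published for soupstars-0.3.1.tar.gz
SOUPSTARS_031_SHA256 = "5bc069258c8b926e60807d4d49acf7247b932d4366c1fff98168b2fa475acfe7"

# Sanity check: the SHA256 of empty input is a well-known constant.
assert sha256_hex(io.BytesIO(b"")) == (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)

# After downloading the sdist into the working directory:
# matches_published_hash("soupstars-0.3.1.tar.gz", SOUPSTARS_031_SHA256)
```

pip can enforce the same check during installation through its hash-checking mode, by pinning soupstars==0.3.1 with --hash=sha256:<digest> in a requirements file.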
