Easy to build HTML parsers

Soupstars

Soupstars makes it easy to build website parsers.

from soupstars import HttpParser, parse

class FacebookParser(HttpParser):

    DEFAULT_HOST = "https://www.facebook.com"

    @parse
    def title(soup):
        """
        >>> expected()
        'connect with friends and the world around you'
        """

        return soup.find('h2').text.strip()

fb = FacebookParser("/")
fb.json()  # {"title": "connect with friends and the world around you"}
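The pattern in the example above, where functions marked with a decorator become fields of the parsed result, can be illustrated with a small standalone sketch. This is not how soupstars is implemented; `SketchParser`, `TitleParser`, `to_dict`, and this stand-in `parse` decorator are hypothetical names, and a plain string replaces the soup object so the sketch needs no dependencies.

```python
# Hypothetical sketch of a decorator-based field registry.
# It is NOT soupstars' real implementation.

def parse(func):
    """Mark a function as a parser field."""
    func._is_parser_field = True
    return func


class SketchParser:
    """Collects the result of every @parse-marked function into a dict."""

    def __init__(self, document):
        # A real parser would hold a parsed HTML document here;
        # a plain string keeps this sketch dependency-free.
        self.document = document

    def to_dict(self):
        results = {}
        # Look only at attributes defined directly on the subclass.
        for name, attr in vars(type(self)).items():
            if getattr(attr, "_is_parser_field", False):
                # Field functions receive the document, not `self`.
                results[name] = attr(self.document)
        return results


class TitleParser(SketchParser):

    @parse
    def title(document):
        return document.strip().lower()


parser = TitleParser("  Connect With Friends  ")
print(parser.to_dict())  # {'title': 'connect with friends'}
```

Marking functions rather than listing field names separately keeps each field's extraction logic and its name in one place.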

Installation

The easiest way to get started is to install with pip.

pip install soupstars

You can play around with one of the prebuilt parsers directly.

>>> from soupstars.examples import NytimesArticleParser
>>> article = NytimesArticleParser("2019/01/09/us/politics/government-shutdown-trump-senate.html")
>>> article['title']  # Trump storms out of white house meeting with democrats

If you have Docker, you can run a web API to serve the parsers. Clone this repo and start the containers.

$ docker-compose up

Parsers are served at /parsers/{parser_package}/{parser_module}, and any JSON data sent with the request is used to initialize the parser.

curl -X GET 0.0.0.0:5000/parse/nytimes/article \
  -H "Content-Type: application/json" \
  -d '{"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"}'

  {
    "data": {
      "authors": "By Michael Tackett",
      "published_at": "Jan. 10, 2019",
      "title": "Trump, Heading to the Border, Suggests He Will Declare an Emergency to Fund the Wall"
    }
  }
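The same request can be assembled from Python using only the standard library. The sketch below mirrors the curl call above, including its /parse/nytimes/article path and the GET method with a JSON body. `build_parse_request` is a hypothetical helper, not part of soupstars, and the host and port are taken from the curl example. The request is built but not sent, so the snippet runs without the containers.

```python
import json
import urllib.request


def build_parse_request(base_url, package, module, parser_args):
    """Build (but do not send) a request to a served parser.

    Hypothetical helper that mirrors the curl example: a GET request
    whose JSON body carries the arguments for the parser.
    """
    return urllib.request.Request(
        url=f"{base_url}/parse/{package}/{module}",
        data=json.dumps(parser_args).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_parse_request(
    "http://0.0.0.0:5000",
    "nytimes",
    "article",
    {"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"},
)
print(request.get_method(), request.full_url)

# With the containers running, the request could be sent like this
# (assuming the response has the shape shown above):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["data"]["title"])
```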

To integrate the parsers with an existing Flask app, register the soupstars_blueprint.

from flask import Flask
from soupstars import soupstars_blueprint

def create_app():
    app = Flask(__name__)
    # Mount the soupstars parser routes on the app.
    app.register_blueprint(soupstars_blueprint)
    return app

Developing

Make sure that you've installed docker-compose. Then start the containers.

docker-compose up -d
docker-compose ps

Tests should be run from inside the container.

docker-compose run --rm test

Source Distribution

soupstars-0.3.0.tar.gz (13.0 kB)
