
Easy-to-build HTML parsers


Soupstars


Soupstars makes it easy to build website parsers.

from soupstars import HttpParser, parse

class FacebookParser(HttpParser):

    default_host = "https://www.facebook.com"

    @parse
    def title(self):
        return self.read().find('h2').text.strip()

fb = FacebookParser("/")
fb.json() # { "title": "connect with friends and the world around you" }
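In the example above, the method decorated with @parse appears to become the "title" key of json(). The sketch below shows one way such a decorator-driven parser can work; it assumes nothing about soupstars internals. The names parse, MiniParser, ExampleParser and first_tag_text are hypothetical, and the page is read from an HTML string with the standard library's html.parser instead of being fetched over HTTP.

```python
import json
from html.parser import HTMLParser


def parse(method):
    """Hypothetical decorator: mark a method as an output field."""
    method._is_field = True
    return method


class _FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False  # True while inside the first matching tag
        self.done = False    # True once that tag has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def first_tag_text(html, tag):
    finder = _FirstTagText(tag)
    finder.feed(html)
    finder.close()
    return "".join(finder.parts)


class MiniParser:
    """Hypothetical base class whose output is every @parse method."""

    def __init__(self, html):
        self.html = html

    def _field_names(self):
        # Names of the methods marked by @parse, in alphabetical order
        cls = type(self)
        return [n for n in dir(cls) if getattr(getattr(cls, n), "_is_field", False)]

    def __getitem__(self, name):
        # Dictionary-style access, e.g. parser["title"]
        if name not in self._field_names():
            raise KeyError(name)
        return getattr(self, name)()

    def to_dict(self):
        return {name: self[name] for name in self._field_names()}

    def json(self):
        return json.dumps(self.to_dict())


class ExampleParser(MiniParser):
    @parse
    def title(self):
        return first_tag_text(self.html, "h2").strip()

    def helper(self):  # not decorated, so left out of the output
        return "ignored"


page = ExampleParser("<html><body><h2>\n  Connect with friends\n</h2></body></html>")
print(page.json())     # {"title": "Connect with friends"}
print(page["title"])   # Connect with friends
```

Fetching the page over HTTP (the role default_host plays in the example above) is left out so the sketch runs offline; only the decorator-and-collect pattern is illustrated.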

Installation

The easiest way to get started is to install with pip.

pip install soupstars

You can play around with one of the prebuilt parsers directly.

>>> from soupstars.parsers.nytimes import NytimesArticleParser
>>> article = NytimesArticleParser("2019/01/09/us/politics/government-shutdown-trump-senate.html")
>>> article['title']  # Trump storms out of White House meeting with Democrats

If you have Docker, you can run a web API that serves the parsers. Clone this repo and start the containers.

$ docker-compose up

Parsers are served at /parsers/{parser_package}/{parser_module}, and any JSON sent in the request body is used to initialize the parser.

curl -X GET 0.0.0.0:5000/parsers/nytimes/article \
  -H "Content-Type: application/json" \
  -d '{"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"}'

  {
    "data": {
      "authors": "By Michael Tackett",
      "published_at": "Jan. 10, 2019",
      "title": "Trump, Heading to the Border, Suggests He Will Declare an Emergency to Fund the Wall"
    }
  }
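The routing rule above maps a URL path to a parser and passes the JSON body to its constructor. A standard-library sketch of that dispatch step follows; the PARSERS registry, the dispatch function and EchoArticleParser are hypothetical stand-ins for however soupstars actually locates its parser modules, and the stand-in parser simply echoes its url argument instead of fetching a page.

```python
import json


class EchoArticleParser:
    """Hypothetical stand-in for a parser such as NytimesArticleParser."""

    def __init__(self, url):
        self.url = url

    def to_dict(self):
        return {"url": self.url}


# Hypothetical registry keyed by (parser_package, parser_module)
PARSERS = {
    ("nytimes", "article"): EchoArticleParser,
}


def dispatch(path, body):
    """Resolve /parsers/{package}/{module} and build the response payload."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "parsers":
        raise LookupError("not a parser route: " + path)
    parser_cls = PARSERS.get((parts[1], parts[2]))
    if parser_cls is None:
        raise LookupError("no parser registered for " + path)
    kwargs = json.loads(body) if body else {}
    # The JSON object becomes keyword arguments for the parser's constructor
    parser = parser_cls(**kwargs)
    # Wrap the parsed fields under "data", matching the response shown above
    return json.dumps({"data": parser.to_dict()})


response = dispatch(
    "/parsers/nytimes/article",
    '{"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"}',
)
print(response)
```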

To integrate the parsers with an existing Flask app, register soupstars_blueprint.

from flask import Flask
from soupstars import soupstars_blueprint

def create_app():
    app = Flask(__name__)
    # Mount the soupstars parser routes on this app
    app.register_blueprint(soupstars_blueprint)
    return app

Developing

Make sure docker-compose is installed, then start the containers.

docker-compose up -d
docker-compose ps

Tests should be run from inside the container.

docker-compose run --rm test

Download files

Source Distribution

soupstars-0.2.0.tar.gz (6.0 kB)

File details

Details for the file soupstars-0.2.0.tar.gz.

File metadata

  • Download URL: soupstars-0.2.0.tar.gz
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.29.0 CPython/2.7.15

File hashes

Hashes for soupstars-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        126b1834fbeda2825252630a63497f2517eefcbe671c2503168104ee136cdea8
MD5           12aff0066eabc38efaf8c0a6b65becdc
BLAKE2b-256   f4d00425a945e9519b60434eb9706c765aa481bf2baf5e3a0dfe8b0189023f8b