Easy-to-build HTML parsers
Project description
Soupstars
Soupstars makes it easy to build website parsers.
from soupstars import HttpParser, parse
class FacebookParser(HttpParser):
    DEFAULT_HOST = "https://www.facebook.com"
    @parse
    def title(soup):
        """
        >>> expected()
        'connect with friends and the world around you'
        """
        return soup.find('h2').text.strip()
fb = FacebookParser("/")
fb.json() # { "title": "connect with friends and the world around you" }
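To make the pattern above concrete, here is a minimal standard-library sketch of the same idea: a decorator marks field extractors, and a base class collects their results into JSON. It is not how soupstars is implemented. The names `SketchParser`, `HeadlineParser`, `FirstTagText`, and `first_tag_text` are hypothetical, the local `parse` decorator is only a stand-in for the library's `@parse`, and the sketch reads a literal HTML string rather than fetching a page.

```python
import json
from html.parser import HTMLParser


def parse(func):
    """Mark a method as a field extractor (hypothetical stand-in, not soupstars' @parse)."""
    func._is_parser = True
    return func


class FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of a given tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def first_tag_text(html, tag):
    """Return the stripped text of the first <tag> element, or '' if absent."""
    finder = FirstTagText(tag)
    finder.feed(html)
    return "".join(finder.chunks).strip()


class SketchParser:
    """Toy base class: runs every @parse-marked method against the HTML."""

    def __init__(self, html):
        self.html = html

    def to_dict(self):
        fields = {}
        for name in dir(self):
            attr = getattr(self, name)
            if callable(attr) and getattr(attr, "_is_parser", False):
                fields[name] = attr()
        return fields

    def json(self):
        return json.dumps(self.to_dict())


class HeadlineParser(SketchParser):
    @parse
    def title(self):
        return first_tag_text(self.html, "h2")


page = "<html><body><h2> Connect with friends </h2><h2>Other</h2></body></html>"
print(HeadlineParser(page).json())  # {"title": "Connect with friends"}
```

Each decorated method becomes one key in the output, so adding a field means adding one method.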
Installation
The easiest way to get started is to install with pip.
pip install soupstars
You can play around with one of the prebuilt parsers directly.
>>> from soupstars.examples import NytimesArticleParser
>>> article = NytimesArticleParser("2019/01/09/us/politics/government-shutdown-trump-senate.html")
>>> article['title'] # Trump storms out of white house meeting with democrats
If you have Docker, you can run a web API that serves the parsers. Clone this repo and start the containers.
$ docker-compose up
Parsers are served at /parsers/{parser_package}/{parser_module}, and any JSON data in the request is used to initialize the parser.
curl -X GET 0.0.0.0:5000/parse/nytimes/article \
-H "Content-Type: application/json" \
-d '{"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"}'
{
  "data": {
    "authors": "By Michael Tackett",
    "published_at": "Jan. 10, 2019",
    "title": "Trump, Heading to the Border, Suggests He Will Declare an Emergency to Fund the Wall"
  }
}
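The same request can be built from Python. The sketch below mirrors the curl call using only the standard library. The helper name `build_parse_request` is hypothetical, and the host, port, and route are copied from the curl example, so they assume the containers are running locally with that configuration. The request is only constructed here; the lines that send it are left as comments.

```python
import json
import urllib.request


def build_parse_request(base_url, package, module, page_url):
    """Build (but do not send) a GET request carrying a JSON body."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/parse/{package}/{module}",  # route taken from the curl example
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_parse_request(
    "http://0.0.0.0:5000",
    "nytimes",
    "article",
    "/2019/01/10/us/politics/trump-wall-texas-border.html",
)

# Sending requires the containers to be up:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["data"]["title"])
```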
To integrate the parsers with an existing Flask app, register the soupstars_blueprint.
from flask import Flask
from soupstars import soupstars_blueprint
def create_app():
    app = Flask(__name__)
    app.register_blueprint(soupstars_blueprint)
    return app
Developing
Make sure that you've installed docker-compose. Then start the containers.
docker-compose up -d
docker-compose ps
Tests should be run from inside the container.
docker-compose run --rm test
Source Distribution
soupstars-0.3.0.tar.gz (13.0 kB)