Easy-to-build HTML parsers
Project description
Soupstars
Soupstars makes it easy to build website parsers.
from soupstars import HttpParser, parse

class FacebookParser(HttpParser):
    default_host = "https://www.facebook.com"

    @parse
    def title(self):
        return self.read().find('h2').text.strip()

fb = FacebookParser("/")
fb.json()  # { "title": "connect with friends and the world around you" }
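To make the decorator pattern above concrete, here is a minimal standalone sketch of how a `@parse`-style decorator could collect marked methods into a dictionary. It is not Soupstars' actual implementation: `MiniParser`, `PageParser`, `first_text`, and this version of `parse` are hypothetical names, and the sketch reads an in-memory HTML string with the standard library's `html.parser` instead of fetching a page.

```python
from html.parser import HTMLParser


def parse(func):
    """Mark a method as a field extractor (hypothetical stand-in for @parse)."""
    func._is_parse_field = True
    return func


class _FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of a given tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.done = False   # True once the first occurrence has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.chunks.append(data)


class MiniParser:
    """Hypothetical base class: json() calls every method marked with @parse."""

    def __init__(self, html):
        self.html = html

    def first_text(self, tag):
        finder = _FirstTagText(tag)
        finder.feed(self.html)
        return "".join(finder.chunks).strip()

    def json(self):
        fields = {}
        for name in dir(type(self)):
            attr = getattr(type(self), name, None)
            if getattr(attr, "_is_parse_field", False):
                fields[name] = getattr(self, name)()
        return fields


class PageParser(MiniParser):
    @parse
    def title(self):
        return self.first_text("h2")


page = PageParser("<html><body><h2> Hello, world </h2></body></html>")
print(page.json())
```

Marking methods with an attribute keeps the subclass declarative: adding a new field only requires writing one more decorated method.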
Installation
The easiest way to get started is to install with pip.
pip install soupstars
You can play around with one of the prebuilt parsers directly.
>>> from soupstars.parsers.nytimes import NytimesArticleParser
>>> article = NytimesArticleParser("2019/01/09/us/politics/government-shutdown-trump-senate.html")
>>> article['title'] # Trump storms out of white house meeting with democrats
If you have Docker, you can run a web API to serve the parsers. Clone this repo and start the containers.
$ docker-compose up
Parsers are served at /parsers/{parser_package}/{parser_module}, and any JSON data sent with the request is used to initialize the parser.
curl -X GET 0.0.0.0:5000/parsers/nytimes/article \
    -H "Content-Type: application/json" \
    -d '{"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"}'
{
    "data": {
        "authors": "By Michael Tackett",
        "published_at": "Jan. 10, 2019",
        "title": "Trump, Heading to the Border, Suggests He Will Declare an Emergency to Fund the Wall"
    }
}
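The same request can be assembled from Python. The sketch below uses only the standard library and builds the request without sending it, so it runs even when the containers are down. The helper name `build_parser_request` and the `http://` scheme in front of the host are assumptions made for illustration; the path layout and JSON body are taken from the curl call above.

```python
import json
import urllib.request


def build_parser_request(base_url, package, module, payload):
    """Build (but do not send) a request equivalent to the curl call above."""
    url = f"{base_url.rstrip('/')}/parsers/{package}/{module}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",  # mirrors `curl -X GET ... -d`, i.e. a GET with a body
    )


request = build_parser_request(
    "http://0.0.0.0:5000",  # assumed scheme; host and port as in the curl call
    "nytimes",
    "article",
    {"url": "/2019/01/10/us/politics/trump-wall-texas-border.html"},
)
print(request.get_method(), request.full_url)

# To actually send it (requires the containers to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["data"]["title"])
```

Sending a body with GET is unusual, and some proxies drop it, so this only mirrors what the curl example does.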
To integrate the parsers with an existing Flask app, register the soupstars_blueprint.
from flask import Flask
from soupstars import soupstars_blueprint

def create_app():
    app = Flask(__name__)
    app.register_blueprint(soupstars_blueprint)
    return app
Developing
Make sure that you've installed docker-compose. Then start the containers.
docker-compose up -d
docker-compose ps
Tests should be run from inside the container.
docker-compose run --rm test
Download files
Source Distribution
File details
Details for the file soupstars-0.2.0.tar.gz.
File metadata
- Download URL: soupstars-0.2.0.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.29.0 CPython/2.7.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 126b1834fbeda2825252630a63497f2517eefcbe671c2503168104ee136cdea8 |
| MD5 | 12aff0066eabc38efaf8c0a6b65becdc |
| BLAKE2b-256 | f4d00425a945e9519b60434eb9706c765aa481bf2baf5e3a0dfe8b0189023f8b |
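A downloaded archive can be checked against the published SHA256 digest with a few lines of Python. This is a minimal sketch using the standard library's `hashlib`; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the file path in the usage comment assumes the archive was saved under its original name in the current directory.

```python
import hashlib

# SHA256 digest published for soupstars-0.2.0.tar.gz in the table above.
EXPECTED_SHA256 = "126b1834fbeda2825252630a63497f2517eefcbe671c2503168104ee136cdea8"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected


# Usage (assumes the archive is in the current directory):
# print(matches_published_hash("soupstars-0.2.0.tar.gz"))
```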