Declarative web parsers

Soupstars

Build declarative web parsers in Python. Install it with pip:

pip install soupstars

A full example:

from soupstars import Parser, serialize

class NytimesArticleParser(Parser):
    """Parse data from a New York Times article."""

    @serialize
    def title(self):
        return self.h1.text

    @serialize
    def author(self):
        return self.find(attrs={'itemprop': 'author creator'}).text


if __name__ == "__main__":
    url = "https://www.nytimes.com/2019/04/25/us/politics/joe-biden-anita-hill.html"
    parser = NytimesArticleParser(url)
    print(parser.to_json())

Running the script above produces:

{
    'author': 'By Sheryl Gay Stolberg and Carl Hulse',
    'title': 'Joe Biden Expresses Regret to Anita Hill, but She Says ‘I’m Sorry’ Is Not Enough'
}
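
The example above relies on the `serialize` decorator to decide which methods end up in the output. Soupstars' internals are not shown here, so the following is only a minimal sketch of how such a declarative marker could work, using the standard library alone. The names `SketchParser`, `ArticleSketch`, and the `_serialize` attribute are hypothetical, and a plain dict stands in for a fetched page so the sketch runs offline.

```python
import json


def serialize(method):
    """Mark a method so its return value is included in the serialized output."""
    method._serialize = True  # hypothetical marker attribute, not soupstars' own
    return method


class SketchParser:
    """Hypothetical base class standing in for a real HTML-backed parser."""

    def __init__(self, document):
        # A plain dict replaces a parsed web page to keep the sketch offline.
        self.document = document

    def to_dict(self):
        # Collect the return value of every method tagged by @serialize.
        result = {}
        for name in dir(type(self)):
            attr = getattr(type(self), name)
            if callable(attr) and getattr(attr, "_serialize", False):
                result[name] = getattr(self, name)()
        return result

    def to_json(self):
        return json.dumps(self.to_dict(), sort_keys=True)


class ArticleSketch(SketchParser):
    @serialize
    def title(self):
        return self.document["h1"]

    @serialize
    def author(self):
        return self.document["byline"]


article = ArticleSketch({"h1": "Example headline", "byline": "By A. Writer"})
print(article.to_json())
```

In this sketch the subclass only declares what to extract; the base class discovers the tagged methods and assembles the result, which is the pattern the `NytimesArticleParser` example appears to follow.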
