
A web parser wrapper on top of lxml and selectolax


A web content parser using Python lxml


Compatibility
-------------

The library is compatible with Python 3. Python 2 is not currently supported.


Usage
-----

Install the package using pip.

```
pip install webparser-py
```

**Convert to Document**

Accepts HTML content and converts it to a document element. To convert relative links to absolute links, pass the website's base (domain) URL as the second argument.

**convert_to_doc()**

```
from webparser.parser import convert_to_doc

doc = convert_to_doc('HTML content', 'http://yourwebsite.com')
```
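The internals of `convert_to_doc()` are not documented here. To illustrate just the relative-to-absolute link step, here is a minimal sketch that uses Python's standard `html.parser` and `urllib.parse.urljoin` in place of lxml. `absolute_links` and `_LinkCollector` are hypothetical names, and the sketch returns a list of resolved URLs rather than a document element. (With lxml itself, `lxml.html` elements provide a `make_links_absolute(base_url)` method for this.)

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects href/src values resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # urljoin resolves relative URLs and leaves absolute ones untouched.
                self.links.append(urljoin(self.base_url, value))


def absolute_links(html_content, base_url):
    """Return every href/src in html_content as an absolute URL."""
    collector = _LinkCollector(base_url)
    collector.feed(html_content)
    collector.close()
    return collector.links


html_content = '<a href="/about">About</a> <img src="logo.png"> <a href="https://example.org/x">X</a>'
print(absolute_links(html_content, "http://yourwebsite.com/blog/"))
# ['http://yourwebsite.com/about', 'http://yourwebsite.com/blog/logo.png', 'https://example.org/x']
```

Note that a path-relative link such as `logo.png` resolves against the directory of the base URL, so the trailing slash on the base URL matters.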

**class FeedParser()**

The `FeedParser` class parses a feed either from response content or from a URL.


```
from webparser.parser import FeedParser

feed = FeedParser()  # A feed URL can optionally be passed to the constructor.
parsed_links = feed.parse(url='http://viralnova.com/feed')  # The url argument overrides the constructor's feed URL.
```
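The README does not say which feed formats `FeedParser` supports or how it extracts links. As a minimal sketch, assuming RSS 2.0 input (`channel/item/link`) or Atom input (`entry/link[@href]`), item links can be pulled from feed content with the standard library's `xml.etree.ElementTree`. The function names are hypothetical, and this is not `FeedParser`'s implementation.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_links_from_content(content):
    """Hypothetical helper: return item/entry links from RSS 2.0 or Atom content."""
    root = ET.fromstring(content)  # accepts str or bytes
    links = []
    # RSS 2.0: <rss><channel><item><link>URL</link></item>...</channel></rss>
    for link in root.iterfind("./channel/item/link"):
        if link.text and link.text.strip():
            links.append(link.text.strip())
    # Atom: <feed xmlns="..."><entry><link href="URL"/></entry>...</feed>
    for link in root.iterfind(f"./{ATOM_NS}entry/{ATOM_NS}link"):
        href = link.get("href")
        # Keep only the main (alternate) link of each entry.
        if href and link.get("rel", "alternate") == "alternate":
            links.append(href)
    return links


def feed_links_from_url(url, timeout=10):
    """Hypothetical helper: download a feed (network request) and extract its links."""
    with urlopen(url, timeout=timeout) as response:
        return feed_links_from_content(response.read())
```

Separating parsing from fetching mirrors the README's description of parsing either response content or a URL, and keeps the parsing step testable without network access.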

**has_rss_feed()**

Checks whether a website/URL has an RSS feed link.

- Searches the document for links with an RSS MIME type using XPath.
- If no RSS links are found, falls back to a fuzzy URL search, e.g. checking for `/feed` at the end of the website URL.

```
from webparser.parser import has_rss_feed
rss_links = has_rss_feed(doc=html_content, url=website_url)
```
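The two-step lookup described above can be sketched with the standard library, using `html.parser` in place of the library's XPath query. The MIME types and fallback paths are assumptions (the README only names `/feed`), `find_feed_links` is a hypothetical name, and this sketch makes no network requests: it only returns candidate fallback URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed feed MIME types; the library's exact list is not documented.
FEED_MIME_TYPES = {"application/rss+xml", "application/atom+xml"}
# Hypothetical fallback paths; the README only mentions "/feed".
FALLBACK_PATHS = ("/feed", "/rss")


class _FeedLinkFinder(HTMLParser):
    """Collects href values of <link> tags whose type is a feed MIME type."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser already lower-cases tag/attribute names
        mime = (attrs.get("type") or "").lower()
        if tag == "link" and mime in FEED_MIME_TYPES and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_feed_links(html_content, url):
    """Return (feed_urls, used_fallback) for the page at url."""
    finder = _FeedLinkFinder()
    finder.feed(html_content)
    finder.close()
    # Step 1: explicit <link type="application/rss+xml" href="..."> tags.
    if finder.hrefs:
        return [urljoin(url, href) for href in finder.hrefs], False
    # Step 2: nothing declared, so guess conventional feed locations.
    return [urljoin(url, path) for path in FALLBACK_PATHS], True
```

Returning a flag for the fallback case lets a caller decide whether the guessed URLs are worth requesting, since they may not exist.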




Download files
--------------

- Source distribution: `webparser-py-0.3.tar.gz` (4.6 kB)
- Built distribution: `webparser_py-0.3-py3-none-any.whl` (5.3 kB, Python 3)

**File details: webparser-py-0.3.tar.gz**

- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing: No

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | d3b510b7152d55480dd4a0a679415a63e8a4d1333f7692b9fa66010061a3d14c |
| MD5         | 7c098143fddb3735fe09af4ec42c7849 |
| BLAKE2b-256 | 9f944c25bce9ef18054b7e97511c8487264e43f8196ae0a62cada0ac3d438691 |
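To check a downloaded file against its published SHA256 digest (listed above), a minimal sketch using Python's standard `hashlib` follows. `sha256_of_file` is a hypothetical helper, not part of webparser-py.

```python
import hashlib

# Published SHA256 digest for webparser-py-0.3.tar.gz (listed above).
EXPECTED_SHA256 = "d3b510b7152d55480dd4a0a679415a63e8a4d1333f7692b9fa66010061a3d14c"


def sha256_of_file(path, chunk_size=65536):
    """Hypothetical helper: hash a file in chunks so it need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the sdist was downloaded into the current directory, `sha256_of_file("webparser-py-0.3.tar.gz") == EXPECTED_SHA256` should be `True` for an unmodified download.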

**File details: webparser_py-0.3-py3-none-any.whl**

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | 93fb244a4a2a12639e667d473cdae6561110ad1a442ea96f6d2fb1f6f4b1ef11 |
| MD5         | e1f2ed4430484895f9db76a65f8ac48b |
| BLAKE2b-256 | 5fced840ea1b729a4abc789379d45595361d5a11bb6c24449852df01ba3bc910 |
