Article text extraction library
article-extraction is a Python package that extracts the article content from an HTML page.
Installation
Use Poetry to install the library from GitHub.
poetry add "git+https://github.com/pmatigakis/article-extraction.git"
Usage
Extract the content of an article using article-extraction.
from urllib.request import urlopen

from articles.mss.extractors import MSSArticleExtractor

# Download the raw HTML of the page.
document = urlopen("https://www.bbc.com/sport/formula1/64983451").read()

# Extract the article content from the HTML document.
article_extractor = MSSArticleExtractor()
article = article_extractor.extract_article(document)
print(article)
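The usage example fetches the page and runs the extractor in one go. A minimal sketch of how those two steps might be wrapped for reuse is shown here. The helper names `fetch_html` and `extract_from_url`, the timeout, and the `User-Agent` header are assumptions for illustration and are not part of article-extraction. The only library call assumed is `extract_article(document)`, as used in the example.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # Assumption: a generic User-Agent is sent because some sites
    # reject the default urllib one. Adjust or drop as needed.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def extract_from_url(url, extractor, fetch=fetch_html):
    """Fetch `url` and hand the document to `extractor.extract_article()`.

    `extractor` is any object exposing `extract_article(document)`,
    for example an MSSArticleExtractor instance.
    """
    document = fetch(url)
    return extractor.extract_article(document)
```

With the library installed, this could be called as `extract_from_url("https://www.bbc.com/sport/formula1/64983451", MSSArticleExtractor())`. Passing `fetch` as a parameter keeps the network call replaceable, which helps when testing against saved HTML.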
Hashes for article_extraction-0.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b02bbb6daa433237058aabbed9e0373c58e00d8c9f4636923931db46ab7ce016
MD5 | 039f5159a4a29260bf5093de47246a35
BLAKE2b-256 | 7545f78f8650845dc5feca2433f14559a607e23ea5b9ff3f28b4b1815dafa6c1
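A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The function names `sha256_of_file` and `matches_expected` and the local file path are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("article_extraction-0.3.0-py3-none-any.whl", "b02bbb6daa433237058aabbed9e0373c58e00d8c9f4636923931db46ab7ce016")` should return `True` for an intact download, assuming the wheel was saved under that name in the current directory.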