Article text extraction library

article-extraction is a Python package that extracts the main article content from an HTML page.

Installation

Use Poetry to install the library directly from its GitHub repository:

poetry add "git+https://github.com/pmatigakis/article-extraction.git"

Usage

The following example downloads a BBC Sport article and extracts its content using article-extraction:

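from urllib.request import urlopen

from articles.mss.extractors import MSSArticleExtractor

# Download the raw HTML of the page that contains the article
with urlopen("https://www.bbc.com/sport/formula1/64983451") as response:
    document = response.read()

# Extract the article content from the downloaded HTML
article_extractor = MSSArticleExtractor()
article = article_extractor.extract_article(document)
print(article)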
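The name MSSArticleExtractor suggests that the extractor is based on maximum subsequence segmentation (MSS), but this page does not document how it works internally. As a hedged illustration of that general technique only, and not of this library's implementation, the sketch below assigns each HTML token a score (assumption: +1 for a text word and -3 for a markup tag; real MSS extractors typically derive these scores from a trained classifier) and then uses Kadane's algorithm to keep the contiguous run of tokens with the largest total score. The helpers `tokenize`, `score`, `max_subsequence`, and `extract_text` are hypothetical and are not part of the article-extraction API.

```python
import re

# Illustrative assumption: fixed scores instead of a trained classifier.
# Text words are evidence of article content; tags are evidence of boilerplate.
TEXT_SCORE = 1.0
TAG_SCORE = -3.0

# A token is either a whole tag like "<p>" or a run of non-space text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(html: str) -> list[str]:
    """Split HTML into tag tokens and whitespace-separated text tokens."""
    return TOKEN_RE.findall(html)


def score(token: str) -> float:
    """Score a single token (hypothetical heuristic)."""
    return TAG_SCORE if token.startswith("<") else TEXT_SCORE


def max_subsequence(scores: list[float]) -> tuple[int, int]:
    """Return (start, end) of the contiguous slice with the largest sum (Kadane)."""
    best_sum, best = float("-inf"), (0, 0)
    current_sum, current_start = 0.0, 0
    for i, s in enumerate(scores):
        if current_sum <= 0:
            # A non-positive prefix can only lower the total, so restart here
            current_sum, current_start = s, i
        else:
            current_sum += s
        if current_sum > best_sum:
            best_sum, best = current_sum, (current_start, i + 1)
    return best


def extract_text(html: str) -> str:
    """Return the text words inside the highest-scoring token run."""
    tokens = tokenize(html)
    if not tokens:
        return ""
    start, end = max_subsequence([score(t) for t in tokens])
    return " ".join(t for t in tokens[start:end] if not t.startswith("<"))
```

For example, `extract_text("<nav><a>Home</a></nav><p>The race was won by a narrow margin after a late safety car.</p>")` keeps only the paragraph text, because the short navigation link cannot outweigh the tags around it. The regex tokenizer is deliberately simple and does not handle scripts, comments, or entities.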

Download files

Source Distribution

article-extraction-0.3.0.tar.gz (4.1 kB)

Built Distribution

article_extraction-0.3.0-py3-none-any.whl (5.1 kB)

File details

Details for the file article-extraction-0.3.0.tar.gz.

File metadata

  • Download URL: article-extraction-0.3.0.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.10.6 Linux/5.15.0-43-generic

File hashes

Hashes for article-extraction-0.3.0.tar.gz
Algorithm Hash digest
SHA256 a1e5f3d4eb980f8c987bdce31b5d5bfebc385b0d0e8379237e4ec9a63ea2b699
MD5 44da2496337a514e28b3c58ba342217f
BLAKE2b-256 8752a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111

File details

Details for the file article_extraction-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: article_extraction-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.10.6 Linux/5.15.0-43-generic

File hashes

Hashes for article_extraction-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b02bbb6daa433237058aabbed9e0373c58e00d8c9f4636923931db46ab7ce016
MD5 039f5159a4a29260bf5093de47246a35
BLAKE2b-256 7545f78f8650845dc5feca2433f14559a607e23ea5b9ff3f28b4b1815dafa6c1
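The published digests let you check that a downloaded file was not corrupted or altered. A minimal sketch using Python's standard `hashlib` module, assuming the wheel was saved to the current directory under its published file name; the helper names `sha256_of` and `verify` are illustrative and are not part of article-extraction:

```python
import hashlib
from pathlib import Path

# SHA256 digest published for article_extraction-0.3.0-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = "b02bbb6daa433237058aabbed9e0373c58e00d8c9f4636923931db46ab7ce016"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed location: the wheel downloaded into the current directory
    wheel = Path("article_extraction-0.3.0-py3-none-any.whl")
    if wheel.is_file():
        print("OK" if verify(wheel, EXPECTED_WHEEL_SHA256) else "MISMATCH")
    else:
        print(f"{wheel} not found")
```

The same check works for the source distribution by passing its own SHA256 digest instead.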
