
Article text extraction library

Project description

Article extraction library.

article-extraction is a Python package for extracting the main article content from an HTML page.

Installation

Use Poetry to install the library directly from its GitHub repository:

poetry add "git+https://github.com/pmatigakis/article-extraction.git"

Usage

The following example downloads a web page and extracts its article content with article-extraction:

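from urllib.request import urlopen

from articles.mss.extractors import MSSArticleExtractor

# Download the raw HTML of the page (read() returns bytes)
with urlopen("https://www.bbc.com/sport/formula1/64983451") as response:
    document = response.read()

# Extract the article content from the downloaded HTML
article_extractor = MSSArticleExtractor()
article = article_extractor.extract_article(document)
print(article)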


Download files


Source Distribution

article-extraction-0.3.0.tar.gz (4.1 kB)

Built Distribution


article_extraction-0.3.0-py3-none-any.whl (5.1 kB)

File details

Details for the file article-extraction-0.3.0.tar.gz.

File metadata

  • Download URL: article-extraction-0.3.0.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.10.6 Linux/5.15.0-43-generic

File hashes

Hashes for article-extraction-0.3.0.tar.gz
Algorithm     Hash digest
SHA256        a1e5f3d4eb980f8c987bdce31b5d5bfebc385b0d0e8379237e4ec9a63ea2b699
MD5           44da2496337a514e28b3c58ba342217f
BLAKE2b-256   8752a79cbb7ce210cacd430c55f4efb755ed4ab68867a74b4c4c034acbe33111
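To check a downloaded distribution against the SHA256 digest listed above, a minimal sketch using Python's standard hashlib module could look like the following. The helper names `sha256_of_file` and `matches_published_hash`, and the assumption that the sdist sits in the current directory, are illustrative only and not part of article-extraction.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the sdist was downloaded
    sdist = Path("article-extraction-0.3.0.tar.gz")
    expected = "a1e5f3d4eb980f8c987bdce31b5d5bfebc385b0d0e8379237e4ec9a63ea2b699"
    if sdist.exists():
        print("OK" if matches_published_hash(sdist, expected) else "MISMATCH")
    else:
        print(f"{sdist} not found")
```

The same check applies to the wheel with its own SHA256 digest. Comparing against SHA256 rather than MD5 is the safer choice, since MD5 is no longer considered collision-resistant.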


File details

Details for the file article_extraction-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: article_extraction-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.10.6 Linux/5.15.0-43-generic

File hashes

Hashes for article_extraction-0.3.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        b02bbb6daa433237058aabbed9e0373c58e00d8c9f4636923931db46ab7ce016
MD5           039f5159a4a29260bf5093de47246a35
BLAKE2b-256   7545f78f8650845dc5feca2433f14559a607e23ea5b9ff3f28b4b1815dafa6c1

