
Advanced news extraction, article parsing, and content analysis.


newspaperV3

An advanced library for news extraction, article parsing, and content analysis. It is a fork of the original newspaper library by Lucas Ou-Yang.

Installation

Install the package using pip:

pip install newspaperV3

Basic Usage

Here's a simple example of how to download and parse an article:

from newspaperV3 import Article
import nltk

# article.nlp() needs NLTK tokenizer data.
# Uncomment the next line on the first run to download it:
# nltk.download('punkt')

url = 'https://edition.cnn.com/2025/07/29/middleeast/israeli-settler-odeh-hathalin-west-bank-oscar-intl'

# Create an Article object
article = Article(url)

# Download and parse the article
article.download()
article.parse()

# Perform Natural Language Processing (NLP)
article.nlp()

# Print the results
print("Title:", article.title)
print("Authors:", article.authors)
print("Publish Date:", article.publish_date)
print("Top Image:", article.top_image)
print("\nSummary:")
print(article.summary)
print("\nKeywords:", article.keywords)
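
The fields printed above can also be collected into a single record, for example to store results as JSON. The following is a minimal sketch, not part of newspaperV3: `article_to_record` is a hypothetical helper that reads only the attributes used in the example above, and it assumes `publish_date` is either `None` or a `datetime` object. A stand-in object replaces a real `Article` so the sketch runs without network access.

```python
import json
from datetime import datetime
from types import SimpleNamespace


def article_to_record(article):
    """Collect the fields from the basic example into a JSON-friendly dict.

    Hypothetical helper, not part of newspaperV3. It only reads attributes
    that the basic example prints after parse() and nlp() have run.
    """
    published = article.publish_date
    return {
        "title": article.title,
        "authors": list(article.authors),
        # Assumption: publish_date is None or a datetime object.
        "publish_date": published.isoformat() if isinstance(published, datetime) else None,
        "top_image": article.top_image,
        "summary": article.summary,
        "keywords": list(article.keywords),
    }


# Stand-in object so the sketch runs offline; in real use, pass the
# Article object from the basic example instead.
demo = SimpleNamespace(
    title="Example headline",
    authors=["A. Writer"],
    publish_date=datetime(2025, 7, 29),
    top_image="https://example.com/image.jpg",
    summary="A short summary.",
    keywords=["example", "news"],
)
record = article_to_record(demo)
print(json.dumps(record, indent=2))
```

Converting the date to an ISO 8601 string keeps the record serializable with the standard `json` module.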

Features

  • Article Extraction: Automatically extract clean article text from web pages
  • Metadata Parsing: Extract titles, authors, publication dates, and images
  • Natural Language Processing: Generate summaries and extract keywords
  • Multi-language Support: Process articles in various languages
  • Image Processing: Extract and analyze article images
  • Content Analysis: Advanced text processing and analysis capabilities

Requirements

  • Python 3.6+
  • NLTK (for natural language processing)
  • Additional dependencies installed automatically
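
The requirements above can be checked from Python before running the examples. This is a rough sketch using only the standard library; `check_requirements` is a hypothetical helper, and the module names it looks for (`nltk`, `newspaperV3`) are taken from the import lines in the usage example.

```python
import importlib.util
import sys


def check_requirements(min_version=(3, 6), modules=("nltk", "newspaperV3")):
    """Return a dict mapping each requirement to True (met) or False."""
    results = {"python": sys.version_info >= min_version}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        results[name] = importlib.util.find_spec(name) is not None
    return results


for requirement, ok in check_requirements().items():
    print(f"{requirement}: {'ok' if ok else 'missing'}")
```

A missing entry only means the module could not be found in the current environment; it does not check installed versions.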

License

This project is licensed under the MIT License.

Download files


Source Distribution

newspaperv3-0.3.2.tar.gz (213.9 kB view details)

Uploaded Source

Built Distribution


newspaperv3-0.3.2-py3-none-any.whl (225.2 kB view details)

Uploaded Python 3

File details

Details for the file newspaperv3-0.3.2.tar.gz.

File metadata

  • Download URL: newspaperv3-0.3.2.tar.gz
  • Upload date:
  • Size: 213.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.7 Linux/6.14.0-32-generic

File hashes

Hashes for newspaperv3-0.3.2.tar.gz
Algorithm    Hash digest
SHA256       8eb00ec14aca73232f360544d20d8cc057691c577493b7ca108fb74bab54d049
MD5          3650d8e3f118d60d5be51a088610862e
BLAKE2b-256  5d8b96394d9aef0ce903e98902f16b7fd0f84beaef3214b13db4f568d889f454
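
A downloaded file can be compared against its published SHA256 digest with the standard library. The following is a minimal sketch: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the archive is assumed to sit in the current directory under the file name listed above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED = "8eb00ec14aca73232f360544d20d8cc057691c577493b7ca108fb74bab54d049"
# Assumption: the archive was downloaded to the current directory.
ARCHIVE = Path("newspaperv3-0.3.2.tar.gz")

if ARCHIVE.exists():
    print("hash ok" if matches_published_hash(ARCHIVE, EXPECTED) else "hash mismatch")
else:
    print(f"{ARCHIVE} not found; download it first")
```

pip can perform the same check at install time when a requirements file pins hashes and `--require-hashes` is used.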


File details

Details for the file newspaperv3-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: newspaperv3-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 225.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.7 Linux/6.14.0-32-generic

File hashes

Hashes for newspaperv3-0.3.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       8b01f2058e53c6dac3c95d6a64f05a6d438ab47974589a52d7465a97d8154360
MD5          4625f79a8ba5798ef3849d45ec3530db
BLAKE2b-256  1041ae1974766dee26982ed3733071fe1ae4b63088a6a4e1da1655774266a083

