
A scraper for Indonesian news websites.


news-watch


news-watch allows you to scrape news articles from various Indonesian news websites based on specific keywords and date ranges.

Installation

You can install news-watch via pip:

pip install news-watch

Usage

To run the scraper from the command line:

newswatch -k <keywords> -sd <start_date> [-s <scrapers>] [-v]

Command-Line Arguments

--keywords, -k: Required. A comma-separated list of keywords to scrape (e.g., -k "ojk,bank,npl").

--start_date, -sd: Required. The start date for scraping in YYYY-MM-DD format (e.g., -sd 2023-01-01).

--scrapers, -s: Optional. A comma-separated list of scrapers to use (e.g., -s "kompas,viva"). If not provided, all scrapers will be used by default.

--verbose, -v: Optional. Increase verbosity level (e.g., -v, -vv, -vvv).
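The flags above map naturally onto Python's standard argparse module. The following is a minimal sketch of a parser that accepts the same arguments; it is an illustration only and is not taken from the news-watch source. The function name build_parser and the choice of None to mean "all scrapers" are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the project's code)."""
    parser = argparse.ArgumentParser(prog="newswatch")
    parser.add_argument("-k", "--keywords", required=True,
                        help="comma-separated keywords, e.g. 'ojk,bank,npl'")
    parser.add_argument("-sd", "--start_date", required=True,
                        help="start date in YYYY-MM-DD format")
    # Assumption: None stands for "use every available scraper".
    parser.add_argument("-s", "--scrapers", default=None,
                        help="comma-separated scrapers, e.g. 'kompas,viva'")
    # action="count" turns -v, -vv, -vvv into 1, 2, 3.
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="repeat to increase verbosity")
    return parser


args = build_parser().parse_args(["-k", "ojk,bank,npl", "-sd", "2023-01-01", "-vv"])
keywords = args.keywords.split(",")  # split the comma-separated list
print(keywords, args.start_date, args.scrapers, args.verbose)
```

Here -k and -sd are marked required, so omitting either makes argparse exit with a usage error, which matches the "Required" labels above.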

Examples

Scrape articles related to "ihsg" from November 1, 2024:

newswatch -k ihsg -sd 2024-11-01

Scrape articles for multiple keywords and increase verbosity:

newswatch -k "ihsg,bank,keuangan" -sd 2024-11-01 -vv
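If you want to launch the same commands from a Python script, one option is to assemble the argument list and hand it to subprocess. This is a sketch under the assumption that the newswatch executable is installed and on your PATH; the helper names build_command and run_newswatch are hypothetical and not part of the package.

```python
import subprocess
from datetime import date


def build_command(keywords, start_date, scrapers=None, verbosity=0):
    """Assemble the documented newswatch invocation as an argument list."""
    cmd = ["newswatch", "-k", ",".join(keywords), "-sd", start_date.isoformat()]
    if scrapers:
        # -s is optional; leaving it out uses all scrapers.
        cmd += ["-s", ",".join(scrapers)]
    if verbosity > 0:
        # 1 -> -v, 2 -> -vv, 3 -> -vvv
        cmd.append("-" + "v" * verbosity)
    return cmd


def run_newswatch(keywords, start_date, scrapers=None, verbosity=0):
    """Run the CLI; assumes `newswatch` is installed and on PATH."""
    cmd = build_command(keywords, start_date, scrapers, verbosity)
    return subprocess.run(cmd, check=True)


cmd = build_command(["ihsg", "bank", "keuangan"], date(2024, 11, 1), verbosity=2)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated keyword string.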

Output

The scraped articles are saved as a CSV file in the current working directory, named news-watch-{keywords}-YYYYMMDD_HH.csv.

The CSV file contains the following fields:

  • title
  • publish_date
  • author
  • content
  • keyword
  • category
  • source
  • link
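Because the output is a plain CSV with the fields listed above, it can be read with Python's standard csv module. The sketch below counts articles per source. The sample rows are made up for illustration (including the date format, which this README does not specify), and the function name count_by_source is hypothetical.

```python
import csv
import io
from collections import Counter


def count_by_source(csv_file):
    """Count rows per value of the 'source' column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["source"] for row in reader)


# Made-up sample data using the documented column names.
sample = io.StringIO(
    "title,publish_date,author,content,keyword,category,source,link\n"
    "Sample A,2024-11-01,Author A,Text A,ihsg,market,kompas,https://example.com/a\n"
    "Sample B,2024-11-01,Author B,Text B,ihsg,market,viva,https://example.com/b\n"
    "Sample C,2024-11-02,Author C,Text C,bank,finance,kompas,https://example.com/c\n"
)

counts = count_by_source(sample)
for source, n in counts.most_common():
    print(source, n)
```

For a real output file, replace the StringIO object with open(path, newline="", encoding="utf-8"), where path points at the generated CSV.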

Supported Websites

Contributing

Contributions are welcome! If you'd like to add support for more websites or improve the existing code, please open an issue or submit a pull request.

Running Tests

To run the test suite:

pytest tests/

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
