Async news scraper for Nigerian and International news.

OSINT News Deamon Package

An asynchronous Open Source Intelligence (OSINT) tool that scrapes news articles from major Nigerian and international news outlets. It uses aiohttp for static pages and Selenium or Playwright for dynamic (JavaScript-heavy) sites, so both kinds of outlet can be scraped concurrently.

Supported Outlets

| News Outlet  | Method     | Key Features                                   |
|--------------|------------|------------------------------------------------|
| BBC News     | aiohttp    | Fast, lightweight, static scraping.            |
| CNN          | aiohttp    | Fast, lightweight, static scraping.            |
| Arise TV     | aiohttp    | Fast, lightweight, static scraping.            |
| TVC News     | aiohttp    | Fast, lightweight, static scraping.            |
| Punch NG     | Selenium   | Handles dynamic content & anti-bot checks.     |
| Business Day | Playwright | Handles complex JS rendering & search results. |

Installation

1. Install the Package

Install the package from PyPI:

pip install osint-news-deamon-pkg

Usage

Basic Usage (Fast Scrapers)

For outlets like BBC, CNN, Arise, and TVC, the scrapers use aiohttp only. No browser is launched, so they are fully asynchronous and fast.

import asyncio
from osint_news_deamon_pkg import BBCTVScraper, AriseTvScraper

async def main():
    # 1. BBC News
    print("--- Scraping BBC ---")
    bbc = BBCTVScraper()
    bbc_results = await bbc.scrape(keyword="election", max_pages=1)
    for article in bbc_results[:3]:
        print(f"[BBC] {article['title']} - {article['page_link']}")

    # 2. Arise TV
    print("\n--- Scraping Arise TV ---")
    arise = AriseTvScraper()
    arise_results = await arise.scrape(keyword="economy", max_pages=1)
    for article in arise_results[:3]:
        print(f"[Arise] {article['title']} - {article['page_link']}")

if __name__ == "__main__":
    asyncio.run(main())
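
Because each .scrape() call is awaited, several outlets can in principle be queried at the same time instead of one after another. The sketch below shows one way to do that with asyncio.gather. gather_results is a hypothetical helper, not part of the package; it assumes only that each job is an awaitable that returns a list of article dicts, as in the example above.

```python
import asyncio
from typing import Awaitable, Dict


async def gather_results(jobs: Dict[str, Awaitable[list]]) -> Dict[str, list]:
    """Await all scrape jobs concurrently; a failed job yields an empty list."""
    labels = list(jobs)
    # return_exceptions=True keeps one failing outlet from cancelling the rest.
    outcomes = await asyncio.gather(*jobs.values(), return_exceptions=True)
    results: Dict[str, list] = {}
    for label, outcome in zip(labels, outcomes):
        if isinstance(outcome, BaseException):
            print(f"[{label}] failed: {outcome!r}")
            results[label] = []
        else:
            results[label] = outcome
    return results


# Possible usage (assumes the package is installed):
#   from osint_news_deamon_pkg import BBCTVScraper, AriseTvScraper
#   results = asyncio.run(gather_results({
#       "BBC": BBCTVScraper().scrape(keyword="election", max_pages=1),
#       "Arise": AriseTvScraper().scrape(keyword="economy", max_pages=1),
#   }))
```

Collecting failures as empty lists is a design choice for this sketch; re-raising may be preferable when a missing outlet should stop the run.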

Advanced Usage (Browser-Based Scrapers)

For outlets like Punch NG and Business Day, the scrapers launch a headless browser engine.

import asyncio
from osint_news_deamon_pkg import PunchNGScraper, BusinessDayScraper

async def scrape_dynamic():
    # 1. Punch NG (Uses Selenium)
    # Note: Requires Chrome installed
    print("--- Scraping Punch NG ---")
    punch = PunchNGScraper(headless=True)
    # Supports date filtering: "DD MMM, YYYY"
    punch_results = await punch.scrape(
        query="politics", 
        max_pages=1,
        from_date="01 Jan, 2024",
        to_date="20 Dec, 2024"
    )
    for article in punch_results[:3]:
        print(f"[Punch] {article.get('title', 'No Title')} - {article.get('url')}")

    # 2. Business Day (Uses Playwright)
    print("\n--- Scraping Business Day ---")
    bd = BusinessDayScraper()
    bd_results = await bd.scrape(query="finance", max_pages=1)
    for article in bd_results[:3]:
        print(f"[BusinessDay] {article['title']} - {article['url']}")

if __name__ == "__main__":
    asyncio.run(scrape_dynamic())
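
The examples above read the article link from 'page_link' for the aiohttp scrapers and from 'url' for the browser-based ones. When results from both kinds of scraper are combined, a small helper can smooth over that difference. The sketch below assumes only the keys shown in the examples ('title', 'page_link', 'url'); article_link and merge_articles are hypothetical names, not package APIs.

```python
from typing import Dict, Iterable, List, Optional


def article_link(article: Dict) -> Optional[str]:
    """Return the article's link, whichever of the two keys it is stored under."""
    return article.get("url") or article.get("page_link")


def merge_articles(*result_lists: Iterable[Dict]) -> List[Dict]:
    """Flatten several result lists, skipping missing or repeated links."""
    seen = set()
    merged: List[Dict] = []
    for results in result_lists:
        for article in results:
            link = article_link(article)
            if not link or link in seen:
                continue
            seen.add(link)
            merged.append({"title": article.get("title", "No Title"), "link": link})
    return merged
```

For instance, merge_articles(bbc_results, punch_results) would give one list in which every entry has the same two keys, 'title' and 'link'.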

Configuration

Most scrapers accept the following parameters in their .scrape() method:

keyword / query: The search term. In the examples above, the aiohttp scrapers take keyword and the browser-based scrapers take query.

max_pages: Number of result pages to traverse (default: 1).

from_date: Start date filter (Format: "DD MMM, YYYY", e.g., "01 Jan, 2025").

to_date: End date filter (Format: "DD MMM, YYYY").
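
The date filters are plain strings, so it can help to build them from date objects rather than typing them by hand. The sketch below is one way to do that. It assumes that "MMM" means the three-letter English month abbreviation, as in "01 Jan, 2025"; format_filter_date is a hypothetical helper, not part of the package.

```python
from datetime import date

# English month abbreviations are spelled out so the result does not depend
# on the process locale (strftime("%b") would).
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_filter_date(d: date) -> str:
    """Format a date in the "DD MMM, YYYY" layout used by from_date / to_date."""
    return f"{d.day:02d} {_MONTHS[d.month - 1]}, {d.year:04d}"


print(format_filter_date(date(2025, 1, 1)))  # 01 Jan, 2025
```

The returned strings could then be passed as from_date and to_date in a .scrape() call such as the Punch NG example.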

Requirements

Python 3.8+

Google Chrome (for Selenium)

Playwright Chromium (install via playwright install chromium)
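
A quick pre-flight check can show which of these requirements are missing before a scraper fails at runtime. The sketch below is an assumption-heavy convenience script, not something shipped with the package: the Chrome executable names are guesses that vary by platform, a Chrome installed outside PATH (common on macOS and Windows) is reported as missing, and the check does not verify that Playwright's Chromium build has been downloaded.

```python
import importlib.util
import shutil
import sys
from typing import Dict

# Hypothetical list of executable names; adjust for your platform.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chrome",
    "chromium",
    "chromium-browser",
)


def check_requirements() -> Dict[str, bool]:
    """Report which requirements appear to be satisfied on this machine."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "selenium importable": importlib.util.find_spec("selenium") is not None,
        "playwright importable": importlib.util.find_spec("playwright") is not None,
        "chrome on PATH": any(shutil.which(name) for name in CHROME_CANDIDATES),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```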

Disclaimer

This tool is intended for educational purposes and legitimate Open Source Intelligence (OSINT) research. Users are responsible for adhering to the Terms of Service and robots.txt policies of the target websites.
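
One way to act on the robots.txt part of this is to test a URL against a site's rules before scraping it. The sketch below uses Python's standard urllib.robotparser; is_allowed is a hypothetical helper, not part of the package, and it takes the robots.txt text as an argument so the check itself needs no network access.

```python
from urllib import robotparser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(rules, "https://example.com/news/1"))     # True
print(is_allowed(rules, "https://example.com/private/x"))  # False
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file directly.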
