Async news scraper for Nigerian and International news.
OSINT News Deamon Package
A high-performance, asynchronous Open Source Intelligence (OSINT) tool that scrapes news articles from major Nigerian and international news outlets. The package uses aiohttp, Selenium, and Playwright to handle both static and dynamic (JavaScript-heavy) websites concurrently.
Supported Outlets
| News Outlet | Method | Key Features |
|---|---|---|
| BBC News | aiohttp | Fast, lightweight, static scraping. |
| CNN | aiohttp | Fast, lightweight, static scraping. |
| Arise TV | aiohttp | Fast, lightweight, static scraping. |
| TVC News | aiohttp | Fast, lightweight, static scraping. |
| Punch NG | Selenium | Handles dynamic content & anti-bot checks. |
| Business Day | Playwright | Handles complex JS rendering & search results. |
Installation
1. Install the Package
You can install the package directly from PyPI (once published) or from the source:
pip install osint-news-deamon-pkg
Usage
Basic Usage (Fast Scrapers)
For outlets like BBC, CNN, Arise TV, and TVC, the scrapers are purely asynchronous and very fast.
import asyncio

from osint_news_deamon_pkg import BBCTVScraper, AriseTvScraper


async def main():
    # 1. BBC News
    print("--- Scraping BBC ---")
    bbc = BBCTVScraper()
    bbc_results = await bbc.scrape(keyword="election", max_pages=1)
    for article in bbc_results[:3]:
        print(f"[BBC] {article['title']} - {article['page_link']}")

    # 2. Arise TV
    print("\n--- Scraping Arise TV ---")
    arise = AriseTvScraper()
    arise_results = await arise.scrape(keyword="economy", max_pages=1)
    for article in arise_results[:3]:
        print(f"[Arise] {article['title']} - {article['page_link']}")


if __name__ == "__main__":
    asyncio.run(main())
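Because each scraper exposes an awaitable scrape() call, several outlets can be queried at the same time with asyncio.gather rather than one after another. The sketch below is only an illustration and is not part of the package: StubScraper is a hypothetical stand-in so the snippet runs without network access, and scrape_all is an illustrative helper name. With the real package, instances such as BBCTVScraper() and AriseTvScraper() would presumably take the stub's place.

```python
import asyncio


class StubScraper:
    """Hypothetical stand-in for a real scraper; returns one canned article."""

    def __init__(self, outlet):
        self.outlet = outlet

    async def scrape(self, keyword, max_pages=1):
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return [{"title": f"{self.outlet}: {keyword}",
                 "page_link": f"https://example.com/{self.outlet}"}]


async def scrape_all(scrapers, keyword, max_pages=1):
    """Run every scraper concurrently and map outlet name -> list of articles."""
    tasks = [s.scrape(keyword=keyword, max_pages=max_pages)
             for s in scrapers.values()]
    # With return_exceptions=True a failure in one outlet is returned as a
    # value instead of propagating, so the other results are still collected.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results = {}
    for name, outcome in zip(scrapers, outcomes):
        results[name] = [] if isinstance(outcome, Exception) else outcome
    return results


if __name__ == "__main__":
    demo = {"bbc": StubScraper("bbc"), "arise": StubScraper("arise")}
    print(asyncio.run(scrape_all(demo, keyword="election")))
```

A failing outlet is mapped to an empty list here for simplicity; logging the exception instead may be preferable in practice.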
Advanced Usage (Browser-Based Scrapers)
For outlets like Punch NG and Business Day, the scrapers launch a headless browser engine.
import asyncio

from osint_news_deamon_pkg import PunchNGScraper, BusinessDayScraper


async def scrape_dynamic():
    # 1. Punch NG (Uses Selenium)
    # Note: Requires Chrome installed
    print("--- Scraping Punch NG ---")
    punch = PunchNGScraper(headless=True)
    # Supports date filtering: "DD MMM, YYYY"
    punch_results = await punch.scrape(
        query="politics",
        max_pages=1,
        from_date="01 Jan, 2024",
        to_date="20 Dec, 2024",
    )
    for article in punch_results[:3]:
        print(f"[Punch] {article.get('title', 'No Title')} - {article.get('url')}")

    # 2. Business Day (Uses Playwright)
    print("\n--- Scraping Business Day ---")
    bd = BusinessDayScraper()
    bd_results = await bd.scrape(query="finance", max_pages=1)
    for article in bd_results[:3]:
        print(f"[BusinessDay] {article['title']} - {article['url']}")


if __name__ == "__main__":
    asyncio.run(scrape_dynamic())
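The usage examples read the article link from 'page_link' for BBC and Arise TV results and from 'url' for Punch NG and Business Day results. When results from several outlets are combined, a small helper can map both onto one shape. This is a minimal sketch, assuming each result is a dict with those keys as the examples suggest; normalize_article and its output field names are illustrative and not part of the package.

```python
def normalize_article(article, outlet):
    """Return a dict with uniform keys regardless of which scraper produced it."""
    return {
        "outlet": outlet,
        "title": article.get("title", "No Title"),
        # Assumption: the link sits under "url" or "page_link",
        # matching the keys used in the usage examples.
        "link": article.get("url") or article.get("page_link"),
    }


if __name__ == "__main__":
    bbc_like = {"title": "Election update", "page_link": "https://example.com/a"}
    punch_like = {"title": "Politics today", "url": "https://example.com/b"}
    for outlet, art in (("BBC", bbc_like), ("Punch", punch_like)):
        print(normalize_article(art, outlet))
```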
Configuration
Most scrapers accept the following parameters in their .scrape() method:
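- keyword / query: The search term. In the examples above, the BBC and Arise TV scrapers take keyword, while the Punch NG and Business Day scrapers take query.
- max_pages: Number of result pages to traverse (default: 1).
- from_date: Start date filter (format: "DD MMM, YYYY", e.g., "01 Jan, 2025").
- to_date: End date filter (format: "DD MMM, YYYY").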
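The "DD MMM, YYYY" format corresponds to the strftime pattern "%d %b, %Y" in Python's standard library. The sketch below shows one way to build and check such strings before passing them as from_date or to_date; the helper names are illustrative and not part of the package. Note that %b is locale-dependent, so this assumes the default C/English locale where months are abbreviated as "Jan", "Feb", and so on.

```python
from datetime import date, datetime

DATE_FORMAT = "%d %b, %Y"  # "DD MMM, YYYY", e.g. "01 Jan, 2025"


def to_filter_date(value):
    """Format a date or datetime as a "DD MMM, YYYY" string."""
    return value.strftime(DATE_FORMAT)


def is_valid_filter_date(text):
    """Return True only if text is exactly in "DD MMM, YYYY" form."""
    try:
        parsed = datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return False
    # strptime is lenient (it accepts "1 Jan, 2025"), so compare the
    # re-formatted value with the input to enforce zero padding and casing.
    return parsed.strftime(DATE_FORMAT) == text


if __name__ == "__main__":
    print(to_filter_date(date(2025, 1, 1)))
    print(is_valid_filter_date("20 Dec, 2024"))
    print(is_valid_filter_date("2024-12-20"))
```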
Requirements
- Python 3.8+
- Google Chrome (for Selenium)
- Playwright Chromium (install via playwright install chromium)
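A quick pre-flight check can confirm that the interpreter version and the Python libraries named in this README are available before a scraper is launched. The following is a rough sketch using only the standard library; check_environment is an illustrative name, and the check does not verify that Google Chrome or the Playwright Chromium browser binaries are installed.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the interpreter and libraries named in this README are present."""
    report = {"python_3_8_plus": sys.version_info >= (3, 8)}
    for module in ("aiohttp", "selenium", "playwright"):
        # find_spec returns None when a top-level module is not installed.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```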
Disclaimer
This tool is intended for educational purposes and legitimate Open Source Intelligence (OSINT) research. Users are responsible for adhering to the Terms of Service and robots.txt policies of the target websites.
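One way to check a URL against a site's robots.txt rules is the standard library's urllib.robotparser module. The sketch below parses robots.txt text that is supplied directly, so it runs offline; the sample rules and the is_allowed helper are illustrative only. Against a live site, RobotFileParser.set_url() followed by read() fetches the file instead.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules, not taken from any real outlet.
    sample = "User-agent: *\nDisallow: /private/\n"
    print(is_allowed(sample, "https://example.com/news/story"))
    print(is_allowed(sample, "https://example.com/private/page"))
```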
Download files
File details
Details for the file osint_news_deamon_pkg-0.1.0.tar.gz.
File metadata
- Download URL: osint_news_deamon_pkg-0.1.0.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9616fd974f148247a3f2469cd66c7a4df647e519efe7cd8855e45b5a70265a32 |
| MD5 | 71ab84d40b96b336ac300da739158372 |
| BLAKE2b-256 | 796b3bed28e53ae1eb01f1f8b05f22da11ece87e1cf97e339740c638066fd951 |
File details
Details for the file osint_news_deamon_pkg-0.1.0-py3-none-any.whl.
File metadata
- Download URL: osint_news_deamon_pkg-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c9f83f38677d82489e04ea7c6c341071cd5bb31a54624db914c3448d58093e95 |
| MD5 | 97a2dab84e4e6e7c83183a981a1c6de3 |
| BLAKE2b-256 | 621e41e5b6a1a6eff4e1160c94f991d0795099687204c53a021e89599eb53228 |
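A downloaded file can be checked against the SHA256 digests listed in the file details with Python's hashlib. This is a minimal sketch; sha256_of and matches are illustrative helper names, and the local path in the usage section is an assumption about where the file was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    import os

    # Assumed local path; the digest is the SHA256 value listed for the sdist.
    sdist = "osint_news_deamon_pkg-0.1.0.tar.gz"
    expected = "9616fd974f148247a3f2469cd66c7a4df647e519efe7cd8855e45b5a70265a32"
    if os.path.exists(sdist):
        print("match" if matches(sdist, expected) else "MISMATCH")
```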