A Python package to scrape data from Ghana news portals
GhanaNews Scraper
A simple, unofficial Python package to scrape data from GhanaWeb and MyJoyOnline. Affiliated with bank-of-ghana-fx-rates.
How to install
pip install ghananews-scraper
Warning: DO NOT RUN IN ONLINE JUPYTER NOTEBOOKS (e.g. Google Colab).
GhanaWeb URLs:
urls = [
    "https://www.ghanaweb.com/GhanaHomePage/regional/",
    "https://www.ghanaweb.com/GhanaHomePage/editorial/",
    "https://www.ghanaweb.com/GhanaHomePage/health/",
    "https://www.ghanaweb.com/GhanaHomePage/diaspora/",
    "https://www.ghanaweb.com/GhanaHomePage/tabloid/",
    "https://www.ghanaweb.com/GhanaHomePage/africa/",
    "https://www.ghanaweb.com/GhanaHomePage/religion/",
    "https://www.ghanaweb.com/GhanaHomePage/NewsArchive/",
    "https://www.ghanaweb.com/GhanaHomePage/business/",
    "https://www.ghanaweb.com/GhanaHomePage/SportsArchive/",
    "https://www.ghanaweb.com/GhanaHomePage/entertainment/",
    "https://www.ghanaweb.com/GhanaHomePage/television/",
]
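Every GhanaWeb section URL above follows the same pattern, so the list can also be generated from section names. A minimal sketch, assuming the `https://www.ghanaweb.com/GhanaHomePage/<section>/` pattern holds for every section; `section_url`, `BASE_URL` and `SECTIONS` are hypothetical names, not part of the package. Generating the list avoids two pitfalls of hand-written lists: a missing comma silently concatenates adjacent string literals, and a duplicated entry gets scraped twice.

```python
# Hypothetical helper: assumes every GhanaWeb section lives under /GhanaHomePage/<section>/
BASE_URL = "https://www.ghanaweb.com/GhanaHomePage/"

# "africa" appears twice on purpose to show the de-duplication step below
SECTIONS = [
    "regional", "editorial", "health", "diaspora", "tabloid", "africa",
    "religion", "NewsArchive", "business", "SportsArchive",
    "entertainment", "africa", "television",
]


def section_url(section: str) -> str:
    """Return the listing URL for a GhanaWeb section name."""
    return f"{BASE_URL}{section.strip('/')}/"


# dict.fromkeys keeps first-seen order while dropping duplicate section names
urls = [section_url(name) for name in dict.fromkeys(SECTIONS)]
```

The resulting `urls` list can be passed to the loop in the usage examples below.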
Usage
from ghanaweb.scraper import GhanaWeb
url = 'https://www.ghanaweb.com/GhanaHomePage/politics/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/health/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/crime/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/regional/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/year-in-review/'
# web = GhanaWeb(url='https://www.ghanaweb.com/GhanaHomePage/politics/')
web = GhanaWeb(url=url)
# scrape data and save to `current working dir`
web.download(output_dir=None)
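`download(output_dir=None)` writes to the current working directory; the commented-out examples further down pass a directory path instead. A minimal sketch for keeping each run in its own folder, assuming `output_dir` accepts any existing directory path as a string (as the Desktop example suggests); `make_output_dir` is a hypothetical helper, not part of the package.

```python
from datetime import date
from pathlib import Path


def make_output_dir(base: str, label: str) -> str:
    """Create <base>/<label>-<YYYY-MM-DD> if needed and return it as a string.

    Hypothetical helper: assumes GhanaWeb.download(output_dir=...) accepts a
    directory path string, as the commented-out Desktop example suggests.
    """
    target = Path(base).expanduser() / f"{label}-{date.today().isoformat()}"
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return str(target)
```

Usage: `web.download(output_dir=make_output_dir("~/ghananews", "politics"))`.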
Scrape a list of URLs from GhanaWeb
from ghanaweb.scraper import GhanaWeb
urls = [
'https://www.ghanaweb.com/GhanaHomePage/politics/',
'https://www.ghanaweb.com/GhanaHomePage/health/',
'https://www.ghanaweb.com/GhanaHomePage/crime/',
'https://www.ghanaweb.com/GhanaHomePage/regional/',
'https://www.ghanaweb.com/GhanaHomePage/year-in-review/'
]
for url in urls:
    print(f"Downloading: {url}")
    web = GhanaWeb(url=url)
    # download to current working directory
    # if no location is specified
    # web.download(output_dir="/Users/tsiameh/Desktop/")
    web.download(output_dir=None)
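The loop above stops at the first URL that raises an error (for example, a network timeout). A minimal sketch of a more forgiving loop that records failures and carries on; `download_all` is a hypothetical helper. It takes the scraper class as a parameter (e.g. `GhanaWeb` or `MyJoyOnline`), so it relies only on the `Class(url=...)` and `.download(output_dir=...)` calls shown in this README.

```python
from typing import Callable, Dict, Iterable, Optional


def download_all(
    urls: Iterable[str],
    scraper_cls: Callable,
    output_dir: Optional[str] = None,
) -> Dict[str, Exception]:
    """Download every URL, returning {url: error} for the ones that failed."""
    failures: Dict[str, Exception] = {}
    for url in urls:
        print(f"Downloading: {url}")
        try:
            scraper_cls(url=url).download(output_dir=output_dir)
        except Exception as exc:  # keep going; report the failure at the end
            failures[url] = exc
    return failures
```

Usage: `failed = download_all(urls, GhanaWeb)`, then print `failed` to see which sections need a retry.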
Scrape data from MyJoyOnline
from myjoyonline.scraper import MyJoyOnline
url = 'https://www.myjoyonline.com/news/'
print(f"Downloading data from: {url}")
joy = MyJoyOnline(url=url)
# download to current working directory
# if no location is specified
# joy.download(output_dir="/Users/tsiameh/Desktop/")
joy.download()
Scrape a list of URLs from MyJoyOnline
from myjoyonline.scraper import MyJoyOnline
urls = [
'https://www.myjoyonline.com/news/',
'https://www.myjoyonline.com/entertainment/',
'https://www.myjoyonline.com/business/',
'https://www.myjoyonline.com/sports/',
'https://www.myjoyonline.com/opinion/'
]
for url in urls:
    print(f"Downloading data from: {url}")
    joy = MyJoyOnline(url=url)
    # download to current working directory
    # if no location is specified
    # joy.download(output_dir="/Users/tsiameh/Desktop/")
    joy.download()
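Both scrapers are used the same way, so a mixed list of GhanaWeb and MyJoyOnline URLs can be routed to the right class by hostname. A minimal sketch using only the standard library; `SCRAPERS` and `scraper_name_for` are hypothetical, and the mapping holds class names rather than the imported classes so the sketch runs without the package installed.

```python
from urllib.parse import urlparse

# Hypothetical mapping from site hostname (without "www.") to the scraper class name
SCRAPERS = {
    "ghanaweb.com": "GhanaWeb",        # from ghanaweb.scraper import GhanaWeb
    "myjoyonline.com": "MyJoyOnline",  # from myjoyonline.scraper import MyJoyOnline
}


def scraper_name_for(url: str) -> str:
    """Return the name of the scraper class that should handle `url`."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host not in SCRAPERS:
        raise ValueError(f"No scraper registered for host: {host!r}")
    return SCRAPERS[host]
```

In real code, the mapping values would be the imported `GhanaWeb` and `MyJoyOnline` classes themselves, so the lookup result can be called directly with `url=...`.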
Credits
Theophilus Siameh
Hashes for ghananews_scraper-1.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ee1cea63a26a3b50d39114eedb482a4ac5c326569d4f57087d34d702ce5ad94e
MD5 | c07e668573c1a34b109b816b7f0866dd
BLAKE2b-256 | 98efdfd1d37e263344ba3298d87cbc24e572e7a257a018fe24973ba56ee804a3
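The published SHA256 digest lets you check a downloaded wheel before installing it. A minimal sketch with the standard-library `hashlib`; `sha256_of` is a hypothetical helper, and the wheel path in the usage line assumes the file was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published for ghananews_scraper-1.0.4-py3-none-any.whl
EXPECTED_SHA256 = "ee1cea63a26a3b50d39114eedb482a4ac5c326569d4f57087d34d702ce5ad94e"
```

Usage: `assert sha256_of("ghananews_scraper-1.0.4-py3-none-any.whl") == EXPECTED_SHA256`. pip's own `pip hash` command reports the same SHA256 digest.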