The ultimate library for data scientists to scrape data from https://www.lefaso.net

Project description

lefaso-net-scraper

Description

lefaso-net-scraper is a Python library for extracting articles from www.lefaso.net, a popular online news source in Burkina Faso. It collects both article content and reader comments from the site.

Important

This scraper, like other scrapers, is based on the structure of the target website. Changes to the website's structure can affect the scraper. We use automated workflows to detect these issues frequently, but we cannot catch all of them. Please report any issues you encounter and use the latest version.
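Before reporting an issue, it can help to confirm which version is installed. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of lefaso-net-scraper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "lefaso-net-scraper") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of the library.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string, or None when the package is not installed.
    print(installed_version())
```

Compare the printed value with the latest release listed on PyPI, and upgrade if it is older.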

JSON/dictionary Fields

Field                    Description
article_topic            Category or subject of the article.
article_title            The main headline or title of the article.
article_published_date   Date when the article was published.
article_origin           Source or platform where the article was published.
article_url              Web link to the article.
article_content          Full text or body of the article.
article_comments         Feedback or responses from readers.
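Because the scraper depends on the site's structure, a quick check that each record carries the fields listed above can surface breakage early. This is a sketch under assumptions: it treats each record as a plain dictionary keyed by those field names, the helper `missing_fields` is hypothetical, and the sample record contains made-up placeholder values rather than real scraped data.

```python
from typing import Any, Dict, List

# Field names taken from the table above.
EXPECTED_FIELDS = (
    "article_topic",
    "article_title",
    "article_published_date",
    "article_origin",
    "article_url",
    "article_content",
    "article_comments",
)


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the expected field names that are absent from a record."""
    return [field for field in EXPECTED_FIELDS if field not in record]


# Made-up placeholder record for illustration only; value types are assumptions.
sample_record = {
    "article_topic": "placeholder topic",
    "article_title": "placeholder title",
    "article_published_date": "placeholder date",
    "article_origin": "placeholder origin",
    "article_url": "https://example.com/article",
    "article_content": "placeholder content",
    "article_comments": [],
}

if __name__ == "__main__":
    print(missing_fields(sample_record))
```

A non-empty result for records returned by the scraper would suggest that the site layout changed or that an older version of the library is in use.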

Installation

  • Using pip
pip install --upgrade lefaso-net-scraper

# For Jupyter support (quotes keep shells such as zsh from expanding the brackets)
pip install --upgrade "lefaso-net-scraper[notebook]"
  • Using poetry
poetry add lefaso-net-scraper

# For Jupyter support (quotes keep shells such as zsh from expanding the brackets)
poetry add "lefaso-net-scraper[notebook]"

Usage

# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper

section_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(section_url)
data = scraper.run()
  • Setting the pagination range
# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper

section_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(section_url)
scraper.set_pagination_range(start=20, stop=100)
data = scraper.run()
  • Saving data to CSV
# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper
import pandas as pd

section_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(section_url)
data = scraper.run()
df = pd.DataFrame.from_records(data)
df.to_csv('path/to/df.csv')
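Since the records are described as JSON/dictionary fields, they can also be written to a JSON file without pandas. The sketch below uses only the standard library; `save_records_json` is a hypothetical helper, and it assumes every value in the records is JSON-serializable (for example, dates stored as strings), which the library may not guarantee.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def save_records_json(
    records: List[Dict[str, Any]], path: Union[str, Path]
) -> Path:
    """Write a list of record dictionaries to a UTF-8 JSON file.

    Hypothetical helper; assumes all values are JSON-serializable.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps accented French characters readable.
        json.dump(records, handle, ensure_ascii=False, indent=2)
    return path
```

With the `data` variable from the examples above, a call such as `save_records_json(data, 'path/to/articles.json')` would write the file; the path is a placeholder.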

We ❤ open source

Download files

Source Distribution

lefaso_net_scraper-0.4.0.tar.gz (16.6 kB)

Uploaded Source

Built Distribution

lefaso_net_scraper-0.4.0-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file lefaso_net_scraper-0.4.0.tar.gz.

File metadata

  • Download URL: lefaso_net_scraper-0.4.0.tar.gz
  • Upload date:
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.10.12 Linux/6.5.0-1025-azure

File hashes

Hashes for lefaso_net_scraper-0.4.0.tar.gz
Algorithm Hash digest
SHA256 c7185d8a04b95214407f1409fd5e724e773144b2a6645e3f172b867829213611
MD5 48d3855b331aba1e54992dd0b2699f9a
BLAKE2b-256 2f0099a6fee8bbb0aabe1c1db3da5bea3b07ab6334fccf05979798a1185f59fc

File details

Details for the file lefaso_net_scraper-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: lefaso_net_scraper-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.10.12 Linux/6.5.0-1025-azure

File hashes

Hashes for lefaso_net_scraper-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 63d07b6ae7cbcbdeb2eb976c2c6981d3fb41162b1a538bcc1e8bfb0703c900e3
MD5 248ec8a3bfa1ecc3d82130df0b776488
BLAKE2b-256 245be0a16bdc3b8b6233a9292340ab587843d87af4ba0e9a2e161ed030905c07
