Get the title and text for a given web URL.

Project description

web_article: A Python Article Scraper and CLI Tool

web_article is a Python package that scrapes and parses articles from web pages. It ships with a command-line interface (CLI) tool that extracts articles from the URLs you give it and exports them in several formats.

Features

  • Scrape articles from multiple supported URLs.
  • Export articles to various formats: JSON, JL (JSON Lines), and CSV.
  • Intuitive command-line tool for quick extraction and export.
  • Flexible output options, including encoding, indentation, and delimiters.

Requirements

Python 3.x

Installation

python3 -m pip install web_article

Usage

Using the web_article CLI tool is straightforward. Here are some common use cases:

  1. Scraping a Single Article from URL:
web_article url1
  2. Scraping Multiple Articles from URLs:
web_article url1 url2 url3
  3. Scraping Articles from URLs Listed in a File:
web_article urls.txt
  4. Exporting Articles in JSON Format:
web_article urls.txt -o out.json -f json
  5. Exporting Articles in JSON Format with Indentation:
web_article urls.txt -o out.json -f json -i 2
  6. Exporting Articles in CSV Format:
web_article urls.txt -o out.csv -f csv
  7. Exporting in a Specific Encoding (e.g., UTF-8 with BOM for Office Excel):
web_article urls.txt -o out.csv -f csv -e utf-8-sig
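
Once the files above exist, they can be loaded back with the Python standard library. The following is a minimal sketch, not part of web_article itself. It assumes that each record has "title" and "text" fields, that the 'json' format is a single JSON array, and that 'jl' means one JSON object per line. None of these are confirmed by this page, so check them against your actual output. The sample records are made up for illustration.

```python
import csv
import io
import json


def load_jl(text):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_json(text):
    """Parse a JSON document (assumed here to be an array of records)."""
    return json.loads(text)


def load_csv(raw_bytes, encoding="utf-8-sig", delimiter=","):
    """Decode CSV bytes and return one dict per row.

    Decoding with 'utf-8-sig' strips a leading BOM if present and also
    accepts plain UTF-8 input without one.
    """
    text = raw_bytes.decode(encoding)
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Made-up sample records; the field names are an assumption.
sample = [
    {"title": "Hello", "text": "First article"},
    {"title": "World", "text": "Second article"},
]

# Build in-memory stand-ins for out.jl, out.json, and out.csv.
jl_text = "\n".join(json.dumps(record) for record in sample)
json_text = json.dumps(sample, indent=2)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "text"])
writer.writeheader()
writer.writerows(sample)
csv_bytes = buf.getvalue().encode("utf-8-sig")  # prepends the UTF-8 BOM

print(load_jl(jl_text)[0]["title"])
print(load_json(json_text)[1]["title"])
print(load_csv(csv_bytes)[0]["text"])
```

For real files, read the bytes with open(path, "rb") and pass them to load_csv, or open the file in text mode with the same encoding that was given to -e.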

Options:

-o, --outfile : The name of the output file. Defaults to standard output if not provided.

-f, --outfmt : The format for the output. Choices are 'jl', 'json', 'csv'. Defaults to 'jl'.

-d, --delimiter : Delimiter for CSV output. Default is ','.

-e, --encoding : Encoding for the output file. Default is 'utf-8'.

-i, --indent : Indentation for JSON output. An integer value.
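
When calling the tool from a script, it can help to assemble the command line from the options listed above. The sketch below is a hypothetical helper (build_command is not provided by web_article); it only uses the flags documented on this page and leaves out any option you do not set, so the tool's own defaults apply.

```python
import shlex

VALID_FORMATS = ("jl", "json", "csv")


def build_command(sources, outfile=None, outfmt=None,
                  delimiter=None, encoding=None, indent=None):
    """Return an argv list for the web_article CLI.

    `sources` holds URLs or the name of a file that lists URLs.
    Options left as None are omitted from the command.
    """
    if outfmt is not None and outfmt not in VALID_FORMATS:
        raise ValueError(f"outfmt must be one of {VALID_FORMATS}")
    cmd = ["web_article", *sources]
    if outfile is not None:
        cmd += ["-o", outfile]
    if outfmt is not None:
        cmd += ["-f", outfmt]
    if delimiter is not None:
        cmd += ["-d", delimiter]
    if encoding is not None:
        cmd += ["-e", encoding]
    if indent is not None:
        cmd += ["-i", str(indent)]
    return cmd


cmd = build_command(["urls.txt"], outfile="out.csv",
                    outfmt="csv", encoding="utf-8-sig")
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To execute the command, pass the list to subprocess.run(cmd, check=True); this requires web_article to be installed and on the PATH.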

Download files

Source Distribution

web_article-1.0.0.tar.gz (7.0 kB)

Uploaded Source

Built Distribution

web_article-1.0.0-py3-none-any.whl (7.9 kB)

Uploaded Python 3

File details

Details for the file web_article-1.0.0.tar.gz.

File metadata

  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.8

File hashes

Hashes for web_article-1.0.0.tar.gz

  • SHA256: acb9e66162f85b8a92dce40f2569dead844bba9bc1aac95383e091dd2577ce47
  • MD5: 98c343a409d13def71ad8ca4da9fa6d3
  • BLAKE2b-256: 65fdd59078750a11fdfaacf7df6fec1d052c7a2fc6392ae88202a3f4ef709aee

File details

Details for the file web_article-1.0.0-py3-none-any.whl.

File metadata

  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.8

File hashes

Hashes for web_article-1.0.0-py3-none-any.whl

  • SHA256: f5d741e5df8cc54091b71a8675fda227b6e34057322857c39db23928ac71b5b8
  • MD5: 047565d80e093a557bbcc954ae58f7c0
  • BLAKE2b-256: 1cb297a993584033ce4281a28e0d0b64b52172840e38e6acebe3e62346fa02e6
