Get the title and text for a given web URL.
web_article: A Python Article Scraper and CLI Tool
web_article is a Python package that scrapes and parses articles from web sources. It ships with a command-line interface (CLI) tool that extracts articles from given URLs and exports them in several formats.
Features
- Scrape articles from multiple supported URLs.
- Export articles to various formats: JSON, JSON Lines (JL), and CSV.
- Intuitive command-line tool for quick extraction and export.
- Flexible output options, including encoding, indentation, and delimiters.
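To make the difference between the three export formats concrete, here is a minimal sketch that uses only the Python standard library. It does not call web_article itself. The record layout (a `title` and a `text` field) and the helper names `to_json`, `to_jl`, and `to_csv` are assumptions made for illustration, not the package's API.

```python
import csv
import io
import json

# Hypothetical records; the "title" and "text" field names are assumptions.
articles = [
    {"title": "First post", "text": "Hello, world."},
    {"title": "Second post", "text": "More text."},
]


def to_json(records, indent=None):
    """Serialize all records as a single JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=indent)


def to_jl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_csv(records, delimiter=","):
    """Serialize records as CSV with a header row from the first record's keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(records[0]),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


print(to_json(articles, indent=2))
print(to_jl(articles), end="")
print(to_csv(articles, delimiter=";"), end="")
```

JSON Lines can be appended to and read back one record at a time, which may be why it suits a streaming default; a JSON array has to be complete before it parses.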
Requirements
Python 3.x
Installation
python3 -m pip install web_article
Usage
Using the web_article CLI tool is straightforward. Here are some common use cases:
- Scraping a Single Article from URL:
web_article url1
- Scraping Multiple Articles from URLs:
web_article url1 url2 url3
- Scraping Articles from URLs Listed in a File:
web_article urls.txt
- Exporting Articles in JSON Format:
web_article urls.txt -o out.json -f json
- Exporting Articles in JSON Format with Indentation:
web_article urls.txt -o out.json -f json -i 2
- Exporting Articles in CSV Format:
web_article urls.txt -o out.csv -f csv
- Exporting in a Specific Encoding (e.g., UTF-8 with BOM for Office Excel):
web_article urls.txt -o out.csv -f csv -e utf-8-sig
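The `utf-8-sig` encoding in the last example is a standard Python codec, and the short sketch below shows what it changes in the output bytes. It is independent of web_article; the sample row is made up for illustration.

```python
import codecs

# Made-up CSV header row, used only to compare the two encodings.
row = "title,text\n"

plain = row.encode("utf-8")
with_bom = row.encode("utf-8-sig")

# utf-8-sig prepends the three-byte byte order mark (EF BB BF);
# the rest of the payload is identical to plain UTF-8.
print(with_bom[:3] == codecs.BOM_UTF8)
print(with_bom[3:] == plain)

# Decoding with utf-8-sig strips the mark again, while plain utf-8
# keeps it as a leading U+FEFF character.
print(with_bom.decode("utf-8-sig") == row)
print(with_bom.decode("utf-8")[0] == "\ufeff")
```

Spreadsheet applications such as Excel commonly use this leading mark to detect UTF-8, which is the use case the example above refers to.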
Options:
-o, --outfile : The name of the output file. Defaults to standard output if not provided.
-f, --outfmt : The format for the output. Choices are 'jl', 'json', 'csv'. Defaults to 'jl'.
-d, --delimiter : Delimiter for CSV output. Default is ','.
-e, --encoding : Encoding for the output file. Default is 'utf-8'.
-i, --indent : Indentation for JSON output. An integer value.
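The options above can be mirrored with `argparse` to see how the documented defaults interact. This is a hypothetical reconstruction based only on the option list, not the package's actual source: the positional name `urls`, the `None` default for `--indent`, and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="web_article")
    # Assumption: one or more URLs, or files listing URLs, as positionals.
    parser.add_argument("urls", nargs="+", help="URLs or files listing URLs")
    # None stands in for "write to standard output".
    parser.add_argument("-o", "--outfile", default=None, help="output file name")
    parser.add_argument(
        "-f", "--outfmt", choices=["jl", "json", "csv"], default="jl",
        help="output format",
    )
    parser.add_argument("-d", "--delimiter", default=",", help="CSV delimiter")
    parser.add_argument("-e", "--encoding", default="utf-8", help="output encoding")
    # Assumption: no indentation unless the option is given.
    parser.add_argument("-i", "--indent", type=int, default=None, help="JSON indent")
    return parser


args = build_parser().parse_args(
    ["urls.txt", "-o", "out.json", "-f", "json", "-i", "2"]
)
print(args.outfmt, args.outfile, args.indent)

defaults = build_parser().parse_args(["url1"])
print(defaults.outfmt, defaults.delimiter, defaults.encoding)
```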
File details
Details for the file web_article-1.0.0.tar.gz.
File metadata
- Download URL: web_article-1.0.0.tar.gz
- Upload date:
- Size: 7.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | acb9e66162f85b8a92dce40f2569dead844bba9bc1aac95383e091dd2577ce47 |
| MD5 | 98c343a409d13def71ad8ca4da9fa6d3 |
| BLAKE2b-256 | 65fdd59078750a11fdfaacf7df6fec1d052c7a2fc6392ae88202a3f4ef709aee |
File details
Details for the file web_article-1.0.0-py3-none-any.whl.
File metadata
- Download URL: web_article-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5d741e5df8cc54091b71a8675fda227b6e34057322857c39db23928ac71b5b8 |
| MD5 | 047565d80e093a557bbcc954ae58f7c0 |
| BLAKE2b-256 | 1cb297a993584033ce4281a28e0d0b64b52172840e38e6acebe3e62346fa02e6 |