Extract the main content from a webpage using zendriver, readability-lxml, and BeautifulSoup.
atai-web-tool
atai-web-tool is a command-line utility that extracts the main content from a webpage. It leverages zendriver, readability-lxml, and BeautifulSoup to fetch pages, extract primary content, and display a clean, text-only version.
Features
- Headless Browsing: Fetch webpages using zendriver.
- Content Extraction: Extract main content with readability-lxml.
- Clean Output: Remove unwanted HTML tags using BeautifulSoup.
- Easy CLI: Run from the terminal with a single command.
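To illustrate the clean-output step, here is a minimal sketch of turning HTML into text-only output. It is not the tool's own code: it uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without extra dependencies, and the names `TextOnlyParser` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collect visible text and skip script/style/noscript content.

    Hypothetical helper for illustration; atai-web-tool itself uses BeautifulSoup.
    """

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        # One line per text node, in document order.
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one text node per line."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<article><h1>Title</h1><script>track()</script><p>Hello world.</p></article>"
print(html_to_text(sample))
```

The sample prints the heading and the paragraph on separate lines and drops the script body. A real page would first go through the content-extraction step so that navigation and sidebars are removed before the tags are stripped.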
Installation
You can install atai-web-tool via pip:
pip install atai-web-tool
If you prefer to install from source, clone the repository and run:
pip install .
Usage
Extract the main content from a webpage by running:
atai-web-tool https://example.com
This command will open the specified URL, extract the primary content, and print it to the terminal.
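If you want to use the extracted text from a script instead of reading it in the terminal, one option is to call the command with `subprocess` and capture its output. The sketch below assumes the tool writes the extracted text to standard output and exits with a non-zero status on failure; neither is confirmed by this README. The function name `extract_text` and its `command` and `timeout` parameters are hypothetical.

```python
import subprocess


def extract_text(url, command=("atai-web-tool",), timeout=120):
    """Run the extractor CLI on `url` and return whatever it wrote to stdout.

    Assumption: the CLI prints the extracted text to stdout.
    `command` is configurable so a different executable path can be supplied.
    """
    result = subprocess.run(
        [*command, url],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        timeout=timeout,          # raises subprocess.TimeoutExpired if exceeded
        check=True,               # raises CalledProcessError on non-zero exit
    )
    return result.stdout
```

For example, `text = extract_text("https://example.com")` would return the same text that the command prints, provided atai-web-tool is installed and on your `PATH`.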
Requirements
- Python 3.6 or higher
- zendriver
- readability-lxml
- BeautifulSoup4
- lxml[html_clean]
Development
For local development, install the required dependencies using:
pip install -r requirements.txt
Contributing
Contributions are welcome! Please fork the repository and submit a pull request with your improvements. For major changes, open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file atai_web_tool-0.0.8.tar.gz.
File metadata
- Download URL: atai_web_tool-0.0.8.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85923040a1696e005dd84a7d5049744ccfc14f5d920b9edca51282ab97155c70 |
| MD5 | a93a2970edc2cfdaed394d0d788d6282 |
| BLAKE2b-256 | 096c653493b0d55a9b69e7bbd4c94bb2ff197db5cb806e76c198196acdaf26cf |
File details
Details for the file atai_web_tool-0.0.8-py3-none-any.whl.
File metadata
- Download URL: atai_web_tool-0.0.8-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab6988bf3db23ea73b5a2205542e90f9ad0060fa6dbb5e4f129af3aa5f0b42a0 |
| MD5 | 1b0697100242218840751a1344e24216 |
| BLAKE2b-256 | 8d9d15182f1c0edff670b174180187a083dad52c4d74ea0f7d642d03ebfc3858 |