Extract the main content from a webpage using zendriver, readability-lxml, and BeautifulSoup.

Project description

atai-web-tool

atai-web-tool is a command-line utility that extracts the main content from a webpage. It leverages zendriver, readability-lxml, and BeautifulSoup to fetch pages, extract primary content, and display a clean, text-only version.

Features

  • Headless Browsing: Fetch webpages using zendriver.
  • Content Extraction: Extract main content with readability-lxml.
  • Clean Output: Remove unwanted HTML tags using BeautifulSoup.
  • Easy CLI: Run from the terminal with a single command.

Installation

You can install atai-web-tool via pip:

pip install atai-web-tool

If you prefer to install from source, clone the repository and run:

pip install .
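
After either install route, one way to confirm that the package is visible to the current Python environment is to query its metadata. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the name used with pip above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if the package is installed, otherwise None.
    print(installed_version("atai-web-tool"))
```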

Usage

Extract the main content from a webpage by running:

atai-web-tool https://example.com

This command opens the specified URL, extracts the primary content, and prints it to the terminal as plain text.
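
To use the extracted text from a Python script rather than reading it in the terminal, the command can be wrapped with `subprocess`. This is a minimal sketch: the helper name `fetch_main_content` is hypothetical, and it assumes that the tool writes the extracted text to standard output and exits with a non-zero status on failure, neither of which is confirmed by the description above.

```python
import subprocess


def fetch_main_content(url: str, run=subprocess.run) -> str:
    """Run the atai-web-tool CLI for one URL and return what it printed.

    `run` is injectable so the helper can be exercised without the CLI installed.
    """
    result = run(
        ["atai-web-tool", url],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit (assumed behavior)
    )
    return result.stdout


if __name__ == "__main__":
    print(fetch_main_content("https://example.com"))
```

Passing the command as a list avoids shell quoting problems with URLs that contain characters such as `&` or `?`.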

Requirements

The tool relies on the libraries named above: zendriver, readability-lxml, and BeautifulSoup.

Development

For local development, install the required dependencies using:

pip install -r requirements.txt

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements. For major changes, open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

atai_web_tool-0.0.7.tar.gz (5.3 kB, source)

Built Distribution

atai_web_tool-0.0.7-py3-none-any.whl (5.7 kB, Python 3)

File details

Details for the file atai_web_tool-0.0.7.tar.gz.

File metadata

  • Download URL: atai_web_tool-0.0.7.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_web_tool-0.0.7.tar.gz

Algorithm     Hash digest
SHA256        2c2121f558a32cae79715dfb7952be91658bc17e9bfea86cf47558b714eb73f4
MD5           3f483060d0be201969d9ab59c4e44b9a
BLAKE2b-256   788e0b8fbd48ab6334caa275baef1766f8a641fbd0cd27b39b3a3f94bdd1e41d
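
A downloaded file can be checked against the digests above before it is installed. The sketch below computes a SHA256 digest with the standard library's `hashlib` and compares it with the published value for the source distribution. The helper names `sha256_of` and `matches` are hypothetical, and the file is assumed to have been saved in the current directory under its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for atai_web_tool-0.0.7.tar.gz.
EXPECTED_SHA256 = "2c2121f558a32cae79715dfb7952be91658bc17e9bfea86cf47558b714eb73f4"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    archive = Path("atai_web_tool-0.0.7.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

The same helpers apply to the wheel by substituting its file name and its published SHA256 digest.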

File details

Details for the file atai_web_tool-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: atai_web_tool-0.0.7-py3-none-any.whl
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_web_tool-0.0.7-py3-none-any.whl

Algorithm     Hash digest
SHA256        740f000d3f6dd942dce14a26cccb9066970c8131c69980ba627e70a724707423
MD5           c5de4ca9537dba06a01983971d0771da
BLAKE2b-256   11537338770dd4c4c8a7bf2653cbb16762622c7f560d48e744d4d2fbf9889b79
