Extract the main content from a webpage using zendriver, readability-lxml, and BeautifulSoup.

atai-web-tool

atai-web-tool is a command-line utility that extracts the main content from a webpage. It leverages zendriver, readability-lxml, and BeautifulSoup to fetch pages, extract primary content, and display a clean, text-only version.

Features

  • Headless Browsing: Fetch webpages using zendriver.
  • Content Extraction: Extract main content with readability-lxml.
  • Clean Output: Remove unwanted HTML tags using BeautifulSoup.
  • Easy CLI: Run from the terminal with a single command.
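The three stages listed above (fetch, extract, clean) can be sketched in Python as follows. This is a minimal illustration, not the tool's actual source: the function names `fetch_html`, `extract_main_html`, `html_to_text`, `tidy`, and `page_text` are hypothetical, and the sketch assumes zendriver, readability-lxml, and beautifulsoup4 are installed along with a Chromium-based browser. It also does not wait for late-loading scripts, which a real tool may need to do.

```python
import asyncio
import re


async def fetch_html(url: str) -> str:
    """Load the page in a headless browser and return its HTML."""
    import zendriver as zd  # deferred: only needed when a browser is launched

    browser = await zd.start(headless=True)
    try:
        page = await browser.get(url)
        return await page.get_content()
    finally:
        await browser.stop()


def extract_main_html(html: str) -> str:
    """Return an HTML fragment holding the page's main content."""
    from readability import Document  # provided by readability-lxml

    return Document(html).summary()


def tidy(text: str) -> str:
    """Strip each line and collapse runs of blank lines into one."""
    lines = [line.strip() for line in text.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def html_to_text(fragment: str) -> str:
    """Drop all tags from an HTML fragment and tidy the remaining text."""
    from bs4 import BeautifulSoup

    text = BeautifulSoup(fragment, "html.parser").get_text(separator="\n")
    return tidy(text)


def page_text(url: str) -> str:
    """Run the whole pipeline (hypothetical helper): fetch, extract, clean."""
    html = asyncio.run(fetch_html(url))
    return html_to_text(extract_main_html(html))
```

Calling `page_text("https://example.com")` would launch a browser, so the sketch only defines functions and leaves the call to the reader. The third-party imports are deferred into the functions that use them so the text-tidying helper can be tried without any of the dependencies installed.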

Installation

You can install atai-web-tool via pip:

pip install atai-web-tool

If you prefer to install from source, clone the repository and run:

pip install .

Usage

Extract the main content from a webpage by running:

atai-web-tool https://example.com

This command opens the specified URL, extracts the primary content, and prints it to the terminal.
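Because the command prints its result to the terminal, the text can also be captured from a script. The sketch below assumes the output goes to standard output and that the command exits with a non-zero status on failure; neither is confirmed by this page. The helper names `build_command` and `extract` are hypothetical.

```python
import shutil
import subprocess


def build_command(url: str) -> list[str]:
    """Build the argument list for the CLI call shown above."""
    return ["atai-web-tool", url]


def extract(url: str) -> str:
    """Run atai-web-tool on a URL and return whatever it printed."""
    if shutil.which("atai-web-tool") is None:
        raise RuntimeError("atai-web-tool is not on PATH; install it with pip first")
    result = subprocess.run(
        build_command(url),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit (assumed behavior)
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `&` or `?`.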

Requirements

  • zendriver
  • readability-lxml
  • BeautifulSoup

Development

For local development, install the required dependencies using:

pip install -r requirements.txt

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements. For major changes, open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.
