Extract the main content from a webpage using zendriver, readability-lxml, and BeautifulSoup.

Project description

atai-web-tool

atai-web-tool is a command-line utility that extracts the main content from a webpage. It uses zendriver to fetch the page in a headless browser, readability-lxml to isolate the primary content, and BeautifulSoup to strip the remaining HTML tags, then displays a clean, text-only version.

Features

  • Headless Browsing: Fetch webpages using zendriver.
  • Content Extraction: Extract main content with readability-lxml.
  • Clean Output: Remove unwanted HTML tags using BeautifulSoup.
  • Easy CLI: Run from the terminal with a single command.
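
The clean-output step can be tried without installing anything. The sketch below is a standard-library stand-in for the tag stripping that the tool delegates to BeautifulSoup; it is not the tool's own code. The names `TextOnly` and `strip_tags` are hypothetical, and skipping the contents of `script` and `style` elements is an assumption about what "unwanted" means here.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring tags and script/style bodies (assumed policy)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def strip_tags(html: str) -> str:
    """Return the visible text of an HTML fragment, one text node per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    print(strip_tags("<h1>Title</h1><p>Hello <b>world</b></p>"))
```

A real page needs the content-extraction step first; this sketch only shows what turning markup into text-only output looks like.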

Installation

You can install atai-web-tool via pip:

pip install atai-web-tool

If you prefer to install from source, clone the repository and run:

pip install .

Usage

Extract the main content from a webpage by running:

atai-web-tool https://example.com

This command opens the specified URL, extracts the primary content, and prints it to the terminal.
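
Because the tool prints its result to the terminal, it can also be driven from a script. The following is a minimal sketch, assuming `atai-web-tool` is on the PATH, writes the extracted text to standard output, and exits with a non-zero status on failure; none of that is confirmed beyond the usage line above. The helper names `build_command` and `extract_text`, the URL check, and the 60-second timeout are hypothetical.

```python
import subprocess


def build_command(url: str) -> list[str]:
    """Build the argument list for the CLI call shown above."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return ["atai-web-tool", url]


def extract_text(url: str, timeout: float = 60.0) -> str:
    """Run atai-web-tool and return whatever it prints to stdout."""
    result = subprocess.run(
        build_command(url),
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # assumed limit; the tool documents none
    )
    return result.stdout


if __name__ == "__main__":
    print(extract_text("https://example.com"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.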

Requirements

Development

For local development, install the required dependencies using:

pip install -r requirements.txt

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements. For major changes, open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

atai_web_tool-0.0.8.tar.gz (5.3 kB view details)

Uploaded Source

Built Distribution

atai_web_tool-0.0.8-py3-none-any.whl (5.7 kB view details)

Uploaded Python 3

File details

Details for the file atai_web_tool-0.0.8.tar.gz.

File metadata

  • Download URL: atai_web_tool-0.0.8.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_web_tool-0.0.8.tar.gz
Algorithm    Hash digest
SHA256       85923040a1696e005dd84a7d5049744ccfc14f5d920b9edca51282ab97155c70
MD5          a93a2970edc2cfdaed394d0d788d6282
BLAKE2b-256  096c653493b0d55a9b69e7bbd4c94bb2ff197db5cb806e76c198196acdaf26cf
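
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and it assumes the archive has been saved in the current directory under its published file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for atai_web_tool-0.0.8.tar.gz
EXPECTED_SHA256 = "85923040a1696e005dd84a7d5049744ccfc14f5d920b9edca51282ab97155c70"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the local file's digest with the published one."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("atai_web_tool-0.0.8.tar.gz")  # assumed download location
    print("OK" if matches_published_hash(archive) else "MISMATCH")
```

The same check applies to the wheel by passing its own SHA256 digest as `expected`.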

File details

Details for the file atai_web_tool-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: atai_web_tool-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_web_tool-0.0.8-py3-none-any.whl
Algorithm    Hash digest
SHA256       ab6988bf3db23ea73b5a2205542e90f9ad0060fa6dbb5e4f129af3aa5f0b42a0
MD5          1b0697100242218840751a1344e24216
BLAKE2b-256  8d9d15182f1c0edff670b174180187a083dad52c4d74ea0f7d642d03ebfc3858
