Extract the main content from a webpage using zendriver, readability-lxml, and BeautifulSoup.

Project description

atai-web-tool

atai-web-tool is a command-line utility that extracts the main content from a webpage. It leverages zendriver, readability-lxml, and BeautifulSoup to fetch pages, extract primary content, and display a clean, text-only version.

Features

  • Headless Browsing: Fetch webpages using zendriver.
  • Content Extraction: Extract main content with readability-lxml.
  • Clean Output: Remove unwanted HTML tags using BeautifulSoup.
  • Easy CLI: Run from the terminal with a single command.
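The clean-up step named in the list above can be illustrated with a small, dependency-free sketch. This is not the tool's own code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, and the names TextOnly and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Return the text content of an HTML fragment, one chunk per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    sample = "<div><h1>Title</h1><script>var x = 1;</script><p>Body text.</p></div>"
    print(html_to_text(sample))
```

In the tool itself, per the description, the HTML passed to this step would be the primary content selected by readability-lxml rather than a hand-written sample.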

Installation

You can install atai-web-tool via pip:

pip install atai-web-tool

If you prefer to install from source, clone the repository and run:

pip install .

Usage

Extract the main content from a webpage by running:

atai-web-tool https://example.com

This command will open the specified URL, extract the primary content, and print it to the terminal.
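The same command can also be driven from a Python script. The sketch below assumes only what is shown above, namely that the CLI takes the URL as its single positional argument and prints the result to standard output. The helper names build_command and extract_text, the 60-second timeout, and the assumption that failures produce a non-zero exit status are illustrative and not part of the package.

```python
import subprocess
from urllib.parse import urlparse


def build_command(url):
    """Return the argument list for the CLI call (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return ["atai-web-tool", url]


def extract_text(url, timeout=60):
    """Run the CLI and return whatever it prints to standard output."""
    result = subprocess.run(
        build_command(url),
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed limit, adjust as needed
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling extract_text("https://example.com") requires atai-web-tool to be installed and on the PATH. Validating the URL first keeps obviously malformed input from launching a browser session.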

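Requirements

atai-web-tool depends on the libraries named above: zendriver, readability-lxml, and BeautifulSoup.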

Development

For local development, install the required dependencies using:

pip install -r requirements.txt

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements. For major changes, open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

atai_web_tool-0.0.9.tar.gz (5.4 kB)

Uploaded Source

Built Distribution

atai_web_tool-0.0.9-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file atai_web_tool-0.0.9.tar.gz.

File metadata

  • Download URL: atai_web_tool-0.0.9.tar.gz
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_web_tool-0.0.9.tar.gz
Algorithm Hash digest
SHA256 a2c8a22056df8da563143ea020b668e347c3a2e45fd0c35c5fa991ace0de9c87
MD5 ae8d8ea56991797addb4faefd219e13f
BLAKE2b-256 8e290eddf957f71371f121ff96fef02ac70274aedca49dfbb422421a48a72ef5
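A downloaded copy of the source distribution can be checked against the published SHA256 digest. The sketch below is one way to do this with Python's standard hashlib module. The function names sha256_of and matches_published_hash are hypothetical, and the expected value is the SHA256 digest listed above.

```python
import hashlib

# SHA256 digest listed above for atai_web_tool-0.0.9.tar.gz
EXPECTED_SHA256 = "a2c8a22056df8da563143ea020b668e347c3a2e45fd0c35c5fa991ace0de9c87"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, matches_published_hash("atai_web_tool-0.0.9.tar.gz") should return True for an unmodified download and False if the file has been altered or truncated.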

File details

Details for the file atai_web_tool-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: atai_web_tool-0.0.9-py3-none-any.whl
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_web_tool-0.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 6a94f5419de56858a892a777680da3e5a8b0ef64ac72f27b05f0e038d339873f
MD5 a3389197a9fb6a7f31eb88e13a9de559
BLAKE2b-256 eacd0a94d274dafec3cb59ca6bf00b0ca3f0ef77bec139c612e9a35047eeeee4
