A package for crawling and converting web content to Markdown

UpToDateAI

UpToDateAI is a Python package that fetches the latest documentation for recently released programming frameworks and makes it available to AI models. It helps bridge the gap between a model's training cut-off date and the current state of those frameworks.

Installation

You can install UpToDateAI using pip:

pip install uptodateai

Usage

Pass the URL of the website you want to crawl to process_docs:

from uptodateai import process_docs

process_docs("https://docs.fastht.ml/")

This will crawl the specified website and save the content as Markdown files in a docs directory.
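
To check what a crawl produced, the output directory can be walked with the standard library. The following is a minimal sketch, not part of UpToDateAI: the helper name list_markdown_files is hypothetical, and it assumes the saved files use the .md extension.

```python
from pathlib import Path


def list_markdown_files(output_dir="docs"):
    """Return Markdown files under output_dir as sorted relative POSIX paths.

    Hypothetical helper for inspecting crawl output; assumes files end in .md.
    """
    root = Path(output_dir)
    if not root.is_dir():
        # Nothing has been crawled yet (or a different directory was used).
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))
```

Calling list_markdown_files() after process_docs has finished should give a quick overview of which pages were saved.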

Features

  • Web crawling using Scrapy
  • Content extraction using newspaper3k
  • HTML to Markdown conversion
  • Automatic directory structure creation based on URL paths
  • Customizable crawling settings
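
To illustrate how a directory structure can be derived from URL paths, here is a minimal sketch using only the standard library. It is an assumption about the general technique, not the package's actual implementation: the function name url_to_markdown_path, the index.md convention for paths ending in a slash, and the .html/.htm stripping are all hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_to_markdown_path(url, output_dir="docs"):
    """Map a page URL to a Markdown file path under output_dir.

    Hypothetical mapping; UpToDateAI may use different rules.
    """
    path = urlparse(url).path
    # Drop empty segments and anything that could escape output_dir.
    parts = [p for p in path.split("/") if p and p not in (".", "..")]
    if not parts or path.endswith("/"):
        # Directory-style URLs get an index file (an assumed convention).
        parts.append("index")
    else:
        name = parts[-1]
        if name.endswith((".html", ".htm")):
            name = name.rsplit(".", 1)[0]
        parts[-1] = name
    parts[-1] += ".md"
    return PurePosixPath(output_dir, *parts).as_posix()
```

Under these assumptions, the root URL from the usage example maps to docs/index.md, and a page such as https://example.com/api/core.html maps to docs/api/core.md.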

Development

To set up the development environment:

  1. Clone the repository
  2. Install dependencies: pip install -r requirements.txt
  3. Run tests: python -m unittest discover tests
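
The discovery command in step 3 picks up files matching test*.py in the tests directory. As a sketch of what such a file might contain, the test case below only checks that process_docs (the one function shown under Usage) can be imported and called. The file name tests/test_public_api.py and the class name are hypothetical, and the test is skipped when the package is not installed.

```python
# tests/test_public_api.py (hypothetical file name)
import importlib.util
import unittest


@unittest.skipUnless(
    importlib.util.find_spec("uptodateai"), "uptodateai is not installed"
)
class TestPublicAPI(unittest.TestCase):
    def test_process_docs_is_callable(self):
        # Import inside the test so a missing package results in a skip,
        # not an import error at collection time.
        from uptodateai import process_docs

        self.assertTrue(callable(process_docs))
```

This smoke test makes no network requests, so it can run without crawling a live site.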

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Download files

Source Distribution

uptodateai-0.1.0.tar.gz (4.6 kB)

Built Distribution

uptodateai-0.1.0-py3-none-any.whl (4.9 kB, Python 3)