DynamicAI (dynai)
dynai is a Python library for scraping web pages, cleaning the data, and saving the content to files. It currently provides a simple interface for web scraping and file management: you can fetch the HTML content of a webpage, save it to a file, and clean up the output directory so that only the most recent files are kept. More features will be added in the future.
Installation
You can install dynai using pip:
pip install dynai
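After installing, you can optionally run a one-line sanity check to confirm the package imports:
python -c "from dynai import core"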
Usage
Importing the Library
from dynai import core
Initialising the Scraper
url = "https://www.example.com"
scraper = core(url)
Scraping the Webpage
You can scrape the webpage and get the HTML content as a string:
webpage_content = scraper.scrape()
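Because scrape() returns the page as a plain string, you can post-process it with any HTML parser. The snippet below is a minimal sketch using Beautiful Soup (one of dynai's dependencies) to read the page title; the parsing step is an illustration, not part of the dynai API:
from bs4 import BeautifulSoup
webpage_content = scraper.scrape()
soup = BeautifulSoup(webpage_content, "html.parser")  # parse the returned HTML
print(soup.title.string if soup.title else "no <title> found")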
Saving the Webpage to a File
You can save the scraped webpage to a file in the output directory with a specified file extension (e.g., .html):
extension = ".html"
scraper.scrape_to_output(extension)
Setting Custom Name and Output Directory
You can set a custom name for the scraped content and specify the output directory:
scraper.set_name("custom_name")
scraper.set_output_dir("custom_output_directory")
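Putting the setters together with scrape_to_output, a typical flow looks like the sketch below; the exact filename dynai writes depends on how it combines the name and extension, so the commented path is an assumption:
scraper.set_name("example_homepage")     # hypothetical name for the saved file
scraper.set_output_dir("scraped_pages")  # files will be written here
scraper.scrape_to_output(".html")        # assumed to produce something like scraped_pages/example_homepage....html
print(scraper.get_output_dir())          # confirm where the output went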
Cleaning up the Output Directory
You can clean up the output directory and keep a specified number of most recent files:
keep_files = 3
scraper.cleanup(keep_files)
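This is handy when scraping the same page repeatedly: call cleanup() after each save so old snapshots don't pile up. A minimal sketch (the loop count and sleep interval are illustrative only):
import time
scraper.set_output_dir("snapshots")
for _ in range(5):                  # take five snapshots
    scraper.scrape_to_output(".html")
    scraper.cleanup(3)              # keep only the 3 most recent files
    time.sleep(3600)                # wait an hour between snapshots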
Documentation
class core(url)
Methods
- set_name(name): Sets the name of the core instance.
- get_name() -> str: Returns the name of the core instance.
- set_output_dir(out): Sets the output directory of the core.
- get_output_dir() -> str: Returns the path to the output directory.
- scrape() -> str: Scrapes the webpage defined in the constructor and returns the whole page as a string.
- scrape_to_output(extension): Scrapes the webpage and saves it to a file in the output directory with the specified extension.
- cleanup(keepme=0): Cleans the output directory and keeps the specified number of most recent files.
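These methods compose into small helpers. The following sketch scrapes several URLs into one directory and trims it as it goes; the scrape_all helper is an illustration built on the documented API, not something dynai ships:
from dynai import core

def scrape_all(urls, out_dir="output", keep=5):
    """Scrape each URL into out_dir, keeping only the most recent files."""
    for url in urls:
        scraper = core(url)
        scraper.set_output_dir(out_dir)
        scraper.scrape_to_output(".html")
        scraper.cleanup(keep)  # trim the directory after each save

scrape_all(["https://www.example.com", "https://www.example.org"])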
Example
from dynai import core
url = "https://www.example.com"
scraper = core(url)                 # create a scraper for the target URL
webpage_content = scraper.scrape()  # fetch the page as a string
scraper.set_output_dir("output")    # write saved files under ./output
scraper.scrape_to_output(".html")   # save the page to an .html file in the output directory
scraper.cleanup(3)                  # keep only the 3 most recent files
This example scrapes the webpage, saves it as an HTML file in the "output" directory, and keeps only the 3 most recent files there.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Beautiful Soup: For HTML parsing.
- Requests: For making HTTP requests.
- Validators: For URL validation.
Project details
Download files
Source Distribution: dynai-0.1.1.tar.gz
Built Distribution: dynai-0.1.1-py3-none-any.whl
File details
Details for the file dynai-0.1.1.tar.gz.
File metadata
- Download URL: dynai-0.1.1.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | a5ecd30c0ecac34154def3c9edbaccb228b91d169362cd48709936bf5b3461f1
MD5 | 29f85b67be6b75b0b6550e1994151950
BLAKE2b-256 | 605b1c33a68b40ffa53ba9593dd9bd8bb9448d852472f86a80ed9a3586b72945
File details
Details for the file dynai-0.1.1-py3-none-any.whl.
File metadata
- Download URL: dynai-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 67c786e198c057d3e52cc29246ec69890944700a08a595cb272728c6a9f6d7c1
MD5 | 4edd26965ae6266fe4a542f24d493dc1
BLAKE2b-256 | a726a49bee56fdabef5abfe0c007b7e5dcd8d8c42236fc060a9ee89cc49c11f2