pyvigate
Pyvigate is a Python framework that combines headless browsing with LLMs. It is meant to help with data solutions, product tours, RAG applications, web automation, functional testing, and similar tasks.
Installation
Pyvigate can be installed using pip or directly from the source for the latest version.
Using pip
pip install pyvigate
Installing from source
git clone https://github.com/kindsmiles/pyvigate.git
cd pyvigate
pip install .
Components
Pyvigate consists of several key components designed to work together seamlessly for web automation tasks.
PlaywrightEngine
Playwright is one of the libraries Pyvigate uses for headless browsing and other browser automation tasks. PlaywrightEngine is the component that starts the browser.
from pyvigate.core.engine import PlaywrightEngine
engine = PlaywrightEngine(headless=True)
await engine.start_browser()
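The `await` call above has to run inside a coroutine, and a started browser should be stopped even when a later step fails. The following is a minimal sketch of one way to guarantee that, using only the standard library. `FakeEngine` and `browser_session` are hypothetical names introduced for illustration; `FakeEngine` is a stand-in that only mirrors the `start_browser` and `stop_browser` method names used in this README, and it is not part of Pyvigate.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeEngine:
    """Hypothetical stand-in that mirrors the start/stop method names only."""

    def __init__(self):
        self.running = False

    async def start_browser(self):
        self.running = True

    async def stop_browser(self):
        self.running = False


@asynccontextmanager
async def browser_session(engine):
    """Start the engine, hand it to the caller, and always stop it afterwards."""
    await engine.start_browser()
    try:
        yield engine
    finally:
        # Runs on normal exit and when the body raises.
        await engine.stop_browser()


async def main():
    engine = FakeEngine()
    async with browser_session(engine) as active:
        running_inside = active.running
    # Report the state inside the block and after leaving it.
    return running_inside, engine.running


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Assuming the real engine exposes the same two coroutines, it could be passed to `browser_session` in place of the stand-in.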
LlamaIndexWrapper
LlamaIndexWrapper uses an LLM to detect web page elements dynamically, which is intended to make automated interactions more efficient and reliable. It can also help with navigation and with building your own applications, such as data curation, RAG applications, product tours, and functional testing.
from pyvigate.ai.llama_index_wrapper import LlamaIndexWrapper
llama_index_wrapper = LlamaIndexWrapper(
    api_key="your_api_key",
    # Additional parameters
)
Login
Some products can be accessed in the browser only after logging in. You can handle this either by identifying the login selectors manually or by letting the AI detect the UI elements that take the credentials. The Login component uses LlamaIndexWrapper to identify login forms and fields, streamlining the login process.
from pyvigate.core.login import Login
login = Login(llama_index_wrapper)
await login.perform_login(engine.page, "https://example.com/login", "username", "password")
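To make "identifying the login selectors" concrete, here is a small rule-based sketch that scans a page's HTML for likely username, password, and submit controls and returns CSS selectors for them. This is a plain heuristic built on the standard library, not the LLM-based detection that Login performs, and `LoginFieldFinder` and `find_login_selectors` are hypothetical names that do not exist in Pyvigate.

```python
from html.parser import HTMLParser


class LoginFieldFinder(HTMLParser):
    """Collect CSS selectors for likely username, password, and submit controls."""

    def __init__(self):
        super().__init__()
        self.selectors = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input":
            input_type = (attributes.get("type") or "text").lower()
            if input_type == "password":
                role = "password"
            elif input_type in ("text", "email"):
                role = "username"
            elif input_type == "submit":
                role = "submit"
            else:
                return
        elif tag == "button":
            # A <button> without an explicit type submits its form.
            if (attributes.get("type") or "submit").lower() != "submit":
                return
            role = "submit"
        else:
            return
        self._remember(role, tag, attributes)

    def _remember(self, role, tag, attributes):
        # Keep the first element per role that has an id or a name.
        if role in self.selectors:
            return
        if attributes.get("id"):
            self.selectors[role] = f"#{attributes['id']}"
        elif attributes.get("name"):
            self.selectors[role] = f"{tag}[name='{attributes['name']}']"


def find_login_selectors(html):
    finder = LoginFieldFinder()
    finder.feed(html)
    return finder.selectors


SAMPLE_FORM = """
<form action="/login" method="post">
  <input type="email" id="user-email">
  <input type="password" name="pwd">
  <button type="submit" id="login-btn">Sign in</button>
</form>
"""

if __name__ == "__main__":
    print(find_login_selectors(SAMPLE_FORM))
```

A heuristic like this breaks on unusual markup (custom components, multi-step forms), which is the kind of case the AI-assisted detection described above is meant to cover.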
Scraping
The Scraping component extracts content from web pages, typically after login or navigation.
from pyvigate.services.scraping import Scraping
scraping = Scraping(data_dir="data")
content = await scraping.extract_data_from_page(engine.page)
print("Scraped content:", content)
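This README does not say what `extract_data_from_page` returns. As a rough illustration of what extracting content from a page can involve, the sketch below pulls the visible text out of an HTML string with the standard library, skipping script and style blocks. `VisibleTextExtractor` and `extract_visible_text` are hypothetical names; this is not Pyvigate's implementation.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Gather text nodes, ignoring content inside non-visible elements."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only when we are outside skipped elements.
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_visible_text(html):
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    return extractor.text()


SAMPLE_PAGE = (
    "<html><head><title>Dashboard</title><style>body { margin: 0; }</style></head>"
    "<body><h1>Welcome</h1><script>var x = 1;</script><p>Balance: 42</p></body></html>"
)

if __name__ == "__main__":
    print(extract_visible_text(SAMPLE_PAGE))
```

With Playwright, the HTML string could come from `await page.content()`.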
Caching
The Caching component allows for the local storage of web page content, facilitating offline analysis and reducing bandwidth usage.
from pyvigate.services.caching import Caching
caching = Caching(cache_dir="html_cache")
await caching.cache_page_content(engine.page, "https://example.com/page")
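How Caching names files inside `cache_dir` is not described here. One common layout, shown below as an assumption and not as Pyvigate's actual scheme, is to hash the URL into a stable file name so that the same URL always maps to the same file. The functions `cache_path_for`, `save_page`, and `load_page` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_path_for(url, cache_dir):
    """Map a URL to a stable file path inside cache_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.html"


def save_page(url, html, cache_dir):
    """Write the HTML for a URL into the cache and return the file path."""
    path = cache_path_for(url, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path


def load_page(url, cache_dir):
    """Return cached HTML for a URL, or None when it has not been cached."""
    path = cache_path_for(url, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        save_page("https://example.com/page", "<html>cached</html>", cache_dir)
        print(load_page("https://example.com/page", cache_dir))
```

Hashing avoids problems with characters that are legal in URLs but not in file names, at the cost of file names that are not human-readable.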
Full Example
Bringing it all together, here's how you can use Pyvigate to log in, scrape content, and cache it:
import asyncio
import os

from dotenv import load_dotenv

from pyvigate.ai.llama_index_wrapper import LlamaIndexWrapper
from pyvigate.core.engine import PlaywrightEngine
from pyvigate.core.login import Login
from pyvigate.services.caching import Caching
from pyvigate.services.scraping import Scraping

load_dotenv()


async def login_and_scrape():
    engine = PlaywrightEngine(headless=True)
    await engine.start_browser()

    llama_index_wrapper = LlamaIndexWrapper(api_key=os.getenv("OPENAI_API_KEY"))
    login = Login(llama_index_wrapper)
    await login.perform_login(
        engine.page,
        "https://example.com/login",
        os.getenv("USERNAME"),
        os.getenv("PASSWORD"),
    )

    scraping = Scraping(data_dir="data")
    content = await scraping.extract_data_from_page(engine.page)
    print("Scraped content:", content)

    caching = Caching(cache_dir="html_cache")
    await caching.cache_page_content(engine.page, "https://example.com/dashboard")

    await engine.stop_browser()


if __name__ == "__main__":
    asyncio.run(login_and_scrape())
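The example reads its credentials with `os.getenv`, which returns `None` when a variable is unset, so a missing entry only surfaces later as a failed login. The name `USERNAME` is also normally predefined on Windows, and `load_dotenv()` does not override existing variables by default, so a value from `.env` may be ignored there. The sketch below fails early with a clear message and uses prefixed variable names. `require_env`, `load_credentials`, `PYVIGATE_USERNAME`, and `PYVIGATE_PASSWORD` are hypothetical names chosen for illustration, not settings that Pyvigate defines.

```python
import os


def require_env(name, environ=None):
    """Return a non-empty environment variable or raise a clear error."""
    source = os.environ if environ is None else environ
    value = source.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_credentials(environ=None):
    """Collect the settings the full example needs in one place."""
    # PYVIGATE_USERNAME and PYVIGATE_PASSWORD are hypothetical names that
    # avoid clashing with variables the operating system may already set.
    return {
        "api_key": require_env("OPENAI_API_KEY", environ),
        "username": require_env("PYVIGATE_USERNAME", environ),
        "password": require_env("PYVIGATE_PASSWORD", environ),
    }
```

Calling `load_credentials()` right after `load_dotenv()` would stop the script before a browser is launched if anything is missing.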
File details
Details for the file pyvigate-0.0.1-py3-none-any.whl.
File metadata
- Download URL: pyvigate-0.0.1-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | f9b8aab97138dfdea19eb7363108d4d071414107fa77becf58820f48654a4e54
MD5 | 8be30f793e96cf26b75de36a5818f772
BLAKE2b-256 | 2295af853a7be9abaa46170e028b7adac366e93c453def52ecc866e710088546