A web scraping and data extraction tool.
Auxn Agent
Overview
Auxn Agent is a web scraping and data extraction tool that automates collecting information from websites. It is built on Python's async features, uses Playwright for browser automation, and stores results in SQLite.
Status: Alpha (v0.1.0)
Current test coverage: 84%
Key Features
- ✅ Asynchronous web scraping with Playwright
- ✅ Automatic pagination handling
- ✅ SQLite database with SQLAlchemy ORM
- ✅ Comprehensive test suite
- ✅ Configurable logging system
- ✅ Type-safe data models with Pydantic
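To make the storage and data-model features above more concrete, here is a minimal sketch of validating a scraped record and writing it to SQLite. It is not the project's actual code: it uses the standard library's `dataclasses` and `sqlite3` as stand-ins for Pydantic and SQLAlchemy so that it runs without extra dependencies, and the `Listing` fields, the `listings` table, and `save_listings` are all hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical record for one scraped listing (not the project's real model)."""

    url: str
    title: str
    price: float

    def __post_init__(self) -> None:
        # Minimal checks, standing in for the validation Pydantic would perform.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute URL: {self.url!r}")
        if self.price < 0:
            raise ValueError("price must be non-negative")


def save_listings(conn: sqlite3.Connection, listings: list[Listing]) -> int:
    """Insert listings, ignoring URLs already stored; return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "url TEXT PRIMARY KEY, title TEXT NOT NULL, price REAL NOT NULL)"
    )
    with conn:  # commits on success, rolls back if an insert raises
        conn.executemany(
            "INSERT OR IGNORE INTO listings (url, title, price) VALUES (?, ?, ?)",
            [astuple(item) for item in listings],
        )
    return conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
```

Using the URL as the primary key is one simple way to make repeated scrapes idempotent; the real schema may well use a different key.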
Requirements
- Python 3.10 or higher
- Poetry for dependency management
- System dependencies for Playwright
# Ubuntu/Debian
sudo apt-get install -y \
    libevent-2.1-7 \
    libavif16
Installation
- Clone the repository:
git clone https://github.com/your-username/auxn-agent.git
cd auxn-agent
- Install Poetry (if not already installed):
curl -sSL https://install.python-poetry.org | python3 -
- Install dependencies:
poetry install
- Install Playwright browsers:
poetry run playwright install chromium
- Install browser dependencies:
poetry run playwright install-deps
Running Tests
# Run all tests
poetry run pytest
# Run with coverage report
poetry run pytest --cov=src
# Run specific test file
poetry run pytest tests/test_scraper.py
Usage Example
import asyncio

from src.scraper.scraper_manager import ScraperManager


async def main():
    manager = ScraperManager()
    # Collect listings from the start URL, paging via the next-button selector
    listings = await manager.scrape_listings(
        url="https://example.com",
        listing_selector=".listing",
        next_button_selector=".next-page",
    )
    print(f"Found {len(listings)} listings")


if __name__ == "__main__":
    asyncio.run(main())
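The usage example passes a listing selector and a next-button selector, which suggests a collect-then-click pagination loop. The sketch below shows one way such a loop could be written with Playwright's async API. It is an assumption about the approach, not the internals of `ScraperManager`: `collect_listings` and `scrape_with_playwright` are hypothetical names, and the `max_pages` cap and the `"networkidle"` wait are illustrative choices.

```python
import asyncio
from typing import Any


async def collect_listings(
    page: Any,
    listing_selector: str,
    next_button_selector: str,
    max_pages: int = 50,
) -> list[str]:
    """Gather the text of every listing, following the next button until it is gone."""
    results: list[str] = []
    for _ in range(max_pages):  # hard cap guards against endless pagination
        for element in await page.query_selector_all(listing_selector):
            results.append((await element.inner_text()).strip())
        next_button = await page.query_selector(next_button_selector)
        if next_button is None:
            break  # no next button means this was the last page
        await next_button.click()
        await page.wait_for_load_state("networkidle")
    return results


async def scrape_with_playwright(
    url: str, listing_selector: str, next_button_selector: str
) -> list[str]:
    """Open the URL in headless Chromium and run the pagination loop on it."""
    # Imported here so the loop above can be exercised without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await collect_listings(
                page, listing_selector, next_button_selector
            )
        finally:
            await browser.close()
```

With Playwright and Chromium installed, `asyncio.run(scrape_with_playwright("https://example.com", ".listing", ".next-page"))` would return the listing texts. Keeping the loop separate from browser setup means it can be tested against a fake page object.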
Project Structure
auxn-agent/
├── src/
│ ├── database/ # Database models and CRUD operations
│ ├── models/ # Pydantic data models
│ ├── scraper/ # Web scraping logic
│ └── utils/ # Utilities and helpers
├── tests/ # Test suite
└── poetry.lock # Dependency lock file
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run tests (poetry run pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
File details
Details for the file auxn_agent-0.2.0.tar.gz.
File metadata
- Download URL: auxn_agent-0.2.0.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa449e2e1c2956cdcc1186e5df522f5d65b470efd023b06cfc8805cb0f66bc65 |
| MD5 | 64db56256373f3707581d891892389c3 |
| BLAKE2b-256 | 69bab83ccba64c7e4fda9417d4f2f3db733c7a4dc8a5589315813231c82e50bd |
Provenance
The following attestation bundles were made for auxn_agent-0.2.0.tar.gz:
Publisher: release.yml on remixonwin/auxn-agent
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: auxn_agent-0.2.0.tar.gz
- Subject digest: fa449e2e1c2956cdcc1186e5df522f5d65b470efd023b06cfc8805cb0f66bc65
- Sigstore transparency entry: 169139774
- Sigstore integration time:
- Permalink: remixonwin/auxn-agent@dea1a49c072027ea2aec3983b9d8d3fb5e000973
- Branch / Tag: refs/heads/main
- Owner: https://github.com/remixonwin
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@dea1a49c072027ea2aec3983b9d8d3fb5e000973
- Trigger Event: workflow_dispatch
File details
Details for the file auxn_agent-0.2.0-py3-none-any.whl.
File metadata
- Download URL: auxn_agent-0.2.0-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e3921ab141d4de08f677294e2b02e8ecb38f7058f30effc7aa2b9fbd00c6fae |
| MD5 | 8c85e8db39e76e7402b549f6fd9dbcd4 |
| BLAKE2b-256 | 3dfd3a30db7e1a6aea9bed8f91f0c8c803c54e3c52017175bcf7d7d6b42c6088 |
Provenance
The following attestation bundles were made for auxn_agent-0.2.0-py3-none-any.whl:
Publisher: release.yml on remixonwin/auxn-agent
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: auxn_agent-0.2.0-py3-none-any.whl
- Subject digest: 0e3921ab141d4de08f677294e2b02e8ecb38f7058f30effc7aa2b9fbd00c6fae
- Sigstore transparency entry: 169139775
- Sigstore integration time:
- Permalink: remixonwin/auxn-agent@dea1a49c072027ea2aec3983b9d8d3fb5e000973
- Branch / Tag: refs/heads/main
- Owner: https://github.com/remixonwin
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@dea1a49c072027ea2aec3983b9d8d3fb5e000973
- Trigger Event: workflow_dispatch