A web scraping and data extraction tool.

Auxn Agent

Overview

Auxn Agent is a web scraping and data extraction tool that automates collecting information from websites. It is built on Python's asyncio, uses Playwright for browser automation, and stores results in SQLite through the SQLAlchemy ORM.
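The project stores data through SQLAlchemy. The underlying idea, a table keyed so that re-scraped listings are not stored twice, can be sketched with the standard library's sqlite3 module. The `listings` table, its columns, and `save_listings` are hypothetical and not the project's schema; deduplicating on URL is an assumed design choice.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema for illustration; the project itself defines SQLAlchemy models.
SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL
)
"""


def save_listings(conn: sqlite3.Connection, rows: Iterable[tuple[str, str]]) -> int:
    """Insert (url, title) pairs, skipping URLs already stored; return rows added."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO listings (url, title) VALUES (?, ?)", rows
        )
    # Ignored duplicates do not count as changes, so this is the number inserted.
    return conn.total_changes - before
```

With `conn = sqlite3.connect("listings.db")`, calling `save_listings` again with an already-stored URL adds nothing for that row.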

Status: Alpha (v0.2.0)

Current test coverage: 84%

Key Features

  • ✅ Asynchronous web scraping with Playwright
  • ✅ Automatic pagination handling
  • ✅ SQLite database with SQLAlchemy ORM
  • ✅ Comprehensive test suite
  • ✅ Configurable logging system
  • ✅ Type-safe data models with Pydantic
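This README does not document how the logging system is configured. As an illustration only, a configurable setup can be as small as reading the level from an environment variable. `AUXN_LOG_LEVEL`, `configure_logging`, and the `auxn_agent` logger name are all hypothetical.

```python
import logging
import os


def configure_logging(default_level: str = "INFO") -> logging.Logger:
    """Return a logger whose level comes from an environment variable."""
    # AUXN_LOG_LEVEL and the "auxn_agent" logger name are hypothetical.
    name = os.environ.get("AUXN_LOG_LEVEL", default_level).upper()
    level = logging.getLevelName(name)  # "DEBUG" -> 10, unknown -> "Level X"
    if not isinstance(level, int):
        level = logging.INFO  # fall back when the name is not a known level
    logger = logging.getLogger("auxn_agent")
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Usage: `log = configure_logging()` followed by `log.info("starting scrape")`, with `AUXN_LOG_LEVEL=DEBUG` set in the environment for verbose output.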

Requirements

  • Python 3.10 or higher
  • Poetry for dependency management
  • System dependencies for Playwright
    # Ubuntu/Debian
    sudo apt-get install -y \
        libevent-2.1-7 \
        libavif16
    

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/auxn-agent.git
    cd auxn-agent
    
  2. Install Poetry (if not already installed):

    curl -sSL https://install.python-poetry.org | python3 -
    
  3. Install dependencies:

    poetry install
    
  4. Install Playwright browsers:

    poetry run playwright install chromium
    
  5. Install browser dependencies:

    poetry run playwright install-deps
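After step 5, an optional sanity check can confirm that the interpreter meets the Python 3.10 minimum and that the playwright package is importable. `check_environment` is a hypothetical helper, not part of the project (saved, for example, as `check_env.py` and run with `poetry run python check_env.py`); it does not verify that the Chromium download succeeded.

```python
import importlib.util
import sys


def check_environment(min_version: tuple[int, int] = (3, 10)) -> dict[str, bool]:
    """Report whether the interpreter and the playwright package look usable."""
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        # find_spec returns None when the package cannot be imported
        "playwright_installed": importlib.util.find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```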
    

Running Tests

# Run all tests
poetry run pytest

# Run with coverage report
poetry run pytest --cov=src

# Run specific test file
poetry run pytest tests/test_scraper.py
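pytest does not run `async def` test functions natively; without a plugin such as pytest-asyncio they are skipped or fail, depending on the pytest version. Whether this project's suite relies on such a plugin is not stated here. A plugin-free pattern drives the coroutine with `asyncio.run` inside an ordinary test; `fake_scrape` is a hypothetical stand-in for a scraper coroutine.

```python
import asyncio


async def fake_scrape(pages: list[list[str]]) -> list[str]:
    """Hypothetical stand-in for a scraper coroutine: flattens page results."""
    listings: list[str] = []
    for page in pages:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        listings.extend(page)
    return listings


def test_fake_scrape_collects_all_pages() -> None:
    # asyncio.run creates and closes a fresh event loop for this test
    result = asyncio.run(fake_scrape([["a", "b"], ["c"]]))
    assert result == ["a", "b", "c"]
```

Saved under `tests/` (for example as a hypothetical `tests/test_example.py`), it is collected by `poetry run pytest` like any other test.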

Usage Example

import asyncio

from src.scraper.scraper_manager import ScraperManager


async def main():
    manager = ScraperManager()
    # listing_selector matches each listing on a page;
    # next_button_selector is the pagination control to follow.
    listings = await manager.scrape_listings(
        url="https://example.com",
        listing_selector=".listing",
        next_button_selector=".next-page",
    )
    print(f"Found {len(listings)} listings")


if __name__ == "__main__":
    asyncio.run(main())
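The `next_button_selector` argument suggests how automatic pagination works: collect the listings on the current page, follow the next-page control, and stop when there is none. The library-free sketch below models that loop under stated assumptions: `Page`, `scrape_all`, and `fake_fetch` are hypothetical, page fetching is simulated in memory rather than driven by Playwright (which would more likely click the button than follow URLs), and the `max_pages` cap and seen-URL check are safeguards added for illustration, not confirmed behavior of `ScraperManager`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Page:
    """Hypothetical parsed page: its listings and the next page's URL, if any."""
    listings: list[str]
    next_url: Optional[str]


async def scrape_all(
    start_url: str,
    fetch_page: Callable[[str], Awaitable[Page]],
    max_pages: int = 50,
) -> list[str]:
    """Follow next links from start_url, collecting listings from each page."""
    results: list[str] = []
    seen: set[str] = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guards against next links that loop back
        page = await fetch_page(url)
        results.extend(page.listings)
        url = page.next_url  # None means there is no next page
    return results


# In-memory fake site standing in for real browser navigation.
FAKE_SITE = {
    "/p1": Page(["a", "b"], "/p2"),
    "/p2": Page(["c"], None),
}


async def fake_fetch(url: str) -> Page:
    await asyncio.sleep(0)
    return FAKE_SITE[url]


if __name__ == "__main__":
    print(asyncio.run(scrape_all("/p1", fake_fetch)))  # ['a', 'b', 'c']
```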

Project Structure

auxn-agent/
├── src/
│   ├── database/        # Database models and CRUD operations
│   ├── models/          # Pydantic data models
│   ├── scraper/         # Web scraping logic
│   └── utils/           # Utilities and helpers
├── tests/               # Test suite
└── poetry.lock          # Dependency lock file

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Run tests (poetry run pytest)
  4. Commit your changes (git commit -m 'Add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

License

MIT

Download files

Source Distribution

auxn_agent-0.2.0.tar.gz (8.7 kB)

Uploaded Source

Built Distribution

auxn_agent-0.2.0-py3-none-any.whl (11.5 kB)

Uploaded Python 3

File details

Details for the file auxn_agent-0.2.0.tar.gz.

File metadata

  • Download URL: auxn_agent-0.2.0.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for auxn_agent-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        fa449e2e1c2956cdcc1186e5df522f5d65b470efd023b06cfc8805cb0f66bc65
MD5           64db56256373f3707581d891892389c3
BLAKE2b-256   69bab83ccba64c7e4fda9417d4f2f3db733c7a4dc8a5589315813231c82e50bd

Provenance

The following attestation bundles were made for auxn_agent-0.2.0.tar.gz:

Publisher: release.yml on remixonwin/auxn-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file auxn_agent-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: auxn_agent-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for auxn_agent-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        0e3921ab141d4de08f677294e2b02e8ecb38f7058f30effc7aa2b9fbd00c6fae
MD5           8c85e8db39e76e7402b549f6fd9dbcd4
BLAKE2b-256   3dfd3a30db7e1a6aea9bed8f91f0c8c803c54e3c52017175bcf7d7d6b42c6088

Provenance

The following attestation bundles were made for auxn_agent-0.2.0-py3-none-any.whl:

Publisher: release.yml on remixonwin/auxn-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
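The SHA256 digests listed above can be checked against a downloaded file using only the standard library. `sha256_of` and `verify` are illustrative helper names; the expected value in the `__main__` block is the wheel's published SHA256.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    # Normalize the published digest, then compare without early exit.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    wheel = Path("auxn_agent-0.2.0-py3-none-any.whl")
    expected = "0e3921ab141d4de08f677294e2b02e8ecb38f7058f30effc7aa2b9fbd00c6fae"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
    else:
        print(f"{wheel} not found in the current directory")
```

pip can also enforce hashes itself: a requirements line such as `auxn-agent==0.2.0 --hash=sha256:<digest>`, with one `--hash` option per published file, combined with `pip install --require-hashes -r requirements.txt` rejects any download whose digest does not match.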