
Acon is the intelligence layer for any web scraper. Pair it with Scrapling, Playwright, or httpx to crawl smarter.



Acon — The Intelligent Brain for Any Scraper

Acon doesn't replace Scrapling or Firecrawl. It tells them where to look.


Why Acon?

Most crawlers are dumb. They follow links blindly, return raw HTML, and break the moment a site changes its structure. Before you can extract anything useful, you need to understand what you're dealing with.

Acon is a site intelligence engine. It maps the structural "skeleton" of a website automatically — before any data extraction happens — so your scraper always knows where to look.


🏗️ The Core Thesis

Most modern web scrapers suffer from "URL Exhaustion": they spend 90% of their bandwidth fetching structurally identical product or blog pages. Acon introduces a Topology Orchestrator that maps, classifies, and samples site structures to find the "Skeleton" of a site before you spend a cent on proxies.
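The thesis rests on one idea: many URLs share a handful of templates, so a few samples per template reveal a site's structure. The sketch below illustrates that idea with a deliberately simple path-normalization heuristic. `url_template` and `sample_by_template` are hypothetical names, and the normalization rules are assumptions; the README does not document Acon's actual classifier.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical normalization rules (not Acon's): numeric IDs and long
# hyphenated slugs collapse into placeholders, so pages rendered from the
# same template share one structural key.
_NUMERIC = re.compile(r"^\d+$")
_SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+){2,}$")


def url_template(url: str) -> str:
    """Map a URL to a coarse structural key such as '/products/{id}'."""
    parts = []
    for segment in urlsplit(url).path.strip("/").split("/"):
        if not segment:
            continue
        if _NUMERIC.match(segment):
            parts.append("{id}")
        elif _SLUG.match(segment):
            parts.append("{slug}")
        else:
            parts.append(segment)
    return "/" + "/".join(parts)


def sample_by_template(urls, per_template=2):
    """Keep at most `per_template` URLs for each structural key."""
    groups = defaultdict(list)
    for url in urls:
        bucket = groups[url_template(url)]
        if len(bucket) < per_template:
            bucket.append(url)
    return dict(groups)


urls = [
    "https://example.com/products/101",
    "https://example.com/products/102",
    "https://example.com/products/103",
    "https://example.com/blog/how-to-choose-a-laptop",
    "https://example.com/blog/best-budget-phones-2024",
    "https://example.com/about",
]
for template, sample in sample_by_template(urls).items():
    print(template, sample)
```

Six URLs collapse into three templates, and only five pages need fetching to see all of them; on a real catalogue with thousands of product pages, the same cap is what keeps the crawl small.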

💰 Acon vs. Scrapling (The 1:1 Battle)

| Metric | Scrapling Alone (Blind) | Acon + Scrapling (Brain) |
|---|---|---|
| Pages crawled | 1,000 | 40 |
| Time taken | 870 s (14.5 min) | 111 s (1.8 min) |
| Bandwidth used | 20.72 MB | 1.39 MB |
| Est. proxy cost | $1.000 | $0.040 |
| Structural DNA | 4/4 found | 4/4 found |

96% fewer pages crawled (25x fewer requests), roughly 7.8x less time, and roughly 15x less bandwidth, with the same 4/4 structural templates found. Measured on books.toscrape.com.
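The headline ratios can be recomputed directly from the table. By this arithmetic, 25x is the reduction in pages (and in the cost estimate, which scales with page count at an implied $0.001 per page), while wall-clock time improves by about 7.8x and bandwidth by about 15x.

```python
# Figures copied from the books.toscrape.com comparison table above.
blind = {"pages": 1000, "seconds": 870, "mb": 20.72, "cost": 1.000}
brain = {"pages": 40, "seconds": 111, "mb": 1.39, "cost": 0.040}

page_reduction = 1 - brain["pages"] / blind["pages"]  # fraction of pages avoided
page_ratio = blind["pages"] / brain["pages"]          # times fewer pages
time_ratio = blind["seconds"] / brain["seconds"]      # wall-clock speedup
bandwidth_ratio = blind["mb"] / brain["mb"]           # bandwidth saving factor
cost_per_page = blind["cost"] / blind["pages"]        # implied proxy price per page

print(f"{page_reduction:.0%} fewer pages ({page_ratio:.0f}x)")
print(f"{time_ratio:.1f}x less wall-clock time")
print(f"{bandwidth_ratio:.1f}x less bandwidth")
print(f"${cost_per_page:.3f} per page in both cost estimates")
```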


📊 Elite Benchmarks: Real-World Performance

We tested Acon against a standard breadth-first (BFS) crawler on four live targets, ranging from a static catalogue to news and SPA sites, giving both crawlers the same 50-page budget to measure discovery quality vs. brute force.

| Target | Request Reduction | Templates Found (Acon vs. BFS) | Outcome |
|---|---|---|---|
| Next.js Showcase | 68% reduction | 5/5 identified | PASS |
| The Hindu (News) | 40% reduction | 8 vs. 4 | 🏆 ELITE |
| books.toscrape | 0% (static parity) | 5 vs. 4 | PASS |
| Flipkart Mobiles | Budget equalized | 8 vs. 8 | ⚖️ STABLE |

🧠 The "Brain" Advantage

  • News Sites: Acon finds 2x more structural variations (DNA) than a blind crawler by understanding category vs. article patterns.
  • SPAs: Acon reaches structural saturation on React/Next.js sites about 3x faster than standard tools (68% fewer requests on the Next.js Showcase) by navigating the rendered DOM.
  • Honest Limitations: On simple static sites, Acon's "Brain" matches BFS but adds rendering overhead. Acon is an Intelligence Engine for complex sites, not a replacement for basic fetchers on simple blogs.

🚀 Use Cases

Price Monitoring & E-Commerce Intelligence
Acon detects pagination patterns and repeating product templates automatically. No manual selector configuration per site.
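A minimal sketch of recognizing pagination from URLs alone, assuming the common `?page=N` and `/page/N` conventions (books.toscrape.com uses `page-N.html`). `page_number`, `PAGE_PARAMS`, and `PATH_PATTERN` are hypothetical; the README does not describe how Acon detects pagination.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical pagination signals: a numeric query parameter with a common
# pagination name, or a trailing /page/N or page-N.html path segment.
PAGE_PARAMS = {"page", "p", "pg"}
PATH_PATTERN = re.compile(r"/page[/-](\d+)(?:\.html?)?/?$", re.IGNORECASE)


def page_number(url: str):
    """Return the page index encoded in a URL, or None if none is detected."""
    parts = urlsplit(url)
    for name, values in parse_qs(parts.query).items():
        if name.lower() in PAGE_PARAMS and values and values[0].isdigit():
            return int(values[0])
    match = PATH_PATTERN.search(parts.path)
    return int(match.group(1)) if match else None


for url in [
    "https://books.toscrape.com/catalogue/page-2.html",
    "https://example.com/blog/page/3/",
    "https://example.com/search?q=phone&page=4",
    "https://example.com/products/101",
]:
    print(url, page_number(url))
```

URLs that share a template and differ only in this number form a pagination series, which a crawler can sample rather than walk to the end.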

Content Archival & Research
Feed Acon a publication's root URL. It identifies the site's content structure, prioritizes article pages over navigation noise, and hands you a clean discovery map.

Site Auditing & SEO Analysis
Get an instant structural report — template count, link depth, topology classification (SPA vs static vs paginated) — in a single run.
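Link depth falls out of the crawl graph once each page records the URL it was discovered from, as the `parent_url` field in the output shape does. The sketch below assumes only that field plus `page_type`; `link_depths` and `audit_report` are hypothetical helpers, and the topology label is passed in rather than detected.

```python
from collections import Counter


def link_depths(pages):
    """Map each URL to its click depth from the crawl root via parent_url links.

    Pages whose parent is missing from the crawl are treated as roots (depth 0).
    """
    parent = {p["url"]: p.get("parent_url") for p in pages}
    depths = {}
    for url in parent:
        chain, current = [], url
        # Climb until we hit a known depth, a root, an unknown parent, or a cycle.
        while current in parent and current not in depths and current not in chain:
            chain.append(current)
            current = parent[current]
        base = depths.get(current, -1)
        for offset, node in enumerate(reversed(chain), start=1):
            depths[node] = base + offset
    return depths


def audit_report(pages, topology):
    """One-run structural summary: topology label, size, depth, page-type mix."""
    depths = link_depths(pages)
    return {
        "topology": topology,
        "pages": len(pages),
        "max_link_depth": max(depths.values(), default=0),
        "page_types": dict(Counter(p.get("page_type", "unknown") for p in pages)),
    }


pages = [
    {"url": "https://example.com/p/123", "parent_url": "https://example.com/list", "page_type": "standard"},
    {"url": "https://example.com/", "parent_url": None, "page_type": "standard"},
    {"url": "https://example.com/list", "parent_url": "https://example.com/", "page_type": "standard"},
]
print(audit_report(pages, "paginated"))
```

The walk is iterative and caches depths, so each page is visited a bounded number of times even when pages arrive out of crawl order.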


⚡ What Makes Acon Different

| Capability | Typical Crawler | Acon |
|---|---|---|
| JS-rendered sites | Manual Playwright setup | Autonomous escalation |
| Site structure | Unknown until scraped | Detected before extraction |
| Large-site performance | Degrades at scale | O(log N) priority queue |
| Bandwidth efficiency | Downloads everything | Asset blocking (discovery mode) |
| Discovery latency | Static only | Static-first hybrid escalation |
| Failed crawls | Lost progress | SQLite resumption (WAL) |
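Two rows of the table, the O(log N) priority queue and SQLite (WAL) resumption, combine naturally. The sketch below uses Python's `heapq` for the queue and the standard `sqlite3` module for persistence. The `ResumableFrontier` class, its schema, and the lower-score-first convention are assumptions for illustration, not Acon's internals.

```python
import heapq
import sqlite3


class ResumableFrontier:
    """Hypothetical priority frontier persisted to SQLite (lower score = sooner).

    heapq gives O(log N) push/pop; SQLite in WAL mode records every URL so a
    crashed crawl can rebuild its queue from the unfinished rows on restart.
    """

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("PRAGMA journal_mode=WAL")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS frontier ("
            "url TEXT PRIMARY KEY, score REAL, done INTEGER DEFAULT 0)"
        )
        # Rebuild the in-memory heap from every URL not yet marked done.
        self.heap = list(self.db.execute("SELECT score, url FROM frontier WHERE done = 0"))
        heapq.heapify(self.heap)

    def push(self, url, score):
        with self.db:  # commits the insert on success
            cur = self.db.execute(
                "INSERT OR IGNORE INTO frontier (url, score) VALUES (?, ?)", (url, score)
            )
        if cur.rowcount:  # queue only URLs never seen before
            heapq.heappush(self.heap, (score, url))

    def pop(self):
        """Next URL to fetch; it stays pending in SQLite until mark_done()."""
        if not self.heap:
            return None
        _, url = heapq.heappop(self.heap)
        return url

    def mark_done(self, url):
        with self.db:
            self.db.execute("UPDATE frontier SET done = 1 WHERE url = ?", (url,))
```

Marking a URL done only after its fetch succeeds means a crash mid-fetch re-queues that URL on restart instead of silently dropping it.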

🏗️ The Efficiency Pillars

Acon is optimized for production environments where every request costs money:

  • Static-First Discovery: Acon probes pages with raw HTTP first. It only launches a browser if the site is a SPA, saving 90% of compute on standard sites.
  • 🚫 Intelligent Asset Blocking: During discovery, Acon automatically aborts requests for images, fonts, and CSS to slash bandwidth and CPU usage.
  • 📉 Debounced Topology Detection: Structural analysis (DNA mapping) runs only at key milestones (1, 10, 25, and 50 pages crawled) so it does not throttle crawl throughput.
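The three pillars reduce to three small decisions: does this page need a browser, should this request be fetched at all, and is it time to re-run structural analysis. The sketch below shows one way to make each decision. The SPA heuristic (an empty framework mount point or very little visible text), the 200-character threshold, and all function names are hypothetical; the resource-type names are the values Playwright reports in `request.resource_type`.

```python
import re

# Hypothetical SPA signal: an empty framework mount point in the raw HTML.
SPA_MARKERS = re.compile(
    r"""<div\s+id=["'](?:root|app|__next)["']\s*>\s*</div>""", re.IGNORECASE
)
# Strips scripts, styles, and tags so only visible text remains.
TAGS = re.compile(r"<script\b.*?</script>|<style\b.*?</style>|<[^>]+>", re.S | re.I)
BLOCKED_TYPES = {"image", "font", "stylesheet"}  # images, fonts, CSS
MILESTONES = (1, 10, 25, 50)


def needs_browser(html: str, min_text_chars: int = 200) -> bool:
    """Static-first check: escalate only if the raw HTML looks like an SPA shell."""
    visible_text = " ".join(TAGS.sub(" ", html).split())
    return bool(SPA_MARKERS.search(html)) or len(visible_text) < min_text_chars


def should_block(resource_type: str) -> bool:
    """Discovery-mode asset filter keyed on Playwright resource types."""
    return resource_type in BLOCKED_TYPES


def is_milestone(pages_crawled: int) -> bool:
    """Debounce: run topology analysis only at the milestone page counts."""
    return pages_crawled in MILESTONES


shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
article = "<html><body><article><p>" + "Plenty of server-rendered text. " * 20 + "</p></article></body></html>"
print(needs_browser(shell), needs_browser(article))
print([n for n in range(1, 60) if is_milestone(n)])
```

In Playwright for Python, `should_block` would sit inside a handler registered with `page.route("**/*", handler)`, which calls `route.abort()` when the predicate returns True and `route.continue_()` otherwise, reading the type from `route.request.resource_type`.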

🏗️ The Unified Intelligence Stack (The Acon Alliance)

Acon doesn't just map sites; it orchestrates the most powerful open-source scraping tools into a single, high-fidelity pipeline.

  • 🕵️ Stealth (Camoufox): Enable use_stealth=True to launch Camoufox, a fingerprint-resistant browser engine, to get past bot-detection layers such as Cloudflare and Akamai.
  • 📄 Content (Trafilatura): Enable extract_content=True to get clean, LLM-ready Markdown for every discovered page, extracted by Trafilatura.
  • 🚀 Speed (Scrapling): Use the scrapling_adapter to export Acon's "DNA Map" into Scrapling for turbo-charged mass extraction at 10x standard speeds.

🛠️ Installation

```shell
pip install acon-intel

# To enable the Alliance pillars (highly recommended)
pip install trafilatura camoufox scrapling
playwright install chromium
```

⚡ Quick Start (The Alliance Stack)

```python
import asyncio

from acon import CrawlConfig, SiteCrawlOrchestrator


async def main():
    # Acon discovers the 'skeleton', Trafilatura extracts the 'flesh',
    # and Camoufox provides the 'stealth'.
    config = CrawlConfig(
        max_pages=10,
        extract_content=True,  # Pillar 1: Trafilatura
        use_stealth=True,      # Pillar 2: Camoufox
    )

    brain = SiteCrawlOrchestrator()
    result = await brain.crawl_site("https://news.ycombinator.com", config)

    for page in result["page_summaries"]:
        print(f"URL: {page['url']}")
        # Content may be empty or absent for some pages, so guard before slicing.
        content = page.get("content")
        if content:
            print(f"Markdown: {content[:100]}...")


if __name__ == "__main__":
    asyncio.run(main())
```

📦 The Output Shape

Acon returns a structured SiteCrawlResult containing everything needed for downstream extraction:

```json
{
  "topology": "paginated",
  "pages_crawled": 42,
  "page_summaries": [
    {
      "url": "https://example.com/p/123",
      "page_type": "standard",
      "js_required": false,
      "content": "# Extracted Markdown Content...",
      "parent_url": "https://example.com/list"
    }
  ],
  "crawl_meta": {
    "reflection": {
      "intelligence_score": 0.85,
      "advice": "Continue current strategy."
    }
  }
}
```
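Because the result is plain nested data, routing pages to the right downstream tool needs no Acon-specific API. The sketch below assumes only the keys shown in the example output; `summarize` is a hypothetical helper, and pages flagged `js_required` are the ones you would hand to a browser-based fetcher.

```python
from collections import defaultdict


def summarize(result: dict) -> dict:
    """Condense a crawl result (shaped like the JSON example) for downstream extraction."""
    pages = result.get("page_summaries", [])
    by_type = defaultdict(list)
    for page in pages:
        by_type[page.get("page_type", "unknown")].append(page["url"])
    reflection = result.get("crawl_meta", {}).get("reflection", {})
    return {
        "topology": result.get("topology"),
        "urls_by_type": dict(by_type),
        # Pages that need a rendering browser rather than a plain HTTP fetcher.
        "needs_browser": [p["url"] for p in pages if p.get("js_required")],
        "intelligence_score": reflection.get("intelligence_score"),
    }


sample = {
    "topology": "paginated",
    "pages_crawled": 42,
    "page_summaries": [
        {
            "url": "https://example.com/p/123",
            "page_type": "standard",
            "js_required": False,
            "content": "# Extracted Markdown Content...",
            "parent_url": "https://example.com/list",
        }
    ],
    "crawl_meta": {"reflection": {"intelligence_score": 0.85, "advice": "Continue current strategy."}},
}
print(summarize(sample))
```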

🛣️ Roadmap

  • Stealth Integration: Native support for Camoufox (Fingerprint bypass).
  • LLM-Ready Pipeline: Native Trafilatura integration for high-fidelity Markdown output.
  • Speed Pillar: Official Scrapling adapter for mass extraction.
  • Discovery API: Expose Acon as a standalone Discovery microservice.

Acon: The connective tissue of the intelligent web.



Download files


Source Distribution

acon_intel-0.1.2.tar.gz (59.5 kB)

Built Distribution

acon_intel-0.1.2-py3-none-any.whl (62.0 kB)

File details

Details for the file acon_intel-0.1.2.tar.gz.

File metadata

  • Download URL: acon_intel-0.1.2.tar.gz
  • Size: 59.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for acon_intel-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73aad111d3c9144f85639135bec3781fe88e5f96569fa15dc2a4a396e444fa57 |
| MD5 | 294103764bf765081b720df61bef1f5e |
| BLAKE2b-256 | bc66a6b7366823ea8a71048a5392d1465c0d50488336c2d13beecde97cb9f76e |


File details

Details for the file acon_intel-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: acon_intel-0.1.2-py3-none-any.whl
  • Size: 62.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for acon_intel-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec4e2f4924eee9a840f1aaf2d080255b4080d2fe64171ebbe0264c003c901fe2 |
| MD5 | a8b77e7d0560ab90af76c7858fa7a305 |
| BLAKE2b-256 | 3e5abb9a814e05be6b4166820586ecbada29115001b8d8a8e6dcfe39fce64e3b |

