
Deep Web Research Tool - Aggregates 30+ sources, scrapes content, generates AI summaries


ACHEM - Deep Web Research Tool


ACHEM (Arabic: آشم) is a powerful deep web research tool that aggregates information from 30+ sources, scrapes full content from top results, and generates concise summaries using AI.

Features

  • Deep Web Research: Gathers results from 30+ sources via DuckDuckGo
  • Web Scraping: Extracts full content from top 3 most relevant links
  • Two-Pass Search: Prioritizes technical content (StackOverflow, GitHub, forums)
  • AI Summarization: Uses Hugging Face Inference Providers (free tier)
  • Syntax Highlighting: Color-coded output for easy scanning
  • SQLite Cache: Instant recall for repeated searches
  • Export: Save summaries to Markdown files
  • Multi-language: Supports English, French, and Arabic
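The SQLite cache can be pictured as a single table keyed by the normalized query text. The following is a minimal sketch using only the standard-library `sqlite3` module. The `SearchCache` class, the `cache` table, and its two columns are hypothetical illustrations, not the schema used in `sqlite_cache.py`.

```python
import sqlite3
from typing import Optional


class SearchCache:
    """Hypothetical query -> summary cache stored in one SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (query TEXT PRIMARY KEY, summary TEXT NOT NULL)"
        )

    @staticmethod
    def _key(query: str) -> str:
        # Lowercase and collapse whitespace so near-identical queries share a row
        return " ".join(query.lower().split())

    def get(self, query: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT summary FROM cache WHERE query = ?", (self._key(query),)
        ).fetchone()
        return row[0] if row else None

    def put(self, query: str, summary: str) -> None:
        # INSERT OR REPLACE overwrites any older summary stored under the same key
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (query, summary) VALUES (?, ?)",
            (self._key(query), summary),
        )
        self.conn.commit()

    def clear(self) -> None:
        # Roughly what a --clear-cache option needs to do
        self.conn.execute("DELETE FROM cache")
        self.conn.commit()


cache = SearchCache()
cache.put("How to learn Python", "Start with the official tutorial.")
print(cache.get("how to  learn python"))  # prints "Start with the official tutorial."
```

Normalizing the key is one design choice among several: it lets trivially different spellings of the same question reuse a cached summary, at the cost of treating case-sensitive queries as identical.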

Screenshots

╔══════════════════════════════════════════════════════════════════╗
║                    ACHEM - Deep Web Research                      ║
╚══════════════════════════════════════════════════════════════════╝

🔍 Deep Research: how to learn python
==================================================
PASS 1: Gathering 30 sources from DuckDuckGo...
✓ Found 30 sources
PASS 2: Scraped full content from top 3 links
→ Analyzing 35 total sources...
→ Generating deep summary...

╭──────────────────────────────────────────────────────────────────╮
│ UNIFIED RESEARCH SUMMARY                                          │
├──────────────────────────────────────────────────────────────────┤
│ 1. Start with the official Python tutorial:                     │
│    - Visit docs.python.org/3/tutorial                           │
│                                                                  │
│ 2. Use free online tutorials:                                   │
│    - LearnPython.org, pythonbasics.org                         │
╰──────────────────────────────────────────────────────────────────╯

Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager

Quick Install (PyPI)

pip install achem

Or Install from Source

  1. Clone the repository
git clone https://github.com/achem/achem.git
cd achem
  2. Install in editable mode
pip install -e .
  3. Configure API keys
cp .env.example .env

Then edit .env and add your Hugging Face API token:

HF_API_KEY=hf_your_token_here
HF_MODEL=Qwen/Qwen2.5-7B-Instruct
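One way a program can pick up these two settings is to read `.env` at startup and let real environment variables override it. The sketch below uses only the standard library. `load_env` and `get_hf_settings` are hypothetical helpers (the project's `config_manager.py` may instead rely on a package such as python-dotenv), and falling back to the `HF_MODEL` value shown above is an assumption.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file (hypothetical helper)."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. HF_MODEL="org/model"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_settings(path: str = ".env") -> tuple:
    """Return (api_key, model); real environment variables take precedence."""
    file_values = load_env(path)
    api_key = os.environ.get("HF_API_KEY", file_values.get("HF_API_KEY"))
    # Assumed fallback: the model shown in the example .env
    model = os.environ.get(
        "HF_MODEL", file_values.get("HF_MODEL", "Qwen/Qwen2.5-7B-Instruct")
    )
    if not api_key:
        raise RuntimeError("HF_API_KEY is not set; add it to .env")
    return api_key, model
```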

Getting a Hugging Face API Token

  1. Go to Hugging Face
  2. Create an account (free)
  3. Go to Settings → Access Tokens
  4. Create a new token with "Read" permissions
  5. Copy the token to your .env file

Usage

Interactive Mode

python src/achem/main.py

Command Line Mode

python src/achem/main.py "your search query"

Options

Option           Description                   Default
-l, --limit      Wikipedia results per query   10
--lang           Language (en/fr/ar/auto)      auto
--ddg-limit      DuckDuckGo results            30
--min-relevance  Minimum relevance (%)         0
--no-cache       Skip cache                    False
--no-wikipedia   Skip Wikipedia                False
--clear-cache    Clear SQLite cache            False
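The options above map directly onto Python's standard `argparse` module. This sketch assumes the query is an optional positional argument (absent means interactive mode). Flag names and defaults come from the table; `build_parser`, the `prog` name, and the numeric types are illustrative choices, not the code in `main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(prog="achem", description="Deep web research")
    # Optional positional query; when omitted, the tool would start interactive mode
    parser.add_argument("query", nargs="?", help="search query")
    parser.add_argument("-l", "--limit", type=int, default=10,
                        help="Wikipedia results per query")
    parser.add_argument("--lang", choices=["en", "fr", "ar", "auto"], default="auto",
                        help="language (en/fr/ar/auto)")
    parser.add_argument("--ddg-limit", type=int, default=30, help="DuckDuckGo results")
    parser.add_argument("--min-relevance", type=float, default=0.0,
                        help="minimum relevance in percent")
    parser.add_argument("--no-cache", action="store_true", help="skip the cache")
    parser.add_argument("--no-wikipedia", action="store_true", help="skip Wikipedia")
    parser.add_argument("--clear-cache", action="store_true", help="clear the SQLite cache")
    return parser


args = build_parser().parse_args(["how to learn python", "--ddg-limit", "20", "--no-wikipedia"])
print(args.query, args.ddg_limit, args.no_wikipedia, args.lang)
# prints: how to learn python 20 True auto
```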

Commands (Interactive Mode)

Command          Description
clear / cls      Clear screen
export / save    Export last summary
help / ?         Show help
version / v      Show version
exit / quit / q  Exit program
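In interactive mode, each alias resolves to one action. A minimal dispatch sketch follows. The `ALIASES` table, `resolve_command`, and `dispatch` are hypothetical names, and treating any non-command input as a new search query is an assumption about how the prompt behaves.

```python
from typing import Callable, Dict, Optional

# Hypothetical alias table built from the documented interactive commands
ALIASES: Dict[str, str] = {
    "clear": "clear", "cls": "clear",
    "export": "export", "save": "export",
    "help": "help", "?": "help",
    "version": "version", "v": "version",
    "exit": "exit", "quit": "exit", "q": "exit",
}


def resolve_command(text: str) -> Optional[str]:
    """Return the canonical command name, or None if the text is not a command."""
    return ALIASES.get(text.strip().lower())


def dispatch(line: str,
             handlers: Dict[str, Callable[[], None]],
             search: Callable[[str], None]) -> bool:
    """Run one line of input; return False when the user asked to exit."""
    name = resolve_command(line)
    if name == "exit":
        return False
    if name is None:
        # Assumption: anything that is not a command is treated as a search query
        search(line.strip())
    else:
        handlers[name]()
    return True
```

A read-eval loop would then call `dispatch(input("> "), handlers, run_search)` until it returns `False`, where `run_search` stands for whatever function performs the research.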

Project Structure

ACHEM/
├── src/
│   └── achem/                        # Main package
│       ├── __init__.py
│       ├── main.py                   # Entry point
│       ├── commands.py               # Command handler
│       ├── config_manager.py         # Config loader
│       ├── duckduckgo_client.py      # DDG search
│       ├── export_manager.py         # Export to Documents/ACHEM/
│       ├── huggingface_summarizer.py # AI summarization
│       ├── output_formatter.py       # Terminal UI
│       ├── search_router.py          # Source priority
│       ├── sqlite_cache.py           # SQLite cache
│       ├── spell_checker.py          # Typo correction
│       ├── text_analyzer.py          # TF-IDF analysis
│       ├── user_input.py             # Input handler
│       ├── web_scraper.py            # BeautifulSoup scraper
│       └── wikipedia_client.py       # Wikipedia API
├── .env.example                      # Config template
├── .gitignore
├── LICENSE
├── README.md
└── pyproject.toml                    # Package metadata
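`text_analyzer.py` is described as performing TF-IDF analysis, and `--min-relevance` filters by a percentage. The pure-Python sketch below shows one common way to turn TF-IDF into a 0-100 relevance score: cosine similarity between the query vector and each source vector. The tokenizer, the smoothed IDF formula, the `relevance_scores` name, and the threshold of 20 are assumptions, not the project's actual implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # \w is Unicode-aware, so Arabic and French words are kept as tokens
    return re.findall(r"\w+", text.lower())


def relevance_scores(query: str, documents: List[str]) -> List[float]:
    """Cosine similarity (0-100) between the query and each document under TF-IDF."""
    docs = [Counter(tokenize(d)) for d in documents]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: terms found in every document still get a small positive weight
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    def vector(counts: Counter) -> Dict[str, float]:
        # Terms never seen in any document get weight 0
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    q_vec = vector(Counter(tokenize(query)))
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = []
    for doc in docs:
        d_vec = vector(doc)
        d_norm = math.sqrt(sum(w * w for w in d_vec.values()))
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        scores.append(100.0 * dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores


docs = [
    "learn python basics tutorial",
    "cooking pasta recipes",
    "python tutorial for beginners",
]
scores = relevance_scores("learn python", docs)
# Keep only sources at or above a hypothetical --min-relevance of 20 (%)
kept = [doc for doc, score in zip(docs, scores) if score >= 20]
```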

How It Works

Two-Pass Search System

┌─────────────────────────────────────────────────────┐
│ PASS 1: DuckDuckGo Search (30 results)              │
│ • Prioritizes technical sites                       │
│ • Filters out cookie/login/consent pages            │
│ • Ranks by domain authority                         │
├─────────────────────────────────────────────────────┤
│ PASS 2: Web Scraping (Top 3)                        │
│ • BeautifulSoup extracts full article text          │
│ • Removes navigation/footer/scripts                 │
│ • Combines up to 10,000 chars per article           │
├─────────────────────────────────────────────────────┤
│ PASS 3: AI Summarization                            │
│ • Neutral technical prompt                          │
│ • No ethical warnings or opinions                   │
│ • 500-4000 character output                         │
│ • Syntax highlighting for steps/commands            │
└─────────────────────────────────────────────────────┘
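Pass 2 can be approximated with a small HTML-to-text extractor. ACHEM uses BeautifulSoup for this step; to stay dependency-free, this sketch swaps in the standard library's `html.parser.HTMLParser`. It skips text inside `script`, `style`, `nav`, and `footer` elements (including `style` is an assumption) and applies the 10,000-character cap from the diagram. `TextExtractor` and `extract_article_text` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List

# Elements whose text is discarded (navigation, footers, scripts, styles)
SKIP_TAGS = {"script", "style", "nav", "footer"}
MAX_CHARS = 10_000  # per-article cap from the Pass 2 description


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text that is outside every skipped element
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_article_text(html: str, max_chars: int = MAX_CHARS) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]
```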

Source Priority

  1. DuckDuckGo (Primary) - Real-time web results
  2. Wikipedia (Secondary) - Background concepts only
  3. Web Scraping - Full content from top 3
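The Pass 1 filtering and the source priority above can be combined into one ranking step. In this sketch the `Result` record, the blocked keywords, and the `TECH_DOMAINS` weights are hypothetical stand-ins for whatever `search_router.py` actually uses, and "domain authority" is reduced to a fixed lookup table.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

# Hypothetical boost list; the real priority rules live in search_router.py
TECH_DOMAINS = {"stackoverflow.com": 3, "github.com": 3, "docs.python.org": 2}
BLOCKED_WORDS = ("cookie", "login", "consent", "sign-in")
SOURCE_RANK = {"duckduckgo": 0, "wikipedia": 1}  # lower sorts first


@dataclass
class Result:
    title: str
    url: str
    source: str  # "duckduckgo" or "wikipedia"


def domain_score(url: str) -> int:
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Match the domain itself or any of its subdomains
    for domain, score in TECH_DOMAINS.items():
        if host == domain or host.endswith("." + domain):
            return score
    return 0


def rank_results(results: List[Result]) -> List[Result]:
    """Drop cookie/login/consent pages, then order by source and domain score."""
    kept = [r for r in results
            if not any(w in (r.title + " " + r.url).lower() for w in BLOCKED_WORDS)]
    # sorted() is stable, so ties keep the search engine's original order
    return sorted(kept, key=lambda r: (SOURCE_RANK.get(r.source, 2), -domain_score(r.url)))
```

Under these assumptions, `rank_results(results)[:3]` yields the candidates handed to Pass 2.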

Export Location

Summaries are saved to:

  • Linux/macOS: ~/Documents/ACHEM/
  • Windows: C:\Users\<username>\Documents\ACHEM\
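A cross-platform way to resolve this folder is `pathlib.Path.home()`, which points to the user's profile directory on Windows and to the home directory on Linux/macOS. The sketch below writes one Markdown file per export. The slug-plus-timestamp file-name scheme and the `export_summary` name are assumptions, not necessarily what `export_manager.py` does.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def export_dir(home: Optional[Path] = None) -> Path:
    """Documents/ACHEM under the user's home directory on every OS."""
    return (home or Path.home()) / "Documents" / "ACHEM"


def export_summary(query: str, summary: str, home: Optional[Path] = None) -> Path:
    target = export_dir(home)
    target.mkdir(parents=True, exist_ok=True)
    # Hypothetical file-name scheme: slugified query plus a timestamp
    slug = re.sub(r"[^\w-]+", "-", query.lower()).strip("-") or "summary"
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = target / f"{slug}-{stamp}.md"
    path.write_text(f"# {query}\n\n{summary}\n", encoding="utf-8")
    return path
```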

Disclaimer

ACHEM is for educational and research purposes only.

The tool aggregates publicly available information from the web. Any actions taken based on the information provided are the sole responsibility of the user. The developer is not responsible for any misuse of this tool.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Acknowledgments



Download files


Source Distribution

achem-1.0.1.tar.gz (32.6 kB)

Uploaded Source

Built Distribution


achem-1.0.1-py3-none-any.whl (34.8 kB)

Uploaded Python 3

File details

Details for the file achem-1.0.1.tar.gz.

File metadata

  • Download URL: achem-1.0.1.tar.gz
  • Upload date:
  • Size: 32.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for achem-1.0.1.tar.gz
Algorithm     Hash digest
SHA256        065ecf07659078098e3ae7ec5f2255e1d294e3aa753c529f7bb615142d4908cd
MD5           ea5bfe968dcb58ed5aa3d94bb36e855f
BLAKE2b-256   692ea40c4b7cd779fe701033fe4a56aaf3a6a882373e777b4b9b010055dfc1f9
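To check a downloaded archive against the SHA256 digest listed above, you can hash it with the standard-library `hashlib` module. The function names below are illustrative; only the digest value comes from this page.

```python
import hashlib

# SHA256 of achem-1.0.1.tar.gz as published above
EXPECTED_SDIST_SHA256 = "065ecf07659078098e3ae7ec5f2255e1d294e3aa753c529f7bb615142d4908cd"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files never need to fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    return sha256_of(path) == expected.lower()
```

For example, `verify("achem-1.0.1.tar.gz")` returns `True` only if the local file matches the published digest byte for byte.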


Provenance

The following attestation bundles were made for achem-1.0.1.tar.gz:

Publisher: release.yml on sarok-exe/achem

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file achem-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: achem-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 34.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for achem-1.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        f7bafdb3b2262724eba3dba8623c7220a420604dfedf1dff13c912e7d88898ed
MD5           056a854cf3a268fe13d71e0f64c43692
BLAKE2b-256   00a06e0aa36c09ff0a17332f08350dc449151510644618e7b87e444c38dc5e88


Provenance

The following attestation bundles were made for achem-1.0.1-py3-none-any.whl:

Publisher: release.yml on sarok-exe/achem

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
