
Deep Web Research Tool - Aggregates 30+ sources, scrapes content, generates AI summaries

Project description

ACHEM - Deep Web Research Tool


ACHEM (Arabic: آشم) is a powerful deep web research tool that aggregates information from 30+ sources, scrapes full content from top results, and generates concise summaries using AI.

Features

  • Deep Web Research: Gathers results from 30+ sources via DuckDuckGo
  • Web Scraping: Extracts full content from top 3 most relevant links
  • Two-Pass Search: Prioritizes technical content (StackOverflow, GitHub, forums)
  • AI Summarization: Uses Hugging Face Inference Providers (free tier)
  • Syntax Highlighting: Color-coded output for easy scanning
  • SQLite Cache: Instant recall for repeated searches
  • Export: Save summaries to Markdown files
  • Multi-language: Supports English, French, and Arabic
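The SQLite cache feature can be pictured with a minimal sketch built on Python's standard sqlite3 module. The table layout and the `QueryCache` and `normalize` names are hypothetical; ACHEM's own sqlite_cache.py may store different fields or expire old entries.

```python
import json
import sqlite3
import time
from typing import Optional

# Hypothetical schema: one row per normalized query (not ACHEM's actual table).
SCHEMA = """
CREATE TABLE IF NOT EXISTS cache (
    query   TEXT PRIMARY KEY,
    payload TEXT NOT NULL,
    created REAL NOT NULL
)
"""


def normalize(query: str) -> str:
    """Collapse case and whitespace so 'Rust  Async' and 'rust async' share a key."""
    return " ".join(query.lower().split())


class QueryCache:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def get(self, query: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT payload FROM cache WHERE query = ?", (normalize(query),)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, query: str, result: dict) -> None:
        # "with conn" commits on success; INSERT OR REPLACE overwrites a stale entry.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (query, payload, created) VALUES (?, ?, ?)",
                (normalize(query), json.dumps(result), time.time()),
            )

    def clear(self) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM cache")


cache = QueryCache()
cache.put("Rust  async", {"summary": "..."})
print(cache.get("rust async"))  # → {'summary': '...'}
```

Keying on the normalized query is what makes a repeated search an instant lookup; `clear()` plays the role of the `--clear-cache` option.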

Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager

Quick Install (PyPI)

pipx install achem

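Note: pipx is recommended because it manages virtual environments automatically. If you prefer pip, use pip install achem --break-system-packages; that flag overrides the "externally managed environment" protection (PEP 668) that some Linux distributions apply to the system Python.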

Or Install from Source

  1. Clone the repository
git clone https://github.com/achem/achem.git
cd achem
  2. Install in editable mode
pip install -e .
  3. Configure API keys
cp .env.example .env

Then edit .env and add your Hugging Face API token:

HF_API_KEY=hf_your_token_here
HF_MODEL=Qwen/Qwen2.5-7B-Instruct
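As a rough illustration of how those two settings might be read, here is a stdlib-only sketch. The helper names, the precedence of real environment variables over the file, and the fallback to the model shown above are all assumptions; ACHEM's config_manager.py may use python-dotenv or behave differently.

```python
import os
from typing import Mapping, Optional, Tuple

# Fallback copied from the .env example above (assumed default, not confirmed).
DEFAULT_MODEL = "Qwen/Qwen2.5-7B-Instruct"


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def hf_settings(file_values: Mapping[str, str],
                environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (api_key, model); real environment variables win over .env values."""
    environ = os.environ if environ is None else environ
    api_key = environ.get("HF_API_KEY") or file_values.get("HF_API_KEY")
    model = environ.get("HF_MODEL") or file_values.get("HF_MODEL") or DEFAULT_MODEL
    if not api_key:
        raise RuntimeError("HF_API_KEY is missing: add it to your .env file")
    return api_key, model
```

Typical use would be `hf_settings(parse_env(Path(".env").read_text()))` (with `from pathlib import Path`); failing loudly when the token is absent is friendlier than a 401 error from the API later.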

Getting a Hugging Face API Token

  1. Go to Hugging Face (huggingface.co)
  2. Create an account (free)
  3. Go to Settings → Access Tokens
  4. Create a new token with "Read" permissions
  5. Copy the token to your .env file

Usage

Interactive Mode

python src/main.py

Command Line Mode

python src/main.py "your search query"

Options

Option           | Description                  | Default
-l, --limit      | Wikipedia results per query  | 10
--lang           | Language (en/fr/ar/auto)     | auto
--ddg-limit      | DuckDuckGo results           | 30
--min-relevance  | Minimum relevance (%)        | 0
--no-cache       | Skip cache                   | False
--no-wikipedia   | Skip Wikipedia               | False
--clear-cache    | Clear SQLite cache           | False
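The option table above maps naturally onto argparse. This sketch rebuilds the documented flags and defaults; the `build_parser` name, the optional positional query, and typing `--min-relevance` as a float are assumptions, not ACHEM's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the documented ACHEM options (sketch; real parser may differ)."""
    parser = argparse.ArgumentParser(prog="achem", description="Deep web research tool")
    # nargs="?" makes the query optional; omitting it would mean interactive mode.
    parser.add_argument("query", nargs="?", help="search query; omit for interactive mode")
    parser.add_argument("-l", "--limit", type=int, default=10,
                        help="Wikipedia results per query")
    parser.add_argument("--lang", choices=["en", "fr", "ar", "auto"], default="auto",
                        help="result language")
    parser.add_argument("--ddg-limit", type=int, default=30,
                        help="number of DuckDuckGo results")
    parser.add_argument("--min-relevance", type=float, default=0,
                        help="minimum relevance percentage")
    parser.add_argument("--no-cache", action="store_true", help="skip the cache")
    parser.add_argument("--no-wikipedia", action="store_true", help="skip Wikipedia")
    parser.add_argument("--clear-cache", action="store_true", help="clear the SQLite cache")
    return parser


args = build_parser().parse_args(["rust async", "--ddg-limit", "50", "--no-cache"])
print(args.ddg_limit, args.no_cache, args.lang)  # → 50 True auto
```

argparse turns dashes into underscores, so `--ddg-limit` is read back as `args.ddg_limit`.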

Commands (Interactive Mode)

Command          | Description
clear / cls      | Clear screen
export / save    | Export last summary
help / ?         | Show help
version / v      | Show version
exit / quit / q  | Exit program
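One way to wire these aliases into an interactive loop is a lookup table that falls back to "treat the line as a search query". The canonical names and the `resolve_command`/`repl` helpers below are hypothetical; ACHEM's commands.py may dispatch differently.

```python
from typing import Dict, Iterable, List, Optional

# Aliases copied from the command table above; canonical names are our own labels.
ALIASES: Dict[str, str] = {
    "clear": "clear", "cls": "clear",
    "export": "export", "save": "export",
    "help": "help", "?": "help",
    "version": "version", "v": "version",
    "exit": "exit", "quit": "exit", "q": "exit",
}


def resolve_command(line: str) -> Optional[str]:
    """Return the canonical command for `line`, or None if it is a search query."""
    return ALIASES.get(line.strip().lower())


def repl(lines: Iterable[str]) -> List[str]:
    """Toy loop: classify each input line until an exit command appears."""
    actions = []
    for line in lines:
        command = resolve_command(line)
        if command == "exit":
            break
        actions.append(command or f"search: {line.strip()}")
    return actions


print(repl(["cls", "python asyncio tutorial", "?", "q", "never reached"]))
# → ['clear', 'search: python asyncio tutorial', 'help']
```

With this design any line that exactly matches an alias is treated as a command, so a one-word query such as `v` could not be searched without extra quoting or a prefix.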

Project Structure

ACHEM/
├── src/
│   └── achem/                        # Main package
│       ├── __init__.py
│       ├── main.py                   # Entry point
│       ├── commands.py               # Command handler
│       ├── config_manager.py         # Config loader
│       ├── duckduckgo_client.py      # DDG search
│       ├── export_manager.py         # Export to Documents/ACHEM/
│       ├── huggingface_summarizer.py # AI summarization
│       ├── output_formatter.py       # Terminal UI
│       ├── search_router.py          # Source priority
│       ├── sqlite_cache.py           # SQLite cache
│       ├── spell_checker.py          # Typo correction
│       ├── text_analyzer.py          # TF-IDF analysis
│       ├── user_input.py             # Input handler
│       ├── web_scraper.py            # BeautifulSoup scraper
│       └── wikipedia_client.py       # Wikipedia API
├── .env.example                      # Config template
├── .gitignore
├── LICENSE
├── README.md
└── pyproject.toml                    # Package metadata

How It Works

Two-Pass Search System

┌─────────────────────────────────────────────────────┐
│ PASS 1: DuckDuckGo Search (30 results)              │
│ • Prioritizes technical sites                       │
│ • Filters out cookie/login/consent pages            │
│ • Ranks by domain authority                         │
├─────────────────────────────────────────────────────┤
│ PASS 2: Web Scraping (Top 3)                        │
│ • BeautifulSoup extracts full article text          │
│ • Removes navigation/footer/scripts                 │
│ • Combines up to 10,000 chars per article           │
├─────────────────────────────────────────────────────┤
│ PASS 3: AI Summarization                            │
│ • Neutral technical prompt                          │
│ • No ethical warnings or opinions                   │
│ • 500-4000 character output                         │
│ • Syntax highlighting for steps/commands            │
└─────────────────────────────────────────────────────┘
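Pass 2 boils down to "drop boilerplate elements, keep the text, cap the length". The README says ACHEM uses BeautifulSoup for this; the sketch below swaps in the standard library's html.parser so it runs without extra dependencies. The skipped-tag set (which adds `style`) and the `extract_text` name are assumptions.

```python
from html.parser import HTMLParser

# Text inside these tags is discarded ("removes navigation/footer/scripts";
# "style" is our own addition).
SKIP_TAGS = {"nav", "footer", "script", "style"}
MAX_CHARS = 10_000  # per-article cap from the diagram above


class ArticleText(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.depth:
            self.parts.append(text)


def extract_text(html: str, limit: int = MAX_CHARS) -> str:
    """Return visible article text, joined by spaces and truncated to `limit`."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:limit]


page = """<html><body><nav>Home | Login</nav>
<article><h1>Title</h1><p>First paragraph.</p></article>
<script>track()</script><footer>(c) 2024</footer></body></html>"""
print(extract_text(page))  # → Title First paragraph.
```

With BeautifulSoup the same idea is usually written as calling `decompose()` on every `soup(["nav", "footer", "script"])` match and then `soup.get_text(" ", strip=True)`.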

Source Priority

  1. DuckDuckGo (Primary) - Real-time web results
  2. Wikipedia (Secondary) - Background concepts only
  3. Web Scraping - Full content from the top 3 results
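The source priority and the Pass 1 filtering can be combined into a single ranking step that decides which links get scraped. The README does not say how domains are scored, so the `DOMAIN_BONUS` weights, the keyword-based consent filter, and every helper name here are hypothetical.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

# Hypothetical weights: the README only says technical sites are preferred.
DOMAIN_BONUS = {"stackoverflow.com": 3, "github.com": 3, "reddit.com": 1}
BLOCKED_WORDS = ("cookie", "login", "consent")
SOURCE_RANK = {"duckduckgo": 0, "wikipedia": 1}  # lower = higher priority


@dataclass
class Result:
    url: str
    title: str
    source: str  # "duckduckgo" or "wikipedia"


def is_blocked(result: Result) -> bool:
    """Crude filter for cookie/login/consent interstitials."""
    text = f"{result.url} {result.title}".lower()
    return any(word in text for word in BLOCKED_WORDS)


def domain_bonus(url: str) -> int:
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return DOMAIN_BONUS.get(host, 0)


def rank(results: List[Result]) -> List[Result]:
    kept = [r for r in results if not is_blocked(r)]
    # Source priority first, then domain bonus (descending). sorted() is stable,
    # so ties keep the search engine's original order.
    return sorted(kept, key=lambda r: (SOURCE_RANK.get(r.source, 99), -domain_bonus(r.url)))


def pick_for_scraping(results: List[Result], n: int = 3) -> List[Result]:
    return rank(results)[:n]


results = [
    Result("https://example.com/blog", "Intro to asyncio", "duckduckgo"),
    Result("https://en.wikipedia.org/wiki/Asyncio", "asyncio", "wikipedia"),
    Result("https://stackoverflow.com/q/1", "asyncio gather vs wait", "duckduckgo"),
    Result("https://news.site/consent", "Before you continue", "duckduckgo"),
    Result("https://github.com/python/cpython", "cpython", "duckduckgo"),
]
print([r.url for r in pick_for_scraping(results)])
# → ['https://stackoverflow.com/q/1', 'https://github.com/python/cpython', 'https://example.com/blog']
```

A substring filter like this is cheap but can misfire (a page about "cookiecutter" would be dropped), so a real implementation would likely inspect page content as well.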

Export Location

Summaries are saved to:

  • Linux/macOS: ~/Documents/ACHEM/
  • Windows: C:\Users\<username>\Documents\ACHEM\
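Both locations fall out of `Path.home()`, which resolves to the user's home directory on each platform. The sketch below builds the export folder and writes a Markdown file; the timestamped, slugified file name and the helper names are assumptions about what export_manager.py might do.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def export_dir(home: Optional[Path] = None) -> Path:
    # ~/Documents/ACHEM on Linux/macOS, C:\Users\<username>\Documents\ACHEM on Windows.
    return (home or Path.home()) / "Documents" / "ACHEM"


def slugify(query: str) -> str:
    """Turn a query into a safe file-name fragment (hypothetical naming scheme)."""
    # \w is Unicode-aware in Python 3, so Arabic or accented letters survive.
    slug = re.sub(r"[^\w-]+", "-", query.strip().lower()).strip("-")
    return slug[:60] or "summary"


def export_summary(query: str, summary: str, home: Optional[Path] = None) -> Path:
    target = export_dir(home)
    target.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = target / f"{slugify(query)}-{stamp}.md"
    path.write_text(f"# {query}\n\n{summary}\n", encoding="utf-8")
    return path
```

Replacing characters such as `:` and `?` keeps the name valid on Windows, and the timestamp prevents repeated exports of the same query from overwriting each other.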

Disclaimer

ACHEM is for educational and research purposes only.

The tool aggregates publicly available information from the web. Any actions taken based on the information provided are the sole responsibility of the user. The developer is not responsible for any misuse of this tool.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Acknowledgments


Download files


Source Distribution

achem-1.0.4.tar.gz (35.3 kB)

Uploaded Source

Built Distribution


achem-1.0.4-py3-none-any.whl (44.1 kB)

Uploaded Python 3

File details

Details for the file achem-1.0.4.tar.gz.

File metadata

  • Download URL: achem-1.0.4.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for achem-1.0.4.tar.gz
Algorithm Hash digest
SHA256 1fc6f26191c43bc9131fb10d861a5730a9ed3732a2ee2ff95b5044ed2907f9d0
MD5 0c37f94bda88d77b2af7cab9142f7890
BLAKE2b-256 eafb2e6140c7dd1b5486b9abeb97099042ecc669b18a2df37ae8c29ea9a20997
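A downloaded file can be checked against the published SHA256 digest with Python's hashlib. This is a generic sketch; the helper names are our own, and the expected value is the sdist digest listed above.

```python
import hashlib
from pathlib import Path

# SHA256 published above for achem-1.0.4.tar.gz.
EXPECTED_SDIST_SHA256 = "1fc6f26191c43bc9131fb10d861a5730a9ed3732a2ee2ff95b5044ed2907f9d0"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a published hex digest, ignoring letter case."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("achem-1.0.4.tar.gz"), EXPECTED_SDIST_SHA256)` should return True for an intact download. pip can enforce the same check automatically in hash-checking mode, using a requirements line of the form `achem==1.0.4 --hash=sha256:<digest>`.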


Provenance

The following attestation bundles were made for achem-1.0.4.tar.gz:

Publisher: release.yml on sarok-exe/achem

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file achem-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: achem-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 44.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for achem-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 b494f7851285a26689c77d2d82f8288947387d699fe986c9e139a94b33de183c
MD5 e7cc183da60cbcbf494726518944db8e
BLAKE2b-256 298c61f44c38d2fcd3be705ea757067263bdd92cef2b8381512816c84cc4e290


Provenance

The following attestation bundles were made for achem-1.0.4-py3-none-any.whl:

Publisher: release.yml on sarok-exe/achem

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
