Deep Web Research Tool - Aggregates 30+ sources, scrapes content, generates AI summaries
ACHEM - Deep Web Research Tool
ACHEM (Arabic: آشم) is a powerful deep web research tool that aggregates information from 30+ sources, scrapes full content from top results, and generates concise summaries using AI.
Features
- Deep Web Research: Gathers results from 30+ sources via DuckDuckGo
- Web Scraping: Extracts full content from top 3 most relevant links
- Two-Pass Search: Prioritizes technical content (StackOverflow, GitHub, forums)
- AI Summarization: Uses Hugging Face Inference Providers (free tier)
- Syntax Highlighting: Color-coded output for easy scanning
- SQLite Cache: Instant recall for repeated searches
- Export: Save summaries to Markdown files
- Multi-language: Supports English, French, and Arabic
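The SQLite cache feature can be pictured with a minimal sketch. The table name, schema, and function names below are assumptions made for illustration; they are not taken from ACHEM's sqlite_cache.py.

```python
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open (or create) a cache database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "query TEXT PRIMARY KEY, summary TEXT NOT NULL, created REAL NOT NULL)"
    )
    return conn


def normalize(query):
    # Collapse case and whitespace so trivially different queries share an entry.
    return " ".join(query.lower().split())


def put(conn, query, summary):
    conn.execute(
        "INSERT OR REPLACE INTO cache (query, summary, created) VALUES (?, ?, ?)",
        (normalize(query), summary, time.time()),
    )
    conn.commit()


def get(conn, query):
    """Return the cached summary, or None on a cache miss."""
    row = conn.execute(
        "SELECT summary FROM cache WHERE query = ?", (normalize(query),)
    ).fetchone()
    return row[0] if row else None


def clear(conn):
    # Rough equivalent of what a --clear-cache style option might do.
    conn.execute("DELETE FROM cache")
    conn.commit()
```

Keying on a normalized query is one possible design choice: it trades a little precision for more cache hits on repeated searches.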
Installation
Prerequisites
- Python 3.10 or higher
- pip package manager
Quick Install (PyPI)
pipx install achem
Note: pipx is recommended because it manages virtual environments automatically. If you prefer pip, use pip install achem --break-system-packages.
Or Install from Source
- Clone the repository
git clone https://github.com/achem/achem.git
cd achem
- Install in editable mode
pip install -e .
- Configure API keys
cp .env.example .env
Then edit .env and add your Hugging Face API token:
HF_API_KEY=hf_your_token_here
HF_MODEL=Qwen/Qwen2.5-7B-Instruct
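As a rough illustration of how these two settings could be read, here is a dependency-free sketch. The function names, the rule that real environment variables override the file, and the fallback model are assumptions for this example, not ACHEM's actual config_manager.py behavior.

```python
import os

KEYS = ("HF_API_KEY", "HF_MODEL")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_text, environ=None):
    """Merge .env text with the process environment (the environment wins)."""
    environ = os.environ if environ is None else environ
    merged = parse_env(env_text)
    merged.update({k: environ[k] for k in KEYS if k in environ})

    api_key = merged.get("HF_API_KEY", "")
    # The hf_ prefix check mirrors the example token shown above.
    if not api_key.startswith("hf_"):
        raise ValueError("HF_API_KEY is missing or does not look like a token")
    # Assumed fallback: reuse the model named in the example .env.
    model = merged.get("HF_MODEL", "Qwen/Qwen2.5-7B-Instruct")
    return {"api_key": api_key, "model": model}
```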
Getting a Hugging Face API Token
- Go to Hugging Face
- Create an account (free)
- Go to Settings → Access Tokens
- Create a new token with "Read" permissions
- Copy the token to your .env file
Usage
Interactive Mode
python src/main.py
Command Line Mode
python src/main.py "your search query"
Options
| Option | Description | Default |
|---|---|---|
| -l, --limit | Wikipedia results per query | 10 |
| --lang | Language (en/fr/ar/auto) | auto |
| --ddg-limit | DuckDuckGo results | 30 |
| --min-relevance | Minimum relevance % | 0 |
| --no-cache | Skip cache | False |
| --no-wikipedia | Skip Wikipedia | False |
| --clear-cache | Clear SQLite cache | False |
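The options above map naturally onto Python's argparse. The sketch below only mirrors the table (flag names and defaults); the optional positional query argument and the help texts are assumptions, and ACHEM's real parser may differ.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Deep web research tool")
    # Assumed: the query is optional so that no arguments means interactive mode.
    parser.add_argument("query", nargs="?", help="search query")
    parser.add_argument("-l", "--limit", type=int, default=10,
                        help="Wikipedia results per query")
    parser.add_argument("--lang", choices=["en", "fr", "ar", "auto"],
                        default="auto", help="language")
    parser.add_argument("--ddg-limit", type=int, default=30,
                        help="DuckDuckGo results")
    parser.add_argument("--min-relevance", type=float, default=0,
                        help="minimum relevance (percent)")
    parser.add_argument("--no-cache", action="store_true", help="skip cache")
    parser.add_argument("--no-wikipedia", action="store_true",
                        help="skip Wikipedia")
    parser.add_argument("--clear-cache", action="store_true",
                        help="clear SQLite cache")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```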
Commands (Interactive Mode)
| Command | Description |
|---|---|
| clear / cls | Clear screen |
| export / save | Export last summary |
| help / ? | Show help |
| version / v | Show version |
| exit / quit / q | Exit program |
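One simple way to support several spellings per command is an alias table that maps each input to a canonical name. This is a hypothetical sketch of that idea, not the contents of commands.py; treating every unmatched line as a search query is also an assumption.

```python
# Each alias from the table above points at one canonical command name.
ALIASES = {
    "clear": "clear", "cls": "clear",
    "export": "export", "save": "export",
    "help": "help", "?": "help",
    "version": "version", "v": "version",
    "exit": "exit", "quit": "exit", "q": "exit",
}


def resolve(line):
    """Return the canonical command, or None if the line is not a command."""
    return ALIASES.get(line.strip().lower())


if __name__ == "__main__":
    for text in ("CLS", " q ", "how to use asyncio"):
        print(repr(text), "->", resolve(text))
```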
Project Structure
ACHEM/
├── src/
│ └── achem/ # Main package
│ ├── __init__.py
│ ├── main.py # Entry point
│ ├── commands.py # Command handler
│ ├── config_manager.py # Config loader
│ ├── duckduckgo_client.py # DDG search
│ ├── export_manager.py # Export to Documents/ACHEM/
│ ├── huggingface_summarizer.py # AI summarization
│ ├── output_formatter.py # Terminal UI
│ ├── search_router.py # Source priority
│ ├── sqlite_cache.py # SQLite cache
│ ├── spell_checker.py # Typo correction
│ ├── text_analyzer.py # TF-IDF analysis
│ ├── user_input.py # Input handler
│ ├── web_scraper.py # BeautifulSoup scraper
│ └── wikipedia_client.py # Wikipedia API
├── .env.example # Config template
├── .gitignore
├── LICENSE
├── README.md
└── pyproject.toml # Package metadata
How It Works
Two-Pass Search System
┌─────────────────────────────────────────────────────┐
│ PASS 1: DuckDuckGo Search (30 results) │
│ • Prioritizes technical sites │
│ • Filters out cookie/login/consent pages │
│ • Ranks by domain authority │
├─────────────────────────────────────────────────────┤
│ PASS 2: Web Scraping (Top 3) │
│ • BeautifulSoup extracts full article text │
│ • Removes navigation/footer/scripts │
│ • Combines up to 10,000 chars per article │
├─────────────────────────────────────────────────────┤
│ PASS 3: AI Summarization │
│ • Neutral technical prompt │
│ • No ethical warnings or opinions │
│ • 500-4000 character output │
│ • Syntax highlighting for steps/commands │
└─────────────────────────────────────────────────────┘
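The scraping pass can be sketched as follows. ACHEM uses BeautifulSoup for this step; to keep the example free of dependencies, this sketch swaps in the standard library's html.parser instead. The class and function names are hypothetical, and skipping style tags is an added assumption beyond the navigation, footer, and script removal named above.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "nav", "footer"}
MAX_CHARS = 10_000  # per-article limit stated in the diagram


class TextExtractor(HTMLParser):
    """Collect visible text while ignoring everything inside skipped tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html, max_chars=MAX_CHARS):
    """Return the page's readable text, truncated to max_chars."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]
```

Fetching the page itself is left out on purpose; the function takes HTML that has already been downloaded.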
Source Priority
- DuckDuckGo (Primary) - Real-time web results
- Wikipedia (Secondary) - Background concepts only
- Web Scraping - Full content from top 3
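The ordering above, combined with the filtering in the first pass, could be approximated like this. The weights, the blocked-word list, and the result dictionary shape (source, url, title) are all assumptions for illustration; ACHEM does not publish its actual scoring.

```python
from urllib.parse import urlparse

# Hypothetical weights for sites the README names as prioritized.
PRIORITY_DOMAINS = {"stackoverflow.com": 3, "github.com": 3}
BLOCKED_WORDS = ("cookie", "login", "consent")
SOURCE_RANK = {"duckduckgo": 0, "wikipedia": 1}  # lower sorts first


def domain_of(url):
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(result):
    # Crude substring check on URL and title; a real filter would be stricter.
    haystack = (result["url"] + " " + result["title"]).lower()
    return any(word in haystack for word in BLOCKED_WORDS)


def rank(results, top_n=3):
    """Drop blocked pages, then sort by source and by domain weight."""
    kept = [r for r in results if not is_blocked(r)]
    kept.sort(key=lambda r: (SOURCE_RANK.get(r["source"], 2),
                             -PRIORITY_DOMAINS.get(domain_of(r["url"]), 0)))
    return kept[:top_n]
```

Because Python's sort is stable, results with equal keys keep the order in which the search engine returned them.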
Export Location
Summaries are saved to:
- Linux/macOS: ~/Documents/ACHEM/
- Windows: C:\Users\<username>\Documents\ACHEM\
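Both locations resolve to a Documents/ACHEM folder under the user's home directory, which pathlib can express portably. In the sketch below the file naming scheme (slug plus timestamp) and the Markdown layout are hypothetical; only the target folder comes from the paths above.

```python
import re
from datetime import datetime
from pathlib import Path


def export_path(query, home=None, now=None):
    """Build <home>/Documents/ACHEM/<slug>_<timestamp>.md for a query."""
    home = Path.home() if home is None else Path(home)
    now = datetime.now() if now is None else now
    # Keep only ASCII letters and digits; fall back to a fixed name otherwise.
    slug = re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-") or "summary"
    return home / "Documents" / "ACHEM" / f"{slug}_{now:%Y%m%d_%H%M%S}.md"


def export_summary(query, summary, home=None):
    """Write the summary as a small Markdown file and return its path."""
    path = export_path(query, home=home)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"# {query}\n\n{summary}\n", encoding="utf-8")
    return path
```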
Disclaimer
ACHEM is for educational and research purposes only.
The tool aggregates publicly available information from the web. Any actions taken based on the information provided are the sole responsibility of the user. The developer is not responsible for any misuse of this tool.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
Acknowledgments
- Hugging Face - Free inference API
- DuckDuckGo - Privacy-focused search
- Wikipedia - Free encyclopedia
- Qwen - Open source AI models