Deep Web Research Tool with Structural Positional Search - AI-powered synthesis using Ordinal Distance algorithm
ACHEM - Deep Web Research Tool
ACHEM (Arabic: آشم) is a powerful deep web research tool that extracts content from 100+ sources, scrapes full article text, filters relevant content, and generates AI-powered conclusions.
Features
- 100+ Sources: Searches DuckDuckGo for up to 100 results
- Full Content Extraction: Scrapes full article text using Trafilatura
- Smart Content Filtering: Removes ads/boilerplate, keeps only relevant sentences
- AI Conclusions: Generates synthesized final verdicts with probability predictions
- Multi-AI Providers: OpenRouter (free), Groq, Gemini, Ollama
- Markdown Export: Saves complete reports with all sources to ~/Documents/ACHEM/
- Multi-language: Supports English, French, and Arabic
- Rate Limit Retry: Automatic retry on 429 errors
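The retry-on-429 behavior listed above can be sketched as follows. This is a minimal illustration, not ACHEM's actual code: the function name `fetch_with_retry`, its parameters, and the exponential backoff schedule are all assumptions.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, max_retries=3, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, retrying on HTTP 429 (hypothetical helper, not ACHEM's API)."""
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Retry only rate-limit responses; re-raise anything else,
            # and give up once the retry budget is spent.
            if err.code != 429 or attempt == max_retries:
                raise
            # Use Retry-After when it is given in whole seconds,
            # otherwise back off exponentially (assumed policy).
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

The `opener` and `sleep` parameters exist only so the retry logic can be exercised without network access or real waiting.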
Installation
Prerequisites
- Python 3.10 or higher
- uv package manager (recommended)
Quick Install
git clone https://github.com/sarok-exe/achem.git
cd achem
uv venv .venv && source .venv/bin/activate
uv pip install -e .
API Configuration
Create a config file at ~/.ACHEM/api.env or ~/Documents/ACHEM/api.env:
# OpenRouter (free, recommended)
OPENROUTER_API_KEY=your_openrouter_key_here
OPENROUTER_MODEL=google/gemma-4-31b-it:free
# Ollama (local AI)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2
OLLAMA_PRIMARY=false
Get OpenRouter API key: https://openrouter.ai/settings
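For readers who want to inspect or validate their config, here is a rough sketch of how a KEY=VALUE file like the one above could be read. The function names, the lookup order of the two paths, and the parsing rules are assumptions; ACHEM's own loader may behave differently.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as URLs stay intact.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def load_api_config(candidates=None):
    """Return settings from the first config file that exists, else {}."""
    if candidates is None:
        # Assumed lookup order: home-directory config first, then Documents.
        candidates = [
            Path.home() / ".ACHEM" / "api.env",
            Path.home() / "Documents" / "ACHEM" / "api.env",
        ]
    for path in candidates:
        if path.is_file():
            return parse_env_text(path.read_text(encoding="utf-8"))
    return {}
```

Values come back as strings, so a flag such as OLLAMA_PRIMARY would still need an explicit comparison against "true" or "false".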
Usage
Command Line
achem "your research query" --ddg-limit 100
Options
--ddg-limit N      Number of DuckDuckGo results (default: 100)
--mode ai          Use AI for conclusions (default)
--mode local       Use local TF-IDF (no API needed)
--lang en/fr/ar    Response language
--no-wikipedia     Skip Wikipedia sources
--no-cache         Skip the cache
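The options above map naturally onto Python's argparse. The sketch below only illustrates that mapping and is not taken from the ACHEM source; in particular, the default of "en" for --lang is an assumption, since the option list does not state one.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="achem")
    parser.add_argument("query", help="research query")
    parser.add_argument("--ddg-limit", type=int, default=100,
                        help="number of DuckDuckGo results")
    parser.add_argument("--mode", choices=["ai", "local"], default="ai",
                        help="ai: AI conclusions; local: TF-IDF, no API needed")
    # The default language is an assumption; the README does not give one.
    parser.add_argument("--lang", choices=["en", "fr", "ar"], default="en",
                        help="response language")
    parser.add_argument("--no-wikipedia", action="store_true",
                        help="skip Wikipedia sources")
    parser.add_argument("--no-cache", action="store_true",
                        help="skip the cache")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```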
How It Works
┌─────────────────────────────────────────────────────┐
│ 1. SEARCH (100+ sources) │
│ • DuckDuckGo web search │
│ • Prioritizes relevant content │
├─────────────────────────────────────────────────────┤
│ 2. SCRAPE (Full article text) │
│ • Extracts full content from URLs │
│ • Uses Trafilatura for clean text │
│ • Scrapes up to 100 pages concurrently │
├─────────────────────────────────────────────────────┤
│ 3. FILTER (Relevant content only) │
│ • Removes boilerplate and ads │
│ • Keeps sentences matching keywords │
│ • Deduplicates similar content │
├─────────────────────────────────────────────────────┤
│ 4. AI CONCLUSION │
│ • Analyzes all content │
│ • Generates final prediction │
│ • Includes probability percentages │
│ • Provides key reasons │
└─────────────────────────────────────────────────────┘
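Step 3 of the pipeline above (keep sentences matching keywords, drop duplicates) can be approximated in a few lines. This is a simplified stand-in, not ACHEM's filter: the function name, the `min_words` threshold, and the exact-match deduplication on normalized text are assumptions, and the real tool may use a fuzzier similarity measure.

```python
import re


def filter_sentences(text, keywords, min_words=4):
    """Keep sentences that mention a keyword, dropping normalized repeats."""
    wanted = {k.lower() for k in keywords}
    seen = set()
    kept = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"\w+", sentence.lower())
        if len(words) < min_words:
            continue  # very short fragments are treated as boilerplate
        if not wanted.intersection(words):
            continue  # no keyword present
        fingerprint = " ".join(words)
        if fingerprint in seen:
            continue  # same words already kept (case and punctuation ignored)
        seen.add(fingerprint)
        kept.append(sentence)
    return kept
```

Matching is done on whole words, so a keyword such as "solar" will not match "solarium".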
Output
Reports saved to ~/Documents/ACHEM/ include:
- AI Conclusion: Synthesized final prediction
- All Articles: Full extracted content from each source
- Keywords: Identified topics
- Extracted Web Content: Combined filtered content
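To make the report layout above concrete, here is a hedged sketch of how such a Markdown file could be assembled and written. The section order, heading levels, and the names `render_report` and `save_report` are assumptions for illustration; the files ACHEM actually writes may be structured differently.

```python
from pathlib import Path


def render_report(query, conclusion, keywords, articles):
    """Assemble a Markdown report from the parts listed in the README."""
    lines = [f"# {query}", "", "## AI Conclusion", "", conclusion, ""]
    lines += ["## Keywords", "", ", ".join(keywords), ""]
    lines += ["## All Articles", ""]
    for article in articles:
        # Each article is a dict with hypothetical keys: title, url, text.
        lines += [f"### {article['title']}", "",
                  f"Source: {article['url']}", "",
                  article["text"], ""]
    return "\n".join(lines)


def save_report(markdown, filename, directory=None):
    """Write the report, defaulting to ~/Documents/ACHEM/."""
    target = Path(directory) if directory else Path.home() / "Documents" / "ACHEM"
    target.mkdir(parents=True, exist_ok=True)
    path = target / filename
    path.write_text(markdown, encoding="utf-8")
    return path
```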
License
MIT License - see LICENSE file
Acknowledgments
- OpenRouter - Free AI models
- DuckDuckGo - Privacy-focused search
- Trafilatura - Web content extraction
- Sumy - Text summarization
Download files
File details
Details for the file achem-1.1.0.tar.gz.
File metadata
- Download URL: achem-1.1.0.tar.gz
- Upload date:
- Size: 48.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d2ac4630f07adb0ee0b99b2d5b3dd6e1df4663bc8e59bd67107e8e6b0e959fb |
| MD5 | bab5c8618553049db4d1db087a352946 |
| BLAKE2b-256 | 30be3a183283871182a2d9cf58feaf87290214a32e4dddc19c743c442cd022b8 |
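A downloaded archive can be checked against the SHA256 digest listed for it using only the standard library. The helper name `sha256_of` and the local file path are illustrative assumptions; the expected digest is the one published for achem-1.1.0.tar.gz.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for achem-1.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "5d2ac4630f07adb0ee0b99b2d5b3dd6e1df4663bc8e59bd67107e8e6b0e959fb"
)

# Example (assumes the archive is in the current directory):
# assert sha256_of("achem-1.1.0.tar.gz") == EXPECTED_SDIST_SHA256
```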
Provenance
The following attestation bundles were made for achem-1.1.0.tar.gz:
Publisher: release.yml on sarok-exe/achem
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: achem-1.1.0.tar.gz
  - Subject digest: 5d2ac4630f07adb0ee0b99b2d5b3dd6e1df4663bc8e59bd67107e8e6b0e959fb
- Sigstore transparency entry: 1290016624
- Sigstore integration time:
- Permalink: sarok-exe/achem@e054af55b8df2b9d180c303501789f6ec97a8ec1
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/sarok-exe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e054af55b8df2b9d180c303501789f6ec97a8ec1
- Trigger Event: push
File details
Details for the file achem-1.1.0-py3-none-any.whl.
File metadata
- Download URL: achem-1.1.0-py3-none-any.whl
- Upload date:
- Size: 59.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e76027bbabd0ebf6e017149be21637843d1df7398bb8d7652f341567a2d828be |
| MD5 | 816f6670dc045d3e5d7184fd8f3f51ad |
| BLAKE2b-256 | 418d324fc72c2b68e721613537db45318a08843beeb2b09168a7fec0049380ad |
Provenance
The following attestation bundles were made for achem-1.1.0-py3-none-any.whl:
Publisher: release.yml on sarok-exe/achem
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: achem-1.1.0-py3-none-any.whl
  - Subject digest: e76027bbabd0ebf6e017149be21637843d1df7398bb8d7652f341567a2d828be
- Sigstore transparency entry: 1290016754
- Sigstore integration time:
- Permalink: sarok-exe/achem@e054af55b8df2b9d180c303501789f6ec97a8ec1
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/sarok-exe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e054af55b8df2b9d180c303501789f6ec97a8ec1
- Trigger Event: push