A lexical, ultra-fast cross-platform search engine
Smart Lexical Search Engine
A desktop search application that searches inside PDF and Word documents for keywords supplied by the user.
Features
Core Features
- Content-based search: Find documents by keywords inside them
- Multi-format support: PDF and DOCX files
- Fast search: ~0.02 seconds after initial indexing
- Smart ranking: Results sorted by relevance
- Auto-complete: Suggests keywords dynamically as you type
Enhanced Features (New)
- Standalone Executable: No Python needed! Run straight from your desktop.
- Cross-Platform Package: Install via pip install smartlex-search
- Modern UI: Clean, beautifully designed Qt interface with dynamic status indicators.
- Parallel processing: Uses multithreading for blazingly fast background indexing.
- Automated Setup: Completely self-contained, handles NLTK and indexing without scripts!
Installation
Method 1: The Standalone App (No Python Required!)
- Go to the Releases page of this GitHub repository.
- Download SmartLex.exe.
- Double-click it! The app will automatically handle indexing and open right up.
Method 2: Terminal Package (For Developers)
You can install this directly into your Python environment as a global package.
- Install via pip
pip install smartlex-search
- Run it from anywhere
smartlex
(Note: To install the developer version from source, simply clone the repo and run pip install -e .)
Usage
First Run (Indexing)
Whether you use the .exe or the smartlex terminal command, the application automatically detects whether this is its first run. If it is, it launches a background scanner that sweeps your drives for PDF and Word documents, extracts their keywords with the RAKE algorithm, and builds your local index.
- ⏱️ First run: Depends on the number of files (usually a few minutes)
- ⏱️ Subsequent runs: Instant! The UI will pop open in less than a second.
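The first-run check could be as simple as testing whether a usable index file already exists. The following is a minimal sketch under that assumption, not the application's actual logic; the function name `needs_indexing` is hypothetical, and the index is assumed to be the JSON file (output.json) described under Index Storage.

```python
import json
from pathlib import Path


def needs_indexing(index_path: Path) -> bool:
    """Return True when no usable index exists yet (i.e. on a first run)."""
    if not index_path.is_file():
        return True
    try:
        with index_path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except (json.JSONDecodeError, OSError):
        # A corrupt or unreadable index is treated like a missing one.
        return True
    # Anything other than a non-empty mapping also triggers a rebuild.
    return not (isinstance(data, dict) and data)
```

With a check like this, the slow scan runs only when `needs_indexing(Path("output.json"))` is true; otherwise the UI can load the stored index directly.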
Regular Usage
- Type keywords in the modern search bar (e.g., "machine learning").
- The dynamic auto-complete will suggest words based specifically on the contents of your own files!
- Press Enter or click Search.
- Click or double-click any beautifully formatted result to instantly open the document.
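Auto-complete driven by your own files can be sketched as a prefix lookup over the keywords stored in the index. This is an illustration only; `build_vocabulary` and `suggest` are hypothetical names, and the real application may rank or filter suggestions differently.

```python
from bisect import bisect_left


def build_vocabulary(index: dict[str, list[str]]) -> list[str]:
    """Collect every indexed keyword once, sorted for prefix lookup."""
    return sorted({word for words in index.values() for word in words})


def suggest(prefix: str, vocabulary: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` keywords that start with `prefix`."""
    prefix = prefix.lower()
    if not prefix:
        return []
    matches = []
    # Binary search finds the first candidate; matches are contiguous after it.
    start = bisect_left(vocabulary, prefix)
    for word in vocabulary[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
        if len(matches) == limit:
            break
    return matches
```

Because the vocabulary is sorted, each lookup touches only the words that share the typed prefix, which keeps suggestions responsive while typing.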
Keyboard Shortcuts
- Enter: Search / Open selected document
- Tab: Switch between search bar and results
- ↑/↓: Navigate through results
- Ctrl+F: Focus the search bar
- Esc: Clear search / Close application
How It Works
Architecture
User Query → RAKE Extraction → Keyword Matching → Ranking → Results
↑
Document Index ← Multithreading ← Automated File Scanner
Step-by-Step Process
1️⃣ File Collection (First Run Only)
- The internal Python scanner recursively sweeps all connected system drives.
- It strictly filters out system folders (Windows, Program Files, .git) for maximum speed.
- It collects all PDF and DOCX file paths.
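A recursive scan with folder exclusion might look like the sketch below. It uses the standard library's `os.walk`; the names `EXCLUDED_DIRS` and `collect_documents` are hypothetical, and the excluded set contains only the three folders named above, whereas the real scanner may skip more.

```python
import os

# Folders named in the README; the actual exclusion list may be longer.
EXCLUDED_DIRS = {"Windows", "Program Files", ".git"}
EXTENSIONS = (".pdf", ".docx")


def collect_documents(root: str) -> list[str]:
    """Return the paths of all PDF and DOCX files under `root`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded folders.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if name.lower().endswith(EXTENSIONS):
                found.append(os.path.join(dirpath, name))
    return found
```

Pruning `dirnames` in place is what saves time: excluded trees are skipped entirely instead of being walked and filtered afterwards.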
2️⃣ Parallel Indexing (First Run Only)
- Multiple background threads parse the documents simultaneously to maximize CPU efficiency.
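One way to parse documents on several threads is a thread pool from the standard library. In this sketch, `index_documents` is a hypothetical name and `extract` is a placeholder callback standing in for whatever function reads a PDF or DOCX file and returns its keywords; neither is taken from the project's source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def index_documents(
    paths: list[str],
    extract: Callable[[str], list[str]],
    max_workers: int = 4,
) -> dict[str, list[str]]:
    """Run `extract` on every path using a pool of worker threads."""

    def safe_extract(path: str) -> list[str]:
        try:
            return extract(path)
        except Exception:
            # One unreadable file should not abort the whole indexing run.
            return []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `paths`.
        results = pool.map(safe_extract, paths)
        # Documents that produced no keywords are left out of the index.
        return {path: kws for path, kws in zip(paths, results) if kws}
```

In CPython, threads help most with the I/O-bound part of this work, such as reading files from disk.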
3️⃣ Keyword Extraction (RAKE Algorithm)
Input: "Machine learning algorithms use neural networks"
↓
Split at stopwords ("use"): "Machine learning algorithms | neural networks"
↓
Extract phrases: ["machine learning algorithms", "neural networks"]
↓
Split to words: ["machine", "learning", "algorithms", "neural", "networks"]
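The phrase-splitting step above can be sketched in plain Python. This is a simplified illustration of RAKE-style candidate extraction, not the project's code: the tiny `STOPWORDS` set is an assumption chosen to reproduce the example (a real run would use a full stopword list, for instance from NLTK), and full RAKE also scores phrases, which this sketch omits.

```python
import re

# Illustrative stopword list only; includes "use" to match the example above.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are", "use", "uses"}


def extract_phrases(text: str) -> list[str]:
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for chunk in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPWORDS:
                # A stopword ends the phrase collected so far.
                if current:
                    phrases.append(" ".join(current))
                    current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def extract_keywords(text: str) -> list[str]:
    """Flatten the phrases into unique words, keeping first-seen order."""
    words = (w for phrase in extract_phrases(text) for w in phrase.split())
    return list(dict.fromkeys(words))


print(extract_phrases("Machine learning algorithms use neural networks"))
# ['machine learning algorithms', 'neural networks']
```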
4️⃣ Index Storage
The system builds an ultra-fast lookup dictionary (output.json):
{
"C:/docs/paper1.pdf": ["machine", "learning", "neural", "network"],
"C:/docs/paper2.pdf": ["algorithm", "optimization", "training"]
}
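Writing and reading a dictionary of this shape needs only the standard `json` module. The helper names `save_index` and `load_index` below are hypothetical, and the sketch assumes the index is a plain mapping from file path to keyword list, as in the example above.

```python
import json
from pathlib import Path


def save_index(index: dict[str, list[str]], path: Path) -> None:
    """Write the path -> keywords mapping to disk as JSON."""
    # ensure_ascii=False keeps non-ASCII file names readable in the file.
    path.write_text(json.dumps(index, indent=2, ensure_ascii=False), encoding="utf-8")


def load_index(path: Path) -> dict[str, list[str]]:
    """Read the mapping back into memory for searching."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Loading the whole file into a dictionary once at startup means every later search runs against memory rather than disk.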
5️⃣ Search Process
User types: "neural network training"
↓
Extract keywords: ["neural", "network", "training"]
↓
Find matching documents:
paper1.pdf: 2 matches (neural, network)
paper2.pdf: 1 match (training)
↓
Rank by relevance:
1. paper1.pdf (score: 2)
2. paper2.pdf (score: 1)
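The scoring shown above, one point per query keyword found in a document, can be sketched as follows. The function name `search` and the alphabetical tie-break are assumptions made for illustration; the application's real ranking may weigh matches differently.

```python
def search(query_keywords: list[str], index: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank documents by how many distinct query keywords they contain."""
    wanted = set(query_keywords)
    scored = []
    for path, keywords in index.items():
        score = len(wanted & set(keywords))
        if score:
            scored.append((path, score))
    # Highest score first; ties are ordered by path so output is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


index = {
    "C:/docs/paper1.pdf": ["machine", "learning", "neural", "network"],
    "C:/docs/paper2.pdf": ["algorithm", "optimization", "training"],
}
print(search(["neural", "network", "training"], index))
# [('C:/docs/paper1.pdf', 2), ('C:/docs/paper2.pdf', 1)]
```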
Project Structure
SmartLex/
│
├── pyproject.toml # Package configuration
├── run.py # PyInstaller entry point
│
├── src/smartlex/
│ ├── main.py # Core application entry
│ ├── core/ # NLP, parsing, and indexing logic
│ └── gui/ # PyQt5 interface and background threads
│
├── requirements.txt
├── config.json
└── README.md
Reference
Sample Screenshots
- 🪟 Initial Window
- 📄 Output Window
Future Enhancements
- Support for additional file types such as TXT, CSV, XLSX, PPTX, and HTML is in progress.
- Similar search functionality for images, audio, and other media is also being developed.