A lexical, ultra-fast cross-platform search engine

These details have not been verified by PyPI

Project description

Smart Lexical Search Engine

A desktop search application that finds PDF and Word (DOCX) documents by the keywords in their content.

Features

Core Features

Content-based search: Find documents by keywords inside them
Multi-format support: PDF and DOCX files
Fast search: ~0.02 seconds after initial indexing
Smart ranking: Results sorted by relevance
Auto-complete: Suggests keywords dynamically as you type

Enhanced Features (New)

Standalone Executable: No Python needed! Run straight from your desktop.
Cross-Platform Package: Install via pip install smartlex-search
Modern UI: Clean, beautifully designed Qt interface with dynamic status indicators.
Parallel processing: Uses multithreading for blazingly fast background indexing.
Automated Setup: Completely self-contained, handles NLTK and indexing without scripts!

Installation

Method 1: The Standalone App (No Python Required!)

Go to the Releases page of this GitHub repository.
Download SmartLex.exe.
Double-click it! The app will automatically handle indexing and open right up.

Method 2: Terminal Package (For Developers)

You can install the package directly into your Python environment. This adds a smartlex command that you can run from any directory.

Install via pip

pip install smartlex-search

Run it from anywhere

smartlex

(Note: To install the developer version from source, simply clone the repo and run pip install -e .)

Usage

First Run (Indexing)

Whether you use the .exe or the smartlex terminal command, the application automatically detects whether this is its first run. If it is, it launches a background scanner that sweeps your drives for PDF and Word documents, extracts their keywords with the RAKE algorithm, and builds your local index.

⏱️ First run: Depends on the number of files (usually a few minutes)
⏱️ Subsequent runs: Instant! The UI will pop open in less than a second.
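One simple way to detect a first run is to check whether the index file already exists. The sketch below assumes the index lives in output.json in the working directory; the function name needs_indexing is hypothetical, and the application's real check may differ.

```python
from pathlib import Path


def needs_indexing(index_path="output.json"):
    """Return True when no usable index file exists yet (assumed first run)."""
    path = Path(index_path)
    # A missing or empty file is treated as "no index built so far".
    return not path.is_file() or path.stat().st_size == 0
```

If needs_indexing() returns True, the application would start the background scanner; otherwise it can load the existing index and open the UI right away.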

Regular Usage

Type keywords in the modern search bar (e.g., "machine learning").
The dynamic auto-complete will suggest words based specifically on the contents of your own files!
Press Enter or click Search.
Click or double-click any result to open the document.
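The auto-complete in step 2 can be pictured as a prefix lookup over the keywords already stored in the index. This is a minimal sketch, assuming lowercase keywords and an index shaped like the output.json example in this README; build_vocabulary and suggest are hypothetical names, not the application's API.

```python
from bisect import bisect_left


def build_vocabulary(index):
    """Collect every indexed keyword once, sorted for binary search."""
    return sorted({kw for keywords in index.values() for kw in keywords})


def suggest(vocabulary, prefix, limit=5):
    """Return up to `limit` keywords that start with `prefix`."""
    prefix = prefix.lower()
    # Words sharing a prefix sit next to each other in a sorted list,
    # so jump to the first candidate and read forward.
    start = bisect_left(vocabulary, prefix)
    matches = []
    for word in vocabulary[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches
```

Because the vocabulary is built from the user's own index, suggestions only ever contain words that occur in the indexed files.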

Keyboard Shortcuts

Enter: Search / Open selected document
Tab: Switch between search bar and results
↑/↓: Navigate through results
Ctrl+F: Focus the search bar
Esc: Clear search / Close application

How It Works

Architecture

User Query → RAKE Extraction → Keyword Matching → Ranking → Results
                                       ↑
                Document Index ← Multithreading ← Automated File Scanner

Step-by-Step Process

1️⃣ File Collection (First Run Only)

The internal Python scanner recursively sweeps all connected system drives.
It strictly filters out system folders (Windows, Program Files, .git) for maximum speed.
It collects all PDF and DOCX file paths.
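The collection step can be sketched with os.walk. The excluded folder names and file extensions come from the list above; the function name collect_documents and the exact matching rules are assumptions rather than the project's actual scanner.

```python
import os

# Folder names taken from the description above, compared case-insensitively.
EXCLUDED_DIRS = {"windows", "program files", ".git"}
EXTENSIONS = (".pdf", ".docx")


def collect_documents(root):
    """Return paths of PDF and DOCX files under root, skipping excluded folders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded folders.
        dirnames[:] = [d for d in dirnames if d.lower() not in EXCLUDED_DIRS]
        for name in filenames:
            if name.lower().endswith(EXTENSIONS):
                found.append(os.path.join(dirpath, name))
    return found
```

Pruning dirnames in place is what saves time: an excluded folder is skipped as a whole instead of being walked and filtered afterwards.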

2️⃣ Parallel Indexing (First Run Only)

Multiple background threads parse the documents simultaneously to maximize CPU efficiency.
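A thread pool is one way to parse several documents at once. The sketch below uses concurrent.futures from the standard library; parse_document is a hypothetical callable that turns a file path into a keyword list, and the worker count is an arbitrary default, so none of this should be read as the project's real indexing code.

```python
from concurrent.futures import ThreadPoolExecutor


def index_documents(paths, parse_document, max_workers=4):
    """Map each path to its keywords, parsing files on a thread pool.

    parse_document is a hypothetical callable: path -> list of keywords.
    Files that raise during parsing are left out of the index.
    """
    def safe_parse(path):
        try:
            return path, parse_document(path)
        except Exception:
            # One unreadable document should not abort the whole run.
            return path, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(safe_parse, paths)
        return {path: kws for path, kws in results if kws is not None}
```

In CPython, threads mainly help while workers wait on disk reads; CPU-heavy parsing is still limited by the global interpreter lock.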

3️⃣ Keyword Extraction (RAKE Algorithm)

Input: "Machine learning algorithms use neural networks"
        ↓
Remove stopwords ("use"): "Machine learning algorithms | neural networks"
        ↓
Extract phrases: ["machine learning algorithms", "neural networks"]
        ↓
Split to words: ["machine", "learning", "algorithms", "neural", "networks"]
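The flow above can be reproduced with a simplified RAKE-style splitter. This sketch covers only the phrase-splitting part: full RAKE also scores phrases by word frequency and degree, which is omitted here. The stopword set is a tiny illustrative list (the application presumably uses a fuller one, for example from NLTK), and the function names are hypothetical.

```python
import re

# Tiny illustrative stopword list; a real list is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "use", "uses"}


def extract_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def extract_keywords(text):
    """Flatten phrases into unique words, keeping first-seen order."""
    words = [w for phrase in extract_phrases(text) for w in phrase.split()]
    return list(dict.fromkeys(words))
```

Calling extract_phrases on the input sentence above yields the two phrases shown, and extract_keywords yields the five individual words.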

4️⃣ Index Storage

The system stores the index as a dictionary that maps each file path to its keywords (output.json):

{
    "C:/docs/paper1.pdf": ["machine", "learning", "neural", "network"],
    "C:/docs/paper2.pdf": ["algorithm", "optimization", "training"]
}
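Writing and reading an index of this shape needs only the standard json module. A minimal sketch, assuming the file name output.json from above; save_index and load_index are hypothetical helper names.

```python
import json


def save_index(index, path="output.json"):
    """Write the path -> keywords dictionary as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII file names readable in the file.
        json.dump(index, f, ensure_ascii=False, indent=4)


def load_index(path="output.json"):
    """Read the index back into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```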

5️⃣ Search Process

User types: "neural network training"
        ↓
Extract keywords: ["neural", "network", "training"]
        ↓
Find matching documents:
  paper1.pdf: 2 matches (neural, network)
  paper2.pdf: 1 match (training)
        ↓
Rank by relevance:
  1. paper1.pdf (score: 2)
  2. paper2.pdf (score: 1)
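The scoring in this example amounts to counting how many query keywords appear in each document's keyword list. The sketch below implements exactly that count and sorts by it; the function name search and the alphabetical tie-break are assumptions, and the application's actual ranking may weigh matches differently.

```python
def search(index, query_keywords):
    """Rank documents by the number of query keywords they contain."""
    wanted = set(query_keywords)
    scores = {}
    for path, keywords in index.items():
        score = len(wanted & set(keywords))
        if score:
            scores[path] = score
    # Highest score first; ties broken by path so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With the two-document index from the Index Storage step and the query keywords "neural", "network", and "training", paper1.pdf ranks first with a score of 2 and paper2.pdf second with a score of 1, matching the walkthrough above.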

Project Structure

SmartLex/
│
├── pyproject.toml         # Package configuration
├── run.py                 # PyInstaller entry point
│
├── src/smartlex/
│   ├── main.py            # Core application entry
│   ├── core/              # NLP, parsing, and indexing logic
│   └── gui/               # PyQt5 interface and background threads
│
├── requirements.txt
├── config.json
└── README.md

Reference

Base Paper / Reference PDF

Sample Screenshots

🪟 Initial Window

[Screenshot: Initial Window]

📄 Output Window

[Screenshot: Output Window]

Future enhancement

Support for additional file types such as TXT, CSV, XLSX, PPTX, and HTML is in progress.
Similar search functionality for images and audio is also being developed.

Project details

This version

1.0.0

May 10, 2026

Download files

Source Distribution

smartlex_search-1.0.0.tar.gz (16.3 kB view details)

Uploaded May 10, 2026 Source

Built Distribution

smartlex_search-1.0.0-py3-none-any.whl (16.6 kB view details)

Uploaded May 10, 2026 Python 3

File details

Details for the file smartlex_search-1.0.0.tar.gz.

File metadata

Download URL: smartlex_search-1.0.0.tar.gz
Upload date: May 10, 2026
Size: 16.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smartlex_search-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a75204edc8d2110c00f61c8b8a8f2e8654177c46558cff54a69946c2ba4e4ecd`
MD5	`8ac11d7f513672d02ad63f9244dd258f`
BLAKE2b-256	`c500d31661aeb61d84e170c3d7213d5cc78becb54eaf8936f14396b7f60270e2`

File details

Details for the file smartlex_search-1.0.0-py3-none-any.whl.

File metadata

Download URL: smartlex_search-1.0.0-py3-none-any.whl
Upload date: May 10, 2026
Size: 16.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smartlex_search-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`609a8b10344130d08e960d15eaf35f9d5c51c7a1a800bae5969b5f02ae1f0012`
MD5	`2a2a9eb2101f1ddec79cfe17da913031`
BLAKE2b-256	`96e076dd32da1b9b94c51315ef28a404ac0a9ae7b2e76e9f99155134414c52e2`
