A lexical, ultra-fast cross-platform search engine

Smart Lexical Search Engine

A powerful desktop search application that searches inside PDF and Word documents based on content keywords given by the user.

Features

Core Features

  • Content-based search: Find documents by keywords inside them
  • Multi-format support: PDF and DOCX files
  • Fast search: ~0.02 seconds after initial indexing
  • Smart ranking: Results sorted by relevance
  • Auto-complete: Suggests keywords dynamically as you type

Enhanced Features (New)

  • Standalone Executable: No Python needed! Run straight from your desktop.
  • Cross-Platform Package: Install via pip install smartlex-search
  • Modern UI: Clean, beautifully designed Qt interface with dynamic status indicators.
  • Parallel processing: Uses multithreading for blazingly fast background indexing.
  • Automated Setup: Completely self-contained, handles NLTK and indexing without scripts!

Installation

Method 1: The Standalone App (No Python Required!)

  1. Go to the Releases page of this GitHub repository.
  2. Download SmartLex.exe.
  3. Double-click it! The app will automatically handle indexing and open right up.

Method 2: Terminal Package (For Developers)

You can install this directly into your Python environment as a global package.

  1. Install via pip
pip install smartlex-search
  2. Run it from anywhere
smartlex

(Note: To install the developer version from source, simply clone the repo and run pip install -e .)


Usage

First Run (Indexing)

Whether you use the .exe or the smartlex terminal command, the application automatically detects whether this is its first run. If so, it launches a background scanner that sweeps your drives for PDF and Word documents, extracts their keywords with the RAKE algorithm, and builds your local index.

⏱️ First run: Depends on the number of files (usually a few minutes)
⏱️ Subsequent runs: Instant! The UI will pop open in less than a second.

Regular Usage

  1. Type keywords in the modern search bar (e.g., "machine learning").
  2. The dynamic auto-complete will suggest words based specifically on the contents of your own files!
  3. Press Enter or click Search.
  4. Click or double-click any beautifully formatted result to instantly open the document.
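
The auto-complete step could be sketched as a prefix lookup over the words stored in the index. This is only an illustration, assuming the index maps each file path to its keyword list (as described under Index Storage); the function names `build_vocabulary` and `suggest` are hypothetical and not taken from the SmartLex source.

```python
from bisect import bisect_left


def build_vocabulary(index):
    """Collect every distinct keyword in the index, sorted for prefix lookup."""
    return sorted({word for keywords in index.values() for word in keywords})


def suggest(vocabulary, prefix, limit=5):
    """Return up to `limit` vocabulary words that start with `prefix`."""
    prefix = prefix.lower()
    # In a sorted list, all words sharing a prefix sit next to each other,
    # starting at the position where the prefix itself would be inserted.
    start = bisect_left(vocabulary, prefix)
    matches = []
    for word in vocabulary[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches
```

Because the vocabulary is built from the user's own index, suggestions are limited to words that actually occur in the indexed files.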

Keyboard Shortcuts

  • Enter: Search / Open selected document
  • Tab: Switch between search bar and results
  • ↑/↓: Navigate through results
  • Ctrl+F: Focus the search bar
  • Esc: Clear search / Close application

How It Works

Architecture

User Query → RAKE Extraction → Keyword Matching → Ranking → Results
                                       ↑
                Document Index ← Multithreading ← Automated File Scanner

Step-by-Step Process

1. File Collection (First Run Only)

  • The internal Python scanner recursively sweeps all connected system drives.
  • It strictly filters out system folders (Windows, Program Files, .git) for maximum speed.
  • It collects all PDF and DOCX file paths.
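
A minimal sketch of this collection step, using only the standard library. The exclusion set below contains just the three folders named above, and `collect_documents` is a hypothetical name; the real scanner may filter more folders and is structured differently.

```python
import os
from pathlib import Path

# Assumption: only the folders named in this README; the real list may be longer.
EXCLUDED_DIRS = {"windows", "program files", ".git"}
DOCUMENT_SUFFIXES = {".pdf", ".docx"}


def collect_documents(root):
    """Return sorted paths of PDF and DOCX files under root, skipping excluded folders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded folders.
        dirnames[:] = [d for d in dirnames if d.lower() not in EXCLUDED_DIRS]
        for name in filenames:
            if Path(name).suffix.lower() in DOCUMENT_SUFFIXES:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Pruning `dirnames` in place is what gives the speed-up: excluded trees are never traversed at all, rather than being walked and filtered afterwards.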

2. Parallel Indexing (First Run Only)

  • Multiple background threads parse the documents simultaneously to maximize CPU efficiency.
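
One way to picture the parallel step is a thread pool that maps a keyword extractor over the collected paths. This is a sketch under assumptions: `build_index` and the `extract_keywords` callable are hypothetical stand-ins for whatever parser SmartLex uses, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(paths, extract_keywords, max_workers=4):
    """Map each path to its keyword list, parsing documents in worker threads."""
    index = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        for path, keywords in zip(paths, pool.map(extract_keywords, paths)):
            index[path] = keywords
    return index
```

In CPython, threads mainly overlap file reads and other blocking work, so the gain depends on how much of the parsing time is spent waiting on I/O.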

3. Keyword Extraction (RAKE Algorithm)

Input: "Machine learning algorithms use neural networks"
        ↓
Split at stopwords ("use"): "Machine learning algorithms" | "neural networks"
        ↓
Extract phrases: ["machine learning algorithms", "neural networks"]
        ↓
Split to words: ["machine", "learning", "algorithms", "neural", "networks"]
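
The phrase-splitting idea shown above can be sketched in a few lines of plain Python. This is a simplified illustration, not the SmartLex implementation: the stopword set is a tiny made-up list, the function names are hypothetical, and the phrase scoring that full RAKE performs is omitted.

```python
import re

# Tiny illustrative stopword list; an assumption, not the list SmartLex ships with.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "use", "with"}


def extract_phrases(text):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def extract_keywords(text):
    """Flatten the candidate phrases into individual words."""
    return [word for phrase in extract_phrases(text) for word in phrase.split()]
```

With this stopword list, the example sentence above yields the same two phrases and five words as the walkthrough.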

4. Index Storage

The system builds an ultra-fast lookup dictionary (output.json):

{
    "C:/docs/paper1.pdf": ["machine", "learning", "neural", "network"],
    "C:/docs/paper2.pdf": ["algorithm", "optimization", "training"]
}
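
Persisting a dictionary of this shape takes only the standard `json` module. The helpers below are a hedged sketch: `save_index` and `load_index` are hypothetical names, and the file location is left to the caller rather than assumed.

```python
import json


def save_index(index, path):
    """Write the path -> keywords dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(index, handle, indent=4)


def load_index(path):
    """Read the dictionary back; lists and strings round-trip unchanged."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Loading the file once at startup and keeping the dictionary in memory is what would make later searches cheap, since no document has to be reopened.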

5. Search Process

User types: "neural network training"
        ↓
Extract keywords: ["neural", "network", "training"]
        ↓
Find matching documents:
  paper1.pdf: 2 matches (neural, network)
  paper2.pdf: 1 match (training)
        ↓
Rank by relevance:
  1. paper1.pdf (score: 2)
  2. paper2.pdf (score: 1)
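
The matching and ranking shown above amount to counting how many query keywords each document contains. A minimal sketch, assuming the index maps paths to keyword lists; the function name `search` and the tie-breaking rule are assumptions, not confirmed details of SmartLex.

```python
def search(index, query_keywords):
    """Return (path, score) pairs, where score counts distinct matched keywords."""
    wanted = set(query_keywords)
    scored = []
    for path, keywords in index.items():
        score = len(wanted & set(keywords))
        if score:
            scored.append((path, score))
    # Highest score first; ties fall back to path order for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored
```

Called with the sample index from Index Storage and the keywords `["neural", "network", "training"]`, this returns paper1.pdf with score 2 followed by paper2.pdf with score 1, matching the ranking in the walkthrough.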

Project Structure

SmartLex/
│
├── pyproject.toml         # Package configuration
├── run.py                 # PyInstaller entry point
│
├── src/smartlex/
│   ├── main.py            # Core application entry
│   ├── core/              # NLP, parsing, and indexing logic
│   └── gui/               # PyQt5 interface and background threads
│
├── requirements.txt
├── config.json
└── README.md

Sample Screenshots

  • 🪟 Initial Window
  • 📄 Output Window

Future Enhancements

  • Support for additional file types such as TXT, CSV, XLSX, PPTX, and HTML is in progress.
  • Similar search functionality for images and audio files is also being developed.
