
An intelligent literature review tool that uses AI-powered embeddings to find the most relevant research papers based on your research interests.


SmartReview

License: MIT | Python 3.8+

SmartReview is an AI-powered literature review tool that uses OpenAI text embeddings to rank a large corpus of research papers by how closely they match a free-text description of your research interests.


Features

  • 🔍 Semantic ranking – embed every paper (title + abstract) and your interest statement, then rank by cosine similarity.
  • 📊 Flexible top-K selection – choose a fixed K or derive it automatically (e.g. the top 20% by similarity score).
  • 💾 Multiple export formats – CSV, Excel (.xlsx), and BibTeX (.bib).
  • 🗄️ Embedding cache – save and reload embeddings with pickle so you don't re-call the API on every run.
  • 🔑 Safe API-key handling – reads OPENAI_API_KEY from the environment (or a .env file) and raises a clear error if it is missing.
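
One possible reading of the "derive it automatically" option is to take K as a percentage of the corpus size. The sketch below illustrates that idea only; `derive_k` and its `percent` parameter are hypothetical names and are not confirmed to be part of the smartreview API.

```python
def derive_k(num_papers, percent=20, min_k=1):
    """Return how many papers fall in the top `percent` of a ranked corpus.

    Hypothetical helper: integer ceiling division avoids float rounding,
    and the result is clamped to the range [min_k, num_papers].
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    if num_papers <= 0:
        return 0
    k = (num_papers * percent + 99) // 100  # ceil(num_papers * percent / 100)
    return min(num_papers, max(min_k, k))
```

A value computed this way could then be passed as the `k` argument of `get_top_k_papers`.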

Installation

pip install smartreview

For development / editable installs:

git clone https://github.com/geonextgis/smartreview.git
cd smartreview
pip install -e .

Quick Start

1 – Set your OpenAI API key

# Option A: environment variable
export OPENAI_API_KEY="sk-..."

# Option B: .env file (recommended)
echo 'OPENAI_API_KEY=sk-...' > .env
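
To check that the key is visible to Python before making any API calls, a small guard like the one below can help. This is a minimal sketch, not smartreview's own implementation; `require_api_key` is a hypothetical name, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os


def require_api_key(env=None):
    """Return OPENAI_API_KEY, or raise ValueError with a clear message."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise ValueError(
            "OPENAI_API_KEY is not set. Export it in your shell or add it to a .env file."
        )
    return key
```

When using a .env file, call `load_dotenv()` first so the variable is present in `os.environ`.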

2 – Generate embeddings and find top papers

from dotenv import load_dotenv
import pandas as pd
from smartreview import (
    create_openai_client, get_embedding,
    calculate_cosine_similarity, get_top_k_papers,
    create_top_k_dataframe, save_top_k_papers,
    generate_bibtex_file, save_embeddings,
)

load_dotenv()  # reads OPENAI_API_KEY from .env

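# 1. Load your Web of Science export
# (reading a legacy .xls file with pandas requires the xlrd package; openpyxl only reads .xlsx)
data = pd.read_excel("data/papers.xls")
# Map each row index to its (title, abstract) pair
summary = {i: (row["Article Title"], row["Abstract"]) for i, row in data.iterrows()}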

# 2. Create OpenAI client
client = create_openai_client()  # raises ValueError if key is missing

# 3. Embed all papers
paper_embeddings = {}
for idx, (title, abstract) in summary.items():
    # Skip a missing title or abstract (NaN) instead of failing on str + float
    text = " ".join(str(part) for part in (title, abstract) if pd.notna(part))
    paper_embeddings[idx] = get_embedding(text, client=client)

# 4. Embed your research interest
interest_text = "Machine learning for crop yield prediction using remote sensing data."
interest_embedding = get_embedding(interest_text, client=client)

# 5. Save embeddings (avoids re-calling the API next time)
save_embeddings(paper_embeddings, interest_embedding, interest_text)

# 6. Rank papers
similarities = calculate_cosine_similarity(interest_embedding, paper_embeddings)
top_k = get_top_k_papers(similarities, k=100)

# 7. Export
df = create_top_k_dataframe(top_k, data, summary)
save_top_k_papers(df, output_dir="data", k=100)
generate_bibtex_file(df, output_dir="data", k=100)
print("Done! Check the data/ folder for your results.")

3 – Re-use cached embeddings

from dotenv import load_dotenv
from smartreview import load_embeddings, calculate_cosine_similarity, get_top_k_papers

load_dotenv()
paper_embeddings, interest_embedding, interest_text = load_embeddings()
similarities = calculate_cosine_similarity(interest_embedding, paper_embeddings)
top_k = get_top_k_papers(similarities, k=50)

API Reference

OpenAI helpers (smartreview.embeddings)

Function | Description
create_openai_client(api_key=None) | Return an openai.OpenAI client; reads OPENAI_API_KEY from env if api_key is omitted.
get_embedding(text, client=None, model="text-embedding-3-large") | Embed a single string and return a NumPy array.
get_embeddings_batch(texts, client=None, ...) | Embed a list of strings with optional progress logging.
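
For orientation, batch embedding with the OpenAI Python client generally comes down to calling `client.embeddings.create` with a list of inputs. The sketch below shows that pattern under stated assumptions: `embed_in_batches` and `batch_size` are hypothetical names, it returns plain Python lists rather than NumPy arrays, and it is not the actual body of `get_embeddings_batch`.

```python
def embed_in_batches(texts, client, model="text-embedding-3-large", batch_size=100):
    """Embed `texts` in chunks and return one vector (list of floats) per text.

    `client` is expected to be an openai.OpenAI instance, or any object that
    exposes the same `embeddings.create(model=..., input=...)` method.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        # Each returned item carries an `index` into the request batch;
        # sorting by it keeps output order aligned with input order.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors
```

Sending several texts per request reduces the number of API round trips compared with embedding papers one at a time.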

Similarity (smartreview.smartreview)

Function | Description
calculate_cosine_similarity(query_emb, paper_emb_dict) | Return a list of (idx, score) tuples sorted by descending similarity.
get_top_k_papers(similarities, k=100) | Slice the top-K entries from a similarity list.
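
The ranking step can be pictured as follows. This is a dependency-free sketch of cosine-similarity ranking, written in plain Python so it runs without NumPy or scikit-learn; `rank_by_cosine` is a hypothetical name, and the package's own implementation may differ (its requirements list scikit-learn for this step).

```python
import math


def rank_by_cosine(query, papers):
    """Return [(idx, score), ...] sorted by descending cosine similarity.

    `query` is a sequence of floats; `papers` maps an index to a vector
    of the same length. Zero-length vectors get a score of 0.0.
    """
    def norm(vec):
        return math.sqrt(sum(x * x for x in vec))

    query_norm = norm(query)
    scores = []
    for idx, vec in papers.items():
        denom = query_norm * norm(vec)
        dot = sum(a * b for a, b in zip(query, vec))
        scores.append((idx, dot / denom if denom else 0.0))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores
```

Selecting the top K is then a slice of the sorted list, for example `rank_by_cosine(q, papers)[:k]`.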

DataFrame & Export

Function | Description
create_top_k_dataframe(top_k, data, summary) | Build a ranked pd.DataFrame from top-K results.
save_top_k_papers(df, output_dir, k) | Write CSV + Excel files; returns a dict of file paths.
print_top_k_summary(df, k, show_rows) | Pretty-print a summary table.
generate_bibtex_file(df, output_dir, k) | Write a .bib file; returns a dict with path and entry count.
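
To give a sense of what a BibTeX export involves, the sketch below formats a single `@article` entry from a dictionary of fields. It is an illustration only: `to_bibtex_entry`, the citation key, and the field names are assumptions, and how `generate_bibtex_file` maps DataFrame columns to BibTeX fields is not documented here.

```python
def to_bibtex_entry(key, fields):
    """Format one @article entry; fields with empty values are skipped.

    Hypothetical helper for illustration. It does not escape special
    characters such as unbalanced braces inside values.
    """
    lines = [f"@article{{{key},"]
    for name, value in fields.items():
        if value not in (None, ""):
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder values, not taken from any real dataset.
print(to_bibtex_entry("paper1", {"title": "An example title", "year": 2020, "doi": ""}))
```

Writing a .bib file then amounts to joining one such entry per ranked paper with blank lines in between.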

Embedding Persistence

Function | Description
save_embeddings(paper_emb, interest_emb, interest_text, output_dir) | Pickle embeddings to output_dir.
load_embeddings(output_dir) | Load and return (paper_emb, interest_emb, interest_text).
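
The caching idea can be sketched with the standard library alone. The example below is a minimal round trip under stated assumptions: `save_bundle`, `load_bundle`, and the `embeddings.pkl` file name are hypothetical, and the on-disk layout used by `save_embeddings` may be different.

```python
import pickle
from pathlib import Path


def save_bundle(paper_emb, interest_emb, interest_text, output_dir, filename="embeddings.pkl"):
    """Pickle the three objects into output_dir and return the file path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / filename
    bundle = {
        "paper_emb": paper_emb,
        "interest_emb": interest_emb,
        "interest_text": interest_text,
    }
    with path.open("wb") as fh:
        pickle.dump(bundle, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_bundle(output_dir, filename="embeddings.pkl"):
    """Load and return (paper_emb, interest_emb, interest_text)."""
    with (Path(output_dir) / filename).open("rb") as fh:
        bundle = pickle.load(fh)
    return bundle["paper_emb"], bundle["interest_emb"], bundle["interest_text"]
```

Because unpickling can execute arbitrary code, only load cache files that you created yourself or otherwise trust.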

Example Notebook

An end-to-end walkthrough is provided in docs/examples/example.ipynb.
Place your Web of Science .xls export in docs/examples/data/ before running.


Requirements

Package | Purpose
openai | Text embeddings via the OpenAI API
numpy | Numerical arrays
pandas | DataFrame I/O
scikit-learn | Cosine similarity
tiktoken | Token counting
openpyxl | Excel export
python-dotenv | .env file support

License

MIT © Krishnagopal Halder
