A package for creating ML research assistant models through paper dataset creation and model fine-tuning

PaperTuner

PaperTuner is a Python package for building research assistant models. It processes academic papers into a question-answer dataset and fine-tunes a language model on that dataset so the model can give guidance on research methodology and approaches.

Features

  • Automated extraction of research papers from arXiv
  • Section extraction to identify problem statements, methodologies, and results
  • Generation of high-quality question-answer pairs for research methodology
  • Fine-tuning of language models with GRPO (Group Relative Policy Optimization)
  • Integration with Hugging Face for dataset and model sharing
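
To make the section-extraction feature more concrete, here is a minimal sketch of heading-based splitting. It is an illustration only: the `split_sections` and `classify_heading` helpers, the regular expression, and the heading keywords are all assumptions, not PaperTuner's actual implementation.

```python
import re
from typing import Dict, List, Optional

# Hypothetical heading keywords, mapped to the three section kinds named above.
SECTION_KEYWORDS = {
    "problem": ("introduction", "problem statement", "motivation"),
    "methodology": ("method", "methods", "methodology", "approach"),
    "results": ("results", "experiments", "evaluation"),
}

# Matches a heading-only line such as "3. Methodology" or "Results".
HEADING_RE = re.compile(r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?([A-Za-z][A-Za-z ]*?)\s*$")


def classify_heading(line: str) -> Optional[str]:
    """Return the section kind for a heading line, or None if it is not a known heading."""
    match = HEADING_RE.match(line)
    if not match:
        return None
    title = match.group(1).lower()
    for kind, keywords in SECTION_KEYWORDS.items():
        if title in keywords:
            return kind
    return None


def split_sections(text: str) -> Dict[str, str]:
    """Group non-empty body lines under the most recent recognized heading."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        kind = classify_heading(line)
        if kind is not None:
            current = kind
            sections.setdefault(kind, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {kind: " ".join(lines) for kind, lines in sections.items()}
```

Real papers have far messier headings than this sketch handles, so a production pipeline would likely need fuzzier matching or a model-based classifier.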

Installation

pip install papertuner

Basic Usage

As a Command-Line Tool

1. Create a dataset from research papers

# Set up your environment variables
export GEMINI_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"  # Optional, for uploading to HF

# Run the dataset creation
papertuner-dataset --max-papers 100
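
For readers curious what fetching papers from arXiv can look like, the sketch below builds a query URL for the public arXiv API and parses the Atom feed it returns. Whether PaperTuner uses this endpoint internally is an assumption, and `build_query_url` and `parse_feed` are hypothetical helper names, not part of the package.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint
ATOM = "{http://www.w3.org/2005/Atom}"           # Atom XML namespace prefix


def build_query_url(category: str, max_results: int) -> str:
    """Build an arXiv API URL that lists papers in one category."""
    params = {
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_text: str) -> list:
    """Extract id, title, and abstract from each entry of an Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        papers.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            # Collapse the line breaks and repeated spaces arXiv puts in text fields.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
        })
    return papers
```

To run it against the live service, pass the URL from `build_query_url("cs.LG", 100)` to `urllib.request.urlopen`, decode the response body, and hand the text to `parse_feed`.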

2. Train a model

# Train using the created or an existing dataset
papertuner-train --model "Qwen/Qwen2.5-3B-Instruct" --dataset "densud2/ml_qa_dataset"
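
The training step relies on GRPO, which samples several completions for the same prompt, scores each one, and normalizes every reward against its own group. The sketch below shows only that normalization step in plain Python. It is not PaperTuner's training code: the function name, the `eps` stabilizer, and the use of the population standard deviation are assumptions, and real implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one score per completion sampled for a single prompt.
    `eps` (an assumed stabilizer) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions that score above their group's average get a positive advantage and are reinforced; those below get a negative one. When every completion in a group scores the same, all advantages are zero and that prompt contributes no learning signal.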

As a Python Library

from papertuner import ResearchPaperProcessor, ResearchAssistantTrainer

# Create a dataset
processor = ResearchPaperProcessor(
    api_key="your-api-key",
    hf_repo_id="your-username/dataset-name"
)
papers = processor.process_papers(max_papers=10)

# Train a model
trainer = ResearchAssistantTrainer(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    lora_rank=64,
    output_dir="./model_output"
)
results = trainer.train("your-username/dataset-name")

# Test the model
question = "How would you design a transformer model for time series forecasting?"
response = trainer.run_inference(
    results["model"],
    results["tokenizer"],
    question,
    results["lora_path"]
)
print(response)

Configuration

You can configure PaperTuner through environment variables or by passing the equivalent arguments when initializing the classes:

  • GEMINI_API_KEY: API key for generating QA pairs
  • HF_TOKEN: Hugging Face token for uploading datasets and models
  • HF_REPO_ID: Hugging Face repository ID for the dataset
  • PAPERTUNER_DATA_DIR: Custom directory for storing data (default: ~/.papertuner/data)
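
One way to picture how these settings combine is the sketch below, which resolves each value from an explicit argument first and the environment second. The `resolve_settings` helper is hypothetical, and the precedence it uses (explicit argument over environment variable) is an assumption rather than documented PaperTuner behavior. Only the variable names and the default data directory come from the list above.

```python
import os
from pathlib import Path

DEFAULT_DATA_DIR = "~/.papertuner/data"  # default stated in the list above


def resolve_settings(api_key=None, hf_token=None, hf_repo_id=None, data_dir=None, env=None):
    """Merge explicit arguments with environment variables (assumed precedence)."""
    env = os.environ if env is None else env
    return {
        "api_key": api_key or env.get("GEMINI_API_KEY"),
        "hf_token": hf_token or env.get("HF_TOKEN"),
        "hf_repo_id": hf_repo_id or env.get("HF_REPO_ID"),
        # Expand "~" so the result is an absolute path under the user's home directory.
        "data_dir": Path(
            data_dir or env.get("PAPERTUNER_DATA_DIR", DEFAULT_DATA_DIR)
        ).expanduser(),
    }
```

Passing a plain dictionary as `env` makes the helper easy to test without touching the real process environment.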

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
