Preprint article recommender

Project description

PreprintScout

PreprintScout is a tool designed to help researchers, including student researchers, discover preprints that can inform current and future research endeavors. It provides an effective means to stay updated on recent preprints, track current trends in your field, and explore trends in other disciplines. PreprintScout uses LLMs and NLP methods to create personalized recommendations.

Features

  • Recommendations: Receive tailored recommendations for recent academic preprints based on your unique background and interests.
  • LLMs and NLP: Use your preferred LLM (e.g., OpenAI GPT, Google Gemini) together with NLP methods (e.g., cosine similarity) to obtain the most relevant recommendations.
  • Interdisciplinary: Explore research both near to and distant from your home discipline, according to your preferences, including preprints from arxiv, OSF (which aggregates 25+ preprint services), and PhilArchive.
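
As a rough illustration of the cosine-similarity idea mentioned in the feature list, the sketch below ranks two short abstracts against a biography using plain word counts. It is a toy under stated assumptions: how PreprintScout actually represents text is not described here, and the helper names (tokenize, cosine_similarity, rank_abstracts) and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw word counts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def rank_abstracts(biography, abstracts):
    """Return (score, abstract) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(biography, a), a) for a in abstracts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical biography and abstracts, for illustration only.
    bio = "I study artificial intelligence for electric vehicles"
    samples = [
        "Vehicle-to-grid technology for electric vehicles",
        "Capturing European identity in survey research",
    ]
    for score, abstract in rank_abstracts(bio, samples):
        print(f"{score:.3f}  {abstract}")
```

Word counts are used here only to keep the example dependency-free; an embedding-based representation would follow the same ranking pattern.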

Installation

    pip install preprintscout
or
    git clone https://github.com/yourusername/PreprintScout.git

Dependencies

arxiv>=2.1.3
beautifulsoup4>=4.12.3
google-generativeai>=0.7.0
langdetect>=1.0.9
openai>=1.35.7
pandas>=2.2.2
pytz>=2024.1
Requests>=2.32.3
retry>=0.9.2
scikit_learn>=1.4.2

Run the application:

import preprintscout

preprintscout(your_short_biography, huggingface_api_key, openai_api_key="xxxxxx", google_api_key=None, interdisciplinary="3", output_path="/path/to/output")

Configuration

Required -- Write a short (150 to 200 words) biographical statement about your research, interests, academic background, and so forth.

This is used for creating recommendations that will be of interest to you!

your_short_biography = "I am a professor of engineering management. My research is in the application of artificial intelligence in managing engineering systems for electrical vehicles... "

Required -- Provide an API key from your HuggingFace account; it is needed to generate recommendations. For security, you can store the key in an environment variable rather than in your code, for example "${HF_API_KEY}".

huggingface_api_key = "xxxxxxxxxxxxxxxxxxx"

Required -- Provide at least one LLM API key, either OpenAI or Google Gemini.

  • Optional -- For LLM-based recommendations, include an API key from your OpenAI account. For security, you can also store this as an environment variable, for example "${OPENAI_API_KEY}".
openai_api_key =  "xxxxxxxxxxxxxxxxxxx"
  • Optional -- For LLM-based recommendations, include an API key from your Google account. For security, you can also store this as an environment variable, for example "${GOOGLE_API_KEY}".
google_api_key =  "xxxxxxxxxxxxxxxxxxx"
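
One way to keep keys out of source code is to read them from the environment before calling the package. The sketch below assumes the variable names used in the examples above (HF_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY); the helper load_api_keys is hypothetical and not part of PreprintScout.

```python
import os


def load_api_keys(env=None):
    """Read API keys from environment variables and check the requirements.

    The variable names follow the examples in this section; the helper
    itself is hypothetical and not part of PreprintScout.
    """
    env = os.environ if env is None else env
    keys = {
        "huggingface_api_key": env.get("HF_API_KEY"),
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "google_api_key": env.get("GOOGLE_API_KEY"),
    }
    # The HuggingFace key is required.
    if not keys["huggingface_api_key"]:
        raise ValueError("HF_API_KEY is required")
    # At least one LLM key (OpenAI or Google Gemini) must be present.
    if not (keys["openai_api_key"] or keys["google_api_key"]):
        raise ValueError("Set OPENAI_API_KEY or GOOGLE_API_KEY")
    return keys
```

The returned values can then be passed as the huggingface_api_key, openai_api_key, and google_api_key arguments.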

Optional -- Using 1 to 4, indicate your interest in research outside of your home discipline

  • 1 = connected disciplines
  • 2 = adjacent disciplines (default)
  • 3 = tangential disciplines
  • 4 = peripheral disciplines
interdisciplinary = "3"
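
If the setting comes from user input or a config file, it may help to validate it first. This is a minimal sketch assuming the value should end up as one of the strings "1" to "4", as in the example above; whether PreprintScout also accepts integers is not stated, and normalize_level is a hypothetical helper.

```python
# Labels copied from the list above.
INTERDISCIPLINARY_LEVELS = {
    "1": "connected disciplines",
    "2": "adjacent disciplines",
    "3": "tangential disciplines",
    "4": "peripheral disciplines",
}


def normalize_level(value="2"):
    """Return the setting as a string "1" to "4", or raise ValueError.

    Hypothetical helper; "2" mirrors the documented default.
    """
    level = str(value).strip()
    if level not in INTERDISCIPLINARY_LEVELS:
        raise ValueError(f"interdisciplinary must be 1 to 4, got {value!r}")
    return level
```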

Optional -- To save JSON copies of the recommendations, include an output path.

output_path = "/path/to/output"
This will create the directory if it doesn't exist yet. Be sure the path has a leading slash (that is, it is an absolute path). For instance, on a Mac you could use "/Users/your_name/recommendations/preprints".
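
To catch a relative path before passing it along, a small check with pathlib can be used. This is only a sketch: prepare_output_dir is a hypothetical helper, and creating the directory here simply mirrors what the package is described as doing itself.

```python
from pathlib import Path


def prepare_output_dir(path):
    """Expand "~", require an absolute path, and create the directory.

    Hypothetical helper, not part of PreprintScout.
    """
    directory = Path(path).expanduser()
    if not directory.is_absolute():
        raise ValueError(f"output path must be absolute: {path!r}")
    # parents=True creates missing parent folders; exist_ok avoids an error on reruns.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

The string form of the result, str(prepare_output_dir(...)), can then be given as output_path.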

Here is a complete example (note that optional arguments must be passed by keyword in the function call):

import preprintscout as pps

your_short_biography = "I am a professor of engineering management. My research is in the application of artificial intelligence in managing engineering systems for electrical vehicles... "
huggingface_api_key = "xxxxxxxxxxxxxxxxxxx"

pps(your_short_biography, huggingface_api_key, openai_api_key="xxxxxx", google_api_key=None, interdisciplinary="3", output_path="/path/to/output")

Output

The output is JSON grouped by type of recommendation, with 5 recommendations for each category. Here is an example with one recommendation for each category and shortened descriptions.

  • arxiv_llm_recs = recommendations of recent preprints from arxiv.org determined by your selected LLM and based on your profile description.
  • osf_phil_llm_recs = recommendations of recent preprints from OSF.io and PhilArchive.org determined by your selected LLM and based on your profile description.
  • arxiv_cosine_ranked = recommendations of recent preprints from arxiv.org determined by the cosine similarity of the article abstract and your profile description.
  • osf_phil_cosine_ranked = recommendations of recent preprints from OSF.io and PhilArchive.org determined by the cosine similarity of the article abstract and your profile description.
  • interdisciplinary_ranked = a reranking of all recent preprints (arxiv, OSF, and PhilArchive) based on the semantic distance from your home discipline and your interdisciplinary interest setting.
  • score = cosine similarity of the article abstract and your profile description.
  • dissim_value = semantic distance between the article abstract and your home discipline.
{
    "arxiv_llm_recs": [
        {
            "article": [
                "1.2 Computer and information sciences",
                "http://arxiv.org/abs/2406.19334v1",
                "Multi-RIS-Empowered Multiple Access: A Distributed Sum-Rate Maximization Approach",
                "The plethora of wirelessly connected devices, whose deployment density ...",
                "This article presents a new communication scheme for 6G wireless networks..."
            ],
            "dissim_value": 0.120610034
        }
    ],
    "osf_phil_llm_recs": [
        {
            "article": [
                "5.1 Psychology",
                "https://osf.io/preprints/psyarxiv/fqzd2/",
                "Who is We: Capturing (European) Identity Content by Integrating Qualitative Methods in Survey-Based Approaches",
                "European identity can mean different things to different people. Yet, past quantitative research ...",
                "This article presents two methods to assess European identity content that can be implemented in survey research."
            ],
            "dissim_value": 0.455544972
        }
    ],
    "arxiv_cosine_ranked": [
        {
            "article": [
                "1.2 Computer and information sciences",
                "http://arxiv.org/abs/2406.19296v1",
                "Vehicle-to-Grid Technology meets Packetized Energy Management",
                "The global energy landscape is experiencing a significant transformation...",
                "This article presents a co-simulation platform to investigate integration of V2G with PET in microgrid..."
            ],
            "score": 0.18289190091265142,
            "dissim_value": 0.120610034
        }
    ],
    "osf_phil_cosine_ranked": [
        {
            "article": [
                "6.3 Philosophy ethics and religion",
                "https://philarchive.org/rec/JIAAAD",
                "AGGA: A Dataset of Academic Guidelines for Generative AIs.",
                "AGGA (Academic Guidelines for Generative AIs) is a dataset of 80 academic guidelines for the usage of generative AIs and large language models in academia...",
                "The article introduces a dataset of 80 academic guidelines for the usage of generative AIs...",
                "The article introduces a dataset of 80 academic guidelines for the usage of generative AIs..."
            ],
            "score": 0.33518822100626167,
            "dissim_value": 0.559099337
        }
    ],
    "interdisciplinary_ranked": [
        {
            "article": [
                "2.2 Electrical engineering; electronic engineering; information engineering",
                "http://arxiv.org/abs/2406.19305v1",
                "A Max Pressure Algorithm for Traffic Signals Considering Pedestrian Queues",
                "This paper proposes a novel max-pressure (MP) algorithm that incorporates\npedestrian traffic into the MP control architecture. Pedestrians are modeled as\nbeing included in one of two groups: those walking on sidewalks...",
                "This article proposes a novel max-pressure algorithm that incorporates pedestrian traffic..."
            ],
            "score": 0.12583606498802957,
            "dissim_value": 0.1
        }
    ]
}
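
The structure above can be read with the standard json module. The sketch below assumes, based only on the example, that each "article" list starts with the discipline, the URL, and the title, in that order; the sample string reuses one entry from the example, and summarize is a hypothetical helper.

```python
import json

# One entry copied from the example output above.
SAMPLE = """
{
    "arxiv_cosine_ranked": [
        {
            "article": [
                "1.2 Computer and information sciences",
                "http://arxiv.org/abs/2406.19296v1",
                "Vehicle-to-Grid Technology meets Packetized Energy Management",
                "The global energy landscape is experiencing a significant transformation...",
                "This article presents a co-simulation platform to investigate integration of V2G with PET in microgrid..."
            ],
            "score": 0.18289190091265142,
            "dissim_value": 0.120610034
        }
    ]
}
"""


def summarize(recommendations):
    """Yield (category, title, url, score) for every recommendation."""
    for category, items in recommendations.items():
        for item in items:
            # Assumed field order: discipline, URL, title, then longer texts.
            discipline, url, title = item["article"][:3]
            # LLM-based categories carry no "score" field, so default to None.
            yield category, title, url, item.get("score")


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(row)
```

For a saved file, json.load on the opened file would replace json.loads; the file name written to the output path is not documented here.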

Usage

  • Launch the app and explore recent preprints that are recommended based on your profile.
  • Get interdisciplinary recommendations from fields close or far from your home discipline.
  • Save preprints to your personalized output directory for easy access.
  • Set up a daily email of recommended preprints using GitHub Actions (link coming soon).

Known Issues

Check GitHub for known issues.

Contributing

Contributions are welcome! Please read our contributing guidelines for more details (coming soon).

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or feedback, please contact me on GitHub.

Troubleshooting

If you encounter issues importing 'preprintscout' after installing the package, follow these steps to find the installation path and update the 'PYTHONPATH'.

Step 1: Verify and find the installation path:

Run the 'pip show preprintscout' command in your terminal (not at the Python '>>>' prompt) and note the location it reports. For example, if the location is /Users/yourusername/.local/lib/python3.9/site-packages, you will use this path.
pip show preprintscout

Step 2: Update PYTHONPATH:

Temporarily update PYTHONPATH in your terminal session:
export PYTHONPATH=$PYTHONPATH:/Users/yourusername/.local/lib/python3.9/site-packages
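
To confirm whether the interpreter you are running can see the package, the standard library can report where a module would be imported from. This is a small diagnostic sketch; locate_package is a hypothetical helper name.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # A mismatch between this interpreter and the one pip installed into
    # is a common cause of import errors.
    print("Interpreter:", sys.executable)
    print("preprintscout:", locate_package("preprintscout"))
```

If the second line prints None, the package is not on this interpreter's search path, and the PYTHONPATH step above (or installing with the same interpreter via 'python -m pip install preprintscout') may help.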

Download files

Source Distribution

preprintscout-0.1.4.tar.gz (44.4 kB)

Uploaded Source

Built Distribution

preprintscout-0.1.4-py2.py3-none-any.whl (46.7 kB)

Uploaded Python 2 Python 3
