Preprint article recommender
Project description
PreprintScout
PreprintScout is a tool designed to help researchers, including student researchers, discover preprints that can inform current and future research endeavors. It provides an effective means to stay updated on recent preprints, track current trends in your field, and explore trends in other disciplines. PreprintScout uses LLMs and NLP methods to create personalized recommendations.
Features
- Recommendations: Receive tailored recommendations for recent academic preprints based on your unique background and interests.
- LLMs and NLP: Leverage your preferred LLMs (e.g., OpenAI GPT, Google Gemini) and NLP methods (e.g., cosine similarity) to obtain the most relevant recommendations
- Interdiscipinary: Explore research both near and distance from your home discipline based on your preferences; including preprints from arxiv, OSF (which aggregates 25+ preprint services), and PhilArchive.
Installation
pip install preprintscout
or
git clone https://github.com/yourusername/PreprintScout.git
Dependencies
arxiv>=2.1.3
beautifulsoup4>=4.12.3
google-generativeai>=0.7.0
langdetect>=1.0.9
openai>=1.35.7
pandas>=2.2.2
pytz>=2024.1
Requests>=2.32.3
retry>=0.9.2
scikit_learn>=1.4.2
Run the application:
import preprintscout
preprintscout(your_short_biography, huggingface_api_key, openai_api_key="xxxxxx", google_api_key=None, interdisciplinary="3", output_path="/path/to/output")
Configuration
Required -- Write a short (150 to 200 words) biographical statement about your research, interests, academic background, and so forth.
This is used for creating recommendations that will be of interest to you!
your_short_biography = "I am a professor of engineering management. My research is in the application of artificial intelligence in managing engineering systems for electical vehicles... "
Required -- For recommendations include an API key from your HuggingFace account. For security you can also store this as an environment variable. For example, "${HF_API_KEY}"
huggingface_api_key = "xxxxxxxxxxxxxxxxxxx"
Required -- Must have one, either an OpenAI or Google Gemini API key.
-
Optional -- For LLM based recommendations include an API key from your OpenAI account. For security you can also store this as an environment variable. For example, "${OPENAI_API_KEY}"
openai_api_key = "xxxxxxxxxxxxxxxxxxx"
-
Optional -- For LLM based recommendations include an API key from your OpenAI account. For security you can also store this as an environment variable. For example, "${GOOGLE_API_KEY}"
google_api_key = "xxxxxxxxxxxxxxxxxxx"
Optional -- Using 1 to 4, indicate your interest in research outside of your home discipline
-
1 = connected disciplines
-
2 = adjacent disciplines (default)
-
3 = tangential disciplines
-
4 = peripheral disciplines
interdisciplinary = "3"
Optional -- You can save JSON copies of recommendations include the path.
output_path = "/path/to/output"
This will create the directory if it doesn't exist yet. Be sure that has a the leading slash. For instance, on a Mac you could have ""/Users/your_name/recommendations/preprints"
Here is a complete example (note that opetional arguments have to be labeled in the function):
import preprintscout as pps
your_short_biography = "I am a professor of engineering management. My research is in the application of artificial intelligence in managing engineering systems for electical vehicles... "
huggingface_api_key = "xxxxxxxxxxxxxxxxxxx"
pps(your_short_biography, huggingface_api_key, openai_api_key="xxxxxx", google_api_key=None, interdisciplinary="3", output_path="/path/to/output")
Output
The return is a JSON file grouped by type of recommendation, with 5 recommendations for each category. Here an example with one recommendation for each category and reduced descriptions.
-
arxiv_llm_recs = recommendations of recent preprints from arxiv.org determined by your selected LLM and based on your profile description.
-
osf_phil_llm_recs = recommendations of recent preprints from OSF.io and PhilArchive.org determined by your selected LLM and based on your profile description.
-
arxiv_cosine_ranked = recommendations of recent preprints from arxiv.org determined by using cosine-similary of the article abstract and your profile description.
-
osf_phil_cosine_ranked = recommendations of recent preprints from OSF.io and PhilArchive.org determined by using cosine-similary of the article abstract and your profile description.
-
interdisciplinary_ranked = a reranking of all recent prepreints (arxiv, OSF, and PhilArchive) based on the semanitic distance from your home discipline and your interdiscplinary interest setting.
-
score = cosine-similarity of the article abstract and your profile description.
-
dissim_value = Semanitic distance between the article abstract and your home discipline.
{
"arxiv_llm_recs": [
{
"article": [
"1.2 Computer and information sciences",
"http://arxiv.org/abs/2406.19334v1",
"Multi-RIS-Empowered Multiple Access: A Distributed Sum-Rate Maximization Approach",
"The plethora of wirelessly connected devices, whose deployment density ...",
"This article presents a new communication scheme for 6G wireless networks..."
],
"dissim_value": 0.120610034
},
],
"osf_phil_llm_recs": [
{
"article": [
"5.1 Psychology",
"https://osf.io/preprints/psyarxiv/fqzd2/",
"Who is We: Capturing (European) Identity Content by Integrating Qualitative Methods in Survey-Based Approaches",
"European identity can mean different things to different people. Yet, past quantitative research ...",
"This article presents two methods to assess European identity content that can be implemented in survey research."
],
"dissim_value": 0.455544972
},
],
"arxiv_cosine_ranked": [
{
"article": [
"1.2 Computer and information sciences",
"http://arxiv.org/abs/2406.19296v1",
"Vehicle-to-Grid Technology meets Packetized Energy Management",
"The global energy landscape is experiencing a significant transformation...",
"This article presents a co-simulation platform to investigate integration of V2G with PET in microgrid..."
],
"score": 0.18289190091265142,
"dissim_value": 0.120610034
},
],
"osf_phil_cosine_ranked": [
{
"article": [
"6.3 Philosophy ethics and religion",
"https://philarchive.org/rec/JIAAAD",
"AGGA: A Dataset of Academic Guidelines for Generative AIs.",
"AGGA (Academic Guidelines for Generative AIs) is a dataset of 80 academic guidelines for the usage of generative AIs and large language models in academia...",
"The article introduces a dataset of 80 academic guidelines for the usage of generative AIs...",
"The article introduces a dataset of 80 academic guidelines for the usage of generative AIs..."
],
"score": 0.33518822100626167,
"dissim_value": 0.559099337
},
],
"interdisciplinary_ranked": [
{
"article": [
"2.2 Electrical engineering; electronic engineering; information engineering",
"http://arxiv.org/abs/2406.19305v1",
"A Max Pressure Algorithm for Traffic Signals Considering Pedestrian Queues",
"This paper proposes a novel max-pressure (MP) algorithm that incorporates\npedestrian traffic into the MP control architecture. Pedestrians are modeled as\nbeing included in one of two groups: those walking on sidewalks...",
"This article proposes a novel max-pressure algorithm that incorporates pedestrian traffic..."
],
"score": 0.12583606498802957,
"dissim_value": 0.1
},
]
}
Usage
- Launch the app and explore recent preprints that are recommended based on your profile.
- Get interdisciplinary recommendations from fields close or far from your home discipline.
- Save preprints to your personalized output directory for easy access.
- Easily add a daily email of recommended preprint using GitHub Actions (link coming soon)
Known Issues
Check Github.
Contributing
Contributions are welcome! Please read our contributing guidelines for more details (coming soon).
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contact
For questions or feedback, please contact me on Github.
Troubleshooting
If you encounter issues importing 'preprintscout' after installing the package, follow these steps to find the installation path and update the 'PYTHONPATH'.
Step 1: Verify and find the installation path:
Use the '>>> pip show preprintscout' command and note the location. For example, if the location is /Users/yourusername/.local/lib/python3.9/site-packages, you will use this path.
pip show preprintscout
Step 2: Update PYTHONPATH:
Temporarily update PYTHONPATH in your terminal session:
export PYTHONPATH=$PYTHONPATH:/Users/yourusername/.local/lib/python3.9/site-packages
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file preprintscout-0.1.4.tar.gz
.
File metadata
- Download URL: preprintscout-0.1.4.tar.gz
- Upload date:
- Size: 44.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.26.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7de71ea6b28c73652ba28eb89df2e7db0e59a57471fd78e0fc4c6f8a99831746 |
|
MD5 | e06f0b4202220ea69df33ce49a8ed4b2 |
|
BLAKE2b-256 | 703ceb24715b84f67d60c1edd3061acd1d4cd942f97116d48129bcb98d4a4529 |
File details
Details for the file preprintscout-0.1.4-py2.py3-none-any.whl
.
File metadata
- Download URL: preprintscout-0.1.4-py2.py3-none-any.whl
- Upload date:
- Size: 46.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.26.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ae2e174327cf40c772182c6e61f27ac14029cf02e5986df1b2f973d86ff86acf |
|
MD5 | d3e457ba1f9bb9e8316059e0febca75b |
|
BLAKE2b-256 | 421ee57dd9d20ffbef84dd5250ce7f1ed73d4822d8e4db81120795d088f9b90a |