llama-index readers semanticscholar integration

Semantic Scholar Loader

pip install llama-index-readers-semanticscholar

pip install llama-index-llms-openai

Welcome to Semantic Scholar Loader. This module retrieves scholarly articles and publications from the Semantic Scholar database for researchers and professionals.

For any research topic you are interested in, this loader reads relevant papers from a search result in Semantic Scholar into Documents.

See demo_s2.ipynb for a complete walkthrough.

Some preliminaries:

  • query_space : the broad area of research used to search Semantic Scholar
  • query_string : a specific question to ask of the documents in the query space

UPDATE:

To download the open-access PDFs and extract their text, set the full_text flag to True:

s2reader = SemanticScholarReader()
# total_papers is the maximum number of papers to fetch
documents = s2reader.load_data(query=query_space, limit=total_papers, full_text=True)
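
Only papers with an open-access PDF can contribute full text, so it can help to check which results have one. The sketch below is a minimal illustration, not part of the loader: the helper name split_by_open_access and the sample records are hypothetical. It assumes each record carries an openAccessPdf key, as in the sample output further down, and that a missing or None value means no PDF is available. The exact shape of the value when a PDF exists is also an assumption.

```python
def split_by_open_access(metadata_list):
    """Split metadata dicts into (with_pdf, without_pdf).

    Assumption: a falsy 'openAccessPdf' value (None, empty, or absent)
    means the paper has no open-access PDF.
    """
    with_pdf, without_pdf = [], []
    for meta in metadata_list:
        if meta.get("openAccessPdf"):
            with_pdf.append(meta)
        else:
            without_pdf.append(meta)
    return with_pdf, without_pdf


# Hypothetical records for illustration only
sample = [
    {"title": "Paper A", "openAccessPdf": "https://example.org/a.pdf"},
    {"title": "Paper B", "openAccessPdf": None},
    {"title": "Paper C"},
]

with_pdf, without_pdf = split_by_open_access(sample)
print("With PDF:", [m["title"] for m in with_pdf])
print("Without PDF:", [m["title"] for m in without_pdf])
```

With real results, the input would be [doc.metadata for doc in documents].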

Usage

Here is an example of how to use this loader in llama_index and get citations for a given query.

LlamaIndex

from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.core import Settings, VectorStoreIndex
from llama_index.readers.semanticscholar import SemanticScholarReader

s2reader = SemanticScholarReader()

# narrow down the search space
query_space = "large language models"

# increase limit to get more documents
documents = s2reader.load_data(query=query_space, limit=10)

# ServiceContext has been removed from recent llama-index-core releases;
# configure the LLM globally through Settings instead
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
index = VectorStoreIndex.from_documents(documents)

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)

# query the index
response = query_engine.query("limitations of using large language models")
print("Answer: ", response)
print("Source nodes: ")
for node in response.source_nodes:
    print(node.node.metadata)

Output

Answer:  The limitations of using large language models include the struggle to learn long-tail knowledge [2], the need for scaling by many orders of magnitude to reach competitive performance on questions with little support in the pre-training data [2], and the difficulty in synthesizing complex programs from natural language descriptions [3].
Source nodes:
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '3eed4de25636ac90f39f6e1ef70e3507ed61a2a6', 'citationCount': 35, 'openAccessPdf': None, 'authors': ['M. Shanahan'], 'title': 'Talking About Large Language Models'}
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '6491980820d9c255b9d798874c8fce696750e0d9', 'citationCount': 31, 'openAccessPdf': None, 'authors': ['Nikhil Kandpal', 'H. Deng', 'Adam Roberts', 'Eric Wallace', 'Colin Raffel'], 'title': 'Large Language Models Struggle to Learn Long-Tail Knowledge'}
{'venue': 'arXiv.org', 'year': 2021, 'paperId': 'a38e0f993e4805ba8a9beae4c275c91ffcec01df', 'citationCount': 305, 'openAccessPdf': None, 'authors': ['Jacob Austin', 'Augustus Odena', 'Maxwell Nye', 'Maarten Bosma', 'H. Michalewski', 'David Dohan', 'Ellen Jiang', 'Carrie J. Cai', 'Michael Terry', 'Quoc V. Le', 'Charles Sutton'], 'title': 'Program Synthesis with Large Language Models'}
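
The bracketed markers in the answer can be resolved back to the papers they cite. The sketch below is one way to do that, assuming marker [n] refers to the n-th entry of response.source_nodes (1-based), which is consistent with the sample output above. The helper names cited_sources and format_reference are hypothetical, and the metadata is copied from the output above with only the fields needed here.

```python
import re


def cited_sources(answer, source_metadata):
    """Map each [n] marker found in the answer to the n-th metadata dict.

    Assumption: markers are 1-based positions in the source node list.
    Markers outside that range are ignored.
    """
    cited = {}
    for match in re.finditer(r"\[(\d+)\]", answer):
        number = int(match.group(1))
        if 1 <= number <= len(source_metadata):
            cited[number] = source_metadata[number - 1]
    return dict(sorted(cited.items()))


def format_reference(number, meta):
    """Render one metadata dict as a single reference line."""
    authors = ", ".join(meta.get("authors") or ["Unknown author"])
    return (
        f"[{number}] {authors} ({meta.get('year')}). "
        f"{meta.get('title')}. {meta.get('venue')}."
    )


# Answer text and metadata taken from the sample output above
answer = (
    "The limitations of using large language models include the struggle "
    "to learn long-tail knowledge [2], the need for scaling by many orders "
    "of magnitude to reach competitive performance on questions with little "
    "support in the pre-training data [2], and the difficulty in "
    "synthesizing complex programs from natural language descriptions [3]."
)
source_metadata = [
    {"venue": "arXiv.org", "year": 2022, "authors": ["M. Shanahan"],
     "title": "Talking About Large Language Models"},
    {"venue": "arXiv.org", "year": 2022,
     "authors": ["Nikhil Kandpal", "H. Deng", "Adam Roberts",
                 "Eric Wallace", "Colin Raffel"],
     "title": "Large Language Models Struggle to Learn Long-Tail Knowledge"},
    {"venue": "arXiv.org", "year": 2021,
     "authors": ["Jacob Austin", "Augustus Odena", "Maxwell Nye",
                 "Maarten Bosma", "H. Michalewski", "David Dohan",
                 "Ellen Jiang", "Carrie J. Cai", "Michael Terry",
                 "Quoc V. Le", "Charles Sutton"],
     "title": "Program Synthesis with Large Language Models"},
]

for number, meta in cited_sources(answer, source_metadata).items():
    print(format_reference(number, meta))
```

With a live response, the inputs would be str(response) and [n.node.metadata for n in response.source_nodes]. Note that the first source was retrieved but never cited, so it does not appear in the resolved list.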
