
llama-index readers semanticscholar integration


Semantic Scholar Loader

pip install llama-index-readers-semanticscholar

pip install llama-index-llms-openai

Semantic Scholar Loader helps researchers and professionals retrieve scholarly articles and publications from the Semantic Scholar database.

Given a research topic, the loader runs a Semantic Scholar search and reads the matching papers into Documents.

For a complete walkthrough, see demo_s2.ipynb.

Some preliminaries:

  • query_space: the broad area of research, passed to the loader as the search query
  • query_string: a specific question asked of the documents in the query space

UPDATE:

To download the open-access PDFs and extract their text, set the full_text flag to True:

s2reader = SemanticScholarReader()
# total_papers is the maximum number of papers to retrieve (the `limit` argument)
documents = s2reader.load_data(query=query_space, limit=total_papers, full_text=True)
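Not every paper comes with an open-access PDF: in the sample output later on this page, every source node shows 'openAccessPdf': None. The sketch below shows one way to check which papers could have full text at all. It works on plain dicts standing in for each Document's metadata; the helper names and the placeholder URL are hypothetical, and the exact shape of a non-empty openAccessPdf value is an assumption, so the check only tests whether the field is present and non-empty.

```python
from typing import Any, Dict, List, Tuple


def has_open_access_pdf(metadata: Dict[str, Any]) -> bool:
    """Return True when the metadata carries a non-empty openAccessPdf entry."""
    return bool(metadata.get("openAccessPdf"))


def split_by_pdf_availability(
    metadatas: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Partition metadata dicts into (with PDF, without PDF), preserving order."""
    with_pdf = [m for m in metadatas if has_open_access_pdf(m)]
    without_pdf = [m for m in metadatas if not has_open_access_pdf(m)]
    return with_pdf, without_pdf


# Stand-in metadata. The second entry and its URL are made up for illustration.
sample = [
    {"title": "Talking About Large Language Models", "openAccessPdf": None},
    {
        "title": "Hypothetical open-access paper",
        "openAccessPdf": "https://example.org/paper.pdf",
    },
]

with_pdf, without_pdf = split_by_pdf_availability(sample)
print(len(with_pdf), len(without_pdf))
```

With real results, the same check could be applied to [doc.metadata for doc in documents] to see how many of the returned papers are candidates for full-text extraction.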

Usage

Here is an example of how to use this loader in llama_index and get citations for a given query.

LlamaIndex

from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.core import Settings, VectorStoreIndex
from llama_index.readers.semanticscholar import SemanticScholarReader

s2reader = SemanticScholarReader()

# narrow down the search space
query_space = "large language models"

# increase limit to get more documents
documents = s2reader.load_data(query=query_space, limit=10)

# ServiceContext has been removed from recent llama-index-core releases;
# configure the LLM globally through Settings instead
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
index = VectorStoreIndex.from_documents(documents)

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)

# query the index
response = query_engine.query("limitations of using large language models")
print("Answer: ", response)
print("Source nodes: ")
for node in response.source_nodes:
    print(node.node.metadata)

Output

Answer:  The limitations of using large language models include the struggle to learn long-tail knowledge [2], the need for scaling by many orders of magnitude to reach competitive performance on questions with little support in the pre-training data [2], and the difficulty in synthesizing complex programs from natural language descriptions [3].
Source nodes:
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '3eed4de25636ac90f39f6e1ef70e3507ed61a2a6', 'citationCount': 35, 'openAccessPdf': None, 'authors': ['M. Shanahan'], 'title': 'Talking About Large Language Models'}
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '6491980820d9c255b9d798874c8fce696750e0d9', 'citationCount': 31, 'openAccessPdf': None, 'authors': ['Nikhil Kandpal', 'H. Deng', 'Adam Roberts', 'Eric Wallace', 'Colin Raffel'], 'title': 'Large Language Models Struggle to Learn Long-Tail Knowledge'}
{'venue': 'arXiv.org', 'year': 2021, 'paperId': 'a38e0f993e4805ba8a9beae4c275c91ffcec01df', 'citationCount': 305, 'openAccessPdf': None, 'authors': ['Jacob Austin', 'Augustus Odena', 'Maxwell Nye', 'Maarten Bosma', 'H. Michalewski', 'David Dohan', 'Ellen Jiang', 'Carrie J. Cai', 'Michael Terry', 'Quoc V. Le', 'Charles Sutton'], 'title': 'Program Synthesis with Large Language Models'}
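The source-node metadata printed above can be turned into a short reference list to accompany the answer. The following is a minimal sketch, not part of the loader: format_citation is a hypothetical helper, the "et al." rule for more than two authors is an arbitrary choice, and it assumes the bracketed numbers in the answer follow the order of response.source_nodes, as they appear to in the output above.

```python
from typing import Any, Dict, List


def format_citation(index: int, metadata: Dict[str, Any]) -> str:
    """Build a one-line reference from a source node's metadata dict."""
    authors: List[str] = metadata.get("authors") or []
    if len(authors) > 2:
        # Abbreviate long author lists to the first author.
        author_text = f"{authors[0]} et al."
    elif authors:
        author_text = " and ".join(authors)
    else:
        author_text = "Unknown author"
    title = metadata.get("title", "Untitled")
    year = metadata.get("year", "n.d.")
    venue = metadata.get("venue", "")
    suffix = f", {venue}" if venue else ""
    return f"[{index}] {author_text} ({year}). {title}{suffix}"


# Two of the metadata dicts from the output above, trimmed to the fields used.
source_metadata = [
    {
        "venue": "arXiv.org",
        "year": 2022,
        "authors": ["M. Shanahan"],
        "title": "Talking About Large Language Models",
    },
    {
        "venue": "arXiv.org",
        "year": 2022,
        "authors": ["Nikhil Kandpal", "H. Deng", "Adam Roberts", "Eric Wallace", "Colin Raffel"],
        "title": "Large Language Models Struggle to Learn Long-Tail Knowledge",
    },
]

for i, metadata in enumerate(source_metadata, start=1):
    print(format_citation(i, metadata))
```

In the usage example, the same loop could run over [node.node.metadata for node in response.source_nodes] instead of the hard-coded list.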
