llama-index readers semanticscholar integration

Semantic Scholar Loader

pip install llama-index-readers-semanticscholar

pip install llama-index-llms-openai

Welcome to Semantic Scholar Loader. This loader lets researchers and practitioners pull scholarly articles and publications from the Semantic Scholar database.

Given a research topic, the loader runs a Semantic Scholar search and reads the relevant papers from the results into Documents.

See the demo notebook, demo_s2.ipynb, for a complete example.

Some preliminaries:

  • query_space : the broad area of research used to search for papers
  • query_string : a specific question to ask of the documents in the query space
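The two terms map onto two separate steps: query_space decides which papers are loaded, and query_string is asked only against those papers. The sketch below illustrates that split with injected callables so it runs without network access. The names `ask_papers`, `load_papers`, and `build_query_engine` are hypothetical and are not part of the loader.

```python
from typing import Any, Callable, List


def ask_papers(
    load_papers: Callable[[str, int], List[Any]],
    build_query_engine: Callable[[List[Any]], Any],
    query_space: str,
    query_string: str,
    limit: int = 10,
) -> Any:
    """Load papers for a broad query_space, then ask one specific query_string."""
    # Step 1: the broad research area decides which papers are fetched.
    documents = load_papers(query_space, limit)
    # Step 2: the specific question is asked only against those papers.
    engine = build_query_engine(documents)
    return engine.query(query_string)
```

With the real reader, `load_papers` could be something like `lambda q, n: SemanticScholarReader().load_data(query=q, limit=n)`, and `build_query_engine` could wrap the index and query engine construction shown in the Usage section.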

UPDATE:

To download the open-access PDFs and extract text from them, set the full_text flag to True:

s2reader = SemanticScholarReader()
total_papers = 10  # number of papers to fetch
documents = s2reader.load_data(query_space, total_papers, full_text=True)
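Full text can only come from papers that list an open-access PDF, so it may be useful to check which papers qualify. The sketch below partitions metadata dictionaries on the openAccessPdf key, which appears in the sample output later in this document. It assumes that a missing or None value means no PDF is available; the helper name `split_by_open_access` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Metadata = Dict[str, Any]


def split_by_open_access(
    metadata_list: List[Metadata],
) -> Tuple[List[Metadata], List[Metadata]]:
    """Partition paper metadata by whether an open-access PDF is listed."""
    with_pdf: List[Metadata] = []
    without_pdf: List[Metadata] = []
    for metadata in metadata_list:
        # Assumption: a missing or None "openAccessPdf" means no PDF is available.
        if metadata.get("openAccessPdf"):
            with_pdf.append(metadata)
        else:
            without_pdf.append(metadata)
    return with_pdf, without_pdf


# Hypothetical metadata; the URL below is a placeholder, not a real paper link.
papers = [
    {"title": "Paper A", "openAccessPdf": "https://example.org/a.pdf"},
    {"title": "Paper B", "openAccessPdf": None},
]
available, unavailable = split_by_open_access(papers)
print([p["title"] for p in available])
print([p["title"] for p in unavailable])
```

With loaded documents, the input could be built as `[doc.metadata for doc in documents]`.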

Usage

Here is an example of how to use this loader in llama_index and get citations for a given query.

LlamaIndex

from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.core import Settings, VectorStoreIndex
from llama_index.readers.semanticscholar import SemanticScholarReader

s2reader = SemanticScholarReader()

# narrow down the search space
query_space = "large language models"

# increase limit to get more documents
documents = s2reader.load_data(query=query_space, limit=10)

# ServiceContext is deprecated in recent llama-index releases;
# the global Settings object configures the LLM instead
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
index = VectorStoreIndex.from_documents(documents)

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)

# query the index
response = query_engine.query("limitations of using large language models")
print("Answer: ", response)
print("Source nodes: ")
for node in response.source_nodes:
    print(node.node.metadata)

Output

Answer:  The limitations of using large language models include the struggle to learn long-tail knowledge [2], the need for scaling by many orders of magnitude to reach competitive performance on questions with little support in the pre-training data [2], and the difficulty in synthesizing complex programs from natural language descriptions [3].
Source nodes:
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '3eed4de25636ac90f39f6e1ef70e3507ed61a2a6', 'citationCount': 35, 'openAccessPdf': None, 'authors': ['M. Shanahan'], 'title': 'Talking About Large Language Models'}
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '6491980820d9c255b9d798874c8fce696750e0d9', 'citationCount': 31, 'openAccessPdf': None, 'authors': ['Nikhil Kandpal', 'H. Deng', 'Adam Roberts', 'Eric Wallace', 'Colin Raffel'], 'title': 'Large Language Models Struggle to Learn Long-Tail Knowledge'}
{'venue': 'arXiv.org', 'year': 2021, 'paperId': 'a38e0f993e4805ba8a9beae4c275c91ffcec01df', 'citationCount': 305, 'openAccessPdf': None, 'authors': ['Jacob Austin', 'Augustus Odena', 'Maxwell Nye', 'Maarten Bosma', 'H. Michalewski', 'David Dohan', 'Ellen Jiang', 'Carrie J. Cai', 'Michael Terry', 'Quoc V. Le', 'Charles Sutton'], 'title': 'Program Synthesis with Large Language Models'}
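
In the sample output, the bracketed markers in the answer appear to follow the order of the source nodes ([2] is the second node, [3] the third). Under that assumption, the metadata can be turned into a numbered reference list. This is a minimal sketch that uses only the keys visible in the output above; the helper names `format_reference` and `format_references` are hypothetical.

```python
from typing import Any, Dict, List


def format_reference(index: int, metadata: Dict[str, Any]) -> str:
    """Render one source-node metadata dict as a numbered reference line."""
    authors = metadata.get("authors") or []
    # Shorten long author lists to "First Author et al."
    if len(authors) > 2:
        author_text = f"{authors[0]} et al."
    else:
        author_text = " and ".join(authors) or "Unknown author"
    title = metadata.get("title") or "Untitled"
    year = metadata.get("year") or "n.d."
    venue = metadata.get("venue") or "unknown venue"
    return f"[{index}] {author_text} ({year}). {title}. {venue}."


def format_references(metadata_list: List[Dict[str, Any]]) -> List[str]:
    """Number references from 1, following the order of the source nodes."""
    return [format_reference(i, m) for i, m in enumerate(metadata_list, start=1)]


# Metadata copied (abridged) from the sample output above.
sample = [
    {"venue": "arXiv.org", "year": 2022, "authors": ["M. Shanahan"],
     "title": "Talking About Large Language Models"},
    {"venue": "arXiv.org", "year": 2022,
     "authors": ["Nikhil Kandpal", "H. Deng", "Adam Roberts", "Eric Wallace", "Colin Raffel"],
     "title": "Large Language Models Struggle to Learn Long-Tail Knowledge"},
]
for line in format_references(sample):
    print(line)
```

With a real response, the input could be built as `[node.node.metadata for node in response.source_nodes]`.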
