Skip to main content

llama-index readers semanticscholar integration

Project description

Semantic Scholar Loader

pip install llama-index-readers-semanticscholar

pip install llama-index-llms-openai

Welcome to Semantic Scholar Loader. This module serves as a crucial utility for researchers and professionals looking to get scholarly articles and publications from the Semantic Scholar database.

For any research topic you are interested in, this loader reads relevant papers from a search result in Semantic Scholar into Documents.

Please go through demo_s2.ipynb

Some preliminaries -

  • query_space : broad area of research
  • query_string : a specific question to the documents in the query space

UPDATE :

To download the open access pdfs and extract text from them, simply mark the full_text flag as True :

s2reader = SemanticScholarReader()
documents = s2reader.load_data(query_space, total_papers, full_text=True)

Usage

Here is an example of how to use this loader in llama_index and get citations for a given query.

LlamaIndex

from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.core import VectorStoreIndex, ServiceContext
from llama_index.readers.semanticscholar import SemanticScholarReader

s2reader = SemanticScholarReader()

# narrow down the search space
query_space = "large language models"

# increase limit to get more documents
documents = s2reader.load_data(query=query_space, limit=10)

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0)
)
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)

# query the index
response = query_engine.query("limitations of using large language models")
print("Answer: ", response)
print("Source nodes: ")
for node in response.source_nodes:
    print(node.node.metadata)

Output

Answer:  The limitations of using large language models include the struggle to learn long-tail knowledge [2], the need for scaling by many orders of magnitude to reach competitive performance on questions with little support in the pre-training data [2], and the difficulty in synthesizing complex programs from natural language descriptions [3].
Source nodes:
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '3eed4de25636ac90f39f6e1ef70e3507ed61a2a6', 'citationCount': 35, 'openAccessPdf': None, 'authors': ['M. Shanahan'], 'title': 'Talking About Large Language Models'}
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '6491980820d9c255b9d798874c8fce696750e0d9', 'citationCount': 31, 'openAccessPdf': None, 'authors': ['Nikhil Kandpal', 'H. Deng', 'Adam Roberts', 'Eric Wallace', 'Colin Raffel'], 'title': 'Large Language Models Struggle to Learn Long-Tail Knowledge'}
{'venue': 'arXiv.org', 'year': 2021, 'paperId': 'a38e0f993e4805ba8a9beae4c275c91ffcec01df', 'citationCount': 305, 'openAccessPdf': None, 'authors': ['Jacob Austin', 'Augustus Odena', 'Maxwell Nye', 'Maarten Bosma', 'H. Michalewski', 'David Dohan', 'Ellen Jiang', 'Carrie J. Cai', 'Michael Terry', 'Quoc V. Le', 'Charles Sutton'], 'title': 'Program Synthesis with Large Language Models'}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_index_readers_semanticscholar-0.4.0.tar.gz (6.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file llama_index_readers_semanticscholar-0.4.0.tar.gz.

File metadata

File hashes

Hashes for llama_index_readers_semanticscholar-0.4.0.tar.gz
Algorithm Hash digest
SHA256 e05c8b83d20c4fb94a1f942f01a248bdd565658cfe77d1ed62bc2f8b1da8297e
MD5 0f066519b9f604d657038de034ffafa1
BLAKE2b-256 78e08cba68020e6c2b614bfb99f44d34be59a1c13c043ea6094814d7c4c921ab

See more details on using hashes here.

File details

Details for the file llama_index_readers_semanticscholar-0.4.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llama_index_readers_semanticscholar-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 03c021e2c736432c603da3fec0681e243053fa49c2323baa190a31c4bf116965
MD5 3887665700953acbdef4efcf05282ab9
BLAKE2b-256 0d08fbd8f53e3aca2acde5ae761a99750f39cb2378371a1ae3cc64b2757c2c2e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page