A library for fetching scientific articles from various sources
Scista
Scista is a Python library for searching and downloading scientific articles from several sources, including OpenAlex, CORE, and Unpaywall.
Functionality
- Search for articles by topic, category and date
- Get article metadata (title, authors, DOI, etc.)
- Download full texts and PDF versions of articles
- Support for multiple data sources
Installation
pip install scista
Requirements
- Python 3.7+
- An API key for CORE (obtain one from the CORE API site)
- An email address for Unpaywall
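One way to keep the CORE key and the Unpaywall email out of source code is to read them from environment variables. The sketch below is only an illustration: `load_credentials` and the variable names `CORE_API_KEY` and `UNPAYWALL_EMAIL` are assumptions made for this example and are not defined by Scista.

```python
import os
from typing import Dict, Mapping


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the two credentials from environment variables.

    CORE_API_KEY and UNPAYWALL_EMAIL are hypothetical names chosen for
    this sketch; Scista itself does not read them.
    """
    required = ("CORE_API_KEY", "UNPAYWALL_EMAIL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Keys mirror the keyword arguments shown in the Usage section.
    return {
        "core_api_key": env["CORE_API_KEY"],
        "email_for_unpaywall": env["UNPAYWALL_EMAIL"],
    }
```

Because the returned keys match the keyword arguments shown in the Usage section, the result could be passed on as `ArticleFetcher(**load_credentials())`.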
Search Filters
The library provides several optional search filters that can be used individually or in combination:
articles = fetcher.fetch_articles(
    topic="quantum computing",  # Search by topic in title
    num_articles=5,  # Number of articles to fetch (default: 5)
    categories=["Physics"],  # Filter by scientific categories
    from_date="2023-01-01",  # Start date (format: YYYY-MM-DD)
    to_date="2023-12-31",  # End date (format: YYYY-MM-DD)
    sort_by_date=True,  # Sort by date (newest first if True)
    journals=["1234-5678"]  # Filter by journal ISSN(s)
)
All filters are optional. You can use any combination of them:
# Search only by topic
articles = fetcher.fetch_articles(topic="quantum computing")
# Search by category and date range
articles = fetcher.fetch_articles(
    categories=["Physics"],
    from_date="2023-01-01",
    to_date="2023-12-31"
)
# Get latest articles from specific journals
articles = fetcher.fetch_articles(
    journals=["1234-5678", "8765-4321"],
    sort_by_date=True,
    num_articles=10
)
Filter Details
- topic: Search for articles with this topic in the title
- num_articles: Maximum number of articles to fetch (default: 5)
- categories: Scientific categories to filter by. Can be a single category or a list
- from_date: Start date in YYYY-MM-DD format
- to_date: End date in YYYY-MM-DD format
- sort_by_date: If True, sorts by date descending (newest first)
- journals: Filter by journal ISSN(s). Can be a single ISSN or a list
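Since categories and journals accept either a single value or a list, and both dates must be in YYYY-MM-DD format, it can help to check the arguments before making a request. The following is a minimal sketch of such a check; `as_list`, `check_date`, and `build_filters` are hypothetical helpers written for this example and are not part of Scista, which may validate its inputs differently.

```python
from datetime import datetime
from typing import Dict, List, Optional, Union

StrOrList = Union[str, List[str]]


def as_list(value: Optional[StrOrList]) -> Optional[List[str]]:
    """Wrap a single string in a list; pass lists and None through."""
    if value is None:
        return None
    if isinstance(value, str):
        return [value]
    return list(value)


def check_date(value: Optional[str], name: str) -> Optional[str]:
    """Parse a YYYY-MM-DD string and return it in zero-padded ISO form."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date().isoformat()
    except ValueError as exc:
        raise ValueError(f"{name} must be in YYYY-MM-DD format, got {value!r}") from exc


def build_filters(categories=None, journals=None,
                  from_date=None, to_date=None) -> Dict[str, object]:
    """Return only the filters that were set, normalized and validated."""
    filters = {
        "categories": as_list(categories),
        "journals": as_list(journals),
        "from_date": check_date(from_date, "from_date"),
        "to_date": check_date(to_date, "to_date"),
    }
    start, end = filters["from_date"], filters["to_date"]
    # Zero-padded ISO dates compare correctly as plain strings.
    if start and end and start > end:
        raise ValueError("from_date must not be later than to_date")
    # Drop unset filters so the result can be passed as **kwargs.
    return {key: val for key, val in filters.items() if val is not None}
```

The resulting dictionary uses the same keyword names as the filters listed above, so it could be combined with a topic as `fetcher.fetch_articles(topic="quantum computing", **build_filters(categories="Physics"))`.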
Return Values
The fetch_articles() method returns a list of Article objects. Each Article object contains:
class Article:
    title: str  # Title of the article
    doi: str | None  # Digital Object Identifier (may be None)
    publication_date: str  # Publication date, typically YYYY-MM-DD (format may vary by source)
    text: str | None  # Full text or abstract (if available)
    pdf_url: str | None  # URL to download PDF (if available)
Example of returned data:
articles = fetcher.fetch_articles(topic="quantum computing", num_articles=1)
article = articles[0]
print(article)
# Output:
# Title: Quantum Computing: A New Era of Computation
# DOI: 10.1234/example.doi.2023
# Date: 2023-12-25
# Text: This paper explores the fundamentals of quantum computing...
# PDF URL: https://example.com/article.pdf
# Access individual fields
print(article.title)  # Get article title
print(article.doi)  # Get DOI
print(article.publication_date)  # Get publication date
print(article.text)  # Get full text/abstract
print(article.pdf_url)  # Get PDF URL
# Save PDF if available
if article.pdf_url:
    article.save_pdf("article.pdf")
Notes about returned data:
- text: Can contain either full text (from CORE) or abstract (from Unpaywall)
- pdf_url: URL for PDF download, available if the article is found in CORE or Unpaywall
- doi: May be None for some articles
- publication_date: Always provided, but format may vary depending on source
- Articles are returned in the order specified by the sort_by_date parameter
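Because the date format can vary and some fields may be None, downstream code may want to normalize records before using them. The sketch below illustrates one approach under stated assumptions: `ArticleRecord` is a stand-in dataclass that mirrors the documented fields (it is not imported from Scista), and the list of date formats is a guess for illustration rather than a statement of what each source returns.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ArticleRecord:
    """Stand-in with the documented Article fields; not Scista's class."""
    title: str
    doi: Optional[str]
    publication_date: str
    text: Optional[str] = None
    pdf_url: Optional[str] = None


# Formats assumed for illustration; extend to match what you actually see.
_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m", "%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no format matches.

    Partial dates are padded: a bare year becomes January 1 of that year.
    """
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def describe(article) -> str:
    """One-line summary that tolerates a missing DOI or text."""
    date = normalize_date(article.publication_date) or article.publication_date
    doi = article.doi if article.doi is not None else "no DOI"
    kind = "no text" if article.text is None else f"{len(article.text)} chars of text"
    return f"{article.title} ({date}, {doi}, {kind})"
```

`describe` only reads the `title`, `doi`, `publication_date`, and `text` attributes, so it should also work on any object exposing the fields described above.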
Methods
Article class methods
save_pdf(path: str) -> bool
Saves the article's PDF to a file if available.
article = articles[0]
success = article.save_pdf("article.pdf")
if success:
    print("PDF successfully saved")
else:
    print("Failed to save PDF or PDF not available")
Parameters:
path: Path where the PDF file will be saved
Returns:
bool: True if PDF was saved successfully, False if PDF is not available or there was an error
Error handling:
- Returns False if PDF URL is not available
- Returns False if download fails (HTTP error)
- Returns False if file cannot be written
- Logs appropriate error messages through the logging system
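To make the error-handling contract above concrete, here is a minimal sketch of a function that behaves the same way: it returns False and logs an error for a missing URL, a failed download, or an unwritable file. This is an illustration built on the standard library only. `save_pdf_sketch`, its `timeout` parameter, and the logger name are assumptions for this example and do not show how Scista implements `save_pdf`.

```python
import logging
import urllib.request
from typing import Optional

logger = logging.getLogger("scista_example")  # hypothetical logger name


def save_pdf_sketch(pdf_url: Optional[str], path: str, timeout: float = 30.0) -> bool:
    """Illustrative stand-in for the documented behavior, not Scista's code."""
    if not pdf_url:
        logger.error("No PDF URL available")
        return False
    try:
        # URLError and HTTPError are OSError subclasses; a malformed URL
        # raises ValueError before any request is made.
        with urllib.request.urlopen(pdf_url, timeout=timeout) as response:
            data = response.read()
    except (OSError, ValueError) as exc:
        logger.error("Download failed for %s: %s", pdf_url, exc)
        return False
    try:
        with open(path, "wb") as handle:
            handle.write(data)
    except OSError as exc:
        logger.error("Could not write %s: %s", path, exc)
        return False
    return True
```

Returning a boolean instead of raising keeps batch loops simple, at the cost of hiding the reason for a failure from the caller; the log output is then the only place to find it.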
Usage
import logging
from scista import ArticleFetcher
# Configure logging (optional)
logging.basicConfig(
    level=logging.INFO,  # Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# Initialize with your CORE API key and Unpaywall email
fetcher = ArticleFetcher(
    core_api_key="your_core_api_key",
    email_for_unpaywall="your_email@example.com"
)
# Search for articles
articles = fetcher.fetch_articles(
    topic="quantum computing",  # Topic to search
    num_articles=5,  # Number of articles
    categories=["Physics"],  # Category
    from_date="2023-01-01",  # Start date
    to_date="2023-12-31",  # End date
    sort_by_date=True  # Sort by date
)
# Process results
for i, article in enumerate(articles, 1):
    print(f"\nArticle {i}:")
    print(article)
    # Save PDF if available
    if article.pdf_url:
        article.save_pdf(f"article_{i}.pdf")
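The loop above names files by position. If DOI-based names are preferred, note that a DOI contains a slash and cannot be used as a filename directly. The sketch below shows one possible way to derive a safe name; `pdf_filename` is a hypothetical helper written for this example and is not provided by Scista.

```python
import re
from typing import Optional


def pdf_filename(doi: Optional[str], index: int) -> str:
    """Build a filesystem-safe PDF name from a DOI (hypothetical helper)."""
    # Replace every run of characters outside [A-Za-z0-9._-] with "_".
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", doi).strip("_") if doi else ""
    # Fall back to a positional name when the DOI is missing or unusable.
    return f"{safe}.pdf" if safe else f"article_{index}.pdf"
```

In the loop, the call would then read `article.save_pdf(pdf_filename(article.doi, i))`.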
Logging
The library uses Python's standard logging module. You can configure logging to suit your needs:
import logging
# Basic configuration
logging.basicConfig(
    level=logging.INFO,  # Logging level
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# Or more complex configuration
logger = logging.getLogger('scista')
handler = logging.FileHandler('scista.log')
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
Logging levels:
- DEBUG: Detailed debug information
- INFO: Confirmation of successful operations
- WARNING: Warnings about potential problems
- ERROR: Errors that do not interrupt the program
- CRITICAL: Critical errors
License
MIT License
Author
AlestackOverglow