
The easiest way to download many articles from bioRxiv.


bio2csv

bio2csv is a Python package that scrapes biology research papers from bioRxiv. It retrieves the title, authors, and link of each paper, and can optionally fetch the abstract and the full text.

Installation

You can install the bio2csv package with pip:

pip install bio2csv

Usage

Here's a simple usage example:

from bio2csv import scrape_biorxiv

# Scrape the first 10 pages of the genetics collection, including the abstract and full text of each paper
df = scrape_biorxiv(pages=10, get_abstract=True, get_full_text=True)

# Print the resulting DataFrame
print(df)
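Since the result is a DataFrame, it can be written to a CSV file with the standard pandas methods. The sketch below uses a small stand-in DataFrame so that it runs without network access. The column names (title, authors, link, abstract) and the helpers save_papers and load_papers are assumptions for illustration, not part of bio2csv; check the columns that scrape_biorxiv actually returns.

```python
import pandas as pd

# Stand-in for the DataFrame returned by scrape_biorxiv.
# The column names and values here are placeholders (assumed, not confirmed).
papers = pd.DataFrame(
    {
        "title": ["Paper A", "Paper B"],
        "authors": ["Doe J; Roe R", "Poe E"],
        "link": ["https://example.org/paper-a", "https://example.org/paper-b"],
        "abstract": ["First abstract.", None],
    }
)


def save_papers(df, path):
    """Write the DataFrame to a CSV file without the index column.

    Hypothetical helper; returns the number of rows written.
    """
    df.to_csv(path, index=False)
    return len(df)


def load_papers(path):
    """Read a CSV file written by save_papers back into a DataFrame."""
    return pd.read_csv(path)
```

With a real result, the call would be save_papers(df, "papers.csv"). Missing values such as an empty abstract come back as NaN when the file is reloaded.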

You can also use the fetch_paper_details function to fetch the abstract and full text of a single paper:

from bio2csv import fetch_paper_details
import requests

# Initialize a session
session = requests.Session()

# URL of the paper
paper_url = "https://www.biorxiv.org/content/10.1101/2023.06.16.448484v1"

# Fetch details
abstract, full_text = fetch_paper_details(paper_url, session)

# Print details
print(f"Abstract: {abstract}")
print(f"Full Text: {full_text}")

Note that fetch_paper_details requires a requests.Session object, passed as its second argument.
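To fetch several papers, one session can be reused across calls. The following is a minimal sketch, assuming the fetch function takes (url, session) and returns an (abstract, full_text) pair as in the example above. The names fetch_many and delay_seconds are hypothetical and not part of bio2csv. The fetch function is passed in as an argument so the loop can be tried with a stub before hitting the real site.

```python
import time


def fetch_many(urls, fetch, session, delay_seconds=1.0):
    """Call fetch(url, session) for each URL, reusing a single session.

    Hypothetical helper: pauses between requests and records failures
    instead of stopping at the first error.
    """
    results = {}
    for index, url in enumerate(urls):
        # Wait between requests (not before the first one) to avoid
        # hammering the server.
        if index > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)
        try:
            abstract, full_text = fetch(url, session)
            results[url] = {
                "abstract": abstract,
                "full_text": full_text,
                "error": None,
            }
        except Exception as exc:
            # Keep going; store the error message for this URL.
            results[url] = {
                "abstract": None,
                "full_text": None,
                "error": str(exc),
            }
    return results
```

With bio2csv and requests installed, it could be called as fetch_many(paper_urls, fetch_paper_details, requests.Session()), where paper_urls is a list of bioRxiv paper URLs.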

Contributing

Contributions to bio2csv are welcome! If you have a feature request, bug report, or proposal, please open an issue on this repository. If you wish to contribute code, please fork the repository, make your changes, and submit a pull request.

License

bio2csv is released under the MIT License. For more details, see the LICENSE file in this repository.
