A Python package for merging, preprocessing, and enriching bibliographic data from WoS, Scopus, and OpenAlex.

Preprocessing Package

This package provides utilities to preprocess, clean, and harmonize bibliographic data from multiple scientific sources, primarily Web of Science (WoS) and Scopus.
It supports bibliometric and scientometric analyses by transforming raw exports into structured pandas DataFrames.

⚠️ Status: under active development. APIs and internal structures may change.


Features

  • Parsing of raw bibliographic exports into structured DataFrames
  • Support for multiple data sources (WoS, Scopus, Crossref, OpenAlex)
  • Reference enrichment and linkage across sources
  • Designed for reproducible research workflows
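
To make the first feature concrete, here is a minimal sketch of how a WoS tagged plain-text export can be turned into row dictionaries. It is not the package's wos_df() implementation: the names parse_wos_text and flatten_record are hypothetical, the sample record is made up, and the sketch assumes only the standard WoS layout (a two-character field tag, indented continuation lines, ER closing each record).

```python
from collections import defaultdict

# Made-up sample in the WoS tagged layout; not real bibliographic data.
SAMPLE = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Doe, J
   Roe, R
TI An illustrative title
   spanning two lines
PY 2020
DI 10.1234/example
ER

EF
"""


def parse_wos_text(text):
    """Hypothetical helper: split tagged text into {tag: [values]} records."""
    records = []
    current = defaultdict(list)
    tag = None
    for line in text.lstrip("\ufeff").splitlines():
        if not line.strip():
            continue  # blank separator line
        head, value = line[:2], line[3:].strip()
        if head == "ER":  # end of the current record
            records.append(dict(current))
            current = defaultdict(list)
            tag = None
        elif head in ("FN", "VR", "EF"):  # file-level header and footer tags
            continue
        elif head.strip():  # a new two-character field tag
            tag = head
            current[tag].append(value)
        elif tag is not None:  # indented continuation of the previous tag
            current[tag].append(value)
    return records


def flatten_record(record, multi_valued=("AU", "AF", "CR")):
    """Join list-like fields with '; ' and wrapped text fields with a space."""
    return {
        tag: ("; " if tag in multi_valued else " ").join(values)
        for tag, values in record.items()
    }


rows = [flatten_record(r) for r in parse_wos_text(SAMPLE)]
print(rows)
```

Each dictionary in rows corresponds to one record, so pandas.DataFrame(rows) would give one row per publication with the field tags as columns.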

Core Functions

  • wos_df()
    Transforms Web of Science .txt export files into pandas DataFrames.

  • scopus_df()
    Converts Scopus .bib export files into pandas DataFrames.

  • doi_crossref()
    Queries the Crossref API to retrieve metadata associated with a given DOI.

  • scopus_ref()
    Processes and links article references, identifying relationships between cited documents.
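
As an illustration of the kind of lookup doi_crossref() describes, the sketch below builds a request to the public Crossref REST API (https://api.crossref.org/works/{doi}) and pulls a few fields out of the response. The function names, the User-Agent string, the selected fields, and the sample payload are all assumptions made for this example; the package's actual signature and return value may differ.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def normalize_doi(doi):
    """Lowercase a DOI and drop a resolver or 'doi:' prefix if present."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def crossref_url(doi):
    """Build the Crossref works URL for a DOI (percent-encoding unsafe characters)."""
    return CROSSREF_WORKS + quote(normalize_doi(doi), safe="/")


def fetch_work(doi, timeout=10.0):
    """Hypothetical helper: return the 'message' object of the Crossref response."""
    request = Request(crossref_url(doi),
                      headers={"User-Agent": "bibliometrics-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)["message"]


def summarize_work(message):
    """Reduce a Crossref work record to a flat dict suitable for a DataFrame row."""
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    date_parts = message.get("issued", {}).get("date-parts") or [[None]]
    return {
        "doi": message.get("DOI"),
        "title": (message.get("title") or [None])[0],
        "journal": (message.get("container-title") or [None])[0],
        "year": date_parts[0][0] if date_parts[0] else None,
        "authors": authors,
    }


# Made-up payload shaped like a Crossref 'message'; a live call would be
# summarize_work(fetch_work("10.1234/example")) and needs network access.
sample_message = {
    "DOI": "10.1234/example",
    "title": ["An illustrative title"],
    "container-title": ["Journal of Examples"],
    "issued": {"date-parts": [[2020, 5, 1]]},
    "author": [{"given": "Jane", "family": "Doe"}, {"family": "Roe"}],
}
summary = summarize_work(sample_message)
print(crossref_url("https://doi.org/10.1234/Example"))
print(summary)
```

Normalizing the DOI before building the URL also gives a stable key for matching the same document across WoS, Scopus, and Crossref records.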


Installation

Clone the repository and install the required dependencies:

pip install -r requirements.txt

Download files

  • Source distribution: bibfusion-1.0.0.tar.gz (60.5 kB)
  • Built distribution: bibfusion-1.0.0-py3-none-any.whl (76.9 kB, Python 3)