Identify and merge duplicates in bibliographic records

Project description

Overview

BibDedupe is an open-source Python library for deduplication of bibliographic records, tailored for literature reviews. Unlike traditional deduplication methods, BibDedupe focuses on entity resolution, linking duplicate records instead of simply deleting them.

Features

  • Automated Duplicate Linking with Zero False Positives: BibDedupe automates the duplicate linking process with a focus on eliminating false positives.
  • Preprocessing Approach: BibDedupe's preprocessing reflects the specific ways errors arise in academic databases, such as reformatted author names, abbreviated journal names, or translated titles.
  • Entity Resolution: BibDedupe does not simply delete duplicates. It links them to resolve the underlying entity and integrates their data, which allows for validation and undo operations.
  • Programmatic Access: BibDedupe is designed for seamless integration into existing research workflows, providing programmatic access for easy incorporation into scripts and applications.
  • Transparent and Reproducible Rules: BibDedupe's blocking and matching rules are transparent, so deduplication results can be inspected and reproduced.
  • Continuous Benchmarking: Continuous integration tests running on GitHub Actions ensure ongoing benchmarking, maintaining the library's reliability and performance across datasets.
  • Efficient and Parallel Computation: BibDedupe implements computations efficiently and in parallel, using appropriate data structures and functions for optimal performance.
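
The difference between linking and deleting duplicates can be illustrated with a minimal, standard-library sketch. It is not BibDedupe's implementation or API: the function names (`match_key`, `link_duplicates`, `merge_group`), the field names (`ID`, `title`, `year`, `journal`, `doi`), the `origin` field, and the sample records are all hypothetical, and the matching rule (identical normalized title and year) is far cruder than the library's blocking and matching rules.

```python
from collections import defaultdict


def match_key(record):
    """Crude matching key (an assumption): lowercased alphanumeric title plus year."""
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    return (title, record["year"])


def link_duplicates(records):
    """Group record IDs that share a matching key. No record is deleted."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["ID"])
    return list(groups.values())


def merge_group(records_by_id, ids):
    """Integrate the fields of linked records into one record.

    The first non-empty value of each field wins. The hypothetical
    "origin" field keeps the linked IDs, so a merge can be validated
    or undone by going back to the original records.
    """
    merged = {}
    for record_id in ids:
        for field, value in records_by_id[record_id].items():
            if field != "ID" and value and field not in merged:
                merged[field] = value
    merged["ID"] = ids[0]
    merged["origin"] = list(ids)
    return merged


# Hypothetical sample records: r1 and r2 describe the same paper.
records = [
    {"ID": "r1", "title": "A Study of Things.", "year": "2020",
     "journal": "J. Things", "doi": ""},
    {"ID": "r2", "title": "a study of things", "year": "2020",
     "journal": "", "doi": "10.0000/example"},
    {"ID": "r3", "title": "Another Topic", "year": "2021",
     "journal": "Topics", "doi": ""},
]
records_by_id = {record["ID"]: record for record in records}
clusters = link_duplicates(records)
merged = [merge_group(records_by_id, ids) for ids in clusters]
```

Because the original records stay untouched and each merged record lists its `origin` IDs, a wrongly linked pair can be split again, which is not possible once a duplicate has simply been deleted.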

Documentation

Explore the official documentation for comprehensive information on installation, usage, and customization of BibDedupe.

Citation

If you use BibDedupe in your research, please cite it as follows:

Wagner, G. (2024) BibDedupe - An open-source Python library for deduplication of bibliographic records. Journal of Open Source Software, 9(97), 6318, https://doi.org/10.21105/joss.06318.

Contribution Guidelines

We welcome contributions from the community to enhance and expand BibDedupe. If you would like to contribute, please follow our contribution guidelines.

License

BibDedupe is released under the MIT License, allowing free and open use and modification.

Contact

For any questions, issues, or feedback, please open an issue on our GitHub repository.

Happy deduplicating with BibDedupe!

Download files

Source Distribution

bib_dedupe-0.11.0.tar.gz (405.0 kB)

Uploaded Source

Built Distribution

bib_dedupe-0.11.0-py3-none-any.whl (69.2 kB)

Uploaded Python 3

File details

Details for the file bib_dedupe-0.11.0.tar.gz.

File metadata

  • Download URL: bib_dedupe-0.11.0.tar.gz
  • Upload date:
  • Size: 405.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.11 (CI, Ubuntu 24.04 "noble")

File hashes

Hashes for bib_dedupe-0.11.0.tar.gz
Algorithm Hash digest
SHA256 d61c26575e5f054e5207d8cea933c112a8b9e0151a8265ec8e9d0969325a95bf
MD5 d65b09839a2b12f14216c2dcfe122d95
BLAKE2b-256 a827c17aed45c04a5a5c43662af47d88dd3a9615c0c5348187d03f4b51ed2f32
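
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of` and `matches_expected` are hypothetical, and the file is assumed to have been downloaded already.

```python
import hashlib

# SHA256 digest listed above for bib_dedupe-0.11.0.tar.gz
EXPECTED_SHA256 = "d61c26575e5f054e5207d8cea933c112a8b9e0151a8265ec8e9d0969325a95bf"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_expected("bib_dedupe-0.11.0.tar.gz")` on the downloaded file should return `True`; a `False` result means the file differs from the one described here.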

File details

Details for the file bib_dedupe-0.11.0-py3-none-any.whl.

File metadata

  • Download URL: bib_dedupe-0.11.0-py3-none-any.whl
  • Upload date:
  • Size: 69.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.11 (CI, Ubuntu 24.04 "noble")

File hashes

Hashes for bib_dedupe-0.11.0-py3-none-any.whl
Algorithm Hash digest
SHA256 acc4f20f7ead74f88fba1a8e194e8cff8a79f6e99fbe5233c6aba38dda4a98e0
MD5 36c5616fa7d9cff400952994ec1a23fe
BLAKE2b-256 9ee75528c262f460baee7ce7a3121cc065339bde62bf1ad6f7aeabfab5ef4da7
