Identify and merge duplicates in bibliographic records
Overview
BibDedupe is an open-source Python library for deduplication of bibliographic records, tailored for literature reviews. Unlike traditional deduplication methods, BibDedupe focuses on entity resolution, linking duplicate records instead of simply deleting them.
Features
- Automated Duplicate Linking with Zero False Positives: BibDedupe automates the duplicate linking process with a focus on eliminating false positives.
- Preprocessing Approach: BibDedupe's preprocessing reflects the error-generation processes specific to academic databases, such as author reformatting, journal abbreviations, and translations.
- Entity Resolution: BibDedupe does not simply delete duplicates; it links them to resolve the entity and integrates their data. This allows for validation and undo operations.
- Programmatic Access: BibDedupe is designed for seamless integration into existing research workflows, providing programmatic access for easy incorporation into scripts and applications.
- Transparent and Reproducible Rules: BibDedupe's blocking and matching rules are transparent, so deduplication decisions can be inspected and reproduced.
- Continuous Benchmarking: Continuous integration tests running on GitHub Actions ensure ongoing benchmarking, maintaining the library's reliability and performance across datasets.
- Efficient and Parallel Computation: BibDedupe implements computations efficiently and in parallel, using appropriate data structures and functions for optimal performance.
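The difference between linking and deleting duplicates can be illustrated with a minimal sketch. This is not BibDedupe's implementation or API: the function names (`normalize`, `block_key`, `is_match`, `link_duplicates`), the blocking key, and the 0.9 similarity threshold are assumptions chosen for illustration, and only the Python standard library is used.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Hypothetical preprocessing: strip accents and punctuation, lowercase."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def block_key(record):
    """Hypothetical blocking rule: same year and same first title word."""
    title = normalize(record["title"])
    first_word = title.split()[0] if title else ""
    return (record["year"], first_word)


def is_match(a, b, threshold=0.9):
    """Hypothetical matching rule: near-identical normalized titles."""
    ratio = SequenceMatcher(
        None, normalize(a["title"]), normalize(b["title"])
    ).ratio()
    return ratio >= threshold


def link_duplicates(records):
    """Group record IDs instead of deleting records.

    Every input ID appears in exactly one group, so a link can be
    reviewed or undone later without losing the original records.
    """
    parent = {r["ID"]: r["ID"] for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only compare records that share a blocking key.
    blocks = defaultdict(list)
    for r in records:
        blocks[block_key(r)].append(r)

    for members in blocks.values():
        for a, b in combinations(members, 2):
            if is_match(a, b):
                parent[find(b["ID"])] = find(a["ID"])

    groups = defaultdict(list)
    for r in records:
        groups[find(r["ID"])].append(r["ID"])
    return sorted(sorted(g) for g in groups.values())


records = [
    {"ID": "r1", "title": "Deduplication of Bibliographic Records", "year": "2024"},
    {"ID": "r2", "title": "Deduplication of bibliographic records.", "year": "2024"},
    {"ID": "r3", "title": "Déduplication of Bibliographic Records", "year": "2024"},
    {"ID": "r4", "title": "A Survey of Entity Resolution", "year": "2021"},
]
groups = link_duplicates(records)
print(groups)
```

Because the result is a set of ID groups rather than a reduced record list, the original records remain available for validation. For the library's actual functions and rules, refer to the official documentation.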
Documentation
Explore the official documentation for comprehensive information on installation, usage, and customization of BibDedupe.
Citation
If you use BibDedupe in your research, please cite it as follows:
Wagner, G. (2024) BibDedupe - An open-source Python library for deduplication of bibliographic records. Journal of Open Source Software, 9(97), 6318, https://doi.org/10.21105/joss.06318.
Contribution Guidelines
We welcome contributions from the community to enhance and expand BibDedupe. If you would like to contribute, please follow our contribution guidelines.
License
BibDedupe is released under the MIT License, allowing free and open use and modification.
Contact
For any questions, issues, or feedback, please open an issue on our GitHub repository.
Happy deduplicating with BibDedupe!
File details
Details for the file bib_dedupe-0.9.0.tar.gz.
File metadata
- Download URL: bib_dedupe-0.9.0.tar.gz
- Upload date:
- Size: 64.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.10.12 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 10c1e59c50290b05e612a08bc1565d49a2e362bbbd346f0bb6d190e6a8c8f99c
MD5 | c202f816fe15d188d8484e346a56b682
BLAKE2b-256 | defa1a5b2e9fdea4024a588de95fbedea325d04efcb7fa5a01b616def27af322
File details
Details for the file bib_dedupe-0.9.0-py3-none-any.whl.
File metadata
- Download URL: bib_dedupe-0.9.0-py3-none-any.whl
- Upload date:
- Size: 71.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.10.12 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 88cb1e89285f1a0c865bc3dc97d6e82be6ad9d415f6f3fee9875f373322945b1
MD5 | 5a105e01e67575d87bbca5c28498f570
BLAKE2b-256 | 55244e81410fff7fbd73243d693167197fcc2111f8758ad4ac3f0d35bd02b98d
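A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `verify` are hypothetical, and the script assumes the wheel has been saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the file hashes table for the built distribution.
EXPECTED_WHEEL_SHA256 = (
    "88cb1e89285f1a0c865bc3dc97d6e82be6ad9d415f6f3fee9875f373322945b1"
)

wheel = Path("bib_dedupe-0.9.0-py3-none-any.whl")
if wheel.exists():
    print("digest matches:", verify(wheel, EXPECTED_WHEEL_SHA256))
else:
    print("wheel not found in the current directory; nothing to verify")
```

A mismatch means the file differs from the one that was published and should not be installed.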