Project description
ICDR
Contrastive Data Retrieval with Inverted Indexes
Efficient approximate or precise retrieval of similar documents for fine-tuning language models. The library can be used to quickly create contrastive pairs or triplets from large document collections.
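To show what a contrastive triplet looks like, here is a minimal plain-Python sketch. It is independent of ICDR and does not use or reproduce the library's API. It assumes each record carries a hypothetical `entity_id` label, and records that share a label are treated as matches. Every pair of matching records becomes an (anchor, positive) pair, and a randomly chosen record with a different label supplies the negative.

```python
import random
from collections import defaultdict


def make_triplets(records, seed=0):
    """Build (anchor, positive, negative) triplets.

    `records` is a list of (text, entity_id) tuples. `entity_id` is a
    hypothetical label (not part of ICDR): records sharing it are matches.
    """
    rng = random.Random(seed)  # fixed seed so negatives are reproducible
    groups = defaultdict(list)
    for text, entity_id in records:
        groups[entity_id].append(text)

    triplets = []
    for entity_id, texts in groups.items():
        # Records belonging to any other entity are candidate negatives.
        others = [t for eid, ts in groups.items() if eid != entity_id for t in ts]
        if len(texts) < 2 or not others:
            continue  # need at least one positive and one negative
        for i, anchor in enumerate(texts):
            for positive in texts[i + 1:]:
                triplets.append((anchor, positive, rng.choice(others)))
    return triplets


records = [
    ("Apple Inc., Cupertino", "apple"),
    ("Apple Incorporated", "apple"),
    ("Microsoft Corp., Redmond", "msft"),
    ("Microsoft Corporation", "msft"),
]
for triplet in make_triplets(records):
    print(triplet)
```

Random negatives keep the sketch short. Negatives that look textually similar but do not match are usually more informative for fine-tuning, and finding them efficiently is where a similarity index becomes useful.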
ICDR builds an inverted index and several fast look-up tables to retrieve similar texts from a corpus. The library is well suited to entity matching, entity resolution, record linkage, and deduplication tasks in NLP. ICDR enables very fast retrieval of similar, positive (i.e., matching), and negative (i.e., non-matching) text samples, which can be used either directly or to fine-tune LLMs and other models.
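The following minimal sketch illustrates the general inverted-index idea. It is not ICDR's implementation or API. Each token maps to the set of document ids that contain it, so only documents that share at least one token with the query are scored. Lower-cased whitespace tokenization and Jaccard similarity over token sets are assumptions chosen for brevity.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lower-cased whitespace tokens; real systems often use n-grams.
    return set(text.lower().split())


class InvertedIndex:
    """Toy inverted index with Jaccard ranking (illustrative only)."""

    def __init__(self, docs):
        self.docs = docs
        self.tokens = [tokenize(d) for d in docs]
        # Posting lists: token -> ids of the documents containing that token.
        self.postings = defaultdict(set)
        for doc_id, toks in enumerate(self.tokens):
            for tok in toks:
                self.postings[tok].add(doc_id)

    def search(self, query, k=3):
        """Return up to k (doc_id, jaccard) pairs, most similar first."""
        q = tokenize(query)
        # Candidate generation: union of the query tokens' posting lists,
        # so documents with no shared token are never scored.
        candidates = set()
        for tok in q:
            candidates |= self.postings.get(tok, set())
        scored = []
        for doc_id in candidates:
            d = self.tokens[doc_id]
            scored.append((doc_id, len(q & d) / len(q | d)))
        # Highest score first; ties broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


docs = [
    "apple iphone 13 128gb black",
    "apple iphone 13 128 gb black",
    "samsung galaxy s21 128gb",
    "apple macbook air m1",
]
index = InvertedIndex(docs)
print(index.search("apple iphone 13 black", k=2))
# → [(0, 0.8), (1, 0.6666666666666666)]
```

With labelled data, high-scoring candidates that carry a different label would be natural hard negatives. This is only an illustration and does not describe how ICDR selects negatives.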
Download files
File details
Details for the file icdr-0.0.11.tar.gz.
File metadata
- Download URL: icdr-0.0.11.tar.gz
- Upload date:
- Size: 997.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5612cca1de8df1e4ec16073490f2885b8fae48718f02bba301dff57214b6b83c |
| MD5 | ae706a82c0c091c4de0dc85681bc8389 |
| BLAKE2b-256 | 30664263dbc9a753fe0d1375a7963164afa68a0a97108f7358ea61af70c7166a |
File details
Details for the file icdr-0.0.11-py3-none-any.whl.
File metadata
- Download URL: icdr-0.0.11-py3-none-any.whl
- Upload date:
- Size: 1.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 04f2c6cbdfad670870179ee8d828736e8cf91535c5f72b17a0936d8842c0e1d6 |
| MD5 | 1bd4c0257fd4abe0772352401d467db8 |
| BLAKE2b-256 | b6f145ab3a5e81be3e9aeafb314b51983992a3c93a448e8d29c981a0c576905f |
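The listed digests can be used to check a downloaded file. Here is a minimal sketch using Python's standard `hashlib` module. The helper name `file_digests` is hypothetical. BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(path, chunk_size=1 << 16):
    """Return the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, if the wheel was saved as icdr-0.0.11-py3-none-any.whl, then `file_digests("icdr-0.0.11-py3-none-any.whl")["SHA256"]` should equal the SHA256 digest listed for that file.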