marin-dupekit

Optimized code for text de-duplication, written in Rust

These details have been verified by PyPI

Project links

Owner

Marin Community

GitHub Statistics

Maintainers

alxmrs WillHeld

These details have not been verified by PyPI

Project description

dupekit

Raison d'être: Home for the Rust code used for text deduplication.

Install

Locally: This code is auto-magically built by uv via Cargo and Maturin. You might need to install them (e.g., brew install maturin rust on macOS).
Cluster: This code is compiled as part of the Docker build (uv pip install -e ... step): Maturin builds the Rust code and places it in the system site-packages (e.g., /home/ray/anaconda3/lib/python3.12/site-packages/dupekit/dupekit.abi3.so).

Note: What about making dupekit a hybrid Python/Rust Maturin workspace? We tried, but had trouble getting the Docker build to work while keeping it simple. A plain Rust workspace keeps the setup clean.

Note: Building from source requires a Rust toolchain (Cargo). Pre-built wheels are available from GitHub Releases for users who don't want to compile locally.

Benchmarking

The goal of these benchmarks is to test different ways of marshaling large text content between Python and Rust across the foreign function interface (FFI). The tests isolate the overhead of marshaling from the actual Rust computation by doing minimal processing in Rust.

Dataset: 1 shard of HuggingFaceFW/fineweb-edu/sample/10BT (2.15 GB Parquet file, benchmarked on 250k out of 726k documents)

Install:

uv sync --all-packages --extra=benchmark --group dev

Benchmark (takes a few minutes):

uv run pytest lib/dupekit/tests/bench/test_dedupe.py --run-benchmark --benchmark-min-rounds=20
uv run pytest lib/dupekit/tests/bench/test_marshaling.py --run-benchmark
uv run pytest lib/dupekit/tests/bench/test_batch_tuning.py --run-benchmark
uv run pytest lib/dupekit/tests/bench/test_io.py --run-benchmark
uv run pytest lib/dupekit/tests/bench/test_hashing.py --run-benchmark
uv run pytest lib/dupekit/tests/bench/test_minhash.py --run-benchmark

Note: Run each type of benchmark separately (otherwise results are mixed within one table).

Memory footprint (note: sampling the stack might taint the memory measurements, so we disable benchmarking):

uv run pytest lib/dupekit/tests/bench/test_marshaling.py \
  --run-benchmark \
  --benchmark-disable \
  --memray \
  --native \
  --most-allocations=0

Results

Dedup: Rust vs. Python

---------------------------------------------------------------------------- benchmark 'Documents: Exact Deduplication': 2 tests ----------------------------------------------------------------------------
Name (time in ms)                             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_deduplication[rust-documents]         3.9872 (1.0)        5.2516 (1.0)        4.2341 (1.0)      0.1949 (1.0)        4.2247 (1.0)      0.2845 (1.0)          52;2  236.1805 (1.0)         188           1
test_deduplication[python-documents]     133.8747 (33.58)    157.3844 (29.97)    139.6233 (32.98)    7.7842 (39.94)    135.3300 (32.03)    9.6717 (34.00)         5;0    7.1621 (0.03)         20           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------- benchmark 'Documents: Hash Generation': 2 tests ---------------------------------------------------------------------------
Name (time in ms)                       Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_hashing[rust-documents]         2.2755 (1.0)        2.4842 (1.0)        2.3041 (1.0)      0.0301 (1.0)        2.2938 (1.0)      0.0381 (1.0)          50;9  434.0169 (1.0)         375           1
test_hashing[python-documents]     130.0445 (57.15)    132.3783 (53.29)    130.7795 (56.76)    0.6663 (22.13)    130.5910 (56.93)    0.6259 (16.44)         5;3    7.6465 (0.02)         20           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------- benchmark 'Paragraphs: Exact Deduplication': 2 tests ----------------------------------------------------------------------------
Name (time in ms)                              Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_deduplication[rust-paragraphs]        85.4666 (1.0)      109.3652 (1.0)       90.2916 (1.0)       6.8294 (1.0)       87.3405 (1.0)       2.0275 (1.0)           4;4  11.0752 (1.0)          20           1
test_deduplication[python-paragraphs]     303.0885 (3.55)     342.9836 (3.14)     321.3022 (3.56)     13.8377 (2.03)     329.4886 (3.77)     25.1111 (12.39)         9;0   3.1123 (0.28)         20           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------- benchmark 'Paragraphs: Hash Generation': 2 tests ---------------------------------------------------------------------------
Name (time in ms)                        Min                 Max                Mean             StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_hashing[rust-paragraphs]        23.9739 (1.0)       26.9860 (1.0)       25.3823 (1.0)       0.4419 (1.0)       25.3160 (1.0)      0.2099 (1.0)           5;5  39.3975 (1.0)          38           1
test_hashing[python-paragraphs]     247.5415 (10.33)    321.4654 (11.91)    255.3421 (10.06)    19.0653 (43.15)    249.1948 (9.84)     1.7899 (8.53)          2;2   3.9163 (0.10)         20           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
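
As a rough illustration of what the document-level and paragraph-level benchmarks above measure, here is a minimal pure-Python sketch of exact deduplication by content hash. It is not dupekit's API: the function names, the 16-byte digest size, and the paragraph separator are assumptions made for this example, and BLAKE2b is used only because it ships with the standard library.

```python
import hashlib


def content_hash(text: str) -> bytes:
    """Return a 16-byte BLAKE2b digest of the UTF-8 encoded text."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()


def dedupe_documents(docs):
    """Keep the first occurrence of each exact-duplicate document."""
    seen = set()
    unique = []
    for doc in docs:
        digest = content_hash(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def dedupe_paragraphs(docs, sep="\n\n"):
    """Drop paragraphs that already appeared earlier in the corpus."""
    seen = set()
    result = []
    for doc in docs:
        kept = []
        for para in doc.split(sep):
            digest = content_hash(para)
            if digest not in seen:
                seen.add(digest)
                kept.append(para)
        result.append(sep.join(kept))
    return result


docs = ["a\n\nb", "a\n\nb", "b\n\nc"]
print(dedupe_documents(docs))   # ['a\n\nb', 'b\n\nc']
print(dedupe_paragraphs(docs))  # ['a\n\nb', '', 'c']
```

Paragraph-level deduplication hashes every paragraph rather than every document, so it performs many more hash computations and set lookups per input.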

Marshaling

-------------------------------------------------------------------------------------------- benchmark: 7 tests -------------------------------------------------------------------------------------------
Name (time in ms)                    Min                   Max                  Mean             StdDev                Median                IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_arrow_giant                 86.4414 (1.0)         96.0537 (1.01)        90.0259 (1.0)       2.8582 (31.54)       90.4363 (1.0)       4.1787 (27.96)         3;0  11.1079 (1.0)          11           1
test_arrow_small                 94.4010 (1.09)        94.6679 (1.0)         94.5616 (1.05)      0.0906 (1.0)         94.5570 (1.05)      0.1494 (1.0)           5;0  10.5751 (0.95)         11           1
test_dicts_batched_stream     3,975.1581 (45.99)    3,979.7102 (42.04)    3,977.7639 (44.18)     1.8357 (20.26)    3,978.3399 (43.99)     2.8370 (18.98)         2;0   0.2514 (0.02)          5           1
test_dicts_batch              4,398.7191 (50.89)    4,421.9632 (46.71)    4,410.0489 (48.99)     8.7694 (96.78)    4,411.2232 (48.78)    12.0295 (80.50)         2;0   0.2268 (0.02)          5           1
test_dicts_loop               4,411.8727 (51.04)    4,457.0985 (47.08)    4,431.9081 (49.23)    19.8323 (218.86)   4,430.5465 (48.99)    35.6846 (238.78)        2;0   0.2256 (0.02)          5           1
test_rust_structs             4,449.5728 (51.47)    4,479.8173 (47.32)    4,465.2999 (49.60)    14.1041 (155.65)   4,472.5336 (49.46)    24.8971 (166.60)        3;0   0.2239 (0.02)          5           1
test_arrow_tiny               7,023.5789 (81.25)    7,064.2094 (74.62)    7,044.9691 (78.25)    19.4414 (214.55)   7,047.1538 (77.92)    37.8036 (252.96)        1;0   0.1419 (0.01)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

PyArrow Batch Size

--------------------------------------------------------------------------------------------- benchmark: 11 tests ----------------------------------------------------------------------------------------------
Name (time in ms)                         Min                   Max                  Mean             StdDev                Median                IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_arrow_batch_sizes[8192]          28.6030 (1.0)         32.0178 (1.07)        29.2802 (1.0)       0.8846 (4.75)        28.9333 (1.0)       0.7970 (3.52)          5;3  34.1528 (1.0)          34           1
test_arrow_batch_sizes[16384]         28.7303 (1.00)        30.8987 (1.03)        29.3111 (1.00)      0.5447 (2.92)        29.1404 (1.01)      0.5907 (2.61)          9;2  34.1168 (1.00)         33           1
test_arrow_batch_sizes[4096]          28.8488 (1.01)        30.1474 (1.01)        29.2876 (1.00)      0.3776 (2.03)        29.2212 (1.01)      0.6339 (2.80)         12;0  34.1441 (1.00)         34           1
test_arrow_batch_sizes[2048]          29.1493 (1.02)        30.4442 (1.02)        29.5710 (1.01)      0.3013 (1.62)        29.5505 (1.02)      0.3483 (1.54)         10;1  33.8169 (0.99)         32           1
test_arrow_batch_sizes[32768]         29.2200 (1.02)        29.9410 (1.0)         29.5896 (1.01)      0.1863 (1.0)         29.5706 (1.02)      0.2423 (1.07)         11;0  33.7956 (0.99)         34           1
test_arrow_batch_sizes[65536]         30.3973 (1.06)        31.3805 (1.05)        30.9409 (1.06)      0.2453 (1.32)        30.9829 (1.07)      0.2263 (1.0)           9;3  32.3197 (0.95)         33           1
test_arrow_batch_sizes[131072]        30.7074 (1.07)        33.1845 (1.11)        31.4322 (1.07)      0.6799 (3.65)        31.1102 (1.08)      0.8739 (3.86)          6;1  31.8145 (0.93)         32           1
test_arrow_batch_sizes[1024]          30.7724 (1.08)        32.6049 (1.09)        31.6173 (1.08)      0.5506 (2.96)        31.6311 (1.09)      0.9233 (4.08)         13;0  31.6283 (0.93)         30           1
test_arrow_batch_sizes[512]           33.8866 (1.18)        36.2981 (1.21)        34.5224 (1.18)      0.6189 (3.32)        34.2960 (1.19)      0.5087 (2.25)          6;3  28.9667 (0.85)         29           1
test_arrow_batch_sizes[128]           51.0530 (1.78)        56.3190 (1.88)        53.5492 (1.83)      1.6124 (8.65)        53.7474 (1.86)      2.3557 (10.41)         7;0  18.6744 (0.55)         18           1
test_arrow_batch_sizes[1]          2,781.2088 (97.23)    2,812.2547 (93.93)    2,797.8572 (95.55)    11.6892 (62.74)    2,801.0024 (96.81)    15.3956 (68.03)         2;0   0.3574 (0.01)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
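
One way to read the batch-size table is to count how many batches cross the Python/Rust boundary. The sketch below assumes one call per batch and the 250k-document sample from the dataset description; neither is confirmed for this specific benchmark, and the helper names are hypothetical.

```python
import math
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def num_calls(num_docs, batch_size):
    """Number of batches (assumed: one cross-language call each)."""
    return math.ceil(num_docs / batch_size)


NUM_DOCS = 250_000  # assumption: sample size quoted in the dataset description
for size in (1, 128, 8192, 131072):
    print(size, num_calls(NUM_DOCS, size))
```

Under these assumptions a batch size of 1 means 250,000 crossings, 128 means 1,954, 8192 means 31, and 131072 means 2, which fits the pattern in the table: per-call overhead dominates at very small batch sizes and becomes negligible from around 2048 upward.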

I/O

------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------
Name (time in s)          Min               Max              Mean            StdDev            Median               IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rust_native       1.6757 (1.0)      1.6848 (1.0)      1.6794 (1.0)      0.0035 (1.26)     1.6783 (1.0)      0.0047 (1.73)          2;0  0.5955 (1.0)           5           1
test_arrow_giant       2.9501 (1.76)     2.9570 (1.76)     2.9521 (1.76)     0.0028 (1.0)      2.9511 (1.76)     0.0027 (1.0)           1;0  0.3387 (0.57)          5           1
test_arrow_small       3.3476 (2.00)     3.6588 (2.17)     3.5583 (2.12)     0.1241 (44.48)    3.5726 (2.13)     0.1289 (47.18)         1;0  0.2810 (0.47)          5           1
test_dicts_loop_io     7.3664 (4.40)     7.3913 (4.39)     7.3837 (4.40)     0.0101 (3.63)     7.3871 (4.40)     0.0113 (4.14)          1;0  0.1354 (0.23)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Hashing

--------------------------------------------------------------------------------------- benchmark: 6 tests ---------------------------------------------------------------------------------------
Name (time in ms)                     Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_hash_rust_xxh3_64_batch       4.4886 (1.0)       4.9466 (1.0)       4.5860 (1.0)      0.0616 (1.57)      4.5939 (1.0)      0.0957 (2.52)         74;1  218.0558 (1.0)         210           1
test_hash_rust_xxh3_64_scalar      5.0276 (1.12)      5.3367 (1.08)      5.1276 (1.12)     0.0393 (1.0)       5.1307 (1.12)     0.0379 (1.0)         41;12  195.0244 (0.89)        190           1
test_hash_rust_xxh3_128            6.1686 (1.37)      6.5772 (1.33)      6.2901 (1.37)     0.1098 (2.79)      6.2334 (1.36)     0.1731 (4.56)         37;0  158.9811 (0.73)        160           1
test_hash_rust_blake3             28.7743 (6.41)     29.0392 (5.87)     28.8919 (6.30)     0.0593 (1.51)     28.8799 (6.29)     0.0709 (1.87)         10;1   34.6118 (0.16)         35           1
test_hash_rust_blake2             54.1043 (12.05)    55.0271 (11.12)    54.4180 (11.87)    0.3711 (9.43)     54.1916 (11.80)    0.7337 (19.34)         5;0   18.3763 (0.08)         19           1
test_hash_python_blake2b          84.0109 (18.72)    84.1698 (17.02)    84.0611 (18.33)    0.0465 (1.18)     84.0469 (18.30)    0.0595 (1.57)          3;0   11.8961 (0.05)         12           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory footprint (sorted from high to low):

Allocation results for lib/dupekit/tests/bench/test_marshaling.py::test_rust_structs at the high watermark

	 📦 Total memory allocated: 4.3GiB
	 📏 Total allocations: 21
	 📊 Histogram of allocation sizes: | ▃█▁▃|

Allocation results for lib/dupekit/tests/bench/test_marshaling.py::test_dicts_batch at the high watermark

	 📦 Total memory allocated: 3.3GiB
	 📏 Total allocations: 20
	 📊 Histogram of allocation sizes: |  ▁█▂|

Allocation results for lib/dupekit/tests/bench/test_marshaling.py::test_dicts_loop at the high watermark

	 📦 Total memory allocated: 3.3GiB
	 📏 Total allocations: 19
	 📊 Histogram of allocation sizes: |  ▁█▂|

Allocation results for lib/dupekit/tests/bench/test_marshaling.py::test_arrow_giant at the high watermark

	 📦 Total memory allocated: 64.9MiB
	 📏 Total allocations: 36
	 📊 Histogram of allocation sizes: |▅█   |

Allocation results for lib/dupekit/tests/bench/test_marshaling.py::test_dicts_batched_stream at the high watermark

	 📦 Total memory allocated: 28.1MiB
	 📏 Total allocations: 7
	 📊 Histogram of allocation sizes: |█▄▄▄▄|

Allocation results for lib/dupekit/tests/bench/test_marshaling.py::test_arrow_tiny at the high watermark

	 📦 Total memory allocated: 22.0MiB
	 📏 Total allocations: 37
	 📊 Histogram of allocation sizes: |█▇   |

Allocation results for lib/dupekit/tests/bench/test_marshaling.py::test_arrow_small at the high watermark

	 📦 Total memory allocated: 551.7KiB
	 📏 Total allocations: 42
	 📊 Histogram of allocation sizes: |▂█▁  |

Statement of attribution:

This code was seeded from nelson-liu/rbloom-gcs.
Bloom filters were originally proposed in (Bloom, 1970). Furthermore, this implementation makes use of a constant recommended by (L'Ecuyer, 1999) for redistributing the entropy of a single hash over multiple integers using a linear congruential generator.
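
The idea of spreading one hash over several bit positions with a linear congruential generator can be sketched as follows. This is an illustrative Python sketch, not the Rust implementation: the class and method names are hypothetical, and the LCG parameters are Knuth's MMIX constants, used here as a stand-in rather than the L'Ecuyer constant mentioned above.

```python
import hashlib

MASK64 = (1 << 64) - 1
# Knuth's MMIX LCG parameters. Assumption for illustration only:
# this is NOT the L'Ecuyer (1999) constant the attribution refers to.
LCG_MUL = 6364136223846793005
LCG_INC = 1442695040888963407


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: bytes):
        # Hash the item once, then step an LCG to derive k bit positions.
        digest = hashlib.blake2b(item, digest_size=8).digest()
        state = int.from_bytes(digest, "little")
        for _ in range(self.num_hashes):
            state = (state * LCG_MUL + LCG_INC) & MASK64
            # Use the high bits, which are the better-mixed bits of an LCG.
            yield (state >> 32) % self.num_bits

    def add(self, item: bytes) -> None:
        for i in self._indexes(item):
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(item))


bf = BloomFilter(num_bits=1024, num_hashes=4)
bf.add(b"hello")
print(b"hello" in bf)  # True: a Bloom filter has no false negatives
```

Only one real hash is computed per item; the remaining positions come from cheap integer arithmetic, which is the point of the LCG trick.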

Release history

- 0.1.2.dev202606260851 (pre-release, this version): Jun 26, 2026
- 0.1.2.dev202606191013 (pre-release): Jun 19, 2026
- 0.1.2.dev202606111008 (pre-release): Jun 11, 2026
- 0.1.2.dev202606081019 (pre-release): Jun 8, 2026
- 0.1.2.dev202606070845 (pre-release): Jun 7, 2026
- 0.1.2.dev202606060824 (pre-release): Jun 6, 2026
- 0.1.2.dev202605310835 (pre-release): May 31, 2026
- 0.1.2.dev202605300817 (pre-release): May 30, 2026
- 0.1.2.dev202605080717 (pre-release): May 8, 2026
- 0.1.2.dev202605072146 (pre-release): May 7, 2026
- 0.1.1: May 7, 2026
- 0.1.0: May 7, 2026
- 0.0.1: May 6, 2026

Download files

Source Distribution

marin_dupekit-0.1.2.dev202606260851.tar.gz (29.9 kB)

Uploaded Jun 26, 2026 Source

Built Distributions

marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_x86_64.whl (4.8 MB)

Uploaded Jun 26, 2026 CPython 3.12+, manylinux: glibc 2.28+ x86-64

marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_aarch64.whl (4.4 MB)

Uploaded Jun 26, 2026 CPython 3.12+, manylinux: glibc 2.28+ ARM64

marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_11_0_arm64.whl (4.2 MB)

Uploaded Jun 26, 2026 CPython 3.12+, macOS 11.0+ ARM64

marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_10_12_x86_64.whl (4.6 MB)

Uploaded Jun 26, 2026 CPython 3.12+, macOS 10.12+ x86-64
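
The wheel names listed here follow the standard wheel filename layout (distribution, version, Python tag, ABI tag, platform tag). As a small sketch, the helper below splits such a name into its fields; the function and class names are hypothetical, and it deliberately handles only filenames without the optional build tag, which is the case for every wheel in this listing.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename (without a build tag) into its five fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    return WheelTags(*parts)


tags = parse_wheel_filename(
    "marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_11_0_arm64.whl"
)
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

The cp312 and abi3 tags together indicate a build against CPython's stable ABI with 3.12 as the minimum version, which matches the "CPython 3.12+" label in the listing.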

File details

Details for the file marin_dupekit-0.1.2.dev202606260851.tar.gz.

File metadata

Download URL: marin_dupekit-0.1.2.dev202606260851.tar.gz
Upload date: Jun 26, 2026
Size: 29.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for marin_dupekit-0.1.2.dev202606260851.tar.gz
Algorithm	Hash digest
SHA256	`c8ed8d4366d6c20cd2a6b7ef8b5743dc5dbf49e4029dd4709d4e4184df9a7426`
MD5	`e79148ef3ae1b3293553627fa8108ced`
BLAKE2b-256	`94aa622804c90bada2f7d94c235cd6001d70c83f15d992886a62fb20923e2bfc`
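
A downloaded file can be checked against the SHA256 digest in the table above with the standard library alone. This is a minimal sketch: the helper names are hypothetical, and the local path in the commented-out call is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 matches the published digest."""
    return sha256_of_file(path) == expected_sha256.lower()


# Digest copied from the table above; the local path is an assumption.
EXPECTED = "c8ed8d4366d6c20cd2a6b7ef8b5743dc5dbf49e4029dd4709d4e4184df9a7426"
# verify_download("marin_dupekit-0.1.2.dev202606260851.tar.gz", EXPECTED)
```

Reading in chunks keeps memory use flat regardless of file size, which matters more for the multi-megabyte wheels than for the source archive.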

Provenance

The following attestation bundles were made for marin_dupekit-0.1.2.dev202606260851.tar.gz:

Publisher: dupekit-release-wheels.yaml on marin-community/marin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marin_dupekit-0.1.2.dev202606260851.tar.gz
- Subject digest: c8ed8d4366d6c20cd2a6b7ef8b5743dc5dbf49e4029dd4709d4e4184df9a7426
- Sigstore transparency entry: 1965137609
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: marin-community/marin@1f688921a0189b019a26bbb18f1a737d256d8845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/marin-community
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: dupekit-release-wheels.yaml@1f688921a0189b019a26bbb18f1a737d256d8845
- Trigger Event: schedule

File details

Details for the file marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_x86_64.whl.

File metadata

Download URL: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_x86_64.whl
Upload date: Jun 26, 2026
Size: 4.8 MB
Tags: CPython 3.12+, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`75e2c283cb95c2e05fa48222b0e928208dff2bd3369a6b41271be9865ae85bec`
MD5	`f5ec92dc636f585bc4b92479043bf226`
BLAKE2b-256	`f81c3080673801ac06db971d037ecb22925f6c2254063c376d2bb527da2ddd20`

Provenance

The following attestation bundles were made for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_x86_64.whl:

Publisher: dupekit-release-wheels.yaml on marin-community/marin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_x86_64.whl
- Subject digest: 75e2c283cb95c2e05fa48222b0e928208dff2bd3369a6b41271be9865ae85bec
- Sigstore transparency entry: 1965138891
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: marin-community/marin@1f688921a0189b019a26bbb18f1a737d256d8845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/marin-community
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: dupekit-release-wheels.yaml@1f688921a0189b019a26bbb18f1a737d256d8845
- Trigger Event: schedule

File details

Details for the file marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_aarch64.whl.

File metadata

Download URL: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_aarch64.whl
Upload date: Jun 26, 2026
Size: 4.4 MB
Tags: CPython 3.12+, manylinux: glibc 2.28+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_aarch64.whl
Algorithm	Hash digest
SHA256	`bee351c5ffa210551472e596d2d91d0bdd8aca89c7daf4e2b166d485f1dfa405`
MD5	`36fd12b46d1b02f326f895caba800beb`
BLAKE2b-256	`1c83412c88864a31942490c1ae63443f2d8ee1e869f0bcb3933bbf5c7968638d`

Provenance

The following attestation bundles were made for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_aarch64.whl:

Publisher: dupekit-release-wheels.yaml on marin-community/marin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-manylinux_2_28_aarch64.whl
- Subject digest: bee351c5ffa210551472e596d2d91d0bdd8aca89c7daf4e2b166d485f1dfa405
- Sigstore transparency entry: 1965137955
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: marin-community/marin@1f688921a0189b019a26bbb18f1a737d256d8845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/marin-community
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: dupekit-release-wheels.yaml@1f688921a0189b019a26bbb18f1a737d256d8845
- Trigger Event: schedule

File details

Details for the file marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_11_0_arm64.whl
Upload date: Jun 26, 2026
Size: 4.2 MB
Tags: CPython 3.12+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`e3f4b58489121de314cc329d64e18e529687aeddba6eb102f41adac001ce7858`
MD5	`18c3ddda37eb11ba1efea8e69af6ceaf`
BLAKE2b-256	`5d375ad376afc0afbf0045446b31e9f072acd3c7e3a275fc614b1953c7421c2e`

Provenance

The following attestation bundles were made for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_11_0_arm64.whl:

Publisher: dupekit-release-wheels.yaml on marin-community/marin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_11_0_arm64.whl
- Subject digest: e3f4b58489121de314cc329d64e18e529687aeddba6eb102f41adac001ce7858
- Sigstore transparency entry: 1965138274
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: marin-community/marin@1f688921a0189b019a26bbb18f1a737d256d8845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/marin-community
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: dupekit-release-wheels.yaml@1f688921a0189b019a26bbb18f1a737d256d8845
- Trigger Event: schedule

File details

Details for the file marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_10_12_x86_64.whl
Upload date: Jun 26, 2026
Size: 4.6 MB
Tags: CPython 3.12+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`25fb016c5dcc799fac1807d9d80976999849b4845f378fe20742bd47d3334c93`
MD5	`f12d6224150d6fe8e39e4a28aaa88794`
BLAKE2b-256	`f1d537fc9c9fd41e353af97326aee544bcbd0ad490e5da90e0fc14f8f026ef1d`

Provenance

The following attestation bundles were made for marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_10_12_x86_64.whl:

Publisher: dupekit-release-wheels.yaml on marin-community/marin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marin_dupekit-0.1.2.dev202606260851-cp312-abi3-macosx_10_12_x86_64.whl
- Subject digest: 25fb016c5dcc799fac1807d9d80976999849b4845f378fe20742bd47d3334c93
- Sigstore transparency entry: 1965138545
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: marin-community/marin@1f688921a0189b019a26bbb18f1a737d256d8845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/marin-community
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: dupekit-release-wheels.yaml@1f688921a0189b019a26bbb18f1a737d256d8845
- Trigger Event: schedule
