polars-hashfilter

A Polars plugin exposing PlHashSet for efficient filtering with persistent sets across LazyFrames.

Features

  • Zero-copy StringHashSet: A Python-accessible wrapper around Polars' PlHashSet<String> that can be shared between Python and Rust without copying the underlying data
  • Persistent filtering: Use the same hashset across multiple LazyFrames for deduplication
  • Three filtering expressions:
    • is_in: Check if values exist in the set
    • not_in: Check if values do NOT exist in the set
    • not_in_and_update: Check if values are NOT in the set, then add them (anti-join pattern)
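
The semantics of the three expressions can be modelled with a built-in Python set. This is only a rough sketch, not the plugin's implementation: the helper functions below are hypothetical stand-ins that return plain boolean lists, and the sketch assumes that not_in_and_update tests the whole batch against the set before adding the values. How the plugin treats duplicates inside a single batch is not stated here.

```python
def is_in(values, seen):
    """Boolean mask: True where the value is already in the set."""
    return [v in seen for v in values]


def not_in(values, seen):
    """Boolean mask: True where the value is absent from the set."""
    return [v not in seen for v in values]


def not_in_and_update(values, seen):
    """Like not_in, but afterwards every value is recorded in the set.

    Assumption: the mask is computed against the set as it was before
    this batch, and the set is updated only after the mask is built.
    """
    mask = [v not in seen for v in values]
    seen.update(values)
    return mask


seen = {"alice", "bob"}
users = ["alice", "charlie", "bob", "dave"]

in_mask = is_in(users, seen)                   # [True, False, True, False]
out_mask = not_in(users, seen)                 # [False, True, False, True]
first_pass = not_in_and_update(users, seen)    # same as out_mask, but seen grows
second_pass = not_in_and_update(users, seen)   # all False: everything is known now
```

The only stateful expression is not_in_and_update: calling it twice on the same values gives different masks, which is what makes it usable for deduplication across frames.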

Installation

# From source with uv
uv pip install .

# Development mode
just dev

Usage

Basic Example

import polars as pl
from polars_hashfilter import StringHashSet, is_in_hashset, not_in_hashset

# Create a persistent set
seen = StringHashSet.from_values(["alice", "bob"])

# Filter using the set
df = pl.DataFrame({"user": ["alice", "charlie", "bob", "dave"]})

# Using standalone functions

df.filter(is_in_hashset(pl.col("user"), seen))
# shape: (2, 1)
# ┌───────┐
# │ user  │
# │ ---   │
# │ str   │
# ╞═══════╡
# │ alice │
# │ bob   │
# └───────┘

# Using expression namespace
df.filter(pl.col("user").hashfilter.not_in(seen))
# shape: (2, 1)
# ┌─────────┐
# │ user    │
# │ ---     │
# │ str     │
# ╞═════════╡
# │ charlie │
# │ dave    │
# └─────────┘

Deduplication Across Multiple LazyFrames (Anti-Join Pattern)

This is the primary use case, efficiently deduplicating records across many large LazyFrames:

import polars as pl
from polars_hashfilter import StringHashSet

# Create a persistent set to track seen IDs
seen = StringHashSet()

# Process multiple LazyFrames, keeping only new records
lazy_frames = [
    pl.LazyFrame({"id": ["a", "b"], "value": [1, 2]}),
    pl.LazyFrame({"id": ["b", "c"], "value": [3, 4]}),
    pl.LazyFrame({"id": ["c", "d"], "value": [5, 6]}),
]

for lf in lazy_frames:
    # Keep only rows we haven't seen before, and remember them
    df = lf.filter(pl.col("id").hashfilter.not_in_and_update(seen)).collect()
    print(df)
    # First:  id=["a", "b"], value=[1, 2]
    # Second: id=["c"],      value=[4]      (b already seen)
    # Third:  id=["d"],      value=[6]      (c already seen)

# The set now contains all unique IDs (sorted here because hash set order is not guaranteed)
print(sorted(seen.to_list()))  # ['a', 'b', 'c', 'd']
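
To make the data flow of the loop above easier to follow, here is the same three-batch example traced in plain Python, with lists of dicts standing in for the LazyFrames and a built-in set standing in for StringHashSet. The helper keep_new is hypothetical and only illustrates the pattern. It assumes the check-then-update order described for not_in_and_update.

```python
batches = [
    [{"id": "a", "value": 1}, {"id": "b", "value": 2}],
    [{"id": "b", "value": 3}, {"id": "c", "value": 4}],
    [{"id": "c", "value": 5}, {"id": "d", "value": 6}],
]


def keep_new(rows, seen_ids):
    """Return the rows whose id is not yet known, then record every id."""
    fresh = [row for row in rows if row["id"] not in seen_ids]
    seen_ids.update(row["id"] for row in rows)
    return fresh


seen_ids = set()
results = [keep_new(rows, seen_ids) for rows in batches]

# Values that survive in each batch
kept_values = [[row["value"] for row in rows] for rows in results]
print(kept_values)        # [[1, 2], [4], [6]]
print(sorted(seen_ids))   # ['a', 'b', 'c', 'd']
```

Because each batch is filtered against the state left behind by the previous one, the batches have to be processed in order, one after another.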

StringHashSet API

from polars_hashfilter import StringHashSet

# Creation
s = StringHashSet()                      # Empty set
s = StringHashSet.with_capacity(1000)    # Pre-allocated
s = StringHashSet.from_values(["a", "b"])  # From iterable

# Operations
s.insert("value")      # Insert, returns True if new
s.contains("value")    # Check membership
s.remove("value")      # Remove, returns True if existed
s.extend(["x", "y"])   # Bulk insert
s.clear()              # Remove all elements

# Inspection
len(s)                 # Number of elements
s.is_empty()           # Check if empty
s.to_list()            # Export as Python list (copies data)

# Debug
s._ptr()               # Memory address (for verifying zero-copy)
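
The return values noted in the comments above can be illustrated with a small pure-Python stand-in. PyStringSet is a hypothetical class written for this sketch, not part of polars_hashfilter. It assumes insert and remove return False in the cases where the comments above do not say True.

```python
class PyStringSet:
    """Hypothetical stand-in that mirrors the documented return values."""

    def __init__(self, values=()):
        self._items = set(values)

    def insert(self, value):
        """Add a value; True only if it was not already present."""
        if value in self._items:
            return False
        self._items.add(value)
        return True

    def contains(self, value):
        return value in self._items

    def remove(self, value):
        """Remove a value; True only if it was present."""
        if value in self._items:
            self._items.remove(value)
            return True
        return False

    def extend(self, values):
        """Bulk insert; values already present are ignored."""
        self._items.update(values)

    def __len__(self):
        return len(self._items)

    def to_list(self):
        """Copy the contents into a new list (order is arbitrary)."""
        return list(self._items)


s = PyStringSet(["a", "b"])
first_insert = s.insert("value")    # True: newly added
second_insert = s.insert("value")   # False: already present
removed = s.remove("value")         # True: it existed
removed_again = s.remove("value")   # False: nothing left to remove
s.extend(["x", "y", "a"])           # "a" is already in the set
size = len(s)                       # 4 elements: a, b, x, y
```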

Zero-Copy Guarantee

The StringHashSet is stored behind Arc<RwLock>, meaning:

  1. No copies when passing to expressions: The set's memory address remains stable
  2. Thread-safe: Multiple readers OR one writer at a time
  3. Copies only when necessary:
    • StringHashSet.from_values() - copying Python strings to Rust
    • StringHashSet.extend() - copying Python strings to Rust
    • StringHashSet.to_list() - copying Rust strings to Python

You can verify zero-copy behavior:

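import polars as pl
from polars_hashfilter import StringHashSet

seen = StringHashSet()
ptr1 = seen._ptr()

# Use in many expressions...
df = pl.DataFrame({"id": ["a", "b", "c"]})
df.filter(pl.col("id").hashfilter.not_in_and_update(seen))

ptr2 = seen._ptr()
assert ptr1 == ptr2  # Same memory address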
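
As a loose analogy in plain Python (this is not the plugin and says nothing about its internals), the same idea can be seen with a built-in set: passing an object to a function shares it rather than copying it, so in-place updates are visible to the caller and the object's identity does not change. In CPython, id() returns the object's memory address. The function mark_seen is a hypothetical helper for this sketch.

```python
import copy


def mark_seen(shared, values):
    """Mutate the caller's set in place; no copy is made."""
    shared.update(values)
    return shared


seen = set()
addr_before = id(seen)

returned = mark_seen(seen, ["a", "b"])
addr_after = id(seen)

# The caller and the function worked on the very same object
same_object = returned is seen and addr_before == addr_after

# An explicit copy, by contrast, is a distinct object with equal contents
snapshot = copy.copy(seen)
is_distinct = snapshot is not seen and snapshot == seen
```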

Development

# Setup
just venv

# Build (debug)
just dev

# Build (release)
just release

# Test
just test

# Format
just fmt

# Lint
just lint

License

MIT
