Polars Extension for General Data Science Use
Currently in Alpha. Feel free to submit feature requests in the issues section of the repo.
The goal of this package is to give data scientists, analysts, engineers, and quants more tools to manipulate, transform, and make sense of data without leaving DataFrame land (aka Wonderland).
This package will also serve as a lower-level backend for another package of mine called dsds, and it will change how many functions in dsds work.
Performance is a focus, but sometimes it is impossible to beat NumPy/SciPy for a single operation on a single array. There can be many reasons: interop cost (copies are sometimes needed), null checks, lack of support for complex numbers (e.g. the FFT implementation has to make multiple copies), or simply that we haven't yet found the most optimized way to write an algorithm.
However, there are greater benefits for staying in DataFrame land:
- Works with the Polars expression engine, so multiple expressions can be executed in parallel. E.g. running an FFT on a single series may be slower than NumPy, but if you run FFTs together with other non-trivial operations, the story changes completely.
- Works in a group_by context. E.g. run multiple linear regressions in parallel in a group_by context.
- Staying in DataFrame land typically keeps code cleaner and less confusing.
Some examples:
df.group_by("dummy").agg(
    pl.col("y").num_ext.lstsq(pl.col("a"), pl.col("b"), add_bias = True).alias("list_float")
)
shape: (2, 2)
┌───────┬─────────────┐
│ dummy ┆ list_float  │
│ ---   ┆ ---         │
│ str   ┆ list[f64]   │
╞═══════╪═════════════╡
│ b     ┆ [2.0, -1.0] │
│ a     ┆ [2.0, -1.0] │
└───────┴─────────────┘
df.group_by("dummy_groups").agg(
    pl.col("actual").num_ext.l2_loss(pl.col("predicted")).alias("l2"),
    pl.col("actual").num_ext.bce(pl.col("predicted")).alias("log loss"),
    pl.col("actual").num_ext.roc_auc(pl.col("predicted")).alias("roc_auc")
)
shape: (2, 4)
┌──────────────┬──────────┬──────────┬──────────┐
│ dummy_groups ┆ l2       ┆ log loss ┆ roc_auc  │
│ ---          ┆ ---      ┆ ---      ┆ ---      │
│ str          ┆ f64      ┆ f64      ┆ f64      │
╞══════════════╪══════════╪══════════╪══════════╡
│ b            ┆ 0.333887 ┆ 0.999602 ┆ 0.498913 │
│ a            ┆ 0.332575 ┆ 0.997049 ┆ 0.501997 │
└──────────────┴──────────┴──────────┴──────────┘
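For readers who want to sanity-check such numbers outside of Polars, here is a minimal plain-Python sketch of two of the losses above. It assumes that l2_loss means the mean squared error and that bce is the usual binary cross-entropy averaged over rows. The function names and the eps clipping parameter are illustrative only and are not this package's implementation.

```python
import math

def l2_loss(actual, predicted):
    """Mean squared error (assumed meaning of l2_loss)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def bce(actual, predicted, eps=1e-15):
    """Binary cross-entropy (log loss), averaged over rows.

    `eps` is a hypothetical clipping constant that keeps log() finite.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return -total / len(actual)
```

With labels [1, 0] and a constant prediction of 0.5, the first function gives 0.25 and the second gives ln 2.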
To avoid the "Chunked array is not contiguous" error, try rechunking your DataFrame.
The package right now contains two extensions:
Numeric Extension
Existing Features
- GCD, LCM for integers
- Harmonic mean, geometric mean, and other common, simple metrics used in industry.
- Common loss functions, e.g. L1, L2, L infinity, Huber loss, MAPE, SMAPE, wMAPE, etc.
- Common mini-models, e.g. lstsq, conditional entropy.
- Discrete Fourier Transform, returning the real and imaginary parts of the new series.
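To make a few of these features concrete, here is a reference sketch in standard-library Python of what GCD/LCM, the harmonic and geometric means, and a DFT split into real and imaginary parts compute. The helper names lcm and naive_dft are hypothetical, and the DFT is a naive O(n^2) version for illustration only, not the algorithm used by this package.

```python
import cmath
import math
import statistics

def lcm(a, b):
    """Least common multiple via the GCD identity."""
    return abs(a * b) // math.gcd(a, b) if a and b else 0

def naive_dft(xs):
    """Naive O(n^2) DFT; returns (real parts, imaginary parts)."""
    n = len(xs)
    out = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
        for k in range(n)
    ]
    return [z.real for z in out], [z.imag for z in out]

values = [1.0, 2.0, 4.0]
harmonic = statistics.harmonic_mean(values)    # n / sum(1 / x)
geometric = statistics.geometric_mean(values)  # (product of x) ** (1 / n)
real, imag = naive_dft([1.0, 0.0, 0.0, 0.0])   # unit impulse
```

The DFT of a unit impulse is constant, so real is all ones and imag is all zeros here.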
String Extension
Existing Features
- Levenshtein distance, Hamming distance, str Jaccard similarity
- Simple tokenization
- Stemming (currently only the Snowball stemmer for English)
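For reference, the three string metrics listed above can be sketched in plain Python as follows. This assumes Jaccard similarity is taken over the sets of characters in each string; the package may define it differently (e.g. over n-grams). The function names are illustrative and not the extension's API.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def hamming(a, b):
    """Number of differing positions; defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def jaccard(a, b):
    """Jaccard similarity over character sets (an assumption, see above)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

For example, levenshtein("kitten", "sitting") is 3, hamming("karolin", "kathrin") is 3, and jaccard("abc", "bcd") is 0.5.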
Todo list
- Longest common subsequence as string distance metric
- Vectorizers (Count + TFIDF)?
- Similarity version of the distances, and more variations and parameters.
Other Extensions?
E.g. stats_ext, dist_ext (L^p distance for vectors; the scalar version is already implemented), etc.
Simple unsupervised clustering can also be done. It is simply a matter of willingness and market demand.
Disclaimer
The Rust Snowball stemmer is taken from Tsoding's Seroost project (MIT).