DataSonar 🌐

DataSonar is a small library for sanitizing large datasets stored in files by applying cleaners, called sonars. It relies on the Dask library for lazy and clustered loading to optimize performance.

Available file formats 📃

Input ↩️

  • CSV
  • Parquet

⚠️ CSV data is loaded as strings!
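
To illustrate what "loaded as strings" means in practice, here is a small sketch using Python's standard `csv` module, which likewise returns every cell as a string. It is only an analogy and does not use DataSonar's loader; the sample content is made up for the example.

```python
import csv
import io

# Hypothetical CSV content, used only for illustration.
raw = "id,amount,created\n1,10,2024-01-31\n2,-3,2024-02-01\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Every cell comes back as str, including the numeric-looking ones.
all_strings = all(isinstance(v, str) for row in rows for v in row.values())

# An explicit conversion is needed before doing arithmetic on a column.
total = sum(int(row["amount"]) for row in rows)
```

A sonar that works on numbers or dates therefore has to parse the string first, as the custom sonar further down does with `int(value)`.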

Output ↪️

  • CSV

How to use 💯

Creation of the executor 📈

from datasonar import SonarExecutorBuilder

executor = SonarExecutorBuilder().with_dates(
    [
        "myDateField",
        "myDateTimeField"
    ],  # Names of the fields to process
    "%Y-%m-%d",  # Target date format
    "%H:%M:%S",  # Target time format
).build()  # Returns a SonarExecutor that runs the sonars in the order they were registered

NB: You can add more sonars if needed.
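
As a rough idea of what reformatting a date field to the target formats could look like, here is a standalone sketch built on `datetime`. It is not DataSonar's implementation: the `normalize_date` helper, the list of candidate input layouts, and the choice to join date and time with a space are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical input layouts to try, in order; the library's real detection may differ.
CANDIDATE_FORMATS = [
    "%d/%m/%Y %H:%M:%S",
    "%d/%m/%Y",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
]


def normalize_date(value, date_format="%Y-%m-%d", time_format="%H:%M:%S"):
    """Rewrite a date or datetime string using the target formats."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # This layout does not match, try the next one.
        # Assumption: a time part is appended only when the input had one.
        has_time = "%H" in fmt
        out_format = f"{date_format} {time_format}" if has_time else date_format
        return parsed.strftime(out_format)
    return value  # Unrecognized values are left unchanged.


date_only = normalize_date("31/01/2024")
date_time = normalize_date("31/01/2024 08:05:00")
untouched = normalize_date("not a date")
```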

Load the DataFile 📄

from datasonar import DataFileService

df = DataFileService.create_from_csv("./test.csv")  # If you don't specify the separator, an analyzer will determine it
df_parquet = DataFileService.create_from_parquet("./test.parquet")
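
How the built-in analyzer detects the separator is not documented here. One common approach, shown below as a sketch, is the standard library's `csv.Sniffer`; the `detect_separator` helper and the candidate list are hypothetical and not part of DataSonar.

```python
import csv


def detect_separator(sample, candidates=",;\t|"):
    """Guess the delimiter of a CSV text sample among a few candidates."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Hypothetical sample: the first few lines of a semicolon-separated file.
sample = "name;age;city\nAda;36;London\nLinus;28;Helsinki\n"
separator = detect_separator(sample)
```

Sniffing can fail on ambiguous samples (`csv.Error` is raised), so passing the separator explicitly remains the safer option when it is known.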

Execute the sonars 🎇

executor.execute(df).export_csv("./", "result")

Going further ⏩

You can also implement your own sonars with your custom logic.

Create your sonar 👷

from datasonar import BaseSonar
from re import search, compile


class PlusSonar(BaseSonar):
    """
    Sonar that increments integers by a given amount
    """
    REGEX_IS_INT = compile(r"^-?\d+$")

    __amount: int

    def __init__(self, column_names: list[str] | None, amount: int) -> None:
        super().__init__(column_names)
        self.__amount = amount

    def is_valid(self, value: object) -> bool:
        # search() returns a Match object or None, so convert it to a real bool
        return search(self.REGEX_IS_INT, str(value)) is not None

    def treat(self, value: object) -> object:
        return int(value) + self.__amount

Add your sonar to the builder 🚀

from datasonar import SonarExecutorBuilder

executor = SonarExecutorBuilder().with_custom(PlusSonar(None, 2)).build()

And now you can execute it on your loaded DataFile.
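
To check the logic of a custom sonar without loading a file, the sketch below exercises `is_valid` and `treat` directly. It runs without the library: `BaseSonar` here is a hypothetical stand-in, and `apply_sonar` encodes an assumption about the executor (values that fail `is_valid` are left untouched), which may not match what DataSonar actually does.

```python
from re import compile


class BaseSonar:
    """Hypothetical stand-in for datasonar.BaseSonar, for illustration only."""

    def __init__(self, column_names):
        self.column_names = column_names


class PlusSonar(BaseSonar):
    """Sonar that increments integers by a given amount."""

    REGEX_IS_INT = compile(r"^-?\d+$")

    def __init__(self, column_names, amount):
        super().__init__(column_names)
        self._amount = amount

    def is_valid(self, value):
        return self.REGEX_IS_INT.search(str(value)) is not None

    def treat(self, value):
        return int(value) + self._amount


def apply_sonar(sonar, values):
    # Assumed behaviour: treat valid values, keep the others as they are.
    return [sonar.treat(v) if sonar.is_valid(v) else v for v in values]


result = apply_sonar(PlusSonar(None, 2), ["1", "-4", "abc", "3.5"])
# result == [3, -2, "abc", "3.5"]
```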
