DataSonar 🌐

DataSonar is a small library for sanitizing large datasets stored in files by applying cleaners, called sonars. It relies on the Dask library for lazy and clustered loading to keep performance acceptable.

Available file formats 📃

Input ↩️

  • CSV
  • Parquet

⚠️ CSV data is loaded as strings!
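
Because every CSV value arrives as a string, numeric or date checks have to work on text first. The sketch below uses Python's standard `csv` module, not DataSonar's own loader, purely to illustrate the effect; the sample data is made up.

```python
import csv
import io

# Made-up sample data, for illustration only.
sample = io.StringIO("id,price\n1,9.50\n2,12\n")

rows = list(csv.DictReader(sample))

# Every value is a str, even the ones that look numeric.
value_types = {type(v).__name__ for row in rows for v in row.values()}

# Convert explicitly before doing arithmetic.
total = sum(float(row["price"]) for row in rows)
```

This is why a custom sonar typically validates `str(value)` before converting it.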

Output ↪️

  • CSV

How to use 💯

Creation of the executor 📈

from datasonar import SonarExecutorBuilder

executor = SonarExecutorBuilder().with_dates(
    [
        "myDateField",
        "myDateTimeField"
    ],  # Names of the fields to process
    "%Y-%m-%d",  # New date format
    "%H:%M:%S",  # New time format
).build()  # Returns a SonarExecutor that runs the sonars in the order they were registered

NB: You can register additional sonars if needed.
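
How the date sonar parses its input is not described here. As a rough illustration of what rewriting a value into the target formats can look like, here is a sketch built on the standard `datetime` module. The `normalize_date` helper and the `CANDIDATE_FORMATS` list are hypothetical and are not part of DataSonar.

```python
from datetime import datetime

# Hypothetical input formats to try; DataSonar's real detection logic may differ.
CANDIDATE_FORMATS = ["%d/%m/%Y %H:%M:%S", "%d/%m/%Y", "%Y-%m-%dT%H:%M:%S"]


def normalize_date(value: str, date_format: str, time_format: str) -> str:
    """Re-emit value in the target format, or return it unchanged if unparseable."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Only append the time part when the input format carried one.
        if "%H" in fmt:
            return parsed.strftime(f"{date_format} {time_format}")
        return parsed.strftime(date_format)
    return value
```

Under these assumptions, `normalize_date("31/12/2023", "%Y-%m-%d", "%H:%M:%S")` yields `"2023-12-31"`, and a value that matches none of the candidate formats passes through untouched.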

Load the DataFile 📄

from datasonar import DataFileService

df = DataFileService.create_from_csv("./test.csv")  # If you don't specify the separator, an analyzer will determine it
df_parquet = DataFileService.create_from_parquet("./test.parquet")
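
The separator analyzer itself is internal to DataSonar and is not documented here. To give an idea of how such detection can work, the sketch below uses `csv.Sniffer` from the standard library; the `detect_separator` helper and its candidate list are assumptions made for illustration.

```python
import csv


def detect_separator(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample among the candidate characters."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Made-up sample: the first lines of a semicolon-separated file.
separator = detect_separator("a;b;c\n1;2;3\n4;5;6\n")
```

Sniffing is heuristic and raises `csv.Error` when no delimiter can be determined, so passing the separator explicitly remains the safer option for unusual files.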

Execute the sonars 🎇

executor.execute(df).export_csv("./", "result")

Going further ⏩

You can also implement your own sonars with your custom logic.

Create your sonar 👷

from datasonar import BaseSonar
import re


class PlusSonar(BaseSonar):
    """
    Sonar that increments integers by a fixed amount
    """
    REGEX_IS_INT = re.compile(r"^-?\d+$")

    __amount: int

    def __init__(self, column_names: list[str] | None, amount: int) -> None:
        super().__init__(column_names)
        self.__amount = amount

    def is_valid(self, value: object) -> bool:
        # search() returns a Match or None, so convert the result to a real bool
        return self.REGEX_IS_INT.search(str(value)) is not None

    def treat(self, value: object) -> object:
        # Values may arrive as strings (CSV input), so go through str() before int()
        return int(str(value)) + self.__amount
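
How the executor combines `is_valid` and `treat` is not spelled out here. One plausible reading is that `treat` runs only on values for which `is_valid` returns true, while other values pass through unchanged. The sketch below illustrates that reading with plain functions and a hypothetical `apply_sonar` helper; none of these names belong to DataSonar.

```python
import re
from typing import Callable, Iterable

INT_PATTERN = re.compile(r"^-?\d+$")


def is_int_like(value: object) -> bool:
    """True when the value's text form is an optionally negative integer."""
    return INT_PATTERN.search(str(value)) is not None


def add_two(value: object) -> object:
    """Convert the value to int and add 2."""
    return int(str(value)) + 2


def apply_sonar(
    values: Iterable[object],
    is_valid: Callable[[object], bool],
    treat: Callable[[object], object],
) -> list[object]:
    # Assumption: invalid values are left as they are rather than dropped.
    return [treat(v) if is_valid(v) else v for v in values]


result = apply_sonar(["1", "-4", "abc", "3.5", ""], is_int_like, add_two)
```

With this input, only `"1"` and `"-4"` are converted; `"abc"`, `"3.5"` and the empty string do not match the pattern and stay as strings.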

Add your sonar to the builder 🚀

from datasonar import SonarExecutorBuilder

executor = SonarExecutorBuilder().with_custom(PlusSonar(None, 2)).build()

And now you can execute it on your loaded DataFile.
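
Since the executor runs sonars in the order they were registered, the registration order can change the result. A minimal sketch of that effect, using plain functions as stand-ins for sonars (the `run_in_order` helper is hypothetical, not DataSonar API):

```python
from typing import Callable, Sequence


def run_in_order(value: int, steps: Sequence[Callable[[int], int]]) -> int:
    """Apply each step to the value, in the order the steps are listed."""
    for step in steps:
        value = step(value)
    return value


def add_two(value: int) -> int:
    return value + 2


def double(value: int) -> int:
    return value * 2


add_then_double = run_in_order(3, [add_two, double])  # (3 + 2) * 2
double_then_add = run_in_order(3, [double, add_two])  # (3 * 2) + 2
```

The two pipelines contain the same steps but produce different values, so it is worth registering cleaning sonars before sonars that transform the cleaned values.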

