DataSonar 🌐
DataSonar is a simple library for sanitizing large datasets stored in files by applying cleaners, called sonars. It relies on the Dask library for lazy and clustered loading to keep performance acceptable on big files.
Available file formats 📃
Input ↩️
- CSV
- Parquet
⚠️ CSV data is loaded as strings!
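Since every CSV value arrives as a string, comparisons and sorting are lexicographic until the values are converted. The sketch below illustrates this with the standard library `csv` module, not with DataSonar's own loader; the sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample data; the standard csv module also yields every field as str.
raw = "id,amount\n1,9\n2,10\n"

rows = list(csv.DictReader(io.StringIO(raw)))
amounts = [row["amount"] for row in rows]

# String comparison is lexicographic, so "9" ranks above "10".
largest_as_text = max(amounts)

# Converting explicitly restores numeric ordering.
largest_as_number = max(int(a) for a in amounts)
```

A custom sonar that works on numbers should therefore convert values itself before treating them.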
Output ↪️
- CSV
How to use 💯
Creation of the executor 📈
from datasonar import SonarExecutorBuilder

executor = SonarExecutorBuilder().with_dates(
    [
        "myDateField",
        "myDateTimeField",
    ],  # Field names that must be treated
    "%Y-%m-%d",  # New date format
    "%H:%M:%S",  # New time format
).build()  # Returns a SonarExecutor that runs the sonars in the order they were registered
NB: You can register more sonars if needed.
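To give an idea of what a date sonar could do with the two format strings, here is a minimal sketch built on the standard `datetime` module. It is not DataSonar's implementation: the function name `normalize_date` and the list of candidate input formats are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical input formats to try; DataSonar's own detection may differ.
CANDIDATE_FORMATS = (
    "%d/%m/%Y %H:%M:%S",
    "%d/%m/%Y",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
)


def normalize_date(
    value: str,
    date_format: str = "%Y-%m-%d",
    time_format: str = "%H:%M:%S",
) -> str:
    """Reformat a date or datetime string; return it unchanged if nothing matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Only emit the time part when the matched input format carried one.
        if "%H" in fmt:
            return parsed.strftime(f"{date_format} {time_format}")
        return parsed.strftime(date_format)
    return value
```

Returning unmatched values untouched keeps the cleaning step non-destructive, which seems a reasonable default for a sanitizer.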
Load the DataFile 📄
from datasonar import DataFileService
df = DataFileService.create_from_csv("./test.csv")  # If you don't specify the separator, an analyzer determines it
df_parquet = DataFileService.create_from_parquet("./test.parquet")
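How DataSonar's analyzer guesses the separator is not documented here. As a rough illustration of the idea, the sketch below uses the standard library `csv.Sniffer`; the helper name `detect_separator` and the candidate list are assumptions, and the library may use a different technique.

```python
import csv


def detect_separator(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field separator from a text sample of the file."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Hypothetical sample: the first lines of a semicolon-separated file.
sample = "name;birth_date;score\nAlice;1990-01-31;12\nBob;1985-07-04;7\n"
separator = detect_separator(sample)
```

Passing the separator explicitly remains the safer choice when the file layout is known in advance.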
Execute the sonars 🎇
executor.execute(df).export_csv("./", "result")
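Because sonars run in the order they were registered, the same set of sonars can give different results depending on that order. A minimal sketch of the principle, with plain functions standing in for sonars (the names `run_in_order`, `strip_spaces` and `pad_to_five` are hypothetical):

```python
from typing import Callable

Cleaner = Callable[[str], str]


def run_in_order(value: str, cleaners: list[Cleaner]) -> str:
    """Apply each cleaner to the value, first registered first."""
    for cleaner in cleaners:
        value = cleaner(value)
    return value


# Hypothetical cleaners standing in for sonars.
strip_spaces: Cleaner = str.strip
pad_to_five: Cleaner = lambda s: s.rjust(5, "0")

strip_then_pad = run_in_order(" 42 ", [strip_spaces, pad_to_five])
pad_then_strip = run_in_order(" 42 ", [pad_to_five, strip_spaces])
```

Here `strip_then_pad` is `"00042"` while `pad_then_strip` is `"0 42"`, so it pays to register sonars deliberately.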
Going further ⏩
You can also implement your own sonars with your custom logic.
Create your sonar 👷
from datasonar import BaseSonar
from re import compile, search


class PlusSonar(BaseSonar):
    """
    Sonar that increments integers by a fixed amount.
    """

    REGEX_IS_INT = compile(r"^-?\d+$")
    __amount: int

    def __init__(self, column_names: list[str] | None, amount: int) -> None:
        super().__init__(column_names)
        self.__amount = amount

    def is_valid(self, value: object) -> bool:
        # search() returns a Match or None, so convert it to a real bool
        return search(self.REGEX_IS_INT, str(value)) is not None

    def treat(self, value: object) -> object:
        return int(value) + self.__amount
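The `is_valid` / `treat` pair can be tried out without installing the library. The sketch below uses a hypothetical stand-in base class, `SketchSonar`, and assumes that a value is only passed to `treat` when `is_valid` accepts it; the real `BaseSonar` may behave differently.

```python
import re
from abc import ABC, abstractmethod


class SketchSonar(ABC):
    """Hypothetical stand-in for BaseSonar; not the library's real class."""

    @abstractmethod
    def is_valid(self, value: object) -> bool: ...

    @abstractmethod
    def treat(self, value: object) -> object: ...

    def apply(self, value: object) -> object:
        # Assumed contract: only values accepted by is_valid are transformed.
        return self.treat(value) if self.is_valid(value) else value


class PlusSketch(SketchSonar):
    _IS_INT = re.compile(r"-?\d+")

    def __init__(self, amount: int) -> None:
        self._amount = amount

    def is_valid(self, value: object) -> bool:
        return self._IS_INT.fullmatch(str(value)) is not None

    def treat(self, value: object) -> object:
        return int(str(value)) + self._amount


# Values are strings, as they would be when loaded from a CSV file.
results = [PlusSketch(2).apply(v) for v in ["40", "-3", "n/a", "4.5"]]
```

Under that assumption, `results` is `[42, -1, "n/a", "4.5"]`: integer-like strings are incremented and everything else is left alone.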
Add your sonar to the builder 🚀
from datasonar import SonarExecutorBuilder
executor = SonarExecutorBuilder().with_custom(PlusSonar(None, 2)).build()
And now you can execute it on your loaded DataFile.