Python library for simplifying data science
Atlantis
Atlantis is a Python library that simplifies programming for data science.
Installation
Install Atlantis with pip:
pip install atlantis
Modules
- collections helps with working with collections.
- colour simplifies using colours.
- ds (datascience) provides tools for:
  - data wrangling,
  - validation,
  - tuning,
  - sampling,
  - evaluation,
  - clustering, and
  - parallel processing of machine learning models.
- functions manages higher order functions.
- hash simplifies and standardizes hashing.
- text makes working with texts and strings easy.
- time provides methods for interacting with time and date, as well as progress bars.
collections
This module of the package atlantis helps with working with collections.
flatten
from atlantis.collections import flatten
flatten([1, 2, [3, 4, [5, 6], 7], 8])
returns: [1, 2, 3, 4, 5, 6, 7, 8]
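For readers curious about what flattening involves, here is a minimal sketch of a recursive flatten in plain Python. It is not the atlantis implementation; the name flatten_sketch is hypothetical, and the choice to unpack only lists and tuples (leaving strings intact) is an assumption made for this example.

```python
def flatten_sketch(items):
    """Recursively flatten nested lists and tuples into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # Recurse into nested sequences and splice their items in place.
            flat.extend(flatten_sketch(item))
        else:
            # Strings and other non-sequence values stay as single elements.
            flat.append(item)
    return flat


print(flatten_sketch([1, 2, [3, 4, [5, 6], 7], 8]))
# prints [1, 2, 3, 4, 5, 6, 7, 8]
```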
List
This class inherits from Python's list class and adds a few extra features.
from atlantis.collections import List
l = List(1, 2, 3, 4, 2, [1, 2], [1, 2])
Flattening:
l.flatten()
>>> List: [1, 2, 3, 4, 2, 1, 2, 1, 2]
Finding duplicates:
l.get_duplicates()
>>> List: [2, List: [1, 2]]
Note: any list elements of a List are automatically converted to List instances, recursively.
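The behaviour described above can be approximated with a small list subclass. This is only a sketch under stated assumptions: NestedList is a hypothetical stand-in, not the atlantis List class, and its get_duplicates compares elements with == (so it also works for unhashable nested lists) and reports each repeated value once, in first-seen order.

```python
class NestedList(list):
    """Hypothetical stand-in for atlantis's List; not the library's code."""

    def __init__(self, *items):
        # Convert nested plain lists into NestedList instances, recursively.
        super().__init__(
            NestedList(*item) if isinstance(item, list) else item
            for item in items
        )

    def get_duplicates(self):
        """Return each value that occurs more than once, in first-seen order."""
        seen, duplicates = [], []
        for item in self:
            if item in seen and item not in duplicates:
                duplicates.append(item)
            elif item not in seen:
                seen.append(item)
        return NestedList(*duplicates)


nl = NestedList(1, 2, 3, 4, 2, [1, 2], [1, 2])
print(nl.get_duplicates())
# prints [2, [1, 2]]
```

Membership tests on a plain list are linear, so this approach is quadratic in the worst case; it trades speed for the ability to compare unhashable elements.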
ds (Data Science)
This module provides data science tools for:
- data wrangling,
- validation,
- tuning,
- sampling,
- evaluation,
- clustering, and
- parallel processing of machine learning models.
KMeans Clustering
I have used the KMeans class from both sklearn and pyspark and was frustrated by two problems: (a) even though the two classes do exactly the same thing, their interfaces are vastly different, and (b) some of the simplest operations are very hard to do with either class. I solved this by creating my own KMeans class, a wrapper around both of those classes that uses the appropriate one automatically, without complicating things for the data scientist.
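One way such a wrapper could decide which backend to use is to inspect the type of the input data. The sketch below illustrates that idea only; it is not how atlantis is implemented. The names choose_backend and KMeansFacade, and the rule of matching on the module name of the input's type, are all assumptions made for this example.

```python
def choose_backend(X):
    """Pick a backend name from the type of the input data (hypothetical rule)."""
    # Assumption: Spark data structures come from a module whose name starts
    # with "pyspark"; anything else is treated as in-memory data for sklearn.
    module = type(X).__module__
    return "pyspark" if module.startswith("pyspark") else "sklearn"


class KMeansFacade:
    """Hypothetical wrapper exposing one interface over two backends."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters
        self.backend = None

    def fit(self, X):
        # Record which backend the data calls for. A real wrapper would build
        # and fit that backend's own KMeans estimator at this point.
        self.backend = choose_backend(X)
        return self


model = KMeansFacade(n_clusters=3).fit([[0.0, 1.0], [1.0, 0.0]])
print(model.backend)
# prints sklearn
```

Keeping the decision in a single function means the rest of the wrapper never needs to know which library is in use.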
Usage
from atlantis.ds.clustering import KMeans
# X is the feature data to cluster; it must be defined before this snippet runs
kmeans = KMeans(n_clusters=3, n_jobs=10)
kmeans.fit(X=X)
predictions = kmeans.predict(X=X)
transformed_x = kmeans.transform(X=X)
Clustering Optimization
Usage
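from atlantis.ds.clustering import ClusteringOptimizer
# X is the feature data to cluster; it must be defined before this snippet runs
# search for the best number of clusters between min_k and max_k
clustering_optimizer = ClusteringOptimizer(min_k=2, max_k=16, n_jobs=10)
clustering_optimizer.fit(X=X)
print(f'best number of clusters: {clustering_optimizer.optimal_number_of_clusters}')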