Python library for simplifying data science

Atlantis

Atlantis is a Python library that simplifies common programming tasks in data science.

Installation

You can just use pip to install Atlantis:

pip install atlantis

Modules

  • collections provides helpers for working with collections.
  • colour simplifies using colours.
  • ds (datascience) provides tools for:
    • data wrangling,
    • validation,
    • tuning,
    • sampling,
    • evaluation,
    • clustering, and
    • parallel processing of machine learning models.
  • functions manages higher order functions.
  • hash simplifies and standardizes hashing.
  • text makes working with texts and strings easy.
  • time provides methods for working with dates and times, as well as progress bars.

collections

This module provides helpers for working with collections.

flatten

from atlantis.collections import flatten
flatten([1, 2, [3, 4, [5, 6], 7], 8])

returns: [1, 2, 3, 4, 5, 6, 7, 8]
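To make the behaviour concrete, here is a minimal pure-Python sketch of recursive flattening. The name flatten_sketch is hypothetical and this is not the atlantis implementation; it only assumes that nested lists and tuples should be expanded and that every other value is kept as-is.

```python
def flatten_sketch(items):
    """Recursively flatten nested lists and tuples into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # Recurse into the nested sequence and splice its elements in.
            flat.extend(flatten_sketch(item))
        else:
            # Anything else (numbers, strings, ...) is kept unchanged.
            flat.append(item)
    return flat


print(flatten_sketch([1, 2, [3, 4, [5, 6], 7], 8]))
# prints [1, 2, 3, 4, 5, 6, 7, 8]
```

Strings are deliberately not treated as sequences here, so 'abc' stays a single element rather than being split into characters.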

List

This class inherits from Python's list class and adds a few extra methods.

from atlantis.collections import List
l = List(1, 2, 3, 4, 2, [1, 2], [1, 2])

Flattening:

l.flatten()

returns: List: [1, 2, 3, 4, 2, 1, 2, 1, 2]

Finding duplicates:

l.get_duplicates()

returns: List: [2, List: [1, 2]]

Note: any list nested inside a List is automatically converted to a List, recursively.
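Finding duplicates when some elements are themselves lists is slightly awkward in plain Python, because lists are unhashable and cannot be counted with a set or a dict. The sketch below shows one way to do it with equality comparisons only. The function find_duplicates is hypothetical and is not how atlantis implements get_duplicates.

```python
def find_duplicates(items):
    """Return each value that occurs more than once, in order of first repeat."""
    seen = []        # values encountered so far
    duplicates = []  # values encountered at least twice
    for item in items:
        if item in seen:
            # Record a repeated value only once.
            if item not in duplicates:
                duplicates.append(item)
        else:
            seen.append(item)
    return duplicates


print(find_duplicates([1, 2, 3, 4, 2, [1, 2], [1, 2]]))
# prints [2, [1, 2]]
```

This version compares every element against every earlier one, so it is quadratic in the length of the input; that is fine for short lists but not for large ones.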

ds (Data Science)

This module provides data science tools for:

  • data wrangling,
  • validation,
  • tuning,
  • sampling,
  • evaluation,
  • clustering, and
  • parallel processing of machine learning models.

KMeans Clustering

I have used the KMeans classes from both sklearn and pyspark and was frustrated by two problems: (a) even though the two classes do exactly the same thing, their interfaces are vastly different, and (b) some of the simplest operations are very hard to do with either class. I solved these problems by creating my own KMeans class, a wrapper around both of those classes that automatically uses the appropriate one without complicating things for the data scientist.
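The source does not say how the wrapper decides which library to use. One plausible approach, sketched here purely as an assumption, is to look at the module that the input object's type comes from. The function choose_backend and the FakeSparkFrame stand-in are hypothetical names, not part of atlantis.

```python
def choose_backend(X):
    """Guess which library should handle X from the module of its type."""
    top_level_module = type(X).__module__.split('.')[0]
    if top_level_module == 'pyspark':
        return 'pyspark'
    # Assumption: everything else (lists, arrays, pandas objects) goes to sklearn.
    return 'sklearn'


class FakeSparkFrame:
    """Stand-in that exercises the pyspark branch without installing pyspark."""


# Pretend the class was defined inside pyspark.
FakeSparkFrame.__module__ = 'pyspark.sql.dataframe'

print(choose_backend([[0.0, 1.0], [2.0, 3.0]]))  # prints sklearn
print(choose_backend(FakeSparkFrame()))          # prints pyspark
```

Checking the module name rather than calling isinstance avoids importing pyspark when it is not installed.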

Usage

from atlantis.ds.clustering import KMeans

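# X is the feature data to cluster; it is not defined in this snippet.
kmeans = KMeans(n_clusters=3, n_jobs=10)
kmeans.fit(X=X)

# Assign each row of X to one of the fitted clusters.
predictions = kmeans.predict(X=X)
transformed_x = kmeans.transform(X=X)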
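The calls above follow the familiar fit / predict / transform pattern. As a rough illustration of what those steps mean for k-means, here is a tiny pure-Python sketch. TinyKMeans is a hypothetical teaching class, not atlantis code. It assumes that transform returns each point's distance to every cluster centre, which is what scikit-learn's KMeans.transform does; the source does not say whether atlantis behaves the same way.

```python
import math


class TinyKMeans:
    """Minimal Lloyd's-algorithm k-means on lists of numeric points."""

    def __init__(self, n_clusters, max_iter=100):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.centers = None

    def transform(self, X):
        # Distance from every point to every cluster centre.
        return [[math.dist(p, c) for c in self.centers] for p in X]

    def predict(self, X):
        # Index of the nearest centre for every point.
        return [row.index(min(row)) for row in self.transform(X)]

    def fit(self, X):
        # Simplistic, deterministic start: the first k points are the centres.
        self.centers = [list(p) for p in X[:self.n_clusters]]
        for _ in range(self.max_iter):
            labels = self.predict(X)
            new_centers = []
            for j, old_center in enumerate(self.centers):
                members = [p for p, label in zip(X, labels) if label == j]
                if members:
                    # Move the centre to the mean of its members.
                    new_centers.append(
                        [sum(col) / len(members) for col in zip(*members)]
                    )
                else:
                    # Keep an empty cluster's centre where it was.
                    new_centers.append(old_center)
            if new_centers == self.centers:
                break  # converged
            self.centers = new_centers
        return self


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
model = TinyKMeans(n_clusters=2).fit(points)
print(model.predict(points))  # prints [0, 0, 1, 1]
```

Real implementations use smarter initialisation (for example k-means++) and vectorised arithmetic; this sketch only shows the shape of the three methods.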

Clustering Optimization

Usage

from atlantis.ds.clustering import ClusteringOptimizer

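# Search the candidate cluster counts between min_k and max_k.
# X is the feature data to cluster; it is not defined in this snippet.
clustering_optimizer = ClusteringOptimizer(min_k=2, max_k=16, n_jobs=10)
clustering_optimizer.fit(X=X)
print(f'best number of clusters: {clustering_optimizer.optimal_number_of_clusters}')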
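The source does not describe how the optimizer scores each candidate, so the following is only a sketch of the general search pattern: evaluate a score for every k in a range, possibly in parallel, and keep the best one. The names pick_best_k and toy_score are hypothetical, the range is assumed to include max_k, and a higher score is assumed to be better.

```python
from concurrent.futures import ThreadPoolExecutor


def pick_best_k(score_for_k, min_k, max_k, n_jobs=1):
    """Score every k in [min_k, max_k] and return the best k and all scores."""
    candidates = list(range(min_k, max_k + 1))
    # Evaluate the candidates concurrently; map preserves their order.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        scores = list(pool.map(score_for_k, candidates))
    # Highest score wins; on ties the smallest k is kept.
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], dict(zip(candidates, scores))


def toy_score(k):
    """Stand-in score that peaks at k = 5; a real one would fit a model."""
    return -(k - 5) ** 2


best_k, all_scores = pick_best_k(toy_score, min_k=2, max_k=16, n_jobs=4)
print(f'best number of clusters: {best_k}')  # prints best number of clusters: 5
```

In practice score_for_k would fit a clustering model with k clusters and return a quality measure such as a silhouette score; which measure atlantis uses is not stated in the source.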
