Python library for fast multi-threaded data manipulation and munging.

Project description

datatable

This is an automated fork of datatable. It exists because the datatable package has no automated build and publish pipeline, and always installing from GitHub is not optimal. Fork URL: https://github.com/semmjon/datatable.git

This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame; however we put specific emphasis on speed and big data support. As the name suggests, the package is closely related to R's data.table and attempts to mimic its core algorithms and API.

Requirements: Python 3.6+ (64 bit) and pip 20.3+.
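
The interpreter requirement above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `meets_requirements` is hypothetical and not part of datatable, and it checks only the Python version and the 64-bit build, not the pip version.

```python
import sys


def meets_requirements(version_info=sys.version_info, maxsize=sys.maxsize):
    """Return True for a 64-bit Python 3.6 or newer (hypothetical helper)."""
    is_new_enough = tuple(version_info[:2]) >= (3, 6)
    # sys.maxsize exceeds 2**32 only on 64-bit builds of Python.
    is_64_bit = maxsize > 2**32
    return is_new_enough and is_64_bit


if __name__ == "__main__":
    print("Requirements met:", meets_requirements())
```

The arguments default to the running interpreter, so calling `meets_requirements()` with no arguments reports on the current environment.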

Project goals

datatable started in 2017 as a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum speed possible. Such requirements are dictated by modern machine-learning applications, which need to process large volumes of data and generate many features in order to achieve the best model accuracy. The first user of datatable was Driverless.ai.

The set of features that we want to implement with datatable is at least the following:

  • Column-oriented data storage.

  • Native-C implementation for all datatypes, including strings. Packages such as pandas and numpy already do that for numeric columns, but not for strings.

  • Support for date-time and categorical types. Object type is also supported, but promotion into object is discouraged.

  • All types should support null values, with as little overhead as possible.

  • Data should be stored on disk in the same format as in memory. This will allow us to memory-map data on disk and work on out-of-memory datasets transparently.

  • Work with memory-mapped datasets to avoid loading into memory more data than necessary for each particular operation.

  • Fast data reading from CSV and other formats.

  • Multi-threaded data processing: time-consuming operations should attempt to utilize all cores for maximum efficiency.

  • Efficient algorithms for sorting/grouping/joining.

  • Expressive query syntax (similar to data.table).

  • Minimal amount of data copying, copy-on-write semantics for shared data.

  • Use "rowindex" views in filtering/sorting/grouping/joining operators to avoid unnecessary data copying.

  • Interoperability with pandas / numpy / pyarrow / pure python: the users should have the ability to convert to another data-processing framework with ease.
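
The "rowindex" view idea from the list above can be illustrated with a small pure-Python sketch. This is a conceptual model only, not how datatable implements it internally: the `Column` and `View` classes and their methods are hypothetical names introduced for illustration. The point is that filtering and sorting produce a new list of row positions while the column data itself is shared rather than copied.

```python
class Column:
    """Storage for one column's values (treated as read-only here)."""

    def __init__(self, values):
        self.values = list(values)


class View:
    """A table view: shared columns plus a list of selected row positions."""

    def __init__(self, columns, rowindex=None):
        self.columns = columns  # dict of name -> Column, shared between views
        nrows = len(next(iter(columns.values())).values) if columns else 0
        self.rowindex = list(range(nrows)) if rowindex is None else rowindex

    def filter(self, name, predicate):
        # Only the row positions are rebuilt; the column data is untouched.
        col = self.columns[name].values
        kept = [i for i in self.rowindex if predicate(col[i])]
        return View(self.columns, kept)

    def sort(self, name):
        # Sorting reorders row positions instead of moving the values.
        col = self.columns[name].values
        return View(self.columns, sorted(self.rowindex, key=lambda i: col[i]))

    def to_list(self, name):
        # Values are materialized only when explicitly requested.
        col = self.columns[name].values
        return [col[i] for i in self.rowindex]


table = View({"id": Column([1, 2, 3, 4]), "score": Column([0.5, 0.9, 0.1, 0.7])})
high = table.filter("score", lambda s: s > 0.4).sort("score")
print(high.to_list("id"))  # prints [1, 4, 2]
```

Both `table` and `high` refer to the same `Column` objects, so chaining a filter and a sort costs one small index list per step rather than a copy of every column.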

Installation

On macOS, Linux, and Windows, installing datatable is as easy as

pip install datatable

On all other platforms a source distribution will be needed. For more information see Build instructions.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • python_datatable-1.1.2.tar.gz (1.4 MB): Source

Built Distributions

  • python_datatable-1.1.2-cp311-cp311-win_amd64.whl (4.3 MB): CPython 3.11, Windows x86-64

  • python_datatable-1.1.2-cp311-cp311-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.11, manylinux: glibc 2.35+ x86-64

  • python_datatable-1.1.2-cp311-cp311-macosx_10_9_universal2.whl (8.2 MB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)

  • python_datatable-1.1.2-cp310-cp310-win_amd64.whl (4.3 MB): CPython 3.10, Windows x86-64

  • python_datatable-1.1.2-cp310-cp310-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.10, manylinux: glibc 2.35+ x86-64

  • python_datatable-1.1.2-cp310-cp310-macosx_10_15_x86_64.whl (8.2 MB): CPython 3.10, macOS 10.15+ x86-64

  • python_datatable-1.1.2-cp39-cp39-win_amd64.whl (4.3 MB): CPython 3.9, Windows x86-64

  • python_datatable-1.1.2-cp39-cp39-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.9, manylinux: glibc 2.35+ x86-64

  • python_datatable-1.1.2-cp39-cp39-macosx_10_15_x86_64.whl (8.2 MB): CPython 3.9, macOS 10.15+ x86-64

  • python_datatable-1.1.2-cp38-cp38-win_amd64.whl (4.3 MB): CPython 3.8, Windows x86-64

  • python_datatable-1.1.2-cp38-cp38-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.8, manylinux: glibc 2.35+ x86-64

  • python_datatable-1.1.2-cp38-cp38-macosx_10_15_x86_64.whl (8.2 MB): CPython 3.8, macOS 10.15+ x86-64

File details

Details for the file python_datatable-1.1.2.tar.gz.

File metadata

  • Download URL: python_datatable-1.1.2.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.0

File hashes

Hashes for python_datatable-1.1.2.tar.gz
Algorithm Hash digest
SHA256 f7dabf67fc6476b123418a1ae23dd2666d25873d9f5e1b4fa99560c5d049e6a9
MD5 360b821296dee465975c430b5bedb96d
BLAKE2b-256 6eade7ba1280f486b6b5704351507ca79379d9a866c64a3c9ef4049b3d2ee1c0

See more details on using hashes here.
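
A downloaded file can be checked against the SHA256 digest published for it. The following is a minimal sketch using the standard-library `hashlib` module; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file is assumed to have been downloaded to a local path already.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large wheels need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("python_datatable-1.1.2.tar.gz", "f7dabf67fc6476b123418a1ae23dd2666d25873d9f5e1b4fa99560c5d049e6a9")` should return `True` for an intact copy of the source distribution, assuming it was saved under that name in the current directory.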

File details

Details for the file python_datatable-1.1.2-cp311-cp311-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 d29b4f6ed3ecba91e1112526f90215bb6d2b3d6b80717cc00c18a832746142e2
MD5 04da6f2dd6ecba7813bdfb09d17d8084
BLAKE2b-256 0eaff4c6f31f1cfcf761848d18f00d563ea996a36481562fde0cf55bd8c71c80

File details

Details for the file python_datatable-1.1.2-cp311-cp311-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp311-cp311-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 759f101a925763e988f76788ddba99aa0d8ae6bca9f44680558137b5ee41d2f5
MD5 a06b690a71060462484ed3f5e0710f54
BLAKE2b-256 8f1f7867a0b7037142013d7a056b0c314ebc1ee963f4ff79cab9da57e9346bca

File details

Details for the file python_datatable-1.1.2-cp311-cp311-macosx_10_9_universal2.whl.

File hashes

Hashes for python_datatable-1.1.2-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 ffffcd9a2e61a1613e1b9af34f872e4f85de2c4d73d59b58a4368a0cc06bb5be
MD5 9a0b51af5863d16e15b1fb8b742a37b5
BLAKE2b-256 26f9c85ae92d9e2fe7558cea430275b4076cbba4f01f701f197f583c0728d154

File details

Details for the file python_datatable-1.1.2-cp310-cp310-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 9686125783575dbb8d998509e8525eec84bcabfa13b432c8b43cca97a765d9e3
MD5 14eba6da7049438703768356dc37aebc
BLAKE2b-256 8599dbaf28fa7f7d5a08684548444a894336afddc4d180d9ea17cac033825ed4

File details

Details for the file python_datatable-1.1.2-cp310-cp310-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp310-cp310-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 bb61f2abbd71c15c60977084bbe98d04cf1f451418d483702822004aa15df93f
MD5 b707ccaaf7a3d6fcec526dc5e6ff1d9c
BLAKE2b-256 046db055e6f7ef841a210808faadf2f3df1748df2204ee2d80d3dcdc4c1fdd24

File details

Details for the file python_datatable-1.1.2-cp310-cp310-macosx_10_15_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp310-cp310-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 84745d5e17ca7b5ce6a5601a0e6462c0eef33e6ee68609359ab0b49493ed0249
MD5 22b3daf0b7df5050f8d04343e476fbc4
BLAKE2b-256 d31af500a629271b9de60f541c4dae32eecbebdd9cf32369b8d9d2b332322e52

File details

Details for the file python_datatable-1.1.2-cp39-cp39-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 115f07813949e561538d5d4c74fb3a11b4f6330915990af80282ae6a22a0d806
MD5 db0ae9ecd0408f937b94978266f4a374
BLAKE2b-256 9b0dbc2c59834a4c3e360251cf5d62f7a777e802e2cc0ce5d4ff0c030e717ff7

File details

Details for the file python_datatable-1.1.2-cp39-cp39-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp39-cp39-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 214a287253dc5801315a31e1a7911b9606ed00b7d79da6cd01384bc490d80d80
MD5 610e8fec9aa1f03a88e69422c5c74bc2
BLAKE2b-256 28825e8b84b959ba33d500a59200d15333e0a6de4c65dc221c02d0c590ca3711

File details

Details for the file python_datatable-1.1.2-cp39-cp39-macosx_10_15_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp39-cp39-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 44c1c93bc39c0adfcc3e5fc2f57ec9e0bfb2586c357f88098ceb3ceac2d98577
MD5 6a8fbd5e8cf7303980da732a65a7ba9e
BLAKE2b-256 0ec18961317cf2dddd8549f5c8ba9199a6832665e93647c3df7e4a218f1d6981

File details

Details for the file python_datatable-1.1.2-cp38-cp38-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 49011bac4ea60007671e25668db7348510433fb3c91328e1440f35048ba22b0e
MD5 547495b9daced980bcd059afae802ea0
BLAKE2b-256 b629e95c622d48f92517591fdbd01816a1173accfd8585ff28a66aa3ec135dce

File details

Details for the file python_datatable-1.1.2-cp38-cp38-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp38-cp38-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 311df01af660011ffb0b73d7845b6fd3402e8d09499be2d1124c71054fc02ca6
MD5 3c4f27297c8374c79c5205731b88172a
BLAKE2b-256 0ef54e4766f984adf651c25f4c9e78c9252b72d492089f81bbc3ba2a21edd3e7

File details

Details for the file python_datatable-1.1.2-cp38-cp38-macosx_10_15_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.2-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 8a34baed87af2260e93f85615da6e136526d51919b1317f7e748923013869400
MD5 d265fb558f3ca09f4b57281d3569016e
BLAKE2b-256 9f5e4440abd58706431340d072a2649b611878fdf74c2046fb4e9ca6db8ba253
