Python library for fast multi-threaded data manipulation and munging.

datatable

This is an automated fork of datatable. It exists because the datatable package has no automated build/publish pipeline, and always installing from GitHub is not ideal. Fork URL: https://github.com/semmjon/datatable.git

This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame; however, we put specific emphasis on speed and big data support. As the name suggests, the package is closely related to R's data.table and attempts to mimic its core algorithms and API.

Requirements: Python 3.6+ (64 bit) and pip 20.3+.

Project goals

datatable started in 2017 as a toolkit for performing big data (up to 100 GB) operations on a single-node machine, at the maximum speed possible. Such requirements are dictated by modern machine-learning applications, which need to process large volumes of data and generate many features in order to achieve the best model accuracy. The first user of datatable was Driverless.ai.

The set of features that we want to implement with datatable is at least the following:

  • Column-oriented data storage.

  • Native-C implementation for all datatypes, including strings. Packages such as pandas and numpy already do that for numeric columns, but not for strings.

  • Support for date-time and categorical types. The object type is also supported, but promotion into object is discouraged.

  • All types should support null values, with as little overhead as possible.

  • Data should be stored on disk in the same format as in memory. This will allow us to memory-map data on disk and work on out-of-memory datasets transparently.

  • Work with memory-mapped datasets to avoid loading into memory more data than necessary for each particular operation.

  • Fast data reading from CSV and other formats.

  • Multi-threaded data processing: time-consuming operations should attempt to utilize all cores for maximum efficiency.

  • Efficient algorithms for sorting/grouping/joining.

  • Expressive query syntax (similar to data.table).

  • Minimal amount of data copying, copy-on-write semantics for shared data.

  • Use "rowindex" views in filtering/sorting/grouping/joining operators to avoid unnecessary data copying.

  • Interoperability with pandas / numpy / pyarrow / pure Python: users should be able to convert to another data-processing framework with ease.
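Several goals in the list above (column-oriented storage, null support, and "rowindex" views that avoid copying during filtering and sorting) can be illustrated with a small pure-Python toy. This is a sketch of the idea only, assuming Python lists as column storage and `None` as the null marker; `ColumnFrame` and its methods are hypothetical and are neither datatable's API nor its native C implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence


class ColumnFrame:
    """Hypothetical toy frame: one Python list per column, plus an optional rowindex.

    None marks a null value. filter() and sort() return new frames that share
    the same column lists and differ only in their rowindex, so no column data
    is copied until to_dict() materializes the view.
    """

    def __init__(self, columns: Dict[str, list], rowindex: Optional[List[int]] = None):
        if len({len(col) for col in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = columns    # shared between a frame and all of its views
        self._rowindex = rowindex  # None means "all rows, in storage order"

    @property
    def nrows(self) -> int:
        if self._rowindex is not None:
            return len(self._rowindex)
        return len(next(iter(self._columns.values()), []))

    def _rows(self) -> Sequence[int]:
        # Row numbers into the shared columns that this frame exposes.
        return self._rowindex if self._rowindex is not None else range(self.nrows)

    def filter(self, name: str, predicate: Callable[[object], bool]) -> "ColumnFrame":
        """Keep rows whose value in `name` is non-null and satisfies `predicate`."""
        col = self._columns[name]
        keep = [i for i in self._rows() if col[i] is not None and predicate(col[i])]
        return ColumnFrame(self._columns, keep)

    def sort(self, name: str) -> "ColumnFrame":
        """Order rows by `name`, nulls first; only the rowindex is rebuilt."""
        col = self._columns[name]
        rows = list(self._rows())
        nulls = [i for i in rows if col[i] is None]
        values = sorted((i for i in rows if col[i] is not None), key=lambda i: col[i])
        return ColumnFrame(self._columns, nulls + values)

    def to_dict(self) -> Dict[str, list]:
        """Materialize the view: the only place where column values are copied."""
        rows = self._rows()
        return {name: [col[i] for i in rows] for name, col in self._columns.items()}


frame = ColumnFrame({"A": [3, None, 1, 2], "B": ["x", "y", None, "w"]})
view = frame.filter("A", lambda v: v > 1).sort("A")
print(view.to_dict())  # {'A': [2, 3], 'B': ['w', 'x']}
```

The design choice the sketch highlights: a view is just a list of row numbers over shared columns, so chaining a filter and a sort costs memory proportional to the selected rows rather than to the size of every column.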
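The goals of storing data on disk in the same format as in memory, and of working with memory-mapped datasets, can be sketched with the standard library alone. The example assumes a single column of 64-bit integers in native byte order; the helper `sum_slice_from_disk` is hypothetical and only illustrates the principle, not datatable's actual file format.

```python
import array
import mmap
import os
import tempfile


def sum_slice_from_disk(values, start, stop):
    """Hypothetical helper: save an int64 column, memory-map it, and sum a slice."""
    if not values:
        return 0  # mmap cannot map an empty file
    column = array.array("q", values)  # contiguous buffer of native int64 values
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "column.bin")
        with open(path, "wb") as fh:
            column.tofile(fh)  # the file holds the raw in-memory bytes, no encoding
        with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Reinterpret the mapped bytes as int64 values without parsing or
            # copying the file; the OS typically pages in only what is read.
            with memoryview(mm) as raw, raw.cast("q") as mapped:
                return sum(mapped[start:stop])


print(sum_slice_from_disk([10, 20, 30, 40, 50], 1, 4))  # 90
```

Because the bytes on disk already have the in-memory layout, reading a slice involves no deserialization step, which is the property these two goals rely on.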

Installation

On macOS, Linux and Windows systems, installing datatable is as easy as

pip install datatable

On all other platforms, datatable must be built from a source distribution. For more information, see Build instructions.

Download files

Source Distribution

  • python_datatable-1.1.3.tar.gz (1.3 MB)

Built Distributions

  • python_datatable-1.1.3-cp311-cp311-win_amd64.whl (4.3 MB): CPython 3.11, Windows x86-64
  • python_datatable-1.1.3-cp311-cp311-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.11, manylinux (glibc 2.35+) x86-64
  • python_datatable-1.1.3-cp311-cp311-macosx_10_9_universal2.whl (8.2 MB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)
  • python_datatable-1.1.3-cp310-cp310-win_amd64.whl (4.3 MB): CPython 3.10, Windows x86-64
  • python_datatable-1.1.3-cp310-cp310-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.10, manylinux (glibc 2.35+) x86-64
  • python_datatable-1.1.3-cp310-cp310-macosx_10_15_x86_64.whl (8.2 MB): CPython 3.10, macOS 10.15+ x86-64
  • python_datatable-1.1.3-cp39-cp39-win_amd64.whl (4.3 MB): CPython 3.9, Windows x86-64
  • python_datatable-1.1.3-cp39-cp39-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.9, manylinux (glibc 2.35+) x86-64
  • python_datatable-1.1.3-cp39-cp39-macosx_10_15_x86_64.whl (8.2 MB): CPython 3.9, macOS 10.15+ x86-64
  • python_datatable-1.1.3-cp38-cp38-win_amd64.whl (4.3 MB): CPython 3.8, Windows x86-64
  • python_datatable-1.1.3-cp38-cp38-manylinux_2_35_x86_64.whl (28.5 MB): CPython 3.8, manylinux (glibc 2.35+) x86-64
  • python_datatable-1.1.3-cp38-cp38-macosx_10_15_x86_64.whl (8.2 MB): CPython 3.8, macOS 10.15+ x86-64

File details

Details for the file python_datatable-1.1.3.tar.gz.

File metadata

  • Download URL: python_datatable-1.1.3.tar.gz
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for python_datatable-1.1.3.tar.gz
Algorithm Hash digest
SHA256 7a102d5892ed627f8f389261aa219a3107f2e5a3a32865bc06963cb2d2d991b5
MD5 e3560e1cbbe322efd52549e95fcb5c51
BLAKE2b-256 8d452115a7d65483be25906e2c574111e924fec10aa4e8f37f6f4845019c06d6

See more details on using hashes here.
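The digests listed on this page can be used to check that a downloaded file is intact. Below is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical, and the commented usage assumes the source archive was saved in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Example (assumes the sdist was downloaded into the current directory):
# matches_published_hash(
#     "python_datatable-1.1.3.tar.gz",
#     "7a102d5892ed627f8f389261aa219a3107f2e5a3a32865bc06963cb2d2d991b5",
# )
```

pip can perform the same check itself in hash-checking mode: list `--hash=sha256:<digest>` next to a requirement in a requirements file and install with `--require-hashes`.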

File details

Details for the file python_datatable-1.1.3-cp311-cp311-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 9802c9bbb874df9097d475533a3706667769bc1b446d76daff3130a7829a663d
MD5 0e940e33eb784328a27f692a3fb83970
BLAKE2b-256 189182e0299f6011a1dddd2d392dd69c371d6f1e77acb85751b58e2d98cde8c3

File details

Details for the file python_datatable-1.1.3-cp311-cp311-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp311-cp311-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 56a3bd7935cdeddaba7f380ab865f5ef7f8ed48e47c62cf16fc396657a1b55e5
MD5 a6fec622ef3866245724ca7217827467
BLAKE2b-256 58fe43183744b99fa42b347be46186da900524fdcb56320b6786e994720e093a

File details

Details for the file python_datatable-1.1.3-cp311-cp311-macosx_10_9_universal2.whl.

File hashes

Hashes for python_datatable-1.1.3-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 a445dd9794a27c965b1727a7c7353629855503ec42b7f2429d5228650dc0d0b0
MD5 ecf3e85604c92e62d9f91cfb173c8d59
BLAKE2b-256 96e54f8f1ca4e5e616cadf2f080bc6ef895c740106f891a42bdbaa86ce65d441

File details

Details for the file python_datatable-1.1.3-cp310-cp310-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 b8dbb1bd51e89ed86010cc14f56eac49216aad7aae409d5e3f6bcab67e626360
MD5 3a1bbfdc4659fd74a089319fd5e2204b
BLAKE2b-256 e3e7d9a24a48ba7c31da9b3f5b24a5180d591758970440212fdf0f106031ea7d

File details

Details for the file python_datatable-1.1.3-cp310-cp310-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp310-cp310-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 b9d30c536079ee3a31ae052006efbf8169fc684f399fd12052c32abaafe18b7a
MD5 04ddeeabca8d2dd6653790109160d6ef
BLAKE2b-256 f0c3c8652c702fbea435ca181565291bf77b6bfe13f9fcbf13f5faea9d0f6349

File details

Details for the file python_datatable-1.1.3-cp310-cp310-macosx_10_15_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp310-cp310-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 e97bc8d898d06ab750a31a9aba38e1f585b4465c03170ea81d489301fe8d91c4
MD5 b0fd7a90c0e5028e9f2fb43667fa659d
BLAKE2b-256 e46a552fef6dabd649f153b8446284724c9687580fc1eb0dfa0af86ed68c366c

File details

Details for the file python_datatable-1.1.3-cp39-cp39-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 b612410099fd210dafd1d0e59ac6a2c447b9b90973afcabb2c01f2355b38a214
MD5 c89c1d95bb0d9280e32897c175b6dea9
BLAKE2b-256 fc455ef1a431f1e6f5ef475e436f97e7e627e897682e119db2120194859750a5

File details

Details for the file python_datatable-1.1.3-cp39-cp39-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp39-cp39-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 7adc63734ca11a37d41c6ae3b642d083f3971815469a2e0fe3240a71af707574
MD5 8bfc32ce2f91d255f860184dcfdd9c0a
BLAKE2b-256 d0d5762bb71bc71913b33c918eff48a01078b74079479483ba8d3ce62426d38e

File details

Details for the file python_datatable-1.1.3-cp39-cp39-macosx_10_15_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp39-cp39-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 31207ace0134ca9038431093d9b8c07a7d437961118c84cab8fbe23f53b3543e
MD5 a9ef60222fb72d73623b818d0ec37ba3
BLAKE2b-256 17b827b9f1d2dcf96894777ada6421926f37a666990b94a4249901ceb4dac64a

File details

Details for the file python_datatable-1.1.3-cp38-cp38-win_amd64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 3a43ac1832091921707d4c5fc5f5fd36be7c59f72b5e4b8e89a00971028452a5
MD5 63a6c1dd2e2de5e55ce7142c3ae36b58
BLAKE2b-256 709523e3dad323fce766cd9e5bbbf85cfde5ceffb9e63c2e1a9988586e1d19d0

File details

Details for the file python_datatable-1.1.3-cp38-cp38-manylinux_2_35_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp38-cp38-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 639936dd14a3a0f2f517b9bc96e4a52693416600c35b00fb71ff7d5e40b9f54a
MD5 c939da765ab16672eb4160d1ff183e57
BLAKE2b-256 cf0f3c760377bfaef07685ec6d7ef6b330b1f6fe4ee0ff592a0548f300a51d9b

File details

Details for the file python_datatable-1.1.3-cp38-cp38-macosx_10_15_x86_64.whl.

File hashes

Hashes for python_datatable-1.1.3-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 bfcddba8a1528782277617bf865416a41e45c2252f92a58991c5afefa6fbcb74
MD5 0995d9feeeb5464088e30365e7e37879
BLAKE2b-256 bf471e5c1239ae802cc6f06aa692d3ec4706c525091268a889a6734323113c81
