Project description

Search times

  • Searched 1 columns with 431644 rows in 0.26 seconds
  • Searched 2 columns with 431644 rows in 0.39 seconds
  • Searched 3 columns with 431644 rows in 0.51 seconds
  • Searched 4 columns with 431644 rows in 0.61 seconds
  • Searched 5 columns with 431644 rows in 0.69 seconds
  • Searched 6 columns with 431644 rows in 0.92 seconds
  • Searched 7 columns with 431644 rows in 1.04 seconds
  • Searched 8 columns with 431644 rows in 1.34 seconds
  • Searched 9 columns with 431644 rows in 1.66 seconds
  • Searched 10 columns with 431644 rows in 1.83 seconds
  • Searched 11 columns with 431644 rows in 1.96 seconds
  • Searched 12 columns with 431644 rows in 2.11 seconds
  • Searched 13 columns with 431644 rows in 2.29 seconds
  • Searched 14 columns with 431644 rows in 2.39 seconds
  • Searched 15 columns with 431644 rows in 2.47 seconds

Conversion times

  name                                  size       time (seconds)
  ignore_country_classification.csv     4257       0.02
  ignore_goods_classification.csv       239619     0.07
  ignore_gsquarterlySeptember20.csv     73824486   20.65
  ignore_services_classification.csv    2828       0.02
  ignore_test.csv                       82533516   47.74

End User Manual

How would I like to use this code?

Suppose I have a table called people stored in CTF.

TODO: Define and describe people table.

$ cat people.csv
names,age
Shawheen,21
Julian,20
Clark,34

I want to access a column called names from this table.

SELECT names FROM people

Assume that people is a directory containing the CTF data.

import CTF

names = CTF.load_column("people", "names")

TODO: look at load_column, see what the most common name is for reading / loading data. How closely can we copy csv from the standard library?

Use case: it would be great if we could access the data as a stream, without necessarily loading everything into memory. We get this feature by having names be an iterator or generator over the column values.
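A minimal sketch of what a streaming load_column could look like, assuming a hypothetical layout in which each column lives in its own text file inside the table directory (the layout and names here are assumptions, not the actual CTF implementation):

```python
import os

def load_column_stream(table_dir, column_name):
    """Yield values from a column file one line at a time, so the
    whole column never has to be held in memory at once.

    Assumes the hypothetical layout <table_dir>/<column_name>.
    """
    path = os.path.join(table_dir, column_name)
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")
```

Because this is a generator, iteration pulls one value at a time from disk; a consumer such as Counter can still process the whole column.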

Example processing names:

from collections import Counter

counts = Counter(names)
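Continuing that example with illustrative values (the repeated "Julian" is made up to show counting), Counter also answers the earlier TODO about finding the most common name:

```python
from collections import Counter

# Pretend `names` is the streamed column; these values are illustrative.
names = iter(["Shawheen", "Julian", "Clark", "Julian"])
counts = Counter(names)
print(counts.most_common(1))  # -> [('Julian', 2)]
```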

Use case 2 - column types

age = CTF.load_column("people", "age")

age should generate integer values corresponding to each entry of the age column. CTF knows that the age column holds integers because of the metadata file in the people directory. TODO: link to W3 standard.

# A user would not write this; it just illustrates the idea we want:
def create_age():
    for x in [21, 20, 34]:
        yield x

age = create_age()

# User can do something like this:
>>> list(age)
[21, 20, 34]
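One possible shape for that metadata-driven conversion, assuming a hypothetical metadata.json that records a type name per column (none of these names or keys come from the actual package):

```python
import json
import os

# Hypothetical mapping from metadata type names to Python converters.
CONVERTERS = {"integer": int, "decimal": float, "string": str}

def load_typed_column(table_dir, column_name):
    """Yield values converted to the type declared for this column
    in the (assumed) metadata.json inside the table directory."""
    with open(os.path.join(table_dir, "metadata.json")) as f:
        meta = json.load(f)
    convert = CONVERTERS[meta["columns"][column_name]["type"]]
    with open(os.path.join(table_dir, column_name)) as f:
        for line in f:
            yield convert(line.rstrip("\n"))
```

With an age column declared as "integer", iterating the generator yields Python ints rather than strings.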

Use case 3 - compatibility with csv

import csv

# Referring to file `people.csv` in CSV format
r = csv.reader(open("people.csv", newline=""))  # csv.reader takes a file object, not a filename

# Referring to directory `people` in CTF format
r2 = CTF.reader("people")

r2 should essentially be a drop-in replacement for r.

for row in r:
    process(row)

TODO: Process a csv file using Python's csv package- any kind of data analysis is fine. For example, find the set of all values in one column.
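As one concrete version of that TODO, here is a small sketch using only the standard csv module to find the set of all values in one column (the file and column names match the people example above):

```python
import csv

def unique_values(csv_path, column):
    """Return the set of distinct values in one column of a CSV file."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        return {row[column] for row in reader}
```

For the example file, unique_values("people.csv", "names") gives {"Shawheen", "Julian", "Clark"}.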

Python notes

References I used while constructing the iterable: Python special methods; W3C metadata.

Outline

  • Ctf modeled after csv and/or dictionary
    • Should Ctf be accessed with a reader, like csv, or through itself, like a dictionary?
    • Columns accessed with ["column_name"]
    • Can convert a csv file to ctf
    • Reader runs like the csv reader, returning iterable rows
    • class Row to give a guide for adding new columns using values from metadata.json
    • Use custom exceptions
    • Get type from metadata.json or autodetect

with Ctf.open() as ctf_file:
    ctf_file["column"]
    for row in ctf_file:
        print(row)

Ctf.open()
Ctf.close()
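The "convert a csv file to ctf" bullet could be sketched as below, reusing the assumed one-file-per-column layout plus metadata.json from the earlier examples (all names here are illustrative, not the package's real API):

```python
import csv
import json
import os

def csv_to_ctf(csv_path, out_dir):
    """Write each CSV column to its own file under out_dir and record
    the column names in a hypothetical metadata.json."""
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name, value in row.items():
                columns[name].append(value)
    for name, values in columns.items():
        with open(os.path.join(out_dir, name), "w") as f:
            f.write("\n".join(values) + "\n")
    with open(os.path.join(out_dir, "metadata.json"), "w") as f:
        json.dump({"columns": list(columns)}, f)
```

Writing one file per column is what makes the single-column reads in the earlier use cases cheap: loading names never touches the age data.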

Download files

Download the file for your platform.

Source Distribution

column-text-format-0.0.2.tar.gz (3.6 kB)

Uploaded Source

Built Distribution


column_text_format-0.0.2-py3-none-any.whl (2.9 kB)

Uploaded Python 3

File details

Details for the file column-text-format-0.0.2.tar.gz.

File metadata

  • Download URL: column-text-format-0.0.2.tar.gz
  • Upload date:
  • Size: 3.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for column-text-format-0.0.2.tar.gz
Algorithm Hash digest
SHA256 e942e29e91883526bcca9a8b01ca542fbd573fdf0954e6488675d44a47a7ef5e
MD5 0d22bd0c5027a5217702f6ff9cfff6d3
BLAKE2b-256 9d5ecab3a389cf527cb5bab19e89ee58c912e25ec8592051fd0d5d7904a035a3


File details

Details for the file column_text_format-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: column_text_format-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 2.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for column_text_format-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f15dbb7b96554dcaf7cf40e31ef41a4a1b019f4cc27f5a6c984290cf0095ef08
MD5 7e3666fa86d42aa2bc97bb1be2371169
BLAKE2b-256 14658ff9b2d95e5815814e2066bb6bff5909f506ef61859f38adaf6d9bb10cb3

