Unfilled description
Project description
Search times
- Searched 1 columns with 431644 rows in 0.26 seconds
- Searched 2 columns with 431644 rows in 0.39 seconds
- Searched 3 columns with 431644 rows in 0.51 seconds
- Searched 4 columns with 431644 rows in 0.61 seconds
- Searched 5 columns with 431644 rows in 0.69 seconds
- Searched 6 columns with 431644 rows in 0.92 seconds
- Searched 7 columns with 431644 rows in 1.04 seconds
- Searched 8 columns with 431644 rows in 1.34 seconds
- Searched 9 columns with 431644 rows in 1.66 seconds
- Searched 10 columns with 431644 rows in 1.83 seconds
- Searched 11 columns with 431644 rows in 1.96 seconds
- Searched 12 columns with 431644 rows in 2.11 seconds
- Searched 13 columns with 431644 rows in 2.29 seconds
- Searched 14 columns with 431644 rows in 2.39 seconds
- Searched 15 columns with 431644 rows in 2.47 seconds
Conversion times
- name start size time
- ignore_country_classification.csv 4257 0.02
- ignore_goods_classification.csv 239619 0.07
- ignore_gsquarterlySeptember20.csv 73824486 20.65
- ignore_services_classification.csv 2828 0.02
- ignore_test.csv 82533516 47.74
End User Manual
How would I like use this code?
Suppose I have a table called people
stored in CTF.
TODO: Define and describe people
table.
$ cat people.csv
names age
Shawheen 21
Julian 20
Clark 34
I want to access a column called names
from this table.
SELECT names FROM people
Assume that people
is a directory containing the CTF data.
import CTF
names = CTF.load_column("people", "names")
TODO: look at load_column
, see what the most common name is for reading / loading data.
How closely can we copy csv
from the standard library?
Use case: it would be great if we could access the data as a stream, without necessarily loading everything in memory.
We can get this feature by having names
be an iterator or generator over the column values.
Example processing names:
from Collections import Counter
counts = Counter(names)
Use case 2 - column types
age = CTF.load_column("people", "age")
age
should generate integer values corresponding to each entry of the age
column.
CTF knows that the age
column means integer because of the metadata file in the people
directory.
TODO: link to W3 standard.
# User should not write this- it's just the idea we want
def create_age():
for x in [21, 20, 34]:
yield x
age = create_age()
# User can do something like this:
>>> list(age)
[21, 20, 34]
Use case 3 - compatibility with csv
import csv
# Referring to file `people.csv` in CSV format
r = csv.reader("people.csv")
# Referring to directory `people` in CTF format
r2 = CTF.reader("people")
r2
should essentially be a drop in replacement for r
.
for row in r:
process(row)
TODO: Process a csv file using Python's csv
package- any kind of data analysis is fine.
For example, find the set of all values in one column.
Python notes
I used this link for helping me construct the iterable. Python special methods W3C metadata
Outline
- Ctf modeled after csv and/or dictionary
- Should Ctf be accessed with a reader like csv or through itself like a dictionary
- Column accessed with ["column_name"]
- Can convert a csv file to ctf
- Reader runs like csv reader returning iterable rows
- class Row to give a guide for adding new columns using values from metadata.json
- Use custom exceptions
- Get type from metadata.json or autodetect
with Ctf.open() as ctf_file:
ctf_file["column"]
for row in ctf_file:
print(row)
Ctf.open()
Ctf.close()
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for column_text_format-0.0.2-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | f15dbb7b96554dcaf7cf40e31ef41a4a1b019f4cc27f5a6c984290cf0095ef08 |
|
MD5 | 7e3666fa86d42aa2bc97bb1be2371169 |
|
BLAKE2b-256 | 14658ff9b2d95e5815814e2066bb6bff5909f506ef61859f38adaf6d9bb10cb3 |