Data library focusing on pure python data structures and Excel interaction
Project description
Managing tabular data shouldn't be complicated.
If values are stored in a matrix, it should't be any harder to iterate or modify
than a normal list. One enhancement, however, would be to have row values accessible
by column names instead of by integer indeces
eg,
for row in m:
row.header
for row in m:
row[17] # did i count those columns correctly?
Two naive solutions for this are:
1) convert rows to a dictionaries
Using duplicate dictionary instances for every row has a high memory
footprint, and makes accessing values by index more complicated
eg,
[{'col_a': 1.0, 'col_b': 'b', 'col_c': 'c'},
{'col_a': 1.0, 'col_b': 'b', 'col_c': 'c'}]
2) convert rows to namedtuples
Named tuples do not have per-instance dictionaries, so they are
lightweight and require no more memory than regular tuples,
but their values are read-only
Another possibility would be to store the values in column-major order,
like in a database. This has a further advantage in that all values
in the same column are usually of the same data type, allowing them to
be stored more efficiently
eg,
row-major order
[['coi_a', 'col_b', 'col_c'],
[1.0, 'b', 'c'],
[1.0, 'b', 'c'],
[1.0, 'b', 'c']]
column-major order
{'col_a': [1.0, 1.0, 1.0],
'col_a': ['b', 'b', 'b'],
'col_a': ['c', 'c', 'c']}
This is essentially what a pandas DataFrame is. The drawback to this
is a major conceptual overhead.
***********************************************************************************
* Intuitively, each row is some entity, each column is a property of that row *
***********************************************************************************
* the first thing everyone looks up for a DataFrame is "how to iterate rows",
the first thing the documentation says is "I hope you never have to use this"
* DataFrames have some great features, but also require specialized
syntax that can get very awkward and requires a lot of memorization
The flux_cls attempts to balance intuitive iteration with performance
it has the following attributes:
* row-major order
* named attributes on rows (that are efficiently updated)
* value mutability on rows
* light memory footprint
If values are stored in a matrix, it should't be any harder to iterate or modify
than a normal list. One enhancement, however, would be to have row values accessible
by column names instead of by integer indeces
eg,
for row in m:
row.header
for row in m:
row[17] # did i count those columns correctly?
Two naive solutions for this are:
1) convert rows to a dictionaries
Using duplicate dictionary instances for every row has a high memory
footprint, and makes accessing values by index more complicated
eg,
[{'col_a': 1.0, 'col_b': 'b', 'col_c': 'c'},
{'col_a': 1.0, 'col_b': 'b', 'col_c': 'c'}]
2) convert rows to namedtuples
Named tuples do not have per-instance dictionaries, so they are
lightweight and require no more memory than regular tuples,
but their values are read-only
Another possibility would be to store the values in column-major order,
like in a database. This has a further advantage in that all values
in the same column are usually of the same data type, allowing them to
be stored more efficiently
eg,
row-major order
[['coi_a', 'col_b', 'col_c'],
[1.0, 'b', 'c'],
[1.0, 'b', 'c'],
[1.0, 'b', 'c']]
column-major order
{'col_a': [1.0, 1.0, 1.0],
'col_a': ['b', 'b', 'b'],
'col_a': ['c', 'c', 'c']}
This is essentially what a pandas DataFrame is. The drawback to this
is a major conceptual overhead.
***********************************************************************************
* Intuitively, each row is some entity, each column is a property of that row *
***********************************************************************************
* the first thing everyone looks up for a DataFrame is "how to iterate rows",
the first thing the documentation says is "I hope you never have to use this"
* DataFrames have some great features, but also require specialized
syntax that can get very awkward and requires a lot of memorization
The flux_cls attempts to balance intuitive iteration with performance
it has the following attributes:
* row-major order
* named attributes on rows (that are efficiently updated)
* value mutability on rows
* light memory footprint
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
vengeance-1.0.10.tar.gz
(34.4 kB
view hashes)