rowgenerators

Generate row data from a variety of file formats


Development Status
- 4 - Beta
Environment
- Web Environment
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python
- Python :: 3.5
Topic
- Software Development :: Debuggers
- Software Development :: Libraries :: Python Modules

Project description

Row Generators

Application Urls


Application Urls add structure and operations to URLs whose target can't, in general, simply be downloaded. For instance, you may want to refer to a CSV file inside a ZIP archive, or to a worksheet in an Excel file. In conjunction with Row Generators, Application Urls are often used to refer to tabular data stored on data repositories. For instance:

- Stored on the web: http://example.com/file.csv
- Inside a ZIP file on the web: http://example.com/archive.zip#file.csv
- A worksheet in an Excel file: http://example.com/excel.xls#worksheet
- A worksheet in an Excel file in a ZIP archive: http://example.com/archive.zip#excel.xls;worksheet
- An API: socrata+http://chhs.data.ca.gov/api/views/tthg-z4mf
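The examples above pack several pieces of information into one string: an optional protocol prefix before a "+", the resource to fetch, a file inside an archive, and a segment such as a worksheet. The sketch below pulls those pieces apart with the standard library's urllib.parse. It is a hypothetical helper (split_app_url is not the appurl API), and it assumes that only .zip paths are containers, so for a plain file the whole fragment is treated as the segment.

```python
from urllib.parse import urlsplit

# Assumption: only ZIP files are treated as containers in this sketch.
ARCHIVE_EXTENSIONS = ('.zip',)


def split_app_url(url):
    """Split an Application-Url-style string into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    # 'socrata+http' -> proto 'socrata', scheme 'http'; plain 'http' -> no proto.
    proto, _, scheme = parts.scheme.rpartition('+')
    # The URL to actually fetch: original scheme without proto prefix or fragment.
    resource = parts._replace(scheme=scheme, fragment='').geturl()
    if parts.path.lower().endswith(ARCHIVE_EXTENSIONS):
        # Archive: fragment is 'file-in-archive' or 'file-in-archive;segment'.
        target_file, _, segment = parts.fragment.partition(';')
    else:
        # Plain file: the whole fragment (if any) is a segment, e.g. a worksheet.
        target_file, segment = '', parts.fragment
    return {
        'proto': proto or None,
        'resource': resource,
        'target_file': target_file or None,
        'segment': segment or None,
    }
```

For example, split_app_url('http://example.com/archive.zip#excel.xls;worksheet') yields the resource http://example.com/archive.zip, the target file excel.xls and the segment worksheet.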

Install

$ pip install appurl

Documentation

See the documentation at http://row-generators.readthedocs.io/

Development Notes

Running tests

Run python setup.py test to run the normal development tests. You can also run tox, which will try to run the tests with Python 3.4, 3.5 and 3.6, skipping interpreters that are not installed.

Development Testing with Docker

Testing during development for other versions of Python is a bit of a pain, since you have to install the alternate version, and Tox will run all of the tests, not just the one you want.

One way to deal with this is to install Docker locally, then run the Docker test container on the source directory. This is done automatically from the Makefile in appurl/tests:

$ cd ./docker
$ make build # to create the container image
$ make shell # to run bash in the container

You now have a docker container where the /code directory is the appurl source dir.

Now, run tox to build the tox virtual environments, then change into the virtual environment directory for the Python version you want to test and activate it.

# tox
# cd .tox/py34
# source bin/activate # Activate the python 3.4 virtual env
# cd ../../
# python setup.py test # Cause test deps to get installed
#
# python -munittest appurl.test.test_basic.BasicTests.test_url_classes  # Run one test

Row Data Pipeline

The Rowpipe library manages row-oriented data transformers. Clients can create a RowProcessor() that has a schema, composed of tables and columns, where each column can have a “transform” that describes how to alter the data in the column.

from rowpipe.table import Table
from rowpipe.processor import RowProcessor

def doubleit(v):
    return int(v) * 2

env = {
    'doubleit': doubleit
}

t = Table('foobar')
t.add_column('id', datatype='int')
t.add_column('other_id', datatype='int', transform='^row.a')
t.add_column('i1', datatype='int', transform='^row.a;doubleit')
t.add_column('f1', datatype='float', transform='^row.b;doubleit')
t.add_column('i2', datatype='int', transform='^row.a')
t.add_column('f2', datatype='float', transform='^row.b')

In this table definition, the other_id and i2 columns are initialized to the value of the a column in the input row. The i1 column is initialized to the value of the input row's a column, and then the doubleit function is called on that value; f1 does the same with the b column. In the last step, all of the values are cast to the types given by each column's datatype argument.
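To make these semantics concrete, here is a hand-written, plain-Python equivalent of what the foobar transforms compute for a single input row. It does not use rowpipe and is only an illustration; the id column is left out because the text does not say what a column without a transform receives.

```python
def doubleit(v):
    return int(v) * 2


def transform_row(row):
    """Plain-Python equivalent of the 'foobar' column transforms (illustrative only)."""
    a, b = row['a'], row['b']
    return {
        'other_id': int(a),        # '^row.a', then cast to int
        'i1': int(doubleit(a)),    # '^row.a;doubleit', then cast to int
        'f1': float(doubleit(b)),  # '^row.b;doubleit', then cast to float
        'i2': int(a),              # '^row.a', then cast to int
        'f2': float(b),            # '^row.b', then cast to float
    }


print(transform_row({'a': 3, 'b': 6}))
# {'other_id': 3, 'i1': 6, 'f1': 12.0, 'i2': 3, 'f2': 6.0}
```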

The RowProcessor is then run using this table definition, and an input generator:

N = 10  # number of rows the example source generates

class Source(object):

    headers = 'a b'.split()

    def __iter__(self):
        for i in range(N):
            yield i, 2*i

rp = RowProcessor(Source(), t, env=env)
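The Source class above suggests that an input generator only needs a headers attribute and an iterator that yields row tuples in header order. Assuming that is all RowProcessor requires (this README does not confirm it), a source backed by a CSV file could look like the hypothetical CSVSource below. It reopens the file on each iteration so it can be iterated more than once.

```python
import csv


class CSVSource(object):
    """Hypothetical input generator: exposes `headers` and yields rows as tuples."""

    def __init__(self, path):
        self.path = path
        # Read only the first line to get the column headers.
        with open(path, newline='') as f:
            self.headers = next(csv.reader(f))

    def __iter__(self):
        # Reopen the file each time so the source can be iterated repeatedly.
        with open(self.path, newline='') as f:
            reader = csv.reader(f)
            next(reader)  # skip the header line
            for row in reader:
                yield tuple(row)
```

It would be used as RowProcessor(CSVSource('data.csv'), t, env=env), where data.csv is a hypothetical file with a and b columns. CSV values arrive as strings; doubleit already calls int(v), so it accepts them.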

Then, rp is a generator that returns RowProxy objects, which can be indexed by column name or by column number:

for row in rp:
    v1 = row['f1']
    v2 = row[3]
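Dual indexing like this can be sketched with a small class that maps header names to positions. RowProxySketch is a hypothetical stand-in, not rowpipe's RowProxy; it also supports attribute access (row.a), mirroring the row.a syntax used in transforms. With the foobar table and zero-based positions (an assumption), row[3] and row['f1'] refer to the same column.

```python
class RowProxySketch(object):
    """Hypothetical stand-in for a RowProxy: index by column name or position."""

    def __init__(self, headers, values):
        self._index = {h: i for i, h in enumerate(headers)}
        self._values = list(values)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._values[key]          # positional access
        return self._values[self._index[key]]  # access by column name

    def __getattr__(self, name):
        # Called only when normal lookup fails; enables row.a style access.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


row = RowProxySketch(['id', 'other_id', 'i1', 'f1'], [0, 3, 6, 12.0])
print(row['f1'], row[3], row.other_id)  # 12.0 12.0 3
```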

The RowProcessor creates Python code files and executes them.
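The general technique, generating Python source for the transforms and executing it, can be illustrated with a short sketch. compile_transform is hypothetical and far simpler than rowpipe's generator: it builds source in memory rather than writing files, handles only one source column plus a chain of named functions, and assumes column names are valid Python identifiers.

```python
def compile_transform(column, source_column, funcs):
    """Generate, compile and exec a function for one column (illustrative sketch)."""
    lines = ['def f_{}(row, env):'.format(column),
             '    v = row[{!r}]'.format(source_column)]  # the '^row.x' initializer
    for name in funcs:
        lines.append('    v = env[{!r}](v)'.format(name))  # each ';'-separated step
    lines.append('    return v')
    src = '\n'.join(lines)
    namespace = {}
    exec(compile(src, '<transform {}>'.format(column), 'exec'), namespace)
    return namespace['f_' + column], src


fn, src = compile_transform('i1', 'a', ['doubleit'])
print(src)
print(fn({'a': 3}, {'doubleit': lambda v: int(v) * 2}))  # 6
```

Generating code once per column and then calling plain functions avoids re-parsing the transform strings for every row.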

Transforms can have several steps, separated by ‘;’. The first, prefixed with ‘^’, initializes the value for the rest of the transforms. A transform that is prefixed with ‘!’ is executed on exceptions. Transform functions can have a variable signature; the transform processor matches argument names. Valid argument names are:

- row: A RowProxy object for the input row. Allows access to any input row value.
- row_n: The row number.
- scratch: A dict for temporary storage.
- errors: A defaultdict(set) for storing error reports for columns. Keys are column names.
- accumulator: A dict for accumulating values, such as sums.
- pipe: Unused.
- bundle: Unused.
- source: A reference to the input generator that is generating rows.
- v: The input row value.
- header_s: The header for the column in the input row.
- i_s: The index of the column in the input row.
- header_d: The header for the column in the output row.
- i_d: The index of the column in the output row.
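The transform syntax and name-matched arguments described above can be sketched as two small helpers. Both are hypothetical illustrations, not rowpipe's implementation: parse_transform splits a transform string into its ‘^’ initializer, ordinary steps and ‘!’ exception handlers, and call_with_matching_args uses inspect.signature to pass a function only the arguments it names (it assumes the function's parameters can be passed by keyword).

```python
import inspect


def parse_transform(transform):
    """Split '^init;step;!handler' into (init, steps, handlers). Hypothetical parser."""
    init, steps, handlers = None, [], []
    for part in transform.split(';'):
        part = part.strip()
        if not part:
            continue
        if part.startswith('^'):
            init = part[1:]            # initializer, e.g. 'row.a'
        elif part.startswith('!'):
            handlers.append(part[1:])  # runs when a step raises an exception
        else:
            steps.append(part)         # ordinary transform function name
    return init, steps, handlers


def call_with_matching_args(func, available):
    """Call func with only the keyword arguments whose names it declares."""
    params = inspect.signature(func).parameters
    kwargs = {name: available[name] for name in params if name in available}
    return func(**kwargs)


print(parse_transform('^row.a;doubleit;!nullify'))
# ('row.a', ['doubleit'], ['nullify'])
print(call_with_matching_args(lambda v, row_n: (v, row_n),
                              {'v': 5, 'row_n': 2, 'scratch': {}}))
# (5, 2)
```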

… and there is a whole lot more. This documentation is woefully incomplete …

Notes

This repo still contains old code for Row Pipelines, in the pipeline.py file. These components can be combined to perform defined operations on rows, such as skipping rows based on a predicate, altering the number of rows, or returning only the head or tail. The code is not currently used or tested.
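The kinds of operations listed above compose naturally as generator functions. The sketch below is a hypothetical illustration using itertools and collections, not the API in pipeline.py: each function takes an iterable of rows and returns a new iterable, so they can be chained.

```python
from collections import deque
from itertools import islice


def skip_if(rows, predicate):
    """Drop rows for which predicate(row) is true."""
    return (row for row in rows if not predicate(row))


def head(rows, n):
    """Return only the first n rows."""
    return islice(rows, n)


def tail(rows, n):
    """Return only the last n rows (consumes the whole input)."""
    return iter(deque(rows, maxlen=n))


def explode(rows, func):
    """Alter the number of rows: func returns zero or more output rows per input row."""
    for row in rows:
        for out in func(row):
            yield out


rows = [(i, 2 * i) for i in range(10)]
print(list(head(skip_if(rows, lambda r: r[0] % 2), 3)))  # [(0, 0), (2, 4), (4, 8)]
print(list(tail(rows, 2)))                               # [(8, 16), (9, 18)]
```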


Release history

- 0.9.26: Oct 19, 2023 (this version)
- 0.9.24: Nov 9, 2022
- 0.9.23: Nov 9, 2022
- 0.9.21: Feb 9, 2022
- 0.9.20: Oct 11, 2021
- 0.9.19: Mar 4, 2021
- 0.9.18: Jan 9, 2021
- 0.9.17: Nov 24, 2020
- 0.9.16: Oct 15, 2020
- 0.9.14: Sep 11, 2020
- 0.9.13: Jun 24, 2020
- 0.9.12: Jun 22, 2020
- 0.9.11: Jun 14, 2020
- 0.9.10: May 26, 2020
- 0.9.8: Apr 22, 2020
- 0.9.7: Mar 6, 2020
- 0.9.6: Nov 19, 2019
- 0.9.5: Jul 12, 2019
- 0.9.3: May 20, 2019
- 0.9.1: May 20, 2019
- 0.9.0: May 19, 2019
- 0.8.31: May 5, 2019
- 0.8.30: Mar 20, 2019
- 0.8.26: Jan 16, 2019
- 0.8.25: Dec 31, 2018
- 0.8.23: Dec 12, 2018
- 0.8.19: Oct 20, 2018
- 0.8.17: Sep 18, 2018
- 0.8.16: Sep 18, 2018
- 0.8.15: Sep 18, 2018
- 0.8.14: Sep 18, 2018
- 0.8.13: Sep 13, 2018
- 0.8.12: Sep 5, 2018
- 0.8.11: Aug 9, 2018
- 0.8.10: Jul 26, 2018
- 0.8.8: Jul 19, 2018
- 0.8.7: Jul 13, 2018
- 0.8.6: Jul 12, 2018
- 0.8.5: May 30, 2018
- 0.8.3: May 14, 2018
- 0.8.1: May 9, 2018
- 0.7.29: May 9, 2018
- 0.7.28: May 4, 2018
- 0.7.27: May 4, 2018
- 0.7.26: May 4, 2018
- 0.7.24: Apr 1, 2018
- 0.7.23: Apr 1, 2018
- 0.7.17: Feb 23, 2018

Download files


Source Distribution

rowgenerators-0.9.26.tar.gz (19.6 MB)

Uploaded Oct 19, 2023 Source

File details

Details for the file rowgenerators-0.9.26.tar.gz.

File metadata

Download URL: rowgenerators-0.9.26.tar.gz
Upload date: Oct 19, 2023
Size: 19.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for rowgenerators-0.9.26.tar.gz
Algorithm	Hash digest
SHA256	`93fc93caebfe545ca50a67dcff4339de8d3ade33d03458d680502f775781c9db`
MD5	`87beebc98142ac1a8b9cd342dea883c5`
BLAKE2b-256	`9a4f81f36a823551e2785d248c16d3c611a0c5ca37fc9769110d7edaa1274fd3`
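To check a downloaded archive against the SHA256 digest listed above, hash the file in chunks with the standard library's hashlib and compare hex digests. The helper names and the chunk size below are illustrative choices.

```python
import hashlib

# SHA256 digest for rowgenerators-0.9.26.tar.gz, copied from the table above.
EXPECTED_SHA256 = '93fc93caebfe545ca50a67dcff4339de8d3ade33d03458d680502f775781c9db'


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, verify('rowgenerators-0.9.26.tar.gz') should return True for an intact download. pip can enforce the same check at install time in its hash-checking mode, using a requirements line such as rowgenerators==0.9.26 --hash=sha256:<digest>.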
