List processing with csv files

lico

List comb. For quick-and-dirty operations on each row of a csv file. Handles the boilerplate code for IO, error handling, and printing progress. Optimized for single-use operations on smaller csv files (fewer than millions of rows) in noisy environments.

Features

  • Free software: MIT license
  • Read and write CSV files
  • Run custom operations for each row
  • Handles errors and existing results

Installation

pip install lico

Usage

Basic example

from lico.io import Task
from lico.operations import Concatenate

# concatenate columns 'col1' and 'col2' of input.csv, write the result to output.csv
Task(input='input.csv', 
     operation=Concatenate(['col1', 'col2']),
     output='output.csv').run()

Defining operations

from lico.core import Operation

# first of all, subclass lico.core.Operation
class MyOperation(Operation):
    def apply(self, row):
        """This method gets called on each row"""
        old_value = row['column1']           # access values like a dict
        new_value = any_function(old_value)  # any_function stands for your own logic
        # 'new_column' is appended to the existing columns in the output
        return {'new_column': new_value}     # new value(s)

Skipping rows

There are two ways to tell lico to skip a row: overriding Operation.has_previous_result() and raising RowProcessError.

from lico.core import Operation
from lico.exceptions import RowProcessError

class MyOperation(Operation):
    def apply(self, row):
        if row['col1'] == '0':
            raise RowProcessError  # lico will skip the current row
        return {'result': 'a_result'}

    def has_previous_result(self, row):
        """If the column 'result' contains anything, skip this row"""
        return bool(row.get('result'))
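
The effect of has_previous_result() is that a run can be repeated on its own output without redoing finished rows. The following is a minimal standard-library sketch of that idea, not lico's actual implementation; run_once and the sample rows are hypothetical names made up for illustration.

```python
def has_previous_result(row):
    """True if the 'result' column already holds a non-empty value."""
    return bool(row.get("result"))


def run_once(rows, apply):
    """Process rows, skipping those that already have a result.

    Returns the output rows and the number of rows actually processed.
    """
    processed_count = 0
    output = []
    for row in rows:
        if has_previous_result(row):
            output.append(dict(row))  # keep the row as it is
            continue
        processed_count += 1
        output.append({**row, **apply(row)})
    return output, processed_count


def to_upper(row):
    return {"result": row["col1"].upper()}


rows = [{"col1": "x"}, {"col1": "y", "result": "done"}]
first_output, first_count = run_once(rows, to_upper)
second_output, second_count = run_once(first_output, to_upper)
```

Here only the first row is processed in the first pass, and the second pass has nothing left to do.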

Built-in error handling

Beyond skipping rows that have previous results or that raise RowProcessError, lico makes processing more robust in two further ways:

  • Trying to access a non-existent column in Operation.apply() will yield an error and automatically skip that row
  • Output of Task.run() will always have the same number of rows as the input. If an unhandled exception occurs during Task.run(), lico will stop processing but still write all results obtained so far. The unprocessed rows will be in the output unmodified.
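
The two guarantees above can be sketched in plain Python. This is an illustration of the described behaviour under assumptions, not lico's code: process_rows and shout are hypothetical names, and a missing column is modelled as a KeyError.

```python
def process_rows(rows, apply):
    """Apply `apply` to each row dict and always return len(rows) rows."""
    output = []
    for index, row in enumerate(rows):
        try:
            result = apply(row)
        except KeyError:
            # Non-existent column: skip this row, keep it unmodified
            output.append(dict(row))
            continue
        except Exception:
            # Unhandled error: stop processing, pass this row and all
            # remaining rows through unmodified
            output.extend(dict(r) for r in rows[index:])
            break
        output.append({**row, **result})
    return output


def shout(row):
    if row["name"] == "boom":
        raise RuntimeError("simulated crash")
    return {"upper": row["name"].upper()}


rows = [{"name": "a"}, {"id": "7"}, {"name": "boom"}, {"name": "d"}]
processed = process_rows(rows, shout)
```

The second row lacks the 'name' column and is skipped, the third row stops the run, and the fourth row is written unmodified, so the output still has four rows.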

Logging

lico logs through its own top-level logger, named lico. To print log messages, put this in your code:

import logging

logging.basicConfig(level=logging.DEBUG)
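
If DEBUG output from every library is too noisy, the level can be raised for lico alone. This sketch uses only the standard logging module and assumes, as stated above, that lico's messages go through a logger named lico.

```python
import logging

# Keep everything else at WARNING
logging.basicConfig(level=logging.WARNING)

# Show DEBUG messages only for the logger named 'lico' and its children
lico_logger = logging.getLogger("lico")
lico_logger.setLevel(logging.DEBUG)
```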

CSV structure

The idea is to keep CSVs as simple and unambiguous as possible. Therefore:

  • All csv values are text. No interpreting things as ints. Too many operations have been messed up by truncating leading zeros etc.
  • A csv header row is required, and the column names in it are treated as unique keys
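
Why text-only values matter can be shown with the standard library's csv module (this is not lico itself, and the column names are made up for the example). Values read as text keep their leading zeros, while converting them to int loses them.

```python
import csv
import io

# A small in-memory csv with ids that have leading zeros
sample = io.StringIO("legacy_id,name\n00042,alice\n00007,bob\n")

rows = list(csv.DictReader(sample))

# csv.DictReader returns every value as a string
as_text = [row["legacy_id"] for row in rows]

# Interpreting the same values as ints drops the leading zeros
as_ints = [int(value) for value in as_text]
```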

Why?

Situations in which lico might speed up your work:

  • Here is a csv file of about 1000 rows, each including a legacy id.
  • Can we find the new id for each of these legacy ids, and also add a data point based on the new id?
  • We don't know whether the legacy id is valid in all cases. Or at all.
  • This whole procedure is just to 'get an idea'. Just for exploration.

There are many ways to approach this. Mine is usually to get rid of Excel by parsing the data into a flat csv file and then using a combination of a text editor and bash magic for merging and sorting. Intermediate steps are saved for auditing.

However, for certain operations, such as interacting with servers, this is not enough. I then tend to use Python. This is more powerful but also creates overhead. Many of these tasks are single-use. Each time I have to slightly modify the same code: read in the csv, do something, handle errors, write the output.

lico tries to get rid of that boilerplate code as much as possible.
