A project to improve your CSV experience.

Project description

CSV utils

When your statistics data set is large, you may want to remove wrong or bad data. Other CSV modules can help with that, but csv_utils provides functions that make it easier: read the documentation, find the operation you need for your CSV file, and apply it. This module was developed while I was taking a statistics class in college, where I came across many CSV data files with wrong or bad data that I had to filter with Python myself, so why not give it back to the community.

Installing

You can install it using pip:

pip3 install csv_utils

Usage

We are still developing new utility functions, but plenty of existing ones may already help you.

How to read a file

To read a file all you need to do is import the module and then use the function read_file. It will read the file and return a list of its values.

import csv_utils

f = csv_utils.read_file('/path/to/file.csv', field_delimiter=';')
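
For comparison, a similar list-of-rows result can be produced with Python's standard csv module. The sketch below is not the csv_utils implementation; the helper name read_rows and the in-memory sample data are assumptions made only for illustration.

```python
import csv
import io


def read_rows(stream, field_delimiter=';', word_delimiter='"'):
    """Return every row of a delimited text stream as a list of strings.

    Hypothetical helper: it mirrors the parameters shown above but relies
    only on the standard library.
    """
    reader = csv.reader(stream, delimiter=field_delimiter, quotechar=word_delimiter)
    return [row for row in reader]


# In-memory sample standing in for '/path/to/file.csv'.
sample = io.StringIO('Name;Age;Occupation\nJonh;16;Student\n')
rows = read_rows(sample)
print(rows)
```

Every value comes back as a string, so numeric columns such as Age would still need to be converted before doing arithmetic on them.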

You can also read a file into the FileCSV object provided by the module.

from csv_utils import csv_file

f = csv_file.FileCSV()
f.read_file('/path/to/file.csv', field_delimiter=';', word_delimiter='"')

The code above creates a FileCSV object that stores the header (when have_headers=True is passed to read_file) and the lines. Suppose the CSV file has the following content:

Name Age Occupation
Jonh 16 Student
Jose 35 Professor
Ellen 45 Scientist

Reading the table above with the last snippet gives the following:

f
# output> [['Name', 'Age', 'Occupation'], ['Jonh', '16', 'Student'], ['Jose', '35', 'Professor'], ['Ellen', '45', 'Scientist']]

f.headers
# output> ['Name', 'Age', 'Occupation']

f.lines
# output> [['Jonh', '16', 'Student'], ['Jose', '35', 'Professor'], ['Ellen', '45', 'Scientist']]
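
The relationship between f, f.headers and f.lines shown above can be sketched in plain Python. This is only an illustration of the behavior described here, not the FileCSV implementation; the helper name split_headers is hypothetical.

```python
def split_headers(rows, have_headers=True):
    """Split a list of rows into (headers, lines).

    Hypothetical helper: when have_headers is True the first row is treated
    as the header, otherwise every row is treated as data.
    """
    if have_headers and rows:
        return rows[0], rows[1:]
    return [], list(rows)


rows = [
    ['Name', 'Age', 'Occupation'],
    ['Jonh', '16', 'Student'],
    ['Jose', '35', 'Professor'],
    ['Ellen', '45', 'Scientist'],
]
headers, lines = split_headers(rows)
print(headers)
print(lines)
```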

How to filter content

When dealing with a lot of data you may want to make the content uniform. For example, suppose you are researching the most used programming language in your field and create an online form where users enter their data. Some people may enter Python for the language used, others Python3, and others just Py3. Your statistics tool may not let you count Python, Python3 and Py3 as a single value. With this module you can modify the cells where these inconsistencies occur and then use the new file in your research.

To change all the values in a column named Programming Language you can use:

from csv_utils import csv_file

f = csv_file.FileCSV()
f.read_file('/path/to/file.csv')
f.substitute_cells_content(['Py3', 'Python3'], ['Python'], 'Programming Language')
f.save_file()

The code above turns the content of every cell that is Py3 or Python3 in the column Programming Language into Python and then saves the modification.
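
To make the substitution concrete, here is a minimal sketch of the same idea on plain lists. It is not the code behind substitute_cells_content; the function name substitute_in_column and the sample rows are assumptions for illustration.

```python
def substitute_in_column(headers, lines, old_values, new_value, column_name):
    """Return new lines where cells of one column equal to any old value
    are replaced by new_value. Other columns are left untouched.

    Hypothetical helper, written only to illustrate the idea.
    """
    column = headers.index(column_name)  # raises ValueError if the column is missing
    targets = set(old_values)
    return [
        [new_value if i == column and cell in targets else cell
         for i, cell in enumerate(line)]
        for line in lines
    ]


headers = ['Name', 'Programming Language']
lines = [['Ana', 'Py3'], ['Bob', 'Python3'], ['Eve', 'Java'], ['Py3', 'Python']]
cleaned = substitute_in_column(
    headers, lines, ['Py3', 'Python3'], 'Python', 'Programming Language'
)
print(cleaned)
```

Note that the value Py3 in the Name column of the last row is not changed, because only the named column is inspected.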

Another situation is when you forgot to make a field required and ended up with plenty of data with empty cells. You can filter these lines out of your data with:

from csv_utils import csv_file

f = csv_file.FileCSV()
f.read_file('/path/to/file.csv')
f.remove_lines_with_empty_cells()
f.save_file()

The code above finds every line with an empty cell in any column, deletes it from the object, and then saves your changes.
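
The filtering step can be pictured with the short sketch below. It is an illustration on plain lists, not the implementation of remove_lines_with_empty_cells; the function name is hypothetical, and treating whitespace-only cells as empty is an assumption that the library may not share.

```python
def drop_lines_with_empty_cells(lines):
    """Keep only the lines in which every cell has some content.

    Assumption: a cell holding only whitespace counts as empty.
    """
    return [line for line in lines if all(cell.strip() for cell in line)]


lines = [
    ['Jonh', '16', 'Student'],
    ['Jose', '', 'Professor'],
    ['Ellen', '45', '  '],
]
kept = drop_lines_with_empty_cells(lines)
print(kept)
```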

NOTES

save_file()

The function save_file() clears the content of the file that was read and overwrites it with the object's current content.
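
The overwrite behavior is the same as opening a file in write mode with the standard library, as the sketch below shows. It does not call csv_utils; the helper name overwrite_csv and the temporary file are assumptions for illustration.

```python
import csv
import os
import tempfile


def overwrite_csv(path, rows, field_delimiter=';'):
    """Replace the whole content of path with rows (hypothetical helper)."""
    # Mode 'w' truncates the file before anything is written, so the
    # previous content is lost. newline='' is what the csv docs recommend.
    with open(path, 'w', newline='') as handle:
        csv.writer(handle, delimiter=field_delimiter).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv')
    overwrite_csv(path, [['Name', 'Age'], ['Jonh', '16'], ['Jose', '35']])
    overwrite_csv(path, [['Name', 'Age'], ['Jonh', '16']])  # second save replaces the first
    with open(path, newline='') as handle:
        remaining = list(csv.reader(handle, delimiter=';'))

print(remaining)
```

Because the original file is replaced, it may be worth keeping a copy of the raw data, or using create_new_file (listed in the changelog) to write the result elsewhere; its exact signature is not documented here.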

CHANGELOG

0.0.1

First version with class CSV_file and no methods implemented for csv_utils besides those in the class.

0.1.0

  • Renamed class CSV_file to FileCSV.
  • Refactored functions substitute_cells_content and change_cells_content in FileCSV.
  • Created __str__ and __repr__ functions for FileCSV, enabling FileCSV objects to be printed and reproduced in the interpreter.
  • Added method remove_lines_with_empty_cells, which removes all lines with empty cells.
  • Added functions to the csv_utils module: read_file, which reads a file and returns it as a list; clean_file, which cleans the content of the file; and clean_lines_empty_cells, which cleans all lines with an empty cell.
  • Added method account_occurrences_column, which counts the occurrences of values in a column.

0.1.1

  • Fixed methods __str__ and __repr__ of FileCSV.
  • Added method undo_last_change to FileCSV to undo the last change made by one of its methods.
  • Added method create_new_file to FileCSV to save the changes made in a new file.

