A Python 3 class for memory-efficient navigation of CSV/Text files.

Project description

CSVNAV: a Python 3 class for memory-efficient navigation of CSV/Text files.

This package can be installed with pip:

pip install csvnav

or by downloading this repo and running the following setuptools command from within the csvnav directory:

python setup.py install

The file csvnav.py is a Python module containing the class Navigator. When instantiated, Navigator opens the given path and stores a pointer to the location of each row in the opened file. In the simplest case, an instance can be used much like a list. For instance, if I have a file "inventory.csv" containing the following CSV data:

time,product,quantity
5,tire,4
8,sparkplug,20
2,battery,120
10,tire,2
11,tire,3
30,sparkplug,35

I can instantiate the class and query rows by index:

from csvnav import Navigator

nav = Navigator('./inventory.csv', header=True, delimiter=',')
print(nav[0])
print(nav[2])
print(nav.size(force=True))

nav.close()

where the output would be:

{'time': '5', 'product': 'tire', 'quantity': '4'}
{'time': '2', 'product': 'battery', 'quantity': '120'}
6
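The idea of storing a pointer to each row can be illustrated with a minimal, stdlib-only sketch. This is not csvnav's actual implementation: the helper names build_row_offsets and read_row are hypothetical, and the sketch assumes that no quoted field contains an embedded newline, so that each row occupies exactly one line.

```python
import csv
import io


def build_row_offsets(handle):
    """Return the byte offset at which each line of a binary file starts."""
    offsets = []
    handle.seek(0)
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            break
        offsets.append(position)
    return offsets


def read_row(handle, offsets, index, delimiter=","):
    """Seek to one stored offset and parse only that line."""
    handle.seek(offsets[index])
    line = handle.readline().decode("utf-8")
    return next(csv.reader([line], delimiter=delimiter))


# In-memory stand-in for the first lines of inventory.csv
data = io.BytesIO(
    b"time,product,quantity\n"
    b"5,tire,4\n"
    b"8,sparkplug,20\n"
    b"2,battery,120\n"
)

offsets = build_row_offsets(data)      # one integer per line, not the rows themselves
header = read_row(data, offsets, 0)
first = dict(zip(header, read_row(data, offsets, 1)))
print(offsets)
print(first)
```

Only the list of integer offsets is held in memory; the row contents are read from the file again each time a row is requested.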

Note that the number of data rows (excluding any skipped lines and the header) can be obtained by calling Navigator.size(force=True). Here, force=True means that the number of data rows in the file will be determined even if the last row in the file has not been accessed yet. If the last row has already been accessed, force=False returns the same result; otherwise, force=False returns None.

Another thing to note is that the rows are returned as dictionaries. This happens whenever Navigator.header contains a list of the column names; otherwise, the rows are returned as lists. Navigator.header is filled automatically from the first row of the CSV file (after any skipped lines) when header=True is passed at instantiation, or explicitly when column names are provided with the Navigator.set_header() method. For example, if "inventory.csv" did not have a header then the output would be:

['5', 'tire', '4']
['2', 'battery', '120']
6
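The force semantics described above can be sketched with a small stand-in class. This is only an illustration of the behaviour as described in this README, not csvnav's code: LazyRowCounter and its methods are hypothetical names, and the sketch assumes a header-less file with one row per line.

```python
import io


class LazyRowCounter:
    """Hypothetical helper: the row count is known only once the last row is indexed."""

    def __init__(self, handle):
        self.handle = handle
        self.end = handle.seek(0, io.SEEK_END)  # file length in bytes, found without scanning
        self.offsets = []                       # start offsets of the rows indexed so far
        self._next = 0                          # offset of the first row not yet indexed

    @property
    def complete(self):
        """True once every row, including the last one, has been indexed."""
        return self._next >= self.end

    def _index_next_row(self):
        self.handle.seek(self._next)
        self.handle.readline()
        self.offsets.append(self._next)
        self._next = self.handle.tell()

    def get(self, index):
        """Return row `index` as text, indexing further rows only as needed."""
        while len(self.offsets) <= index:
            if self.complete:
                raise IndexError(index)
            self._index_next_row()
        self.handle.seek(self.offsets[index])
        return self.handle.readline().decode("utf-8").rstrip("\r\n")

    def size(self, force=False):
        """Return the row count, or None if it is unknown and force is False."""
        if force:
            while not self.complete:
                self._index_next_row()
        return len(self.offsets) if self.complete else None


rows = LazyRowCounter(io.BytesIO(b"5,tire,4\n8,sparkplug,20\n2,battery,120\n"))
print(rows.get(0))            # only the first row has been indexed
print(rows.size())            # count not yet known
print(rows.size(force=True))  # scans the remaining rows
print(rows.size())            # now known without forcing
```

The design choice illustrated here is that an unforced size never triggers a scan of the file, which keeps the call cheap on very large files.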

The Navigator class is also iterable and will iterate through rows in order:

for row in nav:
    print(row)

gives the output (assuming we have a header):

{'time': '5', 'product': 'tire', 'quantity': '4'}
{'time': '8', 'product': 'sparkplug', 'quantity': '20'}
{'time': '2', 'product': 'battery', 'quantity': '120'}
{'time': '10', 'product': 'tire', 'quantity': '2'}
{'time': '11', 'product': 'tire', 'quantity': '3'}
{'time': '30', 'product': 'sparkplug', 'quantity': '35'}

If we only want to iterate through a subset of rows that match a condition, we can use the Navigator.filter method:

from csvnav import Navigator

nav = Navigator('./inventory.csv', header=True, delimiter=',')

def when_few_tires(row):
    # True for tire rows with a quantity of 3 or fewer
    return row['product'] == 'tire' and int(row['quantity']) <= 3

for row in nav.filter(when_few_tires):
    print(row)

nav.close()

will produce the output:

{'time': '10', 'product': 'tire', 'quantity': '2'}
{'time': '11', 'product': 'tire', 'quantity': '3'} 
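The predicate passed to a filter of this kind is simply a callable that takes a row and returns a truthy or falsy value. As a rough illustration of the pattern, here is a stdlib-only sketch that uses csv.DictReader instead of Navigator; filter_rows is a hypothetical helper, not part of csvnav.

```python
import csv
import io


def filter_rows(rows, predicate):
    """Yield only the rows for which predicate(row) is truthy."""
    for row in rows:
        if predicate(row):
            yield row


# In-memory stand-in for inventory.csv
text = """time,product,quantity
5,tire,4
8,sparkplug,20
2,battery,120
10,tire,2
11,tire,3
30,sparkplug,35
"""

rows = csv.DictReader(io.StringIO(text))
few_tires = [
    dict(row)
    for row in filter_rows(
        rows,
        lambda row: row["product"] == "tire" and int(row["quantity"]) <= 3,
    )
]
print(few_tires)
```

Because filter_rows is a generator, rows are tested one at a time and non-matching rows are never collected.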

Another use of the class is to group row pointers by the values found in a named column (assuming Navigator.header is set). This can be done with the Navigator.register method. The following code groups rows by product and shows how the grouped data can be accessed:

from csvnav import Navigator

nav = Navigator('./inventory.csv', header=True, delimiter=',')

nav.register('product') # a list of column names can also be given to register each of them

print(nav.fields)
print(nav.keys('product'))
for k, v in nav.items('product'):
    print(k, list(v))

nav.close()

will print the registered fields, the keys of the 'product' field, and the rows in each group (each group is a list of rows, which are dictionaries or lists):

dict_keys(['product'])
dict_keys(['tire', 'sparkplug', 'battery'])
tire [{'time': '5', 'product': 'tire', 'quantity': '4'}, {'time': '10', 'product': 'tire', 'quantity': '2'}, {'time': '11', 'product': 'tire', 'quantity': '3'}]
sparkplug [{'time': '8', 'product': 'sparkplug', 'quantity': '20'}, {'time': '30', 'product': 'sparkplug', 'quantity': '35'}]
battery [{'time': '2', 'product': 'battery', 'quantity': '120'}]

Note that groups are then accessed by two "indexes", namely the column name and the key.
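Grouping pointers rather than rows can be sketched as follows. This is an assumption about the general technique, not a copy of how Navigator.register works internally: group_offsets and rows_at are hypothetical helpers, and the sketch assumes one row per line with no blank lines.

```python
import csv
import io
from collections import defaultdict


def parse_line(raw, delimiter=","):
    """Parse a single raw line of bytes into a list of fields."""
    return next(csv.reader([raw.decode("utf-8")], delimiter=delimiter))


def group_offsets(handle, column):
    """Map each value of `column` to the byte offsets of the rows holding it."""
    handle.seek(0)
    header = parse_line(handle.readline())
    position = header.index(column)
    groups = defaultdict(list)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        groups[parse_line(line)[position]].append(offset)
    return header, dict(groups)


def rows_at(handle, header, offsets):
    """Lazily read the rows stored at the given offsets."""
    for offset in offsets:
        handle.seek(offset)
        yield dict(zip(header, parse_line(handle.readline())))


# In-memory stand-in for inventory.csv
data = io.BytesIO(
    b"time,product,quantity\n"
    b"5,tire,4\n"
    b"8,sparkplug,20\n"
    b"2,battery,120\n"
    b"10,tire,2\n"
    b"11,tire,3\n"
    b"30,sparkplug,35\n"
)

header, groups = group_offsets(data, "product")
print(groups)
print(list(rows_at(data, header, groups["battery"])))
```

Each group holds only integers, so the memory cost of a registered column grows with the number of rows, not with their size.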

The Navigator class should be thread safe and an instance can be shared between threads. Navigator has some more functionality that I have not described here but this covers the basics. Refer to the docstrings of the various methods of the Navigator class for more information.
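This README does not say how thread safety is achieved. One common approach for a shared file handle, shown here purely as an assumption, is to guard each seek-and-read pair with a lock so that two threads cannot move the file position between each other's calls. LockedLineReader is a hypothetical class, not part of csvnav.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedLineReader:
    """Hypothetical reader that serialises seek+readline on a shared handle."""

    def __init__(self, handle, offsets):
        self._handle = handle
        self._offsets = offsets
        self._lock = threading.Lock()

    def read(self, index):
        # The lock keeps the seek and the readline together as one atomic step.
        with self._lock:
            self._handle.seek(self._offsets[index])
            return self._handle.readline().decode("utf-8").rstrip("\r\n")


lines = [b"5,tire,4\n", b"8,sparkplug,20\n", b"2,battery,120\n"]

# Compute the start offset of every line
offsets, position = [], 0
for line in lines:
    offsets.append(position)
    position += len(line)

reader = LockedLineReader(io.BytesIO(b"".join(lines)), offsets)

# Several threads read rows through the same instance
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(reader.read, [0, 1, 2] * 50))

print(results[:3])
```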
