Python commented file reader

Project description

A module to access data in text-formatted files.

In data analysis, data is often stored in text-formatted files, where each line holds values arranged in columns.

A file may contain comments (usually starting with #) and unused or uninteresting columns, and the data of interest may be spread over several files.

Consequently, it may be useful to have a Python shortcut to:

  • read one or several files one after the other
  • or put some files side by side (i.e. append the columns together)
  • filter out commented or empty lines.

This module provides one function that wraps around a file iterator, allowing the file(s) to be read as follows:

for one_line in data_file('myfile.txt', comment_prefix='#'):
    print(one_line)
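
To illustrate the idea, here is a minimal sketch of a generator that filters out commented and empty lines and converts the remaining columns. It is not the module's actual code: the name iter_data_lines is hypothetical, it reads from any iterable of strings rather than from a file path, and it assumes each line is yielded as a list of converted values.

```python
from typing import Iterable, Iterator, List, Optional


def iter_data_lines(lines: Iterable[str],
                    comment_prefix: str = "#",
                    separator: Optional[str] = None,
                    returned_type: type = float) -> Iterator[List]:
    """Hypothetical stand-in for illustration, not the module's implementation."""
    for raw in lines:
        stripped = raw.strip()
        if not stripped:
            continue  # skip empty lines
        if stripped.startswith(comment_prefix):
            continue  # skip commented lines
        # str.split(None) splits on any run of whitespace
        yield [returned_type(field) for field in stripped.split(separator)]


sample = [
    "# x  y",
    "1.0 2.0",
    "",
    "3.5 4.5",
]
rows = list(iter_data_lines(sample))
print(rows)
```

Because an open file object is itself an iterable of lines, the same sketch would work on `open('myfile.txt')`.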

Getting Started

The following instructions will get you a copy of the project up and running on your local machine.

Installing

The module has no external dependencies and can be installed with Python's distutils tools.

Get the ascii_data_file.tar.gz file. Then cd to the directory where the file was downloaded and execute the following commands:

tar xvvzf ascii_data_file-001.tar.gz
cd ascii_data_file-001
python3 setup.py install

This will unpack, build, install and test the module.

Testing

You can test the library with pytest.

Dependencies

The module is built with no dependencies.

Usage

The data_file function is defined as follows:

data_file(file_path: Union[str, Sequence[str]],
          returned_columns: Union[str, slice, Sequence[int]] = '*',
          comment_prefix: str = "#",
          separator: Union[None, str] = None,
          returned_type: type = float,
          multi_files_behavior: str = 'append',
          skip_empty_lines: bool = True,
          skip_error_lines: bool = True,
          error_line_warning: bool = True,
          error_line_error: bool = False) -> Generator

It returns a generator that filters out commented lines.

The parameters are:

  • file_path (str or list of str), required: the path to the file or files to open
  • returned_columns ('*' or slice or list of int), default = '*': select the columns to return. Either '*' for all, a list of indices, or a slice.
  • comment_prefix (str), default = "#": the characters to look for at the start of a commented line.
  • separator (str or None), default = None: the string that separates the columns on a line.
  • returned_type (type), default = float: the type of data to return.
  • multi_files_behavior (str), default = 'append': what to do when multiple files are given as input. Either 'append' or 'side_by_side'.
  • skip_empty_lines (bool), default = True: whether to skip empty lines.
  • skip_error_lines (bool), default = True: whether to skip lines that cause an error during processing.
  • error_line_warning (bool), default = True: if error lines are not skipped, whether to issue a warning.
  • error_line_error (bool), default = False: if error lines are not skipped, whether to raise a RuntimeError when there is a problem reading the line.
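
The column selection and the two multi-file behaviors described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: select_columns and combine are hypothetical helpers, each "file" is represented by a list of already-parsed rows, and 'side_by_side' is assumed to stop at the shortest file.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Sequence, Union

Columns = Union[str, slice, Sequence[int]]


def select_columns(row: List, returned_columns: Columns = "*") -> List:
    """Hypothetical helper: keep all columns, a slice, or a list of indices."""
    if isinstance(returned_columns, str):
        if returned_columns != "*":
            raise ValueError("the only accepted string is '*'")
        return list(row)
    if isinstance(returned_columns, slice):
        return row[returned_columns]
    return [row[i] for i in returned_columns]


def combine(files: Sequence[Iterable[List]],
            multi_files_behavior: str = "append") -> Iterator[List]:
    """Hypothetical helper: merge the rows of several parsed files."""
    if multi_files_behavior == "append":
        # read the files one after the other
        return chain.from_iterable(files)
    if multi_files_behavior == "side_by_side":
        # join the columns of matching rows (assumption: zip stops at the shortest file)
        return (list(chain.from_iterable(rows)) for rows in zip(*files))
    raise ValueError("expected 'append' or 'side_by_side'")


file_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
file_b = [[10.0, 20.0], [30.0, 40.0]]

appended = list(combine([file_a, file_b], "append"))
side = list(combine([file_a, file_b], "side_by_side"))

print(appended)                              # four rows, file_a then file_b
print(side)                                  # two rows of five columns each
print(select_columns(side[0], [0, 3]))       # columns picked by index
print(select_columns(side[0], slice(1, 3)))  # columns picked by slice
```

With 'append' the number of rows grows, while with 'side_by_side' the number of columns grows, so column indices passed to returned_columns would refer to different data in each mode.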

For usage examples, see the test_ascii_data_file.py file in the repository.

Authors

  • Greg Henning - ghenning​.at.​iphc․cnrs․fr

License

This project is licensed under the CeCILL FREE SOFTWARE LICENSE AGREEMENT.

See LICENSE for more.

