This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description

A python package to scrape time series from any excel file and return them turned into pandas data frames.

Installation

If you want to install in developer mode, clone the repository and follow these instructions:

If you are using Anaconda as your python distribution

  1. conda create -n xlseries python=2 Create new environment
  2. cd project_directory
  3. source activate xlseries Activate the environment
  4. pip install -e . Install the package in developer mode
  5. pip install -r requirements.txt Install dependencies
  6. deactivate Deactivate when you are done

If you are using a standard python installation

  1. cd project_directory
  2. virtualenv venv Create new environment
  3. source venv/bin/activate Activate the environment
  4. pip install -r requirements.txt Install dependencies
  5. deactivate Deactivate when you are done

If you just want to use it:

pip install xlseries in your environment, instead of cloning and pip installing in developer mode.

Quick start

from xlseries import XlSeries
xl = XlSeries("path_to_excel_file" or openpyxl.Workbook instance)
dfs = xl.get_data_frames("path_to_json_parameters" or parameters_dictionary)

With the test case number 1:

from xlseries import XlSeries
from xlseries.utils.path_finders import get_orig_cases_path, get_param_cases_path

# this will only work if you clone the repo with all the test files
path_to_excel_file = get_orig_cases_path(1)
path_to_json_parameters = get_param_cases_path(1)

xl = XlSeries(path_to_excel_file)
dfs = series.get_data_frames(path_to_json_parameters)

or passing only the critical parameters as a dictionary:

parameters_dictionary = {
    "headers_coord": ["B1","C1"],
    "data_starts": 2,
    "frequency": "M",
    "time_header_coord": "A1"
}
dfs = xl.get_data_frames(parameters_dictionary)

you can specify what worksheet you want to scrape (otherwise the first one will be used):

dfs = xl.get_data_frames(parameters_dictionary, ws_name="my_worksheet")

you can ask an XlSeries object for a template dictionary of the critical parameters you need to fill:

>>> params = xl.critical_params_template()
>>> params
{'data_starts': 2,
 'frequency': 'M',
 'headers_coord': ['B1', 'C1', 'E1-G1'],
 'time_header_coord': 'A1'}
>>> params["headers_coord"] = ["B1","C1"]
>>> dfs = xl.get_data_frames(params, ws_name="my_worksheet")

if this doesn’t work and you want to see exactly where the scraping is failing, you may want to fill out all the parameters and try again to see where the exception is raised:

>>> params = xl.complete_params_template()
>>> params
{'alignment': u'vertical',
 'blank_rows': False,
 'continuity': True,
 'data_ends': None,
 'data_starts': 2,
 'frequency': 'M',
 'headers_coord': ['B1', 'C1', 'E1-G1'],
 'missing_value': [None, '-', '...', '.', ''],
 'missings': False,
 'series_names': None,
 'time_alignment': 0,
 'time_composed': False,
 'time_header_coord': 'A1',
 'time_multicolumn': False}
>>> params["headers_coord"] = ["B1","C1"]
>>> params["data_ends"] = 256
>>> params["missings"] = True
>>> dfs = xl.get_data_frames(params, ws_name="my_worksheet")
  • Excel file: Up to this development point the excel file should not be more complicated than the 7 test cases:
  • Parameters: Together with the excel file, some parameters about the series must be provided. These could be passed to get_data_frames() as a path to a JSON file or as a python dictionary. xlseries use about 14 parameters to characterize the time series of a spreadsheet, but only 4 of them are critical most of the time: the rest can be guessed by the package. The only difference between specifying more or less parameters than the 4 critical is the total time that xlseries will need to complete the task (more parameters, less time). Go to the parameters section for a more detailed explanation about how to use them, and when you need to specify more than the basic 4 (headers_coord, data_starts, frequency and time_header_coord).

Take a look to this ipython notebook template to get started!.

If you want to dig inside the test cases and get an idea of how far is going xlseries at the moment, check out this ipython notebook with the 7 test cases.

For more details go to the official repository on github: https://github.com/abenassi/xlseries

Release History

Release History

0.1.13

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.12

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.11

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.10

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.9

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.8

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.7

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.6

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.5

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.4

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
xlseries-0.1.13-py2.7.egg (14.0 MB) Copy SHA256 Checksum SHA256 2.7 Egg Sep 8, 2015
xlseries-0.1.13.tar.gz (13.9 MB) Copy SHA256 Checksum SHA256 Source Sep 8, 2015

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS HPE HPE Development Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting