A package to ease mass data extraction from Excel files

WARNING: DON'T EXPECT ANYTHING USEFUL FROM THIS TOOL AT THIS STAGE!

xlscrap

xlscrap is an MIT-licensed package that eases mass data extraction from Excel files.

See the documentation.

Rationale

Have you ever felt the pain of extracting data from a lot of Excel files?

  • When you have hundreds or thousands of files that look similar but differ in slightly annoying details.
  • When cell coordinates can't be used because they change from file to file.
  • When you have to locate dozens or hundreds of fields, each with a different strategy.
  • When the same field moves to a different position, or to a sheet with a different name.
  • When the label of the same field changes.
  • When the data cell is sometimes to the right of the label and sometimes below it.
  • When you need to check that the collected data is correct.

xlscrap helps you scrape data out of your Excel files.
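
The label-based lookup described in the list above can be sketched in plain Python. This is only an illustration of the idea, not xlscrap's implementation: the function name `find_field`, the helper `_normalize`, and the representation of a sheet as a list of rows are all assumptions made for this example.

```python
def _normalize(text):
    """Lower-case a label and drop surrounding spaces and a trailing colon."""
    return text.strip().lower().rstrip(':').strip()


def find_field(grid, labels):
    """Return the value next to the first cell matching one of `labels`.

    grid:   a sheet as a list of rows, each row a list of cell values
            (None for an empty cell). Hypothetical representation.
    labels: the accepted spellings of the field label.
    """
    wanted = {_normalize(label) for label in labels}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if not isinstance(cell, str) or _normalize(cell) not in wanted:
                continue
            # Look to the right of the label first, then below it.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < len(grid) and cc < len(grid[rr]):
                    value = grid[rr][cc]
                    if value not in (None, ''):
                        return value
    return None


# Two sheets holding the same field with a different layout and label.
sheet_a = [["Name", "Diana"],
           ["Age", 47]]
sheet_b = [[None, "Full name:", None],
           [None, "Bob", None]]

name_a = find_field(sheet_a, ["name", "full name"])
name_b = find_field(sheet_b, ["name", "full name"])
age_a = find_field(sheet_a, ["age"])
```

Searching by label rather than by coordinates is what lets one rule cover both layouts: neither the cell position nor the exact label spelling is hard-coded.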

Quickstart

>>> import xlscrap
>>> s = xlscrap.Scrapper()
>>> s.field('name')
>>> s.field('age')
>>> s.field('address')
>>> s.table('pets', fields=['name', 'breed', 'age'])
>>> s.scrap('excel-files/*.xls*')
looking for 4 fields in 5 files in excel-files/*.xls*,
file 1/5, found 4/4 fields in diana.xlsx
file 2/5, found 4/4 fields in bob.xls
file 3/5, found 3/4 fields in richard.ods
file 4/5, found 0/4 fields in alien.xls
file 5/5, found 4/4 fields in maria.xlsm
>>> s.result
[
    {'name': 'Diana',
     'age': 47,
     'address': '44 rue du Louvre\n75000 Paris\nFrance',
     'pets': []},
    ...
]
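
Because some files yield fewer fields than requested, it can help to check the collected records afterwards. The sketch below is independent of xlscrap and rests on an assumption the README does not confirm: that a field that was not found is either absent from the record or set to `None`. The helper `missing_fields` and the second record are hypothetical.

```python
EXPECTED_FIELDS = ['name', 'age', 'address', 'pets']


def missing_fields(record, expected=EXPECTED_FIELDS):
    """List the expected fields that are absent or None in a record."""
    return [field for field in expected if record.get(field) is None]


# First record taken from the quickstart output above.
complete = {'name': 'Diana',
            'age': 47,
            'address': '44 rue du Louvre\n75000 Paris\nFrance',
            'pets': []}

# Hypothetical record with one field not found (assumed shape).
partial = {'name': 'Richard', 'age': None, 'address': 'unknown street', 'pets': []}

report = {rec['name']: missing_fields(rec) for rec in (complete, partial)}
```

An empty `pets` list counts as found here, since an empty table is a legitimate value.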

TODO

  • set gitlab URL in setup.py
  • clone gitlab/github
  • complete quickstart in README


Files for xlscrap, version 0.1.0

  • xlscrap-0.1.0-py3-none-any.whl (3.4 kB), file type: Wheel, Python version: py3
  • xlscrap-0.1.0.tar.gz (3.4 kB), file type: Source