A package to ease mass data extraction from Excel files

WARNING: DON'T EXPECT ANYTHING USEFUL FROM THIS TOOL AT THIS STAGE!

xlscrap

xlscrap is an MIT-licensed package that eases mass data extraction from Excel files.

See the documentation.

Rationale

Have you ever felt the pain of extracting data from a lot of Excel files?

  • When you have hundreds or thousands of files that look similar but differ in slightly annoying details.
  • When cell coordinates can't be used to locate the data because they change from file to file.
  • When you have to locate dozens or hundreds of fields, each with a different strategy.
  • When the same field sits at a different position, or in a sheet with a different name.
  • When the label of the same field changes.
  • When the data cell is sometimes to the right of the label and sometimes below it.
  • When you need to check that the collected data is correct.

xlscrap helps you scrape data out of your Excel files.
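The label-based lookup described in the list above can be sketched in plain Python. This is only an illustration of the idea, not xlscrap's implementation: the function name `find_by_label`, its parameters, and the representation of a sheet as a list of rows are all assumptions made for this example.

```python
def _normalize(text):
    """Lower-case a label and drop surrounding spaces and a trailing colon."""
    return text.strip().rstrip(':').strip().lower()


def find_by_label(grid, labels, directions=((0, 1), (1, 0))):
    """Return the first non-empty value next to a cell matching one of `labels`.

    grid: list of rows (lists of cell values), a hypothetical in-memory sheet.
    labels: accepted spellings of the field label.
    directions: (row_offset, col_offset) pairs to try, right first, then below.
    """
    wanted = {_normalize(label) for label in labels}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            # Only text cells can be labels.
            if not isinstance(cell, str) or _normalize(cell) not in wanted:
                continue
            for dr, dc in directions:
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(grid) and 0 <= cc < len(grid[rr]):
                    value = grid[rr][cc]
                    if value not in (None, ''):
                        return value
    return None


# Two made-up sheets: the value is right of the label in one, below it in the other.
sheet_a = [['Name', 'Diana'], ['Age', 47]]
sheet_b = [[None, 'Full name:', None], [None, 'Bob', None]]

print(find_by_label(sheet_a, ['name', 'full name']))
print(find_by_label(sheet_b, ['name', 'full name']))
```

Searching by label rather than by coordinates is what lets the same field definition survive a moved cell or a renamed label, at the cost of having to list the accepted label spellings.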

Quickstart

>>> import xlscrap
>>> s = xlscrap.Scrapper()
>>> s.field('name')
>>> s.field('age')
>>> s.field('address')
>>> s.table('pets', fields=['name', 'breed', 'age'])
>>> s.scrap('excel-files/*.xls*')
looking for 4 fields in 5 files in excel-files/*.xls*,
file 1/5, found 4/4 fields in diana.xlsx
file 2/5, found 4/4 fields in bob.xls
file 3/5, found 3/4 fields in richard.ods
file 4/5, found 0/4 fields in alien.xls
file 5/5, found 4/4 fields in maria.xlsm
>>> s.result
[
    {'name': 'Diana',
     'age': 47,
     'address': '44 rue du Louvre\n75000 Paris\nFrance',
     'pets': []},
    ...
]
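Once the results are collected, a quick completeness check can help spot files where some fields were not found. The sketch below assumes only that the result is a list of dicts shaped like the one shown above. The helper `missing_fields` is hypothetical and not part of xlscrap, and the second record is made-up sample data.

```python
def missing_fields(records, expected):
    """Map each record index to the expected fields that are absent or empty."""
    report = {}
    for index, record in enumerate(records):
        # A field counts as missing when the key is absent, None, or ''.
        missing = [name for name in expected if record.get(name) in (None, '')]
        if missing:
            report[index] = missing
    return report


# Sample data in the same shape as the result shown above (second record is invented).
records = [
    {'name': 'Diana', 'age': 47,
     'address': '44 rue du Louvre\n75000 Paris\nFrance', 'pets': []},
    {'name': 'Richard', 'address': '1 Example Street', 'pets': []},
]

print(missing_fields(records, ['name', 'age', 'address', 'pets']))
```

An empty `pets` list is treated as a valid value here, since a file may legitimately contain an empty table.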

TODO

  • set gitlab URL in setup.py
  • clone gitlab/github
  • complete quickstart in README

Download files

Source Distribution

xlscrap-0.1.0.tar.gz (3.4 kB)

Built Distribution

xlscrap-0.1.0-py3-none-any.whl (3.4 kB)
