Skip to main content

Excel 2007+ Binary Workbook (xlsb) parser

Project description

pyxlsb2 (a variant of pyxlsb - https://github.com/wwwiiilll/pyxlsb) is an Excel 2007+ Binary Workbook (xlsb) parser written in Python.

pyxslb2 offers the following improvements/changes in comparison to pyxlsb:

  1. By default, keeps all data in memory instead of creating temporary files. This is mainly to speed up the processing and also not changing the local filesystem during the processing.

  2. relies on both “xl\workbook.bin” and “xl\_rels\workbook.bin.rels” to load locate boundsheets. As a result, it can load all worksheets as well as all macrosheets.

  3. extracts macro formulas:

  • accurately shows the formulas

  • supports A1 addressing

  • supports external addressing (partially implemented))

  1. extracts defined names such as auto_open

Install

  1. Installing the whl file

Download .whl file from the release section

pip install -U [path to whl file]

  1. Installing the latest development

Using pip

pip install -U https://github.com/DissectMalware/pyxlsb2/archive/master.zip

Or download the latest version

wget https://github.com/DissectMalware/pyxlsb2/archive/master.zip

Extract the zip file and go to the extracted directory

python setup.py install –user

Usage

The module exposes an open_workbook(name) method (similar to Xlrd and OpenPyXl) for opening XLSB files. The Workbook object representing the file is returned.

from pyxlsb2 import open_workbook
with open_workbook('Book1.xlsb') as wb:
    # Do stuff with wb

The Workbook object exposes a get_sheet_by_index(idx) and get_sheet_by_name(name) method to retrieve Worksheet instances.

# Using the sheet index (0-based, unlike VBA)
with wb.get_sheet_by_index(0) as sheet:
    # Do stuff with sheet

# Using the sheet name
with wb.get_sheet_by_name('Sheet1') as sheet:
    # Do stuff with sheet

A sheets property containing the sheet names is available on the Workbook instance.

The rows() method will hand out an iterator to read the worksheet rows. The Worksheet object is also directly iterable and is equivalent to calling rows().

# You can use .rows(sparse=False) to include empty rows
for row in sheet.rows():
    print(row)
# [Cell(r=0, c=0, v='TEXT'), Cell(r=0, c=1, v=42.1337)]

NOTE: Iterating the same Worksheet instance multiple times in parallel (nested for for instance) will yield unexpected results, retrieve more instances instead.

Note that dates will appear as floats. You must use the convert_date(date) method from the corresponding Workbook instance to turn them into datetime.

print(wb.convert_date(41235.45578))
# datetime.datetime(2012, 11, 22, 10, 56, 19)

Example

Converting a workbook to CSV:

import csv
from pyxlsb2 import open_workbook

with open_workbook('Book1.xlsb') as wb:
    for name in wb.sheets:
        with wb.get_sheet_by_name(name) as sheet:
            with open(name + '.csv', 'w') as f:
                writer = csv.writer(f)
                for row in sheet.rows():
                    writer.writerow([c.v for c in row])

Limitations

Non exhaustive list of things that are currently not supported:

  • Style and formatting WIP

  • Rich text cells (formatting is lost, but getting the text works)

  • Encrypted (password protected) workbooks

  • Comments and other annotations

  • Writing (out of scope)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyxlsb2-0.0.8.tar.gz (37.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyxlsb2-0.0.8-py3-none-any.whl (40.1 kB view details)

Uploaded Python 3

File details

Details for the file pyxlsb2-0.0.8.tar.gz.

File metadata

  • Download URL: pyxlsb2-0.0.8.tar.gz
  • Upload date:
  • Size: 37.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.5

File hashes

Hashes for pyxlsb2-0.0.8.tar.gz
Algorithm Hash digest
SHA256 a4700e31ccae2e6fbc5b77d8be7c84c158a5c6608115b615597cb622281ddde0
MD5 a383498ddc97a3836507817cb9508212
BLAKE2b-256 a1a1b1bc1ec79adee6dbd79d8b5594f3597119d8c7e5a560fa3958739e066424

See more details on using hashes here.

File details

Details for the file pyxlsb2-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: pyxlsb2-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 40.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.5

File hashes

Hashes for pyxlsb2-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 72f46cea74113879b2167a34da3a6eaedccf8d462caf9ee8ed3a15e58f1f5621
MD5 ea62929e07ada7c4aa9f54df8b37a11c
BLAKE2b-256 9456b795235a8840184dd4c35b32a7bcb67a64515571e4af4f89ec00e74121fa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page