Convert xls file to xlsx

Features

  • Convert .xls files to .xlsx using xlrd and openpyxl.

  • Convert .htm and .mht files containing tables or excel contents to .xlsx using beautifulsoup4 and openpyxl.

We attempt to support anything that the underlying packages support. For example, the following are supported for both input types:

  • Multiple worksheets

  • Text, Numbers, Dates/Times, Unicode

  • Fonts, text color, bold, italic, underline, double underline, strikeout

  • Solid and Pattern Fills with color

  • Borders: Solid, Hair, Thin, Thick, Double, Dashed, Dotted; with color

  • Alignment: Horizontal, Vertical, Rotated, Indent, Shrink To Fit

  • Number Formats, including unicode currency symbols

  • Hidden Rows and Columns

  • Merged Cells

  • Hyperlinks (only 1 per cell)

  • Comments

These features are additionally supported by the .xls input format:

  • Freeze panes

These features are additionally supported by the .htm and .mht input formats:

  • Images

Not supported by either format:

  • Conditional Formatting (the current stylings are preserved)

  • Formulas (the calculated values are preserved)

  • Charts (the image of the chart is handled by .htm and .mht input formats)

  • Drawings (the image of the drawing is handled by .htm and .mht input formats)

  • Pivot tables (the current data is preserved)

  • Text boxes (converted to an image by .htm and .mht input formats)

  • Shapes and Clip Art (converted to an image by .htm and .mht input formats)

  • Autofilter (the current filtered out rows are preserved)

  • Rich text in cells (openpyxl doesn’t support this: only styles applied to the entire cell are preserved)

  • Named Ranges

  • Macros (VBA)

Installation

To install xls2xlsx, run this command in your terminal:

$ pip install xls2xlsx

This is the preferred method to install xls2xlsx, as it will always install the most recent stable release.

Usage

To use xls2xlsx from the command line:

$ xls2xlsx [-v] file.xls ...

This creates file.xlsx in the current folder. The input can be any .xls, .htm, or .mht file, or a URL. The -v flag prints the input and output filenames.

To use xls2xlsx in a project:

from xls2xlsx import XLS2XLSX
x2x = XLS2XLSX("spreadsheet.xls")
x2x.to_xlsx("spreadsheet.xlsx")

Alternatively:

from xls2xlsx import XLS2XLSX
x2x = XLS2XLSX("spreadsheet.xls")
wb = x2x.to_xlsx()

The XLS2XLSX.to_xlsx method returns the filename given. If no filename is provided, the method returns the openpyxl workbook instead.
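
As a minimal sketch of how the calls above might be combined, the following helper converts every .xls file in a folder. The name convert_folder and its convert parameter are hypothetical and not part of xls2xlsx; only XLS2XLSX(...) and to_xlsx(...) come from the package, and the package is imported lazily so the helper can be tried with a stand-in converter.

```python
from pathlib import Path


def convert_folder(folder, convert=None):
    """Convert each .xls file in `folder` to a sibling .xlsx file.

    `convert` is a hypothetical hook taking (source_path, target_path).
    When it is omitted, the documented XLS2XLSX API is used.
    """
    if convert is None:
        # Imported lazily so the helper can be exercised without the package.
        from xls2xlsx import XLS2XLSX

        def convert(source, target):
            # to_xlsx(filename) writes the workbook and returns the filename.
            return XLS2XLSX(str(source)).to_xlsx(str(target))

    written = []
    for source in sorted(Path(folder).glob("*.xls")):
        target = source.with_suffix(".xlsx")
        convert(source, target)
        written.append(target)
    return written
```

Calling convert_folder("reports") would then write reports/a.xlsx next to reports/a.xls, and so on for each .xls file found.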

The input file can be in any of the following formats:

  • Excel 97-2003 workbook (.xls)

  • Web page (.htm, .html), optionally including a _Files folder

  • Single file web page (.mht, .mhtml)

The input specified can also be any of the following:

  • A filename / pathname

  • A URL

  • A file-like object (opened in Binary mode for .xls and either Binary or Text mode otherwise)

  • The contents of a .xls file as a bytes object

  • The contents of a .htm or .mht file as a str object

Note: The file format is determined by examining the file contents, not by looking at the file extension.
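
To illustrate what examining the file contents can look like, here is a rough sketch of content-based detection. It is an assumption for illustration only and not necessarily how xls2xlsx decides: the function name sniff_format and its rules are hypothetical. It relies on the fact that .xls workbooks stored in an OLE2 compound file start with a fixed 8-byte signature, and that .mht files begin with MIME headers.

```python
# Signature at the start of every OLE2 compound file (the container used by
# Excel 97-2003 .xls workbooks).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_format(data):
    """Guess 'xls', 'mht', or 'htm' from the content; return None if unknown.

    Hypothetical helper: accepts bytes or str, mirroring the input types
    listed above.
    """
    if isinstance(data, str):
        data = data.encode("utf-8", errors="replace")
    if data.startswith(OLE2_MAGIC):
        return "xls"
    head = data[:2048].lstrip().lower()
    # MHT files are MIME messages, so check for MIME headers before HTML tags.
    if head.startswith(b"mime-version:") or b"content-type: multipart/related" in head:
        return "mht"
    if b"<html" in head or b"<table" in head:
        return "htm"
    return None
```

Because the check looks only at the content, a workbook saved as report.dat or an HTML table saved as report.xls would still be classified by what it contains.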

Dependencies

Python 3 is required. Releases before 0.2.0 support Python >= 3.6; version 0.2.0 drops support for Python 3.6.

These packages are also required: xlrd, openpyxl, requests, beautifulsoup4, Pillow, python-dateutil, cssutils, webcolors, currency-symbols, fonttools, PyYAML.

Implementation Notes

The .htm and .mht conversion uses ImageFont from Pillow to measure the size (width and height) of cell contents. On first use, it looks for font files in the standard locations on your system and builds a mapping from font name to filename. If the fonts used in the input file are not found on your system, an estimation algorithm is used as a fallback.
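
The measure-then-fall-back idea can be sketched as follows. This is not the package's code: the function names, the avg_char_ratio value of 0.55, and the width-only scope are assumptions chosen for illustration. ImageFont.truetype and the font's getlength method are Pillow calls (getlength requires Pillow 8.0 or later).

```python
def estimate_text_width(text, font_size=11, avg_char_ratio=0.55):
    """Fallback estimate: treat every character as avg_char_ratio * font_size wide.

    The 0.55 ratio is an illustrative assumption, not a value from xls2xlsx.
    """
    longest_line = max(text.split("\n"), key=len)
    return len(longest_line) * font_size * avg_char_ratio


def measure_text_width(text, font_path=None, font_size=11):
    """Measure the widest line with Pillow when the font loads, else estimate."""
    if font_path is not None:
        try:
            from PIL import ImageFont

            font = ImageFont.truetype(font_path, font_size)
            # getlength rejects multi-line text, so measure line by line.
            return max(font.getlength(line) for line in text.split("\n"))
        except (ImportError, OSError):
            # Pillow is missing or the font file was not found.
            pass
    return estimate_text_width(text, font_size)
```

With a font file that exists on the system, the result comes from the real glyph metrics; otherwise the character-count estimate is returned.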

When passed a .mht file (or URL), the contents are unpacked for processing into the temporary folder named in the file, and this folder is removed when processing is done.
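
The unpack, process, and clean up cycle can be sketched with the standard library alone. This is a simplified illustration rather than the package's implementation: unpack_mht and its process callback are hypothetical names, and the sketch unpacks into a folder created by tempfile.mkdtemp instead of the folder name given in the file.

```python
import email
import shutil
import tempfile
from pathlib import Path, PurePosixPath


def unpack_mht(mht_bytes, process):
    """Unpack the parts of an .mht file, call process(folder), then clean up."""
    message = email.message_from_bytes(mht_bytes)
    folder = Path(tempfile.mkdtemp(prefix="mht_"))
    try:
        for index, part in enumerate(message.walk()):
            if part.is_multipart():
                continue  # container parts hold no content of their own
            location = part.get("Content-Location", "")
            # Keep only the last path component of the part's location.
            name = PurePosixPath(location.replace("\\", "/")).name or f"part{index}"
            # decode=True undoes base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True) or b""
            (folder / name).write_bytes(payload)
        return process(folder)
    finally:
        # The folder is removed even if processing raises.
        shutil.rmtree(folder, ignore_errors=True)
```

The try/finally block is the design point: the unpacked files exist only for the duration of process and are removed whether or not it succeeds.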

Credits

Development Lead

Contributors

None yet. Why not be the first?

Acknowledgements

A portion of the code is based on the work of John Ricco (johnricco226@gmail.com), Apr 4, 2017: https://johnricco.github.io/2017/04/04/python-html/

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.2.0 (2023-01-05)

  • Modernize for more recent pythons and more recent packages. Drop support for Python 3.6. Fix issues #11, #14, #16. Add feature #12.

0.1.5 (2020-11-03)

  • Fix issues #1, #3, #5

0.1.4 (2020-11-02)

  • Fix issue #4

0.1.3 (2020-10-15)

  • Fix issue #2 - cli not working

0.1.0 (2020-09-13)

  • First release on PyPI.
