
Read and write Apple Numbers spreadsheets

Project description

numbers-parser


numbers-parser is a Python module for parsing Apple Numbers .numbers files. It supports files generated by Numbers version 10.3 and later; the latest tested version is 13.0 (current as of April 2023).

It supports and is tested against Python versions from 3.8 onwards. It is not compatible with earlier versions of Python.

Formula values rely on Numbers having stored the current result of each formula, which should usually be the case. Formulas themselves, rather than their computed values, can optionally be extracted. Style support is somewhat limited, but has grown significantly as of version 4.0.

API changes in version 4.0

To better partition cell styles, background image data, which was supported in earlier versions through the methods image_data and image_filename, is now part of the new cell_style property. Using the deprecated methods image_data and image_filename will issue a DeprecationWarning. The legacy methods will be removed in a future version of numbers-parser.

NumberCell values are returned as decimal.Decimal rather than float. These values are created in a decimal128 context to preserve the precision used by Numbers. Previously, using float introduced rounding errors when unpacking the numbers stored internally by Numbers.
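To illustrate why the switch matters, the sketch below uses only Python's standard decimal module with a context configured to IEEE 754 decimal128 parameters (34 significant digits). The DECIMAL128 context and the to_decimal128 helper are illustrative assumptions; numbers-parser's internal context setup may differ.

```python
from decimal import ROUND_HALF_EVEN, Context, Decimal

# Context with IEEE 754 decimal128 parameters: 34 significant digits and an
# exponent range of -6143..6144. Illustrative only; numbers-parser's own
# context may be configured differently.
DECIMAL128 = Context(prec=34, rounding=ROUND_HALF_EVEN, Emin=-6143, Emax=6144)


def to_decimal128(text: str) -> Decimal:
    """Parse a numeric string, rounding it to decimal128 precision."""
    return DECIMAL128.create_decimal(text)


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
# The decimal sum is computed exactly in base 10.
decimal_sum = DECIMAL128.add(to_decimal128("0.1"), to_decimal128("0.2"))

print(float_sum)    # 0.30000000000000004
print(decimal_sum)  # 0.3
```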

Installation

python3 -m pip install numbers-parser

A prerequisite for this package is python-snappy, which pip installs automatically. However, python-snappy also requires the binary libraries for snappy compression to be present.

The most straightforward way to install the binary dependencies is to use Homebrew and to source Python from Homebrew rather than from macOS, as described in the python-snappy GitHub repository:

For Intel Macs:

brew install snappy python3
CPPFLAGS="-I/usr/local/include -L/usr/local/lib" python3 -m pip install python-snappy

And on Apple Silicon:

brew install snappy python3
CPPFLAGS="-I/opt/homebrew/include -L/opt/homebrew/lib" python3 -m pip install python-snappy

On Windows, you will need either to arrange for snappy to be found by Visual C++ or to install the Python binary wheels compiled by Christoph Gohlke. You must select the wheel matching your Python version; for example, for Python 3.11:

C:\Users\Jon>pip install C:\Users\Jon\Downloads\python_snappy-0.6.1-cp311-cp311-win_amd64.whl

Usage

Reading documents:

from numbers_parser import Document
doc = Document("my-spreadsheet.numbers")
sheets = doc.sheets
tables = sheets[0].tables
rows = tables[0].rows()

Referring to sheets and tables

Both sheets and tables can be accessed from lists of these objects using either an integer index (list syntax) or the name of the sheet/table (dict syntax):

# list access method
sheet_1 = doc.sheets[0]
print("Opened sheet", sheet_1.name)

# dict access method
table_1 = sheet_1.tables["Table 1"]
print("Opened table", table_1.name)

Accessing data

Table objects have a rows method which returns a nested list with an entry for each row of the table. Each row is itself a list of the column values. Empty cells in Numbers are returned as None values.

data = tables["Table 1"].rows()
print("Cell A1 contains", data[0][0])
# data is indexed [row][column], so C2 (row 2, column C) is data[1][2]
print("Cell C2 contains", data[1][2])

Cells are objects with a common base class of Cell. All cell types have a value property which returns the contents of the cell as a native Python datatype. DurationCell values are datetime.timedelta objects, which are additionally available as a formatted value matching the format stored in the Numbers spreadsheet. The formatted value is returned by the formatted_value property.

Cell references

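In addition to extracting all data at once, individual cells can be referred to using the cell method, either by zero-based row and column indices or by an Excel/Numbers-style cell reference: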

doc = Document("my-spreadsheet.numbers")
sheets = doc.sheets
tables = sheets["Sheet 1"].tables
table = tables["Table 1"]

# row, column syntax
print("Cell A1 contains", table.cell(0, 0))
# Excel/Numbers-style cell references
print("Cell C2 contains", table.cell("C2"))
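The mapping between the two syntaxes can be sketched in plain Python. The ref_to_indices helper below is hypothetical and not part of numbers-parser; it assumes references are column letters followed by a 1-based row number, and returns the zero-based (row, column) pair that the row, column syntax uses.

```python
import re
from typing import Tuple


def ref_to_indices(ref: str) -> Tuple[int, int]:
    """Convert an A1-style reference such as "C2" to zero-based (row, col).

    Hypothetical helper for illustration; not part of numbers-parser's API.
    """
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref.upper())
    if match is None:
        raise ValueError(f"invalid cell reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        # Column letters are bijective base 26: A=1 ... Z=26, AA=27, ...
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


print(ref_to_indices("A1"))    # (0, 0)
print(ref_to_indices("C2"))    # (1, 2)
print(ref_to_indices("AA10"))  # (9, 26)
```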

Merged cells

When extracting data using rows(), merged cells are ignored since only text values are returned. The cell() method of Table objects returns an object whose class reflects the type of the cell in the Numbers table. MergeCell objects indicate cells removed by a merge.

doc = Document("my-spreadsheet.numbers")
sheets = doc.sheets
tables = sheets["Sheet 1"].tables
table = tables["Table 1"]

cell = table.cell("A1")
print(cell.merge_range)
print(f"Cell A1 merge size is {cell.size[0]},{cell.size[1]}")

Row and column iterators

Tables have iterators for row-wise and column-wise iteration, with each iteration returning a list of the cells in that row or column:

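total = 0
for row in table.iter_rows(min_row=2, max_row=7, values_only=True):
    # values_only=True yields plain values; skip empty (None) cells
    total += sum(v for v in row if v is not None)
for col in table.iter_cols(min_row=2, max_row=7):
    # without values_only, each column is a list of Cell objects
    total += sum(cell.value for cell in col if cell.value is not None)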
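As a mental model of the row and column iterators, here is a pure-Python sketch over a list of lists. The iter_rows and iter_cols functions below are hypothetical stand-ins that assume zero-based, inclusive min_row/max_row bounds; they are not numbers-parser's implementation.

```python
from typing import Any, Iterator, List, Optional

Grid = List[List[Any]]


def iter_rows(grid: Grid, min_row: int = 0, max_row: Optional[int] = None) -> Iterator[List[Any]]:
    """Yield rows min_row..max_row (zero-based, inclusive) of a list-of-lists table."""
    last = len(grid) - 1 if max_row is None else max_row
    for r in range(min_row, last + 1):
        yield grid[r]


def iter_cols(grid: Grid, min_row: int = 0, max_row: Optional[int] = None) -> Iterator[List[Any]]:
    """Yield each column, restricted to the same row range."""
    rows = list(iter_rows(grid, min_row, max_row))
    # zip(*rows) transposes the selected rows into columns
    for col in zip(*rows):
        yield list(col)


table = [["Item", "Qty"], ["a", 1], ["b", 2], ["c", 3]]
print(list(iter_rows(table, min_row=1, max_row=2)))  # [['a', 1], ['b', 2]]
print(list(iter_cols(table, min_row=1)))             # [['a', 'b', 'c'], [1, 2, 3]]
```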

Pandas

Since the return value of rows() is a list of lists, you can pass it directly to pandas. Assuming your Numbers table has a single header row containing the names of the pandas series you want to create, you can construct a pandas DataFrame using:

import pandas as pd
from numbers_parser import Document

doc = Document("simple.numbers")
sheets = doc.sheets
tables = sheets[0].tables
data = tables[0].rows(values_only=True)
df = pd.DataFrame(data[1:], columns=data[0])

Bullets and lists

Cells that contain bulleted or numbered lists can be identified by the is_bulleted property. Data from such cells is returned using the value property as with other cells, but can additionally be extracted using the bullets property. bullets returns a list of the paragraphs in the cell without the bullet or numbering character. Newlines are not included when bullet lists are extracted using bullets.

doc = Document("bullets.numbers")
sheets = doc.sheets
tables = sheets[0].tables
table = tables[0]
if not table.cell(0, 1).is_bulleted:
    print(table.cell(0, 1).value)
else:
    bullets = ["* " + s for s in table.cell(0, 1).bullets]
    print("\n".join(bullets))

Bulleted and numbered data can also be extracted with the bullet or number characters present in the text for each line in the cell in the same way as above but using the formatted_bullets property. A single space is inserted between the bullet character and the text string and in the case of bullets, this will be the Unicode character seen in Numbers, for example "• some text".
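A minimal sketch of the output shape that formatted_bullets describes, assuming the paragraphs have already been extracted as with bullets. The format_bullets function is hypothetical, and the "1." numbering style is an assumption since Numbers supports several numbering formats.

```python
from typing import List


def format_bullets(paragraphs: List[str], numbered: bool = False) -> List[str]:
    """Prefix each paragraph with a bullet or number and a single space.

    Hypothetical helper: "\u2022" is the bullet character, and the "1." style
    for numbered lists is an assumption made for this sketch.
    """
    if numbered:
        return [f"{i}. {text}" for i, text in enumerate(paragraphs, start=1)]
    return [f"\u2022 {text}" for text in paragraphs]


print(format_bullets(["some text", "more text"]))
# ['• some text', '• more text']
```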

Hyperlinks

Numbers does not support hyperlinks to cells within a spreadsheet, but does allow embedding links in cells. When cells contain hyperlinks, numbers_parser returns the text version of the cell. The hyperlinks property of cells where is_bulleted is True is a list of text and URL tuples:

cell = table.cell(0, 0)
(text, url) = cell.hyperlinks[0]
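As an example of consuming these tuples, the hypothetical to_markdown function below rewrites a cell's text with Markdown links. It assumes that the tuples appear in the same order as the link text in the cell, and that each link text occurs verbatim in the cell's text.

```python
from typing import List, Tuple


def to_markdown(text: str, hyperlinks: List[Tuple[str, str]]) -> str:
    """Wrap each linked span of text as [text](url), scanning left to right."""
    out, pos = [], 0
    for link_text, url in hyperlinks:
        idx = text.find(link_text, pos)
        if idx == -1:
            continue  # link text not found after the previous link: leave it plain
        out.append(text[pos:idx])
        out.append(f"[{link_text}]({url})")
        pos = idx + len(link_text)
    out.append(text[pos:])
    return "".join(out)


print(to_markdown("See the docs", [("docs", "https://example.com/docs")]))
# See the [docs](https://example.com/docs)
```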

Styles

numbers_parser currently only supports paragraph styles and cell styles. The following styles are supported:

  • font attributes: bold, italic, underline, strikethrough
  • font selection and size
  • text foreground color
  • horizontal and vertical alignment
  • cell background color

Table styles that allow new tables to adopt a style across the whole table are not planned.

Reading styles

The cell property style returns a Style object containing all the style information for that cell. Cells with identical style settings contain references to a single Style object.

Cell style information can be read using a number of properties:

  • Cell.style.alignment: the horizontal and vertical alignment of the cell as an Alignment named tuple
  • Cell.style.bg_color: cell background color as an RGB named tuple, or a list of RGB values for gradients
  • Cell.style.bold: True if the cell font is bold
  • Cell.style.font_color: font color as an RGB named tuple
  • Cell.style.font_size: font size in points (float)
  • Cell.style.font_name: font name (str)
  • Cell.style.italic: True if the cell font is italic
  • Cell.style.name: cell style (str)
  • Cell.style.underline: True if the cell font is underlined
  • Cell.style.strikethrough: True if the cell font is strikethrough
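Because bg_color can be either a single RGB named tuple or a list of them for gradients, code that consumes it needs to handle both shapes. The sketch below defines a stand-in RGB named tuple (the r, g, b field names are an assumption) and a hypothetical to_hex helper; treating a missing background as None is also an assumption.

```python
from collections import namedtuple
from typing import List, Optional, Union

# Stand-in for numbers-parser's RGB named tuple; field names are assumed.
RGB = namedtuple("RGB", ["r", "g", "b"])


def to_hex(color: Union[RGB, List[RGB], None]) -> Union[str, List[str], None]:
    """Convert a bg_color value to CSS-style hex strings.

    Handles a single colour, a list of colours (a gradient) or None.
    """
    if color is None:
        return None
    if isinstance(color, list):
        return [to_hex(c) for c in color]
    return "#{:02x}{:02x}{:02x}".format(color.r, color.g, color.b)


print(to_hex(RGB(230, 25, 25)))                    # #e61919
print(to_hex([RGB(0, 0, 0), RGB(255, 255, 255)]))  # ['#000000', '#ffffff']
```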

Cell images

The properties style.bg_image.filename and style.bg_image.data return the filename and raw data of the image used as a cell's background, where one is set. If a cell has no background image, style.bg_image is None.

cell = table.cell("B1")
if cell.style.bg_image is not None:
    # save the background image using its original filename
    with open(cell.style.bg_image.filename, "wb") as f:
        f.write(cell.style.bg_image.data)

Writing Numbers files

Whilst support for writing Numbers files has been stable since version 3.4.0, you are strongly recommended not to overwrite working Numbers files and to save data to a new file instead.

Limitations

Current limitations to write support are:

  • Creating cells of type BulletedTextCell is not supported
  • Formats cannot be defined for DurationCell or DateCell
  • New tables are inserted with a fixed offset below the last table in a worksheet which does not take into account title or caption size
  • New sheets insert tables with formats copied from the first table in the previous sheet rather than default table formats
  • Style editing is limited to paragraph styles.

Cell values

numbers-parser will automatically add empty rows and columns for any cell references that are out of range of the current table. The write method accepts the same cell numbering notation as cell, plus an additional argument representing the new cell value. The type of the new value is used to determine the cell type.

from datetime import datetime
from numbers_parser import Document

doc = Document("old-sheet.numbers")
sheets = doc.sheets
tables = sheets[0].tables
table = tables[0]
table.write(1, 1, "This is new text")
table.write("B7", datetime(2020, 12, 25))
doc.save("new-sheet.numbers")
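The padding and type-selection behaviour described above can be modelled with a small pure-Python sketch. Both write and cell_type_for are hypothetical: the grid is a plain list of lists, and the mapping from Python types to cell type names (including BoolCell, TextCell and EmptyCell) is an assumption rather than numbers-parser's actual rules.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Any, List, Optional


def cell_type_for(value: Any) -> str:
    """Hypothetical mapping from a Python value to a cell type name."""
    if value is None:
        return "EmptyCell"
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "BoolCell"
    if isinstance(value, (int, float, Decimal)):
        return "NumberCell"
    if isinstance(value, datetime):
        return "DateCell"
    if isinstance(value, timedelta):
        return "DurationCell"
    if isinstance(value, str):
        return "TextCell"
    raise ValueError(f"unsupported value type: {type(value).__name__}")


def write(grid: List[List[Optional[Any]]], row: int, col: int, value: Any) -> None:
    """Write value at (row, col), padding the grid with empty cells as needed."""
    num_cols = max([len(r) for r in grid] + [col + 1])
    while len(grid) <= row:  # add empty rows until the target row exists
        grid.append([])
    for r in grid:  # keep the grid rectangular
        r.extend([None] * (num_cols - len(r)))
    grid[row][col] = value


grid: List[List[Optional[Any]]] = [["A1"]]
write(grid, 1, 1, "This is new text")
print(grid)  # [['A1', None], [None, 'This is new text']]
print(cell_type_for(datetime(2020, 12, 25)))  # DateCell
```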

Sheet names and table names can be changed by assigning a new value to the name of each:

sheets[0].name = "My new sheet"
tables[0].name = "Edited table"

Adding tables and sheets

Additional tables and worksheets can be added to a Document before saving. If no sheet name or table name is supplied, numbers-parser will use Sheet 1, Sheet 2, etc.

doc = Document()
doc.add_sheet("New Sheet", "New Table")
sheet = doc.sheets["New Sheet"]
table = sheet.tables["New Table"]
table.write(1, 1, 1000)
table.write(1, 2, 2000)
table.write(1, 3, 3000)

doc.save("sheet.numbers")

Table geometries

numbers-parser can query and change the position and size of tables. Changes made to a table's row heights or column widths are retained when files are saved.

Row and column sizes

Row heights and column widths are queried and set using the row_height and col_width methods:

doc = Document("sheet.numbers")
table = doc.sheets[0].tables[0]
print(f"Table size is {table.height} x {table.width}")
print(f"Table row 1 height is {table.row_height(0)}")
table.row_height(0, 40)
print(f"Table row 1 height is now {table.row_height(0)}")
print(f"Table column A width is {table.col_width(0)}")
table.col_width(0, 200)
print(f"Table column A width is now {table.col_width(0)}")

Header rows and columns

When new tables are created, numbers-parser follows the Numbers convention of creating a table with one row header and one column header. You can change the number of headers by modifying the appropriate property:

doc = Document("sheet.numbers")
table = doc.sheets[0].tables[0]
table.num_header_rows = 2
table.num_header_cols = 0
doc.save("saved.numbers")

A zero header count will remove the headers from the table. Attempting to set a negative number of headers, or to use more headers than there are rows or columns in the table, will raise a ValueError exception.
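The header rules can be summarised as a hypothetical validation helper. It assumes that a header count equal to the number of rows or columns is allowed, since only a count greater than the table size is described as an error; validate_header_count is not part of numbers-parser.

```python
def validate_header_count(num_headers: int, table_size: int) -> int:
    """Check a requested header count against the documented rules.

    Zero is allowed (it removes the headers); a negative count, or more
    headers than the table has rows/columns, raises ValueError.
    """
    if num_headers < 0:
        raise ValueError("number of headers cannot be negative")
    if num_headers > table_size:
        raise ValueError(f"{num_headers} headers exceeds table size {table_size}")
    return num_headers


# For a table with 5 rows, 0 to 5 header rows pass; 6 would raise ValueError.
print(validate_header_count(2, 5))  # 2
```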

Positioning tables

By default, new tables are positioned at a fixed offset below the last table vertically in a sheet and on the left side of the sheet. Large table headers and captions may result in new tables overlapping existing ones. The add_table method takes optional coordinates for positioning a table. A table's height and coordinates can also be queried to help align new tables:

# place the new table below the first table, leaving a gap of 200.0
(x, y) = sheet.tables[0].coordinates
y += sheet.tables[0].height + 200.0
new_table = sheet.add_table("Offset Table", x, y)
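When a sheet already holds several tables, the same arithmetic can be generalised to place a new table below the lowest one. TableGeometry and position_below are hypothetical stand-ins for values read from coordinates and height, and the 200.0 gap simply mirrors the example above.

```python
from typing import List, NamedTuple, Tuple


class TableGeometry(NamedTuple):
    """Hypothetical stand-in for a table's coordinates and height."""
    x: float
    y: float
    height: float


def position_below(tables: List[TableGeometry], gap: float = 200.0) -> Tuple[float, float]:
    """Return (x, y) for a new table placed `gap` below the lowest table.

    Raises ValueError if `tables` is empty (from max()).
    """
    lowest = max(tables, key=lambda t: t.y + t.height)
    return lowest.x, lowest.y + lowest.height + gap


print(position_below([TableGeometry(0.0, 0.0, 150.0), TableGeometry(0.0, 300.0, 80.0)]))
# (0.0, 580.0)
```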

Editing paragraph styles

Cell text styles, known as paragraph styles, are those applied by the Text tab in the Numbers Format pane. To simplify the API, when writing documents it is not possible to make ad hoc changes to cells without assigning an existing style or creating a new one. This differs from the Numbers interface, where cells can have modified styles on a per-cell basis. Such styles are read correctly when reading Numbers files.

Character styles, which allow formatting changes within cells such as "This is bold text", are not supported.

Styles are created using the Document's add_style method, and can be applied to cells either as part of a write or using set_cell_style:

red_text = doc.add_style(
    name="Red Text",
    font_name="Lucida Grande",
    font_color=RGB(230, 25, 25),
    font_size=14.0,
    bold=True,
    italic=True,
    alignment=Alignment("right", "top"),
)
table.write("B2", "Red", style=red_text)
table.set_cell_style("C2", red_text)

Cell styles can also be referred to by name in both Table.write and Table.set_cell_style. A dict of available styles is returned by Document.styles. This contains key value pairs of style names and Style objects. Any changes to Style objects in the document are written back such that those styles are changed for all cells that use them.

doc = Document("styles.numbers")
styles = doc.styles
styles["Title"].font_size = 20.0

Since Style objects are shared, changing Cell.style.font_size will have the effect of changing the font size for that style and will in turn affect the styles of all cells using that style.
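The sharing behaviour is ordinary Python reference semantics. The sketch below uses hypothetical Style and Cell dataclasses, not numbers-parser's classes, to show why changing a style through one cell affects every other cell that references it.

```python
from dataclasses import dataclass


@dataclass
class Style:  # hypothetical stand-in for numbers-parser's Style
    name: str
    font_size: float


@dataclass
class Cell:  # hypothetical stand-in holding a reference to a shared Style
    value: str
    style: Style


title = Style("Title", 12.0)
a1, b1 = Cell("Q1", title), Cell("Q2", title)
a1.style.font_size = 20.0  # modifies the single shared Style object...
print(b1.style.font_size)  # 20.0 ...so every cell using it sees the change
```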

Command-line scripts

When installed from PyPI, a command-line script cat-numbers is installed in Python's scripts folder. This script dumps Numbers spreadsheets into Excel-compatible CSV format, iterating through all the documents passed on the command line.

usage: cat-numbers [-h] [-T | -S | -b] [-V] [--debug] [--formulas]
                   [--formatting] [-s SHEET] [-t TABLE] [document ...]

Export data from Apple Numbers spreadsheet tables

positional arguments:
  document                 Document(s) to export

optional arguments:
  -h, --help               show this help message and exit
  -T, --list-tables        List the names of tables and exit
  -S, --list-sheets        List the names of sheets and exit
  -b, --brief              Don't prefix data rows with name of sheet/table (default: false)
  -V, --version
  --debug                  Enable debug output
  --formulas               Dump formulas instead of formula results
  --formatting             Dump formatted cells (durations) as they appear in Numbers
  -s SHEET, --sheet SHEET  Names of sheet(s) to include in export
  -t TABLE, --table TABLE  Names of table(s) to include in export
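For readers post-processing the output, here is a rough sketch of Excel-compatible CSV generation with Python's csv module, including an optional per-row prefix controlled by a brief flag in the spirit of -b/--brief. The dump_table function and its "sheet: table" prefix format are hypothetical; the real cat-numbers output may differ.

```python
import csv
import io
from typing import Any, Iterable, List


def dump_table(rows: Iterable[List[Any]], sheet: str, table: str, brief: bool = False) -> str:
    """Render rows as Excel-dialect CSV, optionally prefixing each row.

    The "sheet: table" prefix is a hypothetical format for this sketch.
    """
    out = io.StringIO()
    writer = csv.writer(out, dialect="excel")  # quotes fields containing commas
    for row in rows:
        cells = ["" if v is None else v for v in row]  # empty cells become ""
        writer.writerow(cells if brief else [f"{sheet}: {table}"] + cells)
    return out.getvalue()


print(dump_table([["a", None, 1]], "Sheet 1", "Table 1"), end="")
```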

Note: --formatting will return different capitalisation for 12-hour times due to differences between Numbers' representation of these dates and datetime.strftime. Numbers in English locales displays 12-hour times with 'am' and 'pm', but datetime.strftime on macOS at least cannot return lower-case versions of AM/PM.
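If lower-case suffixes are needed, one workaround is to post-process the strftime output. This sketch assumes the default C locale, where %p yields "AM" or "PM", and relies on the suffix being the only alphabetic part of the string; format_12h is a hypothetical helper.

```python
from datetime import datetime


def format_12h(dt: datetime) -> str:
    """Format a time as a zero-padded 12-hour clock with a lower-case am/pm.

    %p produces upper-case "AM"/"PM" in the C locale, so the whole string is
    lower-cased after formatting (digits and ":" are unaffected).
    """
    return dt.strftime("%I:%M %p").lower()


print(datetime(2023, 1, 1, 13, 5).strftime("%I:%M %p"))  # 01:05 PM
print(format_12h(datetime(2023, 1, 1, 13, 5)))           # 01:05 pm
```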

Numbers File Formats

Numbers uses a proprietary, compressed binary format to store its tables. This format consists of a zip file containing images, as well as Snappy-compressed Protobuf .iwa files containing metadata, text, and all other definitions used in the spreadsheet.
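Because the container is a standard zip file, its .iwa archives can be listed with Python's zipfile module. The list_iwa_members helper is a hypothetical sketch: it only lists entry names and does not decompress Snappy data or decode Protobuf messages.

```python
import zipfile
from typing import BinaryIO, List, Union


def list_iwa_members(source: Union[str, BinaryIO]) -> List[str]:
    """Return the sorted names of .iwa entries in a zipped .numbers file.

    `source` may be a path or a binary file-like object. Decoding the entries
    would additionally need Snappy decompression and Protobuf definitions.
    """
    with zipfile.ZipFile(source) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".iwa"))


# Usage (path is illustrative): print(list_iwa_members("my-spreadsheet.numbers"))
```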

Protobuf updates

As numbers-parser includes private Protobuf definitions extracted from a copy of Numbers, new versions of Numbers will inevitably create .numbers files that cannot be read by numbers-parser. As new versions of Numbers are released, running make bootstrap will perform all the steps necessary to recreate the protobuf files used by numbers-parser to read Numbers spreadsheets.

The default protobuf package installation may not include the C++ optimised version which is required by the bootstrapping scripts to extract protobufs. You will receive the following error during build if this is the case:

This script requires the Protobuf installation to use the C++ implementation. Please reinstall Protobuf with C++ support.

To include the C++ support, download a released version of Google protobuf from GitHub. Build instructions are described in src/README.md. These have changed greatly over time, but as of April 2023, the following steps were useful:

bazel build :protoc :protobuf
cmake . -DCMAKE_CXX_STANDARD=14
cmake --build . --parallel 8
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
export LD_LIBRARY_PATH=../bazel-bin/src/google
cd python
python3 setup.py -q bdist_wheel --cpp_implementation --warnings_as_errors --compile_static_extension

The resulting wheel can then be used to run make bootstrap in the numbers-parser source tree. The signing workflow assumes that you have an Apple Developer Account and that you have created a provisioning profile that includes iCloud. Using a self-signed certificate does not seem to work, at least on Apple Silicon (a working PR contradicting this would be greatly appreciated).

make bootstrap requires PyObjC to generate font maps, but this dependency is excluded from Poetry to ensure that tests can run on non-Mac OSes. You can run poetry run pip install PyObjC to get the required packages.

Credits

numbers-parser was built by Jon Connell but relies heavily on prior work by Peter Sobot to read the IWA format archives used by Apple's iWork family of applications, and to regenerate the mapping files required for Python. Both modules are derived from previous work by Sean Patrick O'Brien.

Decoding the data structures inside Numbers files was helped greatly by Stingray-Reader by Steven Lott.

Formula tests were adapted from JavaScript tests used in fast-formula-parser.

Decimal128 conversion to and from byte storage was adapted from work done by the SheetJS project. SheetJS also helped greatly with some of the steps required to successfully save a Numbers spreadsheet.

License

All code in this repository is licensed under the MIT License


Download files


Source Distribution

numbers_parser-4.1.0.tar.gz (269.6 kB)

Built Distribution

numbers_parser-4.1.0-py3-none-any.whl (291.0 kB)

File details

Details for the file numbers_parser-4.1.0.tar.gz.

File metadata

  • Download URL: numbers_parser-4.1.0.tar.gz
  • Size: 269.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.3 Darwin/22.5.0

File hashes

Hashes for numbers_parser-4.1.0.tar.gz:

  • SHA256: 1365df1bbba7f3fd0f54d2531bbef233da6e99124d05eac9b8ba30ef1d214c6f
  • MD5: 3824c25cef5bed13d234525529db60a0
  • BLAKE2b-256: 1d0f14a6df4097eca5a63cbd25e7209c09bd359ca85b500c75993bd01a3e247b


File details

Details for the file numbers_parser-4.1.0-py3-none-any.whl.

File metadata

  • Download URL: numbers_parser-4.1.0-py3-none-any.whl
  • Size: 291.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.3 Darwin/22.5.0

File hashes

Hashes for numbers_parser-4.1.0-py3-none-any.whl:

  • SHA256: 9981ffb2dbbe5321f64c7ed9137d095326b6338ffbd010501ca114be77a7cfbc
  • MD5: 1b729fc4c22099522a164cd0f38ea934
  • BLAKE2b-256: a9da8d4ae91cd12d6a2001989396e9a9a454b05fe98a444a127ddc1c2c466465

