Read and write Apple Numbers spreadsheets
Project description
numbers-parser
numbers-parser
is a Python module for parsing Apple Numbers.numbers
files. It supports Numbers files
generated by Numbers version 10.3, and up with the latest tested version being 13.2
(current as of September 2023).
It supports and is tested against Python versions from 3.9 onwards. It is not compatible with earlier versions of Python.
Installation
python3 -m pip install numbers-parser
A pre-requisite for this package is python-snappy which will be installed by Python automatically, but python-snappy also requires that the binary libraries for snappy compression are present.
The most straightforward way to install the binary dependencies is to use Homebrew and source Python from Homebrew rather than from macOS as described in the python-snappy github:
For Intel Macs:
brew install snappy python3
CPPFLAGS="-I/usr/local/include -L/usr/local/lib" \
python3 -m pip install python-snappy
For Apple Silicon Macs:
brew install snappy python3
CPPFLAGS="-I/opt/homebrew/include -L/opt/homebrew/lib" \
python3 -m pip install python-snappy
For Linux (your package manager may be different):
sudo apt-get -y install libsnappy-dev
On Windows, you will need to either arrange for snappy to be found for VSC++ or you can install python binary libraries compiled by Christoph Gohlke. You must select the correct python version for your installation. For example for python 3.11:
pip install python_snappy-0.6.1-cp311-cp311-win_amd64.whl
Quick Start
Reading documents:
>>> from numbers_parser import Document
>>> doc = Document("mydoc.numbers")
>>> sheets = doc.sheets
>>> tables = sheets[0].tables
>>> rows = tables[0].rows()
Sheets and tables are iterables that can be indexed using either an integer index or using the name of the sheet/table:
>>> doc.sheets[0].name
'Sheet 1'
>>> doc.sheets["Sheet 1"].name
'Sheet 1'
>>> doc.sheets[0].tables[0].name
'Table 1'
>>> doc.sheets[0].tables["Table 1"].name
'Table 1'
Table
objects have a rows
method which contains a nested list
with an entry for each row of the table. Each row is itself a list of
the column values.
>>> data = sheets["Sheet 1"].tables["Table 1"].rows()
>>> data[0][0]
<numbers_parser.cell.EmptyCell object at 0x1022b5710>
>>> data[1][0]
<numbers_parser.cell.TextCell object at 0x101eb6790>
>>> data[1][0].value
'Debit'
Cell Data
Cells are objects with a common base class of Cell
. All cell types
have a property value
which returns the contents of the cell as a
python datatype. numbers-parser
uses
pendulum instead of python’s builtin
types. Available cell types are:
Cell type | value type | Additional properties |
---|---|---|
NumberCell | float |
|
TextCell | str |
|
RichTextCell | str |
See Rich text |
EmptyCell | None |
|
BoolCell | bool |
|
DateCell | pendulum.datetime |
|
DurationCell | pendulum.duration |
|
ErrorCell | None |
|
MergedCell | None |
See Merged cells |
Cell references can be either zero-offset row/column integers or an
Excel/Numbers A1 notation. Where cell values are not None
the
property formatted_value
returns the cell value as a str
as
displayed in Numbers. Cells that have no values in a table are
represented as EmptyCell
and cells containing evaluation errors of
any kind ErrorCell
.
>>> table.cell(1,0)
<numbers_parser.cell.TextCell object at 0x1019ade50>
>>> table.cell(1,0).value
'Debit'
>>> table.cell("B2")
<numbers_parser.cell.NumberCell object at 0x103a99790>
>>> table.cell("B2").value
1234.5
>>> table.cell("B2").formatted_value
'£1,234.50'
Pandas Support
Since the return value of rows()
is a list of lists, you can pass
this directly to pandas. Assuming you have a Numbers table with a single
header which contains the names of the pandas series you want to create
you can construct a pandas dataframe using:
import pandas as pd
doc = Document("simple.numbers")
sheets = doc.sheets
tables = sheets[0].tables
data = tables[0].rows(values_only=True)
df = pd.DataFrame(data[1:], columns=data[0])
Writing Numbers Documents
Whilst support for writing numbers files has been stable since version 3.4.0, you are highly recommended not to overwrite working Numbers files and instead save data to a new file.
Cell values are written using
Table.write() and
numbers-parser
will automatically create empty rows and columns
for any cell references that are out of range of the current table.
doc = Document("write.numbers")
sheets = doc.sheets
tables = sheets[0].tables
table = tables[0]
table.write(1, 1, "This is new text")
table.write("B7", datetime(2020, 12, 25))
doc.save("new-sheet.numbers")
Additional tables and worksheets can be added to a Document
before saving using
Document.add_sheet() and
Sheet.add_table() respectively:
doc = Document()
doc.add_sheet("New Sheet", "New Table")
sheet = doc.sheets["New Sheet"]
table = sheet.tables["New Table"]
table.write(1, 1, 1000)
table.write(1, 2, 2000)
table.write(1, 3, 3000)
doc.save("sheet.numbers")
Styles
numbers_parser
currently only supports paragraph styles and cell
styles. The following styles are supported:
- font attributes: bold, italic, underline, strikethrough
- font selection and size
- text foreground color
- horizontal and vertical alignment
- cell background color
- cell background images
- cell indents (first line, left, right, and text inset)
Numbers conflates style attributes that can be stored in paragraph styles (the style menu in the text panel) with the settings that are available on the Style tab of the Text panel. Some attributes in Numbers are not applied to new cells when a style is applied.
To keep the API simple, numbers-parser
packs all styling into a single
Style object. When a document is saved, the attributes
not stored in a paragraph style are applied to each cell that includes it.
Styles are read from cells using the Cell.style property and you can add new styles with Document.add_style.
red_text = doc.add_style(
name="Red Text",
font_name="Lucida Grande",
font_color=RGB(230, 25, 25),
font_size=14.0,
bold=True,
italic=True,
alignment=Alignment("right", "top"),
)
table.write("B2", "Red", style=red_text)
table.set_cell_style("C2", red_text)
Cell Data Formatting
Numbers has two different cell formatting types: data formats and custom formats.
Data formats are presented in Numbers in the Cell tab of the Format pane
and are applied to individual cells. Like Numbers, numbers-parsers
caches formatting information that is identical across multiple cells.
You do not need to take any action for this to happen; this is handled
internally by the package. Changing a data format for cell has no impact
on any other cells.
Cell formats are changed using Table.set_cell_formatting():
table.set_cell_formatting(
"C1",
"date",
date_time_format="EEEE, d MMMM yyyy"
)
table.set_cell_formatting(
0,
4,
"number",
decimal_places=3,
negative_style=NegativeNumberStyle.RED
)
Custom formats are shared across a Document and can be applied to multiple cells in multiple tables. Editing a custom format changes the appearance of data in all cells that share that format. You must first add a custom format to the document using Document.add_custom_format() before assigning it to cells using Table.set_cell_formatting():
long_date = doc.add_custom_format(
name="Long Date",
type="date",
date_time_format="EEEE, d MMMM yyyy"
)
table.set_cell_formatting("C1", "custom", format=long_date)
A limited number of currencies are formatted using symbolic notation
rather than an ISO code. These are defined in
numbers_parser.currencies
and match the ones chosen by Numbers. For
example, US dollars are referred to as US$
whereas Euros and British
Pounds are referred to using their symbols of €
and £
respectively.
Borders
numbers-parser
supports reading and writing cell borders, though the
interface for each differs. Individual cells can have each of their four
borders tested, but when drawing new borders, these are set for the
table to allow for drawing borders across multiple cells. Setting the
border of merged cells is not possible unless the edge of the cells is
at the end of the merged region.
Borders are represented using the Border class that can be initialized with line width, color and line style. The current state of a cell border is read using the Cell.border property and Table.set_cell_border() sets the border for a cell edge or a range of cells.
API
For more examples and details of all available classes and methods, see the full API docs.
Command-line scripts
When installed from PyPI,
a command-like script cat-numbers
is installed in Python’s scripts
folder. This script dumps Numbers spreadsheets into Excel-compatible CSV
format, iterating through all the spreadsheets passed on the
command-line.
usage: cat-numbers [-h] [-T | -S | -b] [-V] [--formulas] [--formatting]
[-s SHEET] [-t TABLE] [--debug]
[document ...]
Export data from Apple Numbers spreadsheet tables
positional arguments:
document Document(s) to export
options:
-h, --help show this help message and exit
-T, --list-tables List the names of tables and exit
-S, --list-sheets List the names of sheets and exit
-b, --brief Don't prefix data rows with name of sheet/table
(default: false)
-V, --version
--formulas Dump formulas instead of formula results
--formatting Dump formatted cells (durations) as they appear
in Numbers
-s SHEET, --sheet SHEET
Names of sheet(s) to include in export
-t TABLE, --table TABLE
Names of table(s) to include in export
--debug Enable debug logging
Note: --formatting
will return different capitalization for 12-hour
times due to differences between Numbers’ representation of these dates
and datetime.strftime
. Numbers in English locales displays 12-hour
times with ‘am’ and ‘pm’, but datetime.strftime
on macOS at least
cannot return lower-case versions of AM/PM.
Limitations
Current known limitations of numbers-parser
which may be implemented in the future are:
- Table styles that allow new tables to adopt a style across the whole table are not suppported
- Creating cells of type
BulletedTextCell
is not supported - New tables are inserted with a fixed offset below the last table in a worksheet which does not take into account title or caption size
- Formulas cannot be written to a document
- Pivot tables are unsupported and saving a document with a pivot table issues a UnsupportedWarning.
The following limitations are expected to always remain:
- New sheets insert tables with formats copied from the first table in the previous sheet rather than default table formats
- Due to a limitation in Python’s
ZipFile, Python
versions older than 3.11 do not support image filenames with UTF-8 characters
Cell.add_style.bg_image() returns
None
for such files and issues aRuntimeWarning
(see issue 69 for details).
License
All code in this repository is licensed under the MIT License.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file numbers_parser-4.10.5.tar.gz
.
File metadata
- Download URL: numbers_parser-4.10.5.tar.gz
- Upload date:
- Size: 280.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.12.1 Darwin/23.4.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7e1119133aad596dddc5d2240b916b195e18980dfc196df9c4283611c2668c27 |
|
MD5 | 02f2281f4bda496ea9b77af3e2188191 |
|
BLAKE2b-256 | c62ec390a5c3e282ff662b1dba4e8f881992d49fcb58d05d99afcf91e821ac9f |
File details
Details for the file numbers_parser-4.10.5-py3-none-any.whl
.
File metadata
- Download URL: numbers_parser-4.10.5-py3-none-any.whl
- Upload date:
- Size: 302.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.12.1 Darwin/23.4.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 798b5797479fa6e51d9adfd1f097653125d6cce77703826b16038135861d6cde |
|
MD5 | 5a6b98f076080acc13a0c9b3d5323186 |
|
BLAKE2b-256 | 4348d113849b116df04d6131937a8daf2c0506586257ebb05dc2e9fe36e9e505 |