read tables in html file as excel data
================================================================================
pyexcel-htmlr - Let you focus on data, instead of html format
================================================================================
.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png
   :target: https://www.patreon.com/pyexcel

.. image:: https://api.travis-ci.org/pyexcel/pyexcel-htmlr.svg?branch=master
   :target: http://travis-ci.org/pyexcel/pyexcel-htmlr

.. image:: https://codecov.io/gh/pyexcel/pyexcel-htmlr/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/pyexcel/pyexcel-htmlr

.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg
   :target: https://gitter.im/pyexcel/Lobby

Known constraints
==================
Fonts, colors and charts are not supported.
Installation
================================================================================
You can install it via pip:

.. code-block:: bash

    $ pip install pyexcel-htmlr

or clone the repository and install it from source:

.. code-block:: bash

    $ git clone https://github.com/pyexcel/pyexcel-htmlr.git
    $ cd pyexcel-htmlr
    $ python setup.py install

Support the project
================================================================================
If your company has embedded pyexcel and its components into a revenue-generating
product, please `support me on patreon <https://www.patreon.com/bePatron?u=5537627>`_ so
that I can maintain the project and develop it further.

If you are an individual, you are welcome to support me on patreon too, for as long
as you like. As a patron, you will receive
`early access to pyexcel-related content <https://www.patreon.com/pyexcel/posts>`_.
With your financial support, I will be able to invest
a little more time in coding, documentation and writing interesting posts.

Usage
================================================================================
As a standalone library
--------------------------------------------------------------------------------

The examples below first save a small two-sheet book as an html file:

.. code-block:: python

    >>> import sys
    >>> import pyexcel as pe
    >>> if sys.version_info[:2] < (2, 7):
    ...     from ordereddict import OrderedDict  # backport for Python 2.6
    ... else:
    ...     from collections import OrderedDict
    >>> data = OrderedDict()
    >>> data.update({"Sheet 1": [[1, 2, 3], [4, 5, 6]]})
    >>> data.update({"Sheet 2": [["row 1", "row 2", "row 3"]]})
    >>> book = pe.get_book(bookdict=data)
    >>> book.save_as("your_file.html")

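To picture what ends up in the file, here is a plain-Python sketch that renders such a book as one html table per sheet, one row element per row and one cell element per value. `bookdict_to_html` is a hypothetical helper and its markup is an assumption; pyexcel's actual html output may look different. In this sketch the sheet names are not written into the tables, which is consistent with the examples below reading the tables back as "Table 1" and "Table 2".

.. code-block:: python

    from collections import OrderedDict
    from html import escape


    def bookdict_to_html(book):
        """Render an ordered mapping of sheet name -> rows as html tables.

        Hypothetical helper for illustration only: the sheet names are not
        written out, so only the table order identifies each sheet.
        """
        parts = []
        for rows in book.values():
            parts.append("<table>")
            for row in rows:
                # Escape every cell so text such as "<b>" cannot break the markup.
                cells = "".join("<td>%s</td>" % escape(str(cell)) for cell in row)
                parts.append("<tr>%s</tr>" % cells)
            parts.append("</table>")
        return "\n".join(parts)


    data = OrderedDict()
    data["Sheet 1"] = [[1, 2, 3], [4, 5, 6]]
    data["Sheet 2"] = [["row 1", "row 2", "row 3"]]
    print(bookdict_to_html(data))
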
Read from an html file
********************************************************************************
Here's the sample code:

.. code-block:: python

    >>> from pyexcel_htmlr import get_data
    >>> data = get_data("your_file.html")
    >>> import json
    >>> print(json.dumps(data))
    {"Table 1": [[1, 2, 3], [4, 5, 6]], "Table 2": [["row 1", "row 2", "row 3"]]}

Each html table becomes one entry, named "Table 1", "Table 2" and so on; the
original sheet names are not kept.

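The output above maps each html table to a "Table N" key holding a list of rows. The standard-library sketch below shows one way such a mapping can be built from markup. It is not pyexcel-htmlr's implementation: `TableCollector`, `read_tables` and the integer conversion in `to_value` are assumptions for illustration, and nested tables, `colspan` and `rowspan` are ignored.

.. code-block:: python

    from collections import OrderedDict
    from html.parser import HTMLParser


    def to_value(text):
        """Turn integer-looking cell text into int; keep anything else as str."""
        try:
            return int(text)
        except ValueError:
            return text


    class TableCollector(HTMLParser):
        """Collect the cell values of every table under "Table N" keys."""

        def __init__(self):
            super().__init__()
            self.tables = OrderedDict()
            self._table_rows = None    # rows of the table being parsed
            self._current_row = None   # cells of the row being parsed
            self._current_cell = None  # text fragments of the cell being parsed

        def handle_starttag(self, tag, attrs):
            if tag == "table":
                self._table_rows = []
            elif tag == "tr" and self._table_rows is not None:
                self._current_row = []
            elif tag in ("td", "th") and self._current_row is not None:
                self._current_cell = []

        def handle_endtag(self, tag):
            if tag in ("td", "th") and self._current_cell is not None:
                text = "".join(self._current_cell).strip()
                self._current_row.append(to_value(text))
                self._current_cell = None
            elif tag == "tr" and self._current_row is not None:
                self._table_rows.append(self._current_row)
                self._current_row = None
            elif tag == "table" and self._table_rows is not None:
                name = "Table %d" % (len(self.tables) + 1)
                self.tables[name] = self._table_rows
                self._table_rows = None

        def handle_data(self, data):
            # Text outside a cell (e.g. whitespace between tags) is ignored.
            if self._current_cell is not None:
                self._current_cell.append(data)


    def read_tables(markup):
        parser = TableCollector()
        parser.feed(markup)
        parser.close()
        return parser.tables


    sample = """
    <table><tr><td>1</td><td>2</td><td>3</td></tr>
    <tr><td>4</td><td>5</td><td>6</td></tr></table>
    <table><tr><td>row 1</td><td>row 2</td><td>row 3</td></tr></table>
    """
    print(dict(read_tables(sample)))
    # {'Table 1': [[1, 2, 3], [4, 5, 6]], 'Table 2': [['row 1', 'row 2', 'row 3']]}
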
Read html from memory
********************************************************************************

Continuing from the previous example:

.. code-block:: python

    >>> # This is just an illustration.
    >>> # In reality, you might deal with an html file upload,
    >>> # where you would read from request.FILES['YOUR_HTML_FILE']
    >>> data = get_data(book.stream.html)
    >>> print(json.dumps(data))
    {"Table 1": [[1, 2, 3], [4, 5, 6]], "Table 2": [["row 1", "row 2", "row 3"]]}

Pagination feature
********************************************************************************
Let's assume the following file is a huge html file:

.. code-block:: python

    >>> huge_data = [
    ...     [1, 21, 31],
    ...     [2, 22, 32],
    ...     [3, 23, 33],
    ...     [4, 24, 34],
    ...     [5, 25, 35],
    ...     [6, 26, 36]
    ... ]
    >>> sheetx = {
    ...     "huge": huge_data
    ... }
    >>> pe.save_as(bookdict=sheetx, dest_file_name="huge_file.html")

And let's pretend to read partial data. `start_row` counts from zero, so
`start_row=2` begins at the third row:

.. code-block:: python

    >>> partial_data = get_data("huge_file.html", start_row=2, row_limit=3)
    >>> print(json.dumps(partial_data))
    {"Table 1": [[3, 23, 33], [4, 24, 34], [5, 25, 35]]}

You can do the same for columns:

.. code-block:: python

    >>> partial_data = get_data("huge_file.html", start_column=1, column_limit=2)
    >>> print(json.dumps(partial_data))
    {"Table 1": [[21, 31], [22, 32], [23, 33], [24, 34], [25, 35], [26, 36]]}

Obviously, you can do both at the same time:

.. code-block:: python

    >>> partial_data = get_data("huge_file.html",
    ...     start_row=2, row_limit=3,
    ...     start_column=1, column_limit=2)
    >>> print(json.dumps(partial_data))
    {"Table 1": [[23, 33], [24, 34], [25, 35]]}

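The outputs above behave like zero-based list slicing. The plain-Python sketch below reproduces all three results from `huge_data`; `paginate` is a hypothetical helper, and using `None` to mean "no limit" is an assumption (pyexcel-io's own defaults may differ). It only models the data you get back, not how the file is actually read.

.. code-block:: python

    def paginate(rows, start_row=0, row_limit=None, start_column=0, column_limit=None):
        """Hypothetical helper mimicking the start_*/*_limit keywords.

        Offsets are zero-based; a limit of None means "to the end".
        """
        row_end = None if row_limit is None else start_row + row_limit
        column_end = None if column_limit is None else start_column + column_limit
        return [row[start_column:column_end] for row in rows[start_row:row_end]]


    huge_data = [
        [1, 21, 31],
        [2, 22, 32],
        [3, 23, 33],
        [4, 24, 34],
        [5, 25, 35],
        [6, 26, 36],
    ]
    print(paginate(huge_data, start_row=2, row_limit=3))
    # [[3, 23, 33], [4, 24, 34], [5, 25, 35]]
    print(paginate(huge_data, start_column=1, column_limit=2))
    # [[21, 31], [22, 32], [23, 33], [24, 34], [25, 35], [26, 36]]
    print(paginate(huge_data, start_row=2, row_limit=3, start_column=1, column_limit=2))
    # [[23, 33], [24, 34], [25, 35]]
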
As a pyexcel plugin
--------------------------------------------------------------------------------
Since pyexcel version 0.2.2, no explicit import is needed: this library is
loaded automatically. If you want to read data in html format, installing it
is enough.

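How auto-loading can work in principle: a host library scans the installed top-level modules for a naming prefix and registers whatever it finds. The sketch below does this with the standard library's `pkgutil.iter_modules`. The `pyexcel_` prefix and the `discover_plugins` helper are assumptions for illustration, not a description of pyexcel's actual plugin loader.

.. code-block:: python

    import os
    import pkgutil
    import tempfile


    def discover_plugins(prefix="pyexcel_", path=None):
        """Return the names of top-level modules starting with `prefix`.

        `path` limits the search to the given directories; None searches
        sys.path. Hypothetical helper for illustration only.
        """
        return sorted(
            info.name
            for info in pkgutil.iter_modules(path)
            if info.name.startswith(prefix)
        )


    # Simulate an environment with one matching module and one unrelated module.
    with tempfile.TemporaryDirectory() as plugin_dir:
        for file_name in ("pyexcel_demo_plugin.py", "unrelated.py"):
            open(os.path.join(plugin_dir, file_name), "w").close()
        print(discover_plugins(path=[plugin_dir]))  # ['pyexcel_demo_plugin']
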
Reading from an html file
********************************************************************************
Here is the sample code:

.. code-block:: python

    >>> import pyexcel as pe
    >>> book = pe.get_book(file_name="your_file.html")
    >>> book
    Table 1:
    +---+---+---+
    | 1 | 2 | 3 |
    +---+---+---+
    | 4 | 5 | 6 |
    +---+---+---+
    Table 2:
    +-------+-------+-------+
    | row 1 | row 2 | row 3 |
    +-------+-------+-------+

Reading from an IO instance
********************************************************************************

To read html content that is already in memory, pass it as `file_content`
and set `file_type` to "html":

.. code-block:: python

    >>> # This is just an illustration.
    >>> # In reality, you might deal with an html file upload,
    >>> # where you would read from request.FILES['YOUR_HTML_FILE']
    >>> htmlfile = "your_file.html"
    >>> with open(htmlfile, "r") as f:
    ...     content = f.read()
    ...     r = pe.get_book(file_type="html", file_content=content)
    ...     print(r)
    ...
    Table 1:
    +---+---+---+
    | 1 | 2 | 3 |
    +---+---+---+
    | 4 | 5 | 6 |
    +---+---+---+
    Table 2:
    +-------+-------+-------+
    | row 1 | row 2 | row 3 |
    +-------+-------+-------+

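Uploads often arrive as bytes or as binary file objects rather than as the text string read from disk in the example above. The sketch below normalises such input to a `str` first. `html_content_from_upload` is a hypothetical helper, and the utf-8 default is an assumption that should be replaced by the upload's real charset.

.. code-block:: python

    import io


    def html_content_from_upload(upload, encoding="utf-8"):
        """Return the markup of an upload as str.

        `upload` may be bytes, str, or a binary/text file-like object.
        Hypothetical helper; the utf-8 default is an assumption.
        """
        if hasattr(upload, "read"):
            upload = upload.read()  # drain file-like objects first
        if isinstance(upload, bytes):
            upload = upload.decode(encoding)  # binary uploads must be decoded
        return upload


    raw = io.BytesIO(b"<table><tr><td>1</td></tr></table>")
    content = html_content_from_upload(raw)
    print(type(content).__name__)  # str

The returned string can then be passed as `file_content` together with
`file_type="html"`, in the same way `content` is passed in the example above.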
License
================================================================================
New BSD License
Developer guide
==================
Development steps for code changes:

#. git clone https://github.com/pyexcel/pyexcel-htmlr.git
#. cd pyexcel-htmlr

Upgrade your setuptools and pip. They are needed for development and testing only:

#. pip install --upgrade setuptools pip

Then install the relevant development requirements:

#. pip install -r rnd_requirements.txt # if such a file exists
#. pip install -r requirements.txt
#. pip install -r tests/requirements.txt

Once you have finished your changes, please provide test case(s) and relevant
documentation, and update CHANGELOG.rst.

.. note::

    rnd_requirements.txt is usually created when a dependent library has not
    been released yet. Once the dependency is released, the future version of
    it listed in requirements.txt will become valid.

How to test your contribution
------------------------------
Although `nose` and `doctest` are both used in code testing, it is advisable to put unit tests in the `tests` directory. `doctest` is incorporated only to make sure the code examples in the documentation remain valid across different development releases.

On Linux/Unix systems, please launch your tests like this::

    $ make

On Windows systems, please issue this command::

    > test.bat

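To see what the `doctest` side of testing checks, here is a small standard-library sketch that runs the examples in a README-style snippet and reports how many failed. It only illustrates the mechanism; it does not show how `make` or `test.bat` invoke doctest for this project, and `run_snippet` is a hypothetical helper.

.. code-block:: python

    import doctest

    # A README-style snippet: lines after ">>>" are run, and the line below
    # an expression is the output doctest expects to see.
    SNIPPET = """
    >>> rows = [[1, 21, 31], [2, 22, 32], [3, 23, 33]]
    >>> rows[1:]
    [[2, 22, 32], [3, 23, 33]]
    """


    def run_snippet(text, name="snippet"):
        """Run the examples in `text` and return doctest's TestResults."""
        parser = doctest.DocTestParser()
        test = parser.get_doctest(text, {}, name, None, 0)
        runner = doctest.DocTestRunner(verbose=False)
        return runner.run(test)


    results = run_snippet(SNIPPET)
    print(results.failed, results.attempted)  # 0 2

For a whole file, the standard library also offers
`doctest.testfile(path, module_relative=False)`.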
How to update test environment and update documentation
---------------------------------------------------------
Additional steps are required:

#. pip install moban
#. git clone https://github.com/pyexcel/pyexcel-commons.git commons
#. make your changes in the `.moban.d` directory, then run the command `moban`

What is pyexcel-commons
---------------------------------
Much of the information shared across pyexcel projects, such as this developer guide and the license info, is stored in the `pyexcel-commons` project.

What is .moban.d
---------------------------------
`.moban.d` stores the metadata specific to this library.

Acceptance criteria
-------------------
#. Has test cases written
#. Has all code lines tested
#. Passes all Travis CI builds
#. Has a fair amount of documentation if your change is complex
#. You agree to license your contribution under the New BSD License

Change log
===========
0.5.1 - 20.10.2017
--------------------------------------------------------------------------------
Added
********************************************************************************

#. `#103 <https://github.com/pyexcel/pyexcel/issues/103>`_, include the LICENSE file
   in MANIFEST.in, so that the LICENSE file appears in the released tarball.

0.5.0 - 30.08.2017
--------------------------------------------------------------------------------
Updated
********************************************************************************
#. depends on pyexcel-io 0.5.0, which uses cStringIO instead of StringIO,
   giving a performance boost when handling files in memory.
#. the version number jumped to 0.5.0 to make it easy to see that
   pyexcel-htmlr depends on pyexcel-io v0.5.0

Relocated
********************************************************************************

#. type detection code has been relocated into pyexcel-io

0.0.1 - 26.07.2017
--------------------------------------------------------------------------------

Initial release