Streamlit component for PDF visualisation and manipulation
streamlit-pdf-viewer
Streamlit component that allows the visualisation and enrichment of PDF documents. Tested on Chrome and Firefox. You can see an application in action here.
Work in progress
We are at an early stage of development, and new contributors are welcome.
Getting started
pip install streamlit-pdf-viewer
In your Streamlit application, you can use it as follows:
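import streamlit as st
from streamlit_pdf_viewer import pdf_viewer

# The input can be a str (file path or URL), a path, or the PDF content as bytes
pdf_viewer("path/to/document.pdf")  # placeholder path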
Options
Params
The following table lists the parameters that can be passed to the pdf_viewer function:
name | description |
---|---|
input | The source of the PDF file. Accepts a file path, URL, or binary data. |
width | Width of the PDF viewer in pixels. It defaults to 700 pixels. |
height | Height of the PDF viewer in pixels. If not provided, the viewer shows the whole content. |
annotations | A list of annotations to be overlaid on the PDF. Format is described here. |
pages_vertical_spacing | The vertical space (in pixels) between each page of the PDF. Defaults to 2 pixels. |
annotation_outline_size | Size of the outline around each annotation in pixels. Defaults to 1 pixel. |
rendering | Type of rendering: unwrap (default), legacy_iframe, or legacy_embed. The default, unwrap, displays the PDF document using pdf.js and supports the visualisation of annotations. The legacy_iframe and legacy_embed values use the legacy approach of injecting the document into an <iframe> or an <embed> element, respectively. These display the PDF with the browser's built-in viewer, which offers additional features we are still working to implement in this component. IMPORTANT: the "legacy" methods work only with Firefox and do not support annotations. |
pages_to_render | Filter the rendering to a specific set of pages. By default, all pages are rendered. |
render_text | Enable a text layer on top of the PDF document so that the text can be selected and copied. NOTE: to avoid breaking existing deployments, we made this opt-in for now, also because many annotations might interfere with copy-paste. |
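The following sketch shows how these parameters can be combined. The helper assembles keyword arguments for pdf_viewer and passes height and pages_to_render only when they are set. The helper names (build_viewer_options, show_pdf) are hypothetical. It also assumes that pages_to_render takes a list of page numbers, so check the component's source for the exact type.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


def build_viewer_options(
    width: int = 700,
    height: Optional[int] = None,
    pages: Optional[List[int]] = None,
    selectable_text: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for pdf_viewer (hypothetical helper)."""
    options: Dict[str, Any] = {
        "width": width,
        "rendering": "unwrap",  # the only mode that draws annotations
        "render_text": selectable_text,  # text layer for select/copy
    }
    if height is not None:
        # Without a height, the viewer shows the whole document.
        options["height"] = height
    if pages is not None:
        # Assumption: a list of page numbers to render.
        options["pages_to_render"] = pages
    return options


def show_pdf(path: str, **kwargs: Any) -> None:
    """Render a local PDF file with the options assembled above."""
    # Imported here so build_viewer_options works without Streamlit installed.
    from streamlit_pdf_viewer import pdf_viewer

    pdf_bytes = Path(path).read_bytes()  # `input` also accepts paths and URLs
    pdf_viewer(pdf_bytes, **build_viewer_options(**kwargs))
```

In a Streamlit script, show_pdf("paper.pdf", height=800, pages=[1, 2]) would render the first two pages in an 800-pixel-high viewer. This assumes page numbers start at 1.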
Annotation format
The annotation format is derived from Grobid's coordinate format, which describes a list of "bounding boxes". Each annotation is a dictionary with six elements: page, x, y, height, width and color. The x and y values indicate the top-left point of the box, and the color can be expressed following the HTML/CSS convention.
Here is an example:
[
  {
    "page": 1,
    "x": 220,
    "y": 155,
    "height": 22,
    "width": 65,
    "color": "red"
  },
  ...
]
The example shown in our screenshot can be found here.
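Because the format comes from Grobid, annotations can be generated from Grobid's coords attribute. The sketch below assumes Grobid's usual layout, in which boxes are separated by ";" and each box lists page,x,y,width,height. The function name coords_to_annotations and the default color are our own choices.

```python
from typing import Dict, List, Union

# One pdf_viewer annotation: page, x, y, height, width and color.
Annotation = Dict[str, Union[int, float, str]]


def coords_to_annotations(coords: str, color: str = "red") -> List[Annotation]:
    """Convert a Grobid-style coords string into pdf_viewer annotations.

    Assumption: boxes are separated by ';' and each box is
    'page,x,y,width,height', with x and y giving the top-left corner.
    """
    annotations: List[Annotation] = []
    for box in coords.split(";"):
        box = box.strip()
        if not box:
            continue  # skip empty entries, e.g. from a trailing ';'
        page, x, y, width, height = box.split(",")
        annotations.append(
            {
                "page": int(page),
                "x": float(x),
                "y": float(y),
                "height": float(height),
                "width": float(width),
                "color": color,  # any CSS color, e.g. "red" or "#ff0000"
            }
        )
    return annotations
```

For example, coords_to_annotations("1,220,155,65,22") produces the single red box from the example above. The resulting list can then be passed as the annotations parameter of pdf_viewer.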
Developers notes
Environment
- Python >= 3.8
- Node.js >= 16
- Streamlit >= 1.28.2
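You can check that a machine meets these requirements with a small script along these lines. It is a sketch and is not part of the project. It reads the installed Streamlit version with importlib.metadata and asks the node executable on the PATH for its version. All function names are hypothetical.

```python
import re
import shutil
import subprocess
import sys
from importlib import metadata
from typing import List, Optional, Tuple

# Minimum versions listed in the developer notes.
MIN_PYTHON = (3, 8)
MIN_NODE = (16,)
MIN_STREAMLIT = (1, 28, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading dotted number, e.g. 'v16.20.1' -> (16, 20, 1)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def node_version() -> Optional[Tuple[int, ...]]:
    """Return the Node.js version on the PATH, or None if node is missing."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


def streamlit_version() -> Optional[Tuple[int, ...]]:
    """Return the installed Streamlit version, or None if it is not installed."""
    try:
        return parse_version(metadata.version("streamlit"))
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> List[str]:
    """List unmet requirements; an empty list means the setup looks fine."""
    problems: List[str] = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    node = node_version()
    if node is None or node < MIN_NODE:
        problems.append("Node.js >= 16 is required")
    streamlit = streamlit_version()
    if streamlit is None or streamlit < MIN_STREAMLIT:
        problems.append("Streamlit >= 1.28.2 is required")
    return problems
```

check_environment() returns an empty list when all three requirements are satisfied. Otherwise it returns a list of human-readable problems. Versions are compared as integer tuples, so (16, 20, 1) correctly counts as newer than (16,).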
Configure environment for development
First, make sure that _RELEASE = False in streamlit_pdf_viewer/__init__.py. To run the component in development mode, use the following commands from the root of the repository:
streamlit run streamlit_pdf_viewer/__init__.py
cd frontend
npm run serve
The first command starts the Streamlit application. Run the other two in a second terminal to serve the frontend component with Node.js. Please make sure you are in the correct directory before running these commands.
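The _RELEASE flag most likely follows the pattern of Streamlit's component template. In that template, development mode loads the frontend from the running dev server, and release mode loads the compiled files shipped inside the package. The sketch below illustrates that switch. The port 3001 and the frontend/build folder are the template's conventions and are only assumptions here, so check __init__.py for the values this project actually uses. component_source and declare are hypothetical helper names.

```python
import os
from typing import Dict

# Assumptions taken from Streamlit's component template, not from this repo.
DEV_SERVER_URL = "http://localhost:3001"
BUILD_DIR_NAME = "build"


def component_source(release: bool, package_dir: str) -> Dict[str, str]:
    """Return the keyword argument to pass to components.declare_component."""
    if release:
        # Release mode: serve the compiled frontend bundled with the package.
        return {"path": os.path.join(package_dir, "frontend", BUILD_DIR_NAME)}
    # Development mode: load the frontend from the running dev server.
    return {"url": DEV_SERVER_URL}


def declare(name: str, release: bool):
    """Declare the component; Streamlit is imported only when this is called."""
    import streamlit.components.v1 as components

    package_dir = os.path.dirname(os.path.abspath(__file__))
    return components.declare_component(name, **component_source(release, package_dir))
```

With this structure, flipping a single boolean decides whether Streamlit fetches the frontend over HTTP or from disk. That is why the flag has to be set back to True before building for distribution.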
Integrate into a streamlit application
- Build the frontend part:
  cd frontend
  export NODE_OPTIONS=--openssl-legacy-provider
  npm run build
- Make sure that _RELEASE = True in streamlit_pdf_viewer/__init__.py.
- Move to the directory of your Streamlit application and run:
  pip install -e {path of component}
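_RELEASE = False is meant for development mode only, so it can be worth checking the flag before building or publishing. A hypothetical pre-release check might look like this. The function names and the default path are our own. The regular expression assumes the flag is assigned on a single line at module level.

```python
import re
from pathlib import Path

# Matches a module-level line such as "_RELEASE = True" (trailing comments allowed).
RELEASE_FLAG = re.compile(r"^_RELEASE\s*=\s*(True|False)\b", re.MULTILINE)


def release_flag(source: str) -> bool:
    """Return the value assigned to _RELEASE in the given module source."""
    match = RELEASE_FLAG.search(source)
    if match is None:
        raise ValueError("_RELEASE assignment not found")
    return match.group(1) == "True"


def check_ready_for_release(init_path: str = "streamlit_pdf_viewer/__init__.py") -> None:
    """Abort with a message if the package is still in development mode."""
    source = Path(init_path).read_text(encoding="utf-8")
    if not release_flag(source):
        raise SystemExit(f"Set _RELEASE = True in {init_path} before building")
```

Calling check_ready_for_release() from the repository root before npm run build or a version bump stops the process early if the flag was left at False.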
Release
bump-my-version bump patch  # or: minor, major
git push
git push --tags
Acknowledgement
The project was initiated at the National Institute for Materials Science (NIMS) in Japan. Development is currently supported by ScienciLAB. Main collaborator: Tomoya Mato, who was very helpful in easing the pain of JavaScript.
File details
Details for the file streamlit_pdf_viewer-0.0.16.dev1.tar.gz.
File metadata
- Download URL: streamlit_pdf_viewer-0.0.16.dev1.tar.gz
- Upload date:
- Size: 2.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 6e755eab75fc001749f7e9a22aab7f092ef48a5765f78ba4c981386f2af2cb81 |
MD5 | f54d096f2f1784f4029bc78c464717f1 |
BLAKE2b-256 | cb57e2db72d7076bdd40e3c5711eebde4ad797661265303f248eaaf48be65e35 |
File details
Details for the file streamlit_pdf_viewer-0.0.16.dev1-py3-none-any.whl.
File metadata
- Download URL: streamlit_pdf_viewer-0.0.16.dev1-py3-none-any.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | a669e9657efd234e443f16398f80acc00b0c7390c821887655c87f4713f78212 |
MD5 | 898033d55eb64e93a7588ea2a055199a |
BLAKE2b-256 | 864592639981f7fa8ed01b946e189b592cd4c61f2bc2c646e0401a3dd2d58605 |