Extracts text from PDF documents


```yaml
tags: [gradio-custom-component, text extraction, pdf to string]
title: gradio_simpletextextractfrompdf
short_description: extract text from simple pdf documents
colorFrom: blue
colorTo: yellow
sdk: gradio
pinned: false
app_file: space.py
```

`gradio_simpletextextractfrompdf`


Installation

```shell
pip install gradio_simpletextextractfrompdf
```
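You can confirm the install from Python with a small check like the one below. It is a sketch that uses only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and is not part of the package.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; not part of gradio_simpletextextractfrompdf.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("gradio_simpletextextractfrompdf")
    print(v if v is not None else "not installed; run the pip command above")
```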

Usage

```python
import gradio as gr
from gradio_simpletextextractfrompdf import SimpleTextExtractFromPDF


def first_200_chars(text: str | None) -> str:
    # The component passes the extracted text in as a string. Treat a missing
    # value as empty instead of failing on None[:200].
    return (text or "")[:200]


demo = gr.Interface(
    fn=first_200_chars,
    inputs=SimpleTextExtractFromPDF(),
    outputs=gr.Textbox(label="First 200 characters of the extracted text"),
    title="Simple Text Extract From PDF",
    description="""
## Component Description
This Space demonstrates the SimpleTextExtractFromPDF component.
This component provides a simple interface for extracting text from a PDF file. The extracted text can be submitted as a string input to a function for further processing.

- **Text Extraction Only:** Only the text content is extracted from the PDF. Images and table structures are not preserved.
- **Flexible Upload Options:** Users can upload a PDF file from their device or provide a URL to the PDF.
- **Input Component:** The component is primarily designed to be used as an input, allowing users to submit the extracted text to other functions.
- **Output Display:** When used as an output component, the extracted text is displayed in a textarea.

This demo uses the SimpleTextExtractFromPDF component as an input to extract the text from a PDF file, then shows the first 200 characters of that text.
""",
    article="""
<p>
    <code>pip install gradio-simpletextextractfrompdf</code>
    <br>
    <a href="https://pypi.org/project/gradio-simpletextextractfrompdf/"> https://pypi.org/project/gradio-simpletextextractfrompdf/</a>
</p>
""",
)


if __name__ == "__main__":
    demo.launch()
```
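The component hands the user function a plain string. That means the processing logic can be written and tested without starting the UI.

Below is a hypothetical alternative to `first_200_chars` that reports simple counts instead of a preview. The name `text_stats` is illustrative and not part of the package. Because the component's value type is `str | None`, the sketch also treats `None` as empty text.

```python
from __future__ import annotations


def text_stats(text: str | None) -> str:
    """Summarize extracted PDF text as character, word, and line counts.

    Hypothetical user function; None (no text available) is treated as "".
    """
    text = text or ""
    n_chars = len(text)
    n_words = len(text.split())       # split() breaks on any run of whitespace
    n_lines = len(text.splitlines())  # handles \n, \r\n and other line endings
    return f"{n_chars} characters, {n_words} words, {n_lines} lines"
```

To try it, pass `fn=text_stats` in the `gr.Interface` call above in place of `first_200_chars`.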

`SimpleTextExtractFromPDF`

Initialization

| name | type | default | description |
| --- | --- | --- | --- |
| `value` | str \| None | `None` | The text extracted from the file. This value is set by the component and can be submitted as a string input to the function. |
| `every` | Timer \| float \| None | `None` | Continuously calls `value` to recalculate it if `value` is a function (has no effect otherwise). Can provide a Timer whose tick resets `value`, or a float that provides the regular interval for the reset Timer. |
| `label` | str \| I18nData \| None | `None` | The label for this component. It is displayed above the component if `show_label` is `True`, and is also used as the header if there is a table of examples for this component. If `None` and used in a `gr.Interface`, the label will be the name of the parameter this component corresponds to. |
| `inputs` | Component \| Sequence[Component] \| set[Component] \| None | `None` | Components used as inputs to calculate `value` if `value` is a function (has no effect otherwise). `value` is recalculated whenever the inputs change. |
| `show_label` | bool \| None | `None` | If `True`, displays the label. |
| `scale` | int \| None | `None` | Relative size compared to adjacent components. For example, if components A and B are in a Row, and A has `scale=2` and B has `scale=1`, A will be twice as wide as B. Should be an integer. `scale` applies in Rows, and to top-level components in Blocks where `fill_height=True`. |
| `min_width` | int | `160` | Minimum pixel width. The component wraps if there is not enough screen space to satisfy this value. If a given `scale` value would make this component narrower than `min_width`, `min_width` takes precedence. |
| `interactive` | bool \| None | `None` | If `True`, the component is rendered as an editable textbox; if `False`, editing is disabled. If not provided, this is inferred from whether the component is used as an input or an output. |
| `visible` | bool | `True` | If `False`, the component is hidden. |
| `elem_id` | str \| None | `None` | An optional string assigned as the id of this component in the HTML DOM. Can be used for targeting CSS styles. |
| `elem_classes` | list[str] \| str \| None | `None` | An optional list of strings assigned as the classes of this component in the HTML DOM. Can be used for targeting CSS styles. |
| `render` | bool | `True` | If `False`, the component is not rendered in the Blocks context. Use this when you want to assign event listeners now but render the component later. |
| `key` | int \| str \| tuple[int \| str, ...] \| None | `None` | In a `gr.render`, components with the same key across re-renders are treated as the same component, not a new one. Properties listed in `preserved_by_key` are not reset across a re-render. |
| `preserved_by_key` | list[str] \| str \| None | `"value"` | A list of parameters from this component's constructor. Inside a `gr.render()` function, if a component is re-rendered with the same key, these (and only these) parameters are preserved in the UI (if they have been changed by the user or an event listener) instead of being re-rendered from the values provided to the constructor. |

Events

| name | description |
| --- | --- |
| `submit` | |

User function

The impact on the user's predict function depends on whether the component is used as an input or an output for an event (or both).

- When used as an input, the component only affects the input signature of the user function.
- When used as an output, the component only affects the return signature of the user function.

The code snippet below is accurate when the component is used as both an input and an output.

- **As input:** the extracted text is passed into the function as a `str`.
- **As output:** the function should return a `str`, which is set as the component's value.

```python
def predict(
    value: str | None
) -> str | None:
    return value
```
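When the component is used as an input, the user function receives the raw extracted string. Depending on how a PDF was laid out, that text may contain hard line breaks in the middle of paragraphs, or words hyphenated across lines.

The sketch below post-processes such text while keeping the `str | None` signature shown above. Two things are assumptions on our part:

- `tidy_extracted_text` is a hypothetical name, not part of the package.
- The cleanup rules reflect typical PDF text, not documented behavior of this component.

```python
from __future__ import annotations

import re


def tidy_extracted_text(value: str | None) -> str | None:
    """Hypothetical post-processing for extracted PDF text.

    Rejoins words hyphenated across a line break, collapses single line breaks
    inside a paragraph into spaces, and keeps blank lines as paragraph breaks.
    None is returned unchanged to match the str | None signature.
    """
    if value is None:
        return None
    text = value.replace("\r\n", "\n")
    # Join a word split across lines: word character, hyphen, newline, word character.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat one or more blank lines as a paragraph separator.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, collapse every whitespace run (newlines included) to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

The hyphen rule is a heuristic. A genuinely hyphenated compound that happens to break at a line end (for example `well-` followed by `known`) is joined into `wellknown`.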


Release history

| version | release date |
| --- | --- |
| 0.0.4 | Jun 10, 2025 |
| 0.0.3 (this version) | Jun 10, 2025 |
| 0.0.2 | Jun 8, 2025 |
| 0.0.1 | Jun 8, 2025 |

Download files


Source Distribution

gradio_simpletextextractfrompdf-0.0.3.tar.gz (2.1 MB), uploaded Jun 10, 2025

Built Distribution


gradio_simpletextextractfrompdf-0.0.3-py3-none-any.whl (2.0 MB), uploaded Jun 10, 2025 (Python 3)

File details

Details for the file gradio_simpletextextractfrompdf-0.0.3.tar.gz.

File metadata

- Download URL: gradio_simpletextextractfrompdf-0.0.3.tar.gz
- Upload date: Jun 10, 2025
- Size: 2.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.4

File hashes

Hashes for gradio_simpletextextractfrompdf-0.0.3.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `834441cf8ede897f2347e3103be7bb2f10d06af907ef8be451ea298ee10a460e` |
| MD5 | `a9c24a263b8a6b1dc22542a26e355ee7` |
| BLAKE2b-256 | `b5b99a649f8816a3132395cfb280b8dc8ce303795e8de0c9c4bfb07076c107f5` |


File details

Details for the file gradio_simpletextextractfrompdf-0.0.3-py3-none-any.whl.

File metadata

- Download URL: gradio_simpletextextractfrompdf-0.0.3-py3-none-any.whl
- Upload date: Jun 10, 2025
- Size: 2.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.4

File hashes

Hashes for gradio_simpletextextractfrompdf-0.0.3-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `1bc2a637419f0b9c0fb9c6a6cee83fd708515885c41be30aaaa4d71080ef3ba5` |
| MD5 | `7cf542d5e99197e8a1af93a300038bcb` |
| BLAKE2b-256 | `b18b0a632bdc767fc23be06c10f67907229234ebf70cbd5fe8d1e069ee2aaad3` |
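To check a downloaded file against the published SHA256 digests, you can hash it locally and compare. The sketch below uses Python's `hashlib` and reads the file in chunks so large archives are not loaded into memory at once. The helper names `sha256_of` and `verify_download` are hypothetical. The digests are copied from the hash listings above for the 0.0.3 files.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Published SHA256 digests for the 0.0.3 files, copied from the hash listings above.
EXPECTED_SHA256 = {
    "gradio_simpletextextractfrompdf-0.0.3.tar.gz":
        "834441cf8ede897f2347e3103be7bb2f10d06af907ef8be451ea298ee10a460e",
    "gradio_simpletextextractfrompdf-0.0.3-py3-none-any.whl":
        "1bc2a637419f0b9c0fb9c6a6cee83fd708515885c41be30aaaa4d71080ef3ba5",
}


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() keeps calling f.read until it returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str | Path) -> bool:
    """Return True if the file's SHA-256 matches the published digest for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

Matching on the file name keeps the check simple. It assumes the downloaded file has not been renamed.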

