
Parse data from documents into a form optimised for downstream LLM tasks.

Project description

LLM Parse

LLM Parse is a Python library for parsing files and extracting their data in a form optimized for downstream tasks involving large language models (LLMs).

It builds on several popular document-parsing libraries and applies additional text processing so that the output is better suited to downstream LLM tasks such as retrieval-augmented generation (RAG), summarization, and drafting.
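The description above does not say which text-processing steps llm-parse applies. Purely as an illustration of the kind of cleanup PDF-extracted text often needs before it reaches an LLM, here is a minimal sketch. `clean_extracted_text` is a hypothetical helper, not part of llm-parse.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Illustrative cleanup for PDF-extracted text (hypothetical helper, not llm-parse's API)."""
    # Rejoin words split by a hyphen at a line break, e.g. "implemen-\ntation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Remove trailing spaces and tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Rejoining hyphenated line breaks is a heuristic: it also merges genuine compounds that happen to break at the hyphen (for example, "well-\nknown" becomes "wellknown").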

Getting started

Install the package:

pip install llm-parse

Examples

Parse a PDF to Markdown.

from llm_parse.pdf_2_md_parser import PDF2MDParser

parser = PDF2MDParser()
text = parser.load_data("example.pdf")
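For RAG, Markdown output is often split at headings so that each chunk covers one section of the document. A minimal sketch, assuming you already have the Markdown as a single string (the README does not document what `load_data` returns, so you may first need to extract the text from its result). `split_markdown_by_heading` is a hypothetical helper, not part of llm-parse.

```python
import re
from typing import List


def split_markdown_by_heading(markdown: str) -> List[str]:
    """Split Markdown into sections that each start at an ATX heading (hypothetical helper)."""
    sections: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        # A line such as "# Title" or "## Methods" starts a new section.
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that are empty once whitespace is stripped.
    return [s for s in sections if s]
```

The heading test is deliberately simple: a line inside a fenced code block that starts with `# ` (a Python comment, say) would also be treated as a heading.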

Parse a PDF to text.

from llm_parse.pdf_2_text_parser import PDF2TextParser

parser = PDF2TextParser()
text = parser.load_data("example.pdf")
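Plain-text output has no headings to split on, so a common starting point for RAG is fixed-size chunks with some overlap, so that a sentence cut at a boundary still appears whole in one chunk. A minimal sketch, assuming the text is available as a string; `chunk_text` and its default sizes are illustrative choices, not part of llm-parse.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping fixed-size character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Character windows are the simplest option; splitting on sentence or paragraph boundaries usually yields chunks that read more naturally, at the cost of uneven sizes.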

Parse a PDF with the LlamaParse parser.

from llm_parse.llamaparse_parser import LlamaParseParser

# Any LlamaParse argument can be passed here. Ref: https://github.com/run-llama/llama_parse?tab=readme-ov-file#getting-started
parser = LlamaParseParser(
    api_key="llx-...",  # or set the LLAMA_CLOUD_API_KEY environment variable
    result_type="markdown",  # "markdown" or "text"
    num_workers=4,  # when several files are passed, split them across `num_workers` API calls
    verbose=True,
    language="en",  # optional document language; defaults to "en"
)
text = parser.load_data("example.pdf")
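The `api_key` comment notes that the key can come from the `LLAMA_CLOUD_API_KEY` environment variable rather than being written into source code. A minimal sketch of failing early when no key is available; `resolve_llama_cloud_key` is a hypothetical helper, not part of llm-parse or LlamaParse.

```python
import os
from typing import Optional


def resolve_llama_cloud_key(explicit_key: Optional[str] = None) -> str:
    """Return explicit_key if given, else LLAMA_CLOUD_API_KEY (hypothetical helper)."""
    key = explicit_key or os.environ.get("LLAMA_CLOUD_API_KEY")
    if not key:
        # Fail before any parsing starts instead of partway through an API call.
        raise RuntimeError("Set LLAMA_CLOUD_API_KEY or pass an API key explicitly.")
    return key
```

Its result could then be passed as `api_key=resolve_llama_cloud_key()` in the `LlamaParseParser(...)` call above.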



Download files


Source Distribution

llm_parse-0.1.3.tar.gz (8.7 kB)

Built Distribution

llm_parse-0.1.3-py3-none-any.whl (10.1 kB)

File details

Details for the file llm_parse-0.1.3.tar.gz.

File metadata

  • Download URL: llm_parse-0.1.3.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/23.4.0

File hashes

Hashes for llm_parse-0.1.3.tar.gz
  • SHA256: 34a8e6635ba58103081fc5ed28deda187cab7f7ea9865aa027cd81f5abe565b4
  • MD5: 971006309214f4dfa7a76bfc3e53586d
  • BLAKE2b-256: 2c5e54dab6c42465585579d07a90147c088fb8cb3afb34e33ea2d910d4c67f28
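To check a downloaded archive against the published digest, you can hash it locally. A minimal sketch using Python's standard `hashlib`; `sha256_of_file` is an illustrative helper name, and `EXPECTED_SDIST_SHA256` is the SHA256 digest listed for llm_parse-0.1.3.tar.gz.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Compute a file's SHA256 hex digest, reading it in blocks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" at end of file.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Published SHA256 for llm_parse-0.1.3.tar.gz.
EXPECTED_SDIST_SHA256 = "34a8e6635ba58103081fc5ed28deda187cab7f7ea9865aa027cd81f5abe565b4"
```

A downloaded copy matches if `sha256_of_file("llm_parse-0.1.3.tar.gz") == EXPECTED_SDIST_SHA256`. pip can also enforce this itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.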


File details

Details for the file llm_parse-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: llm_parse-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 10.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/23.4.0

File hashes

Hashes for llm_parse-0.1.3-py3-none-any.whl
  • SHA256: ad6a0e3a0dbeb339afb34c3909e75289b5bd8aebbae69f37127a61a40d0b86eb
  • MD5: 5a8bbe7dc4285f8c79d41f5e527bb007
  • BLAKE2b-256: 83f6f21d89b11d32d1dff254d20f3e4bb26840de38074977992c1b5b25bd7cdb

