Skip to main content

Parse files into RAG-Optimized formats.

Project description

LlamaParse

⚠️ DEPRECATION NOTICE

This repository and its packages are deprecated and will be maintained until May 1, 2026.

Please migrate to the new packages:

  • Python: pip install llama-cloud>=1.0 (GitHub)
  • TypeScript: npm install @llamaindex/llama-cloud (GitHub)

The new packages provide the same functionality with improved performance, better support, and active development.

PyPI - Downloads GitHub contributors Discord

LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents).

It is really good at the following:

  • Broad file type support: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
  • Table recognition: Parsing embedded tables accurately into text and semi-structured representations.
  • Multimodal parsing and chunking: Extracting visual elements (images/diagrams) into structured formats and return image chunks using the latest multimodal models.
  • Custom parsing: Input custom prompt instructions to customize the output the way you want it.

LlamaParse directly integrates with LlamaIndex.

The free plan is up to 1000 pages a day. Paid plan is free 7k pages per week + 0.3c per additional page by default. There is a sandbox available to test the API https://cloud.llamaindex.ai/parse ↗.

Read below for some quickstart information, or see the full documentation.

If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come talk to us.

Getting Started

First, login and get an api-key from https://cloud.llamaindex.ai/api-key ↗.

Then, make sure you have the latest LlamaIndex version installed.

NOTE: If you are upgrading from v0.9.X, we recommend following our migration guide, as well as uninstalling your previous version first.

pip uninstall llama-index  # run this if upgrading from v0.9.x or older
pip install -U llama-index --upgrade --no-cache-dir --force-reinstall

Lastly, install the package:

pip install llama-parse

Now you can parse your first PDF file using the command line interface. Use the command llama-parse [file_paths]. See the help text with llama-parse --help.

export LLAMA_CLOUD_API_KEY='llx-...'

# output as text
llama-parse my_file.pdf --result-type text --output-file output.txt

# output as markdown
llama-parse my_file.pdf --result-type markdown --output-file output.md

# output as raw json
llama-parse my_file.pdf --output-raw-json --output-file output.json

You can also create simple scripts:

import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
    verbose=True,
    language="en",  # Optionally you can define a language, default=en
)

# sync
documents = parser.load_data("./my_file.pdf")

# sync batch
documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])

# async
documents = await parser.aload_data("./my_file.pdf")

# async batch
documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])

Using with file object

You can parse a file object directly:

import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
    verbose=True,
    language="en",  # Optionally you can define a language, default=en
)

file_name = "my_file1.pdf"
extra_info = {"file_name": file_name}

with open(f"./{file_name}", "rb") as f:
    # must provide extra_info with file_name key with passing file object
    documents = parser.load_data(f, extra_info=extra_info)

# you can also pass file bytes directly
with open(f"./{file_name}", "rb") as f:
    file_bytes = f.read()
    # must provide extra_info with file_name key with passing file bytes
    documents = parser.load_data(file_bytes, extra_info=extra_info)

Using with SimpleDirectoryReader

You can also integrate the parser as the default PDF loader in SimpleDirectoryReader:

import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    verbose=True,
)

file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

Full documentation for SimpleDirectoryReader can be found on the LlamaIndex Documentation.

Examples

Several end-to-end indexing examples can be found in the examples folder

Documentation

https://docs.cloud.llamaindex.ai/

Terms of Service

See the Terms of Service Here.

Get in Touch (LlamaCloud)

LlamaParse is part of LlamaCloud, our e2e enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.

LlamaCloud is currently available via waitlist (join by creating an account). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come get in touch with us.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_parse-0.6.94.tar.gz (4.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llama_parse-0.6.94-py3-none-any.whl (5.4 kB view details)

Uploaded Python 3

File details

Details for the file llama_parse-0.6.94.tar.gz.

File metadata

  • Download URL: llama_parse-0.6.94.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for llama_parse-0.6.94.tar.gz
Algorithm Hash digest
SHA256 d9e4347ec6caa1e9d5266cc5d4d8b29a29aa0d0948d921a26c73d3e4aaf5ba72
MD5 17e6f7454d2a15a4da96f6b0b8c4c9f0
BLAKE2b-256 97550d089a90fe98027853dd41dead1f094458df1a964e0d7c379b6a44208761

See more details on using hashes here.

File details

Details for the file llama_parse-0.6.94-py3-none-any.whl.

File metadata

  • Download URL: llama_parse-0.6.94-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for llama_parse-0.6.94-py3-none-any.whl
Algorithm Hash digest
SHA256 48f21d909696597bf992acf5017685d88a6b25604bc1e98775021f39fe66649f
MD5 8abf3cb9098e28eb6e6b0893b3d79924
BLAKE2b-256 bd703d4cc14f99c80491401d4ab514f3ebe3113e38c8017cd384a73dc67b3ae4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page