Sycamore is an LLM-powered semantic data preparation system for building search applications.

Project description

Sycamore is an open source, AI-powered document processing engine for ETL, RAG, LLM-based applications, and analytics on unstructured data. Sycamore can partition and enrich a wide range of document types including reports, presentations, transcripts, manuals, and more. It can analyze and chunk complex documents such as PDFs and images with embedded tables, figures, graphs, and other infographics. Check out an example notebook.

For processing documents, Sycamore leverages Aryn DocParse (formerly known as the Aryn Partitioning Service), a serverless, GPU-powered API for segmenting and labeling documents, performing OCR, extracting tables and images, and more. DocParse leverages Aryn's state-of-the-art, open source deep learning DETR AI model trained on 80k+ enterprise documents, and it can lead to 6x more accurate data chunking and 2x improved recall on hybrid search or RAG when compared to alternative systems. You can sign up for free here, or choose to run the Aryn Partitioner locally.

Aryn DocParse takes documents and returns the partitioned output in JSON, and you can use Sycamore for additional data extraction, enrichment, transforms, cleaning, and loading into downstream databases. You can choose the LLMs to use with these transforms.
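As a rough illustration of working with partitioned JSON before handing it to further transforms, the sketch below groups element text by label. The sample payload and its field names ("elements", "type", "text_representation", "properties") are assumptions made for this example, not a definitive description of the DocParse response; check the DocParse documentation for the actual schema.

```python
import json
from collections import defaultdict

# Hypothetical sample shaped like partitioned output. The field names used
# here are assumptions for illustration, not a confirmed schema.
SAMPLE = """
{
  "elements": [
    {"type": "Title", "text_representation": "Quarterly Report",
     "properties": {"page_number": 1}},
    {"type": "Text", "text_representation": "Revenue grew this quarter. ",
     "properties": {"page_number": 1}},
    {"type": "Table", "text_representation": "Region,Revenue",
     "properties": {"page_number": 2}},
    {"type": "Image", "properties": {"page_number": 2}}
  ]
}
"""


def group_text_by_type(raw_json: str) -> dict:
    """Group each element's text under its label, preserving document order."""
    payload = json.loads(raw_json)
    grouped = defaultdict(list)
    for element in payload.get("elements", []):
        text = element.get("text_representation")
        if text:  # skip elements that carry no extracted text
            grouped[element.get("type", "Unknown")].append(text.strip())
    return dict(grouped)


grouped = group_text_by_type(SAMPLE)
for label, texts in grouped.items():
    print(label, texts)
```

Grouping by label like this is one simple way to route tables, titles, and body text to different downstream steps.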

Sycamore reliably loads your vector databases and hybrid search engines, including OpenSearch, Elasticsearch, Pinecone, DuckDB, and Weaviate, with higher quality data.

The Sycamore framework is built around a scalable and robust abstraction for document processing called a DocSet, and includes powerful high-level transformations in Python for data processing, enrichment, and cleaning. DocSets also encapsulate scalable data processing techniques, removing the undifferentiated heavy lifting of reliably loading chunks. The functional programming approach of DocSets lets you rapidly customize and experiment with your chunking for better quality RAG results.
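To make the functional style concrete, here is a toy, standard-library-only sketch of a chainable document collection. `ToyDocSet`, `Doc`, and `chunk_by_words` are hypothetical names invented for this illustration; they are not Sycamore classes, and the real DocSet API should be taken from the Sycamore documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Doc:
    """Hypothetical minimal document: text plus a metadata dict."""
    text: str
    properties: dict = field(default_factory=dict)


class ToyDocSet:
    """Toy stand-in for a DocSet: each transform returns a new collection."""

    def __init__(self, docs: Iterable[Doc]):
        self._docs = tuple(docs)

    def map(self, fn: Callable[[Doc], Doc]) -> "ToyDocSet":
        return ToyDocSet(fn(d) for d in self._docs)

    def filter(self, pred: Callable[[Doc], bool]) -> "ToyDocSet":
        return ToyDocSet(d for d in self._docs if pred(d))

    def flat_map(self, fn: Callable[[Doc], Iterable[Doc]]) -> "ToyDocSet":
        return ToyDocSet(out for d in self._docs for out in fn(d))

    def take_all(self) -> List[Doc]:
        return list(self._docs)


def chunk_by_words(size: int) -> Callable[[Doc], Iterable[Doc]]:
    """Return a splitter that cuts a document into chunks of `size` words."""
    def split(doc: Doc) -> Iterable[Doc]:
        words = doc.text.split()
        for start in range(0, len(words), size):
            yield Doc(" ".join(words[start:start + size]),
                      {**doc.properties, "chunk": start // size})
    return split


docs = ToyDocSet([
    Doc("one two three four five", {"source": "a.pdf"}),
    Doc("", {"source": "empty.pdf"}),
])

chunks = (
    docs.filter(lambda d: bool(d.text.strip()))      # drop empty documents
        .flat_map(chunk_by_words(2))                  # one doc -> many chunks
        .map(lambda d: Doc(d.text.upper(), d.properties))
        .take_all()
)
```

Because every step returns a new collection, swapping `chunk_by_words(2)` for another chunking function is a one-line change, which is the kind of experimentation the paragraph above describes.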

Features

  • Integrated with Aryn DocParse, using a state-of-the-art vision AI model for segmentation and preserving the semantic structure of documents
  • DocSet abstraction to scalably and reliably transform and manipulate unstructured documents
  • High-quality table extraction, OCR, visual summarization, LLM-powered UDFs, and other performant Python data transforms
  • Quickly create vector embeddings using your choice of AI model
  • Helpful features like automatic data crawlers (Amazon S3 and HTTP), Jupyter notebook for writing and iterating on jobs, and an OpenSearch hybrid search and RAG engine for testing
  • Scalable Ray backend

Demo

Introduction to Aryn DocParse

Get Started

Sycamore currently runs on Linux and macOS. To install it, run:

pip install sycamore-ai

Sycamore provides connectors to vector databases via Python extras. To install a connector, include it as an extra with your pip install. For example,

pip install "sycamore-ai[duckdb]"

Supported connectors include duckdb, elasticsearch, opensearch, pinecone, and weaviate.
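If you script your environment setup, a small helper can build the requirement string for several connectors at once and check whether the package is already installed. This is a minimal sketch: `requirement_with_extras` and `installed_version` are hypothetical helper names, and the only facts taken from this page are the package name and the list of connector extras. Combining extras with commas inside the brackets is standard pip syntax.

```python
from importlib import metadata
from typing import Iterable, Optional

# Connector extras as listed on this page.
SUPPORTED_CONNECTORS = ("duckdb", "elasticsearch", "opensearch",
                        "pinecone", "weaviate")


def requirement_with_extras(connectors: Iterable[str]) -> str:
    """Build a pip requirement such as 'sycamore-ai[duckdb,opensearch]'."""
    wanted = set(connectors)
    unknown = sorted(wanted - set(SUPPORTED_CONNECTORS))
    if unknown:
        raise ValueError(f"Unsupported connector(s): {', '.join(unknown)}")
    if not wanted:
        return "sycamore-ai"
    return f"sycamore-ai[{','.join(sorted(wanted))}]"


def installed_version(dist_name: str = "sycamore-ai") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


requirement = requirement_with_extras(["opensearch", "duckdb"])
print(f'pip install "{requirement}"')
print("installed:", installed_version())
```

Quoting the requirement in the printed command keeps shells such as zsh from interpreting the square brackets.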

To use Aryn DocParse, sign up for free here and use the API key.
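One common way to supply an API key without hard-coding it is to read it from an environment variable. The sketch below assumes the key is stored in a variable named `ARYN_API_KEY`; that name and the `load_api_key` helper are assumptions for illustration, so confirm the expected configuration in the Aryn documentation.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "ARYN_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing.

    `var_name` defaults to an assumed variable name, not a confirmed one.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your API key."
        )
    return key


# Usage with an explicit mapping, so the example does not depend on the shell:
example_key = load_api_key({"ARYN_API_KEY": "  my-secret-key  "})
```

Failing early with a clear message is usually easier to debug than an authentication error raised partway through a job.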

Resources

Contributing

Check out our Contributing Guide for more information about how to contribute to Sycamore and set up your environment for development.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sycamore_ai-0.1.34.tar.gz (18.6 MB)

Uploaded Source

Built Distribution

sycamore_ai-0.1.34-py3-none-any.whl (18.8 MB)

Uploaded Python 3

File details

Details for the file sycamore_ai-0.1.34.tar.gz.

File metadata

  • Download URL: sycamore_ai-0.1.34.tar.gz
  • Upload date:
  • Size: 18.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sycamore_ai-0.1.34.tar.gz
Algorithm Hash digest
SHA256 3fd88111f553ae43c4538ebd936ac2da5910dd9e2f9baa8e77ba53ec1962013e
MD5 4d6cab58afd2f8cc889ebc73a1b1052a
BLAKE2b-256 3a0417a9e6b16f983665ecbb206e69777136867499411477ee2398f0b82fcaea

Provenance

The following attestation bundles were made for sycamore_ai-0.1.34.tar.gz:

Publisher: pypi_release.yml on aryn-ai/sycamore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sycamore_ai-0.1.34-py3-none-any.whl.

File metadata

  • Download URL: sycamore_ai-0.1.34-py3-none-any.whl
  • Upload date:
  • Size: 18.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sycamore_ai-0.1.34-py3-none-any.whl
Algorithm Hash digest
SHA256 a65d49bd81bf957c58ca7a1596bcdf284223bd2c26e117a5e1373cd876b67eab
MD5 8917b44763457d7b6539c5692a2b0297
BLAKE2b-256 57d31369bf9802762cfe4a1553e500ea5b3335aa46d611e3e517c04d23e0bce1

Provenance

The following attestation bundles were made for sycamore_ai-0.1.34-py3-none-any.whl:

Publisher: pypi_release.yml on aryn-ai/sycamore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
