
pegasus - Pytorch


Multi-Modality

PegasusX: The Future of Multimodal Embeddings 🦄 🦄


Welcome to PegasusX, the latest and most advanced package for creating high-quality embeddings from multimodal data. We're pushing the boundaries of what's possible with machine learning, enabling tasks and applications that were once mere visions of the future.

In essence, PegasusX is designed to transform the way we look at data. Our aim is to make it easier for anyone, regardless of domain or discipline, to generate task-specific, high-quality embeddings from any type of data: text, image, video, audio, or more complex data types.

Documentation

Installation


Git Clone Installation

There are two installation methods: git clone and pip. Pip installation currently runs into some path errors, so for a smooth installation we recommend git clone:

git clone https://github.com/kyegomez/Pegasus.git
cd Pegasus
pip install -r requirements.txt

To validate your installation, you can run the provided example:

python3 example.py
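If `example.py` fails, a quick first check is whether Python can locate the package at all. The sketch below uses only the standard library. The module name `pegasus` is taken from the import in the Usage section; the helper name `is_importable` is made up for illustration and is not part of PegasusX.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pegasus" is the module name used by the Usage example.
    print("pegasus importable:", is_importable("pegasus"))
```

If this prints `False`, the package is probably not on the interpreter's search path, which may be related to the path errors mentioned above.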

Usage

from pegasus import Pegasus

# For video and audio modalities, you can initialize the Pegasus class with "Pegasus('vision')" or "Pegasus('audio')" respectively, then pass in the file path of the vision or audio data
pegasus = Pegasus("text", multi_process=False, n_processes=4)

text_data = [
    'This is a query about artificial intelligence',
    'Another query about machine learning',
    'Yet another query about deep learning',
    'And one more about natural language processing'
]

embeddings = pegasus.embed_data(text_data)

print(embeddings)
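A common next step is to compare embeddings with cosine similarity. The sketch below is a minimal, standard-library illustration and not part of the PegasusX API. It assumes `embed_data` returns one fixed-length vector per input, which the example above does not confirm; if the result is a tensor or array, it may need converting first (for example with `.tolist()`). The vectors used here are hand-written placeholders, not real PegasusX output, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 candidates: Sequence[Sequence[float]]) -> int:
    """Index of the candidate with the highest cosine similarity to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Placeholder vectors standing in for rows of `embeddings` (assumed shape).
placeholder = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
best = most_similar(placeholder[0], placeholder[1:])
print("closest candidate index:", best)
```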

Pip Installation

Pip installation currently runs into file path errors. They are very annoying, and any help fixing them is welcome.

pip install pegasusx

Pip Usage

from pegasus import Pegasus

# For video and audio modalities, initialize with Pegasus("vision") or Pegasus("audio") respectively, then pass in the file path of the vision or audio data
pegasus = Pegasus("text", multi_process=False, n_processes=4)

text_data = ['This is a query about artificial intelligence',
             'Another query about machine learning',
             'Yet another query about deep learning',
             'And one more about natural language processing']

embeddings = pegasus.embed_data(text_data)

print(embeddings)

Features

PegasusX is not just another run-of-the-mill machine learning package. We've painstakingly crafted this package, ensuring it includes features that set it apart:

  1. Multimodal Data Understanding: From text to images, audio, and more, PegasusX is designed to handle and understand a wide array of data types.

  2. Personalized for Any Task: PegasusX adapts to your specific task, generating high-quality, task-specific embeddings for a wide variety of applications.

  3. Scalability & Performance: PegasusX has been optimized for efficiency and can scale according to the demands of your tasks, ensuring seamless operation even with large amounts of data.

  4. Open Source: We believe in the power of community and collaboration. PegasusX is an open-source project, welcoming contributions and improvements from the global developer community.
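When working with large amounts of data, one simple caller-side pattern is to feed inputs to the embedder in fixed-size batches. The sketch below only illustrates that pattern and makes no claim about how PegasusX handles scaling internally. `batched`, `embed_in_batches`, and `fake_embed` are hypothetical names; `fake_embed` is a stand-in so the example runs without the package, and a real call such as `pegasus.embed_data` could be passed in its place, assuming it returns one item per input.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(items: Sequence,
                     embed_fn: Callable[[Sequence], Sequence],
                     batch_size: int = 32) -> List:
    """Call `embed_fn` once per batch and concatenate the results in order."""
    results: List = []
    for batch in batched(items, batch_size):
        results.extend(embed_fn(batch))
    return results


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    """Stand-in embedder: maps each text to a 1-d vector holding its length."""
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vectors)
```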

Contributing to PegasusX

We are thrilled to invite you to be a part of the PegasusX project. This is not just an open source project but a community initiative, and we value your expertise and creativity. To show our appreciation, we have instituted a unique rewards system that directly compensates contributors from the revenue generated by the PegasusX API.

Why Contribute

Contributing to PegasusX not only enhances your skills and profile but also comes with financial rewards. When you contribute code, documentation, or any form of improvement to the PegasusX project, you are adding value. As such, we believe it's only fair that you share in the rewards.

Rewards Program

Here's how the PegasusX Rewards Program works:

  1. Submit a Pull Request: This can be a code enhancement, bug fix, documentation update, new feature, or any improvement to the project.

  2. Review and Approval: Our team will review your contribution. If it gets approved and merged, you become eligible for the rewards program.

  3. Revenue Share: Once your pull request is merged, you will receive a percentage of the revenue generated by the PegasusX API. The percentage will be determined based on the significance and impact of your contribution.

Becoming a Paid API

As part of our growth strategy, we will be deploying PegasusX as a Paid API. The revenue generated from this API will not only sustain and further the project, but also fund the rewards program.

How to Start Contributing

If you're ready to become a part of PegasusX and contribute to the future of multimodal embeddings, here's what you need to do:

  1. Fork the repository.

  2. Make your improvements or additions in your forked repository.

  3. Submit a pull request detailing the changes you've made.

  4. Our team will review your submission. If it's approved, it will be merged into the main repository, and you will become part of the PegasusX Rewards Program.

Roadmap

PegasusX is a constant work in progress, and we're always striving for better. Our roadmap provides a snapshot of where we're heading.

  • Reconfiguring the ImageBind Model: To improve our handling of diverse data, we are reconfiguring the ImageBind model to utilize Flash Attention. This shift will allow us to manage longer context lengths and handle more complex inputs effectively.

  • Pretraining with Diverse Datasets: Quality embeddings require quality training. To ensure our model is versatile and robust, we're pretraining PegasusX using the same datasets that ImageBind has been trained on. This step ensures our model inherits the benefits of proven training methodologies while also incorporating our enhancements.

  • Benchmarking: It's important to know where we stand. After we've reconfigured and pretrained our model, we will conduct comprehensive benchmarking tests. This process will highlight any areas of strength or potential improvement, allowing us to further refine our model.

  • Finetuning on Long Samples: We believe that PegasusX can handle more than short snippets of data. To prove this, we'll finetune our model using long data samples, pushing the boundaries of what's possible with embedding models.

  • Continued Innovation: Our roadmap doesn't stop with finetuning. As we move forward, we're excited to explore new methodologies and techniques to enhance PegasusX.

  • Advanced Training Techniques: We'll look into more sophisticated methods to make the training process faster and more efficient.

  • Expanding Modality Types: We aim to support more types of modalities, ensuring that PegasusX is truly a universal tool for multi-modal data.

  • Integration with More Frameworks: We want PegasusX to be accessible and easy to use with popular machine learning and data processing frameworks.

  • Optimizing for Real-Time Processing: We're focused on making PegasusX capable of generating embeddings in real-time, a critical feature for many applications.

  • Community Driven Enhancements: We're excited to see what the community suggests and contributes - the possibilities are endless!

  • Production-Level API Deployment: PegasusX will enter Agora's paid API lineup, so you can make API requests and receive your embeddings with no complicated setup.

  • Making it Extremely Fast Through Quantization: By utilizing quantization techniques, we aim to significantly increase the speed and efficiency of the PegasusX model.

  • Parallelization, Asynchrony, and Other Optimizations: To ensure seamless operation even with large amounts of data, we're planning to implement parallelization, asynchronous operations, and other optimizations in the model.

  • Remake in JAX: Reimplement the model in JAX using dynamic sparse flash attention.
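To make the quantization item above concrete, here is a minimal sketch of symmetric 8-bit quantization applied to a plain list of floats. It illustrates the general technique only; the roadmap does not say which quantization scheme PegasusX will use, and the function names here are hypothetical. Each value is mapped to an integer code in [-127, 127] using a single scale factor, and can be approximately recovered by multiplying the code by that scale.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # All zeros (or empty input): any scale works, so use 1.0.
        return [0 for _ in values], 1.0
    scale = max_abs / 127.0
    # Clamp guards against floating-point results slightly outside the range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: Sequence[int], scale: float) -> List[float]:
    """Approximate the original floats from the integer codes and the scale."""
    return [c * scale for c in codes]


original = [0.5, -1.0, 0.25]
codes, scale = quantize_int8(original)
restored = dequantize_int8(codes, scale)
print(codes, restored)
```

The rounding error per value is at most half the scale, which is the usual trade-off: a smaller memory footprint and faster integer arithmetic in exchange for a small loss of precision.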

Thank you for considering contributing to PegasusX. Your expertise and commitment to this project are what make it thrive. Let's build the future of multimodal embeddings together.


