pegasus - Pytorch

These details have not been verified by PyPI

Project description

PegasusX: The Future of Multimodal Embeddings 🦄 🦄

Welcome to PegasusX, the latest and most advanced package for creating high-quality embeddings from multimodal data. We're pushing the boundaries of what's possible with machine learning, enabling tasks and applications that were once mere visions of the future.

In essence, PegasusX is designed to transform the way we look at data. Our aim is to make it easy for anyone, regardless of domain or discipline, to generate task-specific, high-quality embeddings from any type of data, be it text, image, video, audio, or more complex data types.

Documentation

Click here for documentation

Installation

Git Clone Installation

There are two installation methods: cloning the repository with git, or installing from PyPI with pip. The pip installation currently produces some path errors, so for a smooth installation we recommend git clone:

git clone https://github.com/kyegomez/Pegasus.git
cd Pegasus
pip install -r requirements.txt

To validate your installation, you can run the provided example:

python3 example.py
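
If the example script fails, it can help to first confirm that Python can locate the package at all. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical, and the module name `pegasus` is taken from the import statement in the Usage section.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the top-level module `module_name`."""
    # find_spec returns None when a top-level module cannot be found
    # on the current import path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pegasus" is the module name imported in the Usage section.
    print("pegasus importable:", is_importable("pegasus"))
```

If this prints `False` after a git clone installation, the likely cause is that the script is being run from outside the cloned `Pegasus` directory.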

Usage

from pegasus import Pegasus

# For video or audio data, initialize the class with Pegasus("vision") or Pegasus("audio") respectively, then pass in the file path of the data.
pegasus = Pegasus("text", multi_process=False, n_processes=4)

text_data = [
    'This is a query about artificial intelligence',
    'Another query about machine learning',
    'Yet another query about deep learning',
    'And one more about natural language processing'
]

embeddings = pegasus.embed_data(text_data)

print(embeddings)
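
A common next step is to compare the resulting embeddings with each other. The sketch below ranks candidate vectors by cosine similarity to a query vector. It assumes that `embed_data` returns one fixed-length vector per input and that each vector can be converted to a plain list of floats (for example with `.tolist()`); this README does not confirm the return type. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and not part of PegasusX.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Define similarity with an all-zero vector as 0.0 to avoid dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[int]:
    """Return candidate indices ordered from most to least similar to `query`."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Stand-in vectors; real ones would come from pegasus.embed_data(...),
    # under the assumption stated above.
    query_vec = [1.0, 0.0, 0.0]
    candidate_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.0, 0.0]]
    print(rank_by_similarity(query_vec, candidate_vecs))
```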

Pip Installation

The pip installation currently fails with file path errors. Help with fixing them is very welcome.

pip install pegasusx

Pip Usage

from pegasus import Pegasus

# For video or audio data, initialize the class with Pegasus("vision") or Pegasus("audio") respectively, then pass in the file path of the data.
pegasus = Pegasus("text", multi_process=False, n_processes=4)

text_data = ['This is a query about artificial intelligence',
             'Another query about machine learning',
             'Yet another query about deep learning',
             'And one more about natural language processing']

embeddings = pegasus.embed_data(text_data)

print(embeddings)

Features

PegasusX is not just another run-of-the-mill machine learning package. We've painstakingly crafted this package, ensuring it includes features that set it apart:

Multimodal Data Understanding: From text to images, audio, and more, PegasusX is designed to handle and understand a wide array of data types.
Personalized for Any Task: PegasusX adapts to your specific task, generating high-quality, task-specific embeddings for a wide variety of applications.
Scalability & Performance: PegasusX has been optimized for efficiency and can scale according to the demands of your tasks, ensuring seamless operation even with large amounts of data.
Open Source: We believe in the power of community and collaboration. PegasusX is an open-source project, welcoming contributions and improvements from the global developer community.

Contributing to PegasusX

We are thrilled to invite you to be a part of the PegasusX project. This is not just an open source project but a community initiative, and we value your expertise and creativity. To show our appreciation, we have instituted a unique rewards system that directly compensates contributors from the revenue generated by the PegasusX API.

Why Contribute

Contributing to PegasusX not only enhances your skills and profile but also comes with financial rewards. When you contribute code, documentation, or any form of improvement to the PegasusX project, you are adding value. As such, we believe it's only fair that you share in the rewards.

Rewards Program

Here's how the PegasusX Rewards Program works:

Submit a Pull Request: This can be a code enhancement, bug fix, documentation update, new feature, or any improvement to the project.
Review and Approval: Our team will review your contribution. If it gets approved and merged, you become eligible for the rewards program.
Revenue Share: Once your pull request is merged, you will receive a percentage of the revenue generated by the PegasusX API. The percentage will be determined based on the significance and impact of your contribution.

Becoming a Paid API

As part of our growth strategy, we will be deploying PegasusX as a Paid API. The revenue generated from this API will not only sustain and further the project, but also fund the rewards program.

How to Start Contributing

If you're ready to become a part of PegasusX and contribute to the future of multimodal embeddings, here's what you need to do:

Fork the repository.
Make your improvements or additions in your forked repository.
Submit a pull request detailing the changes you've made.
Our team will review your submission. If it's approved, it will be merged into the main repository, and you will become part of the PegasusX Rewards Program.

Roadmap

PegasusX is a constant work in progress, and we're always striving for better. Our roadmap provides a snapshot of where we're heading.

Reconfiguring the ImageBind Model: To improve our handling of diverse data, we are reconfiguring the ImageBind model to utilize Flash Attention. This shift will allow us to manage longer context lengths and handle more complex inputs effectively.
Pretraining with Diverse Datasets: Quality embeddings require quality training. To ensure our model is versatile and robust, we're pretraining PegasusX using the same datasets that ImageBind has been trained on. This step ensures our model inherits the benefits of proven training methodologies while also incorporating our enhancements.
Benchmarking: It's important to know where we stand. After we've reconfigured and pretrained our model, we will conduct comprehensive benchmarking tests. This process will highlight any areas of strength or potential improvement, allowing us to further refine our model.
Finetuning on Long Samples: We believe that PegasusX can handle more than short snippets of data. To prove this, we'll finetune our model using long data samples, pushing the boundaries of what's possible with embedding models.
Continued Innovation: Our roadmap doesn't stop with finetuning. As we move forward, we're excited to explore new methodologies and techniques to enhance PegasusX.
Advanced Training Techniques: We'll look into more sophisticated methods to make the training process faster and more efficient.
Expanding Modality Types: We aim to support more types of modalities, ensuring that PegasusX is truly a universal tool for multimodal data.
Integration with More Frameworks: We want PegasusX to be accessible and easy to use with popular machine learning and data processing frameworks.
Optimizing for Real-Time Processing: We're focused on making PegasusX capable of generating embeddings in real-time, a critical feature for many applications.
Community Driven Enhancements: We're excited to see what the community suggests and contributes - the possibilities are endless!
Production-Level API Deployment: PegasusX will enter Agora's paid API lineup, so you can make API requests and receive your embeddings with no complicated setup.
Making it Extremely Fast Through Quantization: By utilizing quantization techniques, we aim to significantly increase the speed and efficiency of the PegasusX model.
Parallelization, Asynchrony, and Other Optimizations: To ensure seamless operation even with large amounts of data, we're planning to implement parallelization, asynchronous operations, and other optimizations in the model.
Remake in JAX: We plan to reimplement the model in JAX using dynamic sparse flash attention.
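
To make the quantization item above more concrete, here is a minimal sketch of symmetric int8 quantization applied to a plain list of floats. It is a generic illustration of the technique, not the approach the project has committed to; quantizing a full PyTorch model would normally rely on framework tooling. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization of floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # A single scale covers the whole list; use 1.0 for all-zero or empty input.
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    # The round-trip error per value is bounded by about half of `scale`.
    print(q, restored)
```

Each value is stored in one byte instead of four or eight, at the cost of a small rounding error; that trade-off is the source of the speed and memory gains quantization aims for.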

Thank you for considering contributing to PegasusX. Your expertise and commitment to this project are what make it thrive. Let's build the future of multimodal embeddings together.

Release history

Version	Release date
0.4.0 (this version)	Sep 12, 2023
0.3.9	Jul 17, 2023
0.3.8	Jul 17, 2023
0.3.7	Jul 17, 2023
0.3.6	Jul 17, 2023
0.3.5	Jul 17, 2023
0.3.4	Jul 17, 2023
0.3.2	Jul 17, 2023
0.3.1	Jul 17, 2023
0.3.0	Jul 17, 2023
0.2.9	Jul 17, 2023
0.2.7	Jul 17, 2023
0.2.5	Jul 17, 2023
0.2.4	Jul 17, 2023
0.2.2	Jul 17, 2023
0.2.1	Jul 17, 2023
0.2.0	Jul 17, 2023
0.1.7	Jul 17, 2023
0.1.6	Jul 17, 2023
0.1.5	Jul 17, 2023
0.1.4	Jul 17, 2023
0.1.3	Jul 17, 2023
0.1.1	Jul 17, 2023
0.1.0	Jul 17, 2023

Download files

Source Distribution

pegasusx-0.4.0.tar.gz (2.8 MB)

Uploaded Sep 12, 2023 Source

Built Distribution

pegasusx-0.4.0-py3-none-any.whl (2.8 MB)

Uploaded Sep 12, 2023 Python 3

Hashes for pegasusx-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`deda54e89891314a2ac4adbc42287b3bca4beeeb614f9e87c3e2eb52641f83c7`
MD5	`ef1594ee7e2577e317e81631d21a84e1`
BLAKE2b-256	`a8a83e6eb28e5a7e072df11d1b59b9cd9d99ea1ac0dea787b7f2af00fd838c1c`

Hashes for pegasusx-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5ee69444a6febd37b3c16a7f342419f0d2c03125ab8f8fcff3ec636b5464968f`
MD5	`58becd74a07534d40fa1373c4f1517ca`
BLAKE2b-256	`9a8dc936435c6b8f03c6d7925fd3405d8a699af0624bad1a509dd6edc5efcedb`
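
The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest for pegasusx-0.4.0.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = (
    "deda54e89891314a2ac4adbc42287b3bca4beeeb614f9e87c3e2eb52641f83c7"
)


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("pegasusx-0.4.0.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an intact copy of the source distribution saved in the current directory.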