ETL for Banco Macro: takes chunks of bank PDFs, transforms them into 768-dimensional embeddings, and uploads them to S3.

Project description

macro-embedding-flow

macro-embedding-flow is a Python library for transforming Parquet files into 768-dimensional embeddings using Sentence Transformers models and uploading them back to S3 after processing.


English

Description

This library reads a Parquet file of text chunks from an S3 bucket and generates 768-dimensional embeddings for them. The result is saved as a new Parquet file and uploaded to the same S3 bucket, under the path given by the AWS_S3_PREFIX_EMBEDDINGS environment variable.
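The flow described above can be sketched roughly as follows. This is a minimal illustration and not the library's actual implementation: the helper names (`split_s3_url`, `embeddings_key`, `embed_chunk_file`), the `text` column name, the `embedding` output column, and the choice of `sentence-transformers/all-mpnet-base-v2` (a model that produces 768-dimensional vectors) are all assumptions made for the example.

```python
import os
import tempfile
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(s3_url: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {s3_url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def embeddings_key(input_key: str, prefix: str) -> str:
    """Build the output key: the input file name placed under `prefix`."""
    filename = input_key.rsplit("/", 1)[-1]
    return "/".join(part for part in (prefix.strip("/"), filename) if part)


def embed_chunk_file(
    s3_url: str,
    text_column: str = "text",  # hypothetical column holding the chunk text
    model_name: str = "sentence-transformers/all-mpnet-base-v2",  # assumed model
) -> str:
    """Download a chunk Parquet file, add embeddings, upload the result."""
    # Third-party imports are local so the helpers above work without them.
    import boto3
    import pandas as pd
    from sentence_transformers import SentenceTransformer

    bucket, key = split_s3_url(s3_url)
    out_key = embeddings_key(key, os.environ["AWS_S3_PREFIX_EMBEDDINGS"])
    s3 = boto3.client("s3")
    model = SentenceTransformer(model_name)

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "chunks.parquet")
        dst = os.path.join(tmp, "embeddings.parquet")
        s3.download_file(bucket, key, src)

        df = pd.read_parquet(src)
        # encode() returns one vector per input text (a NumPy array by default).
        vectors = model.encode(df[text_column].tolist())
        df["embedding"] = vectors.tolist()

        df.to_parquet(dst, index=False)
        s3.upload_file(dst, bucket, out_key)

    return f"s3://{bucket}/{out_key}"
```

Keeping the URL and key handling in small pure functions makes that part easy to check without touching S3 or loading a model.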

Installation

pip install macro-embedding-flow

The library is configured through the following environment variables:

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=
AWS_BUCKET_NAME=
AWS_S3_PREFIX_EMBEDDINGS=
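Before running a job, it can help to check that these variables are present. The snippet below is a small sketch of such a check; `missing_env_vars` is a hypothetical helper, not part of the library, and treating all five variables as required is an assumption.

```python
import os

# Variable names taken from the list above; whether each one is strictly
# required by the library is an assumption.
REQUIRED_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_BUCKET_NAME",
    "AWS_S3_PREFIX_EMBEDDINGS",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument inspects the current process environment; passing a dictionary makes the check easy to exercise in tests.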

Basic use

from macro_embedding_flow import transform

# S3 URL of the parquet chunk file
s3_url = "s3://mi-bucket/path/al/chunk.parquet"

# Transform and upload embeddings
transform(s3_url)
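When several chunk files need processing, the call above can be wrapped in a loop. The sketch below uses a hypothetical helper, `transform_many`, which takes the transform function as an argument; it assumes that a failed transformation raises an exception, which the library does not document.

```python
from typing import Callable, Dict, Iterable


def transform_many(
    s3_urls: Iterable[str],
    transform_fn: Callable[[str], object],
) -> Dict[str, Exception]:
    """Apply transform_fn to each URL and collect failures by URL."""
    failures: Dict[str, Exception] = {}
    for url in s3_urls:
        try:
            transform_fn(url)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures[url] = exc
    return failures
```

With the library installed, this would be called as `transform_many(urls, transform)`, and an empty dictionary would mean every file was processed without raising.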
