FlexiData is an open-source Python package designed for processing unstructured data.

Project description

flexidata

FlexiData is an open-source Python package for processing unstructured data. It extracts text and metadata from PDFs and from the other file formats listed under Features, with further capabilities planned.

Features

  • Document Parser: Parses the file formats listed below, extracting text blocks together with their metadata so that content can be pulled out and manipulated in a uniform way across formats.

    Supported File Formats:

    • PDF: For detailed extraction of text and metadata.
    • Images (JPEG, PNG, BMP): Enables image data and metadata processing.
    • EPUB: Converts and extracts content from EPUB files.
    • HTML: Parses HTML content for data extraction.
    • reStructuredText (RST): Handles conversion and parsing of RST files.
    • Rich Text Format (RTF): Facilitates conversion and content extraction from RTF documents.
    • DOCX: Allows extraction from DOCX documents, providing access to structured content and metadata.

Upcoming Features:

  • Content Chunking: This will enable dividing text into meaningful and manageable pieces for better processing and analysis.
  • Data Embedding: Planned to support embedding textual data into vector spaces for advanced data analysis and machine learning applications.
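
Content chunking has not shipped yet, so FlexiData's eventual API is not known. Purely as an illustration of the idea, the sketch below packs whitespace-separated words into chunks under a character limit. The names `chunk_text` and `max_chars` are hypothetical and are not part of FlexiData, and a real implementation would likely split on sentence or section boundaries rather than on raw length.

```python
def chunk_text(text, max_chars=200):
    """Greedily pack words into chunks of at most max_chars characters.

    Hypothetical helper for illustration only; not FlexiData's API.
    A single word longer than max_chars is kept whole in its own chunk.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            # Either the word still fits, or the chunk is empty and the
            # word must be accepted even if it is oversized.
            current = candidate
        else:
            # The word does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "FlexiData extracts text blocks and their metadata from documents."
    for piece in chunk_text(sample, max_chars=30):
        print(repr(piece))
```

Joining the chunks with single spaces gives back the original words in order, which makes the splitting easy to check before any downstream step such as embedding.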

Contributing

Contributions to FlexiData are welcome! If you're interested in contributing, please read our contributing guidelines.

Development Status

FlexiData is in active development, and we are working towards releasing our first version soon.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flexidata-0.0.19.tar.gz (61.5 kB)

Uploaded Source

Built Distribution

flexidata-0.0.19-py3-none-any.whl (80.2 kB)

Uploaded Python 3

File details

Details for the file flexidata-0.0.19.tar.gz.

File metadata

  • Download URL: flexidata-0.0.19.tar.gz
  • Size: 61.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.2

File hashes

Hashes for flexidata-0.0.19.tar.gz:

  • SHA256: dc50385c66f85a0d89e1b818c0cc8133b1c4bb8b8ea0aa37c518d84764a4a975
  • MD5: c33d2d6691e584c2bd296b308556225c
  • BLAKE2b-256: 0c38b5b82bc8c21abaea5f68b5277ba33ab3c26b5591753382f7aacc274dede5
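
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do it, assuming the source distribution has been saved to the current directory. The helper names `sha256_of` and `matches_published_hash` are illustrative, and the same function works for the wheel if its filename and digest are substituted.

```python
import hashlib
import hmac
from pathlib import Path

# Values copied from the file hashes listed for the source distribution.
SDIST_NAME = "flexidata-0.0.19.tar.gz"
SDIST_SHA256 = "dc50385c66f85a0d89e1b818c0cc8133b1c4bb8b8ea0aa37c518d84764a4a975"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    # Normalise case and whitespace so a pasted digest compares cleanly.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    archive = Path(SDIST_NAME)  # assumed location: the current directory
    if archive.exists():
        ok = matches_published_hash(archive, SDIST_SHA256)
        print("hash OK" if ok else "hash MISMATCH")
    else:
        print(f"{SDIST_NAME} not found; download it first")
```

Reading in fixed-size chunks keeps memory use flat regardless of archive size. SHA256 is used here rather than MD5 because MD5 is not collision-resistant.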

File details

Details for the file flexidata-0.0.19-py3-none-any.whl.

File metadata

  • Download URL: flexidata-0.0.19-py3-none-any.whl
  • Size: 80.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.2

File hashes

Hashes for flexidata-0.0.19-py3-none-any.whl:

  • SHA256: 3da22c3d0c99487ac88bf3bc8f34b7ada1b9c2e476b310353391e782b5bc5fff
  • MD5: 46e7e6bbf57853ad52d7bee8a20bb205
  • BLAKE2b-256: eee6e91e2122624e2458f7cad35d1d2e05e06d79804341da16d11b764874ee48
