
Analyzes data files for common structures


structa is a small utility for analyzing repeating structures in large data sources. Typical inputs are a document-oriented database exported as JSON, a CSV file from a database dump, or a YAML document.

Usage

Use from the command line:

structa <filename>

The usual --help and --version switches are available for more information. The full documentation may also help in understanding the myriad switches!
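If structa needs to be driven from a script rather than a shell, the command above can be wrapped with the standard library. This is only a sketch: the helper names build_command and analyze are hypothetical, and it assumes nothing about structa beyond the `structa <filename>` form shown above and that the result is printed to standard output.

```python
import shutil
import subprocess


def build_command(filename):
    """Return the argv list for the documented `structa <filename>` form."""
    return ["structa", str(filename)]


def analyze(filename):
    """Run structa on a file and return whatever it prints, as text."""
    # Fail early with a readable message if the tool is not installed.
    if shutil.which("structa") is None:
        raise RuntimeError("structa was not found on PATH")
    result = subprocess.run(
        build_command(filename),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling analyze("dump.json") would then return the same text that the command prints in a terminal.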

Examples

The People in Space API reports how many people are currently in space, along with each person's name and the craft they are aboard:

curl -s http://api.open-notify.org/astros.json | structa

Output:

{
    'message': str range="success" pattern="success",
    'number': int range=10,
    'people': [
        {
            'craft': str range="ISS".."Tiangong",
            'name': str range="Akihiko Hoshide".."Thomas Pesquet"
        }
    ]
}
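To make the notation in this output concrete, here is a rough sketch of how a list of similar records can be collapsed into one entry per key with a type and a lowest..highest range. It is not structa's implementation. The function names are hypothetical, and the sample record named "Example Person" is made up for illustration.

```python
def summarize(values):
    """Describe a list of scalars as 'type range=lo..hi'."""
    types = {type(v).__name__ for v in values}
    if len(types) != 1:
        return "value"  # mixed types: fall back to a generic label
    name = types.pop()
    if name not in ("int", "float", "str"):
        return name  # no meaningful range for other types in this sketch
    lo, hi = min(values), max(values)

    def show(v):
        # Strings are quoted, numbers are printed as-is.
        return f'"{v}"' if isinstance(v, str) else str(v)

    if lo == hi:
        return f"{name} range={show(lo)}"
    return f"{name} range={show(lo)}..{show(hi)}"


def summarize_records(records):
    """Merge a list of dicts into one mapping of key -> summary."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return {key: summarize(vals) for key, vals in columns.items()}


# Illustrative data: two names from the output above plus one invented record.
people = [
    {"craft": "ISS", "name": "Thomas Pesquet"},
    {"craft": "Tiangong", "name": "Example Person"},
    {"craft": "ISS", "name": "Akihiko Hoshide"},
]
summary = summarize_records(people)
print(summary["craft"])  # str range="ISS".."Tiangong"
print(summary["name"])   # str range="Akihiko Hoshide".."Thomas Pesquet"
```

String ranges here are plain lexicographic minimum and maximum, which is consistent with the "ISS".."Tiangong" range shown above.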

The Python Package Index (PyPI) provides a JSON API for packages:

curl -s https://pypi.org/pypi/numpy/json | structa

Output:

{
    'info': { str: value },
    'last_serial': int range=9.0M,
    'releases': {
        str range="0.9.6".."1.9.3": [
            {
                'comment_text': str,
                'digests': {
                    'md5': str pattern="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                    'sha256': str pattern="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                },
                'downloads': int range=-1,
                'filename': str,
                'has_sig': bool,
                'md5_digest': str pattern="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                'packagetype': str range="bdist_wheel".."sdist",
                'python_version': str range="2.5".."source",
                'requires_python': value,
                'size': int range=1.9M..24.5M,
                'upload_time': str of timestamp range=2006-12-02 02:07:43..2020-12-25 03:30:00 pattern=%Y-%m-%dT%H:%M:%S,
                'upload_time_iso_8601': str of timestamp range=2009-04-06 06:19:25..2020-12-25 03:30:00 pattern=%Y-%m-%dT%H:%M:%S.%f%z,
                'url': URL,
                'yanked': bool,
                'yanked_reason': value
            }
        ]
    },
    'urls': [
        {
            'comment_text': str range="",
            'digests': {
                'md5': str pattern="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                'sha256': str pattern="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
            },
            'downloads': int range=-1,
            'filename': str,
            'has_sig': bool,
            'md5_digest': str pattern="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
            'packagetype': str range="bdist_wheel" pattern="bdist_wheel",
            'python_version': str range="cp36".."pp36" pattern="Ip3d",
            'requires_python': str range=">=3.6" pattern=">=3.6",
            'size': int range=7.3M..15.4M,
            'upload_time': str of timestamp range=2020-11-02 15:46:22..2020-11-02 16:18:20 pattern=%Y-%m-%dT%H:%M:%S,
            'upload_time_iso_8601': str of timestamp range=2020-11-02 15:46:22..2020-11-02 16:18:20 pattern=%Y-%m-%dT%H:%M:%S.%f%z,
            'url': URL,
            'yanked': bool,
            'yanked_reason': value
        }
    ]
}
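The pattern fields in this output appear to describe fixed-length strings position by position. Judging from the samples, "x" marks a hexadecimal digit, "d" a decimal digit, and any other character stands for itself. The sketch below infers such a pattern from a list of strings under that reading. It is an illustration and not structa's algorithm: the function names are hypothetical, the "." used for "any character" is an invented convention, and the sample identifiers are made up.

```python
import string


def char_class(chars):
    """Classify the set of characters seen at one position."""
    if len(chars) == 1:
        return next(iter(chars))  # constant position: keep the literal
    if chars <= set(string.digits):
        return "d"  # only decimal digits seen here
    if chars <= set(string.hexdigits):
        return "x"  # only hexadecimal digits seen here
    return "."  # anything else (assumed notation, not structa's)


def infer_pattern(strings):
    """Return a per-position pattern, or None if lengths differ."""
    lengths = {len(s) for s in strings}
    if len(lengths) != 1:
        return None
    # zip(*strings) walks the strings column by column.
    return "".join(char_class(set(column)) for column in zip(*strings))


# Made-up identifiers in the same shape as the "dddd-d" keys seen in such data.
print(infer_pattern(["1430-1", "2957-2", "4630-1"]))  # dddd-d
print(infer_pattern(["success", "success"]))          # success
```

A field that holds only one distinct value yields a pattern equal to that value, which matches entries such as pattern="bdist_wheel" above.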

The Ubuntu Security Notices database lists all security issues in releases of Ubuntu (warning: this one takes some time to analyze and uses about a gigabyte of RAM while doing so):

curl -s https://usn.ubuntu.com/usn-db/database.json | structa

Output:

{
    str range="1430-1".."4630-1" pattern="dddd-d": {
        'action'?: str,
        'cves': [ str ],
        'description': str,
        'id': str range="1430-1".."4630-1" pattern="dddd-d",
        'isummary'?: str,
        'releases': {
            str range="artful".."zesty": {
                'allbinaries'?: {
                    str: { 'version': str }
                },
                'archs'?: {
                    str range="all".."source": {
                        'urls': {
                            URL: {
                                'md5': str pattern="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                                'size': int range=20..1.2G
                            }
                        }
                    }
                },
                'binaries': {
                    str: { 'version': str }
                },
                'sources': {
                    str: {
                        'description': str,
                        'version': str
                    }
                }
            }
        },
        'summary': str,
        'timestamp': float of timestamp range=2012-04-27 12:57:41..2020-11-11 18:01:48,
        'title': str
    }
}
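Keys followed by "?" in this output (for example 'action'? and 'archs'?) appear to be ones that are present in only some of the records. A minimal sketch of that check, assuming a plain list of dictionaries as input, could look like the following. The function name and the sample records are hypothetical.

```python
def describe_keys(records):
    """Return sorted key labels, with '?' appended to keys some records lack."""
    all_keys = set()
    for record in records:
        all_keys.update(record)
    labels = []
    for key in sorted(all_keys):
        # A key is optional when at least one record does not contain it.
        optional = any(key not in record for record in records)
        labels.append(f"{key!r}?" if optional else f"{key!r}")
    return labels


# Made-up records: only the first one carries an 'action' field.
notices = [
    {"id": "1430-1", "title": "first notice", "action": "restart"},
    {"id": "4630-1", "title": "second notice"},
]
print(describe_keys(notices))  # ["'action'?", "'id'", "'title'"]
```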

