Data extraction and analysis tool

GitData

Data Wrangling for Everyone.

GitData is an easy-to-use, fast, scalable, distributed data extraction system with a rich set of commands for gathering, managing and querying data.

Concepts

GitData stores data as facts.

Facts are triples of the form (subject, predicate, object), where the subject is typically an entity, the predicate is typically an attribute of that entity, and the object is the value of the attribute. When the attribute represents a relationship between entities, the object is another entity.
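
As a rough illustration of the triple model described above, the following Python sketch represents facts as plain tuples. The Fact type, the predicate names and the sample entities are hypothetical and are not part of GitData itself.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One (subject, predicate, object) triple. Hypothetical type for illustration."""
    subject: str
    predicate: str
    object: str


facts = [
    Fact("alice", "name", "Alice"),       # attribute: object is a plain value
    Fact("alice", "works_for", "acme"),   # relationship: object is another entity
    Fact("acme", "name", "Acme Corp"),    # attribute of the related entity
]

# A fact describes a relationship when its object is itself the subject of other facts.
subjects = {fact.subject for fact in facts}
relationships = [fact for fact in facts if fact.object in subjects]
print(relationships)
```

Here only the works_for fact counts as a relationship, because "acme" also appears as a subject.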

Commands

GitData shares many commands and concepts with the git source code revision control system, with some important differences that make it well suited to working with data.

Data repositories

Data repositories are where GitData stores the data it is managing. That data is typically pulled in from other data sources and is stored in the data repository for quick access.

gitdata init   # initialize a new data repository
gitdata status # show repository status

Remotes

Remotes are connections you can establish within your data repository to make it easier to access data from external sources such as the internet, somewhere on your network, or a local disk. When you add a remote you give it a name, which can then be used to refer to that remote from within the repository.

To see the remotes for a data repository, run the gitdata remote command, which lists the names of the remotes. To see the URLs the remotes correspond to, use the -v flag to produce a verbose listing.

gitdata remote      # list remotes
gitdata remote -v   # verbose list remotes

Adding Remotes

To add a remote so you can refer to it by a short name, use gitdata remote add <shortname> <url>.

Removing Remotes

You can remove a remote from your project by using the gitdata remote rm <shortname> command.

Showing

Data repositories are a collection of entities containing facts. To view any entity within the repository, use the gitdata show <name> command, where <name> is the name of the entity. For example, if you've stored a remote in your repository, you can see the details of that remote by using the show command.
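
One way to picture what showing an entity involves is to collect every fact whose subject matches the requested name. The sketch below is a conceptual model only: the show function, the "type" and "url" predicates and the example URL are assumptions, and GitData's actual storage and output may differ.

```python
from collections import defaultdict

# Hypothetical facts; the "origin" entity stands in for a stored remote.
facts = [
    ("origin", "type", "remote"),
    ("origin", "url", "https://example.com/data.csv"),
    ("alice", "name", "Alice"),
]


def show(name, facts):
    """Return {predicate: [objects]} for every fact whose subject is `name`."""
    view = defaultdict(list)
    for subject, predicate, obj in facts:
        if subject == name:
            view[predicate].append(obj)
    return dict(view)


print(show("origin", facts))
```

An unknown name simply yields an empty view in this model.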

Fetch

The gitdata fetch command copies facts from somewhere else into your GitData repository. The location being fetched from can be a remote or anywhere else you can reach from your computer. The fetched facts are placed into a temporary holding area that lets you work with them without committing to making them a permanent part of your repository.

To fetch, run gitdata fetch <location>, where <location> is either a remote that you've already added to your repository or any other location, such as a URL or a local file.
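
The choice between a remote name and a direct location can be sketched as a simple lookup. This is an assumption about how such resolution might work, not a description of GitData's internals; resolve_location, the remotes mapping and the example URL are hypothetical.

```python
def resolve_location(location, remotes):
    """Return the URL of a known remote name, otherwise the location unchanged."""
    return remotes.get(location, location)


# Hypothetical mapping of short names to URLs, as built up by adding remotes.
remotes = {"origin": "https://example.com/data.csv"}

print(resolve_location("origin", remotes))        # a remote name resolves to its URL
print(resolve_location("./local.csv", remotes))   # anything else is used as given
```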

When you run fetch, it reads the data in whatever form it finds it and digests it into facts, ready for you to work with alongside any other data in your repository. If you decide to keep the facts, use the gitdata add and gitdata commit commands to make them part of your data repository.
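
The flow from raw data to held facts to permanent facts can be modelled in a few lines of Python. Everything here is an assumption made for illustration: the CSV-to-triple mapping, the Repository class and the split between add and commit (modelled on git) are not taken from GitData's implementation.

```python
import csv
import io


def digest_csv(text, key):
    """Turn CSV text into (subject, predicate, object) triples.

    The column named by `key` supplies the subject; every other column
    becomes one fact per row. This mapping is an assumption.
    """
    facts = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[key]
        for column, value in row.items():
            if column != key:
                facts.append((subject, column, value))
    return facts


class Repository:
    """Toy model of a repository with a holding area. Not GitData's implementation."""

    def __init__(self):
        self.fetched = []     # temporary holding area filled by fetch
        self.staged = []      # facts selected with add
        self.committed = []   # facts that are a permanent part of the repository

    def fetch(self, facts):
        self.fetched.extend(facts)

    def add(self):
        self.staged.extend(self.fetched)
        self.fetched.clear()

    def commit(self):
        self.committed.extend(self.staged)
        self.staged.clear()


repo = Repository()
repo.fetch(digest_csv("id,name,city\n1,Alice,Oslo\n", key="id"))
print(repo.fetched)      # facts are held, nothing is permanent yet

repo.add()
repo.commit()
print(repo.committed)    # the same facts, now part of the repository
```

In this model, fetched facts can be inspected or discarded freely, and only the explicit add and commit steps change the permanent contents.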
