Data extraction and analysis tool
Data Wrangling for Everyone.
GitData is an easy-to-use, fast, scalable, distributed data extraction system. Its commands let you gather, manage and query data in a wide variety of ways.
GitData stores data as facts.
Facts are triples of the form (subject, predicate, object) where subject is typically an entity, predicate is typically an attribute of that entity and object is the value of the attribute. In the case where the attribute represents a relationship between entities, the object is another entity.
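To make the triple form concrete, here is a minimal Python sketch of facts as (subject, predicate, object) tuples. It is an illustration only and not GitData's internal representation: the `Fact` type, the `describe` helper and the sample data are all hypothetical.

```python
from typing import Any, NamedTuple


class Fact(NamedTuple):
    """One (subject, predicate, object) triple."""
    subject: str
    predicate: str
    object: Any


# Hypothetical example data, not taken from GitData itself.
facts = [
    Fact("alice", "name", "Alice Smith"),
    Fact("alice", "age", 34),
    Fact("acme", "name", "Acme Corp"),
    # A relationship: the object is another entity ("acme").
    Fact("alice", "works_for", "acme"),
]


def describe(subject, facts):
    """Collect every attribute recorded for one entity."""
    return {f.predicate: f.object for f in facts if f.subject == subject}


print(describe("alice", facts))
```

In this sketch an entity is simply the set of facts that share a subject, so relationships between entities need no special handling.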
GitData shares many of the commands and concepts you may know from the Git revision control system, with some important differences that make it ideal for working with data.
Data repositories are where GitData stores the data it is managing. That data is typically pulled in from other data sources and is stored in the data repository for quick access.
    gitdata init      # initialize a new data repository
    gitdata status    # show repository status
Remotes are connections you can establish within your data repository to make it easier to access data from external sources such as the internet, somewhere on your network, or a local disk. When you add a remote, you give it a name, which can then be used to refer to that remote from within the repository.
To see the remotes for a data repository, run the gitdata remote command, which lists the names of the remotes. To see the URLs the remotes correspond to, use the -v flag to produce a verbose listing.
    gitdata remote       # list remotes
    gitdata remote -v    # verbose list remotes
To add a remote so you can refer to it by a short name, run gitdata remote add <shortname> <url>.
You can remove a remote from your data repository with the gitdata remote rm <shortname> command.
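One way to picture remotes is as a mapping from short names to URLs. The Python sketch below models the add, remove and list operations described above. It is a conceptual toy, not GitData's implementation: the `RemoteRegistry` class, the URL and the file path are hypothetical.

```python
class RemoteRegistry:
    """Toy model of named remotes: short name -> URL."""

    def __init__(self):
        self._remotes = {}

    def add(self, shortname, url):
        # Mirrors the idea of `gitdata remote add <shortname> <url>`.
        self._remotes[shortname] = url

    def remove(self, shortname):
        # Mirrors the idea of `gitdata remote rm <shortname>`.
        del self._remotes[shortname]

    def listing(self, verbose=False):
        # Names only by default; names with URLs when verbose (like -v).
        if verbose:
            return [f"{name}\t{url}" for name, url in sorted(self._remotes.items())]
        return sorted(self._remotes)


registry = RemoteRegistry()
registry.add("census", "https://example.com/census.csv")  # hypothetical URL
registry.add("local", "/tmp/data.csv")                     # hypothetical path
print(registry.listing())
print(registry.listing(verbose=True))
registry.remove("local")
print(registry.listing())
```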
A data repository is a collection of entities containing facts. To view any entity within the repository, use the gitdata show <name> command, where name is the name of the entity. For example, if you've stored a remote in your repository, you can see the details of that remote by using the show command.
The gitdata fetch command copies facts from somewhere else into your GitData repository. The location being fetched from can be a remote or anywhere else you can reach from your computer. The fetched facts are placed in a temporary holding area, which lets you work with them without committing to making them a permanent part of your repository.
To fetch, simply run gitdata fetch <location>, where <location> is either a remote that you've already added to your repository or any other location, such as a URL or a local file.
When you run fetch, it reads the data in whatever form it is in and digests it into facts, ready for you to work with alongside any other data in your repository.
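As a rough illustration of what digesting data into facts can mean, the Python sketch below turns CSV text into (subject, predicate, object) triples. It shows the general idea under simple assumptions and is not GitData's actual fetch logic: the `rows_to_facts` function, its `key` parameter and the sample data are hypothetical.

```python
import csv
import io


def rows_to_facts(text, key):
    """Turn CSV text into (subject, predicate, object) triples.

    `key` names the column whose value identifies each entity.
    """
    facts = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[key]
        for column, value in row.items():
            if column != key:
                facts.append((subject, column, value))
    return facts


# Hypothetical sample data.
sample = "id,name,city\n1,Alice,Victoria\n2,Bob,Vancouver\n"
for fact in rows_to_facts(sample, key="id"):
    print(fact)
```

Each row becomes one entity, and each remaining column becomes one fact about it, so data from differently shaped sources ends up in the same uniform form.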
If you decide you want to keep the facts as part of your data repository, you can use the gitdata add and gitdata commit commands to add them to your data repository.