Data extraction and analysis tool
Project description
GitData
Data Wrangling for Everyone.
GitData is an easy-to-use, fast, scalable, distributed data extraction system. Its commands let you gather, manage, and query data in a wide variety of ways.
Concepts
GitData stores data as facts.
Facts are triples of the form (subject, predicate, object) where subject is typically an entity, predicate is typically an attribute of that entity and object is the value of the attribute. In the case where the attribute represents a relationship between entities, the object is another entity.
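As an illustration only, a fact can be modeled as a three-field record. The sketch below is plain Python and does not use GitData's own API; the `Fact` type and the sample entities (`alice`, `acme`) are hypothetical names chosen for the example. It shows both kinds of fact described above: attribute facts, whose object is a plain value, and a relationship fact, whose object is another entity.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One (subject, predicate, object) triple. Hypothetical type, not GitData's API."""
    subject: str
    predicate: str
    object: str


facts = [
    # Attribute facts: the object is a plain value.
    Fact("alice", "name", "Alice Smith"),
    Fact("acme", "name", "Acme Corp"),
    # Relationship fact: the object is itself the subject of other facts.
    Fact("alice", "works_for", "acme"),
]

# Every subject that appears in the list is treated as a known entity.
entities = {f.subject for f in facts}

# A fact describes a relationship when its object is one of those entities.
relationships = [f for f in facts if f.object in entities]
```

Here `relationships` ends up holding only the `works_for` fact, because `acme` is the only object that also appears as a subject.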
Commands
GitData shares many commands and concepts with the git source code revision control system, with some important differences that make it well suited to working with data.
Data repositories
Data repositories are where GitData stores the data it is managing. That data is typically pulled in from other data sources and is stored in the data repository for quick access.
gitdata init    # initialize a new data repository
gitdata status  # show repository status
Remotes
Remotes are connections you can establish within your data repository to make it easier to access data from external sources such as the internet, somewhere on your network, or a local disk. When you add a remote you give it a name, which can then be used to refer to that remote from within the repository.
To see the remotes for a data repository, run the gitdata remote command, which lists the names of the remotes. To see the URLs the remotes correspond to, use the -v flag to produce a verbose listing.
gitdata remote     # list remotes
gitdata remote -v  # verbose list remotes
Adding Remotes
To add a remote so that you can refer to it by a short name, use gitdata remote add <shortname> <url>.
Removing Remotes
You can remove a remote from your project by using the gitdata remote rm <shortname> command.
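Conceptually, a set of remotes behaves like a lookup table from short names to locations. The Python sketch below models listing, adding, and removing under that assumption; the function names, the sample URLs, and the format of the verbose listing are all hypothetical, and none of this is GitData's actual implementation or output.

```python
# Toy registry mapping each remote's short name to its location.
# Illustrative only; GitData's real storage is not described in this document.
remotes = {}


def remote_add(shortname, url):
    """Register a location under a short name."""
    remotes[shortname] = url


def remote_rm(shortname):
    """Forget a remote; raises KeyError if the name is unknown."""
    del remotes[shortname]


def remote_list(verbose=False):
    """Return names only, or 'name url' lines when verbose is set."""
    if verbose:
        return [f"{name} {url}" for name, url in sorted(remotes.items())]
    return sorted(remotes)


remote_add("census", "https://example.com/census.csv")  # made-up URL
remote_add("local", "/tmp/data.csv")                    # made-up path
names = remote_list()
listing = remote_list(verbose=True)
remote_rm("local")
```

After the final call only `census` remains, which mirrors how a removed remote can no longer be referred to by its short name.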
Showing
Data repositories are a collection of entities containing facts. To view any entity within the repository, use the gitdata show <name> command, where name is the name of the entity. For example, if you've stored a remote in your repository, you can see the details of that remote by using the show command.
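Viewing an entity amounts to collecting every fact whose subject is that entity. The sketch below shows that idea in plain Python; the `show` function, the `kind` and `url` predicates, and the sample values are assumptions made for illustration and are not taken from GitData.

```python
from collections import defaultdict

# Facts as plain (subject, predicate, object) tuples. All sample values are made up.
facts = [
    ("origin", "kind", "remote"),
    ("origin", "url", "https://example.com/data.csv"),
    ("alice", "name", "Alice Smith"),
]


def show(facts, name):
    """Return the predicates and values recorded for one entity."""
    details = defaultdict(list)
    for subject, predicate, obj in facts:
        if subject == name:
            details[predicate].append(obj)
    return dict(details)


origin = show(facts, "origin")
```

Values are kept in lists because nothing in the triple model prevents an entity from having several facts with the same predicate.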
Fetch
The gitdata fetch command copies facts from somewhere else into your GitData repository. The location being fetched from can be a remote or anywhere else you can reach from your computer. The fetched facts are placed in a temporary holding area, which lets you work with them without committing to making them a permanent part of your repository.
To fetch, simply run gitdata fetch <location>, where <location> is either a remote that you've already added to your repository or any other location, such as a URL or a local file.
When you run fetch, it reads the data in whatever form it is in and digests it into facts, ready for you to work with alongside any other data in your repository.
If you decide you want to keep the facts as part of your data repository, you can use the gitdata add and gitdata commit commands to add them.
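The fetch, add, and commit flow described above can be pictured as facts moving between three sets. The following is a toy Python model under stated assumptions: the `Repository` class, its method names, the row format (dicts with an `id` key), and the exact add and commit semantics are hypothetical and follow the git analogy rather than GitData's actual code.

```python
class Repository:
    """Toy model of a fact store with a holding area. Not GitData's implementation."""

    def __init__(self):
        self.fetched = set()    # facts read by fetch, not yet kept
        self.staged = set()     # facts marked to keep by add
        self.committed = set()  # facts that are part of the repository

    def fetch(self, rows):
        """Digest rows (dicts with an 'id' key) into facts in the holding area."""
        for row in rows:
            subject = row["id"]
            for key, value in row.items():
                if key != "id":
                    self.fetched.add((subject, key, value))

    def add(self):
        """Mark everything in the holding area to be kept."""
        self.staged |= self.fetched
        self.fetched = set()

    def commit(self):
        """Make the staged facts a permanent part of the repository."""
        self.committed |= self.staged
        self.staged = set()


repo = Repository()
repo.fetch([{"id": "alice", "name": "Alice Smith", "city": "Victoria"}])
fetched_count = len(repo.fetched)  # one fact per non-id column
repo.add()
repo.commit()
```

Keeping the holding area separate from the committed set is what allows fetched facts to be inspected, or simply discarded, without changing the repository.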
Download files
Source Distribution
Built Distribution
File details
Details for the file gitdata-cli-0.2.0.tar.gz.
File metadata
- Download URL: gitdata-cli-0.2.0.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0537c020890f2bf4156f3f4eae712f80b79f4df30d54a916b9e41ee8ae5c733d
MD5 | 1c0038b6ae91d5da58636b02d8822fab
BLAKE2b-256 | 36a8e6e69b6f87114dc9957913a3af675171aeeee24543eedb4215e76b16cc01
File details
Details for the file gitdata_cli-0.2.0-py3-none-any.whl.
File metadata
- Download URL: gitdata_cli-0.2.0-py3-none-any.whl
- Upload date:
- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7c5a6d412ddd351c6e735c3394c6e11a4b2cefbb6962c6762756d2a091e3f326
MD5 | b86d66bb0440973971f9f4d4d1f296a0
BLAKE2b-256 | 9c505070fb7fa46d2e2e26a3421bd9350c4f129dd84c9a680ecd93be22de4419