Skip to main content

Create a SQLite database containing metadata from Google Drive

Project description

google-drive-to-sqlite

PyPI Changelog Tests License

Create a SQLite database containing metadata from Google Drive

If you use Google Drive, and especially if you have shared drives with other people there's a good chance you have hundreds or even thousands of files that you may not be fully aware of.

This tool can download metadata about those files - their names, sizes, folders, content types, permissions, creation dates and more - and store them in a SQLite database.

This lets you use SQL to analyze your Google Drive contents, using Datasette or the SQLite command-line tool or any other SQLite database browsing software.

Installation

Install this tool using pip:

pip install google-drive-to-sqlite

Quickstart

Authenticate with Google Drive by running:

google-drive-to-sqlite auth

Now create a SQLite database with metadata about all of the files you have starred using:

google-drive-to-sqlite files starred.db --starred

You can explore the resulting database using Datasette:

$ pip install datasette
$ datasette starred.db
INFO:     Started server process [24661]
INFO:     Uvicorn running on http://127.0.0.1:8001

Authentication

:warning: This application has not yet been verified by Google - you may find you are unable to authenticate until that verification is complete. #10

First, authenticate with Google Drive using the auth command:

$ google-drive-to-sqlite auth
Visit the following URL to authenticate with Google Drive

https://accounts.google.com/o/oauth2/v2/auth?...

Then return here and paste in the resulting code:
Paste code here: 

Follow the link, sign in with Google Drive and then copy and paste the resulting code back into the tool.

This will save an authentication token to the file called auth.json in the current directory.

To specify a different location for that file, use the --auth option:

google-drive-to-sqlite auth --auth ~/google-drive-auth.json

The auth command also provides options for using a different scope, Google client ID and Google client secret. You can use these to create your own custom authentication tokens that can work with other Google APIs, see issue #5 for details.

Full --help:

Usage: google-drive-to-sqlite auth [OPTIONS]

  Authenticate user and save credentials

Options:
  -a, --auth FILE              Path to save token, defaults to auth.json
  --google-client-id TEXT      Custom Google client ID
  --google-client-secret TEXT  Custom Google client secret
  --scope TEXT                 Custom token scope
  --help                       Show this message and exit.

To revoke the token that is stored in auth.json, such that it cannot be used to access Google Drive in the future, run the revoke command:

google-drive-to-sqlite revoke

Or if your token is stored in another location:

google-drive-to-sqlite revoke -a ~/google-drive-auth.json

You will need to obtain a fresh token using the auth command in order to continue using this tool.

google-drive-to-sqlite files

To retrieve metadata about the files in your Google Drive, or a folder or search within it, use the google-drive-to-sqlite files command.

This will default to writing details about every file in your Google Drive to a SQLite database:

google-drive-to-sqlite files files.db

Files and folders will be written to databases tables, which will be created if they do not yet exist. The database schema is shown below.

If a file or folder already exists, based on a matching id, it will be replaced with fresh data.

Instead of writing to SQLite you can use --json to output as JSON, or --nl to output as newline-delimited JSON:

google-drive-to-sqlite files --nl

Use --folder ID to retrieve everything in a specified folder and its sub-folders:

google-drive-to-sqlite files files.db --folder 1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i

Use --q QUERY to use a custom search query:

google-drive-to-sqlite files files.db -q "viewedByMeTime > '2022-01-01'"

The following shortcut options help build queries:

  • --full-text TEXT to search for files where the full text matches a search term
  • --starred for files and folders you have starred
  • --trashed for files and folders in the trash
  • --shared-with-me for files and folders that have been shared with you

You can combine these - for example, this returns all files that you have starred and that were shared with you:

google-drive-to-sqlite files highlights.db --starred --shared-with-me

You can use --stop-after X to stop after retrieving X files, useful for trying out a new search pattern and seeing results straight away.

The --import-json and --import-nl options are mainly useful for testing and developing this tool. They allow you to replay the JSON or newline-delimited JSON that was previously fetched using --json or --nl and use it to create a fresh SQLite database, without needing to make any outbound API calls:

# Fetch all starred files from the API, write to starred.json
google-drive-to-sqlite files -q 'starred = true' --json > starred.json
# Now import that data into a new SQLite database file
google-drive-to-sqlite files starred.db --import-json starred.json

Full --help:

Usage: google-drive-to-sqlite files [OPTIONS] [DATABASE]

  Retrieve metadata for files in Google Drive, and write to a SQLite database or
  output as JSON.

      google-drive-to-sqlite files files.db

  Use --json to output JSON, --nl for newline-delimited JSON:

      google-drive-to-sqlite files files.db --json

  Use a folder ID to recursively fetch every file in that folder and its sub-
  folders:

      google-drive-to-sqlite files files.db --folder
      1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i

  Fetch files you have starred:

      google-drive-to-sqlite files starred.db --starred

Options:
  -a, --auth FILE       Path to auth.json token file
  --folder TEXT         Files in this folder ID and its sub-folders
  -q TEXT               Files matching this query
  --full-text TEXT      Search for files with text match
  --starred             Files you have starred
  --trashed             Files in the trash
  --shared-with-me      Files that have been shared with you
  --json                Output JSON rather than write to DB
  --nl                  Output newline-delimited JSON rather than write to DB
  --stop-after INTEGER  Stop paginating after X results
  --import-json FILE    Import from this JSON file instead of the API
  --import-nl FILE      Import from this newline-delimited JSON file
  --help                Show this message and exit.

google-drive-to-sqlite download FILE_ID

The download command can be used to download files from Google Drive.

You'll need one or more file IDs, which look something like 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB.

To download the file, run this:

google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB

This will detect the content type of the file and use that as the extension - so if this file is a JPEG the file would be downloaded as:

0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB.jpeg

You can pass multiple file IDs to the command at once.

To hide the progress bar and filename output, use -s or --silent.

If you are downloading a single file you can use the -o output to specify a filename and location:

google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB \
  -o my-image.jpeg

Use -o - to write the file contents to standard output:

google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB \
  -o - > my-image.jpeg

Full --help:

Usage: google-drive-to-sqlite download [OPTIONS] FILE_IDS...

  Download one or more file IDs to disk

Options:
  -a, --auth FILE    Path to auth.json token file
  -o, --output FILE  File to write to, or - for standard output
  -s, --silent       Hide progress bar and filename
  --help             Show this message and exit.

google-drive-to-sqlite get URL

The get command makes authenticated requests to the specified URL, using credentials derived from the auth.json file.

For example:

$ google-drive-to-sqlite get 'https://www.googleapis.com/drive/v3/about?fields=*'
{
    "kind": "drive#about",
    "user": {
        "kind": "drive#user",
        "displayName": "Simon Willison",
# ...

If the resource you are fetching supports pagination you can use --paginate key to paginate through all of the rows in a specified key. For example, the following API has a nextPageToken key and a files list, suggesting it supports pagination:

$ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files
{
    "kind": "drive#fileList",
    "nextPageToken": "~!!~AI9...wogHHYlc=",
    "incompleteSearch": false,
    "files": [
        {
            "kind": "drive#file",
            "id": "1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg",
            "name": "Title of a spreadsheet",
            "mimeType": "application/vnd.google-apps.spreadsheet"
        },

To paginate through everything in the files list you would use --paginate files like this:

$ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files --paginate files
[
  {
    "kind": "drive#file",
    "id": "1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg",
    "name": "Title of a spreadsheet",
    "mimeType": "application/vnd.google-apps.spreadsheet"
  },
  # ...

Add --nl to stream paginated data as newline-delimited JSON:

$ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files --paginate files --nl
{"kind": "drive#file", "id": "1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg", "name": "Title of a spreadsheet", "mimeType": "application/vnd.google-apps.spreadsheet"}
{"kind": "drive#file", "id": "1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i", "name": "Subfolder", "mimeType": "application/vnd.google-apps.folder"}

Add --stop-after 5 to stop after 5 records - useful for testing.

Full --help:

Usage: google-drive-to-sqlite get [OPTIONS] URL

  Make an authenticated HTTP GET to the specified URL

Options:
  -a, --auth FILE       Path to auth.json token file
  --paginate TEXT       Paginate through all results in this key
  --nl                  Output paginated data as newline-delimited JSON
  --stop-after INTEGER  Stop paginating after X results
  --help                Show this message and exit.

Database schema

The database created by this tool has the following schema:

CREATE TABLE [drive_users] (
   [permissionId] TEXT PRIMARY KEY,
   [kind] TEXT,
   [displayName] TEXT,
   [photoLink] TEXT,
   [me] INTEGER,
   [emailAddress] TEXT
);
CREATE TABLE [drive_folders] (
   [id] TEXT PRIMARY KEY,
   [_parent] TEXT,
   [lastModifyingUser] TEXT,
   [kind] TEXT,
   [name] TEXT,
   [mimeType] TEXT,
   [starred] INTEGER,
   [trashed] INTEGER,
   [explicitlyTrashed] INTEGER,
   [parents] TEXT,
   [spaces] TEXT,
   [version] TEXT,
   [webViewLink] TEXT,
   [iconLink] TEXT,
   [hasThumbnail] INTEGER,
   [thumbnailVersion] TEXT,
   [viewedByMe] INTEGER,
   [createdTime] TEXT,
   [modifiedTime] TEXT,
   [modifiedByMe] INTEGER,
   [shared] INTEGER,
   [ownedByMe] INTEGER,
   [viewersCanCopyContent] INTEGER,
   [copyRequiresWriterPermission] INTEGER,
   [writersCanShare] INTEGER,
   [folderColorRgb] TEXT,
   [quotaBytesUsed] TEXT,
   [isAppAuthorized] INTEGER,
   [linkShareMetadata] TEXT,
   FOREIGN KEY([_parent]) REFERENCES [drive_folders]([id]),
   FOREIGN KEY([lastModifyingUser]) REFERENCES [drive_users]([permissionId])
);
CREATE TABLE [drive_folders_owners] (
   [item_id] TEXT REFERENCES [drive_folders]([id]),
   [user_id] TEXT REFERENCES [drive_users]([permissionId]),
   PRIMARY KEY ([item_id],
   [user_id])
);
CREATE TABLE [drive_files] (
   [id] TEXT PRIMARY KEY,
   [_parent] TEXT,
   [lastModifyingUser] TEXT,
   [kind] TEXT,
   [name] TEXT,
   [mimeType] TEXT,
   [starred] INTEGER,
   [trashed] INTEGER,
   [explicitlyTrashed] INTEGER,
   [parents] TEXT,
   [spaces] TEXT,
   [version] TEXT,
   [webViewLink] TEXT,
   [iconLink] TEXT,
   [hasThumbnail] INTEGER,
   [thumbnailVersion] TEXT,
   [viewedByMe] INTEGER,
   [createdTime] TEXT,
   [modifiedTime] TEXT,
   [modifiedByMe] INTEGER,
   [shared] INTEGER,
   [ownedByMe] INTEGER,
   [viewersCanCopyContent] INTEGER,
   [copyRequiresWriterPermission] INTEGER,
   [writersCanShare] INTEGER,
   [quotaBytesUsed] TEXT,
   [isAppAuthorized] INTEGER,
   [linkShareMetadata] TEXT,
   FOREIGN KEY([_parent]) REFERENCES [drive_folders]([id]),
   FOREIGN KEY([lastModifyingUser]) REFERENCES [drive_users]([permissionId])
);
CREATE TABLE [drive_files_owners] (
   [item_id] TEXT REFERENCES [drive_files]([id]),
   [user_id] TEXT REFERENCES [drive_users]([permissionId]),
   PRIMARY KEY ([item_id],
   [user_id])
);

Privacy policy

This tool requests access to your Google Drive account in order to retrieve metadata about your files there. It also offers a feature that can download the content of those files.

The credentials used to access your account are stored in the auth.json file on your computer. The metadata and content retrieved from Google Drive is also stored only on your own personal computer.

At no point to the developers of this tool gain access to any of your data.

Development

To contribute to this tool, first checkout the code. Then create a new virtual environment:

cd google-drive-to-sqlite
python -m venv venv
source venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

google-drive-to-sqlite-0.3.tar.gz (19.9 kB view hashes)

Uploaded Source

Built Distribution

google_drive_to_sqlite-0.3-py3-none-any.whl (17.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page