Download Google Drive files/folders and upload them to the Internet Archive

IAdrive

IAdrive is a tool for archiving Google Drive files and folders, as well as Google Docs, Sheets, and Slides, to the Internet Archive (IA). It downloads the content, generates appropriate metadata, and uploads everything to IA while preserving the folder structure.

  • This project is heavily based on tubeup by bibanon; credit goes to them

Features

  • Google Drive Support: Downloads files and/or folders from Google Drive using gdown
  • Google Docs Integration: Directly exports Google Docs, Sheets, and Slides in multiple formats
  • Multiple Format Export: For Google Docs, automatically exports in all available formats (PDF, DOCX, TXT, HTML, etc)
  • Preserves folder structure when uploading (can be disabled with --disable-slash-files)
  • Extracts file modification dates to determine the creation date for the item
  • Pass custom metadata to Archive.org using --metadata=<key:value>
  • Supports quiet mode (--quiet) and debug mode (--debug) for log output
  • Automatically cleans up downloaded files after upload
  • Sanitizes identifiers and truncates subject tags to fit Archive.org requirements
  • Falls back to "IAdrive" as publisher since Google Drive collaborators fetching is not yet implemented
  • Improved error handling and debug output

Installation

Requires Python 3.9 or newer

pip install iadrive

Installing the package adds a console script named iadrive. You can also install from source by running pip install . in the repository root.

Configuration

ia configure

You will be prompted to enter the email and password of your IA account

Optional environment variables:

  • GOOGLE_API_KEY – if set, the tool attempts to look up the owner names of the Google Drive file or folder for the creator field in metadata (not yet implemented)

Usage

iadrive <url> [--metadata=<key:value>...] [--disable-slash-files] [--quiet] [--debug]

Arguments:

  • <url> – Google Drive file/folder URL or Google Docs/Sheets/Slides URL to archive

Options:

  • --metadata=<key:value> – custom metadata to add to the IA item
  • --disable-slash-files – upload files without preserving folder structure
  • --quiet – only print errors
  • --debug – print all logs to stdout
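The repeated --metadata=<key:value> option can be pictured as a list of strings that is folded into a dictionary. The following is a minimal sketch, not IAdrive's actual parser: the function name parse_metadata is hypothetical, and collecting repeated keys into a list is an assumption made here for illustration.

```python
from typing import Dict, Iterable, List, Union

MetadataValue = Union[str, List[str]]


def parse_metadata(pairs: Iterable[str]) -> Dict[str, MetadataValue]:
    """Turn ["collection:foo", "creator:bar"] into a dictionary.

    Hypothetical helper: IAdrive's real parsing may differ.
    """
    metadata: Dict[str, MetadataValue] = {}
    for pair in pairs:
        # Split on the first colon only, so values such as URLs keep theirs.
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {pair!r}")
        if key in metadata:
            # Assumption: a repeated key accumulates its values in a list.
            existing = metadata[key]
            if isinstance(existing, list):
                existing.append(value)
            else:
                metadata[key] = [existing, value]
        else:
            metadata[key] = value
    return metadata
```

Splitting on the first colon only means a value such as a URL is kept intact.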

Google Docs Support

IAdrive can directly archive Google Docs, Sheets, and Slides by exporting them in all available formats through Google's public export URLs

Available Formats

Google Documents:

  • pdf
  • docx
  • odt
  • rtf
  • txt
  • html
  • epub

Google Spreadsheets:

  • xlsx
  • ods
  • pdf
  • csv
  • tsv
  • html

Google Presentations:

  • pdf
  • pptx
  • odp
  • txt
  • jpeg
  • png
  • svg

Automatic Export Behavior

For example, a Google Document will be automatically exported and uploaded as:

  • placeholder.pdf
  • placeholder.docx
  • placeholder.odt
  • placeholder.rtf
  • placeholder.txt
  • placeholder.html
  • placeholder.epub
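The format lists and the export example above can be summarized as a lookup table. This is only a sketch: the names EXPORT_FORMATS and export_filenames are hypothetical, and the dictionary keys are chosen here to mirror the path segments of Google Docs URLs rather than taken from IAdrive's code.

```python
from typing import Dict, List

# Formats as listed in "Available Formats"; the keys are an assumption.
EXPORT_FORMATS: Dict[str, List[str]] = {
    "document": ["pdf", "docx", "odt", "rtf", "txt", "html", "epub"],
    "spreadsheets": ["xlsx", "ods", "pdf", "csv", "tsv", "html"],
    "presentation": ["pdf", "pptx", "odp", "txt", "jpeg", "png", "svg"],
}


def export_filenames(doc_type: str, base_name: str) -> List[str]:
    """Return one filename per export format for the given document type."""
    return [f"{base_name}.{ext}" for ext in EXPORT_FORMATS[doc_type]]


if __name__ == "__main__":
    for name in export_filenames("document", "placeholder"):
        print(name)
```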

Google Docs Examples

# Archive Google Document
iadrive https://docs.google.com/document/d/1abc123/edit

# Archive Google Spreadsheet
iadrive https://docs.google.com/spreadsheets/d/1abc123/edit

# Archive Google Slides with custom metadata
iadrive https://docs.google.com/presentation/d/1abc123/edit --metadata=collection:placeholder --metadata=creator:placeholder

# Debug mode with Google Docs
iadrive https://docs.google.com/document/d/1abc123/edit --debug

Google Drive Examples

# Upload with folder structure preserved (default)
iadrive https://drive.google.com/drive/folders/placeholder --metadata=collection:placeholder

# Upload with flat structure
iadrive https://drive.google.com/drive/folders/placeholder --disable-slash-files

# Debug mode with custom metadata
iadrive https://drive.google.com/drive/folders/placeholder --metadata=collection:placeholder \
        --metadata=mediatype:data --debug

Folder Structure Preservation

By default, IAdrive preserves the folder structure from Google Drive when uploading to the Internet Archive. For example, if your Google Drive link contains:

placeholder.txt
placeholder.mp3
folder/
  ├── placeholder.pdf
  └── folder/
      └── placeholder.mp4

The files will be uploaded to Internet Archive as:

  • placeholder.txt
  • placeholder.mp3
  • folder/placeholder.pdf
  • folder/folder/placeholder.mp4

If you pass the --disable-slash-files option, all files will be uploaded to the root level:

  • placeholder.txt
  • placeholder.mp3
  • placeholder.pdf
  • placeholder.mp4

Note: When using flat structure, duplicate filenames are automatically handled by adding a number (e.g., placeholder.pdf, placeholder_1.pdf).
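The two upload modes described above, including the numbering of duplicate filenames, can be sketched as a mapping from downloaded paths to upload names. The function remote_names is hypothetical and the exact numbering scheme IAdrive uses may differ; this version simply follows the placeholder_1.pdf pattern from the note.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Set


def remote_names(relative_paths: Iterable[str], flatten: bool = False) -> Dict[str, str]:
    """Map each downloaded relative path to the name used on upload.

    With flatten=False the path is kept as-is (folder structure preserved).
    With flatten=True only the filename is kept, and a numeric suffix is
    appended when that filename was already taken.
    """
    mapping: Dict[str, str] = {}
    used: Set[str] = set()
    for path in relative_paths:
        if not flatten:
            mapping[path] = path
            continue
        pure = PurePosixPath(path)
        candidate = pure.name
        counter = 1
        # Keep trying name_1.ext, name_2.ext, ... until the name is free.
        while candidate in used:
            candidate = f"{pure.stem}_{counter}{pure.suffix}"
            counter += 1
        used.add(candidate)
        mapping[path] = candidate
    return mapping
```

Calling remote_names(paths, flatten=True) corresponds to running with --disable-slash-files, while the default call leaves every path unchanged.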

How it works

Google Drive Files/Folders

  1. iadrive uses gdown to fetch the specified Google Drive file or folder
  2. It walks the downloaded directory and extracts file extensions and modification dates
  3. Metadata is generated, including a file listing (with sizes), the oldest file modification date, and the original URL
  4. The content is uploaded to Archive.org with identifier format drive-{drive-id}
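Steps 2 to 4 above can be sketched with the standard library alone. The function names and the metadata field names (date, description, originalurl) are assumptions made for illustration; the source only states which pieces of information end up in the metadata, not how they are keyed.

```python
import os
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def scan_download(root: str) -> Tuple[List[Tuple[str, int]], str]:
    """Walk root and return ([(relative_path, size)], oldest_mtime_date)."""
    files: List[Tuple[str, int]] = []
    oldest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            stat = os.stat(full)
            # Use forward slashes so the listing looks the same on every OS.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            files.append((rel, stat.st_size))
            if oldest is None or stat.st_mtime < oldest:
                oldest = stat.st_mtime
    files.sort()
    date = ""
    if oldest is not None:
        date = datetime.fromtimestamp(oldest, tz=timezone.utc).strftime("%Y-%m-%d")
    return files, date


def build_metadata(drive_id: str, url: str, root: str) -> Dict[str, str]:
    """Assemble an illustrative metadata dictionary (field names assumed)."""
    files, date = scan_download(root)
    listing = "\n".join(f"{path} ({size} bytes)" for path, size in files)
    return {
        "identifier": f"drive-{drive_id}",  # format stated in step 4
        "date": date,
        "description": listing,
        "originalurl": url,
    }
```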

Google Docs/Sheets/Slides

  1. iadrive detects Google Docs URLs and determines the document type
  2. It automatically exports the document in all available formats using Google's public export URLs
  3. Each format is downloaded and saved with descriptive filenames
  4. Metadata includes comprehensive format information and document type
  5. The content is uploaded to Archive.org with identifier format docs-{doc-id}
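The detection and export steps above might look roughly like the sketch below. The regular expression, the function names, and in particular the export URL patterns are assumptions about Google's public export endpoints; the source does not spell them out, so treat them as illustrative rather than as what IAdrive actually requests.

```python
import re
from typing import Optional, Tuple

# Assumption: editor URLs look like docs.google.com/<type>/d/<id>/...
_DOCS_URL = re.compile(
    r"https?://docs\.google\.com/(document|spreadsheets|presentation)/d/([A-Za-z0-9_-]+)"
)


def parse_docs_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (doc_type, doc_id) for a Google Docs URL, or None otherwise."""
    match = _DOCS_URL.match(url)
    if match is None:
        return None
    return match.group(1), match.group(2)


def export_url(doc_type: str, doc_id: str, fmt: str) -> str:
    """Build an export URL (patterns assumed, not confirmed by the source)."""
    if doc_type == "presentation":
        return f"https://docs.google.com/presentation/d/{doc_id}/export/{fmt}"
    return f"https://docs.google.com/{doc_type}/d/{doc_id}/export?format={fmt}"


def docs_identifier(doc_id: str) -> str:
    """Identifier format from step 5: docs-{doc-id}."""
    return f"docs-{doc_id}"
```

A URL that does not match, such as a Drive folder link, yields None, which is where a tool like this would fall back to the Google Drive path.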

Common Steps

  • Identifiers are sanitized and subject tags are truncated to fit Archive.org requirements
  • Publisher defaults to "IAdrive" since collaborator fetching is not yet implemented
  • Folder structure is preserved by default (can be disabled with --disable-slash-files)
  • Downloaded files are automatically cleaned up after upload
  • Errors are handled gracefully, and debug output is available with --debug
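The sanitizing and truncation mentioned in the first bullet could be sketched as follows. The allowed character set, the 100-character identifier cap, the 255-byte subject limit, and the ";" separator are all assumptions chosen for this example; the source only says that values are adjusted to fit Archive.org requirements.

```python
import re
from typing import Iterable


def sanitize_identifier(raw: str, max_length: int = 100) -> str:
    """Replace disallowed characters and cap the length (limits assumed)."""
    # Assumption: only letters, digits, '.', '_' and '-' are acceptable.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", raw)
    # Assumption: an identifier should start with an alphanumeric character.
    cleaned = cleaned.lstrip("._-")
    return cleaned[:max_length]


def truncate_subjects(tags: Iterable[str], max_bytes: int = 255) -> str:
    """Join tags with ';' and drop trailing tags that exceed max_bytes."""
    kept = []
    size = 0
    for tag in tags:
        tag_bytes = len(tag.encode("utf-8"))
        extra = tag_bytes + (1 if kept else 0)  # +1 for the ';' separator
        if size + extra > max_bytes:
            break
        kept.append(tag)
        size += extra
    return ";".join(kept)
```

Dropping whole tags, rather than cutting the joined string at the byte limit, avoids ending up with a half-truncated tag or a split multi-byte character.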

Supported Platforms

For a list of supported platforms for archiving, please see SUPPORTEDPLATFORMS.md

To-do list

  • Google Drive collaborator fetching to use as creator metadata through the Google API
  • Batch processing
