Download Google Drive files/folders and upload them to the Internet Archive
IAdrive
IAdrive is a tool for archiving Google Drive files/folders and Google Docs/Sheets/Slides and uploading them to the Internet Archive. It downloads the content, creates appropriate metadata, and uploads to IA while preserving the folder structure.
- This project is heavily based on tubeup by bibanon; credit goes to them
Features
- Google Drive Support: Downloads files and/or folders from Google Drive using gdown
- Google Docs Integration: Directly exports Google Docs, Sheets, and Slides in multiple formats
- Multiple Format Export: For Google Docs, automatically exports in all available formats (PDF, DOCX, TXT, HTML, etc)
- Preserves folder structure when uploading (can be disabled with --disable-slash-files)
- Extracts file modification dates to determine the creation date for the item
- Pass custom metadata to Archive.org using --metadata=<key:value>
- Supports quiet mode (--quiet) and debug mode (--debug) for log output
- Automatically cleans up downloaded files after upload
- Sanitizes identifiers and truncates subject tags to fit Archive.org requirements
- Falls back to "IAdrive" as publisher since Google Drive collaborators fetching is not yet implemented
- Improved error handling and debug output
Installation
Requires Python 3.9 or newer
pip install iadrive
Once installed, the package provides a console script named iadrive. You can also install from source using pip install .
Configuration
ia configure
You will be prompted to enter your IA account's email and password
Optional envs:
GOOGLE_API_KEY – if set, the tool is meant to look up the owner names of the Google Drive file or folder for the creator field in metadata (not yet implemented)
Usage
iadrive <url> [--metadata=<key:value>...] [--disable-slash-files] [--quiet] [--debug]
Arguments:
<url> – Google Drive file/folder URL or Google Docs/Sheets/Slides URL to archive
Options:
--metadata=<key:value> – custom metadata to add to the IA item
--disable-slash-files – upload files without preserving folder structure
--quiet – only print errors
--debug – print all logs to stdout
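As a rough illustration of how repeated --metadata=<key:value> options could be folded into a single metadata mapping, here is a minimal sketch. The function name parse_metadata is hypothetical and is not taken from the IAdrive source; splitting on the first colon only is an assumption made so that values containing colons (such as URLs) survive intact.

```python
def parse_metadata(pairs):
    """Turn ["collection:x", "creator:y"] into a metadata dict.

    Hypothetical helper: not part of IAdrive's documented API.
    """
    metadata = {}
    for pair in pairs:
        # Split on the first colon only, so "source:https://..." keeps its value.
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {pair!r}")
        if key in metadata:
            # A repeated key accumulates its values into a list.
            existing = metadata[key]
            if isinstance(existing, list):
                existing.append(value)
            else:
                metadata[key] = [existing, value]
        else:
            metadata[key] = value
    return metadata
```

Whether IAdrive itself merges repeated keys into a list or lets the last one win is not stated above, so treat that part as a design choice of the sketch.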
Google Docs Support
IAdrive can directly archive Google Docs, Sheets, and Slides by exporting them in all available formats. It uses public export URLs.
Available Formats
Google Documents:
- pdf
- docx
- odt
- rtf
- txt
- html
- epub
Google Spreadsheets:
- xlsx
- ods
- pdf
- csv
- tsv
- html
Google Presentations:
- pdf
- pptx
- odp
- txt
- jpeg
- png
- svg
Automatic Export Behavior
For example, a Google Document will be automatically exported and uploaded as:
- placeholder.pdf
- placeholder.docx
- placeholder.odt
- placeholder.rtf
- placeholder.txt
- placeholder.html
- placeholder.epub
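To make the export behavior more concrete, the sketch below builds one candidate export URL per format for a given document. The URL patterns (export?format=... for documents and spreadsheets, export/<format> for presentations) are an assumption based on commonly seen public Google export links; they are not confirmed by this README. The names EXPORT_FORMATS and export_urls, and the title parameter, are hypothetical.

```python
# Format lists copied from the "Available Formats" section.
EXPORT_FORMATS = {
    "document": ["pdf", "docx", "odt", "rtf", "txt", "html", "epub"],
    "spreadsheets": ["xlsx", "ods", "pdf", "csv", "tsv", "html"],
    "presentation": ["pdf", "pptx", "odp", "txt", "jpeg", "png", "svg"],
}


def export_urls(doc_type, doc_id, title):
    """Map each output filename to a candidate export URL.

    doc_type is the path segment from the docs.google.com URL.
    The URL patterns below are assumptions, not IAdrive's confirmed behavior.
    """
    base = f"https://docs.google.com/{doc_type}/d/{doc_id}"
    urls = {}
    for fmt in EXPORT_FORMATS[doc_type]:
        if doc_type == "presentation":
            url = f"{base}/export/{fmt}"  # assumed pattern for Slides
        else:
            url = f"{base}/export?format={fmt}"  # assumed pattern for Docs/Sheets
        urls[f"{title}.{fmt}"] = url
    return urls
```

Calling export_urls("document", "1abc123", "placeholder") would yield seven entries, one per document format, keyed by names such as placeholder.pdf.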
Google Docs Examples
# Archive Google Document
iadrive https://docs.google.com/document/d/1abc123/edit
# Archive Google Spreadsheet
iadrive https://docs.google.com/spreadsheets/d/1abc123/edit
# Archive Google Slides with custom metadata
iadrive https://docs.google.com/presentation/d/1abc123/edit --metadata=collection:placeholder --metadata=creator:placeholder
# Debug mode with Google Docs
iadrive https://docs.google.com/document/d/1abc123/edit --debug
Google Drive Examples
# Upload with folder structure preserved (default)
iadrive https://drive.google.com/drive/folders/placeholder --metadata=collection:placeholder
# Upload with flat structure
iadrive https://drive.google.com/drive/folders/placeholder --disable-slash-files
# Debug mode with custom metadata
iadrive https://drive.google.com/drive/folders/placeholder --metadata=collection:placeholder \
--metadata=mediatype:data --debug
Folder Structure Preservation
By default, IAdrive preserves the folder structure from Google Drive when uploading to the Internet Archive. For example, if your Google Drive link contains:
placeholder.txt
placeholder.mp3
folder/
├── placeholder.pdf
└── folder/
    └── placeholder.mp4
The files will be uploaded to the Internet Archive as:
placeholder.txt
placeholder.mp3
folder/placeholder.pdf
folder/folder/placeholder.mp4
If you use the --disable-slash-files option, all files will be uploaded to the root level:
placeholder.txt
placeholder.mp3
placeholder.pdf
placeholder.mp4
Note: When using flat structure, duplicate filenames are automatically handled by adding a number (e.g., placeholder.pdf, placeholder_1.pdf).
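The two naming modes described above can be sketched as a single mapping from downloaded relative paths to upload names. This is only an illustration: remote_names is a hypothetical function, and the exact suffix scheme IAdrive uses beyond the placeholder_1.pdf example is an assumption.

```python
from pathlib import PurePosixPath


def remote_names(relative_paths, flat=False):
    """Map each downloaded relative path to the name it would get in the item.

    Hypothetical helper: with flat=False, folder prefixes are kept; with
    flat=True, only the basename is kept and clashes get _1, _2, ... suffixes.
    """
    mapping = {}
    used = set()
    for rel in relative_paths:
        path = PurePosixPath(rel)
        if not flat:
            mapping[rel] = str(path)
            continue
        name = path.name
        counter = 1
        # Keep bumping the suffix until the flattened name is unique.
        while name in used:
            name = f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        used.add(name)
        mapping[rel] = name
    return mapping
```

A mapping of this shape (remote name to local file) is also roughly what an upload call would need, though how IAdrive passes it to the Internet Archive is not described here.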
How it works
Google Drive Files/Folders
- iadrive uses gdown to fetch the specified Google Drive file or folder
- It walks the downloaded directory and extracts file extensions and modification dates
- Metadata is generated, including a file listing (with sizes), the oldest file modification date, and the original URL
- The content is uploaded to Archive.org with identifier format drive-{drive-id}
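The directory walk in the steps above could look roughly like the following. It is a sketch using only the standard library; summarize_download and the shape of its return value are hypothetical, and formatting the date as YYYY-MM-DD in UTC is an assumption.

```python
import os
from datetime import datetime, timezone


def summarize_download(root):
    """Collect relative paths, sizes, extensions, and the oldest mtime under root.

    Hypothetical helper: the returned keys are illustrative, not IAdrive's own.
    """
    files = []
    extensions = set()
    oldest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            stat = os.stat(full)
            # Store paths with forward slashes so they match upload names.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            files.append((rel, stat.st_size))
            ext = os.path.splitext(filename)[1].lower().lstrip(".")
            if ext:
                extensions.add(ext)
            if oldest is None or stat.st_mtime < oldest:
                oldest = stat.st_mtime
    date = None
    if oldest is not None:
        # Assumed format: the oldest modification time as a UTC calendar date.
        date = datetime.fromtimestamp(oldest, tz=timezone.utc).strftime("%Y-%m-%d")
    return {"files": sorted(files), "extensions": sorted(extensions), "date": date}
```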
Google Docs/Sheets/Slides
- iadrive detects Google Docs URLs and determines the document type
- It automatically exports the document in all available formats using Google's public export URLs
- Each format is downloaded and saved with descriptive filenames
- Metadata includes comprehensive format information and document type
- The content is uploaded to Archive.org with identifier format docs-{doc-id}
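URL detection and the two identifier formats (drive-{drive-id} and docs-{doc-id}) can be sketched with a pair of regular expressions. The patterns below are simplified assumptions that cover only the URL shapes shown in the examples of this README plus drive.google.com/file/d/ links; identifier_for is a hypothetical name.

```python
import re

# Assumed URL shapes; real Google URLs have more variants than these.
_DOCS_RE = re.compile(
    r"https?://docs\.google\.com/(document|spreadsheets|presentation)/d/([A-Za-z0-9_-]+)"
)
_DRIVE_RE = re.compile(
    r"https?://drive\.google\.com/(?:drive/folders/|file/d/)([A-Za-z0-9_-]+)"
)


def identifier_for(url):
    """Return (kind, identifier) for a Google Docs or Google Drive URL.

    Hypothetical helper: kind is the document type, or "drive" for Drive links.
    """
    match = _DOCS_RE.match(url)
    if match:
        return match.group(1), f"docs-{match.group(2)}"
    match = _DRIVE_RE.match(url)
    if match:
        return "drive", f"drive-{match.group(1)}"
    raise ValueError(f"unrecognized Google URL: {url!r}")
```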
Common Steps
- Identifiers are sanitized and subject tags are truncated to fit Archive.org requirements
- Publisher defaults to "IAdrive" since collaborator fetching is not yet implemented
- Folder structure is preserved by default (can be disabled with --disable-slash-files)
- Downloaded files are automatically cleaned up after upload
- Errors are handled gracefully, and debug output is available with --debug
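The sanitizing and truncating step from the list above might be approximated as follows. The limits used here (100 characters for identifiers, 255 bytes for the joined subject string) and the allowed character set are assumptions about Archive.org requirements, not values taken from the IAdrive source; both function names are hypothetical.

```python
import re

MAX_IDENTIFIER_LENGTH = 100  # assumed Archive.org identifier limit
MAX_SUBJECT_BYTES = 255      # assumed limit for the joined subject string


def sanitize_identifier(raw):
    """Replace disallowed characters and trim to the assumed maximum length."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", raw)
    # Assumption: identifiers should start with an alphanumeric character.
    cleaned = cleaned.lstrip("._-")
    return cleaned[:MAX_IDENTIFIER_LENGTH]


def truncate_subjects(tags, limit=MAX_SUBJECT_BYTES):
    """Join tags with ';', keeping only whole tags that fit within limit bytes."""
    kept = []
    size = 0
    for tag in tags:
        # Count the separator for every tag after the first.
        extra = len(tag.encode("utf-8")) + (1 if kept else 0)
        if size + extra > limit:
            break
        kept.append(tag)
        size += extra
    return ";".join(kept)
```

Dropping whole tags rather than cutting one in the middle is a choice made for this sketch; the tool may truncate differently.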
Supported Platforms
For a list of supported platforms for archiving, please see SUPPORTEDPLATFORMS.md
To-do list
- Google Drive collaborator fetching to use as creator metadata through the Google API
- Batch processing