Convert Google docs to markdown

wikinator

Convert a Google drive download into a markdown-based wiki.

Note: This is a work in progress, and not all features will be supported or working properly.

tl;dr

Install uv and then:

uvx wikinator --help
Usage: wikinator [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Display version and exit.
  -v         Show verbose logging.
  -vv        Show debug logging.
  -vvv       Show full trace logging.
  --help     Show this message and exit.

Commands:
  upload   Convert and upload a file hierarchy to a GraphQL wiki.
  convert  Given the URL of a specific gdoc:
  config   View or set configuration settings.

Given a directory, convert supported file types into markdown-based files while maintaining names and directory structure. This can then be uploaded into various wiki systems.
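
As a rough illustration of "maintaining names and directory structure", the sketch below maps each supported file under a source directory to a markdown path under an output directory. The function name `plan_outputs` and the `SUPPORTED_SUFFIXES` set are hypothetical and are not taken from the wikinator code; only DOCX is assumed to be supported.

```python
from pathlib import Path

# Assumption for illustration: only DOCX is converted today.
SUPPORTED_SUFFIXES = {".docx"}


def plan_outputs(src_root: Path, dest_root: Path) -> dict[Path, Path]:
    """Map each supported file under src_root to a .md path under dest_root,
    keeping the file name and its relative directory."""
    plan = {}
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in SUPPORTED_SUFFIXES:
            relative = src.relative_to(src_root)
            plan[src] = dest_root / relative.with_suffix(".md")
    return plan
```

With this mapping, a file at `docs/team/Plan.docx` would be planned as `out/team/Plan.md` when called with `docs` and `out`.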

Supported File Types

DOCX files (default for GDocs) are converted to markdown
images are extracted, uploaded and embedded in the markdown
FUTURE text and code file types are wrapped in markdown code blocks
FUTURE CSV and XLSX are converted to markdown tables
FUTURE for any document that is converted to markdown, a copy of the original is uploaded and attached

Supported Wiki Import

wiki.js (and other GraphQL-based wikis)
Obsidian

The development log will be kept here until the 1.0 release.

Usage

uvx wikinator convert https://gdoc/full/url path=test

uvx wikinator upload target_dir
uvx wikinator upload target_file.md new/path

convert takes the URL of a single Google doc, converts it to markdown, and uploads the result to the configured GraphQL server. The optional path option sets the wiki path the document is uploaded to.

upload loads a full directory into the wiki. In the above examples:

Upload the directory tree at target_dir into the wiki path target_dir
Upload the file target_file.md into the wiki path new/path/target_file

Assuming the en locale, the final paths in the wiki will be:

$GRAPH_DB/en/target_dir/...
$GRAPH_DB/en/new/path/target_file
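
The path rule above can be sketched as a small function: the locale comes first, then the optional wiki directory, then the local path with its file extension dropped. The name `wiki_path` and its parameters are hypothetical and only mirror the two examples; the real implementation may differ.

```python
from pathlib import PurePosixPath


def wiki_path(local_path: str, wiki_dir: str = "", locale: str = "en") -> str:
    """Build locale/wiki_dir/local_path, without the file extension."""
    parts = [
        locale,
        *PurePosixPath(wiki_dir).parts,  # empty when no wiki_dir is given
        *PurePosixPath(local_path).with_suffix("").parts,
    ]
    return "/".join(parts)


# Mirrors the two upload examples above.
print(wiki_path("target_dir/page.md"))
print(wiki_path("target_file.md", "new/path"))
```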

Configuration

There is nothing to install: the wikinator command can be run from anywhere uvx is installed.

To upload to your wiki, you must have:

the URL
an authentication token

Configuring the values in wikinator:

uvx wikinator config db_url https://db.example.com/graphql
uvx wikinator config db_token <authentication-token-for-your-graphdb>

When accessing Google docs, wikinator will confirm access to the requested files with a browser-based user authentication. These details will be stored in the configuration directory (uvx wikinator config config_dir) in token.json for future use.

wiki.js

This section is specific to getting the configuration values for a wiki.js server.

You'll need the URL of the server and the authentication token for your account.

With those values, configure the db_url and db_token with the wikinator config command, as above.

Once the configuration is set up correctly, confirm with:

uvx wikinator config
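
For context on how a client typically uses these two values, the sketch below builds (but does not send) a GraphQL POST request using only the standard library. It assumes the token is sent as a bearer token in the Authorization header, and the query shown in the test is a generic placeholder; `build_graphql_request` is a hypothetical helper, not wikinator's actual code.

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(
    db_url: str, db_token: str, query: str, variables: Optional[dict] = None
) -> urllib.request.Request:
    """Build a GraphQL POST request; the caller decides when to send it."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        db_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the server accepts the token as a bearer token.
            "Authorization": f"Bearer {db_token}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; keeping construction separate makes the request easy to inspect while debugging a configuration problem.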

Build & Test

Clone

git clone https://github.com/philion/wikinator.git
cd wikinator

Run, with uv
```
uv run wikinator [options]
```
Test, with pytest
```
uv run pytest
```

Development Log

2026-03-08

Adding config and convert commands.

config helps manage config
convert will read, convert and upload a google doc

2025-08-07

Refactored and disabled (for now) the convert, extract and teleport commands. The code remains in place, but the current focus is on convert and upload to graphql, and I want to disable any code that's not in that path while testing.

Added a verbose logging option, -v, to watch files being processed.

2025-07-08

Initial (buggy, probably) implementation of the full command set:

convert converts a directory full of DOCX files into markdown.
extract extracts the docs from google docs as markdown.
upload loads a full directory into wiki.js
teleport goes directly from google drive to wiki.js.

2025-07-07

Decent progress with google drive download. Still lots of problems.

get single file and dir params working.
fix \n translation problem. where are they coming from
for single input file, assume single output filename (if doesn't exist). if does, and is dir, write -in-.
simple formatting tests
research which converter google is using
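
One possible reading of the single-file to-do item above, as a sketch: if the destination is an existing directory, write into it; otherwise treat the destination as the output filename. `resolve_output` is a hypothetical name, and this interpretation is an assumption rather than the behavior the tool is known to implement.

```python
from pathlib import Path


def resolve_output(src: Path, dest: Path) -> Path:
    """Pick the markdown file to write for a single input file."""
    if dest.is_dir():
        # dest exists and is a directory: write into it, keeping the source name.
        return dest / src.with_suffix(".md").name
    # Otherwise treat dest as the output filename itself.
    return dest
```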

pandoc doesn't do embedding the same way (HTML-only): https://pandoc.org/MANUAL.html#option--embed-resources%5B

2025-07-06

Getting into formatting details, and I want to decompose and streamline the docxit converter.

docxit creates in memory
better page handling, read and write files to disk
get first test working
simple formatting tests
build commands (unimplemented)
for single input file, assume single output filename (if doesn't exist). if does, and is dir, write -in-.

Let's combine testing lists with a simple test:

load a file with a list
convert
confirm it contains the correct list

Moved the code around to simplify and remove potential circular dependencies.

Code runs as expected, as does the trivial test case.

Bumping version to 0.5, but not yet ready to release.

Thinking about commands:

convert : files -> files
extract : from googledocs -> file system
upload : from files -> graphql
teleport : from googledoc -> graphql

Refactored main to better commands. Got -v working.

Original behavior is working as wikinator convert.

Now looking at extract command.

2025-07-05

Starting work on image preservation.

Looking first at https://github.com/haesleinhuepf/docx2markdown for images.

Created a Docx2MarkdownConverter which almost works: images are put in the wrong path in the MD (s/images/ instead of just images). There's probably an easy fix, but let's try a pandoc version.

Creating PandocConverter to try and compare output.

pandoc {indoc} -f docx -t markdown --wrap=none --markdown-headings=atx --extract-media=images -o {outdoc}
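
A sketch of how the command line above could be assembled from Python. Building an argument list rather than a formatted string keeps file names with spaces intact. The helper names are hypothetical; the pandoc flags are exactly the ones in the log entry.

```python
import subprocess


def pandoc_command(indoc: str, outdoc: str, media_dir: str = "images") -> list[str]:
    """Return the pandoc argument list used in the log entry above."""
    return [
        "pandoc", indoc,
        "-f", "docx",
        "-t", "markdown",
        "--wrap=none",
        "--markdown-headings=atx",
        f"--extract-media={media_dir}",
        "-o", outdoc,
    ]


def run_pandoc(indoc: str, outdoc: str) -> None:
    """Run pandoc (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(pandoc_command(indoc, outdoc), check=True)
```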

Neither produces desired results.

Trying a literal hack of docx2markdown, to see how quickly I can fix the little problems I saw.

Got it working quickly, removed a little bug, got the images.

Now looking over the DOCX XML format to see how much I can scrape out.

https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.fontsizecomplexscript?view=openxml-3.0.1

Added detection for strikethru and Courier New (as "code font").
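
A minimal sketch of what this kind of detection can look like on the raw XML, assuming a run (`w:r`) marks strikethrough with `w:strike` and names its font in the `w:ascii` attribute of `w:rFonts`. The function `run_to_markdown` is hypothetical and is not the converter's actual code.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_to_markdown(run: ET.Element) -> str:
    """Render one w:r element, wrapping code-font and struck-through text."""
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    props = run.find("w:rPr", NS)
    if props is None or not text:
        return text
    fonts = props.find("w:rFonts", NS)
    strike = props.find("w:strike", NS)
    # Treat Courier New as the "code font".
    if fonts is not None and fonts.get(f"{{{W}}}ascii") == "Courier New":
        text = f"`{text}`"
    # A bare <w:strike/> means on; an explicit false/0/off value turns it off.
    if strike is not None and strike.get(f"{{{W}}}val", "true") not in ("false", "0", "off"):
        text = f"~~{text}~~"
    return text
```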

This is good enough for v0.2!

Oops. minor bug. fixing with v0.3

Noticed when working on strikethru that nested lists didn't seem to be working.

Next tasks:

List handling
Code cleanup (remove unused libs)
restructure docxit for in-memory
simple testing.
recognize and handle single file
default output to local dir

Looking over the raw XML, it looks like...

<w:numPr>
    <w:ilvl w:val="0" />
    <w:numId w:val="2" />
</w:numPr>

It looks like:

w:ilvl is the zero-based indent level
w:numId is an ID from numbering.xml, roughly mapping to:
- val=1 ordered list: 1.
- val=2 checklist: - [ ]
- val=3 bullet: *
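
The observations above can be sketched as a lookup from a `w:numPr` element to a markdown list prefix. The `MARKERS` table hard-codes the numId values seen in this one test document; real documents define them in numbering.xml, so treat the table, the four-space indent, and the name `list_prefix` as assumptions.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"

# Assumption: numId values as observed in the test document, not a general rule.
MARKERS = {"1": "1.", "2": "- [ ]", "3": "*"}


def list_prefix(num_pr: ET.Element, indent: str = "    ") -> str:
    """Turn a w:numPr element into an indented markdown list marker."""
    ilvl = num_pr.find("w:ilvl", NS)
    num_id = num_pr.find("w:numId", NS)
    level = int(ilvl.get(VAL, "0")) if ilvl is not None else 0
    key = num_id.get(VAL, "") if num_id is not None else ""
    marker = MARKERS.get(key, "*")  # fall back to a plain bullet
    return indent * level + marker + " "
```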

Restructured and cleaned up. Removed unneeded code and libraries.

Created a simple docx doc for testing.

Far enough that a new release feels right. v0.4!

2025-07-04

Let's make a project! Today's goals:

clean up code and README
add CLI options, using typer (not all implemented)
initial commit to github
add image handling
upload to pypi and confirm uvx commands

Cruft removed. README updated. (author waves, breaking 4th wall)

Moving on to the main() cleanup and adding support for https://github.com/fastapi/typer

Added simple CLI options for src and dest. Got end-to-end tree processing.

Added Makefile to help with release management. Got PyPI setup: https://pypi.org/project/wikinator/

uvx wikinator is working.

Let's go for git and call it a day!

2025-07-03

Next steps are testing different document converters and accessing google drive via API.

Markdown conversion libraries

pandoc, see https://docs.asciidoctor.org/asciidoctor/latest/migrate/ms-word/
markitdown, https://github.com/microsoft/markitdown
docx2markdown, https://github.com/haesleinhuepf/docx2markdown
docx2md, https://github.com/mattn/docx2md

Reference:

https://www.docstomarkdown.pro/convert-word-or-docs-to-markdown-using-pandoc/

Google Drive API

Starting with https://developers.google.com/workspace/drive/api/quickstart/python

Note: Follow those Google directions for setting up everything. It's complicated compared to simply generating a service token. Your intrepid author made different tokens in different accounts and couldn't access anything! And get permissions right! Document specific needs in install docs.

Further aside: There are two versions of the tool: file-based and google-takeout. The Google-related stuff will always be a bear to set up.

Made sufficient progress to feel like there's a separate CLI tool here. Set aside for now, and focus on:

Build file-based output
Generate and link images
Clean up for initial 0.1 version

2025-07-02

Initial time-boxed work started to examine what would be required to migrate our existing GoogleDocs-based info repo into a wiki, with wiki.js being targeted.

Initial proof-of-concept goals:

Convert a docx page to md or asciidoc
Upload test pages to wiki.js

[^3]: At 25-08-10 19:35, Paul Philion said: This is a test comment.

I was able to get this working in sample code in a few hours.

Project details

Release history

- 0.9.0: Mar 23, 2026
- 0.8.0: Mar 20, 2026
- 0.7.0: Mar 19, 2026 (this version)
- 0.6.2: Aug 13, 2025
- 0.6.1: Aug 13, 2025
- 0.6.0: Aug 11, 2025
- 0.5.0: Jul 10, 2025
- 0.4.0: Jul 6, 2025
- 0.3.0: Jul 5, 2025
- 0.2.0: Jul 5, 2025
- 0.1.0: Jul 5, 2025

Download files

Source Distribution

wikinator-0.7.0.tar.gz (26.9 kB view details)

Uploaded Mar 19, 2026 Source

Built Distribution

wikinator-0.7.0-py3-none-any.whl (30.6 kB view details)

Uploaded Mar 19, 2026 Python 3

File details

Details for the file wikinator-0.7.0.tar.gz.

File metadata

Download URL: wikinator-0.7.0.tar.gz
Upload date: Mar 19, 2026
Size: 26.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.11

File hashes

Hashes for wikinator-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`df2932327f525934d7038dcc05bbec12a4b66ca288787b512785baa0c1a36dd5`
MD5	`58caaeb46b0c90fcc4894f00d900f166`
BLAKE2b-256	`2ce7c22882f4c0bb8170e80a5b3668636229fe8cda9f85d8596bb7a395c3a628`

File details

Details for the file wikinator-0.7.0-py3-none-any.whl.

File metadata

Download URL: wikinator-0.7.0-py3-none-any.whl
Upload date: Mar 19, 2026
Size: 30.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.11

File hashes

Hashes for wikinator-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7270f855d62e97867d7409031faddb0dccd41063bde0fa77b4c3e68c68129498`
MD5	`6cf671cc924924874dd3e3f85837388b`
BLAKE2b-256	`69c6ccabf74d7efc34b43bb6b8bb94cda871baba4fdfc4ed8d273b838e83d6c3`
