Compare MongoDB collections from the command line.

mongo-diff

mongo-diff is a command-line tool for comparing two MongoDB collections.

Those collections can reside in either a single database or two separate databases (even across servers).

%% This is the source code of a Mermaid diagram, which GitHub will render as a diagram.
%% Note: PyPI does not render Mermaid diagrams, and instead displays their source code.
%%       Reference: https://github.com/pypi/warehouse/issues/13083
graph LR
    script[["mongo_diff.py"]]
    result["List of<br>differences"]

    subgraph s1 [Server]
        subgraph d1 [Database]
            collection_a[("Collection A")]
        end
    end

    subgraph s2 [Server]
        subgraph d2 [Database]
            collection_b[("Collection B")]
        end
    end

    collection_a --> script
    collection_b --> script
    script --> result

Usage

Installation

Assuming you have pipx installed, you can install the tool by running the following command:

pipx install mongo-diff

pipx is a tool people can use to download and install Python scripts that are hosted on PyPI. You can install pipx by running $ python -m pip install pipx.

Running

You can display the tool's --help snippet by running:

mongo-diff --help

At the time of this writing, the tool's --help snippet is:

 Usage: mongo-diff [OPTIONS]

 Compare two MongoDB collections.
 Those collections can reside in either a single database or two separate
 databases (even across servers).

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --include-oid,--include-id          Include the `_id` field when comparing   │
│                                     documents (`--include-id` is             │
│                                     deprecated).                             │
│ --help                              Show this message and exit.              │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection A ───────────────────────────────────────────────────────────────╮
│ *  --mongo-uri-a                    TEXT  Connection string for accessing    │
│                                           the MongoDB server containing      │
│                                           collection A.                      │
│                                           [env var: MONGO_URI_A]             │
│                                           [required]                         │
│ *  --database-name-a                TEXT  Name of the database containing    │
│                                           collection A.                      │
│                                           [required]                         │
│ *  --collection-name-a              TEXT  Name of collection A. [required]   │
│    --identifier-field-name-a        TEXT  Name of the field of each document │
│                                           in collection A to use to identify │
│                                           a corresponding document in        │
│                                           collection B. The values in this   │
│                                           field must be unique within each   │
│                                           collection.                        │
│                                           [default: id]                      │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection B ───────────────────────────────────────────────────────────────╮
│ --mongo-uri-b                    TEXT  Connection string for accessing the   │
│                                        MongoDB server containing collection  │
│                                        B (if different from that specified   │
│                                        for collection A).                    │
│                                        [env var: MONGO_URI_B]                │
│ --database-name-b                TEXT  Name of the database containing       │
│                                        collection B (if different from that  │
│                                        specified for collection A).          │
│ --collection-name-b              TEXT  Name of collection B (if different    │
│                                        from that specified for collection    │
│                                        A).                                   │
│ --identifier-field-name-b        TEXT  Name of the field of each document in │
│                                        collection B to use to identify a     │
│                                        corresponding document in collection  │
│                                        A (if different from that specified   │
│                                        for collection A). The values in this │
│                                        field must be unique within each      │
│                                        collection.                           │
╰──────────────────────────────────────────────────────────────────────────────╯

Note: The above snippet was captured from a terminal window whose width was 80 characters.
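
As a rough illustration of how the options above fit together, here is a minimal Python sketch that assembles a mongo-diff command line. It uses only option names shown in the --help snippet. The helper name build_mongo_diff_command and the database and collection names are hypothetical, and the sketch assumes the connection string for collection A is supplied through the MONGO_URI_A environment variable rather than on the command line.

```python
import shlex


def build_mongo_diff_command(
    database_name_a,
    collection_name_a,
    database_name_b=None,
    collection_name_b=None,
    include_oid=False,
):
    """Assemble an argument list from options listed in the --help snippet.

    Hypothetical helper for illustration; it is not part of mongo-diff.
    """
    argv = [
        "mongo-diff",
        "--database-name-a", database_name_a,
        "--collection-name-a", collection_name_a,
    ]
    # The "B" options are only needed when they differ from the "A" values.
    if database_name_b is not None:
        argv += ["--database-name-b", database_name_b]
    if collection_name_b is not None:
        argv += ["--collection-name-b", collection_name_b]
    if include_oid:
        argv.append("--include-oid")
    return argv


# Hypothetical names: compare two collections within one database.
command = build_mongo_diff_command(
    "store", "products", collection_name_b="products_backup"
)
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) on a machine where the tool is installed and MONGO_URI_A is set.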

MongoDB connection strings

As documented in the --help snippet above, you can provide the MongoDB connection strings to the tool via either (a) command-line options; or (b) environment variables named MONGO_URI_A and MONGO_URI_B. The latter can come in handy for MongoDB connection strings that contain passwords.

Here's how you could create those environment variables:

export MONGO_URI_A='mongodb://localhost:27017'
export MONGO_URI_B='mongodb://username:password@host.example.com:22222'

Note: That will only create those environment variables in the current shell process. You can persist them by adding those same commands to your shell initialization script (e.g. ~/.bashrc, ~/.zshrc).
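
The following Python sketch shows one way the two sources of connection strings could be resolved. It is not the tool's actual code. It assumes that a command-line value takes precedence over the environment variable, and that collection B falls back to collection A's connection string when none is given, in line with the "if different from that specified for collection A" wording in the --help snippet. The function name resolve_mongo_uris is hypothetical.

```python
import os


def resolve_mongo_uris(cli_uri_a=None, cli_uri_b=None, environ=None):
    """Return (uri_a, uri_b) from command-line values and environment variables.

    Assumption: a command-line value wins over the environment variable.
    """
    environ = os.environ if environ is None else environ

    uri_a = cli_uri_a or environ.get("MONGO_URI_A")
    if uri_a is None:
        raise ValueError("A connection string for collection A is required.")

    # Assumption: collection B reuses A's connection string unless overridden.
    uri_b = cli_uri_b or environ.get("MONGO_URI_B") or uri_a
    return uri_a, uri_b


# Example with an explicit mapping standing in for the real environment.
example_env = {"MONGO_URI_A": "mongodb://localhost:27017"}
print(resolve_mongo_uris(environ=example_env))
```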

Example output

As the tool compares the collections, it displays each difference it detects, like this:

Document differs between collections:
--- Collection A: id='x:food-00001'
+++ Collection B: id='x:food-00001'
@@ -13,7 +13,7 @@
       "type": "food"
     }
   ],
-  "name": "Salmon",
+  "name": "Salmon Fillet",
   "vendor": "Alaskan Foods",
   "vendorCountry": "USA",
   "vendorRegion": "Alaska",

Document exists in collection A only: id='x:food-00002'
Document exists in collection A only: id='x:food-00003'
Document exists in collection B only: id='x:food-00004'
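
Output of this shape can be produced with Python's standard difflib module. The sketch below is an illustration under stated assumptions, not the tool's implementation: it serializes each document as indented JSON with sorted keys (an assumption), drops the `_id` field unless include_oid is set (mirroring the --include-oid option), and labels the two sides the way the sample output does. The function name diff_documents is hypothetical.

```python
import difflib
import json


def diff_documents(doc_a, doc_b, identifier_field="id", include_oid=False):
    """Return unified-diff lines for two documents (empty list if equal)."""

    def render(doc):
        if not include_oid:
            doc = {k: v for k, v in doc.items() if k != "_id"}
        # default=str lets non-JSON values (e.g. ObjectId, datetime) serialize.
        text = json.dumps(doc, indent=2, sort_keys=True, default=str)
        return text.splitlines()

    return list(
        difflib.unified_diff(
            render(doc_a),
            render(doc_b),
            fromfile=f"Collection A: {identifier_field}={doc_a[identifier_field]!r}",
            tofile=f"Collection B: {identifier_field}={doc_b[identifier_field]!r}",
            lineterm="",
        )
    )


doc_a = {"_id": 1, "id": "x:food-00001", "name": "Salmon"}
doc_b = {"_id": 2, "id": "x:food-00001", "name": "Salmon Fillet"}
print("\n".join(diff_documents(doc_a, doc_b)))
```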

When the tool finishes comparing the collections, it displays a summary of the result, like this:

                         Result                         
╭───────────────────────────────────────────┬──────────╮
│ Description                               │ Quantity │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A                 │        4 │
│ Documents in collection B                 │        3 │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A only            │        2 │
│ Documents in collection B only            │        1 │
├───────────────────────────────────────────┼──────────┤
│ Documents that differ between collections │        1 │
╰───────────────────────────────────────────┴──────────╯
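
The quantities in that table can be derived with set operations on the identifier values. The sketch below is one possible way to compute them, not the tool's own code. It assumes the identifier values are unique within each collection (as the --help snippet requires) and ignores `_id` by default. The function name summarize, the dictionary keys, and the identifier x:food-00005 are hypothetical; the sample data is chosen so the counts match the table above.

```python
def summarize(docs_a, docs_b, identifier_field="id", include_oid=False):
    """Count the categories reported in the summary table."""

    def comparable(doc):
        # Assumption: `_id` is excluded from comparison unless requested.
        if include_oid:
            return doc
        return {k: v for k, v in doc.items() if k != "_id"}

    by_id_a = {doc[identifier_field]: doc for doc in docs_a}
    by_id_b = {doc[identifier_field]: doc for doc in docs_b}
    shared = by_id_a.keys() & by_id_b.keys()

    return {
        "in_a": len(by_id_a),
        "in_b": len(by_id_b),
        "a_only": len(by_id_a.keys() - by_id_b.keys()),
        "b_only": len(by_id_b.keys() - by_id_a.keys()),
        "differ": sum(
            comparable(by_id_a[key]) != comparable(by_id_b[key]) for key in shared
        ),
    }


collection_a = [
    {"_id": 1, "id": "x:food-00001", "name": "Salmon"},
    {"_id": 2, "id": "x:food-00002"},
    {"_id": 3, "id": "x:food-00003"},
    {"_id": 4, "id": "x:food-00005"},
]
collection_b = [
    {"_id": 10, "id": "x:food-00001", "name": "Salmon Fillet"},
    {"_id": 11, "id": "x:food-00004"},
    {"_id": 12, "id": "x:food-00005"},
]
summary = summarize(collection_a, collection_b)
print(summary)
```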

Updating

You can update the tool to the latest version available on PyPI by running:

pipx upgrade mongo-diff

Uninstallation

You can uninstall the tool from your computer by running:

pipx uninstall mongo-diff

Development

We use Poetry to both (a) manage dependencies and (b) publish packages to PyPI.

  • pyproject.toml: Configuration file for Poetry and other tools (was generated via $ poetry init)
  • poetry.lock: List of dependencies, direct and indirect (was generated via $ poetry update)

Clone repository

git clone https://github.com/eecavanna/mongo-diff.git
cd mongo-diff

Create virtual environment

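Create a Poetry virtual environment and attach to its shell (as of Poetry 2.0, the shell command is provided by the separate poetry-plugin-shell plugin, which must be installed first):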

poetry shell

You can see information about the Poetry virtual environment by running: $ poetry env info

You can detach from the Poetry virtual environment's shell by running: $ exit

From now on, we'll refer to the Poetry virtual environment's shell as the "Poetry shell."

Install dependencies

At the Poetry shell, install the project's dependencies:

poetry install

Make changes

Edit the tool's source code and documentation however you want.

While editing the tool's source code, you can run the tool as you normally would to test your changes:

mongo-diff --help

Run tests

We currently only have a smattering of doctests in this codebase. You can run them via:

poetry run python -m doctest ./mongo_diff/mongo_diff.py

We may eventually populate the tests/ directory with a more exhaustive test suite, using pytest and mongomock.
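
For readers unfamiliar with doctests, here is a minimal sketch of what one looks like. The function strip_field is hypothetical and is not taken from the mongo-diff codebase; it only illustrates the pattern of embedding examples in a docstring and checking them with the standard doctest module.

```python
import doctest


def strip_field(document, field_name="_id"):
    """Return a copy of `document` without the field named `field_name`.

    >>> strip_field({"_id": 1, "id": "x:food-00001"})
    {'id': 'x:food-00001'}
    >>> strip_field({"id": "x:food-00001"})
    {'id': 'x:food-00001'}
    """
    return {key: value for key, value in document.items() if key != field_name}


# Check the examples in the docstring; nothing is printed when they all pass.
doctest.run_docstring_examples(
    strip_field, {"strip_field": strip_field}, name="strip_field"
)
```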

Build package

Update package version

PyPI doesn't allow people to publish the same "version" of a package multiple times.

You can update the version identifier of the package by running:

poetry version {version_or_keyword}

You can replace {version_or_keyword} with either a literal version identifier (e.g. 0.1.1) or a keyword (e.g. major, minor, or patch). You can run $ poetry version --help to see the valid keywords.
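
To make the keywords concrete, the sketch below mimics how the three keywords named above change a plain MAJOR.MINOR.PATCH identifier. It is a simplified illustration, not Poetry's implementation: it assumes the version has exactly three numeric parts and does not handle pre-release identifiers or the other keywords Poetry accepts. The function name bump_version is hypothetical.

```python
def bump_version(version, keyword):
    """Bump a MAJOR.MINOR.PATCH version string by the given keyword."""
    major, minor, patch = (int(part) for part in version.split("."))
    if keyword == "major":
        return f"{major + 1}.0.0"
    if keyword == "minor":
        return f"{major}.{minor + 1}.0"
    if keyword == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unsupported keyword: {keyword!r}")


for keyword in ("patch", "minor", "major"):
    print(keyword, "->", bump_version("0.1.0", keyword))
```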

Alternatively, you can manually edit a line in pyproject.toml:

- version = "0.1.0"
+ version = "0.1.1"

Build package

At the Poetry shell, build the package based upon the latest source code:

poetry build

That will create both a source distribution file (whose name ends with .tar.gz) and a wheel file (whose name ends with .whl) in the dist directory.

Publish package

Set up PyPI credentials

At the Poetry shell, create the following environment variable, which Poetry reads when credentials aren't provided another way:

export POETRY_PYPI_TOKEN_PYPI="{api_token}"

Replace {api_token} with a PyPI API token whose scope includes the PyPI project to which you want to publish the package.

Publish package to PyPI

At the Poetry shell, publish the newly-built package to PyPI:

poetry publish

At this point, people will be able to download and install the package from PyPI.
