Python drawing utilities for Venn and Euler diagrams visualizing the relationships between two or more sets.

Matplotlib Set Diagrams

Draw Euler diagrams and Venn diagrams with Matplotlib.

Euler and Venn diagrams are used to visualise the relationships between sets. Both typically employ circles to represent sets, and areas where two circles overlap represent subsets common to both supersets. Venn diagrams show all possible relationships of inclusion and exclusion between two or more sets. In Euler diagrams, the area corresponding to each subset is scaled according to the size of the subset. If a subset doesn't exist, the corresponding area doesn't exist.
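To make the distinction concrete, here is a minimal sketch in plain Python. It does not use this library, and the function name `two_set_regions` is purely illustrative. Two sets have three possible regions; a Venn diagram draws all of them, whereas an Euler diagram omits any region whose subset is empty.

```python
def two_set_regions(a, b):
    """Return the three possible regions of a two-set diagram."""
    return {
        "A only": a - b,
        "B only": b - a,
        "A and B": a & b,
    }

# Two disjoint sets: their intersection is empty.
regions = two_set_regions({"a", "b", "c"}, {"f", "g"})

# A Venn diagram draws every possible region, empty or not.
venn_regions = sorted(regions)

# An Euler diagram only draws regions that contain at least one item.
euler_regions = sorted(name for name, items in regions.items() if items)

print(venn_regions)   # ['A and B', 'A only', 'B only']
print(euler_regions)  # ['A only', 'B only']
```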

This library was inspired by matplotlib-venn, but developed independently. It adds support for set diagrams of an arbitrary number of sets and for visualising set and subset contents, and it implements an improved layout engine. For more details, see below. This library also improves on and replaces matplotlib_venn_wordcloud.

Caution: This package uses circles to visualise set relationships. With four or more sets, it can be impossible to arrange the circles such that all non-empty subsets are shown. A warning is raised in these instances. See this issue for a discussion of the problem. A list of references regarding the maximum number of plane divisions by circles can be found here.
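The limitation follows from a counting argument. n circles divide the plane into at most n² - n + 2 regions, one of which is the unbounded exterior, whereas n sets have 2ⁿ - 1 possible non-empty subsets. The sketch below (plain Python; the function names are illustrative and not part of this library) compares the two counts:

```python
def max_circle_regions(n):
    """Maximum number of bounded regions enclosed by n >= 1 circles.

    n circles split the plane into at most n**2 - n + 2 regions;
    subtracting the unbounded exterior leaves n**2 - n + 1.
    """
    return n * n - n + 1


def possible_subsets(n):
    """Number of distinct non-empty membership combinations of n sets."""
    return 2 ** n - 1


for n in range(1, 6):
    print(n, max_circle_regions(n), possible_subsets(n))
# 1 1 1
# 2 3 3
# 3 7 7
# 4 13 15
# 5 21 31
```

Up to three sets, the counts match. From four sets onwards, there are more possible subsets than circles can enclose regions, so some combinations of non-empty subsets cannot be drawn.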

Installation

pip install matplotlib_set_diagrams
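To confirm that the installation succeeded, one option is to query the installed version with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("matplotlib_set_diagrams"))
```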

Documentation

Numerous tutorials, code examples, and the complete API documentation can be found on ReadTheDocs.

Quickstart

This section is for the impatient. For more comprehensive, step-by-step guides, please consult the documentation.

Quickstart output figure

import matplotlib.pyplot as plt

from matplotlib_set_diagrams import EulerDiagram, VennDiagram

fig, axes = plt.subplots(2, 4, figsize=(15, 5))

for ii, SetDiagram in enumerate([EulerDiagram, VennDiagram]):

    # Initialise from a list of sets:
    SetDiagram.from_sets(
        [
            {"a", "b", "c", "d", "e"},
            {"e", "f", "g"},
        ],
        ax=axes[ii, 0])

    # Alternatively, initialise directly from pre-computed subset sizes.
    SetDiagram(
        {
            (1, 0) : 4, # {"a", "b", "c", "d"}
            (0, 1) : 2, # {"f", "g"}
            (1, 1) : 1, # {"e"}
        },
        ax=axes[ii, 1])

    # Visualise subset items as word clouds:
    text_1 = """Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
    enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
    ut aliquip ex ea commodo consequat."""

    text_2 = """Duis aute irure dolor in reprehenderit in voluptate velit
    esse cillum dolore eu fugiat nulla pariatur. Lorem ipsum dolor sit
    amet."""

    def word_tokenize(text):
        """Break a string into its constituent words, and convert the words
        into their 'standard' form (tokens).

        The procedure below is a poor-man's tokenization.
        Consider using the Natural Language Toolkit (NLTK) instead:

        >>> import nltk; words = nltk.word_tokenize(text)

        """
        # get a word list (split on any whitespace, including newlines)
        words = text.split()
        # remove non alphanumeric characters
        words = [''.join(ch for ch in word if ch.isalnum()) for word in words]
        # convert to all lower case and drop empty strings
        words = [word.lower() for word in words if word]
        return words

    # Tokenize strings.
    sets = [set(word_tokenize(text)) for text in [text_1, text_2]]

    SetDiagram.as_wordcloud(sets, ax=axes[ii, 2])

    # The implementation generalises to any number of sets.
    # However, exact solutions are only guaranteed for two given sets,
    # and the more sets are given, the less likely it becomes that
    # the optimisation procedure finds even an approximate solution.
    # Furthermore, above four or five sets, diagrams become unintelligible.
    # Here is an example of a 4-way set diagram:
    SetDiagram(
        {
            (1, 0, 0, 0) : 4.0,
            (0, 1, 0, 0) : 3.0,
            (0, 0, 1, 0) : 2.0,
            (0, 0, 0, 1) : 1.0,
            (1, 1, 0, 0) : 0.9,
            (1, 0, 1, 0) : 0.8,
            (1, 0, 0, 1) : 0.7,
            (0, 1, 1, 0) : 0.6,
            (0, 1, 0, 1) : 0.5,
            (0, 0, 1, 1) : 0.4,
            (1, 1, 1, 0) : 0.3,
            (1, 1, 0, 1) : 0.25,
            (1, 0, 1, 1) : 0.2,
            (0, 1, 1, 1) : 0.15,
            (1, 1, 1, 1) : 0.1,
        },
        ax=axes[ii, 3])

    # set row titles
    axes[ii, 0].annotate(
        SetDiagram.__name__,
        xy         = (0, 0.5),
        xycoords   = 'axes fraction',
        xytext     = (-10, 0),
        textcoords = "offset points",
        ha         = 'right',
        va         = 'center',
        fontsize   = 'large',
        fontweight = 'bold',
    )

fig.tight_layout()
plt.show()
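The subset sizes passed directly to the constructor above are keyed by membership tuples: position i is 1 if the subset lies inside set i, and 0 otherwise. The following sketch shows how such a dictionary can be derived from a list of sets. It illustrates the bookkeeping only and is not the library's implementation; the function name `subset_sizes_from_sets` is hypothetical.

```python
def subset_sizes_from_sets(sets):
    """Count the items that share exactly the same set memberships."""
    universe = set().union(*sets)
    sizes = {}
    for item in universe:
        # e.g. (1, 0) if the item is in the first set but not in the second
        key = tuple(int(item in s) for s in sets)
        sizes[key] = sizes.get(key, 0) + 1
    return sizes


sizes = subset_sizes_from_sets([
    {"a", "b", "c", "d", "e"},
    {"e", "f", "g"},
])
print(sizes == {(1, 0): 4, (0, 1): 2, (1, 1): 1})  # True
```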

Alternative python libraries

matplotlib-venn: the inspiration for this library. However, matplotlib-venn has some significant drawbacks:

  1. It only produces two-way and three-way set diagrams.
  2. It has no built-in support for visualising set contents; this requires external libraries such as my matplotlib_venn_wordcloud.
  3. The layout engine often generates incorrect results for three-way set diagrams, and a lot of issues on the matplotlib-venn issue tracker boil down to this problem. Consider the example below, adapted from issue #34:
  • Subset (1, 0, 0) / Abc / (A - B - C) is annotated with the label for subset (1, 1, 0) / ABc / (A & B - C).
  • Subset (1, 1, 0) / ABc / (A & B - C) is not visualised at all.

matplotlib-venn / matplotlib_set_diagrams comparison

import matplotlib.pyplot as plt

from matplotlib_set_diagrams import EulerDiagram
from matplotlib_venn import venn3

fig, axes = plt.subplots(1, 2, figsize=(6, 3))

subset_sizes = {
    (1, 0, 0) : 167, # Abc in matplotlib-venn nomenclature
    (0, 1, 0) : 7,   # aBc
    (0, 0, 1) : 25,  # abC
    (1, 1, 0) : 41,  # ABc
    (0, 1, 1) : 174, # aBC
    (1, 0, 1) : 171, # AbC
    (1, 1, 1) : 51,  # ABC
}

axes[0].set_title("matplotlib-venn")
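# venn3 expects the sizes in the order (Abc, aBc, ABc, abC, AbC, aBC, ABC),
# which differs from the insertion order of the dictionary above.
venn3_order = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
venn3_sizes = tuple(subset_sizes[subset] for subset in venn3_order)
print(venn3_sizes)
# (167, 7, 41, 25, 171, 174, 51)
venn3(venn3_sizes, ax=axes[0])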

axes[1].set_title("matplotlib_set_diagrams")
EulerDiagram(subset_sizes, ax=axes[1])

plt.show()
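The two libraries identify subsets differently: this library uses membership tuples such as (1, 1, 0), whereas matplotlib-venn uses labels such as ABc and, for tuple input to venn3, the positional order (Abc, aBc, ABc, abC, AbC, aBC, ABC). The sketch below converts between the two conventions. The helper names `venn_label` and `to_venn3_tuple` are hypothetical and not part of either library.

```python
def venn_label(subset, names="ABC"):
    """(1, 1, 0) -> 'ABc': upper case if the set is included, else lower case."""
    return "".join(
        name.upper() if member else name.lower()
        for name, member in zip(names, subset)
    )


# Positional order that venn3 documents for tuple input.
VENN3_ORDER = ("Abc", "aBc", "ABc", "abC", "AbC", "aBC", "ABC")


def to_venn3_tuple(subset_sizes):
    """Reorder a membership-tuple dictionary into venn3's positional order."""
    by_label = {venn_label(subset): size for subset, size in subset_sizes.items()}
    # Subsets missing from the dictionary are treated as empty.
    return tuple(by_label.get(label, 0) for label in VENN3_ORDER)


example = {
    (1, 0, 0): 1, (0, 1, 0): 2, (0, 0, 1): 3, (1, 1, 0): 4,
    (1, 0, 1): 5, (0, 1, 1): 6, (1, 1, 1): 7,
}
print(venn_label((1, 1, 0)))    # ABc
print(to_venn3_tuple(example))  # (1, 2, 4, 3, 5, 6, 7)
```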

pyvenn: Uses pre-built images to produce Venn diagrams for up to 6 sets. The visualisations are hence not area-proportional; only the subset labels are adjusted based on user input.

pyvenn example visualisation

supervenn: Produces area-proportional visualisations that convey the same information as Euler diagrams but are not themselves Euler or Venn diagrams. It generalises well to arbitrary numbers of sets and is thus easily the superior choice for diagnostic purposes (its intended use-case). However, the resulting visualisations are more difficult to explain to unfamiliar readers, and thus probably less appropriate for publications.

supervenn example visualisation

import matplotlib.pyplot as plt
from supervenn import supervenn

sets = [{1, 2, 3, 4}, {3, 4, 5}, {1, 6, 7, 8}]
labels = ['alice', 'bob', 'third party']
supervenn(sets, labels)
plt.show()

Recent Changes

  • 0.2.0 Various improvements to set and subset label placement and styling (issue #5)
  • 0.1.1 Fix a bug that occurred when attempting to draw word clouds for empty (sub-)sets (issue #6)
  • 0.1.0 Implement set artists such that set outlines can be shown (issue #4)

Contributing & Support

If you get stuck and have a question that is not covered in the documentation, please raise an issue on the issue tracker. If applicable, make a sketch of the desired result. If you submit a bug report, please make sure to include the complete error trace. Include any relevant code and data in a minimal, reproducible example. Pull requests are always welcome.
