Skip to main content

Python drawing utilities for Venn and Euler diagrams visualizing the relationships between two or more sets.

Project description

Matplotlib Set Diagrams

Draw Euler diagrams and Venn diagrams with Matplotlib.

Euler and Venn diagrams are used to visualise the relationships between sets. Both typically employ circles to represent sets, and areas where two circles overlap represent subsets common to both supersets. Venn diagrams show all possible relationships of inclusion and exclusion between two or more sets. In Euler diagrams, the area corresponding to each subset is scaled according to the size of the subset. If a subset doesn't exist, the corresponding area doesn't exist.

This library was inspired by matplotlib-venn, but developed independently. It adds support for creating set diagrams for an arbitrary number of sets, visualising set and subset contents, and implements an improved layout engine. For more details, see below. This library also improves on and replaces matplotlib_venn_wordcloud.

Installation

pip install matplotlib_set_diagrams

Documentation

Numerous tutorials, code examples, and a complete documentation of the API can be found on ReadTheDocs.

Quickstart

This section is for the impatient. For more comprehensive, step-by-step guides, please consult the documentation.

Quickstart output figure

import matplotlib.pyplot as plt

from matplotlib_set_diagrams import EulerDiagram, VennDiagram

fig, axes = plt.subplots(2, 4, figsize=(15, 5))

for ii, SetDiagram in enumerate([EulerDiagram, VennDiagram]):

    # Initialise from a list of sets:
    SetDiagram.from_sets(
        [
            {"a", "b", "c", "d", "e"},
            {"e", "f", "g"},
        ],
        ax=axes[ii, 0])

    # Alternatively, initialise directly from pre-computed subset sizes.
    SetDiagram(
        {
            (1, 0) : 4, # {"a", "b", "c", "d"}
            (0, 1) : 2, # {"f", "g"}
            (1, 1) : 1, # {"e"}
        },
        ax=axes[ii, 1])

    # Visualise subset items as word clouds:
    text_1 = """Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
    enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
    ut aliquip ex ea commodo consequat."""

    text_2 = """Duis aute irure dolor in reprehenderit in voluptate velit
    esse cillum dolore eu fugiat nulla pariatur. Lorem ipsum dolor sit
    amet."""

    def word_tokenize(text):
        """Break a string into its constituent words, and convert the words
        into their 'standard' form (tokens).

        The procedure below is a poor-man's tokenization.
        Consider using the Natural Language Toolkit (NLTK) instead:

        >>> import nltk; words = nltk.word_tokenize(text)

        """
        # get a word list
        words = text.split(' ')
        # remove non alphanumeric characters
        words = [''.join(ch for ch in word if ch.isalnum()) for word in words]
        # convert to all lower case
        words = [word.lower() for word in words]
        return words

    # Tokenize strings.
    sets = [set(word_tokenize(text)) for text in [text_1, text_2]]

    SetDiagram.as_wordcloud(sets, ax=axes[ii, 2])

    # The implementation generalises to any number of sets.
    # However, exact solutions are only guaranteed for two given sets,
    # and the more sets are given, the less likely it becomes that
    # the optimisation procedure finds even an approximate solution.
    # Furthermore, above four or five sets, diagrams become unintelligible.
    # Here an example of a 4-way set diagram:
    SetDiagram(
        {
            (1, 0, 0, 0) : 4.0,
            (0, 1, 0, 0) : 3.0,
            (0, 0, 1, 0) : 2.0,
            (0, 0, 0, 1) : 1.0,
            (1, 1, 0, 0) : 0.9,
            (1, 0, 1, 0) : 0.8,
            (1, 0, 0, 1) : 0.7,
            (0, 1, 1, 0) : 0.6,
            (0, 1, 0, 1) : 0.5,
            (0, 0, 1, 1) : 0.4,
            (1, 1, 1, 0) : 0.3,
            (1, 1, 0, 1) : 0.25,
            (1, 0, 1, 1) : 0.2,
            (0, 1, 1, 1) : 0.15,
            (1, 1, 1, 1) : 0.1,
        },
    ax=axes[ii, 3])

    # set row titles
    axes[ii, 0].annotate(
        SetDiagram.__name__,
        xy         = (0, 0.5),
        xycoords   = 'axes fraction',
        xytext     = (-10, 0),
        textcoords = "offset points",
        ha         = 'right',
        va         = 'center',
        fontsize   = 'large',
        fontweight = 'bold',
    )

fig.tight_layout()
plt.show()

Alternative python libraries

matplotlib-venn: the inspiration for this library. However, matplotlib-venn has some significant drawbacks:

  1. It only produces two-way and three-way set diagrams.
  2. There is no support for visualising set contents other than external libraries such as my matplotlib_venn_wordcloud.
  3. The layout engine often generates incorrect results for three-way set diagrams, and a lot of issues on the matplotlib-venn issue tracker boil down to this problem. Consider the example below, adapted from issue #34:
  • Subset (1, 0, 0) / abC / (A - B - C) is annotated with the label for subset (1, 1, 0) / ABc / (A & B - C).
  • Subset (1, 1, 0) / ABc / (A & B - C) is not visualised at all.

matplotlib-venn / matplotlib_set_diagrams comparison

import matplotlib.pyplot as plt

from matplotlib_set_diagrams import EulerDiagram
from matplotlib_venn import venn3

fig, axes = plt.subplots(1, 2, figsize=(6, 3))

subset_sizes = {
    (1, 0, 0) : 167, # Abc in matplotlib-venn nomenclature
    (0, 1, 0) : 7,   # aBc
    (0, 0, 1) : 25,  # abC
    (1, 1, 0) : 41,  # ABc
    (0, 1, 1) : 174, # aBC
    (1, 0, 1) : 171, # AbC
    (1, 1, 1) : 51,  # ABC
}

axes[0].set_title("matplotlib-venn")
print(tuple(subset_sizes.values()))
# (167, 7, 25, 41, 174, 171, 51)
venn3(tuple(subset_sizes.values()), ax=axes[0])

axes[1].set_title("matplotlib_set_diagrams")
EulerDiagram(subset_sizes, ax=axes[1])

plt.show()

pyvenn: Uses pre-built images to produce Venn diagrams for up to 6 sets. The visualisations are hence not area-proportional; only the subset labels are adjusted based on user input.

pyvenn example visualisation

supervenn: Produces area-proportional, Euler diagram-equivalent visualisations, that are, however, not Euler or Venn diagrams. Generalises well to arbitrary numbers of sets and thus easily the superior choice for diagnostic purposes (its intended use-case). However, the produced visualisations are more difficult to communicate to the unfamiliar reader, and thus probably less appropriate for publications.

supervenn example visualisation

from supervenn import supervenn

sets = [{1, 2, 3, 4}, {3, 4, 5}, {1, 6, 7, 8}]
labels = ['alice', 'bob', 'third party']
supervenn(sets, labels)
plt.show()

Contributing & Support

If you get stuck and have a question that is not covered in the documentation, please raise an issue on the issue tracker. If applicable, make a sketch of the desired result. If you submit a bug report, please make sure to include the complete error trace. Include any relevant code and data in a minimal, reproducible example. Pull requests are always welcome.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

matplotlib_set_diagrams-0.0.2.tar.gz (30.1 kB view details)

Uploaded Source

Built Distribution

matplotlib_set_diagrams-0.0.2-py3-none-any.whl (26.8 kB view details)

Uploaded Python 3

File details

Details for the file matplotlib_set_diagrams-0.0.2.tar.gz.

File metadata

File hashes

Hashes for matplotlib_set_diagrams-0.0.2.tar.gz
Algorithm Hash digest
SHA256 09d9054a85538423b593b6698c0568063c696d6bef4f77ad01a8c2e0e1a3540d
MD5 fe0e853ebb7174485b2294e6629e9f53
BLAKE2b-256 4baf128507a975f41b7a38245c51d618bf0c3d49fdfca3957db08ac6f3596d61

See more details on using hashes here.

File details

Details for the file matplotlib_set_diagrams-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for matplotlib_set_diagrams-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 5cb9a4979f730318b01043b39f8562e93257f18ca12db8bf35b35305aaa3a5c8
MD5 7ba52ce2a8b01b0d96a4bdef8e893b58
BLAKE2b-256 c4049635fa5c9de55e8ddda3f5f256dfb7de86941a658bfb11a5e450e6ef73c3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page