anjana

ANJANA is an open source framework for applying different anonymity techniques.

Anonymity as major assurance of personal data privacy

ANJANA is a Python library for anonymizing sensitive data.

The following anonymity techniques are implemented, based on the Python library pyCANON:

k-anonymity.
(α,k)-anonymity.
ℓ-diversity.
Entropy ℓ-diversity.
Recursive (c,ℓ)-diversity.
t-closeness.
Basic β-likeness.
Enhanced β-likeness.
δ-disclosure privacy.

Installation

First, we strongly recommend using a virtual environment. On Linux:

virtualenv .venv -p python3
source .venv/bin/activate

Using pip:

Install anjana (Linux and Windows):

pip install anjana

Using git:

Install the latest version of anjana (Linux and Windows):

pip install git+https://github.com/IFCA-Advanced-Computing/anjana.git

Getting started

To anonymize your data you need to provide:

The pandas dataframe with the data to be anonymized. Each column can contain identifiers, quasi-identifiers or sensitive attributes.
The list with the names of the identifiers in the dataframe, in order to suppress them.
The list with the names of the quasi-identifiers in the dataframe.
The sensitive attribute (only one), when applying techniques other than k-anonymity.
The level of anonymity to be applied, e.g. k (for k-anonymity), ℓ (for ℓ-diversity), t (for t-closeness), β (for basic or enhanced β-likeness), etc.
Maximum level of record suppression allowed (from 0 to 100, acting as the percentage of suppressed records).
Dictionary containing one dictionary for each quasi-identifier with the hierarchies and the levels.
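
To make the suppression limit more concrete, the following plain-Python sketch computes the percentage of records that sit in equivalence classes smaller than k, which can then be compared with the allowed limit. The helper `suppression_needed` and the toy rows are hypothetical and only illustrate one common reading of a suppression budget; they are not ANJANA's internal procedure.

```python
from collections import Counter


def suppression_needed(qi_rows, k):
    """Return the percentage of records in equivalence classes smaller than k.

    qi_rows is a list of tuples, one per record, holding the (generalized)
    quasi-identifier values. This helper is hypothetical, for illustration only.
    """
    class_sizes = Counter(qi_rows)
    too_small = sum(size for size in class_sizes.values() if size < k)
    return 100 * too_small / len(qi_rows)


# Toy generalized quasi-identifiers (age interval, gender) for 10 records
rows = [("[20, 30)", "*")] * 7 + [("[10, 20)", "*")] * 3

supp_level = 50  # maximum percentage of records that may be suppressed
needed = suppression_needed(rows, k=5)
print(needed, needed <= supp_level)
```

In this toy case only the class with 3 records is smaller than k = 5, so 30% of the records would have to be suppressed, which fits within a limit of 50.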

Example: apply k-anonymity, ℓ-diversity and t-closeness to the adult dataset with some predefined hierarchies:

import pandas as pd
import anjana
from anjana.anonymity import k_anonymity, l_diversity, t_closeness

# Read and process the data
data = pd.read_csv("adult.csv")
data.columns = data.columns.str.strip()
cols = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
for col in cols:
    data[col] = data[col].str.strip()

# Define the identifiers, quasi-identifiers and the sensitive attribute
quasi_ident = [
    "age",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
ident = ["race"]
sens_att = "salary-class"

# Select the desired level of k, l and t
k = 10
l_div = 2
t = 0.5

# Select the suppression limit allowed
supp_level = 50

# Import the hierarchies for each quasi-identifier and collect them in a dictionary
hierarchies = {
    "age": dict(pd.read_csv("hierarchies/age.csv", header=None)),
    "education": dict(pd.read_csv("hierarchies/education.csv", header=None)),
    "marital-status": dict(pd.read_csv("hierarchies/marital.csv", header=None)),
    "occupation": dict(pd.read_csv("hierarchies/occupation.csv", header=None)),
    "sex": dict(pd.read_csv("hierarchies/sex.csv", header=None)),
    "native-country": dict(pd.read_csv("hierarchies/country.csv", header=None)),
}

# Apply the three functions: k-anonymity, l-diversity and t-closeness
data_anon = k_anonymity(data, ident, quasi_ident, k, supp_level, hierarchies)
data_anon = l_diversity(
    data_anon, ident, quasi_ident, sens_att, k, l_div, supp_level, hierarchies
)
data_anon = t_closeness(
    data_anon, ident, quasi_ident, sens_att, k, t, supp_level, hierarchies
)

The code above runs in less than 4 seconds on the original dataset of more than 30,000 records.

Define your own hierarchies

Every anonymity function in ANJANA receives a dictionary with the hierarchies to apply to the quasi-identifiers (QIs). The keys of this dictionary are the names of the quasi-identifier columns to be generalized; if you do not want to generalize a QI, simply leave it out of the dictionary. The value for each key is itself a dictionary indexed by hierarchy level: key 0 holds the raw column (as it appears in the original dataset), key 1 holds the first level of generalization of those values, and so on, with as many keys as hierarchy levels have been defined.
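
The key point of this structure is that the levels are aligned position by position: the i-th entry of level 1 is the generalization of the i-th entry of level 0. The sketch below shows how such a dictionary can be read as a lookup table. The helper `apply_level` and the small age hierarchy are hypothetical, written in plain Python for illustration; this is not how ANJANA applies hierarchies internally.

```python
def apply_level(column, hierarchy, level):
    """Generalize a column using one level of a hierarchy dictionary.

    Hypothetical helper: it pairs every raw value (key 0) with the value
    stored at the same position in the requested level.
    """
    mapping = dict(zip(hierarchy[0], hierarchy[level]))
    return [mapping[value] for value in column]


# Illustrative hierarchy: raw ages, 5-year intervals, 10-year intervals
age_hierarchy = {
    0: [17, 19, 23],
    1: ["[15, 20)", "[15, 20)", "[20, 25)"],
    2: ["[10, 20)", "[10, 20)", "[20, 30)"],
}

column = [19, 23, 19, 17]
print(apply_level(column, age_hierarchy, 1))
print(apply_level(column, age_hierarchy, 2))
```

Level 0 returns the column unchanged, which is why the raw values must always be present under key 0.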

For a better understanding, let's look at the following example. Suppose we have the following simulated dataset (extracted from the hospital_extended.csv dataset used for testing purposes), with age, gender and city as quasi-identifiers, name as identifier and disease as sensitive attribute. For the QIs, we want to apply the following hierarchies: intervals of 5 years (first level) and 10 years (second level) for age, and suppression as the first level for both gender and city.

name	age	gender	city	disease
Ramsha	29	Female	Tamil Nadu	Cancer
Yadu	24	Female	Kerala	Viralinfection
Salima	28	Female	Tamil Nadu	TB
Sunny	27	Male	Karnataka	No illness
Joan	24	Female	Kerala	Heart-related
Bahuksana	23	Male	Karnataka	TB
Rambha	19	Male	Kerala	Cancer
Kishor	29	Male	Karnataka	Heart-related
Johnson	17	Male	Kerala	Heart-related
John	19	Male	Kerala	Viralinfection

Then, to create the hierarchies, we can define the following dictionary:

import numpy as np

age = data['age'].values
# Values: [29 24 28 27 24 23 19 29 17 19] (note that the following can be automated)
age_5years = ['[25, 30)', '[20, 25)', '[25, 30)',
              '[25, 30)', '[20, 25)', '[20, 25)',
              '[15, 20)', '[25, 30)', '[15, 20)', '[15, 20)']

age_10years = ['[20, 30)', '[20, 30)', '[20, 30)',
               '[20, 30)', '[20, 30)', '[20, 30)',
               '[10, 20)', '[20, 30)', '[10, 20)', '[10, 20)']

hierarchies = {
    "age": {0: age,
            1: age_5years,
            2: age_10years},
    "gender": {
        0: data["gender"].values,
        1: np.array(["*"] * len(data["gender"].values)) # Suppression
    },
    "city": {0: data["city"].values,
             1: np.array(["*"] * len(data["city"].values))} # Suppression
}
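
As noted in the comment above, the interval labels do not have to be typed by hand. A minimal plain-Python sketch of how they could be derived is shown here; the helper `interval_label` is hypothetical and is not part of ANJANA.

```python
def interval_label(value, step):
    """Return the half-open interval of width `step` that contains `value`.

    Hypothetical helper; the label format matches the hand-written lists above.
    """
    lower = (int(value) // step) * step
    return f"[{lower}, {lower + step})"


ages = [29, 24, 28, 27, 24, 23, 19, 29, 17, 19]

age_5years = [interval_label(a, 5) for a in ages]
age_10years = [interval_label(a, 10) for a in ages]

print(age_5years)
print(age_10years)
```

The two resulting lists are the same as the hand-written `age_5years` and `age_10years`, so they can be used directly as levels 1 and 2 of the age hierarchy.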

You can also use the function generate_intervals() from utils for creating the interval-based hierarchy as follows:

import numpy as np
from anjana.anonymity import utils

age = data['age'].values

hierarchies = {
    "age": {
        0: data["age"].values,
        1: utils.generate_intervals(data["age"].values, 0, 100, 5),
        2: utils.generate_intervals(data["age"].values, 0, 100, 10),
    },
    "gender": {
        0: data["gender"].values,
        1: np.array(["*"] * len(data["gender"].values)) # Suppression
    },
    "city": {0: data["city"].values,
             1: np.array(["*"] * len(data["city"].values))} # Suppression
}
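
To see what these hierarchies achieve, the sketch below takes the ten records of the example table, generalized to 10-year age intervals with gender and city suppressed, and measures the resulting k (size of the smallest equivalence class) and ℓ (smallest number of distinct diseases in a class). The helpers `k_level` and `l_level` are hypothetical plain-Python illustrations of the definitions; for real datasets a dedicated checker such as pyCANON is the better choice.

```python
from collections import Counter, defaultdict


def k_level(qi_rows):
    """Size of the smallest equivalence class (hypothetical helper)."""
    return min(Counter(qi_rows).values())


def l_level(qi_rows, sensitive):
    """Smallest number of distinct sensitive values in any class (hypothetical helper)."""
    values_per_class = defaultdict(set)
    for qi, value in zip(qi_rows, sensitive):
        values_per_class[qi].add(value)
    return min(len(values) for values in values_per_class.values())


# Example table after generalization: (age interval, gender, city)
age_10years = ["[20, 30)"] * 6 + ["[10, 20)", "[20, 30)", "[10, 20)", "[10, 20)"]
qi_rows = [(age, "*", "*") for age in age_10years]
disease = ["Cancer", "Viralinfection", "TB", "No illness", "Heart-related",
           "TB", "Cancer", "Heart-related", "Heart-related", "Viralinfection"]

print(k_level(qi_rows))           # 3
print(l_level(qi_rows, disease))  # 3
```

With these levels the table has two equivalence classes, of 7 and 3 records, so it satisfies k-anonymity for k = 3 and ℓ-diversity for ℓ = 3.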

License

This project is licensed under the Apache 2.0 license.

Citation

If you are using anjana you can cite it as follows:

@article{sainzpardo2024anjana,
    title={An Open Source Python Library for Anonymizing Sensitive Data},
    author={S{\'a}inz-Pardo D{\'\i}az, Judith and L{\'o}pez Garc{\'\i}a, {\'A}lvaro},
    journal={Scientific data},
    volume={11},
    number={1},
    pages={1289},
    year={2024},
    publisher={Nature Publishing Group UK London}
  }

Related work

If you are using anjana, you may also be interested in:

pyCANON: a Python library for checking the level of anonymity of a dataset.
trasgoDP: a Python library which implements different mechanisms to apply local differential privacy directly to your data.

Funding and acknowledgments

This work is funded by the European Union through the SIESTA project (Horizon Europe) under Grant number 101131957.

Note: Anjana and the mythology of Cantabria

"La Anjana" is a character from the mythology of Cantabria. Known as the good fairy of Cantabria, generous and protective of all people, she helps the poor, the suffering and those who stray in the forest.

- Partially extracted from: Cotera, Gustavo. Mitología de Cantabria. Ed. Tantin, Santander, 1998.


