
Performs iterative proportional fitting on tabular data

Project description

IPFpy

Iterative proportional fitting that can work with larger-than-memory tables.
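As background, classic IPF alternately rescales the table along each margin until the sums match their targets. Below is a minimal NumPy sketch of the 2-D case, included as an illustration of the algorithm itself, not of this package's implementation:

```python
import numpy as np

def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Alternately rescale rows and columns of `seed` until the
    row and column sums match the given targets."""
    w = np.asarray(seed, dtype=float).copy()
    for _ in range(max_iter):
        w *= (row_targets / w.sum(axis=1))[:, None]   # fit row margins
        w *= col_targets / w.sum(axis=0)              # fit column margins
        if np.allclose(w.sum(axis=1), row_targets, atol=tol):
            break
    return w

seed = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
fitted = ipf_2d(seed, row_targets=np.array([4.0, 6.0]),
                col_targets=np.array([5.0, 5.0]))
```

Note that for equality constraints the row and column targets must share a common total (10 here), otherwise the iteration cannot satisfy both margins at once.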

Input tables can be pandas DataFrames, .csv files, or .parquet files.

  input : table
      This table lists all the cells or units whose values will be adjusted by iterative proportional fitting, along with the bounds the adjusted values must stay within.
      unit_id : identifier for the decision variables
      weight  : decision variables, >= 0
      lb      : lower bound, weight >= lb
      ub      : upper bound, weight <= ub

  constraints : table
      This table maps each constraint identifier to the unit_ids it aggregates.
      unit_id : identifier for the decision variables
      cons_id : identifier for each margin

  targets : table
      This table lists the target values that the margins should add up to once adjusted.
      cons_id   : identifier for each margin
      cons_type : whether the constraint must be greater than or equal to (ge), less than or equal to (le), or equal to (eq) the target
      target    : value for the constraint
  
  unit_id : name of the column that identifies each value to be adjusted (default "unit_id")
  var     : name of the column that contains the values to be adjusted   (default "weight")
  cons_id : name of the column that identifies each constraint           (default "cons_id")
                        
  db_file (optional): name of the database file on disk that will hold the temporary tables. Defaults to in-memory.

  out_parquet (optional): path of the parquet output file
  out_csv (optional)    : path of the csv output file

  silent (optional, default False): whether to print progress to the screen
  
  output : table
      The output table lists all the initial cells/units along with their adjusted values.
      unit_id : identifier for the decision variables
      weight  : adjusted weight, within the interval lb <= weight <= ub
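For concreteness, the three input tables could be built by hand like this. This is an illustrative example of the expected schemas with made-up values, not output of the package: two margins, "r0" covering units 0 and 1, and "c0" covering units 1 and 2.

```python
import pandas as pd

# decision variables with their bounds
input_table = pd.DataFrame({
    "unit_id": [0, 1, 2],
    "weight":  [1.0, 2.0, 3.0],
    "lb":      [0.0, 0.0, 0.0],
    "ub":      [10.0, 10.0, 10.0],
})

# which unit_ids each constraint aggregates
constraints = pd.DataFrame({
    "unit_id": [0, 1, 1, 2],
    "cons_id": ["r0", "r0", "c0", "c0"],
})

# the value each margin should reach, and the direction of the constraint
targets = pd.DataFrame({
    "cons_id":   ["r0", "c0"],
    "cons_type": ["eq", "eq"],
    "target":    [4.0, 6.0],
})
```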

Example

from IPFpy import *
import numpy as np


# test IPF
# step 1: create a table and generate the margins, along with the table that maps the cells of the inner table to the margins
raw_table = generate_random_table(4,8,scale=2)
input_table, margins, constraints = aggregate_table(raw_table, by=[0,1,2,3], var="value")
margins = margins.rename(columns={"value":"target"}) #rename margin column

# step 2: modify the margins by adding noise to the inner cells
new_table = input_table.copy().drop("unit_id",axis=1)
new_table["value"] =  input_table["value"] * np.random.uniform(0, 2, input_table.shape[0])
modified_table, modified_margins, constraints = aggregate_table(new_table, by=[0,1,2,3], var="value")
modified_margins = modified_margins.rename(columns={"value":"target"})

# write the tables as csv
input_table.to_csv('input_table.csv', index=False)
constraints.to_csv('constraints.csv', index=False)
modified_margins.to_csv('modified_margins.csv', index=False)


# tables can also be written as parquet
input_table.to_parquet('input_table.parquet', engine='pyarrow')

# adjust the table in step1 to the margin obtained in step2
adjusted_table = ipf(   input=input_table,
                        constraints=constraints,
                        targets=modified_margins,
                        unit_id="unit_id",
                        var="value",
                        cons_id="cons_id",
                        db_file=None,
                        tol=0.1,
                        maxIter=1000)

# output to a file
ipf(input       =input_table,
    constraints =constraints,
    targets     =modified_margins,
    unit_id     ="unit_id",
    var         ="value",
    cons_id     ="cons_id",
    tol         =0.1,
    maxIter     =1000,
    out_csv     ="adjusted_table.csv",
    silent=True)

# input directly from files
# paths to the input files have to be adjusted to correspond to the location of the input files
ipf(input       ="/home/Desktop/Programming/IPF/IPF/input_table.csv",
    constraints ="/home/Desktop/Programming/IPF/IPF/constraints.csv",
    targets     ="/home/Desktop/Programming/IPF/IPF/modified_margins.csv",
    unit_id     ="unit_id",
    var         ="value",
    cons_id     ="cons_id",
    tol         =0.1,
    maxIter     =1000,
    out_csv     ="adjusted_table.csv",
    silent=True)
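Whichever form the output takes, the fit can be checked by re-aggregating the adjusted weights per constraint and comparing against the targets. Below is a sketch of that check, using hand-made data in place of real ipf output (column names follow the defaults above):

```python
import pandas as pd

# stand-in for the adjusted output of ipf, plus the mapping and targets
adjusted = pd.DataFrame({"unit_id": [0, 1, 2],
                         "weight":  [1.5, 2.5, 3.5]})
constraints = pd.DataFrame({"unit_id": [0, 1, 1, 2],
                            "cons_id": ["a", "a", "b", "b"]})
targets = pd.DataFrame({"cons_id":   ["a", "b"],
                        "cons_type": ["eq", "eq"],
                        "target":    [4.0, 6.0]})

# sum the adjusted weights for each constraint ...
margins = (constraints.merge(adjusted, on="unit_id")
           .groupby("cons_id", as_index=False)["weight"].sum())

# ... and compare the achieved margins to their targets
check = margins.merge(targets, on="cons_id")
check["abs_err"] = (check["weight"] - check["target"]).abs()
```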

Project details


Download files

Download the file for your platform.

Source Distribution

ipfpy-0.1.0.tar.gz (6.9 kB)

Uploaded Source

Built Distribution


ipfpy-0.1.0-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file ipfpy-0.1.0.tar.gz.

File metadata

  • Download URL: ipfpy-0.1.0.tar.gz
  • Upload date:
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ipfpy-0.1.0.tar.gz
Algorithm Hash digest
SHA256 820918972902207617c55f1587937442865bbbcb16df650efa5f6d4c0cf69886
MD5 bedf6b94ba7e7da7a5671b9b7dc5ae84
BLAKE2b-256 885847c5abc35b81f4816b4ea3878afab773686da94f9f57a01da73c49d5f851


Provenance

The following attestation bundles were made for ipfpy-0.1.0.tar.gz:

Publisher: python-publish.yml on Veozen/IPFpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ipfpy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ipfpy-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ipfpy-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 96433f5582865e9902469a1593f02db269702b932f3a60f35c1df6c0ef1c3d86
MD5 97583476afa903c41d2e56585227653e
BLAKE2b-256 f8004df6b74e258a0e04de5d3d557e8b4c04895e37f90999c27ff832bcd3b96b


Provenance

The following attestation bundles were made for ipfpy-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on Veozen/IPFpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
