
Python implementation of subsampling methods for big data under GLMs from NeEDS4BigData.

NeEDS4BigDataPy

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

MIT license

The Python library “NeEDS4BigDataPy” implements subsampling methods for analysing big data and is the Python version of NeEDS4BigData.

What is “NeEDS4BigData” an abbreviation for?

New Experimental Design based Subsampling methods for Big Data.

How to engage with NeEDS4BigDataPy for the first time?

# Installing from PyPI (run in a shell)
pip install NeEDS4BigDataPy
# Importing the package (run in Python)
import NeEDS4BigDataPy
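
To confirm that the installation succeeded, one option is to query the installed version through the Python standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and is not part of NeEDS4BigDataPy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("NeEDS4BigDataPy"))
```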

Subsampling Methods

  1. A- and L-optimality based subsampling for GLMs.
  2. A-optimality based subsampling for Gaussian Linear Models.
  3. Leverage sampling for GLMs.
  4. Local case control sampling for logistic regression.
  5. A-optimality based subsampling under measurement constraints for GLMs.
  6. Model robust subsampling method for GLMs.
  7. Subsampling method for GLMs when the model is potentially misspecified.
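
To give a feel for what one of these methods does, the following is a minimal NumPy sketch of leverage-based subsampling for a Gaussian linear model only (the identity-link special case of a GLM). It does not use or reproduce the NeEDS4BigDataPy API: the function name `leverage_subsample`, the simulated data, and the subsample size are all assumptions made for illustration.

```python
import numpy as np


def leverage_subsample(X, y, r, rng):
    """Sample r rows with probability proportional to leverage, then fit weighted least squares."""
    # Thin QR decomposition: the leverage of row i is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    probs = leverage / leverage.sum()
    # Sample with replacement according to the leverage-based probabilities.
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    # Inverse-probability weights correct for the non-uniform sampling.
    sqrt_w = np.sqrt(1.0 / probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None)
    return idx, probs, beta


# Simulated "big data" (assumed for illustration): intercept plus three covariates.
rng = np.random.default_rng(0)
N = 100_000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
beta_true = np.array([1.0, -0.5, 2.0, 0.3])
y = X @ beta_true + rng.normal(size=N)

idx, probs, beta_hat = leverage_subsample(X, y, r=2_000, rng=rng)
print(np.round(beta_hat, 2))
```

The estimate from the 2,000 sampled rows should be close to `beta_true`, with the gap shrinking as the subsample size `r` grows. The optimality-based methods in the list replace the leverage-based probabilities with probabilities derived from a design criterion.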

These seven methods are described in the following articles, grouped under these topics:

  1. Introduction - explains the need for subsampling methods.
  2. Model based subsampling
  3. Model robust and misspecification
  4. Benchmarking Functions

For topic 2 we assume that the main effects model can describe the data. For topic 3 we first consider the case where several models can describe the big data, and then the case where the given main effects model is misspecified. Under the conditions of topics 2 and 3 we explore subsampling for four given big data sets. To assess computation time, topic 4 reports simulations for the scenarios of topics 2 and 3 that compare our subsampling functions against full data modelling.
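
The kind of comparison made in the benchmarking topic can be sketched as follows. This is an illustrative sketch only, not the package's benchmarking code: it uses plain NumPy least squares with a uniform random subsample, and the helper `timed_ols`, the data sizes, and the simulated model are all assumptions.

```python
import time

import numpy as np


def timed_ols(X, y):
    """Fit ordinary least squares and return (coefficients, elapsed seconds)."""
    start = time.perf_counter()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, time.perf_counter() - start


# Simulated data (assumed for illustration): N rows, subsample of r rows.
rng = np.random.default_rng(1)
N, r = 200_000, 2_000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
beta_true = np.array([0.5, 1.0, -1.0, 2.0])
y = X @ beta_true + rng.normal(size=N)

# Full data model fit.
beta_full, t_full = timed_ols(X, y)

# Uniform random subsample; the timing covers the fit only, not the row selection.
idx = rng.choice(N, size=r, replace=False)
beta_sub, t_sub = timed_ols(X[idx], y[idx])

print(f"full data: {t_full:.4f} s, subsample: {t_sub:.4f} s")
print("max abs difference in coefficients:", np.max(np.abs(beta_full - beta_sub)))
```

Timings depend on hardware and on the linear algebra backend, so a single run is only indicative; repeating the comparison over many simulated data sets gives a steadier picture.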

Thank You
