Python implementation of subsampling methods for big data under GLMs from NeEDS4BigData.
NeEDS4BigDataPy
The Python library “NeEDS4BigDataPy” implements subsampling methods for analysing big data and is the Python version of NeEDS4BigData.
What is “NeEDS4BigData” an abbreviation for?
New Experimental Design based Subsampling methods for Big Data.
How do I get started with NeEDS4BigDataPy?
```shell
# Installing from PyPI
pip install NeEDS4BigDataPy
```
```python
# Importing the package
import NeEDS4BigDataPy
```
Subsampling Methods
- A- and L-optimality based subsampling for GLMs.
- A-optimality based subsampling for Gaussian Linear Models.
- Leverage sampling for GLMs.
- Local case-control sampling for logistic regression.
- A-optimality based subsampling under measurement constraints for GLMs.
- Model robust subsampling method for GLMs.
- Subsampling method for GLMs when the model is potentially misspecified.
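Several of the model based methods above keep rows according to probabilities computed from a pilot fit. As an illustration only, the sketch below computes A- and L-optimality subsampling probabilities for logistic regression in the style of the optimal subsampling (OSMAC) approach, using plain NumPy rather than NeEDS4BigDataPy. The function names `fit_logistic`, `optimal_probs` and `two_step_subsample` are hypothetical and are not the package's API. The procedure is simplified: the pilot sample is uniform, and the final model is fitted on the second-step subsample only, with inverse-probability weights.

```python
# Hypothetical illustration, not the NeEDS4BigDataPy API.
import numpy as np


def fit_logistic(X, y, w=None, n_iter=50, tol=1e-8):
    """Weighted logistic regression fitted by Newton-Raphson (IRLS)."""
    n, p = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - prob))                          # weighted score
        hess = (X * (w * prob * (1.0 - prob))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def optimal_probs(X, y, beta_pilot, criterion="A"):
    """Subsampling probabilities evaluated at the pilot estimate.

    A-optimality: pi_i proportional to |y_i - p_i| * ||M^{-1} x_i||
    L-optimality: pi_i proportional to |y_i - p_i| * ||x_i||
    """
    prob = 1.0 / (1.0 + np.exp(-X @ beta_pilot))
    resid = np.abs(y - prob)
    if criterion == "A":
        # M is the (average) observed information matrix at the pilot estimate
        M = (X * (prob * (1.0 - prob))[:, None]).T @ X / X.shape[0]
        norms = np.linalg.norm(np.linalg.solve(M, X.T), axis=0)
    elif criterion == "L":
        norms = np.linalg.norm(X, axis=1)
    else:
        raise ValueError("criterion must be 'A' or 'L'")
    scores = resid * norms
    return scores / scores.sum()


def two_step_subsample(X, y, r0, r, criterion="A", seed=None):
    """Pilot fit on a uniform subsample of size r0, then an inverse-probability
    weighted fit on a subsample of size r drawn with the optimal probabilities."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    pilot_idx = rng.choice(n, size=r0, replace=True)
    beta_pilot = fit_logistic(X[pilot_idx], y[pilot_idx])
    pi = optimal_probs(X, y, beta_pilot, criterion)
    idx = rng.choice(n, size=r, replace=True, p=pi)
    beta_hat = fit_logistic(X[idx], y[idx], w=1.0 / pi[idx])
    return beta_hat, pi


# Example on simulated data with an intercept and two covariates
rng = np.random.default_rng(0)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, -1.0, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_a, _ = two_step_subsample(X, y, r0=500, r=2000, criterion="A", seed=1)
beta_l, _ = two_step_subsample(X, y, r0=500, r=2000, criterion="L", seed=1)
```

The L-optimality probabilities avoid solving with the information matrix, so they are cheaper to compute than the A-optimality ones, at the cost of targeting a different optimality criterion.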
These seven methods are described in the following articles:
1. Introduction: explains why subsampling methods are needed.
2. Model based subsampling
3. Model robust and misspecification
4. Benchmarking Functions
In Model based subsampling we assume that the main effects model can describe the data. In Model robust and misspecification we first consider the case where several models can describe the big data, and then the case where the given main effects model is misspecified. Under the conditions of both articles we explore subsampling on four given big data sets. Finally, in Benchmarking Functions we explore computation time by running simulations for the scenarios of these two articles, comparing our subsampling functions against full data modelling.
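The comparison of subsampling against full data modelling can be sketched as a small timing harness. This is a hypothetical example, not one of the package's benchmarking functions: it simulates a logistic regression data set and compares the wall-clock time and squared estimation error of a full data fit against a simplified two-step L-optimality subsampling fit, averaged over a few repetitions. The names `fit_logistic`, `l_optimal_fit` and `benchmark` are illustrative only. Timings depend on the machine, so no output is shown.

```python
# Hypothetical benchmarking sketch, not the NeEDS4BigDataPy API.
import time

import numpy as np


def fit_logistic(X, y, w=None, n_iter=50, tol=1e-8):
    """Weighted logistic regression fitted by Newton-Raphson (IRLS)."""
    w = np.ones(X.shape[0]) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = (X * (w * prob * (1.0 - prob))[:, None]).T @ X
        step = np.linalg.solve(hess, X.T @ (w * (y - prob)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def l_optimal_fit(X, y, r0, r, rng):
    """Simplified two-step L-optimality subsampling (uniform pilot + weighted fit)."""
    n = X.shape[0]
    pilot = rng.choice(n, size=r0, replace=True)
    beta0 = fit_logistic(X[pilot], y[pilot])
    prob = 1.0 / (1.0 + np.exp(-X @ beta0))
    scores = np.abs(y - prob) * np.linalg.norm(X, axis=1)
    pi = scores / scores.sum()
    idx = rng.choice(n, size=r, replace=True, p=pi)
    return fit_logistic(X[idx], y[idx], w=1.0 / pi[idx])


def benchmark(n, p, r0, r, n_rep=3, seed=0):
    """Return {method: (mean seconds, mean squared error)} over n_rep runs."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta_true = rng.uniform(-1.0, 1.0, size=p)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
    fits = {
        "full data": lambda: fit_logistic(X, y),
        "L-optimality subsample": lambda: l_optimal_fit(X, y, r0, r, rng),
    }
    summary = {}
    for name, fit in fits.items():
        times, errors = [], []
        for _ in range(n_rep):
            start = time.perf_counter()
            beta_hat = fit()
            times.append(time.perf_counter() - start)
            errors.append(np.sum((beta_hat - beta_true) ** 2))
        summary[name] = (float(np.mean(times)), float(np.mean(errors)))
    return summary


for name, (sec, mse) in benchmark(n=100_000, p=5, r0=500, r=2000).items():
    print(f"{name:>24}: {sec:.4f} s, MSE = {mse:.5f}")
```

Sharing one random generator across repetitions gives each subsample fit a fresh random draw, while the full data fit is deterministic and is repeated only to average its timing.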
Thank You
File details
Details for the file needs4bigdatapy-1.0.1.tar.gz.
File metadata
- Download URL: needs4bigdatapy-1.0.1.tar.gz
- Upload date:
- Size: 25.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02195531b62ebe9005a3223205e7d2459b8191a7d19840ba9bdca47e644caa55 |
| MD5 | 212cf11235912f28de7baff1650e997b |
| BLAKE2b-256 | 60874f62c88eb10d379e4b213cf87e5428b7eb1678c81e4a44c891585e2e9fa2 |
File details
Details for the file needs4bigdatapy-1.0.1-py3-none-any.whl.
File metadata
- Download URL: needs4bigdatapy-1.0.1-py3-none-any.whl
- Upload date:
- Size: 46.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 536166b8b004bbde35d7946d7c846f8afcbf16ad31d0bdc42c50a7594d21b9d1 |
| MD5 | 27b08aa817f4d23f0ab70efd83680715 |
| BLAKE2b-256 | dd058b53c812a51e2022662e40b0caff02f56a2ee9288f0d22a16ebdfaa426cb |