HBossier/BigDataPIM


Fitting Probabilistic Index Models on Large Datasets

Short introduction

The following is a very dense summary of the class of Probabilistic Index Models used in this project. We refer to Thas et al. (2012) for more information.

Probabilistic Index

The Probabilistic Index is defined as

$$P(Y \preccurlyeq Y' \mid X, X') := P(Y < Y' \mid X, X') + \tfrac{1}{2} P(Y = Y' \mid X, X')$$
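To make the definition concrete, the sketch below estimates the PI from two samples by comparing every pair of outcomes. It assumes the usual convention from Thas et al. (2012) that ties count for one half, and it ignores covariates. The function name `empirical_pi` is hypothetical and is not part of this repository's R scripts.

```python
def empirical_pi(y, y_prime):
    """Estimate P(Y < Y') + 0.5 * P(Y = Y') by comparing all pairs (a, b).

    Assumption: ties contribute one half, as in the usual PI convention.
    """
    total = 0.0
    for a in y:
        for b in y_prime:
            if a < b:
                total += 1.0
            elif a == b:
                total += 0.5
    # Average over all len(y) * len(y_prime) comparisons.
    return total / (len(y) * len(y_prime))


# Six pairs with a < b and two ties out of nine comparisons give 7/9.
pi_hat = empirical_pi([1, 2, 3], [2, 3, 4])
print(round(pi_hat, 4))
```

A value of 0.5 means neither sample tends to produce larger outcomes than the other.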

If (Y,X) and (Y',X') are i.i.d., then a Probabilistic Index Model is defined as:

$$P(Y \preccurlyeq Y' \mid X, X') = m(X, X'; \beta)$$

Here the left-hand side is the Probabilistic Index conditional on the covariates, and m(.) is a function with range [0,1] that satisfies some smoothness condition. This function is restricted so that it is related, through a link function, to a linear predictor $Z^T\beta$, where Z is a p-dimensional vector with elements that may depend on X and X', and p is the number of predictors.
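As an illustration only, one common specification in Thas et al. (2012) uses the logit link with $Z = X' - X$. This particular choice is an assumption made for the example; the README does not state which link or which form of Z the project uses.

```latex
\operatorname{logit}\, P(Y \preccurlyeq Y' \mid X, X') = (X' - X)^T \beta
```

With a single binary predictor, setting X = 0 and X' = 1 gives $P(Y \preccurlyeq Y' \mid X = 0, X' = 1) = e^{\beta} / (1 + e^{\beta})$, so $\beta = 0$ corresponds to a PI of 0.5.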

Problem

The goal of any regression approach is to estimate the $\beta$ parameters. For the PIM, this quickly becomes computationally too demanding. To estimate these parameters, we need to use the set of pseudo-observations $I_{ij} = I(Y_i < Y_j) + \tfrac{1}{2} I(Y_i = Y_j)$, where $I(\cdot)$ is the indicator function. Given some constraints which we do not discuss here, the number of pseudo-observations grows quadratically with the sample size n. That is, the set of index pairs $\mathcal{I}_n$ satisfies $|\mathcal{I}_n| = O(n^2)$.
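The quadratic growth can be seen in a small sketch. It assumes the index set consists of all pairs i < j, which is only one possible choice since the constraints are not discussed here, and that ties yield a pseudo-observation of 0.5. The helper names `pseudo_observations` and `n_pairs` are hypothetical.

```python
from itertools import combinations


def pseudo_observations(y):
    """Map each pair (i, j) with i < j to 1.0 if y[i] < y[j], 0.5 if tied, else 0.0.

    Assumption: the index set is all pairs i < j.
    """
    pseudo = {}
    for i, j in combinations(range(len(y)), 2):
        if y[i] < y[j]:
            pseudo[(i, j)] = 1.0
        elif y[i] == y[j]:
            pseudo[(i, j)] = 0.5
        else:
            pseudo[(i, j)] = 0.0
    return pseudo


def n_pairs(n):
    """Number of pairs i < j among n observations: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# The count grows quadratically: 100,000 observations already give
# 4,999,950,000 pseudo-observations under this index set.
for n in (100, 10_000, 100_000):
    print(n, n_pairs(n))
```

Building the full dictionary is only feasible for small n, which is exactly the difficulty described above.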

Goal

The goal of this project is to fit the PIM to a large dataset. We will try to accomplish this by subsampling from the set of pseudo-observations. The term Big Data is usually reserved for a broader setting in which data storage, distribution, volume, velocity, etc. play an important role. In our setting, we do not need a massive dataset, as it is already nearly impossible to fit a PIM when n > 100,000. Hence, we use the term large datasets to restrict our focus.
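One simple way to subsample pseudo-observations is to draw index pairs uniformly at random without ever materialising the full set. The sketch below does this; uniform sampling of pairs i < j is an assumption made for illustration and is not necessarily the scheme used in this project's R scripts. The function name `sample_pairs` is hypothetical.

```python
import random


def sample_pairs(n, k, seed=0):
    """Draw k distinct index pairs (i, j) with i < j uniformly from n observations.

    Pairs are drawn one at a time and duplicates are rejected, so memory use
    depends on k and not on the n * (n - 1) / 2 possible pairs.
    """
    total = n * (n - 1) // 2
    if k > total:
        raise ValueError("k exceeds the number of available pairs")
    rng = random.Random(seed)
    pairs = set()
    while len(pairs) < k:
        i, j = rng.sample(range(n), 2)  # two distinct indices
        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# Draw 1,000 of the roughly 5 * 10^9 pairs available for n = 100,000.
subset = sample_pairs(100_000, 1_000, seed=1)
print(len(subset), subset[0])
```

Rejection of duplicates is cheap when k is much smaller than the total number of pairs, which is the regime of interest here. It becomes slow when k approaches the total.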

Project structure

  • 1_Scripts: contains R scripts.
  • 2_Reports: contains most reports. Some reports are R notebooks which render on GitHub. For HTML reports, you need to download the entire repository.

Application: mental well-being

Analyses for the application chapter can be found under 2_Reports/Application.

References

[1] Thas, O., De Neve, J., Clement, L., and Ottoy, J.P. (2012) Probabilistic index models (with discussion). Journal of the Royal Statistical Society - Series B, 74:623-671.
