Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models
The statistical analysis of heterogeneous and high-dimensional data is a challenging problem, from both modeling and inference points of view, especially in today's big data era. This calls for new strategies, particularly for advanced analyses ranging from density estimation to prediction, as well as for the unsupervised classification of many kinds of such data with complex distributions. Mixture models are known to be very successful in modeling heterogeneity in data in many statistical data science problems, including density estimation and clustering. Their elegant Mixtures-of-Experts (MoE) variety strengthens the link with supervised learning and hence further addresses prediction from heterogeneous regression-type data, as well as classification. In a high-dimensional scenario, particularly for data arising from a heterogeneous population, using such MoE models requires addressing modeling and estimation questions, since the state-of-the-art estimation methodologies are limited.
Saved in:

| | |
|---|---|
| Main Author: | Huỳnh, Bảo Tuyên |
| Format: | Doctoral thesis |
| Language: | English |
| Published: | Caen, France, 2023 |
| Subjects: | Mixture models; Mixture of Experts; Regularized Estimation; Feature Selection; Lasso; L1-regularization; Sparsity; EM algorithm; MM Algorithm; Proximal-Newton; Coordinate Ascent; Clustering; Classification; Regression; Prediction |
| Online Access: | https://scholar.dlu.edu.vn/handle/123456789/2336 |
| Holding Library: | Thư viện Trường Đại học Đà Lạt |
id | oai:scholar.dlu.edu.vn:123456789-2336
record_format | dspace
spelling | oai:scholar.dlu.edu.vn:123456789-2336 2023-06-14T04:22:27Z Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models Huỳnh, Bảo Tuyên Mixture models; Mixture of Experts; Regularized Estimation; Feature Selection; Lasso; L1-regularization; Sparsity; EM algorithm; MM Algorithm; Proximal-Newton; Coordinate Ascent; Clustering; Classification; Regression; Prediction. The statistical analysis of heterogeneous and high-dimensional data is a challenging problem, from both modeling and inference points of view, especially in today's big data era. This calls for new strategies, particularly for advanced analyses ranging from density estimation to prediction, as well as for the unsupervised classification of many kinds of such data with complex distributions. Mixture models are known to be very successful in modeling heterogeneity in data in many statistical data science problems, including density estimation and clustering. Their elegant Mixtures-of-Experts (MoE) variety strengthens the link with supervised learning and hence further addresses prediction from heterogeneous regression-type data, as well as classification. In a high-dimensional scenario, particularly for data arising from a heterogeneous population, using such MoE models requires addressing modeling and estimation questions, since the state-of-the-art estimation methodologies are limited. This thesis deals with the modeling and estimation of high-dimensional MoE models, towards effective density estimation, prediction, and clustering of such heterogeneous and high-dimensional data. We propose new strategies based on regularized maximum-likelihood estimation (MLE) of MoE models to overcome the limitations of standard methods, including MLE with Expectation-Maximization (EM) algorithms, and to simultaneously perform feature selection, so that sparse models are encouraged in such a high-dimensional setting. We first introduce a mixture-of-experts parameter estimation and variable selection methodology, based on L1 (lasso) regularization and the EM framework, for regression and clustering, suited to high-dimensional contexts. We then extend the method to regularized mixture-of-experts models for discrete data, including classification. We develop efficient algorithms to maximize the proposed L1-penalized observed-data log-likelihood function. Our proposed strategies enjoy efficient monotone maximization of the optimized criterion and, unlike previous approaches, do not rely on approximations of the penalty functions, avoid matrix inversion, and exploit the efficiency of the coordinate ascent algorithm, particularly within the proximal Newton-based approach. 2023-05-19T10:17:15Z 2023-05-19T10:17:15Z 2019 2016 2019 Doctoral thesis Theses and dissertations Natural sciences https://scholar.dlu.edu.vn/handle/123456789/2336 en Caen, France
institution | Thư viện Trường Đại học Đà Lạt
collection | Thư viện số
language | English
topic | Mixture models; Mixture of Experts; Regularized Estimation; Feature Selection; Lasso; L1-regularization; Sparsity; EM algorithm; MM Algorithm; Proximal-Newton; Coordinate Ascent; Clustering; Classification; Regression; Prediction
spellingShingle | Mixture models; Mixture of Experts; Regularized Estimation; Feature Selection; Lasso; L1-regularization; Sparsity; EM algorithm; MM Algorithm; Proximal-Newton; Coordinate Ascent; Clustering; Classification; Regression; Prediction. Huỳnh, Bảo Tuyên. Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models
description | The statistical analysis of heterogeneous and high-dimensional data is a challenging problem, from both modeling and inference points of view, especially in today's big data era. This calls for new strategies, particularly for advanced analyses ranging from density estimation to prediction, as well as for the unsupervised classification of many kinds of such data with complex distributions. Mixture models are known to be very successful in modeling heterogeneity in data in many statistical data science problems, including density estimation and clustering. Their elegant Mixtures-of-Experts (MoE) variety strengthens the link with supervised learning and hence further addresses prediction from heterogeneous regression-type data, as well as classification. In a high-dimensional scenario, particularly for data arising from a heterogeneous population, using such MoE models requires addressing modeling and estimation questions, since the state-of-the-art estimation methodologies are limited.
This thesis deals with the modeling and estimation of high-dimensional MoE models, towards effective density estimation, prediction, and clustering of such heterogeneous and high-dimensional data. We propose new strategies based on regularized maximum-likelihood estimation (MLE) of MoE models to overcome the limitations of standard methods, including MLE with Expectation-Maximization (EM) algorithms, and to simultaneously perform feature selection, so that sparse models are encouraged in such a high-dimensional setting. We first introduce a mixture-of-experts parameter estimation and variable selection methodology, based on L1 (lasso) regularization and the EM framework, for regression and clustering, suited to high-dimensional contexts. We then extend the method to regularized mixture-of-experts models for discrete data, including classification. We develop efficient algorithms to maximize the proposed L1-penalized observed-data log-likelihood function. Our proposed strategies enjoy efficient monotone maximization of the optimized criterion and, unlike previous approaches, do not rely on approximations of the penalty functions, avoid matrix inversion, and exploit the efficiency of the coordinate ascent algorithm, particularly within the proximal Newton-based approach.
format | Doctoral thesis
author | Huỳnh, Bảo Tuyên
author_facet | Huỳnh, Bảo Tuyên
author_sort | Huỳnh, Bảo Tuyên
title | Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models
title_short | Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models
title_full | Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models
title_fullStr | Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models
title_full_unstemmed | Estimation and Feature Selection in High-Dimensional Mixtures-of-Experts Models
title_sort | estimation and feature selection in high-dimensional mixtures-of-experts models
publisher | Caen, France
publishDate | 2023
url | https://scholar.dlu.edu.vn/handle/123456789/2336
_version_ | 1778233854047289344
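The abstract describes L1-penalized maximum-likelihood estimation of MoE models via EM with coordinate ascent updates that avoid matrix inversion. As a minimal illustrative sketch of that idea (not the thesis's actual implementation; the function names, signatures, and the single-expert weighted-lasso framing are assumptions for illustration), the core M-step building block is a responsibility-weighted lasso regression solved coordinate-wise with soft-thresholding:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def weighted_lasso_coordinate_ascent(X, y, w, lam, n_iter=100):
    """Coordinate ascent for a responsibility-weighted lasso regression,
    the kind of M-step update that arises in L1-penalized EM for mixtures
    of regression experts (illustrative sketch, not the thesis's code).

    X: (n, p) design matrix; y: (n,) responses;
    w: (n,) posterior responsibilities of one expert component;
    lam: L1 penalty strength.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual, excluding feature j's current contribution.
            r = y - X @ beta + X[:, j] * beta[j]
            num = np.mean(w * X[:, j] * r)
            denom = np.mean(w * X[:, j] ** 2)
            # Closed-form coordinate update: soft-threshold, then rescale.
            beta[j] = soft_threshold(num, lam) / denom
    return beta
```

Each coordinate update has a closed form, which is why such schemes need no matrix inversion and monotonically increase the penalized criterion at every step, consistent with the properties the abstract claims for the proposed algorithms.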