Regularized Mixtures of Experts in High-Dimensional Data

Mixture of experts (MoE) models are successful neural-network architectures for modeling heterogeneous data in many machine learning problems, including regression, clustering and classification. Model learning is generally performed by maximum likelihood estimation (MLE). For high-dimensional data, regularization is needed to avoid possible degeneracies or infeasibility of the MLE caused by redundant and correlated features. Regularized maximum likelihood estimation allows the selection of a relevant subset of features for prediction and thus encourages sparse solutions. The problem of variable selection is challenging in the modeling of heterogeneous data, including with MoE models. We consider MoE models for heterogeneous regression data and propose a regularized maximum-likelihood estimation for possibly high-dimensional features, based on a dedicated EM algorithm that integrates coordinate-ascent updates of the parameters. Unlike state-of-the-art regularized MLE for MoE, the proposed modeling does not require an approximation of the regularization. The proposed algorithm automatically yields sparse solutions without thresholding, its coordinate-ascent updates avoid matrix inversion, and it can thus scale to high dimensions. An experimental study shows the good performance of the algorithm in recovering the actual sparse solutions, in parameter estimation, and in clustering heterogeneous regression data.


Saved in:
Bibliographic Details
Main Authors: Chamroukhi, Faicel; Huỳnh, Bảo Tuyên
Format: Conference paper
Language: English
Published: 2023
Subjects: Mixtures of Experts; high-dimensional data; Regularized maximum likelihood estimation
Online Access: https://scholar.dlu.edu.vn/handle/123456789/2335
Holding Library: Thư viện Trường Đại học Đà Lạt (Dalat University Library)
id oai:scholar.dlu.edu.vn:123456789-2335
record_format dspace
institution Thư viện Trường Đại học Đà Lạt
collection Thư viện số
language English
topic Mixtures of Experts
high-dimensional data
Regularized maximum likelihood estimation
spellingShingle Mixtures of Experts
high-dimensional data
Regularized maximum likelihood estimation
Chamroukhi, Faicel
Huỳnh, Bảo Tuyên
Regularized Mixtures of Experts in High-Dimensional Data
description Mixture of experts (MoE) models are successful neural-network architectures for modeling heterogeneous data in many machine learning problems, including regression, clustering and classification. Model learning is generally performed by maximum likelihood estimation (MLE). For high-dimensional data, regularization is needed to avoid possible degeneracies or infeasibility of the MLE caused by redundant and correlated features. Regularized maximum likelihood estimation allows the selection of a relevant subset of features for prediction and thus encourages sparse solutions. The problem of variable selection is challenging in the modeling of heterogeneous data, including with MoE models. We consider MoE models for heterogeneous regression data and propose a regularized maximum-likelihood estimation for possibly high-dimensional features, based on a dedicated EM algorithm that integrates coordinate-ascent updates of the parameters. Unlike state-of-the-art regularized MLE for MoE, the proposed modeling does not require an approximation of the regularization. The proposed algorithm automatically yields sparse solutions without thresholding, its coordinate-ascent updates avoid matrix inversion, and it can thus scale to high dimensions. An experimental study shows the good performance of the algorithm in recovering the actual sparse solutions, in parameter estimation, and in clustering heterogeneous regression data.
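The description refers to an EM algorithm whose M-step uses coordinate-ascent updates to obtain sparse solutions without thresholding or matrix inversion. The sketch below is not the authors' algorithm; it is a generic illustration, under simplifying assumptions, of that idea for a Lasso-penalized mixture of K linear regression experts with constant mixing weights (the paper's MoE additionally uses a regularized softmax gating network, omitted here for brevity). All function and variable names (em_lasso_mixture_regression, soft_threshold, lam, ...) are made up for this example.

```python
# Illustrative sketch only -- not the algorithm from the paper.
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: produces exact zeros, hence sparsity without ad-hoc thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def em_lasso_mixture_regression(X, y, K=2, lam=0.1, n_iter=50, inner_iter=5, seed=0):
    n, p = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                    # mixing proportions
    beta = rng.normal(scale=0.1, size=(K, p))   # expert regression coefficients
    b0 = np.zeros(K)                            # unpenalized intercepts
    sigma2 = np.full(K, np.var(y))              # expert noise variances

    for _ in range(n_iter):
        # E-step: responsibilities of each expert for each observation
        resid = y[:, None] - (X @ beta.T + b0)                        # shape (n, K)
        logdens = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        logw = np.log(pi) + logdens
        logw -= logw.max(axis=1, keepdims=True)
        tau = np.exp(logw)
        tau /= tau.sum(axis=1, keepdims=True)

        # M-step
        pi = tau.mean(axis=0)
        for k in range(K):
            w = tau[:, k]
            for _ in range(inner_iter):                               # coordinate-ascent sweeps
                b0[k] = np.sum(w * (y - X @ beta[k])) / np.sum(w)
                for j in range(p):
                    # partial residual excluding coordinate j
                    r_j = y - b0[k] - X @ beta[k] + X[:, j] * beta[k, j]
                    # scalar update: no matrix inversion; penalty scaled by n
                    num = soft_threshold(np.sum(w * X[:, j] * r_j), lam * n)
                    beta[k, j] = num / np.sum(w * X[:, j] ** 2)
            resid_k = y - b0[k] - X @ beta[k]
            sigma2[k] = np.sum(w * resid_k**2) / np.sum(w)
    return pi, b0, beta, sigma2
```

Because the soft-thresholding operator sets coefficients exactly to zero, sparsity emerges directly from the coordinate updates, which is the behaviour the abstract describes as "sparse solutions without thresholding"; each update is a scalar operation, so no matrix inversion is required.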
format Conference paper
author Chamroukhi, Faicel
Huỳnh, Bảo Tuyên
author_facet Chamroukhi, Faicel
Huỳnh, Bảo Tuyên
author_sort Chamroukhi, Faicel
title Regularized Mixtures of Experts in High-Dimensional Data
title_short Regularized Mixtures of Experts in High-Dimensional Data
title_full Regularized Mixtures of Experts in High-Dimensional Data
title_fullStr Regularized Mixtures of Experts in High-Dimensional Data
title_full_unstemmed Regularized Mixtures of Experts in High-Dimensional Data
title_sort regularized mixtures of experts in high-dimensional data
publishDate 2023
url https://scholar.dlu.edu.vn/handle/123456789/2335
_version_ 1778233853676093440
spelling oai:scholar.dlu.edu.vn:123456789-2335 2023-06-14T04:22:51Z Regularized Mixtures of Experts in High-Dimensional Data Chamroukhi, Faicel Huỳnh, Bảo Tuyên Mixtures of Experts high-dimensional data Regularized maximum likelihood estimation 2023-05-19T09:57:57Z 2023-05-19T09:57:57Z 2018 Conference paper Paper published in the proceedings of an international conference (with ISBN) https://scholar.dlu.edu.vn/handle/123456789/2335 en 2018 International Joint Conference on Neural Networks (IJCNN)