Regularized Mixtures of Experts in High-Dimensional Data
Mixture of experts (MoE) models are successful neural-network architectures for modeling heterogeneous data in many machine learning problems, including regression, clustering, and classification. Model learning is generally performed by maximum likelihood estimation (MLE). For high-dimensional data, regularization is needed to avoid possible degeneracies or infeasibility of the MLE caused by redundant and correlated features. Regularized maximum likelihood estimation allows the selection of a relevant subset of features for prediction and thus encourages sparse solutions. Variable selection remains challenging in the modeling of heterogeneous data, including with MoE models. We consider the MoE model for heterogeneous regression data and propose a regularized maximum-likelihood estimation with possibly high-dimensional features, based on a dedicated EM algorithm that integrates coordinate ascent updates of the parameters. Unlike state-of-the-art regularized MLE for MoE, the proposed modeling does not require an approximation of the regularization. The proposed algorithm automatically yields sparse solutions without thresholding, relies on coordinate ascent updates that avoid matrix inversion, and is thus scalable. An experimental study shows the good performance of the algorithm in recovering the actual sparse solutions, in parameter estimation, and in clustering heterogeneous regression data.
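For context on the abstract, the sketch below shows the kind of penalized objective that Lasso-regularized MoE models typically maximize: a softmax gating network mixing Gaussian expert regressors, with L1 penalties on both the gating and expert coefficients. The notation (gating weights w_k, expert parameters beta_k and sigma_k, penalty levels lambda_k and gamma_k) is assumed for illustration and is not taken from the paper; a matching coordinate-update sketch follows the record metadata below.

```latex
% Illustrative penalized log-likelihood for a K-component mixture of experts (MoE)
% with a softmax gating network and Gaussian expert regressors.
% All symbols (w_k, beta_k, sigma_k, lambda_k, gamma_k) are assumed notation,
% not necessarily the authors' exact formulation.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align}
  p(y \mid \mathbf{x}; \theta)
    &= \sum_{k=1}^{K}
       \underbrace{\frac{\exp\!\big(w_{k0} + \mathbf{x}^{\top}\mathbf{w}_{k}\big)}
                        {\sum_{l=1}^{K}\exp\!\big(w_{l0} + \mathbf{x}^{\top}\mathbf{w}_{l}\big)}}_{\text{gating network}}
       \,\mathcal{N}\!\big(y;\ \beta_{k0} + \mathbf{x}^{\top}\boldsymbol{\beta}_{k},\ \sigma_{k}^{2}\big),
       \\[4pt]
  \mathrm{PL}(\theta)
    &= \sum_{i=1}^{n} \log p\big(y_{i} \mid \mathbf{x}_{i}; \theta\big)
       \;-\; \sum_{k=1}^{K} \lambda_{k}\,\lVert \boldsymbol{\beta}_{k} \rVert_{1}
       \;-\; \sum_{k=1}^{K-1} \gamma_{k}\,\lVert \mathbf{w}_{k} \rVert_{1}.
\end{align}
\end{document}
```

Under an objective of this form, the penalty levels control how many gating and expert coefficients are driven exactly to zero, which makes feature selection part of the estimation itself rather than a post-processing step.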
Saved in:

| Main authors: | Faicel, Chamroukhi; Huỳnh, Bảo Tuyên |
|---|---|
| Format: | Conference paper |
| Language: | English |
| Published: | 2023 |
| Subjects: | Mixtures of Experts; high-dimensional data; Regularized maximum likelihood estimation |
| Online access: | https://scholar.dlu.edu.vn/handle/123456789/2335 |
| Holding library: | Thư viện Trường Đại học Đà Lạt |
id: oai:scholar.dlu.edu.vn:123456789-2335
record_format: dspace
institution: Thư viện Trường Đại học Đà Lạt
collection: Thư viện số
language: English
topic: Mixtures of Experts; high-dimensional data; Regularized maximum likelihood estimation
spellingShingle: Mixtures of Experts; high-dimensional data; Regularized maximum likelihood estimation; Faicel, Chamroukhi; Huỳnh, Bảo Tuyên; Regularized Mixtures of Experts in High-Dimensional Data
description: Mixture of experts (MoE) models are successful neural-network architectures for modeling heterogeneous data in many machine learning problems, including regression, clustering, and classification. Model learning is generally performed by maximum likelihood estimation (MLE). For high-dimensional data, regularization is needed to avoid possible degeneracies or infeasibility of the MLE caused by redundant and correlated features. Regularized maximum likelihood estimation allows the selection of a relevant subset of features for prediction and thus encourages sparse solutions. Variable selection remains challenging in the modeling of heterogeneous data, including with MoE models. We consider the MoE model for heterogeneous regression data and propose a regularized maximum-likelihood estimation with possibly high-dimensional features, based on a dedicated EM algorithm that integrates coordinate ascent updates of the parameters. Unlike state-of-the-art regularized MLE for MoE, the proposed modeling does not require an approximation of the regularization. The proposed algorithm automatically yields sparse solutions without thresholding, relies on coordinate ascent updates that avoid matrix inversion, and is thus scalable. An experimental study shows the good performance of the algorithm in recovering the actual sparse solutions, in parameter estimation, and in clustering heterogeneous regression data.
format: Conference paper
author: Faicel, Chamroukhi; Huỳnh, Bảo Tuyên
author_facet: Faicel, Chamroukhi; Huỳnh, Bảo Tuyên
author_sort: Faicel, Chamroukhi
title: Regularized Mixtures of Experts in High-Dimensional Data
title_short: Regularized Mixtures of Experts in High-Dimensional Data
title_full: Regularized Mixtures of Experts in High-Dimensional Data
title_fullStr: Regularized Mixtures of Experts in High-Dimensional Data
title_full_unstemmed: Regularized Mixtures of Experts in High-Dimensional Data
title_sort: regularized mixtures of experts in high-dimensional data
publishDate: 2023
url: https://scholar.dlu.edu.vn/handle/123456789/2335
_version_: 1778233853676093440
spelling: oai:scholar.dlu.edu.vn:123456789-2335 (record updated 2023-06-14T04:22:51Z). Regularized Mixtures of Experts in High-Dimensional Data. Faicel, Chamroukhi; Huỳnh, Bảo Tuyên. Subjects: Mixtures of Experts; high-dimensional data; Regularized maximum likelihood estimation. Deposited and made available: 2023-05-19T09:57:57Z; date issued: 2018. Conference paper (paper published in the proceedings of an international conference, with ISBN). https://scholar.dlu.edu.vn/handle/123456789/2335. Language: en. Published in: 2018 International Joint Conference on Neural Networks (IJCNN).
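The description field mentions a dedicated EM algorithm whose coordinate ascent updates avoid matrix inversion and yield sparse solutions without thresholding of the final estimates. As a rough illustration of that kind of update, and not a reproduction of the authors' published algorithm, the sketch below performs a generic responsibility-weighted Lasso coordinate step for one Gaussian expert; the responsibilities tau_k, penalty level lam_k, and all function names are assumed notation.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the closed-form solution of a univariate Lasso problem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def coordinate_update_expert(X, y, tau_k, beta0_k, beta_k, sigma2_k, lam_k, n_sweeps=10):
    """Illustrative M-step block for one Gaussian expert k in a Lasso-regularized MoE.

    X (n, p) design matrix, y (n,) responses, tau_k (n,) E-step responsibilities,
    beta0_k intercept, beta_k (p,) coefficients, sigma2_k variance, lam_k penalty level.
    Every update below is a univariate weighted least-squares problem solved in
    closed form, so no p-by-p matrix is ever inverted.
    """
    beta_k = beta_k.copy()
    for _ in range(n_sweeps):
        resid = y - beta0_k - X @ beta_k                        # residuals under current parameters
        for j in range(X.shape[1]):
            resid += X[:, j] * beta_k[j]                        # partial residual excluding coordinate j
            num = np.dot(tau_k * X[:, j], resid)                # responsibility-weighted correlation
            denom = np.dot(tau_k, X[:, j] ** 2)
            beta_k[j] = soft_threshold(num, lam_k * sigma2_k) / denom
            resid -= X[:, j] * beta_k[j]                        # put coordinate j back with its new value
        beta0_k = np.dot(tau_k, y - X @ beta_k) / tau_k.sum()   # weighted intercept update
    return beta0_k, beta_k


if __name__ == "__main__":
    # Tiny demo on synthetic data with a single component (all responsibilities equal to 1).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
    tau = np.ones(200)
    b0, b = coordinate_update_expert(X, y, tau, 0.0, np.zeros(20), 1.0, lam_k=5.0)
    print(np.round(b, 2))   # most coefficients are exactly zero; indices 0 and 3 are recovered
```

Each coordinate is updated in closed form by soft-thresholding a weighted correlation, so a full sweep costs O(np) and no p-by-p linear system has to be solved; this is the sense in which coordinate ascent avoids matrix inversion and keeps the procedure scalable, while the soft-thresholding itself produces exact zeros without any post-hoc thresholding of the estimates.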