Regularized Mixtures of Experts in High-Dimensional Data
Main Authors:
Format: Conference paper
Language: English
Published: 2023
Subjects:
Online Access: https://scholar.dlu.edu.vn/handle/123456789/2335
Held in: Thư viện Trường Đại học Đà Lạt
Abstract: Mixture of experts (MoE) models are successful neural-network architectures for modeling heterogeneous data in many machine learning problems, including regression, clustering, and classification. Model learning is generally performed by maximum likelihood estimation (MLE). For high-dimensional data, regularization is needed to avoid possible degeneracies or infeasibility of the MLE caused by high-dimensional, possibly redundant and correlated features. Regularized maximum likelihood estimation allows the selection of a relevant subset of features for prediction and thus encourages sparse solutions. Variable selection is challenging when modeling heterogeneous data, including with MoE models. We consider the MoE for heterogeneous regression data and propose a regularized maximum-likelihood estimation with possibly high-dimensional features, based on a dedicated EM algorithm that integrates coordinate ascent updates of the parameters. Unlike state-of-the-art regularized MLE for MoE, the proposed modeling does not require an approximation of the regularization. The proposed algorithm automatically obtains sparse solutions without thresholding, uses coordinate ascent updates that avoid matrix inversion, and is thus scalable. An experimental study shows the good performance of the algorithm in recovering the actual sparse solutions, in parameter estimation, and in clustering of heterogeneous regression data.
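To make the abstract's algorithmic idea concrete, the sketch below illustrates a coordinate-wise update with no matrix inversion, the kind of sub-problem that can arise in the M-step for a single expert once E-step responsibilities are available. This is not the paper's algorithm: it is the standard soft-thresholding coordinate-descent update for a weighted L1-penalized least-squares problem (and note the paper itself claims sparsity without a thresholding step). All function names, the interface, and the test data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: closed-form minimizer of the
    scalar lasso problem 0.5*(b - z)^2 + gamma*|b| (standard, not
    the paper's thresholding-free update)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def weighted_lasso_coordinate_descent(X, y, w, lam, n_iter=100):
    """Hypothetical M-step sub-problem for one expert: minimize
    0.5 * sum_i w_i * (y_i - x_i @ beta)^2 + lam * ||beta||_1
    by cycling over coordinates. Each update is scalar, so no
    matrix inversion is needed (the scalability point made in
    the abstract).
    X: (n, p) features; y: (n,) responses;
    w: (n,) nonnegative responsibilities from a hypothetical E-step."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta  # running residual
    for _ in range(n_iter):
        for j in range(p):
            xj = X[:, j]
            # weighted correlation with the partial residual
            # (residual with coordinate j's contribution added back)
            rho = np.dot(w * xj, r + xj * beta[j])
            denom = np.dot(w, xj * xj)
            b_new = soft_threshold(rho, lam) / denom
            r += xj * (beta[j] - b_new)  # keep residual consistent
            beta[j] = b_new
    return beta
```

For an orthonormal design with unit weights, each coordinate reduces to soft-thresholding the ordinary least-squares coefficient, which makes the sparsity-inducing effect of the penalty easy to see: coefficients smaller than `lam` in magnitude are set exactly to zero.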