Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites

Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp: 74-88.

Bibliographic Details
Main Authors: Tran, Thi Xuan; Nguyen, Van Nui; Le, Nguyen Quoc Khanh
Format: Article
Language: English
Published: Springer Nature 2023
Subjects:
SVM
Online Access: https://link.springer.com/chapter/10.1007/978-3-031-36886-8_7
http://elib.vku.udn.vn/handle/123456789/2745
Holding library: Trường Đại học Công nghệ Thông tin và Truyền thông Việt Hàn - Đại học Đà Nẵng
id oai:elib.vku.udn.vn:123456789-2745
record_format dspace
spelling oai:elib.vku.udn.vn:123456789-2745 2023-09-26T02:21:03Z Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites Tran, Thi Xuan Nguyen, Van Nui Le, Nguyen Quoc Khanh SUMOylation sites prediction Machine Learning Word2Vec Random forest XGBoost SVM Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp: 74-88. The incidence of thyroid cancer and breast cancer is increasing every year, and the specific pathogenesis is unclear. Post-translational modifications are an important regulatory mechanism that affects the function of almost all proteins. They are essential for a diverse and well-functioning proteome and can integrate metabolism with physiological and pathological processes. In recent years, post-translational modifications have become a research hotspot, with methylation, phosphorylation, acetylation and succinylation being the main focus. SUMOylated proteins are predominantly localized in the nucleus, and SUMO regulates nuclear processes, including cell cycle control and DNA repair. SUMOylation has been increasingly implicated in cancer, Alzheimer's, and Parkinson's diseases. Therefore, the identification and characterization of SUMOylation sites are essential for determining modification-specific proteomics. This study proposes a novel scheme for predicting protein SUMOylation sites that incorporates natural language features (Word2Vec) and sequence-based features. In addition, a novel model, called RSX_SUMO, is proposed for the prediction of protein SUMOylation sites.
Our experiments show that the RSX_SUMO model achieves the highest performance in both five-fold cross-validation and independent testing, reaching an accuracy of 88.6% and an MCC value of 0.743 on independent testing. In addition, a comparison with several existing prediction models shows that our proposed model outperforms them. We hope that our findings will provide useful suggestions and be of great help to researchers in related studies. 2023-09-26T02:20:55Z 2023-09-26T02:20:55Z 2023-07 Working Paper 978-3-031-36886-8 https://link.springer.com/chapter/10.1007/978-3-031-36886-8_7 http://elib.vku.udn.vn/handle/123456789/2745 en application/pdf Springer Nature
institution Trường Đại học Công nghệ Thông tin và Truyền thông Việt Hàn - Đại học Đà Nẵng
collection DSpace
language English
topic SUMOylation sites prediction
Machine Learning
Word2Vec
Random forest
XGBoost
SVM
spellingShingle SUMOylation sites prediction
Machine Learning
Word2Vec
Random forest
XGBoost
SVM
Tran, Thi Xuan
Nguyen, Van Nui
Le, Nguyen Quoc Khanh
Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites
description Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp: 74-88.
format Working Paper
author Tran, Thi Xuan
Nguyen, Van Nui
Le, Nguyen Quoc Khanh
author_facet Tran, Thi Xuan
Nguyen, Van Nui
Le, Nguyen Quoc Khanh
author_sort Tran, Thi Xuan
title Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites
title_short Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites
title_full Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites
title_fullStr Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites
title_full_unstemmed Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites
title_sort incorporating natural language-based and sequence-based features to predict protein sumoylation sites
publisher Springer Nature
publishDate 2023
url https://link.springer.com/chapter/10.1007/978-3-031-36886-8_7
http://elib.vku.udn.vn/handle/123456789/2745
_version_ 1849200732862939136
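
Note on the reported metrics: the abstract in this record gives accuracy and MCC (Matthews correlation coefficient) for independent testing. The sketch below shows how these two measures are commonly computed from a binary confusion matrix. It is an illustration only, not the authors' evaluation code. The function names and the confusion-matrix counts are hypothetical, since the record does not give the underlying counts.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal sum is zero (the usual convention),
    because the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration; NOT taken from the paper.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

MCC is often reported alongside accuracy for site-prediction tasks because it uses all four confusion-matrix cells and so stays informative when positive and negative sites are imbalanced.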