Mining Hidden Topics from Newspaper Quotations: The COVID-19 Pandemic

Bibliographic Details
Main author: Tạ, Hoàng Thắng
Format: Conference paper
Language: English
Published: 2023
Online access: http://scholar.dlu.edu.vn/handle/123456789/2006
Holding library: Dalat University Library (Thư viện Trường Đại học Đà Lạt)
Description
Summary: In this paper, we extract quotations from Al Jazeera’s news articles that contain keywords related to the COVID-19 pandemic. We apply latent Dirichlet allocation (LDA), coherence measures, and clustering algorithms to explore, without supervision, the latent topics in a dataset of about 3,400 quotations and to see how the coronavirus affects people. By using noun phrases as training inputs and the C_v measure to compute coherence values, we obtain an average coherence value of 0.66 at the smallest average number of topics, 24.8. The resulting topics cover some of the main issues the world has faced during the COVID-19 pandemic.
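
The first step in the summary, collecting quotations from articles that mention pandemic keywords, can be illustrated with a minimal sketch. This record does not say how the paper performs the extraction, so everything here is an assumption made for illustration: the keyword list `KEYWORDS`, the regular expression `QUOTE_RE`, the `min_words` threshold, and the helper names are all hypothetical, and the sketch uses only the Python standard library.

```python
import re

# Hypothetical keyword list; the paper's actual keywords are not given in this record.
KEYWORDS = ("covid-19", "coronavirus", "pandemic")

# Assumption: a quotation is any text between straight or curly double quotation marks.
QUOTE_RE = re.compile(r'["“]([^"“”]+)["”]')


def mentions_keyword(article: str) -> bool:
    """Return True if the article contains any keyword (case-insensitive)."""
    lowered = article.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def extract_quotations(article: str, min_words: int = 3) -> list[str]:
    """Return quoted spans with at least min_words words, in order of appearance."""
    # Trim surrounding whitespace and a trailing comma left by reporting style.
    quotes = (match.strip().rstrip(",") for match in QUOTE_RE.findall(article))
    return [quote for quote in quotes if len(quote.split()) >= min_words]


def collect_quotations(articles: list[str]) -> list[str]:
    """Gather quotations from the articles that mention at least one keyword."""
    collected = []
    for article in articles:
        if mentions_keyword(article):
            collected.extend(extract_quotations(article))
    return collected
```

The word-count threshold is one simple way to skip single quoted words such as scare quotes; a real pipeline would likely need attribution handling and nested-quote rules that this sketch leaves out. The collected quotations would then be the input to the noun-phrase and LDA steps the summary describes.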