Irrelevancy Detection in Multilingual Tourism Review
Keywords:
Tourism Reviews, Multilingual Embeddings, Irrelevancy Detection, Sentiment Analysis, Knowledge Distillation
Abstract
This study investigates irrelevancy in multilingual tourism reviews, focusing on how off-topic or ambiguous user-generated content can undermine reliable insight for travelers. A consolidated dataset is constructed by combining a publicly available resource from Kaggle with additional posts collected from X (formerly Twitter). Each review is manually labeled as relevant or ambiguous, where the ambiguous label marks content that fails to clearly address travel or hotel-related topics. We encode the multilingual inputs with a multilingual BERT embedding model and enrich each embedding with a sentiment vector obtained through knowledge distillation from twitter-xlm-roberta-base (teacher) into DistilBERT (student). A gating mechanism then fuses the semantic and sentiment signals, highlighting the parts of each review most influenced by user attitudes. In the final classification stage, a BERT-based network is fine-tuned to distinguish relevant from ambiguous content. Experimental comparisons with a monolingual BERT approach and a baseline (multilingual embedding without sentiment) show that incorporating sentiment features yields consistent improvements in accuracy, precision, recall, and F1-score. This outcome underscores the importance of capturing emotional cues to reduce errors arising from partial dissatisfaction, unclear references, or cultural nuances. From a practical standpoint, the results point to potential applications in automated moderation, improved recommendation systems, and policy guidelines for tourism platforms. Overall, this work demonstrates that sentiment-aware, multilingual models can improve the detection of irrelevancy and ambiguity, supporting more trustworthy and context-rich online review ecosystems in the travel domain.
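The abstract does not specify the form of the gating mechanism, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a semantic vector `h` (standing in for a multilingual BERT embedding), a sentiment vector `s` (standing in for the distilled model's class probabilities), and hypothetical weight matrices `W_h`, `W_s`, `P_s` and bias `b` that would be learned during training. The gate is computed as a sigmoid over both inputs and then interpolates, dimension by dimension, between the semantic vector and a projection of the sentiment vector.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; in practice these would be learned."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def gated_fusion(h, s, W_h, W_s, P_s, b):
    """Fuse a semantic vector h (dim d) with a sentiment vector s (dim k).

    gate  = sigmoid(W_h h + W_s s + b)          # one gate value per dimension
    fused = gate * h + (1 - gate) * (P_s s)     # element-wise interpolation
    """
    s_proj = matvec(P_s, s)  # project sentiment into the semantic space
    pre = [a + c + bias for a, c, bias in zip(matvec(W_h, h), matvec(W_s, s), b)]
    gate = [sigmoid(z) for z in pre]
    fused = [g * hi + (1.0 - g) * si for g, hi, si in zip(gate, h, s_proj)]
    return fused, gate


# Toy dimensions (assumptions): d=8 semantic features, k=3 sentiment classes.
rng = random.Random(0)
d, k = 8, 3
h = [rng.uniform(-1, 1) for _ in range(d)]  # placeholder semantic embedding
s = [0.7, 0.2, 0.1]                         # placeholder sentiment probabilities
W_h = random_matrix(d, d, rng)
W_s = random_matrix(d, k, rng)
P_s = random_matrix(d, k, rng)
b = [0.0] * d

fused, gate = gated_fusion(h, s, W_h, W_s, P_s, b)
```

In this formulation a gate value near 1 keeps the semantic feature largely unchanged, while a value near 0 lets the sentiment projection dominate that dimension. The fused vector would then be passed to the classifier in place of the plain embedding.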
License
Copyright (c) 2024 Journal Informatics Nivedita

This article is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.