Indonesian to Bengkulu Malay Statistical Machine Translation System
DOI:
https://doi.org/10.59395/ijadis.v5i2.1323
Keywords:
Bengkulu Malay Language, BLEU, Parallel Corpus, Statistical Machine Translation, Indonesian language, natural language processing
Abstract
Machine translation automatically translates text from one language to another. This research develops a Statistical Machine Translation (SMT) system from Indonesian to Bengkulu Malay and evaluates the quality of its output. The training and testing data are parallel corpora drawn from Bengkulu Malay dictionaries and online Indonesian resources, totaling 5261 parallel sentence pairs. The SMT workflow has four phases. Preprocessing prepares the parallel corpus. Training then processes the parallel corpus to build the language model and the translation model. Testing follows, and an evaluation phase concludes the workflow. The results show that several factors influence the quality of SMT output. The most important is the quantity and quality of the parallel corpus on which the translation and language models are built. The output is evaluated automatically with the Bilingual Evaluation Understudy (BLEU) metric. The accuracy values obtained when using 500, 1500, 2500, 4000, and 5261 sentences are 80.56%, 90.86%, 92.50%, 92.91%, and 94.48%, respectively.
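The BLEU evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation script: the abstract does not say which BLEU implementation, tokenization, or n-gram order was used, so the function name corpus_bleu, whitespace tokenization, a single reference per sentence, and a maximum n-gram order of 4 are all assumptions made here for illustration. The score is returned in the range 0 to 1; multiplying by 100 gives a percentage of the kind reported above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 1], one reference per hypothesis (assumed setup)."""
    matched = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n     # n-grams produced by the system, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # whitespace tokenization (assumption)
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection keeps the minimum count, i.e. the clipped match count.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any order without a match gives a score of zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Placeholder tokens, not sentences from the paper's corpus.
    reference = ["w1 w2 w3 w4 w5"]
    system_output = ["w1 w2 w3 w4 w9"]
    print(f"BLEU = {100 * corpus_bleu(system_output, reference):.2f}%")
```

Because the sketch applies no smoothing, very short test sets can score zero whenever no 4-gram matches; established tools handle such cases with smoothing options and standardized tokenization, which is one reason scores from different implementations are not directly comparable.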
License
Copyright (c) 2024 Bella Okta Sari Miranda, Herman Yuliansyah, Muhammad Kunta Biddinika
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.