Analisis Kebutuhan Dataset Algoritma Speech To Text Bahasa Sasak Menggunakan Perbandingan Data Suara Bahasa Inggris Pada Metode CNN
Analysis of Sasak Language Speech To Text Algorithm Dataset Requirements Using English Voice Data Comparison on CNN Method
Abstract
Speech recognition, or speech to text, has been studied extensively. Speech to text is a technology that converts human speech into written text. Some previous studies have reached an accuracy of up to 95% on English datasets using Mel Frequency Cepstral Coefficient (MFCC) feature extraction and Convolutional Neural Network (CNN) classification. This research applies the same algorithms, MFCC and CNN, and reports the training process and the resulting accuracy under an analysis scenario that uses datasets of 50, 150, 250, and 350 voice samples. Training on 350 English voice samples achieved 95% accuracy. The analysis aims to find the best composition for the Sasak language dataset by comparing the accuracy of its test results with the accuracy of the earlier training results on the English dataset. The training and testing results show that the best dataset composition for the Sasak language uses nine speakers. This indicates that the Sasak language requires fewer human resources than the English dataset, which involves more than 30 speakers for 50 words. This saves the resources and time required to develop a Sasak language speech recognition system.
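To make the MFCC feature extraction step named in the abstract concrete, the following is a minimal sketch of how MFCCs could be computed for a single audio frame, using only the Python standard library. It is not the implementation used in this research. The function names, the 16 kHz sample rate, the 256-sample frame, the 20 mel filters, and the 13 coefficients are all assumptions chosen for illustration, and the CNN classifier is not shown.

```python
import math

def hz_to_mel(hz):
    # Common mel-scale formula (O'Shaughnessy variant).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):      # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope, peak of 1.0 at center
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate=16000, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 16 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(256)]
coeffs = mfcc_frame(frame)
```

In a full pipeline, the coefficient vectors of consecutive frames would be stacked into a two-dimensional array per recording and passed to the CNN as input. An optimized signal-processing library would normally replace the naive DFT used here.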