Abstract: This study examines the effects of Turkish character usage in text data, using multiple classifiers. Regression classifiers, SVM, Naive Bayes classifiers, and ANN are frequently used in supervised learning, especially for classification problems. Regression classifiers generally come in two types, linear and logistic, and there is also more than one type of Naive Bayes classifier. After describing the general properties of the Linear Regression and Logistic Regression classifiers, the study explains why Logistic Regression is much more suitable for this task. The analysis then uses the "Logistic Regression", "LinearSVC", "MultinomialNB", "ComplementNB", "BernoulliNB", and "Perceptron" classifiers. The dataset consists of abstract sections from 64 Turkish articles in 4 classes: Physical Sciences, Social Sciences, Educational Sciences, and Economics and Administrative Sciences. Two data files were prepared, both in CSV format: one with the original Turkish characters, and the other an English-like equivalent in which the Turkish characters "Ç, ç, Ö, ö, Ü, ü, Ş, ş, İ, ı, ğ" were replaced with "C, c, O, o, U, u, S, s, I, i, g", respectively.
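The character replacement described in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the author's actual preprocessing code: the function name `to_english_like` and the constant names are hypothetical, and only the eleven characters listed in the abstract are mapped.

```python
# Characters listed in the abstract and their English-like substitutes,
# in the same order. Names below are illustrative, not from the paper.
TURKISH_CHARS = "ÇçÖöÜüŞşİığ"
ENGLISH_CHARS = "CcOoUuSsIig"

# str.maketrans builds a one-to-one character translation table;
# both strings must have the same length.
TO_ENGLISH_LIKE = str.maketrans(TURKISH_CHARS, ENGLISH_CHARS)


def to_english_like(text: str) -> str:
    """Replace the listed Turkish characters with their English-like forms.

    Characters outside the abstract's list (for example uppercase 'Ğ')
    are left unchanged in this sketch.
    """
    return text.translate(TO_ENGLISH_LIKE)


if __name__ == "__main__":
    print(to_english_like("Çağdaş öğretim İstanbul ışık"))
    # prints: Cagdas ogretim Istanbul isik
```

Applying such a function to every text field of the original CSV file would yield the second, English-like file; how the author actually produced it is not stated in the abstract.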
Title (dc.title) | Turkish Character Usage in Text Classification |
Author (dc.contributor.author) | Ali Aycan KOLUKISA |
Date Issued (dc.date.issued) | 2021 |
Publisher (dc.publisher) | İzmir Katip Çelebi Üniversitesi |
Type (dc.type) | Article |
Date Accessioned (dc.date.accessioned) | 06.06.2022 |
Date Available (dc.date.available) | 2022-06-06 |
Language (dc.language.iso) | eng |
Subject (dc.subject) | Accuracy rate |
Subject (dc.subject) | bag of words |
Subject (dc.subject) | English characters |
Subject (dc.subject) | logistic regression |
Subject (dc.subject) | Turkish characters |
Citation (dc.identifier.citation) | A. A. Kolukısa, "Turkish Character Usage in Text Classification", Journal of Artificial Intelligence and Data Science, vol. 1, no. 1, pp. 53-58, Aug. 2021 |
ISSN (dc.identifier.issn) | 2791-8335 |
Start Page (dc.identifier.startpage) | 53 |
End Page (dc.identifier.endpage) | 58 |
Journal Name (dc.relation.journal) | Journal of Artificial Intelligence and Data Science |
Journal Issue (dc.identifier.issue) | 1 |
Journal Volume (dc.identifier.volume) | 1 |
Database (dc.source.database) | None |
Rights (dc.rights) | Open access |
URI (dc.identifier.uri) | https://hdl.handle.net/11469/1929 |