Turkish Character Usage in Text Classification

Ali Aycan KOLUKISA


Author: Ali Aycan KOLUKISA
Type: Article
Publication Date: 2021
Publisher: İzmir Katip Çelebi Üniversitesi
Journal: Journal of Artificial Intelligence and Data Science, vol. 1, no. 1, pp. 53-58
URI: https://hdl.handle.net/11469/1929
Subjects: Accuracy rate; bag of words; English characters; logistic regression; Turkish characters

Abstract: This study examines the effect of Turkish character usage on text classification using multiple classifiers. Regression classifiers, support vector machines (SVM), naive Bayes (NB) classifiers, and artificial neural networks (ANN) are frequently used in supervised learning, especially for classification problems. Regression classifiers generally come in two types: linear and logistic. There are also several types of naive Bayes classifier. After outlining the general properties of the Linear Regression and Logistic Regression classifiers, the study explains why Logistic Regression is much more suitable for this task. The analysis then uses the "Logistic Regression", "LinearSVC", "MultinomialNB", "ComplementNB", "BernoulliNB", and "Perceptron" classifiers. The dataset consists of the abstracts of 64 Turkish articles in 4 classes: Physical Sciences, Social Sciences, Educational Sciences, and Economics and Administrative Sciences. All data files are in CSV format, and two versions were prepared: one keeps the original Turkish characters, and the other is an English-like version in which the Turkish characters "Ç, ç, Ö, ö, Ü, ü, Ş, ş, İ, ı, ğ" are replaced with "C, c, O, o, U, u, S, s, I, i, g", respectively.
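The character substitution described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author's code: the function names (`to_english_like`, `turkish_lower`, `bag_of_words`) and the sample sentences are hypothetical, and the simple regex tokenizer stands in for whatever bag-of-words vectorizer the study actually used. It shows one practical consequence of the substitution: distinct Turkish words such as "şu" ("that") and "su" ("water") collapse into the same token, so the English-like vocabulary can be smaller.

```python
import re
from collections import Counter

# Character mapping as listed in the abstract (Turkish -> English-like).
# Only lowercase "ğ" appears in the source list, so uppercase "Ğ" is left unchanged.
TURKISH_TO_ENGLISH = str.maketrans("ÇçÖöÜüŞşİığ", "CcOoUuSsIig")


def to_english_like(text: str) -> str:
    """Replace each listed Turkish character with its English-like equivalent."""
    return text.translate(TURKISH_TO_ENGLISH)


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish dotted/dotless I rules.

    Python's str.lower() is locale-independent: it turns "İ" into "i" plus a
    combining dot (U+0307) and "I" into "i", which is wrong for Turkish text.
    """
    return text.replace("İ", "i").replace("I", "ı").lower()


def bag_of_words(text: str, turkish: bool = True) -> Counter:
    """Count lowercase word tokens (a minimal, hypothetical bag-of-words sketch)."""
    lowered = turkish_lower(text) if turkish else text.lower()
    # In Python 3, \w matches Unicode letters, so ç, ğ, ı, ö, ş, ü stay inside tokens.
    return Counter(re.findall(r"\w+", lowered))


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the study's dataset.
    original = "Şu su temiz. Bu çalışmada öğrenci başarısı ölçüldü."
    converted = to_english_like(original)
    print(converted)

    turkish_bow = bag_of_words(original, turkish=True)
    english_bow = bag_of_words(converted, turkish=False)
    # Number of distinct tokens in each version.
    print(len(turkish_bow), len(english_bow))
```

Running the script prints `Su su temiz. Bu calismada ogrenci basarisi olculdu.` followed by `8 7`: the original text has 8 distinct tokens, while the English-like version has 7 because "şu" and "su" merge. Lowercasing is handled separately for the Turkish version because Python's built-in `str.lower()` does not apply the Turkish dotted/dotless I rules.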



Date Accessioned (dc.date.accessioned)	06.06.2022
Date Available (dc.date.available)	2022-06-06
Language (dc.language.iso)	eng
Citation (dc.identifier.citation)	A. A. Kolukısa, "Turkish Character Usage in Text Classification", Journal of Artificial Intelligence and Data Science, vol. 1, no. 1, pp. 53-58, Aug. 2021.
ISSN (dc.identifier.issn)	2791-8335
Database (dc.source.database)	None
Rights (dc.rights)	Open access


All resources on this site are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.