PROFESSION PREDICTION BASED ON LANGUAGE MODELS IN TWEETS (PREDIKSI PROFESI BERDASARKAN MODEL BAHASA PADA TWEETS)

Hapnes Toba; William Stefanus

Abstract

With the advance of social media, people tend to be very reactive to issues happening around the globe. Everybody can express their opinions freely, and sometimes uncontrollably, no matter what their job is. This research investigates whether a person’s job is reflected in his/her word choice, based on the style of language he/she uses in his/her Twitter account. It is assumed that most people in a specific job use similar language on social media. The analysis is performed using Naïve Bayes classifiers on around 30,000 tweets. The text processing is divided into three main parts: retrieval and grouping of the data, data processing, and evaluation. The types of jobs analyzed are politicians, actresses/actors, musicians, and students, through their official Twitter accounts. The experimental results show that multinomial Bayes classifiers are more reliable than binomial classifiers. Further investigation shows that the best accuracy is achieved by the unigram model, with a mean of 0.73±0.127 in a 5-fold cross-validation setting. This fact reveals that there is no direct relationship between someone’s word choice and his/her profession.
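The unigram multinomial classifier described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and add-one (Laplace) smoothing, neither of which is stated in the abstract, and the toy tweets, labels, and the class name `MultinomialNB` are hypothetical. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace to obtain unigrams.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over unigram counts (add-one smoothing assumed)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of tweets per profession
        self.word_counts = defaultdict(Counter)  # unigram counts per profession
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented for illustration only.
texts = [
    "vote for the new budget policy",
    "parliament debates the election law",
    "new album and tour dates announced",
    "recording guitar tracks in the studio",
]
labels = ["politician", "politician", "musician", "musician"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("the election policy debate"))
print(model.predict("studio album tour"))
```

A binomial variant would score only the presence or absence of each word instead of its count. Evaluating either variant in a 5-fold setting would mean repeating the fit and predict steps on five train/test splits and averaging the accuracy.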
