PROFESSION PREDICTION BASED ON LANGUAGE MODELS OF TWEETS

Hapnes Toba, William Stefanus

Abstract


With the rise of social media, people tend to react quickly to issues happening around the globe. Anyone can express opinions freely, and sometimes without restraint, regardless of occupation. This research investigates whether word choice reflects a person's profession, based on the style of language used in his/her Twitter account. It is assumed that most people in a given profession use similar language on social media. The analysis is performed using Naïve Bayes classifiers on around 30,000 tweets. The text processing is divided into three main parts: data retrieval and grouping, data processing, and evaluation. The professions analyzed are politicians, actresses/actors, musicians, and students, sampled through their official Twitter accounts. The experimental results show that multinomial Bayes classifiers are more reliable than binomial classifiers. Further investigation shows that the best accuracy is achieved by the unigram model, with a mean of 0.73±0.127 in a 5-fold cross-validation setting. This finding indicates that there is no direct relationship between someone's word choice and his/her profession.
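To make the classifier comparison concrete, here is a minimal sketch of a unigram Naïve Bayes model with Laplace smoothing in plain Python. It is not the authors' implementation: the class name `UnigramNaiveBayes`, the whitespace tokenizer, and the four toy tweets are hypothetical, and the "bernoulli" mode is only an assumed stand-in for what the abstract calls the binomial classifier.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer producing unigrams (an assumed, simplistic choice)."""
    return text.lower().split()


class UnigramNaiveBayes:
    """Naive Bayes over unigrams with Laplace (add-one) smoothing.

    mode="multinomial" counts every token occurrence;
    mode="bernoulli" models only presence/absence of each vocabulary word.
    """

    def __init__(self, mode="multinomial"):
        self.mode = mode

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_count = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            if self.mode == "bernoulli":
                tokens = set(tokens)  # count each word once per document
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """Unnormalized log posterior of class c for the given text."""
        n_docs = sum(self.doc_count.values())
        score = math.log(self.doc_count[c] / n_docs)  # class prior
        tokens = tokenize(text)
        if self.mode == "multinomial":
            denom = self.total[c] + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.counts[c][t] + 1) / denom)
        else:
            present = set(tokens)
            for w in self.vocab:
                p = (self.counts[c][w] + 1) / (self.doc_count[c] + 2)
                score += math.log(p if w in present else 1 - p)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


# Hypothetical toy data, not taken from the study's tweet collection.
texts = [
    "vote for the new policy reform",
    "parliament debates policy today",
    "new album and tour dates",
    "listen to my new single tonight",
]
labels = ["politician", "politician", "musician", "musician"]

multinomial = UnigramNaiveBayes("multinomial").fit(texts, labels)
bernoulli = UnigramNaiveBayes("bernoulli").fit(texts, labels)

for query in ["policy vote in parliament", "my album tour"]:
    print(query, "->", multinomial.predict(query), "/", bernoulli.predict(query))
```

In a setting like the one described, such a model would be trained on four of five folds and scored on the held-out fold, with the mean and standard deviation of accuracy taken over the five runs.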

