Jacerong at TASS 2016: An ensemble classifier for sentiment analysis of Spanish tweets at global level
Abstract
This paper describes an ensemble-based approach developed to participate in TASS-2016 Task 1 on sentiment analysis of Spanish tweets at global level. Ensembles are built by combining the systems that have the lowest absolute correlation with each other. The systems are able to deal with non-standard lexical forms in tweets, which improves the quality of natural language analysis. To support polarity classification, the approach uses basic features that have proved their discriminative power, as well as word and character n-gram features. The outputs of Logistic Regression classifiers, which may be either class labels or per-class probabilities, are then used to build ensembles. Experimental results show that the least-correlated combination of 25 systems, which chooses the class with the highest unweighted average probability, is the setting that best suits the task, achieving an overall accuracy of 62.0% in the six-label evaluation and of 70.5% in the four-label evaluation.
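The two ensemble steps named above, picking a weakly correlated group of systems and then taking the class with the highest unweighted average probability, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that correlation is measured as the Pearson coefficient between the flattened probability outputs of two systems, and that a group is scored by its mean absolute pairwise correlation; the abstract does not specify either choice. The function names and the toy probabilities are hypothetical.

```python
from itertools import combinations

import numpy as np


def mean_abs_correlation(outputs, members):
    """Mean absolute Pearson correlation over all pairs of systems in `members`.

    Assumption: each system is represented by its flattened probability matrix.
    """
    flat = np.array([outputs[i].ravel() for i in members])
    corr = np.corrcoef(flat)                    # rows are treated as variables
    upper = np.triu_indices(len(members), k=1)  # each pair counted once
    return float(np.abs(corr[upper]).mean())


def least_correlated_subset(outputs, size):
    """Exhaustively search for the `size` systems with the lowest score.

    Exhaustive search is only practical for toy inputs like the one below.
    """
    candidates = combinations(range(len(outputs)), size)
    return min(candidates, key=lambda m: mean_abs_correlation(outputs, m))


def average_probability_vote(outputs, members):
    """Pick, per sample, the class with the highest unweighted mean probability."""
    mean_probs = np.mean([outputs[i] for i in members], axis=0)
    return mean_probs.argmax(axis=1)


# Hypothetical outputs of three systems: 4 tweets, 2 classes each.
system_a = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
system_b = system_a.copy()  # perfectly correlated with system_a
system_c = np.array([[0.6, 0.4], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9]])
outputs = [system_a, system_b, system_c]

subset = least_correlated_subset(outputs, size=2)
predictions = average_probability_vote(outputs, subset)
```

In this toy setting the search avoids pairing the two identical systems, since their correlation is 1, and the averaged probabilities then decide each label. With 25 systems drawn from a larger pool, an exhaustive search over combinations would be expensive, so a real setup would likely need a greedy or otherwise constrained selection.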