Tuesday, March 26, 2013

Sentiment Analysis in the News

Summary:
For this article, the authors apply opinion mining to 1592 quotes from English-language newspapers for which both the target and the source were known.  The authors acknowledge that the author, the reader, and the text itself can each carry a different interpretation of the content.  Perhaps one of the most important points of this article is the authors' admission that by grouping news feed results into categories (ex: 'disaster', 'flood', and 'accident' can all be put into one category) they may miss some things through misinterpretation.  However, especially with news analysis, having word lists that are easily translated into many different languages is a useful way to save time and, in some cases, to preserve the meaning.  They state that their technique of analysis could also be used where quotes are not present.  A key assumption is that the text in quotes is more subjective than the text as a whole.

The authors conducted this experiment by taking the 1592 quotes and narrowing them to 1292, the number on which the authors agreed about the sentiment.  Of these, the sentiment analysis system identified the target sentiment in 1114 quotes.  Additionally, the quotes were sorted into four categories of expressed opinion: positive, negative, high positive, and high negative.  Among the issues the authors found were that the software did not detect sarcasm and that it produced many false neutral results when no sentiment words were present.  One of the solutions suggested was to increase the amount of text examined, so that the effect of sarcasm could be offset and sentiment words would be more likely to be present.  The authors did not include foreign news media, but suggested it for future research.
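
To make the two failure modes above concrete, here is a minimal sketch of a word-list sentiment scorer in Python.  It is not the authors' system: the word list, the weights, the cut-off of 4 for the "high" categories, and the example quotes are all made up for illustration, and nothing here is taken from WordNet Affect or SentiWordNet.

```python
# Hypothetical toy lexicon (word -> weight). The words and weights are
# invented for this example and do not come from the article or from any
# real sentiment resource.
LEXICON = {
    "good": 1, "support": 1, "excellent": 4, "brilliant": 4,
    "bad": -1, "problem": -1, "disaster": -4, "catastrophe": -4,
}


def classify(quote: str) -> str:
    """Sum the weights of the known words and map the total to a label."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,!?;:'\"").lower() for w in quote.split()]
    score = sum(LEXICON.get(w, 0) for w in words)
    # The threshold of 4 for the "high" categories is an assumption.
    if score >= 4:
        return "high positive"
    if score > 0:
        return "positive"
    if score <= -4:
        return "high negative"
    if score < 0:
        return "negative"
    # No sentiment words found (or they cancelled out).
    return "neutral"


if __name__ == "__main__":
    quotes = [
        "We support the proposal.",
        "The plan is a disaster.",
        "Oh, brilliant, another delay.",                  # sarcastic
        "He should resign before more people get hurt.",  # no listed words
    ]
    for q in quotes:
        print(f"{classify(q):>13}  {q}")
```

Under these assumptions, the sarcastic quote is labeled high positive because the scorer only sees the word "brilliant", and the last quote is labeled neutral because none of its words are in the list, even though a human reader would call it negative.  Those are the same two kinds of error the authors report.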

Critique:

One of my major issues with this article is that terminology was either poorly defined or not defined at all.  The authors did not define what EMM was, and with a general search, I found results ranging anywhere from Enterprise Mobility Management to Eastern Mennonite Missions, neither of which I believe the authors wished to analyze.  My best guess is that this article refers to the Europe Media Monitor.  Additionally, the techniques and resources were only mentioned briefly and not defined.  WordNet Affect and SentiWordNet were two of the resources mentioned, but the authors never explained what they are or why they were used.  I feel that addressing these two particular issues would have made this article much easier to understand.

Another major issue I had with this article is that the authors disregard background knowledge and interpretation of the quotes in order to simplify the process.  I argue that this would be nearly impossible to do.  I cannot read a quote and disregard any knowledge I may have on the issue, nor can I stop myself from interpreting a quote.  These mental processes operate automatically and are difficult, if not impossible, to stop.

Lastly, I feel that this article does not apply specifically to the intelligence field; however, it does explain a process that could be used for intelligence analysis.  Examining articles for sentiment is a useful procedure when examining areas of interest, such as how Iranian leaders perceive certain issues, as shown through quotes from their newspapers.  I agree that using multiple resources to check for sentiment serves as a cross-check that the sentiment is detected consistently and correctly.


Source:
Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., van der Goot, E., Halkia, M., Pouliquen, B., & Belyaeva, J. (2009). Opinion Mining on Newspaper Quotations. Proceedings of the workshop 'Intelligent Analysis and Processing of Web News Content' (IAPWNC), held at the 2009 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology. Milano, Italy.  Retrieved from: http://lexitron.nectec.or.th/public/LREC-2010_Malta/pdf/909_Paper.pdf


4 comments:

  1. This comment has been removed by the author.

  2. I also found that a lack of definitions was an issue with this technique. For example, the article I found incorporated WordNet into the study but did not explain what it is. Furthermore, I think that in this case, as well as with the technique overall, misinterpretation can skew the results. As mentioned, sarcasm or the use of colorful language can throw off the intended meaning of a word, changing its sentiment. A study that accounts for this issue in some way would prove to be very useful and provide more accurate findings.

  3. I agree completely with your critique. While the study initially looks interesting, there are multiple issues with it. The biggest issue I see, which you touch on, is the simplicity of the methodology. It does not take outside factors (such as background information) into consideration. While it is an interesting approach, the simplicity of the method makes the results suspect.

  4. I also agree with your critique, and I found similar issues with the article I read. Because the terms were not defined and the techniques were not fully explained, the resulting confusion hindered the article's effectiveness. And although it is hard to avoid, I would say categorizing words to such an extent can lead to misinterpretation that makes the entire method almost meaningless. Deciding on the category words should be one of the most important steps in the process. But this also depends on what the end goal of the analysis is.
