Monday, March 25, 2013

Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis

Lerman, Gilder, Dredze, & Pereira (2008) used computational linguistics to predict the impact of news on public perceptions of political candidates in the 2004 US Presidential election. The system predicts shifts in public opinion by analyzing daily newspaper articles. Their research assumes that mass media affects world events, such as elections, by swaying the opinions of both the general public and decision makers.

The research of Lerman et al. applies the predictive capability of news analysis, typically associated with financial performance, to the political field of election results. Unlike opinion polls, which are conducted and published sporadically and are often incomparable, daily news analysis, the authors claim, can predict how public perception of political candidates will change on a day-to-day basis. The work differs from other opinion analysis in that the system analyzes objective news reporting, rather than extracted opinions, to predict future opinions: a cause-and-effect relationship.

Their computational system incorporates both external linguistic information (provided by the news coverage) and internal market indicators to forecast public opinion as measured by prediction markets. Political prediction markets act like a stock market for elections: investors buy shares in the outcome they believe most likely to occur, in exchange for a payout if they are correct. Internal market indicators include overall market mood, momentum, and history; the authors cite the example that a positive news story about an otherwise disliked candidate will have less impact on public opinion (Lerman et al., 2008, p. 474). The system takes morning news articles, extracts particular features, uses market history to compute the price movement the news should cause, and compares that prediction with the actual end-of-day movement.
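The daily loop described above (score the morning's news features against weights learned from market history, then check the prediction against the day's closing movement) can be sketched roughly as follows. The feature names and weights here are hypothetical illustrations, not values from the paper:

```python
def predict_movement(features, weights, bias=0.0):
    """Score a day's news features with learned weights: a positive score
    predicts the candidate's market price will rise, a negative score that
    it will fall. (A sketch: the paper learns weights from past days'
    features and observed price changes.)"""
    score = bias + sum(weights.get(f, 0.0) for f in features)
    return 1 if score >= 0 else -1

def evaluate_day(morning_features, weights, actual_close_change):
    """Compare the morning prediction with the actual end-of-day movement."""
    predicted = predict_movement(morning_features, weights)
    actual = 1 if actual_close_change >= 0 else -1
    return predicted == actual

# Hypothetical example: a debate-win feature pushes the price up,
# a scandal feature pushes it down.
weights = {"kerry_wins_debate": 0.8, "scandal": -0.5}
print(predict_movement({"scandal"}, weights))            # predicts a fall
print(evaluate_day({"kerry_wins_debate"}, weights, 0.03))  # prediction matched
```

This linear scoring is only a stand-in for the paper's learning method; the point is the shape of the pipeline, morning features in, a directional prediction out, scored against the day's close.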

The system looks for certain features expected to affect public opinion. Bag-of-words features are words that occur more than 20 times in an article, excluding common stop words. News focus features capture a particular topic that is reported on repeatedly and the degree to which the volume of reporting on that single topic changes. Entity features connect a subject entity to a topic, such as a political candidate to a scandal. Dependency features take entity features a step further and identify both the subject and the object of a particular topic, such as which candidate defeated the other in a particular debate. Dependency features proved to be the most influential in forecasting public opinion from news analysis for the 2004 US Presidential election.
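The simplest of these, the bag-of-words features (words occurring more than 20 times in an article, minus stop words), can be sketched as below. The stop-word list is a stand-in, since the paper does not publish its list:

```python
from collections import Counter
import re

# A small illustrative stop-word list; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is"}

def bag_of_words_features(article_text, min_count=20):
    """Return the set of words occurring more than `min_count` times in the
    article, excluding stop words (a sketch of the bag-of-words features)."""
    tokens = re.findall(r"[a-z']+", article_text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return {word for word, count in counts.items() if count > min_count}

# Toy example: "kerry" clears the threshold, "the" is a stop word,
# and "debate" appears too few times.
article = "kerry " * 25 + "the " * 30 + "debate " * 5
print(bag_of_words_features(article))  # prints {'kerry'}
```

The entity and dependency features would require named-entity recognition and syntactic parsing on top of this, which is why they carry more signal but are costlier to extract.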

The research presented by Lerman et al. succeeds in identifying certain important aspects of news analysis. First, the authors note their system best tracks the impact of negative news (Lerman et al., 2008, p. 479). This is not surprising given the media's propensity to publish negative stories that attract readership. Additionally, the work disproves the notion that the quantity of mentions a candidate receives is the sole factor in forecasting election results. The authors note that while Bush had more mentions than Kerry and did win, Kerry had the fewest mentions among his fellow DNC contenders, yet he won the nomination (Lerman et al., 2008, pp. 478-479).

Despite these positive contributions, the research fails in a few areas. For instance, the authors do not identify, or address at all, the types of news sources from which they compiled their data, beyond stating that they used daily, early-morning publications in various markets. It would be interesting to know whether these are local, regional, or national papers, or papers with known biases, and how the authors addressed this, if at all. Additionally, the research looked only at morning print articles, which leaves out a vast amount of news likely to affect public opinion. While the reasoning for focusing on morning articles was to make a prediction for that same day, the researchers fail to address how news from the previous day, after or during market hours, affected the next day, particularly for those, like myself, who read the news in the evening rather than the morning. Furthermore, a vast amount of news does not come from print sources, and now more than ever social media is being analyzed to make similar predictions. Though the authors could not have anticipated this in 2004, by the time of publication (2008) social media was already an important election resource, particularly for President Obama.

Lerman, K., Gilder, A., Dredze, M., & Pereira, F. (2008). Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008). Manchester, UK.


  1. Did the article state what linguistic/sentiment dictionary was used in the bag-of-words approach? The overall tone and meaning of a text can be interpreted differently based on the dictionary used. Also, it would be interesting to see, in a future study, whether applying a Bayesian filter to the text would yield different results.

  2. Based on reading your review of the article, I am interested in the publications the authors used in their sample and the credibility and reliability of the sources that were chosen. Additionally, I agree that limiting the sample to morning news sources excludes many of the sources most likely to affect public opinion, sources that would be essential to have in the study. I understand why the authors chose morning news sources for their sample, but my overall feeling is that the sample is extremely biased in nature. Overall, I thought this was a very intriguing study, but the authors would have benefited from including different sources of news articles, which would increase both the reliability of the results and the overall representativeness of the sample.

  3. I think the conclusion from this article, that quantity of mentions does not dictate the winner of an election, is an interesting finding. I do, however, agree with you that failing to identify which types of news sources they used is a serious flaw in the design, especially after learning the importance of being aware of the sources we draw our analyses from. Although sentiment analysis of news can be an interesting methodology, I don't believe it has yet yielded any truly substantial results.

  4. The biggest issue I see in this article is how the authors obtained objective news. Every news publisher is influenced by politics. For example, FOX News is conservative compared to CNN, which is liberal. These publishers and news channels cover topics related to political elections based on their political affiliations. I would imagine most news articles related to politics are prone to bias.

  5. I agree with your critique of the article. While it was interesting to look at objective news, I agree with Nuwanti that it is difficult to determine what is objective. Additionally, the use of only news released in the morning does not account for individuals who may not be able to look at the news until later in the day, or for events that happened after the end of the traditional day. Though, this choice to limit the news may have been made to keep the research manageable.

  6. Dave- The authors did not get into the linguistic determinants of the bag-of-words approach, except to state that they excluded words with fewer than 20 occurrences and common stop words. I agree, it would improve the quality and replicability of the research to have explained their word choices.

    Nuwanti- you bring up a very good point regarding the subjective nature of news sources. I would hope that the authors attempted to mitigate this in some way.

    Olivia- I certainly agree it is likely the limitation was to make the research manageable.