Lerman, Gilder, Dredze, & Pereira (2008) used computational linguistics to predict the impact of news on public perceptions of political candidates in the 2004 US Presidential election. Their system predicts shifts in public opinion by analyzing daily newspaper articles. The research assumes that mass media affects world events, such as elections, by swaying the opinions of both the general public and decision makers.
The research of Lerman et al. applies the predictive capability of news analysis, typically associated with financial performance, to the political field of election results. Unlike opinion polls, which are conducted and published sporadically and are often not comparable with one another, daily news analysis, the authors claim, can predict how public perception of political candidates will change from day to day. The work also differs from other opinion analysis in that the system does not extract the opinions expressed in the news; instead, it treats objective news content as a cause and predicts future public opinion as its effect.
Their computational system incorporates both external linguistic information (provided by the news coverage) and internal market indicators to forecast public opinion as measured by prediction markets. Political prediction markets act like a stock market for elections, with investors buying shares in the outcome they believe is most likely to occur in exchange for a payout if they are correct. Internal market indicators include overall market mood, momentum, and history; the authors cite the example that a positive news story about an otherwise disliked candidate will have less of an impact on public opinion (Lerman et al., 2008, p. 474). The system takes each morning's news articles, extracts particular features, uses market history to compute the price movement the news will cause, and compares that prediction with the actual movement at the end of the day.
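The daily predict-then-compare loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not name the learning algorithm, so a simple perceptron is swapped in here, price movement is reduced to a hypothetical binary up (+1) or down (-1) label, and the feature names and example data are invented. Internal market indicators such as momentum could be added as extra feature keys in the same way.

```python
from collections import Counter


def predict_direction(weights: dict[str, float], features: Counter) -> int:
    """Return +1 (price expected to rise) or -1 (expected to fall) from a linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score >= 0 else -1


def run_days(days: list[tuple[Counter, int]], learning_rate: float = 1.0):
    """Process (morning_features, actual_movement) pairs in date order.

    Each morning: predict the day's movement from the news features.
    Each evening: compare with the actual close and, if wrong, apply a
    perceptron update (an assumed stand-in for the paper's learner).
    """
    weights: dict[str, float] = {}
    correct = 0
    for features, actual in days:
        predicted = predict_direction(weights, features)
        if predicted == actual:
            correct += 1
        else:
            for f, v in features.items():
                weights[f] = weights.get(f, 0.0) + learning_rate * actual * v
    return weights, correct


# Hypothetical three-day example: feature counts from morning news and the
# observed end-of-day movement (+1 up, -1 down) for one candidate's contract.
days = [
    (Counter({"scandal": 2, "kerry": 1}), -1),
    (Counter({"debate_win": 1, "kerry": 1}), 1),
    (Counter({"scandal": 1, "kerry": 1}), -1),
]
weights, correct = run_days(days)
print(correct, weights)
```

Because the model is updated only after the day's actual movement is known, each prediction uses information available that morning, which mirrors the "morning news, end-of-day comparison" setup the review describes.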
The system looks for certain features likely to affect public opinion. Bag-of-words features are words that occur more than 20 times in an article, excluding common stop words. News focus features refer to a particular topic that is reported multiple times and to the degree to which the amount of reporting on that single topic changes. Entity features connect a subject entity to a topic, such as a political candidate to a scandal. Dependency features take entity features a step further and identify both the subject and the object of a particular topic, such as which candidate defeated the other in a particular debate. Dependency features proved to be the most influential in forecasting public opinion using news analysis for the 2004 US Presidential election.
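The first two feature types can be illustrated with a short sketch. It assumes simple regular-expression tokenization and a small, hypothetical stop-word list, and it follows the review's description of the bag-of-words threshold (more than 20 occurrences); the news focus measure shown here, the change in one topic word's share of all tokens between two days, is an illustrative assumption rather than the authors' exact definition. Entity and dependency features are omitted because they would require a named-entity tagger and a dependency parser.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "was"}


def bag_of_words(text: str, min_count: int = 20) -> Counter:
    """Count non-stop-word tokens, keeping only those seen more than min_count times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return Counter({w: c for w, c in counts.items() if c > min_count})


def news_focus_change(today: Counter, yesterday: Counter, topic: str) -> float:
    """Change in the share of all tokens devoted to one topic word between two days."""
    def share(counts: Counter) -> float:
        total = sum(counts.values())
        return counts[topic] / total if total else 0.0

    return share(today) - share(yesterday)


# Hypothetical usage: "the" is a stop word and "debate" falls below the threshold.
text = "kerry " * 25 + "the " * 30 + "debate " * 5
print(bag_of_words(text))
print(news_focus_change(Counter({"scandal": 3, "kerry": 1}),
                        Counter({"scandal": 1, "kerry": 3}), "scandal"))
```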
The research presented by Lerman et al. succeeds in identifying certain important aspects of news analysis. First, the authors note that their system best tracks the impact of negative news (Lerman et al., 2008, p. 479). This is not surprising given the media's propensity to publish negative stories, which attract readership. Second, the work disproves the notion that the number of mentions a candidate receives is the sole factor in forecasting election results. The authors note that while Bush had more mentions than Kerry and did win, Kerry had the fewest mentions of all the Democratic contenders, yet he won the nomination (Lerman et al., 2008, pp. 478-479).
Despite these positive contributions, the research falls short in a few areas. For instance, the authors do not identify, or address at all, the types of news sources from which they compiled their data, beyond stating that they used daily, early-morning publications in various markets. It would be useful to know whether these are local, regional, or national papers, or papers with known biases, and how the authors accounted for this, if at all. The research also looked only at morning articles, specifically print articles, which leaves out a vast amount of news likely to affect public opinion. Although the authors focused on morning articles in order to make a prediction for that same day, they do not address how news published the previous day, during or after market hours, affected the following day, particularly for readers who, like myself, read the news in the evening rather than the morning. Finally, much news does not come from print sources at all, and social media is now analyzed more than ever to make similar predictions. Although the authors could not have anticipated this for the 2004 election, by the time of publication in 2008 social media had become an increasingly important election resource, particularly for Barack Obama's campaign.
Lerman, K., Gilder, A., Dredze, M., & Pereira, F. (2008). Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis. Proceedings of the 22nd International Conference on Computational Linguistics. Manchester, UK: Coling. Retrieved from http://delivery.acm.org/10.1145/1600000/1599141/p473-lerman.pdf