Tuesday, March 20, 2012

Probabilistic Part-of-Speech Tagging Using Decision Trees


In Probabilistic Part-of-Speech Tagging Using Decision Trees, author Helmut Schmid describes a new method for tagging words and their appropriate parts-of-speech in a tex by using a decision tree. In languages such as English, where a word can have several different meanings depending on the context (try “store”), highly accurate tagging can improve the ability of programming like spell-check to properly recognize and correct for errors. Earlier methods for tagging often degraded as the size of the text decreased, creating problems in accuracy. Schmid’s new method using a decision tree was intended to greatly increase the accuracy of probabilistic tagging regardless of the size of the text.


In an effort to improve on the accuracy of probabilistic part-of-speech tagging, Helmut Schmid developed a method using a binary decision tree to develop paths that accurately predict the probability of a word by using the context around it. Each estimation of a word’s part-of-speech begins with a root node. If looking for a specific type of word, such as a noun preceded by an adjective, the root node asks if the preceding word is an adjective or not. The decision tree follows a path until it comes to an ending node (leaf) that determines the appropriate probabilities of what the word is.

In the design of the decision tree, each root node asks a simple question of the context surrounding a word. In the example above, this is whether the preceding word is an adjective or not. If the answer is “yes”, then the program follows to the next node. This node refines the search by asking more detail of preceding and following words to improve context. This path
continues until it comes to a node that leads only to leafs, in which case the word is then weighted and probabilities are given for the word’s part of speech.

The decision tree also incorporated probabilities and parts-of-speech from “fullform” and “suffix” lexicons. The “fullform” lexicon was built on a large tagged text to present probabilities in other texts. The lexicon allows for searches of words and their potential parts-of-speech. As interesting is the “suffix” lexicon, which is built on a decision tree. Starting at a root node (labeled “#”), the tree follows a suffix backward from the last letter to the first of the suffix. For example, when searching for the word “tagging” in the suffix lexicon, the tree starts at the root “#” then proceeds to the “g” node, then “n” node and finally the “i” node. Once there, the lexicon gives the probability of the word and its part-of-speech.

In Schmid’s testing, the TreeTagger was able to tag 10,000 words a second and over 2 million words in 6 minutes. After testing, TreeTagger turned out to be more accurate than other methods for tagging a large training text (corpus). In addition, the decision tree method allowed TreeTagger to tag smaller texts significantly more accurately. While other methods saw significant degradation in accuracy as the number of words dropped below 10,000, TreeTagger’s use of decision trees allowed it to maintain a respectable 84 percent accuracy.


The TreeTagger method uses a decision tree and probabilities for accurate part-of-speech tagging in texts. The decision trees allow the method to work efficiently and accurately with texts regardless of the size while other methods saw significant reductions in accuracy as texts decreased in size and offered fewer examples for them to draw on. TreeTagger instead uses prebuilt lexicons and decision trees to track context and words.

This is an interesting way to employ decision trees. While it steers away from the intelligence field in application, its use of probabilities provides a different perspective on decision trees and probability. The application could be changed by switching out words for conditions or facts and linguistic contexts for cultural, social, and political contexts. While this is easier said than done, it may be an avenue of research worth pursuing in the intelligence field, especially since the end goal of both is to produce an accurate probability or estimate.


Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. Proceedings of International Conference on New Methods in Language Processing. 12(4) pp 44-49
Retrieved from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=


  1. In addition to using this for spell-check and grammar-check, it would be interesting to find out if this application could be used for translating text from one language to another. The article doesn't state it but it seems like there is an element of Bayesian Filtering in this application.

  2. The TreeTagger software that was developed after this article was written does not have the ability to directly translate because different decision trees have to be developed for different languages. However, the method can apply to several different languages (English, German, and French in particular)and can help to confirm proper translations.

  3. This is a very interesting application of decision tree analysis, even more with the concept that the TreeTagger software can be applied to confirm accurate translations. Did the article go into much detail about switching out words for facts or conditions, for I am interested in the capability of the software.

  4. The TreeTagger software is built exclusively on prior methods for Part-of-Speech tagging in specific languages. The decision trees are already fully developed for the software and I do not think that you could change out words and parts-of-speech with facts or conditions easily or quickly.

  5. I have read your blog its very attractive and impressive. I like your blog machine learning online course