Summary and Critique by: Jared Leets
Sekine, Grishman, and Shinnou present a learning system that finds and classifies named entities in Japanese newspaper text. The task is to locate named entities and assign each one a type, such as organization, date, or location. The system has two parts: the first builds a decision tree from the training data, and the second uses that tree to produce tagged output. The authors use a Japanese morphological analyzer (JUMAN) and a program package for building the decision tree. The decision tree draws on three feature sets: the part of speech assigned by JUMAN, character type information, and special dictionaries, which were built from JUMAN dictionary entries, lists found on the internet, or human knowledge.
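To make the three feature sets more concrete, here is a minimal Python sketch of how they might be collected for one token. It is an illustration only, not the authors' code: the function names, the character type labels, the use of Unicode ranges, and the sample dictionaries are all assumptions, and the part of speech is passed in by hand rather than obtained from JUMAN.

```python
def char_type(ch: str) -> str:
    """Classify one character by script (labels are hypothetical)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alphabet"
    return "other"


def token_char_type(token: str) -> str:
    """Return the single script of a token, or 'mixed' if it uses several."""
    if not token:
        return "other"
    types = {char_type(ch) for ch in token}
    return types.pop() if len(types) == 1 else "mixed"


def token_features(token: str, pos: str, dictionaries: dict) -> dict:
    """Bundle the three kinds of information for one token.

    `pos` stands in for the tag a morphological analyzer would supply.
    `dictionaries` maps a dictionary name to a set of entries; a token
    may match more than one dictionary.
    """
    matches = sorted(name for name, entries in dictionaries.items()
                     if token in entries)
    return {"pos": pos, "char_type": token_char_type(token),
            "dictionaries": matches}


# Hypothetical special dictionaries, for illustration only.
sample_dictionaries = {
    "location": {"東京", "大阪"},
    "org_suffix": {"銀行"},
}
print(token_features("東京", "noun", sample_dictionaries))
```

A feature record like this, one per token, is the kind of input a decision tree learner could be trained on.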
Training sentences are segmented into tokens and tagged with parts of speech by JUMAN. Each token is then analyzed for its character type and matched against the entries in the special dictionaries; a single token may match entries in several dictionaries. A decision tree is then built from the training data. It learns where named entities open and close from the three kinds of information: part of speech, character type, and special dictionary matches. To tag new text, the properties of every token are run through the decision tree, which yields the probabilities that a named entity opens or closes at that token. Once these probabilities have been assigned to every token in a sentence, the next step is to find the most probable consistent path through the sentence. The article concludes by summarizing how the authors used a decision-tree system to find and classify names in Japanese texts.
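The last step, choosing a consistent path from per-token probabilities, can be sketched with a small dynamic program in the style of the Viterbi algorithm. This is a simplified illustration under stated assumptions, not the system described in the paper: it handles a single entity type, the five tag names are hypothetical, and the probabilities are made up by hand instead of coming from a trained decision tree.

```python
import math

# Hypothetical per-token tags for a single entity type.
TAGS = ["none", "open", "cont", "close", "open-close"]
STILL_OPEN = {"open", "cont"}  # tags after which an entity is unfinished


def allowed(prev: str, cur: str) -> bool:
    """Consistency rule: an open entity must continue or close,
    and nothing can continue or close unless an entity is open."""
    if prev in STILL_OPEN:
        return cur in ("cont", "close")
    return cur in ("none", "open", "open-close")


def best_consistent_path(token_probs: list) -> list:
    """Return the most probable tag sequence that obeys `allowed`.

    `token_probs` holds one dict per token, mapping tag -> probability.
    Returns an empty list if no consistent sequence exists.
    """
    if not token_probs:
        return []
    # scores[tag] = (log-probability of best path ending in tag, path)
    scores = {}
    for tag in TAGS:
        p = token_probs[0].get(tag, 0.0)
        # The sentence start behaves like a position after "none".
        if p > 0 and allowed("none", tag):
            scores[tag] = (math.log(p), [tag])
    for probs in token_probs[1:]:
        new_scores = {}
        for cur in TAGS:
            p = probs.get(cur, 0.0)
            if p <= 0:
                continue
            candidates = [(s + math.log(p), path + [cur])
                          for prev, (s, path) in scores.items()
                          if allowed(prev, cur)]
            if candidates:
                new_scores[cur] = max(candidates, key=lambda c: c[0])
        scores = new_scores
    # A sentence may not end with an entity still open.
    finals = [v for tag, v in scores.items() if tag not in STILL_OPEN]
    return max(finals, key=lambda c: c[0])[1] if finals else []


# Made-up probabilities for a three-token sentence.
example = [
    {"none": 0.6, "open": 0.4},
    {"none": 0.1, "cont": 0.2, "close": 0.7},
    {"none": 0.9, "open": 0.1},
]
print(best_consistent_path(example))  # ['open', 'close', 'none']
```

In this example, picking the most likely tag for each token separately would give none, close, none, which closes an entity that was never opened. The path search instead returns open, close, none, the most probable sequence that is consistent across the whole sentence.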
The decision tree approach proved successful. The system did not require additional methods, and cases with multiple possibilities could be resolved by the probabilistic method. Decision trees also appear to be more domain independent than the dictionaries used in the study. Because the study was smaller than comparable studies that used decision tree methods, it had fewer errors and its end results seemed more positive. The study is a good example of the decision tree methodology and shows how algorithms built on it can classify information and report the results. Other studies using decision tree methods will likely continue to reveal trends and regularities.
Source: Sekine, S., Grishman, R., & Shinnou, H. (1998, August). A decision tree method for finding and classifying names in Japanese texts. Proceedings of the Sixth Workshop on Very Large Corpora. http://www.cs.nyu.edu/~sekine/papers/wvlc98.pdf