Friday, September 15, 2017

A Decision Tree Method for Finding and Classifying Names in Japanese Texts

Summary and Critique by: Jared Leets

Summary:
Sekine, Grishman, and Shinnou state that the purpose of the study is to present a learning system that will find and classify name entities in Japanese newspapers. The task was to find named entities and character types such as the name of an organization, date, and location to name a few. The algorithm for the system has several processes, the first step creates the decision tree from the training data while the other is used for making the tagged output from the decision tree. They use a Japanese morphological analyzer (JUMAN) and a program package for the decision tree. The three feature sets in the decision tree include: the part of speech tagged by JUMAN, character type of information, and special dictionaries which were based on JUMAN dictionary entries, entry lists on the internet, or based on human information.

Training sentences are broken up and parts-of-speech are tagged by JUMAN. Then a token is analyzed by a character type and eventually matched against entries in the special dictionaries. A single token has the ability to be matched in entries in many dictionaries. A decision tree can be built from the training data. It can learn from the opening and closing of named entities on the three kinds of information, those being the parts of speech, character type, and special dictionary. To find probabilities of the opening and closing of a named entity for every single token, the properties of all the tokens are analyzed against the decision tree. Once the tokens are assigned in a sentence, the next step is to discover the most consistent likely path through the sentence. The article concludes by stating how they used a decision-tree system to discover and classify names in Japanese texts.

Critique:
The study using decision trees proved to be successful. Their system did not require additional methods and multiple possibilities could be resolved by the probabilistic method. Decision trees seem to be more domain independent compared to the dictionaries in the study. Since their study was smaller compared to other similar studies done with decision tree methods, it had less errors and seemed to be much more positive with the end results. The study was a good example of using the decision tree methodology and helped show how algorithms with the methodology can classify information and show the results. Other studies using decision tree methodology will likely continue to display trends and regularities.

Source:
Sekine, S., Grishman, R., & Shinnou, H. (1998, August). A decision tree method for finding and classifying names in Japanese texts. Proceedings of the Sixth Workshop on Very Large Corpora.http://www.cs.nyu.edu/~sekine/papers/wvlc98.pdf

5 comments:

  1. Though they prove their decision tree successful, I feel they became too computer reliant on their algorithms rather than creating a robust decision tree. Despite displaying multiple possibilities and using the decision tree system to discover and classify names in Japanese texts, there was a lot of reliance on calculated probabilities and the individual who calculated those probabilities to get the desired results. In addition, the authors also acknowledge that their training data was small but received the desired outcomes. I wonder if they added more training data, they would still receive the desired results.

    ReplyDelete
  2. It appears that they adapted machine learning in they study by allowing the tree to learn various patterns based speech patterns, character types, special dictionary etc. I wonder if the intent is to make automate the process with minimal to no human interaction down the line.

    ReplyDelete
  3. It's a great learning system they are building using decision tree and web scrapping(asumming they are looking at online newspaper) to create a system to find named entities and character types such as the name of an organization, date, and location. It's good that they had JUMAN, this kind of complex classification of language is difficult for decision tree to handle in absence of training set like JUMAN.

    ReplyDelete
  4. I agree with Oddi it looks a lot like machine learning, and it is very interesting to see that in action. It would be interesting to see this type of decision tree adjusted to fit resumes in HR departments, because the current systems that scrape resumes can be extremely ineffective.

    ReplyDelete
  5. This comment has been removed by the author.

    ReplyDelete