Last time, we took a brief look at the pororo NER released by Kakao Brain.
Developer community SQLER.com - Review of pororo released by kakaobrain
This time, a quick review of text classification - AES (Automated Essay Scorer, hereafter AES).
kakaobrain pororo - Automated Essay Scorer review
Pororo official guide documentation
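Before the technical part: AES is a model that assigns a score to an essay written by a person. If that stirs up mixed feelings the first time you encounter it, that's normal.
It is a model with a long history and, just as you might suspect, it comes with a range of criticism. - Automated essay scoring - Wikipedia
Let's set that aside for now and focus on the technical side.
Running the code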
    # text classification
    from pororo import Pororo

    # Currently only "en" is offered for this task; it looks like loading and using the BERT model directly is also an option.
    # https://kakaobrain.github.io/pororo/_modules/pororo/tasks/automated_essay_scoring.html#PororoBertAes
    aes = Pororo(task="aes", lang="en")

    # The essay is taken from Kaggle's test TSV dataset (a partial sample).
    # Adjacent string literals are concatenated into one string; the typos and @PLACEHOLDER tokens are part of the data.
    essay = (
        "I believe that computers have a positive effect on people. They help you stay in touch with family in a couple different ways they excercise your mind and hands and help you learn and make things easier. "
        "Computer's help you keep in touch with people. Say you live in @LOCATION1 and you miss your @CAPS1. You can just send an e-mail and talk all you want. If you don't just want to limit it to words you can add pictures so they can see how much you've grown or if you are well. Even if you're just e-mailing someone down the block it is just as effective as getting up and walking over there. You can also use a computer to make a scrap book card or slide show to show how much you love the person you give them to. "
        "Computers @MONTH1 not excercise you whole body but it excersises you mind and hands. You could play solitaire on the computer and come away @PERCENT1 smarter than before. You can play other games of strategy like checkers and chess while still sitting at home being comfortable. Your hands always play a big role while you're on the computer. They need to move the mouse and press the keys on a keyboard. Your hands learn all the keys from memorization. It's like the computer's teaching handi-coordination and studying habit for the future. "
        "Computers make human lives easier. Not only do they help kids turn in a nice neatly printed piece or paper for home work but they also help the average person. Teachers use it to keep peoples grades in order and others use it to write reports for various jobs. The @CAPS2 probably uses one to write a speech or to just keep his day in order. Computers make it easier to learn certain topics like the @LOCATION2 history. "
        "You can type something into a searcher site and have ton's of websites for one person with, who knows how much imformation. Instead of flipping through all the pages in a dictionary you can look for an online dictionary, type in the word and you have the definition. "
        "Computers have positive effects on people because they help you keep close to your family, they challenge your mind to be greater and excercise your hands and they make life easier for kids and the average person. This is why, I think computers have good effects on society."
    )

    aes(essay)

Result: 63.65
So how is it possible to assign a score to data like this?
Essay quality dimensions
As Wikipedia also describes, essay quality is broken down into dimensions such as:
- Grammaticality: following grammar rules
- Usage: use of prepositions, word usage
- Mechanics: following rules for spelling, punctuation, capitalization
- Style: word choice, sentence structure variety
- Relevance: how relevant the content is to the prompt
- Organization: how well the essay is structured
- Development: development of ideas with examples
- Cohesion: appropriate use of transition phrases
- Coherence: appropriate transitions between ideas
- Thesis Clarity: clarity of the thesis
- Persuasiveness: convincingness of the major argument
These dimensions, together with a variety of other recipes, feed into the scoring. For more detail on the dimensions of essay quality, see the paper Automated Essay Scoring: A Survey of the State of the Art.
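To make a few of these dimensions concrete, here is a minimal sketch of hand-crafted proxy features for Style, Cohesion, and Mechanics. This is only an illustration, not how Pororo scores essays: the function name `essay_features`, the `TRANSITIONS` list, and the choice of proxies are all hypothetical, and real systems use far richer features or learned representations.

```python
import re
from statistics import pstdev

# Hypothetical, hand-picked transition phrases; real lexicons are much larger.
TRANSITIONS = ("however", "therefore", "for example", "in addition", "because", "also")

WORD_RE = re.compile(r"[A-Za-z']+")


def essay_features(text: str) -> dict:
    """Compute toy proxies for a few essay quality dimensions."""
    # Naive sentence split on ., ! or ?; good enough for an illustration.
    sentences = [s.strip() for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = WORD_RE.findall(text.lower())
    sentence_lengths = [len(WORD_RE.findall(s)) for s in sentences]
    lowered = text.lower()

    n_transitions = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in TRANSITIONS
    )
    capitalized = sum(1 for s in sentences if s[0].isupper())

    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        # Style: vocabulary variety (type-token ratio).
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Style: variety in sentence length (population standard deviation).
        "sentence_length_std": pstdev(sentence_lengths) if sentence_lengths else 0.0,
        # Cohesion: how many transition phrases appear.
        "n_transitions": n_transitions,
        # Mechanics: share of sentences that start with a capital letter.
        "capitalized_sentence_ratio": capitalized / len(sentences) if sentences else 0.0,
    }


if __name__ == "__main__":
    sample = (
        "Computers help people. However, they also cost money. "
        "Because of that, some people avoid them."
    )
    for name, value in essay_features(sample).items():
        print(name, value)
```

Features like these could be fed to a simple regressor, but surface counts alone are easy to game, which is one of the criticisms AES systems have long faced.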
Kaggle's AES challenge
The Hewlett Foundation: Automated Essay Scoring | Kaggle
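As the Pororo official documentation states, it uses the dataset from this AES challenge. If Pororo's metric score is 80.25, that is close to top-tier for this challenge. Impressive.
There are a number of reference notebooks on Kaggle. Here is a link to one that is worth reviewing.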
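For context, the Kaggle challenge ranked submissions by quadratic weighted kappa (QWK), which measures agreement between predicted and human scores and penalizes large disagreements more heavily. The sketch below is a plain-Python version for illustration; the function name and signature are my own, and whether the 80.25 figure above is a QWK on a 0-100 scale is an assumption to check against the Pororo documentation.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer ratings.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level agreement,
    and negative values for worse-than-chance agreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")

    total = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts
    hist_a = Counter(rater_a)                  # marginal counts, rater A
    hist_b = Counter(rater_b)                  # marginal counts, rater B

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            # The usual 1 / (N - 1)^2 normalization cancels in the ratio,
            # so the raw squared distance is used as the weight.
            weight = (i - j) ** 2
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        # Both raters gave the same single rating to every essay.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    human = [1, 2, 3, 4]
    model = [1, 2, 3, 4]
    print(quadratic_weighted_kappa(human, model, 1, 4))
```

Because the weight grows with the square of the score gap, being off by three points costs nine times as much as being off by one.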
Automated Essay Scorer | Kaggle
It offered good insights, both on the dimensions listed above and from its EDA process.