Publication:
BERT-TRIPLE: Using BERT for Extracting Triples from Vietnamese Sentences

Date
2022
Authors
Truong H. V. Phan, Phuc Do
Abstract
A triple consists of a head entity, a relation, and a tail entity, and is a key component of a knowledge graph. Extracting triples from sentences in a Vietnamese corpus is an interesting problem for building knowledge graphs and generating questions for question answering systems. Most previous work has relied on hand-crafted rules, and little research has applied machine learning models to extract triples from a given sentence. In this paper, we propose a model named BERT-TRIPLE, based on the BERT model, to extract triples from a Vietnamese corpus. Our method consists of two phases: annotation and training. In the annotation phase, we annotate the word tokens in each sentence. In the training phase, we fine-tune the BERT model on the labeled dataset to predict triples in Vietnamese sentences. After collecting the triples from Vietnamese text, we generate questions for question answering systems. We evaluate the precision of the proposed method on the Vietnamese Wikipedia corpus. Our model improves the F1 score over the VnCoreNLP and underthesea models by 12% and 17%, respectively. It is also more efficient than the Text-to-Text Transfer Transformer (T5) model on the question generation task.
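The training phase described in the abstract amounts to fine-tuning BERT as a token classifier over triple-part labels. The sketch below illustrates that general idea only; it is not the authors' implementation, and the pretrained checkpoint name, the BIO-style label set, and the toy annotated sentence are assumptions introduced purely for illustration.

```python
# A minimal sketch of BERT fine-tuning for token-level triple tagging.
# NOT the authors' code: the checkpoint, labels, and example sentence
# below are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical BIO-style tags marking the three parts of a triple.
labels = ["O", "B-HEAD", "I-HEAD", "B-REL", "I-REL", "B-TAIL", "I-TAIL"]
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

# Any BERT-family checkpoint with Vietnamese coverage could be used;
# the multilingual model here is only a placeholder assumption.
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels), id2label=id2label, label2id=label2id
)

# One toy annotated sentence: word tokens paired with triple-part tags.
words = ["Hà", "Nội", "là", "thủ", "đô", "của", "Việt", "Nam"]
tags = ["B-HEAD", "I-HEAD", "B-REL", "I-REL", "I-REL", "I-REL", "B-TAIL", "I-TAIL"]

# Tokenize with word alignment so sub-word pieces inherit their word's label.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
word_ids = enc.word_ids(batch_index=0)
label_ids = [
    -100 if w is None else label2id[tags[w]]  # -100 is ignored by the loss
    for w in word_ids
]
enc["labels"] = torch.tensor([label_ids])

# A single fine-tuning step; a real run would loop over the labeled corpus.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**enc).loss
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

At inference time, the predicted tag sequence for a sentence would be grouped back into (head, relation, tail) spans, which can then feed a knowledge graph or a template-based question generator.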
Description
Keywords
"BERT, triple extraction, named entity, semantic dependency graph, Question Answering systems"
Citation