Publication:
BERT-TRIPLE: Using BERT for Extracting Triples from Vietnamese Sentences
Date
2022
Authors
Truong H. V. Phan, Phuc Do
Abstract
A triple consists of a head entity, a relation, and a tail entity, and triples are key components of a knowledge graph. Extracting triples from sentences in a Vietnamese corpus is an interesting problem for building knowledge graphs and generating questions for question answering systems. Most previous work has relied on hand-crafted rules, and little research has applied machine learning models to extract triples from a given sentence. In this paper, we propose a model named BERT-TRIPLE, based on the BERT model, to extract triples from a Vietnamese corpus. The method consists of two phases: annotation and training. In the annotation phase, we annotate the word tokens in each sentence. In the training phase, we fine-tune the BERT model on the labeled dataset to predict triples in Vietnamese sentences. After collecting the triples from Vietnamese text, we generate questions for question answering systems. We conduct experiments on the Vietnamese Wikipedia corpus to evaluate the precision of our proposed method. Our model improves the F1 score over the VnCoreNLP and underthesea models by 12% and 17%, respectively. Our model is also more efficient than the Text-to-Text Transfer Transformer (T5) model on the question generation task.
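
The sketch below illustrates the two phases described in the abstract as they might look in code: annotating word tokens with triple roles and fine-tuning a BERT token-classification head on those labels. It is not the authors' released implementation; the BIO-style tag set (HEAD / REL / TAIL), the bert-base-multilingual-cased checkpoint, and the example sentence are assumptions made for illustration only.

# Illustrative sketch of BERT-TRIPLE-style triple extraction as token
# classification. The tag set and checkpoint are assumptions, not the
# authors' actual configuration.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumed BIO tag set marking head entity, relation, and tail entity spans.
LABELS = ["O", "B-HEAD", "I-HEAD", "B-REL", "I-REL", "B-TAIL", "I-TAIL"]
label2id = {l: i for i, l in enumerate(LABELS)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS)
)

# Phase 1 (annotation): a word-segmented Vietnamese sentence
# ("Hanoi is the capital of Vietnam") with word-level triple-role tags.
words = ["Hà_Nội", "là", "thủ_đô", "của", "Việt_Nam"]
word_tags = ["B-HEAD", "B-REL", "I-REL", "I-REL", "B-TAIL"]

# Align word-level tags to BERT subword tokens; only the first subword of
# each word keeps its label, the rest are ignored (-100) by the loss.
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned, prev_word = [], None
for word_idx in encoding.word_ids(batch_index=0):
    if word_idx is None or word_idx == prev_word:
        aligned.append(-100)
    else:
        aligned.append(label2id[word_tags[word_idx]])
    prev_word = word_idx
labels = torch.tensor([aligned])

# Phase 2 (training): one fine-tuning step on the labeled example.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**encoding, labels=labels)
outputs.loss.backward()
optimizer.step()

# At inference time, per-token predictions are decoded back into
# (head, relation, tail) spans to form a triple.
pred_ids = outputs.logits.argmax(dim=-1)[0].tolist()
print([LABELS[i] for i in pred_ids])

In practice the labeled corpus would contain many such sentences and the training loop would run for several epochs; the decoded spans then feed the downstream question generation step mentioned in the abstract.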
Keywords
"BERT,
triple extraction,
named entity,
semantic dependency graph,
Question Answering systems"