A Text Matching Method Based on a Pretraining Language Model: Sentence Embeddings Using Siamese BERT-Networks
Author: LU Meiqing, SHEN Yanyan

Fund Project:

This work is supported by the National Key Research and Development Program of China (2019YFB1405200), the 2019 High-Level University Construction Special Project of Guangdong Province, China (5041700175), and the Second Batch of New Engineering Research and Practice Projects of the Ministry of Education (E-RGZN20201036).

Abstract:

The Sentence Embeddings Using Siamese BERT-Networks (SBERT) pre-trained language model has two shortcomings in its representation layer for text matching: (1) the two queried texts are compared directly after the BERT encoder maps them to vectors, and (2) this computation does not refine the fine-grained representations of the two texts. As a result, the represented semantics can deviate, and it is difficult to assess the importance of individual words to the match. This paper proposes SBMAA, an improved text similarity matching model based on the SBERT pre-trained language model. First, the hidden-layer vectors of the two queries are obtained by passing them through the SBERT model, and the similarity matrix between the two is calculated. An attention mechanism then re-encodes the tokens of the two sentences to obtain interactive features, which are pooled. Finally, a fully connected layer is attached for prediction. The method introduces a multi-head attention alignment mechanism, a common device in interactive text matching algorithms, to strengthen the degree of correlation between similar texts so that the model achieves more accurate matching. Experimental results on the ATEC 2018 NLP dataset and the CCKS 2018 WeBank Customer Question Matching dataset show that, compared with five popular text similarity matching models (ESIM, ConSERT, BERT-whitening, SimCSE, and the baseline SBERT), the proposed SBMAA model reaches 84.7% and 90.4% on the F1 metric, 18.6% and 8.7% higher than the baseline, respectively. It also shows good results in accuracy and recall, and exhibits a degree of robustness.
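    Read as a pipeline, the abstract describes five steps: shared SBERT encoding of both queries into hidden-layer token vectors, a token-level similarity matrix, multi-head attention alignment producing interactive features, pooling, and a fully connected prediction layer. The sketch below renders that pipeline in PyTorch purely for illustration; the encoder checkpoint, hidden size, head count, mean pooling, and the concat/difference/product feature combination are all assumptions, not details taken from the paper.

# A minimal PyTorch sketch of the SBMAA pipeline described in the abstract.
# Hyperparameters (checkpoint name, hidden size 768, 8 attention heads, mean
# pooling, the feature combination) are illustrative assumptions, not the
# paper's settings.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SBMAASketch(nn.Module):
    def __init__(self, encoder="sentence-transformers/bert-base-nli-mean-tokens",
                 hidden=768, heads=8):
        super().__init__()
        # one shared (Siamese) SBERT encoder for both queries
        self.bert = AutoModel.from_pretrained(encoder)
        # multi-head attention aligns the tokens of one sentence against the
        # other; its QK^T scores play the role of the similarity matrix
        self.align = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def masked_mean(self, h, mask):
        # mean-pool token vectors, ignoring padding positions
        m = mask.unsqueeze(-1).float()
        return (h * m).sum(1) / m.sum(1).clamp(min=1e-9)

    def forward(self, a_ids, a_mask, b_ids, b_mask):
        # hidden-layer token vectors for the two queries
        ha = self.bert(a_ids, attention_mask=a_mask).last_hidden_state
        hb = self.bert(b_ids, attention_mask=b_mask).last_hidden_state
        # re-encode each sentence's tokens against the other (interactive
        # features); key_padding_mask is True where the other side is padding
        a2, _ = self.align(ha, hb, hb, key_padding_mask=~b_mask.bool())
        b2, _ = self.align(hb, ha, ha, key_padding_mask=~a_mask.bool())
        # pool the aligned token vectors
        pa, pb = self.masked_mean(a2, a_mask), self.masked_mean(b2, b_mask)
        # fully connected layer predicts match / no-match from the pooled pair
        feats = torch.cat([pa, pb, (pa - pb).abs(), pa * pb], dim=-1)
        return self.classifier(feats)

# Hypothetical usage on a pair of queries:
tok = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
a = tok("how do I reset my password", return_tensors="pt")
b = tok("password reset procedure", return_tensors="pt")
logits = SBMAASketch()(a.input_ids, a.attention_mask, b.input_ids, b.attention_mask)

    Note that the sketch shares one attention module for both alignment directions; whether SBMAA uses shared or separate alignment parameters is not stated in the abstract.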

Citation

LU Meiqing, SHEN Yanyan. A Text Matching Method Based on a Pretraining Language Model: Sentence Embeddings Using Siamese BERT-Networks[J]. Journal of Integration Technology, 2023, 12(2): 53-63.

History
  • Online: March 23, 2023