A Text Matching Method Based on a Pretraining Language Model: Sentence Embeddings Using Siamese BERT-Networks


This work was supported by the National Key Research and Development Program of China (2019YFB1405200), the High-Level University Construction Special Project of Guangdong Province, China, in 2019 (5041700175), and the Second Batch of New Engineering Research and Practice Projects of the Ministry of Education (E-RGZN20201036).

The Sentence-BERT (SBERT) pre-trained language model has two shortcomings in its representation layer for text matching: (1) the two queried texts are compared directly after being encoded into vectors by the BERT encoder, and (2) this computation does not refine the fine-grained representations of the two texts. As a result, the represented semantics can deviate, and it is difficult to assess the importance of individual words in matching. This paper proposes SBMAA, an improved text-similarity matching model based on the SBERT pre-trained language model. First, the hidden-layer vectors of the two queries are obtained from the SBERT model, and the similarity matrix between them is computed. An attention mechanism then re-encodes the tokens of the two sentences to obtain interactive features, which are pooled; finally, a fully connected layer performs the prediction. The method introduces a multi-head attention alignment mechanism, a common component of interactive text-matching algorithms, to strengthen the correlation between similar texts so that the model achieves more accurate matching. Experimental results on the ATEC 2018 NLP dataset and the CCKS 2018 WeBank Customer Question Matching dataset show that, compared with five popular text-similarity matching models (ESIM, ConSERT, BERT-whitening, SimCSE, and the SBERT baseline), the proposed SBMAA model reaches F1 scores of 84.7% and 90.4%, which are 18.6 and 8.7 percentage points higher than the baseline, respectively. It also performs well in precision and recall, and shows a degree of robustness.
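The interaction layer described above (similarity matrix, multi-head attention alignment, pooling, then a classifier input) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names (`multi_head_align`, `sbmaa_features`), the mean pooling, and the `[u, v, |u - v|]` feature concatenation are illustrative choices; the paper's exact head count, pooling, and classifier layout may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_align(a, b, num_heads=2):
    """Re-encode tokens of sentence `a` against sentence `b`.

    a: (len_a, d) hidden states, b: (len_b, d); d divisible by num_heads.
    Each head computes a similarity matrix between the two token
    sequences and uses it to attend from `a` over `b`.
    """
    _, d = a.shape
    dh = d // num_heads
    out = np.zeros_like(a)
    for h in range(num_heads):
        qa = a[:, h * dh:(h + 1) * dh]          # queries from sentence A
        kb = b[:, h * dh:(h + 1) * dh]          # keys/values from sentence B
        scores = qa @ kb.T / np.sqrt(dh)        # (len_a, len_b) similarity matrix
        out[:, h * dh:(h + 1) * dh] = softmax(scores) @ kb
    return out

def sbmaa_features(ha, hb, num_heads=2):
    """Sketch of the interaction step: align each query's SBERT hidden
    states against the other, mean-pool, and build a feature vector
    for a fully connected classifier."""
    u = multi_head_align(ha, hb, num_heads).mean(axis=0)
    v = multi_head_align(hb, ha, num_heads).mean(axis=0)
    return np.concatenate([u, v, np.abs(u - v)])
```

For two queries with hidden states of shape `(len, d)`, the result is a fixed-length vector of size `3 * d` that a fully connected layer can score for match/no-match.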


LU Meiqing, SHEN Yanyan. A Text Matching Method Based on a Pretraining Language Model: Sentence Embeddings Using Siamese BERT-Networks[J]. Journal of Integration Technology, 2023, 12(2): 53-63.

Article Metrics
  • Online: March 23, 2023