Abstract: In multi-party conversations, identifying the reply-to relation between messages is an important task in the dialogue domain. Existing efforts have not addressed two issues arising from the data distribution. First, shorter messages appear more frequently, yet short texts carry less semantic information, which limits the model's ability to learn. Second, positive samples with a reply-to relation are usually far fewer than negative samples, which skews the training data and degrades the model's performance on positive samples. To address these two issues, this paper proposes an improved model based on a pre-trained language model: it first mitigates the short-text issue through dynamic inquiry window modeling, and then addresses the positive-sample issue through position-driven positive sample weight optimization. Experimental comparisons with previous research show that the proposed method improves recall by an average of 15.7% over the baseline model built on a pre-trained language model. In addition, this paper constructs a new dataset collected from the Telegram platform, which can provide data support for subsequent related studies.