Chinese spelling error correction model for transcribed text
DOI:
Author:
Affiliation:

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

CLC Number:

TP3

Fund Project:

    Abstract:

    To address the high error rate of speech-transcribed text, this paper proposes a MacBERT-based error detection and correction model for text produced by speech transcription. In the error detection stage, a MacBERT-BiLSTM-CRF model determines whether the text contains errors and where they are located. In the error correction stage, candidate words are evaluated along two dimensions, confidence and phonetic similarity, and a "confidence-phonetic similarity" curve is drawn to decide whether a candidate word should be used as a correction. The confidence of each candidate word is computed with the MacBERT language model, and a phonetic similarity calculation method based on pinyin codes is proposed. Experiments were conducted on the public speech dataset THCHS-30, with transcriptions obtained by calling the Baidu speech recognition API. Compared with existing methods, precision, recall, and F1 improve in both the error detection stage and the error correction stage. In the error correction stage, the accuracy rate reaches 83.32%, which improves the accuracy of the transcribed text.
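The correction decision described in the abstract can be illustrated with a minimal sketch. The abstract does not give the pinyin code or the shape of the curve, so everything here is an assumption: candidates are represented by toneless pinyin strings, phonetic similarity is approximated by a normalized edit distance (a stand-in, not the paper's pinyin-code method), and the "confidence-phonetic similarity" curve is replaced by a hypothetical linear boundary with made-up parameters `base` and `slope`. The confidence value would come from a language model such as MacBERT; here it is simply passed in as a number.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def phonetic_similarity(pinyin_a: str, pinyin_b: str) -> float:
    """Similarity in [0, 1] between two toneless pinyin strings.

    Assumption: normalized edit distance stands in for the paper's
    pinyin-code similarity, whose details the abstract does not give.
    """
    longest = max(len(pinyin_a), len(pinyin_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pinyin_a, pinyin_b) / longest


def should_correct(confidence: float, similarity: float,
                   base: float = 0.9, slope: float = 0.5) -> bool:
    """Hypothetical decision boundary: the more a candidate sounds like the
    original word, the less language-model confidence it needs.
    `base` and `slope` are illustrative values, not taken from the paper.
    """
    required_confidence = base - slope * similarity
    return confidence >= required_confidence


if __name__ == "__main__":
    sim = phonetic_similarity("zhang", "zhan")
    print(sim, should_correct(confidence=0.6, similarity=sim))
```

Under these assumed parameters, a candidate that sounds close to the original word is accepted at a moderate confidence, while a phonetically unrelated candidate needs a confidence near `base`. This matches the intuition that speech recognition errors tend to be near-homophones.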

History
  • Received:
  • Revised:
  • Adopted:
  • Online: February 19, 2024
  • Published: