Abstract: Cardiac dimensions vary significantly across age groups, and children's faster heart rates produce more blurred cardiac borders than in adults; both factors make echocardiography segmentation harder. To address these problems, this paper improves the segmentation model H2Former and proposes TPA-H2VSS, a model that combines attention with state space modeling to segment the left ventricle in pediatric echocardiography. First, the Transformer block is replaced with a visual state space (VSS) block to strengthen the model's long-range modeling. Second, a temporal attention (TA) module is built between the encoder and decoder so that the semantic information of the left ventricle in the echocardiography video is complemented and exchanged along the temporal dimension. Finally, a positional attention (PA) module is added to the output head to make left ventricle segmentation more accurate. The model was trained, validated, and tested separately on the PSAX and A4C datasets from the pediatric echocardiographic video dataset EchoNet-Pediatrics. Compared with the base model H2Former, Dice, IoU, and accuracy on the PSAX dataset improved by 0.86%, 1.41%, and 0.15%, respectively, and the Hausdorff distance (HD) decreased by 0.2195. On the A4C dataset, Dice, IoU, and accuracy improved by 0.93%, 1.53%, and 0.2%, respectively, and HD decreased by 0.167. Comparisons with other models show that the proposed model segments the left ventricle in pediatric echocardiography effectively and could offer a new solution for the auxiliary diagnosis of congenital heart disease.
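For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how Dice, IoU, pixel accuracy, and HD are commonly computed for a pair of binary masks. It is not the authors' evaluation code. The function names, the handling of empty masks, and the choice to compute HD over all foreground pixels (rather than boundary pixels only) and in pixel units are assumptions made for illustration.

```python
import numpy as np


def dice(pred, target):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if total == 0 else float(2.0 * inter / total)


def iou(pred, target):
    """Intersection over union: |A ∩ B| / |A ∪ B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else float(inter / union)


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground truth."""
    return float((pred.astype(bool) == target.astype(bool)).mean())


def hausdorff(pred, target):
    """Symmetric Hausdorff distance between foreground pixel sets.

    Assumption: brute-force over all foreground pixels, in pixel units.
    This is quadratic in the number of pixels and only suits small masks.
    """
    a = np.argwhere(pred.astype(bool))
    b = np.argwhere(target.astype(bool))
    if len(a) == 0 or len(b) == 0:
        return float("inf")  # undefined when one mask has no foreground
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance in either direction.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

Higher Dice, IoU, and accuracy indicate better overlap with the ground truth, while a lower HD indicates that the worst-case deviation between the two masks is smaller, which is why the abstract reports HD as a reduction.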