科学发展研究

Scientific Development Research

Scientific Development Research. 2025; 5(1). DOI: 10.12208/j.sdr.20250006

Application of deep neural networks combined with multi-modal data in sign language recognition
结合多模态数据的深度神经网络在手语识别中的应用研究

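Author: Li Fugang (李富钢) *

School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan

*Corresponding author: Li Fugang, School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan

Cite this article: Li Fugang. Application of deep neural networks combined with multi-modal data in sign language recognition[J]. Scientific Development Research, 2025, 5(1): 28-32.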
Published: 2025/3/31 15:30:22

Abstract

With the rapid development of artificial intelligence and computer vision, sign language recognition has become an important research direction in human-computer interaction and accessible communication. Traditional sign language recognition methods often rely on a single modality, such as images or video, and therefore suffer from information loss and limited recognition accuracy. Multi-modal data fusion, which combines visual data, depth information, electromyography (EMG) signals, inertial measurement unit (IMU) data, and other sources, can enrich semantic representation and improve recognition accuracy. This paper discusses how deep neural networks can combine multi-modal data to improve sign language recognition performance, and analyzes the key technologies, challenges, and application prospects, with the aim of providing a reference for optimizing intelligent sign language translation systems.

Keywords: multi-modal data; deep neural network; sign language recognition; applied research
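
The abstract does not say which fusion strategy the paper uses. As an illustration only, the sketch below shows decision-level (late) fusion, one common way to combine modalities: each modality's network outputs class scores, which are converted to probabilities and averaged with per-modality weights. The function names, modality keys, scores, and weights are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(modality_scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality class probabilities.

    Assumption: every modality scores the same set of sign classes,
    in the same order.
    """
    names = list(modality_scores)
    n_classes = len(modality_scores[names[0]])
    total_weight = sum(weights[name] for name in names)
    fused = [0.0] * n_classes
    for name in names:
        probs = softmax(modality_scores[name])
        share = weights[name] / total_weight
        for i, p in enumerate(probs):
            fused[i] += share * p
    return fused


def argmax(values: List[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical raw scores for three sign classes from four modality networks.
scores = {
    "rgb": [2.0, 1.0, 0.1],
    "depth": [1.5, 1.7, 0.2],
    "emg": [0.3, 2.2, 0.1],
    "imu": [0.2, 1.9, 0.4],
}
# Equal weights are an assumption; they could instead be tuned or learned.
weights = {"rgb": 1.0, "depth": 1.0, "emg": 1.0, "imu": 1.0}

fused = late_fusion(scores, weights)
rgb_only = argmax(softmax(scores["rgb"]))
predicted = argmax(fused)
```

With these made-up numbers, the RGB stream alone favors class 0, while the fused prediction is class 1 because the depth, EMG, and IMU streams agree on it. Feature-level fusion, where intermediate features are concatenated before a shared classifier, is an alternative this sketch does not cover.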
