Journal of Engineering Research (工程学研究)

Journal of Engineering Research. 2025; 4(1). DOI: 10.12208/j.jer.20250007

Point-Voxel 3D Object Detection Method Based on Transfer Block Transformer and Feature Pyramid

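Authors: Liu Liangjie (刘良杰)1, Sun Kai (孙凯)2, Ren Hongqian (任宏乾)3, Xie Guotao (谢国涛)4*

1. Zhuzhou CRRC Times Software Technology Co., Ltd., Zhuzhou, Hunan

2. Xiwan Open-Pit Coal Mine, Shaanxi Shenyan Coal Co., Ltd., China Energy Investment Group, Yulin, Shaanxi

3. College of Mechanical and Vehicle Engineering, Hunan University, Changsha, Hunan

4. Wuxi Intelligent Control Research Institute, Hunan University, Wuxi, Jiangsu

*Corresponding author: Xie Guotao, Wuxi Intelligent Control Research Institute, Hunan University, Wuxi, Jiangsu.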

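Cite this article: Liu Liangjie, Sun Kai, Ren Hongqian, Xie Guotao. Point-voxel 3D object detection method based on transfer block Transformer and feature pyramid[J]. Journal of Engineering Research, 2025, 4(1): 46-58.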
Published: 2025/1/20 7:35:31

Abstract

LiDAR-based 3D object detection has advanced considerably alongside environmental perception technology. However, voxel-based 3D detectors struggle to capture rich contextual information and fine-grained features when they partition a point cloud; under occlusion and truncation in particular, details of the raw point cloud are often lost. To address these challenges, this paper proposes a new data augmentation strategy that strengthens the model's ability to handle incomplete point clouds, together with PV-FMRTNet, a point-voxel 3D object detection model based on a transfer block Transformer and a feature pyramid, which effectively addresses the loss of position information when point clouds are converted to voxels. In addition, a new 2D feature encoding network is designed to improve the performance of voxel-based 3D object detection. Evaluation results show that the model reaches detection accuracies of 84.30%, 61.76%, and 78.08% for cars, pedestrians, and cyclists, respectively, an average improvement of 2.08% over baseline models such as the mainstream PointPillars algorithm, demonstrating advanced accuracy and robustness.
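
The position-information loss that the abstract attributes to voxelization can be illustrated with a small, generic sketch. It is not the PV-FMRTNet pipeline: the function names, the uniform cubic grid, the random test points, and the voxel sizes below are all assumptions made for illustration. Once a point is replaced by its voxel index, the only position that can be recovered is the voxel center, so the error grows with voxel size and is bounded by half the voxel diagonal.

```python
import math
import random


def voxelize(points, voxel_size):
    """Map each (x, y, z) point to the integer index of the voxel containing it."""
    return [tuple(math.floor(c / voxel_size) for c in p) for p in points]


def voxel_centers(indices, voxel_size):
    """Return the center of each voxel, the only position a voxel index retains."""
    return [tuple((i + 0.5) * voxel_size for i in idx) for idx in indices]


def max_position_error(points, voxel_size):
    """Largest distance between a raw point and the center of its voxel."""
    centers = voxel_centers(voxelize(points, voxel_size), voxel_size)
    return max(math.dist(p, c) for p, c in zip(points, centers))


# Hypothetical test data: 1000 random points in an 80 m cube (not a real LiDAR scan).
random.seed(0)
points = [tuple(random.uniform(-40.0, 40.0) for _ in range(3)) for _ in range(1000)]

for size in (0.05, 0.2, 0.8):  # assumed voxel edge lengths in meters
    bound = size * math.sqrt(3) / 2  # half the diagonal of a cubic voxel
    err = max_position_error(points, size)
    print(f"voxel size {size:.2f} m: max error {err:.3f} m (bound {bound:.3f} m)")
```

Two distinct points that fall in the same voxel become indistinguishable after this step, which is one reason point-voxel designs keep a point-level branch next to the voxel branch.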

Keywords: autonomous driving; deep learning; 3D object detection; feature pyramid; point-voxel
