Drivers instinctively slow down when they see a cyclist ahead or a traffic light turning red. Their reactions depend on the seamless fusion of information gained through sight and sound with knowledge of road conditions, human behaviour and traffic regulations.
Autonomous cars attempt to mimic this human skill using ‘smart sensing’. Data from sensors such as cameras and radars are processed by computers to enable vehicles to decide direction and speed. But self-driving vehicles have a long way to go before they match humans in terms of reliability, computational efficiency, and real-time decision making.
Yanyong Zhang, a computer scientist at the University of Science and Technology of China (USTC) in Hefei, China, wants to close the gap between humans and autonomous vehicles when it comes to road sense.
“Our team’s work spans the range from button-sized sensing devices to complex perception algorithms learned from huge amounts of data,” says Zhang, who joined the USTC School of Computer Science and Technology in 2018.
Professor Yanyong Zhang, a computer scientist at the University of Science and Technology of China, is working to make safer autonomous vehicles.
In their quest to improve the road sense of autonomous vehicles, Zhang’s team has made strides in fusing data from sensors such as cameras, lidar and millimeter-wave radar. This could make unmanned systems safer by improving their ability to detect and track 3D objects, such as pedestrians and moving vehicles.
Data fusion
One common combination of sensors used to give autonomous vehicles a comprehensive understanding of their surroundings is a stereo camera paired with lidar, also known as laser scanning. Stereo cameras acquire 2D images rich in useful information such as color, texture and fine-grained shapes. Lidar emits laser beams and measures the time they take to bounce back after hitting objects, building a ‘point cloud’, or 3D map, of its surroundings.
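In practice, each point in that cloud comes from simple arithmetic: multiply a pulse’s round-trip time by the speed of light and halve it. A minimal sketch in Python, using an illustrative 200-nanosecond return:

```python
# Minimal sketch: turning a lidar pulse's round-trip time into a distance.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def range_from_round_trip(round_trip_seconds: float) -> float:
    """The beam travels out and back, so the one-way range is half the path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A return arriving 200 nanoseconds after emission lies roughly 30 m away.
print(f"{range_from_round_trip(200e-9):.1f} m")  # -> 30.0 m
```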
Fusing the two types of data is tricky. One approach is to project the 3D lidar points onto the 2D image plane captured by the camera and then fuse them with the 2D pixel data. However, precious information is lost in the process: pixel density is typically much higher than lidar point density, leading to a mismatch. “This approach, currently used by several mainstream sensor fusion schemes, can cause serious information loss,” says Hanqi Zhu, a doctoral researcher in Zhang’s team.
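The projection step Zhu describes can be sketched with a pinhole camera model. The intrinsic matrix and the example points below are illustrative assumptions, not values from the team’s system:

```python
import numpy as np

# Simplified sketch of projecting 3D lidar points onto a camera's 2D image
# plane with a pinhole model. The intrinsic matrix K is illustrative; real
# systems use calibrated camera intrinsics and camera-lidar extrinsics.
K = np.array([[700.0,   0.0, 640.0],   # focal length fx, principal point cx
              [  0.0, 700.0, 360.0],   # focal length fy, principal point cy
              [  0.0,   0.0,   1.0]])

def project_to_image(points_cam: np.ndarray) -> np.ndarray:
    """Map (N, 3) lidar points, already in the camera frame, to (N, 2) pixels."""
    uvw = points_cam @ K.T          # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:]  # divide by depth to get pixel (u, v)

points = np.array([[1.0, 0.5, 10.0],    # x (right), y (down), z (forward), in meters
                   [-2.0, 0.0, 25.0]])
print(project_to_image(points))
# Each sparse lidar point lands on a single pixel, so most of the image's
# dense color and texture information has no lidar point to pair with.
```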
To solve the problem, Zhu proposed using so-called ‘virtual points’ as a bridge between the 2D pixel data and the 3D lidar points. The idea was inspired by virtual antenna arrays, which are antenna systems optimized by adding data streams from computer-generated virtual antennas.
Fusing data from cameras and lidar for safer autonomous vehicles. This conceptual drawing shows how virtual points (green) enable the efficient fusion of image data (I1, I2) and lidar point cloud data features (L1, L2) for subsequent processing.
To make the process as computationally efficient as possible, these virtual points are generated only around foreground objects of interest from which the cameras also capture data.
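The idea of densifying only the foreground can be illustrated with a toy example. The function, bounding box and depth value below are hypothetical, standing in for rather than reproducing VPFNet’s published sampling scheme:

```python
import numpy as np

# Toy illustration of the "virtual points" idea: densify only the regions
# where a 2D detector has found a foreground object, rather than everywhere.
rng = np.random.default_rng(0)

def sample_virtual_points(box_2d, depth_hint, num_points=50):
    """box_2d: (u_min, v_min, u_max, v_max) pixel bounds of a detected object.
    depth_hint: rough depth (meters) borrowed from nearby real lidar returns.
    Returns (num_points, 3) virtual points scattered across the box."""
    u_min, v_min, u_max, v_max = box_2d
    u = rng.uniform(u_min, u_max, num_points)
    v = rng.uniform(v_min, v_max, num_points)
    z = np.full(num_points, depth_hint)   # depth taken from real lidar points
    return np.stack([u, v, z], axis=1)

# Only the pedestrian's bounding box is densified; the road and sky are not.
pedestrian_box = (600, 300, 660, 420)
virtual = sample_virtual_points(pedestrian_box, depth_hint=12.5)
print(virtual.shape)  # (50, 3)
```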
When Zhang’s team tested this approach, called VPFNet, on the widely used autonomous driving KITTI dataset (from the Karlsruhe Institute of Technology and Toyota Technological Institute), they found it performed well. It ranked 31st out of 405 methods for detecting cars and third out of 211 methods for detecting pedestrians.
The team also optimized VPFNet for computational efficiency. When running on a single NVIDIA RTX 2080Ti graphics card, commonly used for gaming or artificial intelligence, VPFNet can process 15 frames per second. A speed of at least ten frames per second is crucial for tracking moving objects in real-world driving, says Zhang. The study was published in the journal IEEE Transactions on Multimedia in 2023¹.
“VPFNet introduces a novel fusion approach — virtual points — for fusing lidar and stereo camera data, which is very important for improving detection performance,” explains Zhang.
Testing combinations of sensors
To help autonomous vehicles make better navigation decisions, researchers worldwide have been exploring the use of different combinations of sensors. But the efficient integration of those sensors has remained hit-or-miss, says Yao Li, another doctoral researcher in Zhang’s team.
The approach taken by Zhang’s team has been to compare the strengths and weaknesses of the different sensors’ data and then to develop optimal ways of combining them. “This allows us to apply tailored processing to different data individually,” says Li.
For example, millimeter-wave radar, which uses short-wavelength electromagnetic waves to detect the distance, speed and angle of targets, cannot measure height, but it does produce a 2D point cloud that offers a bird’s-eye view without the need for complex 3D data processing. Unlike cameras, it is also unaffected by heavy rain or fog.
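That bird’s-eye convenience follows from the geometry of a radar detection, which is essentially a range and an azimuth angle. A minimal sketch, with made-up readings:

```python
import math

# Sketch of mapping millimeter-wave radar detections straight into a top-down
# (bird's-eye) view: each detection's range and azimuth become x/y coordinates
# with no 3D processing needed. The readings below are illustrative only.

def radar_to_birds_eye(range_m: float, azimuth_rad: float) -> tuple[float, float]:
    """Convert a detection to top-down coordinates:
    x is forward along the vehicle's heading, y is to the left."""
    x = range_m * math.cos(azimuth_rad)
    y = range_m * math.sin(azimuth_rad)
    return x, y

detections = [(30.0, 0.10), (12.0, -0.45)]   # (range in meters, azimuth in radians)
for r, az in detections:
    x, y = radar_to_birds_eye(r, az)
    print(f"target at x={x:.1f} m, y={y:.1f} m")
```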
The team plans to use lidar, cameras and millimeter-wave radar to capture scenes with overlapping perspectives to minimize blind spots, says Zhang.
The team is also developing a framework they call EZFusion, which combines data from all three types of sensor: it processes the radar data and fuses it with the lidar and image data. The fused data, presented in a top-down view, is used for object detection and re-identification.
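Conceptually, fusing the three streams in a shared top-down grid can be pictured as in the sketch below; the grid size, feature channels and simple concatenation are assumptions made for illustration, not the published EZFusion architecture:

```python
import numpy as np

# Conceptual illustration of combining three sensor streams in one shared
# bird's-eye-view grid, so a downstream detector sees all sensors at once
# for every location around the vehicle.
GRID = (200, 200)  # 200 x 200 top-down cells (assumed size)

lidar_bev = np.random.rand(*GRID, 32)   # e.g. lidar features per cell
image_bev = np.random.rand(*GRID, 16)   # image features lifted into the grid
radar_bev = np.random.rand(*GRID, 4)    # e.g. occupancy and radial speed

# Stack the per-cell features from all three sensors.
fused_bev = np.concatenate([lidar_bev, image_bev, radar_bev], axis=-1)
print(fused_bev.shape)  # (200, 200, 52)
```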
“EZFusion is simple and effective, achieving an improvement in the performance of 3D object detection and tracking,” says Li. When EZFusion was tested on the large-scale autonomous driving dataset nuScenes, it outperformed the top-ranking method EagerMOT at tracking moving objects such as other vehicles and people.
The study was published in the journal IEEE Robotics and Automation Letters in October 2022². “EZFusion has huge potential for real-world applications like autonomous vehicles,” Zhang says.
Calibrating sensors on a miniature autonomous vehicle at the University of Science and Technology of China.
On-campus tests
Zhang’s team has incorporated their sensing techniques into miniature autonomous vehicles – Sonic, Little Cloud and Little Owl – that drive around USTC’s busy University Park campus. For the future, the researchers plan to incorporate ever-more sophisticated embodied intelligence into unmanned vehicles so that they can interact with the environment, make decisions and execute tasks with little or no human intervention.
The autonomous vehicle, Sonic, equipped with lidar, cameras and GPS, roaming the USTC campus.
“We built and continue to refine those vehicles to create a sensing system that is not only versatile and practical, but also user-friendly,” Zhang says.
References:
1. Zhu, H., et al., IEEE Transactions on Multimedia 25, 5291-5304 (2023). DOI: 10.1109/TMM.2022.3189778.
2. Li, Y., et al., IEEE Robotics and Automation Letters 7, 11182-11189 (2022). DOI: 10.1109/LRA.2022.3193465.