Object Detection for Semantic SLAM using Convolutional Neural Networks
Conventional SLAM (Simultaneous Localization and Mapping) systems typically provide odometry estimates and point-cloud reconstructions of an unknown environment. While these outputs can be used for tasks such as autonomous navigation, they lack any semantic information. Our project implements a modular object detection framework that can be used in conjunction with a SLAM engine to generate semantic scene reconstructions.

A semantically augmented reconstruction has many potential applications. Some examples include:

• Discriminating between pedestrians, cars, bicyclists, etc., in an autonomous driving system.
• Loop-closure detection based on object-level descriptors.
• Smart household robots that can retrieve objects given a natural-language command.

An object detection algorithm designed for these applications has a unique set of requirements and constraints. The algorithm needs to be reasonably fast – on the order of a few seconds at most. Since the camera is in motion, the detections must be consistent across multiple viewpoints. It also needs to be robust to artifacts such as motion blur and rolling shutter. Currently, no existing object detection algorithm addresses all of these concerns, so our algorithm is designed with these requirements in mind.

In the past couple of years, convolutional neural networks have experienced a resurgence in popularity, and they currently dominate the benchmarks for image classification and detection tasks [1]. This has motivated us to use a convolutional neural network as the core of our detection framework.
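To make the "modular detector alongside a SLAM engine" idea concrete, here is a minimal sketch of what such a module could look like. It uses an off-the-shelf Faster R-CNN from torchvision as a stand-in for the detection CNN; the class and method names (`SemanticSLAMDetector`, `process_keyframe`) and the score threshold are illustrative assumptions, not the interface described in the paper.

```python
# Hypothetical detection module that a SLAM front end could call on each keyframe.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor


class SemanticSLAMDetector:
    def __init__(self, score_threshold=0.6, device="cpu"):
        # A pretrained Faster R-CNN stands in for the detection CNN.
        self.model = (
            torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
            .to(device)
            .eval()
        )
        self.score_threshold = score_threshold
        self.device = device

    @torch.no_grad()
    def process_keyframe(self, image, keyframe_pose):
        """Run detection on one SLAM keyframe (an HxWx3 uint8 RGB array).

        Returns a list of (label_id, score, box) tuples; the caller can use
        `keyframe_pose` to project the boxes into the reconstructed map.
        """
        tensor = to_tensor(image).to(self.device)   # CHW float tensor in [0, 1]
        output = self.model([tensor])[0]            # dict with boxes, labels, scores
        detections = []
        for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
            if score >= self.score_threshold:
                detections.append((int(label), float(score), box.tolist()))
        return detections
```

Keeping the detector behind a small keyframe-level interface like this is what makes the framework modular: the SLAM engine only hands over images and poses, so the CNN backbone can be swapped without touching the mapping code.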

Research Paper Link: Download Paper
