Real-Time Visual Localization and Mapping of a Mobile Robot in a Dynamic Environment
Final Year Project
2020
The human brain is so powerful that it can map out an entire 3D environment; this helps us localize ourselves and even navigate from one point to another. Much research has been carried out to transfer this human ability to artificial robots. Expensive sensors such as 3D lidars can improve the accuracy of localization and navigation, but they are impractical for small robots. Our goal is therefore to use a single cheap camera to enable a robot to localize and navigate in its environment.
This project uses a camera as its only sensor to build a 3D map of an entire room and to localize the robot within the map it has built. The map can then be used for navigation within the mapped environment. Problems such as a dynamically changing environment, varying lighting conditions, and a lack of texture in the environment hinder visual SLAM. Some of these problems are well tackled in this project: dynamic objects in the environment are masked to minimize their effect, and light-invariant feature extraction is used to handle variations in lighting conditions.
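As an illustration of the masking idea (a minimal sketch, not the project's actual pipeline; the mask source and the ORB settings are assumptions), a binary mask of dynamic objects can be inverted and passed to OpenCV's feature detector so that keypoints are extracted only from static regions:

    import cv2

    def extract_static_features(frame_bgr, dynamic_mask):
        """Detect ORB features only in static regions of the frame.
        dynamic_mask: uint8 image, 255 where dynamic objects (e.g.
        people) were segmented, 0 elsewhere."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        # OpenCV searches only where the mask is nonzero, so invert
        # the dynamic-object mask to keep the static areas.
        static_mask = cv2.bitwise_not(dynamic_mask)
        orb = cv2.ORB_create(nfeatures=1000)
        keypoints, descriptors = orb.detectAndCompute(gray, static_mask)
        return keypoints, descriptors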
Dynamic Obstacle Detection Using Semantic Segmentation
Dynamic obstacles that hinder visual SLAM have been tackled using semantic segmentation. I was responsible for this task. My role mainly included carrying out proofs of concept (POC) of existing semantic segmentation models, then training, fine-tuning, and deploying the chosen model and integrating it with the rest of the system. My roles were:
Model Comparison
Humans were considered the main dynamic objects, and the model was trained to generate a human segmentation mask. ICNet was chosen on the basis of its speed vs. mIoU trade-off from among several candidates: ICNet, BiSeNetV1, DeepLabV3+, and UNetPlus.
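For reference, the mIoU side of that trade-off can be computed for binary masks as in the sketch below (a minimal illustration; the project's actual evaluation code is not reproduced here):

    import numpy as np

    def binary_iou(pred, target):
        """IoU of two boolean masks: |intersection| / |union|."""
        pred, target = pred.astype(bool), target.astype(bool)
        union = np.logical_or(pred, target).sum()
        if union == 0:          # both masks empty: define IoU as 1
            return 1.0
        return np.logical_and(pred, target).sum() / union

    def mean_iou(preds, targets):
        """Average IoU over pairs of predicted/ground-truth masks."""
        return float(np.mean([binary_iou(p, t)
                              for p, t in zip(preds, targets)]))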
Custom Dataset Generation
To fine-tune the ICNet model, a custom dataset of human legs was created (since human legs are what mainly falls within the robot's field of view). Training and validation datasets were generated from sequences of images taken from videos of a person walking in multiple environments under various lighting conditions. To label the dataset, instance segmentation was performed with the pretrained R50-FPN model from the Detectron2 model zoo, and its output was post-processed into a plain binary mask.
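A sketch of this labeling step, assuming the standard Detectron2 model zoo config name for the R50-FPN Mask R-CNN and COCO class indexing (person = class 0); the actual script may have differed:

    import numpy as np
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    # Load the pretrained Mask R-CNN R50-FPN model from the model zoo.
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
    predictor = DefaultPredictor(cfg)

    def person_binary_mask(image_bgr):
        """Collapse per-instance person masks into one binary mask."""
        instances = predictor(image_bgr)["instances"].to("cpu")
        person = instances.pred_masks[instances.pred_classes == 0]
        if len(person) == 0:
            return np.zeros(image_bgr.shape[:2], dtype=np.uint8)
        mask = person.numpy().any(axis=0)     # union of all person masks
        return (mask * 255).astype(np.uint8)  # 255 = person, 0 = background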
Prevention of Overfitting
To prevent the ICNet model from overfitting, the feature-extraction backbone was frozen during fine-tuning.
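In PyTorch this typically means disabling gradients for the backbone parameters before building the optimizer. A minimal sketch, assuming the ICNet implementation exposes its backbone as a `backbone` attribute (the attribute name is an assumption):

    import torch

    def freeze_backbone(model: torch.nn.Module):
        """Freeze the feature-extraction backbone so that only the
        segmentation heads are updated during fine-tuning."""
        for param in model.backbone.parameters():
            param.requires_grad = False

    # Usage: freeze first, then give the optimizer only the
    # parameters that still require gradients.
    # freeze_backbone(icnet_model)
    # optimizer = torch.optim.SGD(
    #     (p for p in icnet_model.parameters() if p.requires_grad),
    #     lr=1e-3, momentum=0.9)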