Large scale semantic segmentation is considered as one of the fundamental tasks in 3D scene understanding. Point clouds provide a basic and rich geometric representation of scenes and tangible objects. Convolutional Neural Networks (CNNs) have demonstrated an impressive success in processing regular discrete data such as 2D images and 1D audio. However, CNNs do not directly generalize to point cloud processing due to their irregular and un-ordered nature. One way to extend CNNs to point cloud understanding is to derive an intermediate euclidean representation of a point cloud by projecting onto image domain, voxelizing, or treating points as vertices of an un-directed graph. Graph-CNNs (GCNs) have demonstrated to be a very promising solution for deep learning on irregular data such as social networks, biological systems, and recently point clouds. Early works in literature for graph based point networks relied on constructing dynamic graphs in the node feature space to define a convolution kernel. Later works constructed hierarchical static graphs in 3D space for an encoder-decoder framework inspired from image segmentation. This thesis takes a closer look at both dynamic and static graph neighborhoods of graph- based point networks for the task of semantic segmentation in order to: 1) discuss a potential cause for why going deep in dynamic GCNs does not necessarily lead to an improved performance, and 2) propose a new approach in treating points in a static graph neighborhood for an improved information aggregation. The proposed method leads to an efficient graph based 3D semantic segmentation network that is on par with current state-of-the-art methods on both indoor and outdoor scene semantic segmentation benchmarks such as S3DIS and Semantic3D.
|Date made available
|KAUST Research Repository