PointConv: Deep Convolutional Networks on 3D Point Clouds


Unlike images, which are represented on regular dense grids, 3D point clouds are irregular and unordered,
so applying convolution to them can be difficult. In this paper, we extend the dynamic filter to a new
convolution operation, named PointConv. PointConv can be applied to point clouds to build deep convolutional
networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points, composed
of weight and density functions. With respect to a given point, the weight functions are learned with
multi-layer perceptron networks and the density functions through kernel density estimation. A novel
reformulation is proposed for efficiently computing the weight functions, which allows us to dramatically
scale up the network and significantly improve its performance. The learned convolution kernel can be used
to compute translation-invariant and permutation-invariant convolution on any point set in 3D space.
In addition, PointConv can be used as a deconvolution operator to propagate features from a subsampled point
cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep
convolutional neural networks built on PointConv achieve state-of-the-art results on challenging semantic
segmentation benchmarks on 3D point clouds. Moreover, our experiments converting CIFAR-10 into a point cloud
show that networks built on PointConv can match the performance of convolutional networks of similar
structure on 2D images.
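
The operation described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the brute-force k-nearest-neighbour search, the Gaussian kernel, the `bandwidth` value, the max-normalised inverse density, and the randomly initialised two-layer weight function are all assumptions made for illustration, and the efficient reformulation mentioned above is not reproduced here.

```python
import numpy as np

def pairwise_sq_dist(points):
    """Squared Euclidean distances between all pairs of points, shape (N, N)."""
    diff = points[:, None, :] - points[None, :, :]
    return (diff ** 2).sum(-1)

def gaussian_kde_density(points, bandwidth):
    """Unnormalised Gaussian kernel density estimate at every point, shape (N,)."""
    return np.exp(-pairwise_sq_dist(points) / (2.0 * bandwidth ** 2)).mean(axis=1)

def weight_mlp(local_xyz, w1, b1, w2, b2):
    """Two-layer perceptron mapping local coordinates to flattened kernel weights."""
    hidden = np.maximum(local_xyz @ w1 + b1, 0.0)  # ReLU
    return hidden @ w2 + b2

def pointconv(points, feats, k, params, bandwidth=0.1):
    """Sketch of one PointConv layer evaluated at every input point.

    points: (N, 3) coordinates, feats: (N, C_in) features.
    Returns (N, C_out) features.
    """
    n, c_in = feats.shape
    # k nearest neighbours of each point (the point itself is included).
    idx = np.argsort(pairwise_sq_dist(points), axis=1)[:, :k]      # (N, k)
    # Local coordinates relative to the centre point: translation-invariant.
    local_xyz = points[idx] - points[:, None, :]                   # (N, k, 3)
    # Inverse density of each neighbour, normalised within the neighbourhood
    # (an assumed simplification of the density function).
    inv_density = 1.0 / gaussian_kde_density(points, bandwidth)    # (N,)
    scale = inv_density[idx]
    scale = scale / scale.max(axis=1, keepdims=True)               # (N, k)
    # Kernel weights predicted from the local coordinates.
    w = weight_mlp(local_xyz, *params)                             # (N, k, C_in*C_out)
    w = w.reshape(n, k, c_in, -1)                                  # (N, k, C_in, C_out)
    # Density-weighted sum over neighbours and input channels.
    weighted = feats[idx] * scale[..., None]                       # (N, k, C_in)
    return np.einsum("nkc,nkco->no", weighted, w)

rng = np.random.default_rng(0)
c_in, c_out, hidden = 4, 8, 16
points = rng.random((64, 3))
feats = rng.standard_normal((64, c_in))
params = (
    rng.standard_normal((3, hidden)) * 0.5, np.zeros(hidden),
    rng.standard_normal((hidden, c_in * c_out)) * 0.5, np.zeros(c_in * c_out),
)
out = pointconv(points, feats, k=8, params=params)
print(out.shape)  # (64, 8)
```

Because the weights depend only on coordinates relative to the centre point and the result is a sum over neighbours, shifting the whole cloud leaves the output unchanged, and reordering the input points only reorders the output rows.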

Segmentation Example

1. Segmentation Results on ScanNet

Method              mIoU (%)
ScanNet [1]         30.6
PointNet++ [2]      33.9
SPLAT Net [3]       39.3
Tangent Conv [4]    43.8
PointConv           55.6
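
For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class ratio of correctly labelled points to the union of predicted and ground-truth points for that class. The sketch below is a generic illustration, not the ScanNet evaluation script; the function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union, in percent, over per-point class labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumed convention: ignore classes that never occur
            ious.append(inter / union)
    if not ious:
        return 0.0
    return 100.0 * sum(ious) / len(ious)

# Toy example with six points and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))  # per-class IoU: 1/3, 2/3, 1/2
```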

2. Network Structure for ScanNet

[Figure: PointConv network structure for ScanNet]

Note: PointConv:32, 64-1024 is a PointConv layer with neighborhood
size K = 32, C_out = 64 output channels, and N = 1024 centroids.
Each rectangle represents a convolution or deconvolution layer.
Each ellipse represents the data dimensionality at the particular stage.
1024 x 64 means the point cloud has 1024 points with 64-dimensional features.
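
The layer notation in the note can be made concrete with a small sketch. The names `PointConvSpec` and `parse_pointconv_label` are hypothetical helpers introduced here for illustration and do not appear in the released code; the sketch only decodes a label into K, C_out, and N and reports the resulting data dimensionality.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class PointConvSpec:
    """Hypothetical container for one layer label of the form 'PointConv:K, C_out-N'."""
    k: int            # neighborhood size K
    c_out: int        # number of output channels C_out
    n_centroids: int  # number of centroids N

    @property
    def output_shape(self):
        # N points, each carrying a C_out-dimensional feature vector.
        return (self.n_centroids, self.c_out)

def parse_pointconv_label(label):
    """Parse a label such as 'PointConv:32, 64-1024' into a PointConvSpec."""
    match = re.fullmatch(r"\s*PointConv:\s*(\d+),\s*(\d+)-(\d+)\s*", label)
    if match is None:
        raise ValueError(f"unrecognised layer label: {label!r}")
    k, c_out, n = map(int, match.groups())
    return PointConvSpec(k=k, c_out=c_out, n_centroids=n)

spec = parse_pointconv_label("PointConv:32, 64-1024")
print(spec)
print(spec.output_shape)  # (1024, 64)
```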

3. Segmentation Visualization

[Figure: segmentation visualization]


Paper | Code (TensorFlow) | Code (PyTorch)


title={PointConv: Deep Convolutional Networks on 3D Point Clouds},
author={Wu, Wenxuan and Qi, Zhongang and Fuxin, Li},
journal={arXiv preprint arXiv:1811.07246},


[1] Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), volume 1, 2017.

[2] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5105–5114, 2017.

[3] Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, and Jan Kautz. Splatnet: Sparse lattice networks for point cloud processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2530–2539, 2018.

[4] Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, and Qian-Yi Zhou. Tangent convolutions for dense prediction in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3887–3896, 201