A Python Engineer's Introduction to 3D Gaussian Splatting (Part 1)


Understanding and coding Gaussian Splatting from a Python Engineer’s perspective

Gaussian Splatting in 3D rendering

In early 2023, authors from Université Côte d’Azur and Max-Planck-Institut für Informatik published a paper titled “3D Gaussian Splatting for Real-Time Radiance Field Rendering.” The paper presented a significant advancement in real-time neural rendering, surpassing the utility of previous methods like NeRFs. Gaussian splatting not only reduced latency but also matched or exceeded the rendering quality of NeRFs, taking the world of neural rendering by storm.


Gaussian splatting, while effective, can be challenging to understand for those unfamiliar with camera matrices and graphics rendering. Moreover, I found that resources for implementing Gaussian splatting in Python are scarce; even the authors’ source code is written in CUDA! This tutorial aims to bridge that gap, providing a Python-based introduction to Gaussian splatting for engineers versed in Python and machine learning but less experienced with graphics rendering.


To begin, we use COLMAP, a software package that extracts points consistently seen across multiple images using Structure from Motion (SfM). SfM essentially identifies points (e.g., the top right edge of a doorway) that appear in more than one picture. By matching these points across different images, we can estimate the depth of each point in 3D space. This closely emulates how human stereo vision works, where depth is perceived by comparing slightly different views from each eye. Thus, SfM generates a set of 3D points, each with x, y, and z coordinates, from the common points found in multiple images, giving us the “structure” of the scene.

The extrinsic matrix translates points from world space to camera space, making the camera the new center of the world.

COLMAP’s output folder consists of three files, corresponding to the camera parameters, the image parameters, and the actual 3D points. We will start with the 3D points.

The points file consists of thousands of points in 3D, along with associated colors. The points are centered around what is called the world origin; their x, y, and z coordinates are based upon where they were observed in reference to this origin. The exact location of the world origin isn’t crucial for our purposes, since it can be any arbitrary point in space. Instead, it’s only essential to know where you are in the world in relation to this origin. That is where the image file becomes useful!
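As a concrete illustration, here is a minimal sketch of how these points might be loaded in Python. It assumes COLMAP’s text-format export (points3D.txt), whose documented columns are POINT3D_ID, X, Y, Z, R, G, B, ERROR, followed by the track; the file path and function name here are placeholders of my own:

```python
import numpy as np

def read_points3d_txt(path="sparse/0/points3D.txt"):
    """Parse COLMAP's text-format points3D.txt into (N, 3) xyz and rgb arrays."""
    xyz, rgb = [], []
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):  # skip blanks and header comments
                continue
            elems = line.split()
            # Columns: POINT3D_ID, X, Y, Z, R, G, B, ERROR, TRACK[] ...
            xyz.append([float(e) for e in elems[1:4]])
            rgb.append([int(e) for e in elems[4:7]])
    return np.array(xyz), np.array(rgb)
```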

Broadly speaking, the image file tells us where the image was taken and the orientation of the camera, both in relation to the world origin. Therefore, the key parameters we care about are the quaternion vector and the translation vector. The quaternion vector describes the rotation of the camera in space using 4 distinct float values that can be used to form a rotation matrix (3Blue1Brown has a great video explaining exactly what quaternions are). The translation vector then tells us the camera’s position relative to the origin. Together, these parameters form the extrinsic matrix, with the quaternion values used to compute a 3x3 rotation matrix and the translation vector appended to this matrix.
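To make that construction concrete, the sketch below converts a unit quaternion into a 3x3 rotation matrix and stacks it with the translation vector into a 4x4 extrinsic matrix. It assumes COLMAP’s (qw, qx, qy, qz) ordering, where the stored rotation and translation already map world space to camera space; the function names are my own:

```python
import numpy as np

def quaternion_to_rotation(q):
    """Convert a unit quaternion (qw, qx, qy, qz) into a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def build_extrinsic(q, t):
    """Stack rotation and translation into a 4x4 world-to-camera matrix."""
    Rt = np.eye(4)
    Rt[:3, :3] = quaternion_to_rotation(q)
    Rt[:3, 3] = t
    return Rt
```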

The intrinsic matrix represents the focal length in the x and y direction, along with the principal point coordinates.

When we convert coordinates from world space to camera space, we still have a 3D vector, where the z coordinate represents the depth in the camera’s view. This depth information is crucial for determining the order of splats, which we need for rendering later on. Applying the intrinsic matrix and dividing by that depth (the perspective divide) then maps each camera-space point onto the 2D image plane.
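Putting the pieces together, a minimal end-to-end sketch of the projection might look like the following. The focal lengths (fx, fy) and principal point (cx, cy) are assumed to come from COLMAP’s camera file, and `project` is a hypothetical helper of my own, not the paper’s implementation:

```python
import numpy as np

def project(points_world, extrinsic, fx, fy, cx, cy):
    """Project (N, 3) world-space points to (N, 2) pixel coordinates plus depths."""
    # Homogeneous coordinates so the 4x4 extrinsic applies in one matmul.
    ones = np.ones((points_world.shape[0], 1))
    points_h = np.hstack([points_world, ones])
    points_cam = (extrinsic @ points_h.T).T[:, :3]  # world -> camera space
    depth = points_cam[:, 2]                        # z = depth, used to order splats
    # Intrinsic matrix: focal lengths on the diagonal, principal point offset.
    K = np.array([[fx, 0, cx],
                  [0, fy, cy],
                  [0,  0,  1]])
    pixels = (K @ points_cam.T).T
    pixels = pixels[:, :2] / depth[:, None]         # perspective divide by depth
    return pixels, depth
```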

To review, we can now take any set of 3D points and project where they would appear on a 2D image plane, as long as we have the various location and camera parameters we need! With that in hand, we can move forward with understanding the “Gaussian” part of Gaussian splatting in Part 2.