Camera¶
World coordinate systems¶
OpenGL/MAYA       OpenCV/Colmap     Blender        Unity/DirectX     Unreal
Right-handed      Right-handed      Right-handed   Left-handed       Left-handed

     +y                +z            +z  +y         +y  +z            +z
      |                /              |  /           |  /              |
      |               /               | /            | /               |
      |______+x      /______+x        |/_____+x      |/_____+x         |______+x
     /               |                                                /
    /                |                                               /
   /                 |                                              /
  +z                 +y                                            +y
A common color code: x = red, y = green, z = blue (XYZ=RGB)
Left/right-handed notation: roll your left/right palm from x to y, and your thumb should point to z.
A common color code: right = red, up = green, forward = blue (XYZ=RUF=RGB).
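To make the handedness rule concrete, here is a minimal numpy sketch (the axes are just the identity basis):

import numpy as np

x, y, z = np.eye(3)  # unit axes
# right-handed (e.g., OpenGL): x cross y points along +z
assert np.allclose(np.cross(x, y), z)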
Our camera convention¶
- world coordinate is OpenGL/right-handed, +x = right, +y = up, +z = forward
- camera coordinate is OpenGL (forward is target --> campos).
- elevation in (-90, 90), from +y (-90) --> -y (+90)
- azimuth in (-180, 180), from +z (0/-360) --> +x (90/-270) --> -z (180/-180) --> -x (270/-90) --> +z (360/0)
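A quick way to internalize these ranges is to query kiui.cam.orbit_camera (documented in the API section below) and inspect the position column of the returned pose:

from kiui.cam import orbit_camera

# azimuth 0 places the camera on +z, azimuth 90 on +x, elevation -90 on +y
print(orbit_camera(elevation=0, azimuth=0, radius=1)[:3, 3])    # ~[0, 0, 1]
print(orbit_camera(elevation=0, azimuth=90, radius=1)[:3, 3])   # ~[1, 0, 0]
print(orbit_camera(elevation=-90, azimuth=0, radius=1)[:3, 3])  # ~[0, 1, 0]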
Camera pose conventions¶
We strictly call the camera-to-world (c2w) transformation matrix the pose.
Its inverse, the world-to-camera (w2c) transformation matrix, is called the extrinsics (or the view transform in graphics).
A camera pose matrix is in the form of:
[[Right_x, Up_x, Forward_x, Position_x],
 [Right_y, Up_y, Forward_y, Position_y],
 [Right_z, Up_z, Forward_z, Position_z],
 [0,       0,    0,         1         ]]
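For illustration, such a pose can be assembled column by column (compose_pose is a hypothetical helper, not part of kiui):

import numpy as np

def compose_pose(right, up, forward, position):
    # columns of the rotation are the camera axes expressed in world space
    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = up
    pose[:3, 2] = forward
    pose[:3, 3] = position
    return pose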
The xyz follows the corresponding world coordinate system. However, the three directions (right, up, forward) can be defined differently:
- forward can be camera --> target or target --> camera.
- up can align with the world-up axis (+y) or the world-down axis (-y).
- right can also be left, depending on whether it's (up cross forward) or (forward cross up).
This leads to two common camera conventions:
   OpenGL                 OpenCV
   Blender                Colmap

 up  target              forward & target
  |  /                    /
  | /                    /
  |/_____right          /______right
  /                     |
 /                      |
/                       |
forward                 up
These two conventions can be converted by negating the forward and up directions:
# pose is camera2world, [4, 4]
# convert between opengl and colmap convention
pose[:, 1] *= -1  # up
pose[:, 2] *= -1  # forward
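Since the flip is its own inverse, it is convenient to wrap it in a small helper (a sketch; toggle_camera_convention is not a kiui API):

import numpy as np

def toggle_camera_convention(pose: np.ndarray) -> np.ndarray:
    # works in both directions: OpenGL <-> OpenCV (c2w poses only!)
    out = pose.copy()
    out[:, 1] *= -1  # up
    out[:, 2] *= -1  # forward
    return out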
Note that camera convention is NOT dependent on the world coordinate system!
Why are there different camera conventions?¶
It’s an unfortunate outcome of different 3D world coordinate systems and 2D image coordinate systems. For a 2D image / array / matrix, the convention is usually:
      0      1      2
   +---------------------> +x/j
 0 | (0, 0) (0, 1) (0, 2)
 1 | (1, 0) (1, 1) (1, 2)
 2 | (2, 0) (2, 1) (2, 2)
   v
 +y/i
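In numpy, this convention corresponds to indexing="xy" meshgrids, where x follows the column index j and y follows the row index i (a minimal sketch):

import numpy as np

h, w = 3, 3
x, y = np.meshgrid(np.arange(w), np.arange(h), indexing="xy")
print(x[0])     # [0 1 2], x = column index j
print(y[:, 0])  # [0 1 2], y = row index i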
Now assume we are using OpenGL world coordinate system and also OpenGL camera convention:
  +y & up
   |  target
   | /
   |/______+x & right
   /
  /
 /
+z & forward
The unit camera (identity rotation matrix) is perfectly aligned with the 3D world coordinate system.
However, if we want to construct the camera rays from the camera center to the target, the rays should point to -z, and the +y axis is also misaligned with the image coordinate system (where +y points down).
So we need to negate both y and z axes during the ray construction (check kiui.cam.get_rays):
# x, y: flattened pixel indices; cx, cy: principal point; focal: focal length in pixels
rays_d = np.stack([
    (x - cx + 0.5) / focal,
    (y - cy + 0.5) / focal * (-1.0),  # negate y: image +y (down) --> camera +y (up)
    np.ones_like(x) * (-1.0),         # negate z: the camera looks along -z
], axis=-1)  # [hw, 3]
On the other hand, if we use OpenGL world coordinate system and also OpenCV camera convention:
 +y & up                          +z & forward & target
  |                                /
  |           equals              /
  |______+x & right   ----->     /_______+x & right
  /                              |
 /                               |
/                                |
+z & forward & target           +y & up
While the unit camera points in a different direction, you can see that it's actually aligned with the 2D image convention, so we don't need to negate any axes during the ray construction:
rays_d = np.stack([
    (x - cx + 0.5) / focal,
    (y - cy + 0.5) / focal,  # image +y (down) already matches camera +y
    np.ones_like(x),         # the camera looks along +z
], axis=-1)  # [hw, 3]
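Putting the two variants together, a single flag can switch between them (a hypothetical helper mirroring the snippets above; kiui.cam.get_rays in the API section plays the same role):

import numpy as np

def pixel_rays(h, w, fx, fy, cx, cy, opengl=True):
    # per-pixel ray directions in the camera frame
    x, y = np.meshgrid(np.arange(w), np.arange(h), indexing="xy")
    x, y = x.reshape(-1), y.reshape(-1)
    sign = -1.0 if opengl else 1.0
    return np.stack([
        (x - cx + 0.5) / fx,
        (y - cy + 0.5) / fy * sign,
        np.ones_like(x) * sign,
    ], axis=-1)  # [hw, 3]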
Given a poorly-documented dataset of cameras (and depth maps), how do we figure out its conventions?¶
- Check if the camera "pose" is a pose (camera2world) or extrinsics (world2camera); many datasets misuse the term "pose" to mean extrinsics.
- Check if the depth map stores the distance to the camera center or to the camera plane. If it's to the camera center, we need to normalize rays_d to unit length; otherwise, just use the raw rays above (|z| is always 1).
- Simply assume a camera convention (e.g., OpenGL), use it to construct the rays, and project depth to a world-space point cloud for visualization. Check if the point clouds of static objects drift (they shouldn't if everything is correct). If they do, you are using the wrong camera convention; switch to the other one and it should be correct!
This is a complete script:
import os
import re
import cv2
import kiui
import glob
import tqdm
import numpy as np
from plyfile import PlyData, PlyElement
def glob_sorted(path: str):
    paths = glob.glob(path)
    def _natural_key(name: str):
        return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]
    return sorted(
        paths,
        key=lambda p: _natural_key(os.path.splitext(os.path.basename(p))[0]),
    )
def export_ply_with_color(point_cloud: np.ndarray, colors: np.ndarray, file_path: str) -> None:
    assert point_cloud.shape[0] == colors.shape[0]
    vertices = np.zeros(
        point_cloud.shape[0],
        dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"), ("red", "u1"), ("green", "u1"), ("blue", "u1")],
    )
    vertices["x"] = point_cloud[:, 0].astype(np.float32)
    vertices["y"] = point_cloud[:, 1].astype(np.float32)
    vertices["z"] = point_cloud[:, 2].astype(np.float32)
    vertices["red"] = colors[:, 0].astype(np.uint8)
    vertices["green"] = colors[:, 1].astype(np.uint8)
    vertices["blue"] = colors[:, 2].astype(np.uint8)
    el = PlyElement.describe(vertices, "vertex")
    os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
    PlyData([el]).write(file_path)
def load_data(path: str, target_size: tuple[int, int] = (320, 180)):
    # load camera
    cam_params = kiui.read_json(path)
    width, height = int(cam_params['renderProductResolution'][0]), int(cam_params['renderProductResolution'][1])
    proj = np.array(cam_params["cameraProjection"]).reshape(4, 4).T  # [4, 4], perspective projection (T is for col-major to row-major)
    w2c = np.array(cam_params["cameraViewTransform"]).reshape(4, 4).T  # [4, 4], world2cam
    # get intrinsics from projection matrix
    fx = proj[0, 0] * (width / 2.0)
    fy = proj[1, 1] * (height / 2.0)
    cx = (1.0 - proj[0, 2]) * (width / 2.0)
    cy = (1.0 - proj[1, 2]) * (height / 2.0)
    c2w = np.linalg.inv(w2c)
    # load depth
    path_depth = path.replace("camera_params", "distance_to_camera").replace(".json", ".exr")
    depth = kiui.read_image(path_depth, mode="float")
    depth = depth.astype(np.float32)  # [H, W]
    # downsample depth and intrinsics for visualization
    d_width, d_height = int(target_size[0]), int(target_size[1])
    d_depth = cv2.resize(depth, (d_width, d_height), interpolation=cv2.INTER_NEAREST)
    scale_x = float(d_width) / float(width)
    scale_y = float(d_height) / float(height)
    d_fx = fx * scale_x
    d_fy = fy * scale_y
    d_cx = cx * scale_x
    d_cy = cy * scale_y
    # create world space point cloud
    ### IMPORTANT: toggle camera convention
    # here the camera actually follows the OpenGL convention,
    # so either toggle the c2w convention:
    # c2w[:, 1] *= -1
    # c2w[:, 2] *= -1
    # or toggle the ray construction below (but don't do both!)
    OPENGL_RAYS = True
    x, y = np.meshgrid(np.arange(d_width), np.arange(d_height), indexing="xy")
    x = x.reshape(-1)
    y = y.reshape(-1)
    rays_d = np.stack([
        (x - d_cx + 0.5) / d_fx,
        (y - d_cy + 0.5) / d_fy * (-1.0 if OPENGL_RAYS else 1.0),
        np.ones_like(x) * (-1.0 if OPENGL_RAYS else 1.0),
    ], axis=-1)  # [hw, 3]
    rays_d = rays_d @ c2w[:3, :3].T  # [hw, 3]
    rays_o = np.expand_dims(c2w[:3, 3], 0).repeat(rays_d.shape[0], 0)  # [hw, 3]
    # since depth is the distance to the camera center, we need to normalize rays_d
    rays_d = rays_d / np.linalg.norm(rays_d, axis=-1, keepdims=True)
    rays_o = rays_o.reshape(d_height, d_width, 3)
    rays_d = rays_d.reshape(d_height, d_width, 3)
    valid_mask = np.isfinite(d_depth) & (d_depth > 0)
    points = rays_d[valid_mask] * d_depth[valid_mask][:, None] + rays_o[valid_mask]  # [N, 3] in world space
    # load rgb for color visualization
    path_rgb = path.replace("camera_params", "ldr_color").replace(".json", ".png")
    rgb = kiui.read_image(path_rgb, mode="float")  # [H, W, 3]
    d_rgb = cv2.resize(rgb, (d_width, d_height), interpolation=cv2.INTER_AREA)
    d_rgb = d_rgb[valid_mask]  # [N, 3]
    return c2w, points, d_rgb
def c2w_to_pointcloud(c2ws: list[np.ndarray], point_per_axis: int = 5, axis_length: float = 1.0):
    # visualize the camera poses as point clouds, each c2w is [4, 4]
    # debug: add the identity pose to visualize (only the top 3 rows are used below)
    c2ws.append(np.array([
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
    ]))
    axis_points = []
    axis_colors = []
    # Right, Up, Forward unit directions are the columns of the rotation in c2w
    for c2w in c2ws:
        R = c2w[:3, :3]
        t = c2w[:3, 3]
        dirs = [
            (R[:, 0], np.array([1, 0, 0], dtype=np.float32)),  # Right (X) - Red
            (R[:, 1], np.array([0, 1, 0], dtype=np.float32)),  # Up (Y) - Green
            (R[:, 2], np.array([0, 0, 1], dtype=np.float32)),  # Forward (Z) - Blue
        ]
        for d, col in dirs:
            for k in range(point_per_axis):
                p = t + d * (axis_length * (k / (point_per_axis - 1)))
                axis_points.append(p)
                axis_colors.append(col)
    axis_points_np = np.stack(axis_points, axis=0).astype(np.float32)
    axis_colors_np = np.stack(axis_colors, axis=0).astype(np.float32)
    return axis_points_np, axis_colors_np
camera_paths = glob_sorted("camera_params/*.json")

all_c2ws = []
all_points = []
all_colors = []
for path in tqdm.tqdm(camera_paths):
    c2w, points, rgb = load_data(path, target_size=(320, 180))
    all_c2ws.append(c2w)
    all_points.append(points)
    all_colors.append(rgb)

# visualize the camera poses as point clouds
axis_points, axis_colors = c2w_to_pointcloud(all_c2ws, point_per_axis=5, axis_length=1.0)
all_points.append(axis_points)
all_colors.append(axis_colors)

# concatenate all the points and colors
all_points = np.concatenate(all_points, axis=0)
all_colors = np.concatenate(all_colors, axis=0)
all_colors = np.clip(all_colors * 255.0, 0.0, 255.0).astype(np.uint8)

# save colored point cloud (e.g., check with meshlab)
export_ply_with_color(all_points, all_colors, "world_pointcloud.ply")
API¶
Note
The camera API is designed to be numpy-based and un-batched!
- kiui.cam.to_homo(mat: ndarray, eye: bool = True) → ndarray[source]¶
Append a homogeneous row to a matrix.
If mat is [3, 4], append [0, 0, 0, 1] to get a [4, 4] matrix when eye=True, or [0, 0, 0, 0] when eye=False.
If mat is [2, 3], append [0, 0, 1] to get a [3, 3] matrix.
- Parameters:
mat (np.ndarray) – Input matrix of shape [3, 4] or [2, 3].
eye (bool, optional) – Whether to append a homogeneous “eye” row. Defaults to True.
- Returns:
Homogeneous matrix of shape [4, 4] or [3, 3].
- Return type:
np.ndarray
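A minimal usage sketch:

import numpy as np
from kiui.cam import to_homo

Rt = np.hstack([np.eye(3), np.ones((3, 1))])  # [3, 4]
print(to_homo(Rt))  # [4, 4], with [0, 0, 0, 1] appended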
- kiui.cam.intr_to_K(intr)[source]¶
convert intrinsics [fx, fy, cx, cy] to calibration matrix K [3, 3].
- Parameters:
intr (np.ndarray) – camera intrinsics, float [4] as [fx, fy, cx, cy]
- Returns:
calibration matrix K, float [3, 3]
- Return type:
np.ndarray
- kiui.cam.K_to_intr(K)[source]¶
convert calibration matrix K [3, 3] to intrinsics [fx, fy, cx, cy].
- Parameters:
K (np.ndarray) – calibration matrix, float [3, 3]
- Returns:
intrinsics, float [4] as [fx, fy, cx, cy]
- Return type:
np.ndarray
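intr_to_K and K_to_intr are inverses of each other; a quick round-trip sketch:

import numpy as np
from kiui.cam import intr_to_K, K_to_intr

intr = np.array([500.0, 500.0, 320.0, 240.0])  # [fx, fy, cx, cy]
K = intr_to_K(intr)  # [3, 3] calibration matrix
assert np.allclose(K_to_intr(K), intr)  # round trip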
- kiui.cam.fov_to_intr(fov, width, height, is_degree=True)[source]¶
convert field of view [fovx, fovy] to intrinsics [fx, fy, cx, cy].
- Parameters:
fov (np.ndarray) – field of view, float [2] as [fovx, fovy]
width (int) – image width
height (int) – image height
is_degree (bool, optional) – whether fov is in degree. Defaults to True.
- Returns:
intrinsics, float [4] as [fx, fy, cx, cy]
- Return type:
np.ndarray
- kiui.cam.intr_to_fov(intr, width, height, is_degree=True)[source]¶
convert intrinsics [fx, fy, cx, cy] to field of view [fovx, fovy].
- Parameters:
intr (np.ndarray) – intrinsics, float [4] as [fx, fy, cx, cy]
width (int) – image width
height (int) – image height
is_degree (bool, optional) – whether to return fov in degree. Defaults to True.
- Returns:
field of view, float [2] as [fovx, fovy]
- Return type:
np.ndarray
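Similarly, fov_to_intr and intr_to_fov round-trip; a minimal sketch:

import numpy as np
from kiui.cam import fov_to_intr, intr_to_fov

fov = np.array([90.0, 60.0])  # [fovx, fovy] in degrees
intr = fov_to_intr(fov, width=640, height=480)
print(intr_to_fov(intr, width=640, height=480))  # ~[90, 60]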
- kiui.cam.fov_to_K(fov, width, height, is_degree=True)[source]¶
convert field of view [fovx, fovy] to calibration matrix K [3, 3].
- kiui.cam.K_to_fov(K, width, height, is_degree=True)[source]¶
convert calibration matrix K [3, 3] to field of view [fovx, fovy].
- kiui.cam.perspective_to_intr(proj, width, height)[source]¶
convert a perspective projection matrix to intrinsics [fx, fy, cx, cy].
- Parameters:
proj (np.ndarray) – perspective projection matrix, float [4, 4]
width (int) – image width
height (int) – image height
- Returns:
intrinsics, float [4] as [fx, fy, cx, cy]
- Return type:
np.ndarray
- kiui.cam.get_rays(pose, h, w, fovy, opengl=True, normalize_dir=True)[source]¶
construct rays origin and direction from a camera pose.
- Parameters:
pose (np.ndarray) – camera pose, float [4, 4]
h (int) – image height
w (int) – image width
fovy (float) – field of view in degree along y-axis.
opengl (bool, optional) – whether to use the OpenGL camera convention. Defaults to True.
normalize_dir (bool, optional) – whether to normalize the ray directions. Defaults to True.
- Returns:
rays_o and rays_d, both are float [h, w, 3]
- Return type:
Tuple[np.ndarray]
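A minimal usage sketch combining it with orbit_camera (shapes follow the docstring above):

from kiui.cam import orbit_camera, get_rays

pose = orbit_camera(elevation=0, azimuth=0, radius=2)  # c2w, [4, 4]
rays_o, rays_d = get_rays(pose, h=480, w=640, fovy=50)
print(rays_o.shape, rays_d.shape)  # (480, 640, 3) (480, 640, 3)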
- kiui.cam.pose_to_extr(pose)[source]¶
convert camera pose (cam2world) to extrinsics (world2cam).
- Parameters:
pose (np.ndarray) – camera pose matrix (cam2world), float [4, 4]
- Returns:
camera extrinsics matrix (world2cam), float [4, 4]
- Return type:
np.ndarray
- kiui.cam.extr_to_pose(extr)[source]¶
convert camera extrinsics (world2cam) to pose (cam2world).
- Parameters:
extr (np.ndarray) – camera extrinsics matrix (world2cam), float [4, 4]
- Returns:
camera pose matrix (cam2world), float [4, 4]
- Return type:
np.ndarray
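pose_to_extr and extr_to_pose are mutual inverses; a quick check:

import numpy as np
from kiui.cam import orbit_camera, pose_to_extr, extr_to_pose

pose = orbit_camera(elevation=30, azimuth=45)
extr = pose_to_extr(pose)
assert np.allclose(extr @ pose, np.eye(4), atol=1e-5)  # w2c @ c2w = identity
assert np.allclose(extr_to_pose(extr), pose, atol=1e-5)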
- kiui.cam.se3_to_matrix(se3)[source]¶
convert [tx, ty, tz, qx, qy, qz, qw] to a [4, 4] R|t camera matrix (it can be c2w or w2c!).
- Parameters:
se3 (np.ndarray) – camera parameters, float [7] as [tx, ty, tz, qx, qy, qz, qw]
- Returns:
c2w/w2c matrix, float [4, 4]
- Return type:
np.ndarray
- kiui.cam.matrix_to_se3(matrix)[source]¶
convert a [4, 4] R|t camera matrix to [tx, ty, tz, qx, qy, qz, qw].
- Parameters:
matrix (np.ndarray) – c2w/w2c matrix, float [4, 4]
- Returns:
camera parameters, float [7] as [tx, ty, tz, qx, qy, qz, qw]
- Return type:
np.ndarray
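A round-trip sketch (assuming the usual quaternion normalization, so the identity rotation is [0, 0, 0, 1] with qw last):

import numpy as np
from kiui.cam import se3_to_matrix, matrix_to_se3

se3 = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 1.0])  # translation (1, 2, 3), identity rotation
mat = se3_to_matrix(se3)  # [4, 4]
assert np.allclose(matrix_to_se3(mat), se3)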
- kiui.cam.look_at(campos, target, opengl=True)[source]¶
construct pose rotation matrix by look-at.
- Parameters:
campos (np.ndarray) – camera position, float [3]
target (np.ndarray) – look at target, float [3]
opengl (bool, optional) – whether use opengl camera convention (forward direction is target –> camera). Defaults to True.
- Returns:
the camera pose rotation matrix, float [3, 3], normalized.
- Return type:
np.ndarray
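A minimal sketch; with the OpenGL convention, the forward column points from target to campos:

import numpy as np
from kiui.cam import look_at

R = look_at(campos=np.array([0.0, 0.0, 2.0]), target=np.zeros(3), opengl=True)
print(R[:, 2])  # forward column: ~[0, 0, 1]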
- kiui.cam.orbit_camera(elevation, azimuth, radius=1, is_degree=True, target=None, opengl=True)[source]¶
construct a camera pose matrix orbiting a target with elevation & azimuth angle.
- Parameters:
elevation (float) – elevation in (-90, 90), from +y to -y is (-90, 90)
azimuth (float) – azimuth in (-180, 180), from +z to +x is (0, 90)
radius (float, optional) – camera radius. Defaults to 1.
is_degree (bool, optional) – if the angles are in degree. Defaults to True.
target (np.ndarray, optional) – look at target position. Defaults to None.
opengl (bool, optional) – whether to use OpenGL camera convention. Defaults to True.
- Returns:
the camera pose matrix, float [4, 4]
- Return type:
np.ndarray
- kiui.cam.undo_orbit_camera(T, is_degree=True)[source]¶
undo an orbital camera pose matrix to elevation & azimuth
- Parameters:
T (np.ndarray) – camera pose matrix, float [4, 4], must be an orbital camera targeting at (0, 0, 0)!
is_degree (bool, optional) – whether to return angles in degree. Defaults to True.
- Returns:
elevation, azimuth, and radius.
- Return type:
Tuple[float]
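orbit_camera and undo_orbit_camera round-trip, assuming the default target at the origin:

from kiui.cam import orbit_camera, undo_orbit_camera

pose = orbit_camera(elevation=-30, azimuth=60, radius=2)  # targets the origin by default
print(undo_orbit_camera(pose))  # ~(-30, 60, 2)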
- kiui.cam.get_perspective(fovy, aspect=1, near=0.01, far=1000)[source]¶
construct a perspective matrix from fovy.
- Parameters:
fovy (float) – field of view in degree along y-axis.
aspect (int, optional) – aspect ratio. Defaults to 1.
near (float, optional) – near clip plane. Defaults to 0.01.
far (int, optional) – far clip plane. Defaults to 1000.
- Returns:
perspective matrix, float [4, 4]
- Return type:
np.ndarray
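A minimal usage sketch:

from kiui.cam import get_perspective

proj = get_perspective(fovy=60, aspect=16 / 9)  # near/far keep their defaults
print(proj.shape)  # (4, 4)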
- class kiui.cam.OrbitCamera(W, H, r=2, fovy=60, near=0.01, far=100)[source]¶
An orbital camera class.
- __init__(W, H, r=2, fovy=60, near=0.01, far=100)[source]¶
init function
- Parameters:
W (int) – image width
H (int) – image height
r (float, optional) – camera radius. Defaults to 2.
fovy (float, optional) – camera field of view in degree along y-axis. Defaults to 60.
near (float, optional) – near clip plane. Defaults to 0.01.
far (float, optional) – far clip plane. Defaults to 100.
- property fovx¶
get the field of view in radians along x-axis
- Returns:
field of view in radians along x-axis
- Return type:
float
- property campos¶
get the camera position
- Returns:
camera position, float [3]
- Return type:
np.ndarray
- property pose¶
get the camera pose matrix (cam2world)
- Returns:
camera pose, float [4, 4]
- Return type:
np.ndarray
- property view¶
get the camera view matrix (world2cam, inverse of cam2world)
- Returns:
camera view, float [4, 4]
- Return type:
np.ndarray
- property perspective¶
get the perspective matrix
- Returns:
camera perspective, float [4, 4]
- Return type:
np.ndarray
- property intrinsics¶
get the camera intrinsics
- Returns:
intrinsics (fx, fy, cx, cy), float [4]
- Return type:
np.ndarray
- property mvp¶
get the MVP (model-view-projection) matrix.
- Returns:
camera MVP, float [4, 4]
- Return type:
np.ndarray
- orbit(dx, dy)[source]¶
rotate along camera up/side axis!
- Parameters:
dx (float) – delta step along x (rotates around the up axis).
dy (float) – delta step along y (rotates around the side axis).
- pan(dx, dy, dz=0)[source]¶
pan the camera.
- Parameters:
dx (float) – delta step along x.
dy (float) – delta step along y.
dz (float, optional) – delta step along z. Defaults to 0.
- from_angle(elevation, azimuth, is_degree=True)[source]¶
set the camera pose from elevation & azimuth angle.
- Parameters:
elevation (float) – elevation in (-90, 90), from +y to -y is (-90, 90)
azimuth (float) – azimuth in (-180, 180), from +z to +x is (0, 90)
is_degree (bool, optional) – whether the angles are in degree. Defaults to True.
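A minimal usage sketch of the class, based on the properties and methods documented above:

from kiui.cam import OrbitCamera

cam = OrbitCamera(W=640, H=480, r=2, fovy=60)
cam.orbit(dx=30, dy=10)  # drag-style rotation
cam.pan(dx=5, dy=0)      # shift in the image plane
print(cam.pose.shape)    # (4, 4), cam2world
print(cam.intrinsics)    # [fx, fy, cx, cy]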