Camera

World coordinate systems

OpenGL/MAYA       OpenCV/Colmap     Blender      Unity/DirectX     Unreal
Right-handed      Right-handed    Right-handed    Left-handed    Left-handed

     +y                +z           +z  +y         +y  +z          +z
     |                /             |  /           |  /             |
     |               /              | /            | /              |
     |______+x      /______+x       |/_____+x      |/_____+x        |______+x
    /               |                                              /
   /                |                                             /
  /                 |                                            /
 +z                 +y                                          +y

A common color code: x = red, y = green, z = blue (XYZ=RGB)

Left/right-handed notation: curl the fingers of your left/right hand from x to y, and your thumb points to z.

A common color code for camera axes: right = red, up = green, forward = blue (XYZ=RUF=RGB).

Our camera convention

  • world coordinate is OpenGL/right-handed, +x = right, +y = up, +z = forward

  • camera coordinate is OpenGL (forward is target --> campos).

  • elevation in (-90, 90), from +y (-90) --> -y (+90)

  • azimuth in (-180, 180), from +z (0/-360) --> +x (90/-270) --> -z (180/-180) --> -x (270/-90) --> +z (360/0)
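
As a quick sketch of this convention (plain numpy, not part of the kiui API; orbit_campos is a hypothetical helper, the library function is kiui.cam.orbit_camera documented below), the camera position on a sphere of radius r around the origin follows from elevation and azimuth like this:

import numpy as np

def orbit_campos(elevation: float, azimuth: float, radius: float = 1.0) -> np.ndarray:
    # elevation in (-90, 90): -90 puts the camera on +y, +90 on -y
    # azimuth in (-180, 180): 0 puts the camera on +z, 90 on +x
    ele, azi = np.deg2rad(elevation), np.deg2rad(azimuth)
    x = radius * np.cos(ele) * np.sin(azi)
    y = -radius * np.sin(ele)
    z = radius * np.cos(ele) * np.cos(azi)
    return np.array([x, y, z], dtype=np.float32)

# e.g., orbit_campos(0, 0) is on +z, orbit_campos(-90, 0) is on +y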

Camera pose conventions

We strictly use the term pose for the camera-to-world (c2w) transformation matrix. Its inverse, the world-to-camera (w2c) transformation matrix, is called the extrinsics (or the view transform in graphics).
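
As a quick numpy sketch (not a kiui API; the library provides kiui.cam.pose_to_extr / kiui.cam.extr_to_pose, documented below), the two are simply matrix inverses of each other, which for a rigid [R|t] transform can be written in closed form:

import numpy as np

def invert_rigid(T: np.ndarray) -> np.ndarray:
    # invert a [4, 4] rigid [R|t] transform: pose (c2w) <--> extrinsics (w2c)
    T_inv = np.eye(4, dtype=T.dtype)
    T_inv[:3, :3] = T[:3, :3].T
    T_inv[:3, 3] = -T[:3, :3].T @ T[:3, 3]
    return T_inv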

A camera pose matrix is in the form of:

[[Right_x, Up_x, Forward_x, Position_x],
 [Right_y, Up_y, Forward_y, Position_y],
 [Right_z, Up_z, Forward_z, Position_z],
 [0,       0,    0,         1         ]]

The x/y/z components follow the corresponding world coordinate system. However, the three directions (right, up, forward) can be defined differently:

  • forward can be camera --> target or target --> camera.

  • up can align with the world-up-axis (y) or world-down-axis (-y).

  • right can also be left, depending on whether it is defined as (up cross forward) or (forward cross up).

This leads to two common camera conventions:

   OpenGL                OpenCV
   Blender               Colmap

     up  target          forward & target
     |  /                /
     | /                /
     |/_____right      /______right
    /                  |
   /                   |
  /                    |
forward                up

These two conventions can be converted into each other by negating the up and forward directions:

# pose: camera2world (c2w), [4, 4]
# convert between the OpenGL and OpenCV/Colmap conventions
pose[:, 1] *= -1 # up
pose[:, 2] *= -1 # forward

Note that camera convention is NOT dependent on the world coordinate system!

Why are there different camera conventions?

It’s an unfortunate outcome of mixing different 3D world coordinate systems with the 2D image coordinate system. For a 2D image / array / matrix, the convention is usually:

       0      1      2
  +---------------------> +x/j
0 | (0, 0) (0, 1) (0, 2)
1 | (1, 0) (1, 1) (1, 2)
2 | (2, 0) (2, 1) (2, 2)
  v
 +y/i

Now assume we are using the OpenGL world coordinate system and also the OpenGL camera convention:

    +y & up
    |  target
    | /
    |/______+x & right
   /
  /
 /
+z & forward

The unit camera (identity rotation matrix) is perfectly aligned with the 3D world coordinate system. However, if we want to construct the camera rays from the camera center to the target, the rays should point towards -z, and the +y axis is misaligned with the image coordinate system (where +y points down). So we need to negate both the y and z axes during ray construction (check kiui.cam.get_rays):

rays_d = np.stack([
    (x - cx + 0.5) / focal,
    (y - cy + 0.5) / focal * (-1.0),
    np.ones_like(x) * (-1.0),
], axis=-1) # [hw, 3]

On the other hand, if we use the OpenGL world coordinate system but the OpenCV camera convention:

    +y & up                            +z & forward & target
    |                                  /
    |                    equals       /
    |______+x & right    ----->      /_______+x & right
   /                                 |
  /                                  |
 /                                   |
+z & forward & target               +y & up

While the unit camera points in a different direction, it is now aligned with the 2D image convention, so we don’t need to negate any axes during ray construction:

rays_d = np.stack([
    (x - cx + 0.5) / focal,
    (y - cy + 0.5) / focal,
    np.ones_like(x),
], axis=-1) # [hw, 3]

Given a poorly documented dataset of cameras (and depth maps), how do we figure out its conventions?

  1. Check whether the camera “pose” is really a pose (camera2world) or extrinsics (world2camera); many datasets misuse the term “pose” to mean extrinsics.

  2. Check whether the depth map stores the distance to the camera center or to the camera plane. If it’s to the camera center, we need to normalize rays_d to unit length; otherwise, just use the raw rays above (|z| is always 1).

  3. Simply assume a camera convention (e.g., OpenGL), use it to construct the rays, and unproject the depth into a world-space point cloud for visualization.

  4. Check whether the point clouds of static objects drift across frames (they shouldn’t if everything is correct). If they do, you are using the wrong camera convention; switch to the other one and it should line up.

Here is a complete script implementing this sanity check:

import os
import re
import cv2
import kiui
import glob
import tqdm
import numpy as np
from plyfile import PlyData, PlyElement

def glob_sorted(path: str):
    paths = glob.glob(path)
    def _natural_key(name: str):
        return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]
    return sorted(
        paths,
        key=lambda p: _natural_key(os.path.splitext(os.path.basename(p))[0]),
    )

def export_ply_with_color(point_cloud: np.ndarray, colors: np.ndarray, file_path: str) -> None:
    assert point_cloud.shape[0] == colors.shape[0]
    vertices = np.zeros(
        point_cloud.shape[0],
        dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"), ("red", "u1"), ("green", "u1"), ("blue", "u1")],
    )
    vertices["x"] = point_cloud[:, 0].astype(np.float32)
    vertices["y"] = point_cloud[:, 1].astype(np.float32)
    vertices["z"] = point_cloud[:, 2].astype(np.float32)
    vertices["red"] = colors[:, 0].astype(np.uint8)
    vertices["green"] = colors[:, 1].astype(np.uint8)
    vertices["blue"] = colors[:, 2].astype(np.uint8)
    el = PlyElement.describe(vertices, "vertex")
    os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
    PlyData([el]).write(file_path)


def load_data(path: str, target_size: tuple[int, int] = (320, 180)):

    # load camera
    cam_params = kiui.read_json(path)
    width, height = int(cam_params['renderProductResolution'][0]), int(cam_params['renderProductResolution'][1])
    proj = np.array(cam_params["cameraProjection"]).reshape(4, 4).T # [4, 4], perspective projection (T is for col-major to row-major)
    w2c = np.array(cam_params["cameraViewTransform"]).reshape(4, 4).T # [4, 4], world2cam

    # get intrinsics from projection matrix
    fx = proj[0, 0] * (width / 2.0)
    fy = proj[1, 1] * (height / 2.0)
    cx = (1.0 - proj[0, 2]) * (width / 2.0)
    cy = (1.0 - proj[1, 2]) * (height / 2.0)

    c2w = np.linalg.inv(w2c)

    # load depth
    path_depth = path.replace("camera_params", "distance_to_camera").replace(".json", ".exr")
    depth = kiui.read_image(path_depth, mode="float")
    depth = depth.astype(np.float32) # [H, W]

    # downsample depth and intrinsics for visualization
    d_width, d_height = int(target_size[0]), int(target_size[1])
    d_depth = cv2.resize(depth, (d_width, d_height), interpolation=cv2.INTER_NEAREST)
    scale_x = float(d_width) / float(width)
    scale_y = float(d_height) / float(height)
    d_fx = fx * scale_x
    d_fy = fy * scale_y
    d_cx = cx * scale_x
    d_cy = cy * scale_y

    # create world space point cloud

    ### IMPORTANT: toggle camera convention
    # in this dataset, the cameras actually follow the OpenGL convention

    # either toggle the c2w convention
    # c2w[:, 1] *= -1
    # c2w[:, 2] *= -1

    # or toggle the rays construction (but don't do both!).
    OPENGL_RAYS = True

    x, y = np.meshgrid(np.arange(d_width), np.arange(d_height), indexing="xy")
    x = x.reshape(-1)
    y = y.reshape(-1)
    rays_d = np.stack([
        (x - d_cx + 0.5) / d_fx,
        (y - d_cy + 0.5) / d_fy * (-1.0 if OPENGL_RAYS else 1.0),
        np.ones_like(x) * (-1.0 if OPENGL_RAYS else 1.0),
    ], axis=-1) # [hw, 3]

    rays_d = rays_d @ c2w[:3, :3].T  # [hw, 3]
    rays_o = np.expand_dims(c2w[:3, 3], 0).repeat(rays_d.shape[0], 0)  # [hw, 3]

    # since depth is to camera center, we need to normalize the rays_d
    rays_d = rays_d / np.linalg.norm(rays_d, axis=-1, keepdims=True)
    rays_o = rays_o.reshape(d_height, d_width, 3)
    rays_d = rays_d.reshape(d_height, d_width, 3)

    valid_mask = np.isfinite(d_depth) & (d_depth > 0)
    points = rays_d[valid_mask] * d_depth[valid_mask][:, None] + rays_o[valid_mask] # [N, 3] in world space

    # load rgb for color visualization
    path_rgb = path.replace("camera_params", "ldr_color").replace(".json", ".png")
    rgb = kiui.read_image(path_rgb, mode="float") # [H, W, 3]
    d_rgb = cv2.resize(rgb, (d_width, d_height), interpolation=cv2.INTER_AREA)
    d_rgb = d_rgb[valid_mask]  # [N, 3]

    return c2w, points, d_rgb

def c2w_to_pointcloud(c2ws: list[np.ndarray], point_per_axis: int = 5, axis_length: float = 1.0):
    # visualize the camera poses as point clouds, c2w is [4, 4]

    # debug: add the identity pose to visualize
    c2ws.append(np.array([
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ], dtype=np.float32))

    axis_points = []
    axis_colors = []
    # Right, Up, Forward unit directions are columns of rotation in c2w
    for c2w in c2ws:
        R = c2w[:3, :3]
        t = c2w[:3, 3]
        dirs = [
            (R[:, 0], np.array([1, 0, 0], dtype=np.float32)),   # Right (X) - Red
            (R[:, 1], np.array([0, 1, 0], dtype=np.float32)),   # Up (Y) - Green
            (R[:, 2], np.array([0, 0, 1], dtype=np.float32)),   # Forward (Z) - Blue
        ]
        for d, col in dirs:
            for k in range(point_per_axis):
                p = t + d * (axis_length * (k / (point_per_axis - 1)))
                axis_points.append(p)
                axis_colors.append(col)

    axis_points_np = np.stack(axis_points, axis=0).astype(np.float32)
    axis_colors_np = np.stack(axis_colors, axis=0).astype(np.float32)
    return axis_points_np, axis_colors_np

camera_paths = glob_sorted("camera_params/*.json")

all_c2ws = []
all_points = []
all_colors = []
for path in tqdm.tqdm(camera_paths):
    c2w, points, rgb = load_data(path, target_size=(320, 180))
    all_c2ws.append(c2w)
    all_points.append(points)
    all_colors.append(rgb)

# visualize the camera poses as point clouds
axis_points, axis_colors = c2w_to_pointcloud(all_c2ws, point_per_axis=5, axis_length=1.0)
all_points.append(axis_points)
all_colors.append(axis_colors)

# concatenate all the points and colors
all_points = np.concatenate(all_points, axis=0)
all_colors = np.concatenate(all_colors, axis=0)
all_colors = np.clip(all_colors * 255.0, 0.0, 255.0).astype(np.uint8)

# save colored point cloud (e.g., check with meshlab)
export_ply_with_color(all_points, all_colors, "world_pointcloud.ply")

API

Note

The camera API is designed to be numpy-based and un-batched!

kiui.cam.to_homo(mat: ndarray, eye: bool = True) → ndarray[source]

Append a homogeneous row to a matrix.

  • If mat is [3, 4], append [0, 0, 0, 1] to get a [4, 4] matrix when eye=True, or [0, 0, 0, 0] when eye=False.

  • If mat is [2, 3], append [0, 0, 1] to get a [3, 3] matrix.

Parameters:
  • mat (np.ndarray) – Input matrix of shape [3, 4] or [2, 3].

  • eye (bool, optional) – Whether to append a homogeneous “eye” row. Defaults to True.

Returns:

Homogeneous matrix of shape [4, 4] or [3, 3].

Return type:

np.ndarray
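
A minimal usage sketch (the R|t values are illustrative):

import numpy as np
from kiui.cam import to_homo

Rt = np.hstack([np.eye(3), np.zeros((3, 1))])  # [3, 4] R|t block
T = to_homo(Rt)                                # [4, 4], last row is [0, 0, 0, 1]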

kiui.cam.intr_to_K(intr)[source]

convert intrinsics [fx, fy, cx, cy] to calibration matrix K [3, 3].

Parameters:

intr (np.ndarray) – camera intrinsics, float [4] as [fx, fy, cx, cy]

Returns:

calibration matrix K, float [3, 3]

Return type:

np.ndarray

kiui.cam.K_to_intr(K)[source]

convert calibration matrix K [3, 3] to intrinsics [fx, fy, cx, cy].

Parameters:

K (np.ndarray) – calibration matrix, float [3, 3]

Returns:

intrinsics, float [4] as [fx, fy, cx, cy]

Return type:

np.ndarray
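
A round-trip sketch (the intrinsics values are illustrative):

import numpy as np
from kiui.cam import intr_to_K, K_to_intr

intr = np.array([500.0, 500.0, 320.0, 240.0])  # [fx, fy, cx, cy]
K = intr_to_K(intr)                            # [3, 3] calibration matrix
assert np.allclose(K_to_intr(K), intr)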

kiui.cam.fov_to_intr(fov, width, height, is_degree=True)[source]

convert field of view [fovx, fovy] to intrinsics [fx, fy, cx, cy].

Parameters:
  • fov (np.ndarray) – field of view, float [2] as [fovx, fovy]

  • width (int) – image width

  • height (int) – image height

  • is_degree (bool, optional) – whether fov is in degree. Defaults to True.

Returns:

intrinsics, float [4] as [fx, fy, cx, cy]

Return type:

np.ndarray

kiui.cam.intr_to_fov(intr, width, height, is_degree=True)[source]

convert intrinsics [fx, fy, cx, cy] to field of view [fovx, fovy].

Parameters:
  • intr (np.ndarray) – intrinsics, float [4] as [fx, fy, cx, cy]

  • width (int) – image width

  • height (int) – image height

  • is_degree (bool, optional) – whether to return fov in degree. Defaults to True.

Returns:

field of view, float [2] as [fovx, fovy]

Return type:

np.ndarray
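
A round-trip sketch (image size and fov values are illustrative):

import numpy as np
from kiui.cam import fov_to_intr, intr_to_fov

W, H = 640, 480
fov = np.array([60.0, 47.0])      # [fovx, fovy] in degrees
intr = fov_to_intr(fov, W, H)     # [fx, fy, cx, cy]
assert np.allclose(intr_to_fov(intr, W, H), fov)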

kiui.cam.fov_to_K(fov, width, height, is_degree=True)[source]

convert field of view [fovx, fovy] to calibration matrix K [3, 3].

kiui.cam.K_to_fov(K, width, height, is_degree=True)[source]

convert calibration matrix K [3, 3] to field of view [fovx, fovy].

kiui.cam.perspective_to_intr(proj, width, height)[source]

convert a perspective projection matrix to intrinsics [fx, fy, cx, cy].

Parameters:
  • proj (np.ndarray) – perspective projection matrix, float [4, 4]

  • width (int) – image width

  • height (int) – image height

Returns:

intrinsics, float [4] as [fx, fy, cx, cy]

Return type:

np.ndarray

kiui.cam.get_rays(pose, h, w, fovy, opengl=True, normalize_dir=True)[source]

construct rays origin and direction from a camera pose.

Parameters:
  • pose (np.ndarray) – camera pose, float [4, 4]

  • h (int) – image height

  • w (int) – image width

  • fovy (float) – field of view in degree along y-axis.

  • opengl (bool, optional) – whether to use the OpenGL camera convention. Defaults to True.

  • normalize_dir (bool, optional) – whether to normalize the ray directions. Defaults to True.

Returns:

rays_o and rays_d, both are float [h, w, 3]

Return type:

Tuple[np.ndarray]
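
A usage sketch, pairing it with orbit_camera (documented below); the resolution, fov, and angles are illustrative:

from kiui.cam import orbit_camera, get_rays

pose = orbit_camera(elevation=-30, azimuth=45, radius=2)  # [4, 4] c2w, OpenGL convention
rays_o, rays_d = get_rays(pose, h=480, w=640, fovy=50)    # both [480, 640, 3]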

kiui.cam.pose_to_extr(pose)[source]

convert camera pose (cam2world) to extrinsics (world2cam).

Parameters:

pose (np.ndarray) – camera pose matrix (cam2world), float [4, 4]

Returns:

camera extrinsics matrix (world2cam), float [4, 4]

Return type:

np.ndarray

kiui.cam.extr_to_pose(extr)[source]

convert camera extrinsics (world2cam) to pose (cam2world).

Parameters:

extr (np.ndarray) – camera extrinsics matrix (world2cam), float [4, 4]

Returns:

camera pose matrix (cam2world), float [4, 4]

Return type:

np.ndarray
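
A round-trip sketch (the example pose is illustrative):

import numpy as np
from kiui.cam import pose_to_extr, extr_to_pose

pose = np.eye(4)
pose[:3, 3] = [0, 0, 2]        # camera at (0, 0, 2) with identity rotation
extr = pose_to_extr(pose)      # [4, 4] world2cam
assert np.allclose(extr_to_pose(extr), pose)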

kiui.cam.se3_to_matrix(se3)[source]

convert [tx, ty, tz, qx, qy, qz, qw] to a [4, 4] R|t camera matrix (it can be c2w or w2c!).

Parameters:

se3 (np.ndarray) – camera parameters, float [7] as [tx, ty, tz, qx, qy, qz, qw]

Returns:

c2w/w2c matrix, float [4, 4]

Return type:

np.ndarray

kiui.cam.matrix_to_se3(matrix)[source]

convert a [4, 4] R|t camera matrix to [tx, ty, tz, qx, qy, qz, qw].

Parameters:

matrix (np.ndarray) – c2w/w2c matrix, float [4, 4]

Returns:

camera parameters, float [7] as [tx, ty, tz, qx, qy, qz, qw]

Return type:

np.ndarray
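
A minimal sketch with an identity rotation quaternion (the translation values are illustrative):

import numpy as np
from kiui.cam import se3_to_matrix, matrix_to_se3

se3 = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 1.0])  # [tx, ty, tz, qx, qy, qz, qw]
T = se3_to_matrix(se3)                                # [4, 4] R|t matrix
se3_back = matrix_to_se3(T)                           # recovers the parameters (up to quaternion sign)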

kiui.cam.look_at(campos, target, opengl=True)[source]

construct pose rotation matrix by look-at.

Parameters:
  • campos (np.ndarray) – camera position, float [3]

  • target (np.ndarray) – look at target, float [3]

  • opengl (bool, optional) – whether use opengl camera convention (forward direction is target –> camera). Defaults to True.

Returns:

the camera pose rotation matrix, float [3, 3], normalized.

Return type:

np.ndarray
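
A sketch assembling a full [4, 4] pose from the look-at rotation (camera and target positions are illustrative):

import numpy as np
from kiui.cam import look_at

campos = np.array([0.0, 0.0, 2.0])
target = np.array([0.0, 0.0, 0.0])
R = look_at(campos, target, opengl=True)  # [3, 3] pose rotation, forward = target --> campos

pose = np.eye(4)
pose[:3, :3] = R
pose[:3, 3] = campos                      # [4, 4] c2w pose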

kiui.cam.orbit_camera(elevation, azimuth, radius=1, is_degree=True, target=None, opengl=True)[source]

construct a camera pose matrix orbiting a target with elevation & azimuth angle.

Parameters:
  • elevation (float) – elevation in (-90, 90), from +y to -y is (-90, 90)

  • azimuth (float) – azimuth in (-180, 180), from +z to +x is (0, 90)

  • radius (float, optional) – camera radius. Defaults to 1.

  • is_degree (bool, optional) – if the angles are in degree. Defaults to True.

  • target (np.ndarray, optional) – look at target position. Defaults to None.

  • opengl (bool, optional) – whether to use OpenGL camera convention. Defaults to True.

Returns:

the camera pose matrix, float [4, 4]

Return type:

np.ndarray

kiui.cam.undo_orbit_camera(T, is_degree=True)[source]

undo an orbital camera pose matrix to elevation & azimuth

Parameters:
  • T (np.ndarray) – camera pose matrix, float [4, 4], must be an orbital camera targeting at (0, 0, 0)!

  • is_degree (bool, optional) – whether to return angles in degree. Defaults to True.

Returns:

elevation, azimuth, and radius.

Return type:

Tuple[float]
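
A round-trip sketch (the angles and radius are illustrative):

from kiui.cam import orbit_camera, undo_orbit_camera

pose = orbit_camera(elevation=-30, azimuth=60, radius=2)  # [4, 4] c2w targeting the origin
elevation, azimuth, radius = undo_orbit_camera(pose)      # approximately (-30, 60, 2)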

kiui.cam.get_perspective(fovy, aspect=1, near=0.01, far=1000)[source]

construct a perspective matrix from fovy.

Parameters:
  • fovy (float) – field of view in degree along y-axis.

  • aspect (float, optional) – aspect ratio. Defaults to 1.

  • near (float, optional) – near clip plane. Defaults to 0.01.

  • far (float, optional) – far clip plane. Defaults to 1000.

Returns:

perspective matrix, float [4, 4]

Return type:

np.ndarray
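
A sketch relating the projection matrix back to intrinsics via perspective_to_intr above (image size and fovy are illustrative):

from kiui.cam import get_perspective, perspective_to_intr

W, H = 640, 480
proj = get_perspective(fovy=50, aspect=W / H)  # [4, 4] perspective projection matrix
intr = perspective_to_intr(proj, W, H)         # [fx, fy, cx, cy]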

class kiui.cam.OrbitCamera(W, H, r=2, fovy=60, near=0.01, far=100)[source]

An orbital camera class.

__init__(W, H, r=2, fovy=60, near=0.01, far=100)[source]

init function

Parameters:
  • W (int) – image width

  • H (int) – image height

  • r (float, optional) – camera radius. Defaults to 2.

  • fovy (float, optional) – camera field of view in degree along y-axis. Defaults to 60.

  • near (float, optional) – near clip plane. Defaults to 0.01.

  • far (float, optional) – far clip plane. Defaults to 100.

property fovx

get the field of view in radians along x-axis

Returns:

field of view in radians along x-axis

Return type:

float

property campos

get the camera position

Returns:

camera position, float [3]

Return type:

np.ndarray

property pose

get the camera pose matrix (cam2world)

Returns:

camera pose, float [4, 4]

Return type:

np.ndarray

property view

get the camera view matrix (world2cam, inverse of cam2world)

Returns:

camera view, float [4, 4]

Return type:

np.ndarray

property perspective

get the perspective matrix

Returns:

camera perspective, float [4, 4]

Return type:

np.ndarray

property intrinsics

get the camera intrinsics

Returns:

intrinsics (fx, fy, cx, cy), float [4]

Return type:

np.ndarray

property mvp

get the MVP (model-view-projection) matrix.

Returns:

camera MVP, float [4, 4]

Return type:

np.ndarray

orbit(dx, dy)[source]

rotate around the camera up/side axis!

Parameters:
  • dx (float) – delta step along x (up).

  • dy (float) – delta step along y (side).

scale(delta)[source]

scale the camera.

Parameters:

delta (float) – delta step.

pan(dx, dy, dz=0)[source]

pan the camera.

Parameters:
  • dx (float) – delta step along x.

  • dy (float) – delta step along y.

  • dz (float, optional) – delta step along z. Defaults to 0.

from_angle(elevation, azimuth, is_degree=True)[source]

set the camera pose from elevation & azimuth angle.

Parameters:
  • elevation (float) – elevation in (-90, 90), from +y to -y is (-90, 90)

  • azimuth (float) – azimuth in (-180, 180), from +z to +x is (0, 90)

  • is_degree (bool, optional) – whether the angles are in degree. Defaults to True.
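
A usage sketch of the interactive camera (all values are illustrative):

from kiui.cam import OrbitCamera

cam = OrbitCamera(W=640, H=480, r=2, fovy=60)
cam.orbit(dx=30, dy=10)     # drag-style rotation
cam.scale(delta=1)          # zoom
cam.pan(dx=5, dy=0)         # pan
pose = cam.pose             # [4, 4] cam2world
intr = cam.intrinsics       # [fx, fy, cx, cy]
mvp = cam.mvp               # [4, 4] model-view-projection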