Camtools: camera tools for computer vision.
CamTools
Camtools: camera tools for computer vision. Useful for plotting, converting, projecting, and ray casting with camera parameters.
Installation
# Option 1: install from pip.
pip install camtools
# Option 2: install from git.
pip install git+https://github.com/yxlao/camtools.git
# Option 3: install from source.
git clone https://github.com/yxlao/camtools.git
cd camtools
pip install -e . # Dev mode, if you want to modify camtools.
pip install . # Install mode, if you want to use camtools only.
What can you do with CamTools?
- Plot cameras. Useful for debugging 3D reconstruction and NeRFs!

      import camtools as ct
      import open3d as o3d

      cameras = ct.camera.create_camera_ray_frames(Ks, Ts)
      o3d.visualization.draw_geometries([cameras])
- Convert camera parameters.

      pose = ct.convert.T_to_pose(T)   # Convert T to pose
      R, t = ct.convert.T_to_R_t(T)    # Convert T to R and t
      C = ct.convert.pose_to_C(pose)   # Convert pose to camera center
      K, T = ct.convert.P_to_K_T(P)    # Decompose projection matrix to K and T
      # And more...
- Projection and ray casting.

      # Project 3D points to pixels.
      pixels = ct.project.points_to_pixel(points, K, T)

      # Back-project depth image to 3D points.
      points = ct.project.im_depth_to_points(depth, K, T)

      # Ray cast a triangle mesh to depth image.
      im_depth = ct.raycast.mesh_to_depths(mesh, Ks, Ts, height, width)

      # And more...
- Image I/O and depth I/O with no surprises.

      ct.io.imread()
      ct.io.imwrite()
      ct.io.imread_depth()
      ct.io.imwrite_depth()

  Strict type checks and range checks are enforced. These APIs are specifically designed to solve the following pain points:
  - Is my image float32 or uint8?
  - Does it have range [0, 1] or [0, 255]?
  - Is it RGB or BGR?
  - Does my image have an alpha channel?
  - When saving a depth image as an integer-based .png, is it correctly scaled?
- Useful command-line tools (run in terminal).

      # Crop image borders.
      ct crop-boarders *.png --pad_pixel 10 --skip_cropped --same_crop

      # Draw synchronized bounding boxes interactively.
      ct draw-bboxes path/to/a.png path/to/b.png

      # For more help.
      ct --help
- And more.
  - Solve line intersections.
  - COLMAP tools.
  - Points normalization.
  - ...
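To make the depth I/O pain point listed above concrete, here is a minimal NumPy sketch of scaling a float depth map to a 16-bit integer image and back. It is not the camtools implementation: the helper names and the `depth_scale=1000.0` default (for example, meters to millimeters) are assumptions chosen for illustration.

```python
import numpy as np


def depth_to_uint16(depth, depth_scale=1000.0):
    """Scale a float depth map to uint16 (hypothetical helper, not camtools API)."""
    if depth.dtype not in (np.float32, np.float64):
        raise TypeError(f"depth must be float32 or float64, got {depth.dtype}")
    scaled = np.round(depth * depth_scale)
    # Refuse to silently wrap around or clip out-of-range values.
    if scaled.min() < 0 or scaled.max() > np.iinfo(np.uint16).max:
        raise ValueError("scaled depth does not fit in uint16")
    return scaled.astype(np.uint16)


def uint16_to_depth(im_uint16, depth_scale=1000.0):
    """Invert depth_to_uint16 using the same (assumed) depth_scale."""
    if im_uint16.dtype != np.uint16:
        raise TypeError(f"expected uint16, got {im_uint16.dtype}")
    return im_uint16.astype(np.float32) / depth_scale


depth = np.array([[0.5, 1.25], [2.0, 0.0]], dtype=np.float32)
encoded = depth_to_uint16(depth)
decoded = uint16_to_depth(encoded)
```

The round trip is only lossless up to the resolution `1 / depth_scale`, so the writer and the reader must agree on the same scale.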
Camera conventions
We follow the standard pinhole camera model:
- Camera coordinate: right-handed, with $Z$ pointing away from the camera towards the view direction and $Y$ pointing down. Note that this differs from the Blender convention, where $Z$ points opposite to the view direction and $Y$ points up.
- Image coordinate: starts from the top-left corner of the image, with $x$ pointing right (corresponding to the image width) and $y$ pointing down (corresponding to the image height). This is consistent with OpenCV, but note that dimension 0 of the image array is the height (i.e., $y$) and dimension 1 is the width (i.e., $x$). That is:
  - $x$ <=> width <=> column <=> dimension 1
  - $y$ <=> height <=> row <=> dimension 0
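The indexing rule above is easy to get backwards, so here is a small NumPy sketch of it. The image size and pixel values are arbitrary assumptions for illustration.

```python
import numpy as np

height, width = 4, 6
im = np.zeros((height, width, 3), dtype=np.float32)  # shape is (H, W, 3)

# Pixel at x = 5 (column) and y = 2 (row): index the row first.
x, y = 5, 2
im[y, x] = [1.0, 0.0, 0.0]

# An (N, 2) array of (x, y) pixels must be split into columns and rows
# before it can index the image.
pixels = np.array([[5, 2], [0, 3]])
cols, rows = pixels[:, 0], pixels[:, 1]
values = im[rows, cols]  # shape (N, 3)
```

Note that `im[x, y]` would raise an `IndexError` here, because `x = 5` exceeds the image height of 4.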
K: (3, 3) camera intrinsic matrix.

    K = [[fx,  s, cx],
         [ 0, fy, cy],
         [ 0,  0,  1]]
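As a quick illustration of what K does, the sketch below projects one point given in camera coordinates to a pixel. The focal lengths, principal point, and the 3D point are made-up values, not defaults from camtools.

```python
import numpy as np

# Hypothetical intrinsics for a 640x480 image with zero skew.
fx, fy, cx, cy, s = 500.0, 500.0, 320.0, 240.0, 0.0
K = np.array([
    [fx, s, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
])

# A point in camera coordinates; Z > 0 means it is in front of the camera.
point_cam = np.array([0.2, -0.1, 2.0])

# Homogeneous pixel, then divide by the last component (the depth).
uvw = K @ point_cam
pixel = uvw[:2] / uvw[2]  # (x, y) = (370.0, 215.0)
```

With zero skew this reduces to `x = fx * X / Z + cx` and `y = fy * Y / Z + cy`.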
T or W2C: (4, 4) camera extrinsic matrix.

    T = [[R | t],   = [[R_00, R_01, R_02, t_0],
         [0 | 1]]      [R_10, R_11, R_12, t_1],
                       [R_20, R_21, R_22, t_2],
                       [   0,    0,    0,   1]]

- T is also known as the world-to-camera W2C matrix, which transforms a point in the world coordinate to the camera coordinate.
- T's shape is (4, 4), not (3, 4).
- T must be invertible, where np.linalg.inv(T) = pose.
- The camera center C in world coordinate is projected to [0, 0, 0, 1] in camera coordinate, i.e., T @ C = np.array([0, 0, 0, 1]).T, with C written in homogeneous coordinates.
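A minimal NumPy sketch of assembling T from R and t and applying it to a world point is shown here. The rotation and translation values are arbitrary assumptions.

```python
import numpy as np

# Hypothetical extrinsics: rotate 90 degrees about Z, then translate.
R = np.array([
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
t = np.array([1.0, 2.0, 3.0])

# Assemble the (4, 4) world-to-camera matrix.
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t

# Transform a world point to camera coordinates via homogeneous coordinates.
point_world = np.array([1.0, 0.0, 0.0])
point_cam = (T @ np.append(point_world, 1.0))[:3]

# The same result without homogeneous coordinates.
point_cam_direct = R @ point_world + t
```

Keeping T as (4, 4) rather than (3, 4) is what makes inversion and chaining with other transforms a plain matrix operation.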
R: (3, 3) rotation matrix.

    R = T[:3, :3]

- R is a rotation matrix. It is an orthogonal matrix with determinant 1, as rotations preserve volume and orientation.
  - R.T == np.linalg.inv(R)
  - np.linalg.norm(R @ x) == np.linalg.norm(x), where x is a (3,) vector.
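In floating point, the equalities above hold only approximately, so a practical check uses a tolerance. The helper below is a hypothetical sketch (it is not part of the camtools API), and the tolerance is an assumed default.

```python
import numpy as np


def is_rotation_matrix(R, atol=1e-6):
    """Check orthogonality and det(R) = +1 (hypothetical helper)."""
    R = np.asarray(R, dtype=np.float64)
    if R.shape != (3, 3):
        return False
    orthogonal = np.allclose(R.T @ R, np.eye(3), atol=atol)
    proper = np.isclose(np.linalg.det(R), 1.0, atol=atol)
    return bool(orthogonal and proper)


# Rotation by 30 degrees about the Y axis.
theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([
    [c, 0.0, s],
    [0.0, 1.0, 0.0],
    [-s, 0.0, c],
])

# A reflection is orthogonal but has determinant -1, so it is rejected.
reflection = np.diag([1.0, 1.0, -1.0])

x = np.array([1.0, 2.0, 3.0])
norm_before = np.linalg.norm(x)
norm_after = np.linalg.norm(R @ x)
```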
t: (3,) translation vector.

    t = T[:3, 3]

- t's shape is (3,), not (3, 1).
pose or C2W: (4, 4) camera pose matrix. It is the inverse of T.

    pose = np.linalg.inv(T)

- pose is also known as the camera-to-world C2W matrix, which transforms a point in the camera coordinate to the world coordinate.
- pose is the inverse of T, i.e., pose == np.linalg.inv(T).
C: camera center.

    C = pose[:3, 3]

- C's shape is (3,), not (3, 1).
- C is the camera center in world coordinate. It is also the translation vector of pose.
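The camera center can also be computed directly from R and t as `C = -R.T @ t`, without inverting T. The following sketch checks that this agrees with `pose[:3, 3]` and that T maps the homogeneous camera center to `[0, 0, 0, 1]`. The extrinsics are made-up values.

```python
import numpy as np

# Hypothetical extrinsics: rotate 90 degrees about Z, then translate.
R = np.array([
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
t = np.array([1.0, 2.0, 3.0])

T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t

# Camera center in world coordinates, shape (3,).
C = -R.T @ t

# The same value read from the pose matrix.
C_from_pose = np.linalg.inv(T)[:3, 3]

# In homogeneous coordinates, T maps the camera center to the origin.
C_in_camera = T @ np.append(C, 1.0)
```

Note that C is generally not equal to `-t`; the two coincide only when R is the identity.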
P: (3, 4) camera projection matrix.

- P is the world-to-pixel projection matrix, which projects a point in the homogeneous world coordinate to the homogeneous pixel coordinate.
- P is the product of the intrinsic and extrinsic parameters.

      # P = K @ [R | t]
      P = K @ np.hstack([R, t[:, None]])

- P's shape is (3, 4), not (4, 4).
- It is possible to decompose P into intrinsic and extrinsic matrices by QR decomposition (more precisely, an RQ decomposition of the left (3, 3) block of P).
- Don't confuse P with pose.
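The two statements about P (it projects world points to pixels, and it can be decomposed back into K and T) can be sketched in plain NumPy as follows. This is an illustrative sketch, not the implementation behind `ct.convert.P_to_K_T`: the helper names are hypothetical, the RQ step is built from `np.linalg.qr`, and the left (3, 3) block of P is assumed to be non-singular.

```python
import numpy as np


def project_points(points, P):
    """Project (N, 3) world points to (N, 2) pixels with a (3, 4) matrix P."""
    points_h = np.hstack([points, np.ones((len(points), 1))])
    uvw = (P @ points_h.T).T
    return uvw[:, :2] / uvw[:, 2:]


def decompose_projection(P):
    """Split P into K (3, 3) and T (4, 4); hypothetical helper."""
    M = P[:, :3]  # M = K @ R, up to scale
    # RQ decomposition via QR of the row-flipped, transposed matrix.
    Q1, R1 = np.linalg.qr(np.flipud(M).T)
    K = np.fliplr(np.flipud(R1.T))  # upper triangular
    R = np.flipud(Q1.T)             # orthogonal
    # Flip signs so that the diagonal of K is positive.
    D = np.diag(np.sign(np.diag(K)))
    K, R = K @ D, D @ R
    t = np.linalg.solve(K, P[:, 3])
    # P is defined up to scale; a negative scale shows up as det(R) = -1.
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    K = K / K[2, 2]
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return K, T


# Hypothetical camera: 30 degree rotation about Y plus a translation.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.1, -0.2, 1.5])
P = K @ np.hstack([R, t[:, None]])

points = np.array([[0.0, 0.0, 5.0], [1.0, 1.0, 4.0]])
pixels = project_points(points, P)
K_rec, T_rec = decompose_projection(P)
```

Since P is only defined up to a non-zero scale, the decomposition normalizes K so that `K[2, 2] == 1`.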
- For more details, please refer to the following blog posts: part 1, part 2, and part 3.
Future works
- Refined APIs.
- Full PyTorch/NumPy compatibility.
- Unit tests.