# Body Tracking Overview

The body tracking module focuses on detecting and tracking a person's bones. A detected bone is represented by its two end points, also called keypoints. The ZED camera provides 2D and 3D information for each detected keypoint. In addition, the module produces the local rotation between neighboring bones.

## How It Works

The overall process is very similar to the ZED SDK Object Detection module, and the two share part of their output, such as the 3D position and 3D velocity of each person. The body tracking module also uses a neural network for keypoint detection, then relies on the depth and positional tracking modules of the ZED SDK to compute the final 3D position of each keypoint.

The ZED SDK supports two body formats:

**BODY_FORMAT::POSE_18** keypoints are organized as follows, each indexed by an integer ranging from 0 to 17:

| Keypoint index | Keypoint name |
| --- | --- |
| 0 | NOSE |
| 1 | NECK |
| 2 | RIGHT_SHOULDER |
| 3 | RIGHT_ELBOW |
| 4 | RIGHT_WRIST |
| 5 | LEFT_SHOULDER |
| 6 | LEFT_ELBOW |
| 7 | LEFT_WRIST |
| 8 | RIGHT_HIP |
| 9 | RIGHT_KNEE |
| 10 | RIGHT_ANKLE |
| 11 | LEFT_HIP |
| 12 | LEFT_KNEE |
| 13 | LEFT_ANKLE |
| 14 | RIGHT_EYE |
| 15 | LEFT_EYE |
| 16 | RIGHT_EAR |
| 17 | LEFT_EAR |

**BODY_FORMAT::POSE_34** keypoints are organized as follows, each indexed by an integer ranging from 0 to 33:

| Keypoint index | Keypoint name |
| --- | --- |
| 0 | PELVIS |
| 1 | NAVAL_SPINE |
| 2 | CHEST_SPINE |
| 3 | NECK |
| 4 | LEFT_CLAVICLE |
| 5 | LEFT_SHOULDER |
| 6 | LEFT_ELBOW |
| 7 | LEFT_WRIST |
| 8 | LEFT_HAND |
| 9 | LEFT_HANDTIP |
| 10 | LEFT_THUMB |
| 11 | RIGHT_CLAVICLE |
| 12 | RIGHT_SHOULDER |
| 13 | RIGHT_ELBOW |
| 14 | RIGHT_WRIST |
| 15 | RIGHT_HAND |
| 16 | RIGHT_HANDTIP |
| 17 | RIGHT_THUMB |
| 18 | LEFT_HIP |
| 19 | LEFT_KNEE |
| 20 | LEFT_ANKLE |
| 21 | LEFT_FOOT |
| 22 | RIGHT_HIP |
| 23 | RIGHT_KNEE |
| 24 | RIGHT_ANKLE |
| 25 | RIGHT_FOOT |
| 26 | HEAD |
| 27 | NOSE |
| 28 | LEFT_EYE |
| 29 | LEFT_EAR |
| 30 | RIGHT_EYE |
| 31 | RIGHT_EAR |
| 32 | LEFT_HEEL |
| 33 | RIGHT_HEEL |

The ZED SDK can output three levels of information: raw 2D/3D body detection, 3D body tracking, and 3D body fitting.

### 2D/3D Body Detection

The ZED SDK first uses the ZED camera image to infer all 2D bones and keypoints with a neural network. The SDK depth module and positional tracking module are then used together to extract the 3D position of each bone and keypoint.

### 3D Body Tracking

If tracking is enabled, the ZED SDK assigns an identity to each detected body over time. At the same time, by filtering the raw body detection, it outputs a more stable 3D body estimation.

### 3D Body Fitting

Fitting can be enabled to unlock even more information about each identity. The fitting process uses the history of each tracked person to deduce any missing keypoints, thanks to the human kinematic constraints applied by the body tracking module. It can also extract the local rotation between each pair of neighboring bones by solving the inverse kinematics problem. These data are compatible with common avatar animation software; for example, BODY_FORMAT::POSE_34 can be used to animate an avatar in Unreal.
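To illustrate how detection, tracking, and fitting are enabled together, here is a minimal C++ sketch. It assumes the ZED SDK 3.x API, in which body tracking is configured through the Object Detection parameters; names such as `DETECTION_MODEL::HUMAN_BODY_FAST`, `enable_body_fitting`, and the chosen depth mode are assumptions tied to that API generation and may differ in other SDK versions.

```cpp
#include <sl/Camera.hpp>

int main() {
    sl::Camera zed;

    // Open the camera with depth enabled (required to lift 2D keypoints to 3D).
    sl::InitParameters init_params;
    init_params.depth_mode = sl::DEPTH_MODE::ULTRA;
    init_params.coordinate_units = sl::UNIT::METER;
    if (zed.open(init_params) != sl::ERROR_CODE::SUCCESS)
        return 1;

    // Positional tracking is used by the module to express keypoints in 3D.
    zed.enablePositionalTracking();

    // Body tracking is configured through the Object Detection parameters (SDK 3.x layout).
    sl::ObjectDetectionParameters body_params;
    body_params.detection_model     = sl::DETECTION_MODEL::HUMAN_BODY_FAST; // neural network for keypoint detection
    body_params.enable_tracking     = true;                                 // assign a persistent identity to each person
    body_params.enable_body_fitting = true;                                 // deduce missing keypoints + local rotations
    body_params.body_format         = sl::BODY_FORMAT::POSE_34;             // 34-keypoint skeleton
    if (zed.enableObjectDetection(body_params) != sl::ERROR_CODE::SUCCESS)
        return 1;

    // ... grab frames and retrieve bodies (see the retrieval sketch at the end of this page) ...

    zed.disableObjectDetection();
    zed.close();
    return 0;
}
```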
## Detection Outputs

Each detected person is stored as a structure in the ZED SDK that extends the structure used by the Object Detection module. See the Object Detection Outputs page for the attributes shared between body tracking and object detection. The following attributes are only filled by the body tracking module:

| Object data | Description | Output |
| --- | --- | --- |
| 2D keypoint | A set of useful points representing the human body, expressed in 2D. | A vector of [x, y] |
| Keypoint | A set of useful points representing the human body, expressed in 3D. | A vector of [x, y, z] |
| 2D head bounding box | Bounds the head with four 2D points. | Four pixel coordinates |
| 3D head bounding box | Bounds the head with eight 3D points. | Eight 3D coordinates |
| Head position | 3D head centroid. | [x, y, z] |
| Keypoint confidence | Per-keypoint detection confidence. | A vector of float |
| Local position per joint | Stores the local position of each keypoint. | A vector of [x, y, z] |
| Local orientation per joint | Stores the local rotation of each keypoint. | A vector of [x, y, z, w] |
| Global root orientation | Stores the global root orientation of the body. | [x, y, z, w] |

For more information on Body Tracking, see the Using the API page.
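To make the output table concrete, here is a hedged sketch of reading these attributes with the C++ API. It continues the configuration sketch above and assumes the SDK 3.x member names (`keypoint_2d`, `keypoint_confidence`, `local_orientation_per_joint`, `global_root_orientation`); check the Using the API page for the exact fields of your SDK version.

```cpp
#include <iostream>
#include <sl/Camera.hpp>

// Continues the previous sketch: 'zed' is already opened with positional
// tracking and body tracking (object detection) enabled.
void retrieveBodies(sl::Camera& zed) {
    sl::Objects bodies;
    sl::ObjectDetectionRuntimeParameters rt_params;
    rt_params.detection_confidence_threshold = 40; // drop low-confidence detections

    while (zed.grab() == sl::ERROR_CODE::SUCCESS) {
        zed.retrieveObjects(bodies, rt_params);

        for (const auto& person : bodies.object_list) {
            // Attributes shared with the Object Detection module: id, position, velocity, ...
            std::cout << "Person " << person.id << " at " << person.position.x << ", "
                      << person.position.y << ", " << person.position.z << "\n";

            // Body-tracking-specific attributes listed in the table above
            // (bound here for reference; not all of them are printed).
            const auto& kp_2d     = person.keypoint_2d;                  // vector of [x, y] pixel coordinates
            const auto& kp_3d     = person.keypoint;                     // vector of [x, y, z]
            const auto& kp_conf   = person.keypoint_confidence;          // per-keypoint confidence
            const auto& local_rot = person.local_orientation_per_joint;  // [x, y, z, w] per joint (with fitting)
            sl::float4 root_rot   = person.global_root_orientation;      // global root orientation

            // Example: 3D position of keypoint 0 (PELVIS in BODY_FORMAT::POSE_34).
            if (!kp_3d.empty())
                std::cout << "  pelvis: " << kp_3d[0].x << ", " << kp_3d[0].y
                          << ", " << kp_3d[0].z << "\n";
        }
    }
}
```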