Using the Object Detection API

Object Detection Configuration

To configure object detection, use ObjectDetectionParameters at initialization and ObjectDetectionRuntimeParameters to change specific parameters during use.

C++
Python
C#

// Set initialization parameters
ObjectDetectionParameters detection_parameters;
detection_parameters.enable_tracking = true; // Objects will keep the same ID between frames
detection_parameters.enable_mask_output = true; // Outputs 2D masks over detected objects

// Set runtime parameters
ObjectDetectionRuntimeParameters detection_parameters_rt;
detection_parameters_rt.detection_confidence_threshold = 25;

# Set initialization parameters
detection_parameters = sl.ObjectDetectionParameters()
detection_parameters.enable_tracking = True  # Objects will keep the same ID between frames

# Set runtime parameters
detection_parameters_rt = sl.ObjectDetectionRuntimeParameters()
detection_parameters_rt.detection_confidence_threshold = 25

// Set initialization parameters
ObjectDetectionParameters detection_parameters = new ObjectDetectionParameters();
detection_parameters.enableObjectTracking = true; // Objects will keep the same ID between frames

// Set runtime parameters
ObjectDetectionRuntimeParameters detection_parameters_rt = new ObjectDetectionRuntimeParameters();
detection_parameters_rt.detectionConfidenceThreshold = 35;

Several object box detection models are available in the ZED SDK:

General-purpose object detection: DETECTION_MODEL::MULTI_CLASS_BOX, DETECTION_MODEL::MULTI_CLASS_BOX_MEDIUM and DETECTION_MODEL::MULTI_CLASS_BOX_ACCURATE. Choose among them according to the performance/accuracy trade-off you need. These models detect multiple object classes (see OBJECT_CLASS).
Head detection: DETECTION_MODEL::PERSON_HEAD_BOX. This model specializes in detecting and tracking people's heads. It can help in crowded scenes, where people in the background are barely detected by the general-purpose person detection. It is kept separate from the general-purpose models and includes dedicated optimizations that improve detection and tracking accuracy. It detects a single class, OBJECT_CLASS::PERSON, with subclass OBJECT_SUBCLASS::PERSON_HEAD.

You can use detection_parameters.detection_model to set the detection model:

C++
Python
C#

// choose a detection model
detection_parameters.detection_model = DETECTION_MODEL::MULTI_CLASS_BOX;

# choose a detection model
detection_parameters.detection_model = sl.DETECTION_MODEL.MULTI_CLASS_BOX

// choose a detection model
detection_parameters.detectionModel = sl.DETECTION_MODEL.MULTI_CLASS_BOX;

If you want to track objects' motion within their environment, you will first need to activate the positional tracking module. Then, set detection_parameters.enable_tracking to true.

C++
Python
C#

if (detection_parameters.enable_tracking) {
    // Set positional tracking parameters
    PositionalTrackingParameters positional_tracking_parameters;
    // Enable positional tracking
    zed.enablePositionalTracking(positional_tracking_parameters);
}

if detection_parameters.enable_tracking :
    # Set positional tracking parameters
    positional_tracking_parameters = sl.PositionalTrackingParameters()
    # Enable positional tracking
    zed.enable_positional_tracking(positional_tracking_parameters)

if (detection_parameters.enableObjectTracking) {
    // Set positional tracking parameters
    PositionalTrackingParameters trackingParams = new PositionalTrackingParameters();
    // Enable positional tracking
    zed.EnablePositionalTracking(ref trackingParams);
}

With these parameters configured, you can enable the object detection module:

C++
Python
C#

// Enable object detection with initialization parameters
zed_error = zed.enableObjectDetection(detection_parameters);
if (zed_error != ERROR_CODE::SUCCESS) {
    cout << "enableObjectDetection: " << zed_error << "\nExit program.";
    zed.close();
    exit(-1);
}

# Enable object detection with initialization parameters
zed_error = zed.enable_object_detection(detection_parameters)
if zed_error != sl.ERROR_CODE.SUCCESS :
    print("enable_object_detection", zed_error, "\nExit program.")
    zed.close()
    exit(-1)

// Enable object detection with initialization parameters
zed_error = zed.EnableObjectDetection(ref detection_parameters);
if (zed_error != ERROR_CODE.SUCCESS) {
    Console.WriteLine("EnableObjectDetection: " + zed_error + "\nExit program.");
    zed.Close();
    Environment.Exit(-1);
}

Note: Object Detection has been optimized for ZED2/ZED2i and uses the camera motion sensors for improved reliability. Therefore the Object Detection module requires a ZED2/ZED2i or ZED-Mini, and sensors cannot be disabled when using the module.

Getting Object Data

To get the detected objects in a scene, grab a new image with grab(...) and extract the detected objects with retrieveObjects(). The objects' 2D positions are relative to the left image, while the 3D positions are expressed in either the CAMERA or the WORLD reference frame, depending on RuntimeParameters.measure3D_reference_frame (passed to the grab() function).

C++
Python
C#

sl::Objects objects; // Structure containing all the detected objects
if (zed.grab() == ERROR_CODE::SUCCESS) {
  zed.retrieveObjects(objects, detection_parameters_rt); // Retrieve the detected objects
}

objects = sl.Objects() # Structure containing all the detected objects
if zed.grab() == sl.ERROR_CODE.SUCCESS :
  zed.retrieve_objects(objects, detection_parameters_rt) # Retrieve the detected objects

sl.Objects objects = new sl.Objects(); // Structure containing all the detected objects
RuntimeParameters runtimeParameters = new RuntimeParameters();
if (zed.Grab(ref runtimeParameters) == ERROR_CODE.SUCCESS) {
  zed.RetrieveObjects(ref objects, ref detection_parameters_rt); // Retrieve the detected objects
}
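
To make the CAMERA/WORLD distinction concrete, the sketch below converts a point from the camera frame to the world frame in plain Python. The function name camera_to_world and the pose values are hypothetical and not part of the ZED SDK; the SDK already returns positions in the requested frame, so this only illustrates the relationship between the two.

```python
def camera_to_world(point_cam, rotation, translation):
    """Apply p_world = R * p_cam + t for a 3x3 rotation R and a translation t."""
    return tuple(
        sum(rotation[row][col] * point_cam[col] for col in range(3)) + translation[row]
        for row in range(3)
    )

# Hypothetical camera pose: rotated 90 degrees about the Y axis, shifted 1 unit along X.
rotation = [
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
]
translation = (1.0, 0.0, 0.0)

point_cam = (0.0, 0.0, 2.0)  # a point 2 units along the camera's Z axis
point_world = camera_to_world(point_cam, rotation, translation)
print(point_world)  # (3.0, 0.0, 0.0)
```

With a fixed camera and no positional tracking, the two frames coincide; they diverge once the camera moves.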

The sl::Objects class stores all the information about the objects present in the scene in its object_list attribute. Each individual object is stored as an sl::ObjectData that holds its bounding box, position, mask, and so on. All objects from a given frame are stored in a vector within sl::Objects. sl::Objects also contains the timestamp of the detection, which helps match the objects to the corresponding image.

You can iterate through the objects as follows:

C++
Python
C#

for(auto object : objects.object_list)
  std::cout << object.id << " " << object.position << std::endl;

for object in objects.object_list:
  print("{} {}".format(object.id, object.position))

for (int idx = 0; idx < objects.numObject; idx++)
  Console.WriteLine(objects.objectData[idx].id + " " + objects.objectData[idx].position);

Each detected object can be accessed by using its ID as follows:

C++
Python
C#

sl::ObjectData object;
objects.getObjectDataFromId(object, 0); // Get the object with ID = 0

object = sl.ObjectData()
objects.get_object_data_from_id(object, 0) # Get the object with ID = 0

sl.ObjectData objectData = new sl.ObjectData();
objects.GetObjectDataFromId(ref objectData, 0); // Get the object with ID = 0
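
Because tracked objects keep the same ID between frames, a per-object history can be built by keying on that ID. The sketch below uses plain (id, position) tuples as stand-ins for sl::ObjectData, so the sample frames and the helper name update_tracks are illustrative assumptions rather than SDK calls.

```python
from collections import defaultdict

def update_tracks(tracks, frame_objects):
    """Append each object's position to the history stored under its ID."""
    for object_id, position in frame_objects:
        tracks[object_id].append(position)
    return tracks

# Stand-in detections for three consecutive frames: (id, (x, y, z)).
frames = [
    [(0, (0.0, 0.0, 2.0)), (1, (1.0, 0.0, 3.0))],
    [(0, (0.1, 0.0, 2.0)), (1, (1.0, 0.0, 2.9))],
    [(0, (0.2, 0.0, 2.0))],  # object 1 is no longer detected
]

tracks = defaultdict(list)
for frame_objects in frames:
    update_tracks(tracks, frame_objects)

# Print each ID with its number of observations and last known position.
for object_id, history in sorted(tracks.items()):
    print(object_id, len(history), history[-1])
```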

Accessing Object Information

Once an sl::ObjectData is retrieved from the object vector, you can access information such as its ID, position, velocity, label, and tracking_state:

C++
Python
C#

unsigned int object_id = object.id; // Get the object id
sl::float3 object_position = object.position; // Get the object position
sl::float3 object_velocity = object.velocity; // Get the object velocity
sl::OBJECT_TRACKING_STATE object_tracking_state = object.tracking_state; // Get the tracking state of the object
if (object_tracking_state == sl::OBJECT_TRACKING_STATE::OK) {
    cout << "Object " << object_id << " is tracked" << endl;
}

object_id = object.id # Get the object id
object_position = object.position # Get the object position
object_velocity = object.velocity # Get the object velocity
object_tracking_state = object.tracking_state # Get the tracking state of the object
if object_tracking_state == sl.OBJECT_TRACKING_STATE.OK:
    print("Object {0} is tracked\n".format(object_id))

uint object_id = objectData.id; // Get the object id
Vector3 object_position = objectData.position; // Get the object position
Vector3 object_velocity = objectData.velocity; // Get the object velocity
OBJECT_TRACK_STATE object_tracking_state = objectData.objectTrackingState; // Get the tracking state of the object
if(object_tracking_state == sl.OBJECT_TRACK_STATE.OK){
    Console.WriteLine("Object " + object_id + " is tracked");
}
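
The velocity is a 3D vector, so a scalar speed has to be derived from it. A minimal sketch, using a plain tuple in place of the SDK vector type and assuming the velocity is expressed in the configured distance unit per second:

```python
import math

def speed(velocity):
    """Return the Euclidean norm of a 3D velocity vector."""
    return math.sqrt(sum(component * component for component in velocity))

# Stand-in for object.velocity (hypothetical values).
object_velocity = (3.0, 0.0, 4.0)
print(speed(object_velocity))  # 5.0
```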

You can also access the detection confidence of each object. This value indicates how likely it is that the detected object is really present in the scene, so it can be used to post-filter the detected objects. Confidence is expressed on the same 0-100 scale as detection_confidence_threshold. For example, you can ignore objects with a confidence below 10%:

C++
Python
C#

for(auto object : objects.object_list){
  if(object.confidence < 10.f)
    continue;
  // Work with other objects
}

for object in objects.object_list:
  if object.confidence < 10:
    continue
  # Work with other objects

for (int idx = 0; idx < objects.numObject; idx++){
  if(objects.objectData[idx].confidence < 10f)
    continue;
  // Work with other objects
}
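
The same post-filter can be wrapped in a reusable helper. The sketch below is not SDK code: Detection is a stand-in for sl::ObjectData with only the fields needed here, the sample values are made up, and confidence is assumed to lie on a 0-100 scale.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Stand-in for sl::ObjectData with only the fields used here."""
    id: int
    confidence: float  # assumed to be on a 0-100 scale

def filter_by_confidence(detections, min_confidence):
    """Keep detections whose confidence is at least min_confidence."""
    return [d for d in detections if d.confidence >= min_confidence]

detections = [Detection(0, 82.5), Detection(1, 7.0), Detection(2, 41.0)]
kept = filter_by_confidence(detections, min_confidence=10.0)
print([d.id for d in kept])  # [0, 2]
```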

Getting 3D Bounding Boxes

Each detected object contains two bounding boxes: a 2D bounding box and a 3D bounding box. The 2D bounding box is defined in the image frame while the 3D bounding box is provided with the depth information.

The 2D bounding box is represented as four 2D points starting from the top left corner of the object. The 3D bounding box is represented by eight 3D points starting from the top left front corner.

The 2D and 3D bounding boxes are accessible in sl::ObjectData:

C++
Python
C#

vector<sl::uint2> object_2Dbbox = object.bounding_box_2d; // Get the 2D bounding box of the object
vector<sl::float3> object_3Dbbox = object.bounding_box; // Get the 3D bounding box of the object

object_2Dbbox = object.bounding_box_2d # Get the 2D bounding box of the object
object_3Dbbox = object.bounding_box # Get the 3D bounding box of the object

Vector2[] object_2Dbbox = objects.objectData[idx].boundingBox2D; // Get the 2D bounding box of the object
Vector3[] object_3Dbbox = objects.objectData[idx].boundingBox; // Get the 3D bounding box of the object
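
A common use of the 3D bounding box is estimating an object's size. The sketch below computes the axis-aligned extent and center of the eight corners. It takes the min/max over the points, so it does not depend on the corner order; the corner values are made-up stand-ins for object.bounding_box.

```python
def box_extent_and_center(corners):
    """Return ((dx, dy, dz), (cx, cy, cz)) for a list of 3D corner points."""
    mins = [min(corner[axis] for corner in corners) for axis in range(3)]
    maxs = [max(corner[axis] for corner in corners) for axis in range(3)]
    extent = tuple(hi - lo for lo, hi in zip(mins, maxs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    return extent, center

# Hypothetical box: 0.5 wide (x), 2.0 tall (y), 1.0 deep (z).
corners = [
    (x, y, z)
    for y in (0.0, 2.0)
    for x in (-0.25, 0.25)
    for z in (2.0, 3.0)
]

extent, center = box_extent_and_center(corners)
print(extent)  # (0.5, 2.0, 1.0)
print(center)  # (0.0, 1.0, 2.5)
```

For a box that is rotated relative to the axes, this axis-aligned extent overestimates the true dimensions; measuring edge lengths between adjacent corners would then be needed, which requires knowing the corner order.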

Getting the Object Mask

Each object can also be represented by its mask. The mask includes the pixels within the 2D bounding box that belong to the object. Pixels from the object itself are set to 255 while the pixels of the background are set to 0. You can access the mask of an object with sl::Mat object_mask = object.mask;.
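
Since mask pixels are 255 for the object and 0 for the background, the share of the bounding box covered by the object is a simple count. The sketch below uses a nested list as a hypothetical stand-in for the mask data; converting the sl::Mat to an array is not shown here.

```python
def mask_coverage(mask):
    """Return the fraction of mask pixels equal to 255 (object pixels)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    object_pixels = sum(1 for row in mask for value in row if value == 255)
    return object_pixels / total

# Hypothetical 3x4 mask cropped to a 2D bounding box.
mask = [
    [0, 255, 255, 0],
    [255, 255, 255, 255],
    [0, 255, 255, 0],
]
print(mask_coverage(mask))  # 8 of 12 pixels belong to the object
```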

Code Example

For code examples, check out the Tutorial and Sample on GitHub.