We use Detectron2 for object detection. It provides implementations of, and pre-trained weights for, state-of-the-art object detection algorithms. In particular, we use the Faster R-CNN architecture.
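For illustration, a detector of this kind can be instantiated from the Detectron2 model zoo roughly as follows (a sketch; the config name is the standard COCO Faster R-CNN one, not necessarily the one used in this repository):

```python
# Minimal sketch: loading a pre-trained Faster R-CNN with Detectron2.
# The model-zoo config name below is an assumption; replace it with the
# config/weights of the detector you actually fine-tune.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # detection confidence threshold
predictor = DefaultPredictor(cfg)

# Usage: outputs["instances"] holds the predicted boxes, classes and scores.
# outputs = predictor(cv2.imread("frame-000000.color.png"))
```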
To demonstrate the method, we used the [Chess](http://download.microsoft.com/download/2/8/5/28564B23-0828-408F-8631-23B1EFF1DAC8/chess.zip) scene of the [7-Scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset.
You can easily apply the method to your own dataset. Only two files are required (described below):
- **scene model**
- **dataset file**
### Scene model
The localization method is based on a scene model in the form of an ellipsoid cloud. We adopted a simple JSON format for this scene model, describing each ellipsoid by its semi-axes (`axes`), its 3x3 rotation matrix (`R`) and its center (`center`), along with some semantic information (i.e. the object category). We provide a scene model for the Chess scene of the 7-Scenes dataset, composed of 11 objects from 7 categories.
```json
[
  {
    "category_id": 3,
    "object_id": 7,
    "ellipse": {
      "axes": [0.1, 0.2, 0.3],
      "R": [[...], [...], [...]],
      "center": [0.2, 0.2, 0.4]
    }
  },
  ...
]
```
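For reference, such a file can be parsed into NumPy structures along these lines (a sketch; `load_scene_model` is a hypothetical helper, not part of the repository):

```python
# Minimal sketch: parsing the scene-model JSON into NumPy arrays.
# `load_scene_model` is a hypothetical helper, not part of the repository.
import json
import numpy as np

def load_scene_model(path):
    with open(path) as f:
        objects = json.load(f)
    scene = []
    for obj in objects:
        ell = obj["ellipse"]
        scene.append({
            "category_id": obj["category_id"],
            "object_id": obj["object_id"],
            "axes": np.asarray(ell["axes"]),      # semi-axes lengths
            "R": np.asarray(ell["R"]),            # 3x3 rotation matrix
            "center": np.asarray(ell["center"]),  # ellipsoid center
        })
    return scene
```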
### Data preparation
We use a common JSON format for grouping the pose-annotated images of a dataset. We provide a script (`prepare_7-Scenes.py`) that transforms the 7-Scenes dataset into this format; it can easily be adapted to your own dataset.
```json
[
  {
    "file_name": ".../frame-000000.color.png",
    "width": 640,
    "height": 480,
    "K": [...],
    "R": [...],
    "t": [...]
  },
  ...
]
```
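For your own dataset, a file in this format could be produced roughly as follows (a sketch, assuming `K` is the 3x3 intrinsic matrix and `R`, `t` the camera pose of each frame; see `prepare_7-Scenes.py` for the exact conventions):

```python
# Minimal sketch: writing a pose-annotated dataset file in the JSON format above.
# `frames` is a hypothetical list of per-frame dicts holding NumPy arrays;
# check prepare_7-Scenes.py for the exact conventions used by the repository.
import json

def write_dataset(frames, out_path):
    records = []
    for frame in frames:
        records.append({
            "file_name": frame["path"],
            "width": frame["width"],
            "height": frame["height"],
            "K": frame["K"].flatten().tolist(),  # 3x3 intrinsics
            "R": frame["R"].flatten().tolist(),  # camera rotation
            "t": frame["t"].tolist(),            # camera translation
        })
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)
```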
> **WARNING**: Because of assumptions on the camera roll made in P2E (used when only 2 objects are visible), the z-axis of the scene coordinate system needs to be vertical (and the XY-plane horizontal). If this is not the case in your dataset but you still want to handle the two-object case, you will need to transform the scene coordinate system. This is what we did for the Chess scene (see `prepare_7-Scenes.py`).
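Such a change of coordinate system amounts to applying a fixed rotation to the whole scene. A minimal sketch, assuming world-to-camera poses (x_cam = R·X + t) and a rotation `M` mapping the original world frame to a z-up frame:

```python
# Minimal sketch: rotating the scene so that the z-axis becomes vertical.
# Assumption: poses are world-to-camera (x_cam = R @ X + t) and M maps the
# original world frame to the new z-up frame (X_new = M @ X_old).
import numpy as np

# Example M: the original y-axis was vertical; make it the new z-axis.
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def transform_pose(R, t, M):
    # x_cam = R @ X_old + t = (R @ M.T) @ X_new + t, so only R changes
    return R @ M.T, t

def transform_ellipsoid(center, R_ell, M):
    # scene geometry moves with the world frame
    return M @ center, M @ R_ell
```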
### Automatic data annotation
Elliptic annotations of the objects can then be generated from the scene model and the pose-annotated images using `annotate_objects.py`. This adds object annotations (bounding box, category, projected ellipse) to a dataset file. Our JSON format is actually based on the one used by Detectron2 and can thus be used for training both Faster R-CNN and the ellipse prediction networks.
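The geometry behind this annotation step is the classical projection of a dual quadric: an ellipsoid with dual quadric Q* projects to the dual conic C* = P Q* Pᵀ. A self-contained sketch (assuming a world-to-camera pose (R, t) and intrinsics K; not necessarily the repository's exact implementation):

```python
# Minimal sketch: projecting a scene ellipsoid to an image ellipse via dual
# quadrics (C* = P Q* P^T). Assumes a world-to-camera pose (R, t) and 3x3
# intrinsics K; not necessarily the repository's exact implementation.
import numpy as np

def ellipsoid_dual_quadric(axes, R_ell, center):
    # Dual quadric of an axis-aligned ellipsoid, moved to its pose in the scene.
    Q0 = np.diag(np.concatenate([np.square(axes), [-1.0]]))
    T = np.eye(4)
    T[:3, :3] = R_ell
    T[:3, 3] = center
    return T @ Q0 @ T.T

def project_ellipsoid(K, R, t, axes, R_ell, center):
    P = K @ np.hstack([R, t.reshape(3, 1)])           # 3x4 projection matrix
    C = P @ ellipsoid_dual_quadric(axes, R_ell, center) @ P.T
    C = C / (-C[2, 2])                                # normalize the dual conic
    ell_center = -C[:2, 2]                            # ellipse center (pixels)
    A = C[:2, :2] + np.outer(ell_center, ell_center)  # centered 2x2 block
    w, V = np.linalg.eigh(A)                          # w: squared semi-axes
    return ell_center, np.sqrt(w), V                  # center, semi-axes, orientation
```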
The full pre-processing pipeline (preparation + annotation) for generating the training and testing datasets for the Chess scene can be run with:
```
sh run_preprocessing.sh path/to/chess/scene/folder
```
### Results
The output images show the result of the ellipse-IoU-based RANSAC. The object detections found by Faster R-CNN are shown as white boxes. The bold ellipses represent the ellipsoids of the scene model projected with the estimated camera pose; the thin ones correspond to the ellipse predictions.
Color code:
- <span style="color:green">*green*</span>: predicted ellipses and projected ellipsoids used inside the pose computation (P3P or P2E).
- <span style="color:blue">*blue*</span>: predicted ellipses and projected ellipsoids not directly used inside the pose computation, but selected as inliers in the validation step of RANSAC.
- <span style="color:red">*red*</span>: predicted ellipses and projected ellipsoids not used for pose computation.
The top-left value is the position error (in meters) and the top-right value is the orientation error (in degrees).
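These are the usual camera-pose metrics; a sketch of how they can be computed from the ground-truth and estimated world-to-camera poses (assuming x_cam = R·X + t, so the camera center is `-R.T @ t`):

```python
# Minimal sketch: standard position/orientation errors between two poses.
# Assumes world-to-camera poses (x_cam = R @ X + t).
import numpy as np

def pose_errors(R_gt, t_gt, R_est, t_est):
    # position error: distance between the two camera centers (meters)
    c_gt = -R_gt.T @ t_gt
    c_est = -R_est.T @ t_est
    pos_err = np.linalg.norm(c_gt - c_est)
    # orientation error: angle of the relative rotation (degrees)
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rot_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return pos_err, rot_err
```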
Notice that there might be several ellipses predicted per object, as several objects of the same category can be present in the scene and the detection module recognizes only object categories (not instances).