We use Detectron2 for object detection. It provides implementations of, and pre-trained weights for, state-of-the-art object detection algorithms. In particular, we use the Faster R-CNN architecture.
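For illustration, a detector of this kind can be instantiated from the Detectron2 model zoo roughly as follows (a sketch; the config name is the standard COCO Faster R-CNN one, not necessarily the one used in this repository):

```python
# Minimal sketch: loading a pre-trained Faster R-CNN with Detectron2.
# The model-zoo config name below is an assumption; replace it with the
# config/weights of the detector you actually fine-tune.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # detection confidence threshold
predictor = DefaultPredictor(cfg)

# Usage: outputs["instances"] holds the predicted boxes, classes and scores.
# outputs = predictor(cv2.imread("frame-000000.color.png"))
```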
To demonstrate the method, we used the [Chess](http://download.microsoft.com/download/2/8/5/28564B23-0828-408F-8631-23B1EFF1DAC8/chess.zip) scene of the [7-Scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset.
You can easily apply the method to your own dataset. Only two files are required (described below):
- **scene model**
- **dataset file**
### Scene model
The localization method is based on a scene model in the form of an ellipsoid cloud. We adopted a simple JSON format for this scene model, describing each ellipsoid by its semi-axes (`axes`), its 3x3 rotation matrix (`R`) and its center (`center`), along with some semantic information (i.e. the object category). We provide a scene model for the Chess scene of the 7-Scenes dataset, composed of 11 objects from 7 categories.
```json
[
  {
    "category_id": 3,
    "object_id": 7,
    "ellipse": {
      "axes": [0.1, 0.2, 0.3],
      "R": [[...], [...], [...]],
      "center": [0.2, 0.2, 0.4]
    }
  },
  ...
]
```
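For reference, such a file can be parsed into NumPy structures along these lines (a sketch; `load_scene_model` is a hypothetical helper, not part of the repository):

```python
# Minimal sketch: parsing the scene-model JSON into NumPy arrays.
# `load_scene_model` is a hypothetical helper, not part of the repository.
import json
import numpy as np

def load_scene_model(path):
    with open(path) as f:
        objects = json.load(f)
    scene = []
    for obj in objects:
        ell = obj["ellipse"]
        scene.append({
            "category_id": obj["category_id"],
            "object_id": obj["object_id"],
            "axes": np.asarray(ell["axes"]),      # semi-axes lengths
            "R": np.asarray(ell["R"]),            # 3x3 rotation matrix
            "center": np.asarray(ell["center"]),  # ellipsoid center
        })
    return scene
```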
### Data preparation
We use a common JSON format for grouping the pose-annotated images of a dataset. We provide a script (`prepare_7-Scenes.py`) that transforms the 7-Scenes dataset into this format; it can easily be adapted to your own dataset.
```json
[
  {
    "file_name": ".../frame-000000.color.png",
    "width": 640,
    "height": 480,
    "K": [...],
    "R": [...],
    "t": [...]
  },
  ...
]
```
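For your own dataset, a file in this format could be produced roughly as follows (a sketch, assuming `K` is the 3x3 intrinsic matrix and `R`, `t` the camera pose of each frame; see `prepare_7-Scenes.py` for the exact conventions):

```python
# Minimal sketch: writing a pose-annotated dataset file in the JSON format above.
# `frames` is a hypothetical list of per-frame dicts holding NumPy arrays;
# check prepare_7-Scenes.py for the exact conventions used by the repository.
import json

def write_dataset(frames, out_path):
    records = []
    for frame in frames:
        records.append({
            "file_name": frame["path"],
            "width": frame["width"],
            "height": frame["height"],
            "K": frame["K"].flatten().tolist(),  # 3x3 intrinsics
            "R": frame["R"].flatten().tolist(),  # camera rotation
            "t": frame["t"].tolist(),            # camera translation
        })
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)
```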
> **WARNING**: Because of assumptions on the camera roll made in P2E (used when only 2 objects are visible), the z-axis of the scene coordinate system needs to be vertical (and the XY-plane horizontal). If this is not the case in your dataset but you still want to handle the two-object case, you will need to transform the scene coordinate system. This is what we did for the Chess scene (see `prepare_7-Scenes.py`).
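Such a change of coordinate system amounts to applying a fixed rotation to the whole scene. A minimal sketch, assuming world-to-camera poses (x_cam = R·X + t) and a rotation `M` mapping the original world frame to a z-up frame:

```python
# Minimal sketch: rotating the scene so that the z-axis becomes vertical.
# Assumption: poses are world-to-camera (x_cam = R @ X + t) and M maps the
# original world frame to the new z-up frame (X_new = M @ X_old).
import numpy as np

# Example M: the original y-axis was vertical; make it the new z-axis.
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def transform_pose(R, t, M):
    # x_cam = R @ X_old + t = (R @ M.T) @ X_new + t, so only R changes
    return R @ M.T, t

def transform_ellipsoid(center, R_ell, M):
    # scene geometry moves with the world frame
    return M @ center, M @ R_ell
```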
### Automatic data annotation
Elliptic annotations of the objects can then be generated from the scene model and the pose-annotated images using `annotate_objects.py`. This adds object annotations (bounding box, category, projected ellipse) to a dataset file. Our JSON format is actually based on the one used by Detectron2 and can thus be used for training both Faster R-CNN and the ellipse prediction networks.
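The geometry behind this annotation step is the classical projection of a dual quadric: an ellipsoid with dual quadric Q* projects to the dual conic C* = P Q* Pᵀ. A self-contained sketch (assuming a world-to-camera pose (R, t) and intrinsics K; not necessarily the repository's exact implementation):

```python
# Minimal sketch: projecting a scene ellipsoid to an image ellipse via dual
# quadrics (C* = P Q* P^T). Assumes a world-to-camera pose (R, t) and 3x3
# intrinsics K; not necessarily the repository's exact implementation.
import numpy as np

def ellipsoid_dual_quadric(axes, R_ell, center):
    # Dual quadric of an axis-aligned ellipsoid, moved to its pose in the scene.
    Q0 = np.diag(np.concatenate([np.square(axes), [-1.0]]))
    T = np.eye(4)
    T[:3, :3] = R_ell
    T[:3, 3] = center
    return T @ Q0 @ T.T

def project_ellipsoid(K, R, t, axes, R_ell, center):
    P = K @ np.hstack([R, t.reshape(3, 1)])           # 3x4 projection matrix
    C = P @ ellipsoid_dual_quadric(axes, R_ell, center) @ P.T
    C = C / (-C[2, 2])                                # normalize the dual conic
    ell_center = -C[:2, 2]                            # ellipse center (pixels)
    A = C[:2, :2] + np.outer(ell_center, ell_center)  # centered 2x2 block
    w, V = np.linalg.eigh(A)                          # w: squared semi-axes
    return ell_center, np.sqrt(w), V                  # center, semi-axes, orientation
```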
The full pre-processing pipeline (preparation + annotation) for generating the training and testing datasets for the Chess scene can be run with:
```
sh run_preprocessing.sh path/to/chess/scene/folder
```
### Results
The output images show the result of the ellipse-IoU-based RANSAC. The object detections found by Faster R-CNN are shown as white boxes. The bold ellipses represent the ellipsoids of the scene model projected with the estimated camera pose; the thin ones correspond to the ellipse predictions.
Color code:
- <span style="color:green">*green*</span>: predicted ellipses and projected ellipsoids used inside the pose computation (P3P or P2E).
- <span style="color:blue">*blue*</span>: predicted ellipses and projected ellipsoids not directly used inside the pose computation, but selected as inliers in the validation step of RANSAC.
- <span style="color:red">*red*</span>: predicted ellipses and projected ellipsoids not used for pose computation.
The top-left value is the position error (in meters) and the top-right value is the orientation error (in degrees).
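These are the usual camera-pose metrics; a sketch of how they can be computed from the ground-truth and estimated world-to-camera poses (assuming x_cam = R·X + t, so the camera center is `-R.T @ t`):

```python
# Minimal sketch: standard position/orientation errors between two poses.
# Assumes world-to-camera poses (x_cam = R @ X + t).
import numpy as np

def pose_errors(R_gt, t_gt, R_est, t_est):
    # position error: distance between the two camera centers (meters)
    c_gt = -R_gt.T @ t_gt
    c_est = -R_est.T @ t_est
    pos_err = np.linalg.norm(c_gt - c_est)
    # orientation error: angle of the relative rotation (degrees)
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rot_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return pos_err, rot_err
```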
Notice that there might be several ellipses predicted per object, as several objects of the same category can be present in the scene and the detection module recognizes only object categories (not instances).