Setup
Repository: https://github.com/Simple-Robotics/cosypose
```bash
git clone --recurse-submodules https://github.com/Simple-Robotics/cosypose.git
```
Note: when running this step, pip reports that setuptools and matplotlib-inline are incompatible with Python 3.7.6; install compatible versions manually in the environment.
```bash
conda activate cosypose
```
```bash
git lfs pull
```
Download the data following the README.
Note: the first block of download commands no longer works. According to https://bop.felk.cvut.cz/datasets/, the download links have moved to Hugging Face; the test set can be downloaded manually from https://huggingface.co/datasets/bop-benchmark/datasets/tree/main/ycbv and placed under local_data/bop_datasets/ycbv/test.
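For convenience, here is a minimal sketch of fetching the archive with `huggingface_hub` instead of clicking through the web UI. The repo id comes from the URL above; the archive name `ycbv/ycbv_test_all.zip` is an assumption, so check the file listing on the dataset page first.

```python
# Minimal sketch: download a YCB-V test archive from the Hugging Face mirror of
# the BOP datasets and unpack it into cosypose's expected layout.
# Assumption: the archive is named "ycbv/ycbv_test_all.zip"; verify on the dataset page.
from pathlib import Path
import zipfile

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

archive = hf_hub_download(
    repo_id="bop-benchmark/datasets",
    repo_type="dataset",
    filename="ycbv/ycbv_test_all.zip",  # hypothetical file name
)

ycbv_dir = Path("local_data/bop_datasets/ycbv")
ycbv_dir.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(archive) as zf:
    # BOP test archives should unpack to test/<scene_id>/..., which is the layout cosypose expects
    zf.extractall(ycbv_dir)
```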
Set up the models used for testing:
```bash
# -r: the models folder is a directory of mesh files
cp -r ./local_data/bop_datasets/ycbv/model_bop_compat_eval ./local_data/bop_datasets/ycbv/models
```
Debug
`np.where(mask)[0].item()`
When running

```bash
export CUDA_VISIBLE_DEVICES=0
```

the following error is raised:
```
Traceback (most recent call last):
```
Adding debug output gives:
```
Debug - scene_id: 48, view_id: 1
```
It turns out that the downloaded test set does not contain all of the frames listed in the dataset's keyframe.txt, so some keyframes cannot be found.
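A quick way to see how bad the mismatch is: compare keyframe.txt against the images actually on disk. This is only a sketch; the `SSSS/FFFFFF` line format of keyframe.txt and the `test/<scene>/rgb/<frame>.png` layout are assumptions about the local data, adjust as needed.

```python
# Sketch: report keyframes listed in keyframe.txt that are missing from the
# downloaded test split (assumed layout: test/<scene>/rgb/<frame>.png).
from pathlib import Path

ds_dir = Path("local_data/bop_datasets/ycbv")
missing = []
for line in (ds_dir / "keyframe.txt").read_text().splitlines():
    if not line.strip():
        continue
    scene, view = (int(x) for x in line.strip().split("/"))
    rgb = ds_dir / "test" / f"{scene:06d}" / "rgb" / f"{view:06d}.png"
    if not rgb.exists():
        missing.append((scene, view))

print(f"{len(missing)} keyframes are missing from the test split")
print(missing[:10])
```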
If a run is terminated partway through
To start a fresh training run: clear local_data/joblib_cache.
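A small helper for that (it simply removes and recreates the cache directory used by joblib memoization):

```python
# Sketch: clear the joblib cache so a new run does not reuse memoized results.
import shutil
from pathlib import Path

cache_dir = Path("local_data/joblib_cache")
if cache_dir.exists():
    shutil.rmtree(cache_dir)
cache_dir.mkdir(parents=True, exist_ok=True)
```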
Framework
Prediction script: `cosypose.scripts.run_cosypose_eval`
AI explanation
The script predicts object poses based on multi-view input by following these steps:
1. Dataset loading: It first loads the dataset using the `make_scene_dataset` function, which prepares the scene data for evaluation. The dataset is wrapped in a `MultiViewWrapper` to handle multiple views.
2. Model loading: The script loads pre-trained models for pose prediction using the `load_models` function. It loads both coarse and refiner models based on the configuration specified in the command-line arguments.
3. Prediction setup: The script sets up the prediction parameters, including the number of iterations for the coarse and refiner models, and whether to skip multi-view processing based on the number of views specified.
4. Multi-view prediction: The `MultiviewScenePredictor` is initialized with the mesh database, which is used to predict poses across multiple views. The `MultiviewPredictionRunner` is then used to run predictions on the dataset, leveraging the multi-view setup to improve pose estimation accuracy.
5. Pose estimation: The script uses the loaded models to predict object poses. It processes detections from either `pix2pose` or `posecnn` depending on the dataset, and refines these predictions using the refiner model.
6. Evaluation: After prediction, the script evaluates the predicted poses using the `PoseEvaluation` class. It calculates metrics such as ADD-S and AUC to assess the accuracy of the pose predictions.
7. Results logging: Finally, the script logs the results, including evaluation metrics, and saves them to a specified directory.
The multi-view approach allows the script to leverage information from different viewpoints, which can help resolve ambiguities and improve the robustness of the pose estimation.
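As a rough picture of how these steps connect, here is a condensed sketch. The class and function names come from the script, but the module paths, dataset keys, and argument names are assumptions and simplifications, so treat it as pseudocode rather than the script itself.

```python
# Condensed sketch of the prediction flow (assumed module paths and simplified
# signatures; see cosypose/scripts/run_cosypose_eval.py for the real code).
from cosypose.datasets.datasets_cfg import make_scene_dataset
from cosypose.datasets.wrappers.multiview_wrapper import MultiViewWrapper

# 1) Dataset loading: build the scene dataset and group frames into multi-view groups.
scene_ds = make_scene_dataset('ycbv.test.keyframes')  # assumed dataset key
scene_ds = MultiViewWrapper(scene_ds, n_views=5)       # assumed constructor arguments

# 2) Model loading: coarse + refiner pose models.
# coarse_model, refiner_model, mesh_db = load_models(coarse_run_id, refiner_run_id)

# 3)-4) Single-view candidates from posecnn/pix2pose detections, then
# multi-view reconciliation of the candidates into a consistent scene:
# mv_predictor = MultiviewScenePredictor(mesh_db)
# predictions = mv_predictor.predict_scene_state(candidates, cameras)

# 5)-6) Evaluation (ADD-S, AUC, ...) with PoseEvaluation, then logging of the results.
```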
Model dataset for the initialization of `MultiviewScenePredictor`
In `run_cosypose_eval` we initialize the `MultiviewScenePredictor` in this way:
```python
mv_predictor = MultiviewScenePredictor(mesh_db)
```
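Where does `mesh_db` come from? Below is a hedged sketch of a typical construction in cosypose; the module paths and the `make_object_dataset` key are assumptions, not a quote of the script.

```python
# Sketch: build a mesh database from the evaluation object models and hand it
# to the multi-view predictor (assumed module paths and dataset key).
from cosypose.datasets.datasets_cfg import make_object_dataset
from cosypose.lib3d.rigid_mesh_database import MeshDataBase
from cosypose.integrated.multiview_predictor import MultiviewScenePredictor

object_ds = make_object_dataset('ycbv.bop-compat.eval')  # assumed dataset key
mesh_db = MeshDataBase.from_object_ds(object_ds)

mv_predictor = MultiviewScenePredictor(mesh_db)
```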
In the `MultiviewScenePredictor` we use the `mesh_db` to initialize `MultiviewRefinement` and solve:
```python
problem = MultiviewRefinement(candidates=candidates_n,
```
The `solve` function of `MultiviewRefinement`:
```python
def solve(self, sample_n_init=1, **lm_kwargs):
```
Adaptation
No rush for now.