
Cosypose modification

Setup

Repository: https://github.com/Simple-Robotics/cosypose

git clone --recurse-submodules https://github.com/Simple-Robotics/cosypose.git
cd cosypose
conda env create -n cosypose --file environment.yaml

Note: during this step, pip will complain that setuptools and matplotlib-inline are incompatible with Python 3.7.6; install compatible versions into the environment manually:

conda activate cosypose
pip install setuptools==63.4.1
pip install matplotlib-inline==0.1.6
git lfs pull
python setup.py install
python setup.py develop

Download the data following the README.
Note that the first block of commands fails to download. According to https://bop.felk.cvut.cz/datasets/, the download links have migrated to Hugging Face; the test set can be downloaded manually from https://huggingface.co/datasets/bop-benchmark/datasets/tree/main/ycbv and placed in local_data/bop_datasets/ycbv/test.
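
A scripted alternative, as a minimal sketch using the huggingface_hub package (the archive pattern and target directory here are assumptions; check the repo's actual file layout before extracting):

from huggingface_hub import snapshot_download

# Fetch only the ycbv test archives from the Hub (the pattern is an assumed naming).
snapshot_download(
    repo_id="bop-benchmark/datasets",
    repo_type="dataset",
    allow_patterns=["ycbv/*test*"],
    local_dir="local_data/bop_downloads",  # unpack into local_data/bop_datasets/ycbv/test afterwards
)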

Set up the models used for testing:

cp -r ./local_data/bop_datasets/ycbv/model_bop_compat_eval ./local_data/bop_datasets/ycbv/models

Debug

np.where(mask)[0].item()

When running

export CUDA_VISIBLE_DEVICES=0 
python -m cosypose.scripts.run_cosypose_eval --config ycbv

the following error occurs:

Traceback (most recent call last):
  File "/home/cyl/.conda/envs/cosypose/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cyl/.conda/envs/cosypose/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cyl/cosypose/cosypose/scripts/run_cosypose_eval.py", line 491, in <module>
    main()
  File "/home/cyl/cosypose/cosypose/scripts/run_cosypose_eval.py", line 332, in main
    scene_ds = make_scene_dataset(ds_name)
  File "/home/cyl/cosypose/cosypose/datasets/datasets_cfg.py", line 68, in make_scene_dataset
    ids.append(np.where(mask)[0].item())
ValueError: can only convert an array of size 1 to a Python scalar

Adding debug output yields:

Debug - scene_id: 48, view_id: 1
Debug - mask matches: 1
Debug - where result shape: (1,), values: [225]
Debug - scene_id: 48, view_id: 36
Debug - mask matches: 1
Debug - where result shape: (1,), values: [226]
Debug - scene_id: 48, view_id: 47
Debug - mask matches: 1
Debug - where result shape: (1,), values: [227]
Debug - scene_id: 48, view_id: 83
Debug - mask matches: 1
Debug - where result shape: (1,), values: [228]
Debug - scene_id: 48, view_id: 112
Debug - mask matches: 1
Debug - where result shape: (1,), values: [229]
Debug - scene_id: 48, view_id: 135
Debug - mask matches: 0
Debug - where result shape: (0,), values: []
0:00:00.912023 - Expected exactly one match, got 0 matches for scene_id=48, view_id=135

It turns out that the downloaded test set does not contain every frame listed in the dataset's keyframe.txt, so some keyframes cannot be found.
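
A possible workaround is to make make_scene_dataset tolerant of the missing frames. A hedged sketch of a patch to datasets_cfg.py (variable names follow the debug output above; this is not the upstream fix):

# Hypothetical tolerant replacement for ids.append(np.where(mask)[0].item()):
# skip keyframes listed in keyframe.txt but absent from the local test set.
matches = np.where(mask)[0]
if len(matches) == 1:
    ids.append(matches.item())
else:
    print(f'Skipping scene_id={scene_id}, view_id={view_id}: {len(matches)} matches')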

If a run is interrupted partway

To restart a fresh run: clear local_data/joblib_cache.

Framework

Prediction Script cosypose.scripts.run_cosypose_eval

AI explanation

The script predicts object poses based on multi-view input by following these steps:

  1. Dataset Loading: It first loads the dataset using the make_scene_dataset function, which prepares the scene data for evaluation. The dataset is wrapped in a MultiViewWrapper to handle multiple views.

  2. Model Loading: The script loads pre-trained models for pose prediction using the load_models function. It loads both coarse and refiner models based on the configuration specified in the command-line arguments.

  3. Prediction Setup: The script sets up the prediction parameters, including the number of iterations for coarse and refiner models, and whether to skip multi-view processing based on the number of views specified.

  4. Multi-view Prediction: The MultiviewScenePredictor is initialized with the mesh database, which is used to predict poses across multiple views. The MultiviewPredictionRunner is then used to run predictions on the dataset, leveraging the multi-view setup to improve pose estimation accuracy.

  5. Pose Estimation: The script uses the loaded models to predict object poses. It processes detections from either pix2pose or posecnn depending on the dataset, and refines these predictions using the refiner model.

  6. Evaluation: After predictions, the script evaluates the predicted poses using the PoseEvaluation class. It calculates various metrics like ADD-S and AUC to assess the accuracy of the pose predictions.

  7. Results Logging: Finally, the script logs the results, including evaluation metrics, and saves them to a specified directory.

The multi-view approach allows the script to leverage information from different viewpoints, which can help resolve ambiguities and improve the robustness of the pose estimation.

Prediction Script run_custom_scenario

Terms

TCO

Transformation from Camera to Object.
It represents the transformation matrix or parameters that describe the pose of an object relative to the camera’s coordinate system

TWO

Transformation from World to Object.
It represents the transformation matrix or parameters that describe the pose of an object relative to the world’s coordinate system
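
In code, with the T_AB convention that a matrix maps frame-B coordinates into frame A, the two transforms are related through the camera pose. A minimal numpy sketch (identity matrices stand in for real poses):

import numpy as np

TWC = np.eye(4)           # camera pose in the world frame
TWO = np.eye(4)           # object pose in the world frame
TCW = np.linalg.inv(TWC)  # world frame expressed in the camera frame
TCO = TCW @ TWO           # object pose relative to the camera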

Model dataset

class MeshDataBase:
    def __init__(self, obj_list):
        self.infos = {obj['label']: obj for obj in obj_list}
        self.meshes = {l: trimesh.load(obj['mesh_path']) for l, obj in self.infos.items()}

    @staticmethod
    def from_object_ds(object_ds):
        obj_list = [object_ds[n] for n in range(len(object_ds))]
        return MeshDataBase(obj_list)
    ...

The typical initialization:

object_ds = BOPObjectDataset(scenario_dir / 'models')
mesh_db = MeshDataBase.from_object_ds(object_ds)

It can also be loaded together with the models via load_models:

predictor, mesh_db = load_models(coarse_run_id, refiner_run_id, n_workers=n_plotters, object_set=object_set)

Important Classes

MultiViewWrapper

Purpose:
Wraps a scene_dataset and splits the data into groups of n_views views per scene, making it easy to iterate over the scene elements (all ground truth here).
Each item yields:

  • n_views RGB images from different viewpoints
  • n_views corresponding masks
  • n_views corresponding observations
    • poses and labels of the recognized objects
    • camera pose and intrinsics
    • frame_info, rarely needed

scene_ds_pred = MultiViewWrapper(scene_ds, n_views=n_views)
scene_ds_pred[0][2] # scene 48, multiview group 1's observations in five views
[
{'objects':
[
{'label': 'obj_000001',
'name': 'obj_000001',
'TWO': array([[-0.02062261, -0.99870347, -0.04654345, -0.05380909],
[ 0.99854439, -0.022895 , 0.04883047, 0.00189095],
[-0.04983272, -0.04546878, 0.9977229 , 0.07060698],
[ 0. , 0. , 0. , 1. ]]),
'T0O': array([[-0.02062261, -0.99870347, -0.04654345, -0.05380909],
[ 0.99854439, -0.022895 , 0.04883047, 0.00189095],
[-0.04983272, -0.04546878, 0.9977229 , 0.07060698],
[ 0. , 0. , 0. , 1. ]]),
'visib_fract': 0.7769277845777234,
'id_in_segm': 1,
'bbox': [347, 210, 467, 374]},
{'label': 'obj_000006',
'name': 'obj_000006',
'TWO': array([[-0.40056693, 0.91475543, -0.05262471, 0.03103553],
[-0.91622629, -0.39934108, 0.03248866, -0.02365388],
[ 0.00870386, 0.06123014, 0.9980863 , 0.01391488],
[ 0. , 0. , 0. , 1. ]]),
'T0O': array([[-0.40056693, 0.91475543, -0.05262471, 0.03103553],
[-0.91622629, -0.39934108, 0.03248866, -0.02365388],
[ 0.00870386, 0.06123014, 0.9980863 , 0.01391488],
[ 0. , 0. , 0. , 1. ]]),
'visib_fract': 0.9990349353406678,
'id_in_segm': 2,
'bbox': [328, 343, 422, 405]},
{'label': 'obj_000014',
'name': 'obj_000014',
'TWO': array([[ 0.24178672, -0.96941339, -0.04215706, -0.05206396],
[ 0.96977496, 0.2399519 , 0.0442575 , 0.0179453 ],
[-0.03278805, -0.05158388, 0.99813144, 0.16636215],
[ 0. , 0. , 0. , 1. ]]),
'T0O': array([[ 0.24178672, -0.96941339, -0.04215706, -0.05206396],
[ 0.96977496, 0.2399519 , 0.0442575 , 0.0179453 ],
[-0.03278805, -0.05158388, 0.99813144, 0.16636215],
[ 0. , 0. , 0. , 1. ]]),
'visib_fract': 0.9938250428816466,
'id_in_segm': 3,
'bbox': [372, 143, 490, 241]},
{'label': 'obj_000019',
'name': 'obj_000019',
'TWO': array([[-0.69888905, 0.1926738 , -0.68878937, 0.01412755],
[ 0.711967 , 0.27928957, -0.64428215, 0.05127768],
[ 0.06823575, -0.94067797, -0.33237011, 0.06472594],
[ 0. , 0. , 0. , 1. ]]),
'T0O': array([[-0.69888905, 0.1926738 , -0.68878937, 0.01412755],
[ 0.711967 , 0.27928957, -0.64428215, 0.05127768],
[ 0.06823575, -0.94067797, -0.33237011, 0.06472594],
[ 0. , 0. , 0. , 1. ]]),
'visib_fract': 0.9890470974808324,
'id_in_segm': 4,
'bbox': [419, 222, 527, 410]},
{'label': 'obj_000020',
'name': 'obj_000020',
'TWO': array([[-0.74512542, -0.66691536, 0.00352083, 0.07854437],
[-0.6669148 , 0.74507458, -0.00940455, -0.15283599],
[ 0.00364864, -0.00935569, -0.99995023, 0.01854317],
[ 0. , 0. , 0. , 1. ]]),
'T0O': array([[-0.74512542, -0.66691536, 0.00352083, 0.07854437],
[-0.6669148 , 0.74507458, -0.00940455, -0.15283599],
[ 0.00364864, -0.00935569, -0.99995023, 0.01854317],
[ 0. , 0. , 0. , 1. ]]),
'visib_fract': 0.9953060637992145,
'id_in_segm': 5,
'bbox': [92, 328, 288, 442]}],
'camera':
{'T0C': array([[-0.0792652 , 0.241296 , -0.967209 , 0.946419 ],
[ 0.996102 , 0.0568396 , -0.0674529 , -0.02116569],
[ 0.0386997 , -0.968786 , -0.244861 , 0.36645836],
[ 0. , 0. , 0. , 1. ]]),
'K': array([[1.066778e+03, 0.000000e+00, 3.129869e+02],
[0.000000e+00, 1.067487e+03, 2.413109e+02],
[0.000000e+00, 0.000000e+00, 1.000000e+00]]),
'TWC': array([[-0.0792652 , 0.241296 , -0.967209 , 0.946419 ],
[ 0.996102 , 0.0568396 , -0.0674529 , -0.02116569],
[ 0.0386997 , -0.968786 , -0.244861 , 0.36645836],
[ 0. , 0. , 0. , 1. ]]),
'resolution': torch.Size([480, 640])},
'frame_info':
{
'scene_id': 48,
'cam_id': 'cam',
'view_id': 1626,
'cam_name': 'cam',
'group_id': 0
}
},
... # other views
]
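
For instance, the ground-truth intrinsics and object poses can be read directly out of such an observation (a small sketch based on the structure above):

obs = scene_ds_pred[0][2][0]  # first view of the first multiview group
K = obs['camera']['K']        # 3x3 camera intrinsics
TWC = obs['camera']['TWC']    # camera pose in the world frame
for obj in obs['objects']:
    print(obj['label'], obj['TWO'][:3, 3])  # object translation in the world frame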

MultiviewPredictionRunner

Purpose:
Takes a MultiViewWrapper as input and produces predictions.

First, the dataset is ingested:

dataloader = DataLoader(scene_ds, batch_size=batch_size,
                        num_workers=n_workers,
                        sampler=sampler,
                        collate_fn=self.collate_fn)

collate_fn is used to process the raw data (the comments near the end mark the data that is actually used):

def collate_fn(self, batch):
    batch_im_id = -1

    cam_infos, K = [], []
    det_infos, bboxes = [], []
    for n, data in enumerate(batch):  # normally only one batch
        assert n == 0
        images, masks, obss = data
        for c, obs in enumerate(obss):  # iterate over the different views
            batch_im_id += 1
            frame_info = obs['frame_info']
            im_info = {k: frame_info[k] for k in ('scene_id', 'view_id', 'group_id')}  # info for the image
            im_info.update(batch_im_id=batch_im_id)
            cam_info = im_info.copy()  # info for the camera

            K.append(obs['camera']['K'])  # camera intrinsics
            cam_infos.append(cam_info)

            for o, obj in enumerate(obs['objects']):
                obj_info = dict(
                    label=obj['name'],
                    score=1.0,
                )
                obj_info.update(im_info)  # add key-value pairs from im_info to obj_info
                bboxes.append(obj['bbox'])
                det_infos.append(obj_info)

    gt_detections = tc.PandasTensorCollection(
        infos=pd.DataFrame(det_infos),
        bboxes=torch.as_tensor(np.stack(bboxes)),
    )  # basic info and bounding box for every ground-truth detection
    cameras = tc.PandasTensorCollection(
        infos=pd.DataFrame(cam_infos),
        K=torch.as_tensor(np.stack(K)),
    )  # basic info for each view's camera (same fields as the detection info), plus intrinsics
    data = dict(
        images=images,
        cameras=cameras,
        gt_detections=gt_detections,
    )
    return data

The most important function: get_predictions

def get_predictions(self, pose_predictor, mv_predictor,
                    detections=None,
                    n_coarse_iterations=1, n_refiner_iterations=1,
                    sv_score_th=0.0, skip_mv=True,
                    use_detections_TCO=False):

Responsible for generating predictions for object poses in a scene using both single-view and multi-view approaches; a call sketch follows the list below.

  1. Input Parameters:

    • pose_predictor: the single-view predictor; for the ycbv dataset, the detections come from the PoseCNN model.
    • mv_predictor: An object or function that predicts scene states using multi-view information.
    • detections: A collection of detected objects with associated information, pre-generated and saved in a .pkl file
    • n_coarse_iterations, n_refiner_iterations: Number of iterations for coarse and refinement pose estimation.
    • sv_score_th: Score threshold for single-view detections.
    • skip_mv: A flag to skip multi-view predictions.
    • use_detections_TCO: A flag to use detections for initial pose estimation.
  2. Filtering Detections:
    Note that the detections used here come directly from pre-saved detection results (not ground truth):

    posecnn_detections = load_posecnn_results()
    • The function filters the input detections based on the sv_score_th threshold.
    • It assigns a unique detection ID to each detection and creates an index based on scene_id and view_id.
  3. Iterating Over Data:

    • The function iterates over batches of data from the dataloader.
    • For each batch, it extracts images, camera information, and ground truth detections.
  4. Matching Detections:

    • It matches the detections with the current batch of data using the index created earlier.
    • It filters and prepares the detections for processing.
  5. Pose Prediction:

    • If there are detections, it uses the pose_predictor to get single-view predictions.
    • It registers the initial bounding boxes with the candidates.
  6. Multi-View Prediction:

    • If skip_mv is False, it uses the mv_predictor to predict the scene state using multi-view information.
  7. Collecting Predictions:

    • It collects the single-view and multi-view predictions into a dictionary.
  8. Concatenating Results:

    • It concatenates the predictions across all batches and returns the final predictions.
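
Putting the steps together, a hedged sketch of how run_cosypose_eval wires the runner up (constructor arguments such as batch_size and n_workers are illustrative, not verified values):

# Assumed wiring, following the steps above.
detections = load_posecnn_results()            # pre-saved single-view detections
pred_runner = MultiviewPredictionRunner(scene_ds_pred, batch_size=1,
                                        n_workers=n_workers)
predictions = pred_runner.get_predictions(pose_predictor, mv_predictor,
                                          detections=detections,
                                          n_coarse_iterations=1,
                                          n_refiner_iterations=4,
                                          skip_mv=False)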

MultiviewScenePredictor

Purpose:
Used by MultiviewPredictionRunner.get_predictions.
In run_cosypose_eval, the MultiviewScenePredictor is initialized like this:

mv_predictor = MultiviewScenePredictor(mesh_db)

In the MultiviewScenePredictor we use the mesh_db to initialize MultiviewRefinement and solve:

problem = MultiviewRefinement(candidates=candidates_n,
                              cameras=cameras,
                              pairs_TC1C2=pairs_TC1C2,
                              mesh_db=self.mesh_db_ba)
ba_outputs = problem.solve(
    n_iterations=ba_n_iter,
    optimize_cameras=not use_known_camera_poses,
)

The solve function of MultiviewRefinement:

def solve(self, sample_n_init=1, **lm_kwargs):
    timer_init = Timer()
    timer_opt = Timer()
    timer_misc = Timer()

    timer_init.start()
    TWO_9d_init, TCW_9d_init = self.robust_initialization_TWO_TCW(n_init=sample_n_init)
    timer_init.pause()

    timer_opt.start()
    TWO_9d_opt, TCW_9d_opt, history = self.optimize_lm(
        TWO_9d_init, TCW_9d_init, **lm_kwargs)
    timer_opt.pause()

    timer_misc.start()
    objects, cameras = self.make_scene_infos(TWO_9d_opt, TCW_9d_opt)
    objects_init, cameras_init = self.make_scene_infos(TWO_9d_init, TCW_9d_init)
    history = self.convert_history(history)
    timer_misc.pause()

    outputs = dict(
        objects_init=objects_init,
        cameras_init=cameras_init,
        objects=objects,
        cameras=cameras,
        history=history,
        time_init=timer_init.stop(),
        time_opt=timer_opt.stop(),
        time_misc=timer_misc.stop(),
    )
    return outputs

Adaptation

The plan is to adapt run_custom_scenario.
It is invoked as follows:

python -m cosypose.scripts.run_custom_scenario --scenario=example
Setting OMP and MKL num threads to 1.
pybullet build time: Jan 28 2022 20:13:03
0:00:00.000859 - --------------------------------------------------------------------------------
0:00:00.000921 - scenario: example
0:00:00.000942 - sv_score_th: 0.3
0:00:00.000956 - n_symmetries_rot: 64
0:00:00.000968 - ransac_n_iter: 2000
0:00:00.000980 - ransac_dist_threshold: 0.02
0:00:00.001002 - nms_th: 0.04
0:00:00.001015 - no_visualization: False
0:00:00.001026 - --------------------------------------------------------------------------------
0:00:00.569089 - Loaded 796 candidates in 8 views.
0:00:00.570278 - Loaded cameras intrinsics.
0:00:00.690990 - Loaded 30 3D object models.
0:00:00.691047 - Running stage 2 and 3 of CosyPose...
0:00:01.145408 - Num candidates: 107
0:00:01.145468 - Num views: 8
0:00:01.145728 - Estimating camera poses using RANSAC.
0:00:04.588304 - Matched candidates: 49
0:00:04.588375 - RANSAC time_models: 0:00:02.390068
0:00:04.588398 - RANSAC time_score: 0:00:00.990740
0:00:04.588415 - RANSAC time_misc: 0:00:00.061626
0:00:04.902268 - BA time_init: 0:00:00.005349
0:00:04.902333 - BA time_opt: 0:00:00.091822
0:00:04.902351 - BA time_misc: 0:00:00.004793
0:00:04.491746 - Subscene 0 has 8 objects and 7 cameras.
0:00:04.512850 - Wrote predicted scene (objects+cameras): /home/cyl/cosypose/local_data/custom_scenarios/example/results/subscene=0/predicted_scene.json
0:00:04.512906 - Wrote predicted objects with pose expressed in camera frame: /home/cyl/cosypose/local_data/custom_scenarios/example/results/subscene=0/scene_reprojected.csv

The script only consumes the candidates, the mesh_db, and the camera intrinsics (K), then directly runs the mv_predictor.

Here is a function that builds the candidates from plain list input:

def read_list_candidates_cameras(self, data_list, cameras_K_list):
    """
    Creates PandasTensorCollections from lists of candidate and camera information.

    Args:
        data_list (list): Each element is a dictionary with keys:
            - "candidates" (list of dict): Each candidate dictionary includes:
                - "label" (str): The label of the object.
                - "score" (float): The confidence score of the object.
                - "pose" (torch.Tensor): A [4, 4] torch.Tensor representing the pose matrix.
        cameras_K_list (list of torch.Tensor): One [3, 3] intrinsics matrix per view.

    Returns:
        (PandasTensorCollection, PandasTensorCollection): candidate poses + infos,
        and per-view camera intrinsics + infos.
    """
    all_poses = []
    all_infos = []
    all_K = []

    # Initialize view_id to be assigned automatically
    view_id = 0
    scene_id = 0  # Fixed value for scene_id

    for view, K in zip(data_list, cameras_K_list):
        all_K.append(K)
        for candidate in view["candidates"]:
            label = candidate["label"]
            score = candidate["score"]
            pose = candidate["pose"]

            # Append the pose tensor
            all_poses.append(pose)

            # Append the metadata
            all_infos.append({
                "view_id": view_id,
                "scene_id": scene_id,
                "score": score,
                "label": label
            })

        # Increment view_id for the next set of candidates
        view_id += 1

    K_tensor = torch.stack(all_K).to(dtype=torch.float32, device="cuda:0")

    # Stack poses into a single tensor
    poses_tensor = torch.stack(all_poses).to(dtype=torch.float32, device="cuda:0")

    # Create a Pandas DataFrame for infos
    infos_df = pd.DataFrame(all_infos)
    # Return the PandasTensorCollection-like structure
    ptc_candidate = tc.PandasTensorCollection(poses=poses_tensor, infos=infos_df)
    cam_info = infos_df.loc[:, ["view_id"]]
    cam_info = cam_info.drop_duplicates()
    ptc_cam = tc.PandasTensorCollection(K=K_tensor, infos=cam_info)
    return ptc_candidate, ptc_cam
# Example usage (calling the method directly, without the self argument):
example_data = [
    {
        "candidates": [
            {"label": "obj_000017", "score": 0.829675, "pose": torch.eye(4)},
            {"label": "obj_000010", "score": 0.820436, "pose": torch.eye(4) * 2},
        ]
    },
    {
        "candidates": [
            {"label": "obj_000005", "score": 0.104478, "pose": torch.eye(4) * 3},
        ]
    }
]
example_cameras_K = [
    torch.eye(3),
    torch.eye(3) * 2,
]

cd, cam = read_list_candidates_cameras(example_data, example_cameras_K)
cd, cam
(PandasTensorCollection(
poses: torch.Size([3, 4, 4]) torch.float32 cuda:0,
----------------------------------------
infos:
view_id scene_id score label
0 0 0 0.829675 obj_000017
1 0 0 0.820436 obj_000010
2 1 0 0.104478 obj_000005
),
PandasTensorCollection(
K: torch.Size([2, 3, 3]) torch.float32 cuda:0,
----------------------------------------
infos:
view_id
0 0
1 1
))

Then MultiviewScenePredictor.predict_scene_state() is called as usual to estimate the scene:

predictions = self.mv_predictor.predict_scene_state(candidates, cameras,
                                                    score_th=self.sv_score_th,
                                                    use_known_camera_poses=False,
                                                    ransac_n_iter=self.ransac_n_iter,
                                                    ransac_dist_threshold=self.ransac_dist_threshold,
                                                    ba_n_iter=self.ba_n_iter)

Afterwards, non-maximum suppression aggregates objects that were detected multiple times:

objects = predictions['scene/objects']
cameras = predictions['scene/cameras']
reproj = predictions['ba_output']
#print(predictions)
for view_group in np.unique(objects.infos['view_group']):
    objects_ = objects[np.where(objects.infos['view_group'] == view_group)[0]]
    cameras_ = cameras[np.where(cameras.infos['view_group'] == view_group)[0]]
    reproj_ = reproj[np.where(reproj.infos['view_group'] == view_group)[0]]
    objects_ = nms3d(objects_, th=self.nms_th, poses_attr='TWO')

The final output objects_:

PandasTensorCollection(
TWO: torch.Size([10, 4, 4]) torch.float32 cuda:0,
----------------------------------------
infos:
obj_id score label n_cand view_group group_id scene_id
0 2 5.469747 obj_000016 7 0 0 16
1 0 5.450335 obj_000017 8 0 0 16
2 4 4.098602 obj_000012 8 0 0 16
3 1 3.380887 obj_000010 6 0 0 16
4 5 2.771779 obj_000015 6 0 0 16
5 3 1.453180 obj_000011 4 0 0 16
6 9 1.183983 obj_000014 3 0 0 16
7 8 1.106775 obj_000013 2 0 0 16
)
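
To express these fused poses back in a camera frame, as run_custom_scenario does when writing scene_reprojected.csv, they can be combined with the predicted camera poses. A sketch, assuming the predicted cameras_ carry a per-view TWC tensor matching the TWC fields shown earlier:

# Assumption: cameras_.TWC holds per-view camera poses in the world frame.
TCO = torch.inverse(cameras_.TWC[0]) @ objects_.TWO  # object poses in the first camera's frame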

Usage

Please refer to the notebook custom_scene.ipynb.


6d pose -- unity coordinate

6d pose -> unity coordinate

Unity uses a left-handed coordinate system, while most 6-D algorithms use a right-handed one, so after obtaining [R|t] a reflection across the y axis must be applied.

import torch

def right_to_left_hand_pose_R(R):
    # define the reflection matrix (flip the y axis)
    M = torch.tensor([
        [ 1,  0, 0],
        [ 0, -1, 0],
        [ 0,  0, 1]
    ], dtype=R.dtype)

    # convert the rotation matrix
    R_prime = M @ R @ M

    return R_prime

def right_to_left_hand_pose_t(t):
    # define the reflection matrix (flip the y axis)
    M = torch.tensor([
        [ 1,  0, 0],
        [ 0, -1, 0],
        [ 0,  0, 1]
    ], dtype=t.dtype)

    # convert the translation vector
    t_prime = M @ t

    return t_prime
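
A quick usage sketch on a full 4x4 pose, splitting it into its rotation block and translation vector:

# Example: converting a right-handed pose [R|t] to Unity's left-handed frame.
T = torch.eye(4)                              # placeholder 6-D pose estimate
R_left = right_to_left_hand_pose_R(T[:3, :3])
t_left = right_to_left_hand_pose_t(T[:3, 3])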

The results turn out quite well.


6-D Pose Estimation Survey

Model based (CAD model)

State of the art: FoundationPose (https://github.com/NVlabs/FoundationPose)

RGB

CASAPose (https://github.com/fraunhoferhhi/casapose?tab=readme-ov-file)

MegaPose (https://github.com/megapose6d/megapose6d)

RGB-D

MegaPose (https://github.com/megapose6d/megapose6d)

OVE6D (https://github.com/dingdingcai/OVE6D-pose)

Non-model

Gen6D (https://github.com/liuyuan-pal/Gen6D)

Yolov5 (https://github.com/cviviers/YOLOv5-6D-Pose)
