Posted 2024-12-14 | Updated 2025-02-19 | Note | 2 minutes read (About 308 words)

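nvidia-smi reports the newest CUDA version that the installed driver can support, not the version of any installed toolkit.
The system-wide CUDA version can therefore be anything: torch prefers the CUDA version installed in the virtual environment.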

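Managing CUDA with Conda

Install a specific version

conda install nvidia/label/cuda-12.4.0::cuda-toolkit -c nvidia/label/cuda-12.4.0

Install the latest version

conda install cuda-toolkit

Some repositories need the CUDA paths set explicitly before their packages will compile
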
conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/gsam/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/:$LD_LIBRARY_PATH"
conda env config vars set CPATH="/home/cyl/miniconda3/envs/gsam/lib/python3.10/site-packages/nvidia/cuda_runtime/include/:$CPATH"
conda env config vars set CUDA_HOME="/home/cyl/miniconda3/envs/gsam/"

To find the correct path for CUDA_HOME, use which nvcc. In my case, the output of the command was:

>>> which nvcc
/home/user/miniconda3/envs/py12/bin/nvcc

Therefore, I set CUDA_HOME to /home/user/miniconda3/envs/py12/.

Note 2: To find the correct paths for LD_LIBRARY_PATH and CPATH, use find ~ -name cuda_runtime_api.h. In my case, the output of the command was:

>>> find ~ -name cuda_runtime_api.h
...
/home/user/miniconda3/envs/py12/targets/x86_64-linux/include/cuda_runtime_api.h
...

So I set the LD_LIBRARY_PATH as /home/user/miniconda3/envs/py12/targets/x86_64-linux/lib/ and CPATH as /home/user/miniconda3/envs/py12/targets/x86_64-linux/include/. If you have multiple CUDA installations, the output of find ~ -name cuda_runtime_api.h will display multiple paths. Make sure to choose the path that corresponds to the environment you have created.
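The path bookkeeping above can be sketched in a few lines of Python. This is only an illustration: the helper name cuda_env_from_nvcc is made up here, and the targets/x86_64-linux layout is assumed from the find output shown above, so check it against your own environment before using the result.

```python
from pathlib import PurePosixPath


def cuda_env_from_nvcc(nvcc_path: str, target: str = "x86_64-linux") -> dict:
    """Derive CUDA_HOME, LD_LIBRARY_PATH and CPATH from the location of nvcc.

    Hypothetical helper: assumes the conda layout <env>/bin/nvcc and
    <env>/targets/<target>/{lib,include}.
    """
    # nvcc lives in <env>/bin/nvcc, so the environment root is two levels up.
    env_root = PurePosixPath(nvcc_path).parent.parent
    target_dir = env_root / "targets" / target
    return {
        "CUDA_HOME": f"{env_root}/",
        "LD_LIBRARY_PATH": f"{target_dir / 'lib'}/",
        "CPATH": f"{target_dir / 'include'}/",
    }


paths = cuda_env_from_nvcc("/home/user/miniconda3/envs/py12/bin/nvcc")
for name, value in paths.items():
    # Print the commands instead of running them, so they can be reviewed first.
    print(f'conda env config vars set {name}="{value}"')
```

The printed commands overwrite the variables; append :$LD_LIBRARY_PATH or :$CPATH by hand if existing values should be kept.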

ref:https://github.com/IDEA-Research/GroundingDINO/issues/355

Posted 2024-12-10 | Updated 2025-02-19 | Note | a minute read (About 174 words)

Use SSH to Connect Jupyter-lab

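Use ssh as the remote command-line tool: start Jupyter Lab on the remote machine and open it in a browser on the local machine.

Run on the remote machine:

jupyter lab --no-browser --port=8080

--no-browser is important: it stops Jupyter from trying to open a browser on the remote machine.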

output:

...
[I 2024-12-10 14:30:24.585 ServerApp]     http://127.0.0.1:8080/lab?token=0061d1eb31396b1bc3cd77a7161b2084da1dedcdeca0600c
[I 2024-12-10 14:30:24.586 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-12-10 14:30:24.603 ServerApp]

    To access the server, open this file in a browser:
        file:///home/bohanfeng/.local/share/jupyter/runtime/jpserver-11659-open.html
    Or copy and paste one of these URLs:
        http://localhost:8080/lab?token=0061d1eb31396b1bc3cd77a7161b2084da1dedcdeca0600c
        http://127.0.0.1:8080/lab?token=0061d1eb31396b1bc3cd77a7161b2084da1dedcdeca0600c

Run on the local machine:

ssh -L 8080:localhost:8080 bohanfeng@192.168.2.102

Then open this URL in the local browser:
http://127.0.0.1:8080/lab?token=0061d1eb31396b1bc3cd77a7161b2084da1dedcdeca0600c
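If the local port differs from the remote one, both the ssh command and the URL change. The sketch below builds the two from their parts; tunnel_command and local_url are illustrative names, not part of Jupyter or ssh.

```python
import shlex
from urllib.parse import urlsplit, urlunsplit


def tunnel_command(user, host, remote_port, local_port=None):
    """Build the ssh command that forwards local_port to remote_port on the remote host."""
    if local_port is None:
        local_port = remote_port
    return ["ssh", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{host}"]


def local_url(remote_url, local_port):
    """Rewrite the URL printed by Jupyter so that it points at the local end of the tunnel."""
    parts = urlsplit(remote_url)
    return urlunsplit(parts._replace(netloc=f"127.0.0.1:{local_port}"))


cmd = tunnel_command("bohanfeng", "192.168.2.102", 8080)
print(shlex.join(cmd))
print(local_url("http://localhost:8080/lab?token=abc", 9000))
```

The token in the last line is a placeholder; use the one printed by the remote Jupyter server.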

Posted 2024-12-03 | Updated 2025-02-19 | Note | 4 minutes read (About 609 words)

FLamby

Repository: https://github.com/owkin/FLamby

Installation

git clone https://github.com/owkin/FLamby.git
cd FLamby
conda env create -f environment.yml
conda activate flamby
pip install -e .[all_extra]
pip install wget
pip install lifelines
pip install jupyterlab

Dataset

Fed-TCGA-BRCA
https://owkin.github.io/FLamby/fed_tcga_brca.html

Baseline Learning

import torch
from flamby.utils import evaluate_model_on_tests

# 2 lines of code to change to switch to another dataset
from flamby.datasets.fed_tcga_brca import (
    BATCH_SIZE,
    LR,
    NUM_EPOCHS_POOLED,
    Baseline,
    BaselineLoss,
    metric,
    NUM_CLIENTS,
    Optimizer,
)
from flamby.datasets.fed_tcga_brca import FedTcgaBrca as FedDataset

Import several macros, datasets and metrics.

# Instantiation of local train set (and data loader), baseline loss function, baseline model, default optimizer
train_dataset = FedDataset(center=0, train=True, pooled=False)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
lossfunc = BaselineLoss()
model = Baseline()
optimizer = Optimizer(model.parameters(), lr=LR)

In this script, the pooled parameter is set to False when creating the FedDataset instances. This indicates that the dataset is not pooled, meaning that the data is kept separate for each client or center. Each client or center has its own local dataset, which is a common setup in federated learning to simulate real-world scenarios where data is distributed across different locations or devices.

# Traditional pytorch training loop
for epoch in range(0, NUM_EPOCHS_POOLED):
    for idx, (X, y) in enumerate(train_dataloader):
        optimizer.zero_grad()
        outputs = model(X)
        loss = lossfunc(outputs, y)
        loss.backward()
        optimizer.step()

This is a standard training loop.

# Evaluation
# Instantiation of a list of the local test sets
test_dataloaders = [
            torch.utils.data.DataLoader(
                FedDataset(center=i, train=False, pooled=False),
                batch_size=BATCH_SIZE,
                shuffle=False,
                num_workers=0,
            )
            for i in range(NUM_CLIENTS)
        ]
# Function performing the evaluation
dict_cindex = evaluate_model_on_tests(model, test_dataloaders, metric)
print(dict_cindex)

The evaluation metric is lifelines.utils.concordance_index, which returns the concordance index (C-index).
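To make the C-index concrete, here is a simplified pure-Python version. It is a sketch of the idea, not the lifelines implementation: it assumes that a higher score means a longer predicted survival, counts a pair only when the subject with the shorter time had an observed event, and ignores pairs with tied times.

```python
def c_index(times, scores, events):
    """Fraction of comparable pairs that the predicted scores order correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has the strictly shorter time
            # and its event was actually observed (not censored).
            if times[i] < times[j] and events[i]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


perfect = c_index([1, 2, 3], [0.1, 0.5, 0.9], [1, 1, 1])
reversed_order = c_index([1, 2, 3], [0.9, 0.5, 0.1], [1, 1, 1])
print(perfect, reversed_order)
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is ordered the wrong way round.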

Federated Learning

import torch
from flamby.utils import evaluate_model_on_tests

# 2 lines of code to change to switch to another dataset
from flamby.datasets.fed_tcga_brca import (
    BATCH_SIZE,
    LR,
    NUM_EPOCHS_POOLED,
    Baseline,
    BaselineLoss,
    metric,
    NUM_CLIENTS,
    get_nb_max_rounds
)
from flamby.datasets.fed_tcga_brca import FedTcgaBrca as FedDataset

# 1st line of code to change to switch to another strategy
from flamby.strategies.fed_avg import FedAvg as strat

Use `FedAvg` as the strategy.

# We loop on all the clients of the distributed dataset and instantiate associated data loaders
train_dataloaders = [
            torch.utils.data.DataLoader(
                FedDataset(center = i, train = True, pooled = False),
                batch_size = BATCH_SIZE,
                shuffle = True,
                num_workers = 0
            )
            for i in range(NUM_CLIENTS)
        ]

lossfunc = BaselineLoss()
m = Baseline()

# Federated Learning loop
# 2nd line of code to change to switch to another strategy (feed the FL strategy the right HPs)
args = {
            "training_dataloaders": train_dataloaders,
            "model": m,
            "loss": lossfunc,
            "optimizer_class": torch.optim.SGD,
            "learning_rate": LR / 10.0,
            "num_updates": 100,
# This helper function returns the number of rounds necessary to perform approximately as many
# epochs on each local dataset as with the pooled training
            "nrounds": get_nb_max_rounds(100),
        }
s = strat(**args)
m = s.run()[0]

# Evaluation
# We only instantiate one test set in this particular case: the pooled one
test_dataloaders = [
            torch.utils.data.DataLoader(
                FedDataset(train = False, pooled = True),
                batch_size = BATCH_SIZE,
                shuffle = False,
                num_workers = 0,
            )
        ]
dict_cindex = evaluate_model_on_tests(m, test_dataloaders, metric)
print(dict_cindex)
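The aggregation step at the heart of federated averaging can be sketched without any framework. The code below is a conceptual illustration and not FLamby's implementation: parameters are plain lists, and weighting clients by their local dataset size is the textbook FedAvg formulation, assumed here rather than taken from the FLamby source.

```python
def fed_avg(client_weights, client_sizes):
    """Average per-client parameters, weighting each client by its number of samples."""
    total = sum(client_sizes)
    averaged = {}
    for name in client_weights[0]:
        n_values = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(n_values)
        ]
    return averaged


clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
sizes = [1, 3]  # the second client holds three times as much data
global_weights = fed_avg(clients, sizes)
print(global_weights)
```

In a full round, each client would first run num_updates local optimizer steps starting from the current global weights, and the server would then apply this averaging.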

FedAvg vs FedAvgFineTuning

FedAvg

FedAvgFineTuning

Posted 2024-11-17 | Updated 2025-02-19 | Note | 18 minutes read (About 2722 words)

Cosypose modification

Setup

Repository: https://github.com/Simple-Robotics/cosypose

git clone --recurse-submodules https://github.com/Simple-Robotics/cosypose.git
cd cosypose
conda env create -n cosypose --file environment.yaml

Note: during this step pip reports that setuptools and matplotlib-inline are not compatible with Python 3.7.6. Install compatible versions manually inside the environment:

conda activate cosypose
pip install setuptools==63.4.1
pip install matplotlib-inline==0.1.6

git lfs pull
python setup.py install
python setup.py develop

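Download the data as described in the README.
Note that the first block of commands fails to download anything. According to https://bop.felk.cvut.cz/datasets/ the download links have moved to Hugging Face; the test set can be downloaded manually from https://huggingface.co/datasets/bop-benchmark/datasets/tree/main/ycbv and placed in local_data/bop_datasets/ycbv/test

Set the models used for testing:

cp ./local_data/bop_datasets/ycbv/model_bop_compat_eval ./local_data/bop_datasets/ycbv/models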

Debug

`np.where(mask)[0].item()`

Running

export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.run_cosypose_eval --config ycbv

produces this error:

Traceback (most recent call last):
  File "/home/cyl/.conda/envs/cosypose/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cyl/.conda/envs/cosypose/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cyl/cosypose/cosypose/scripts/run_cosypose_eval.py", line 491, in <module>
    main()
  File "/home/cyl/cosypose/cosypose/scripts/run_cosypose_eval.py", line 332, in main
    scene_ds = make_scene_dataset(ds_name)
  File "/home/cyl/cosypose/cosypose/datasets/datasets_cfg.py", line 68, in make_scene_dataset
    ids.append(np.where(mask)[0].item())
ValueError: can only convert an array of size 1 to a Python scalar

Adding debug output gives:

Debug - scene_id: 48, view_id: 1
Debug - mask matches: 1
Debug - where result shape: (1,), values: [225]
Debug - scene_id: 48, view_id: 36
Debug - mask matches: 1
Debug - where result shape: (1,), values: [226]
Debug - scene_id: 48, view_id: 47
Debug - mask matches: 1
Debug - where result shape: (1,), values: [227]
Debug - scene_id: 48, view_id: 83
Debug - mask matches: 1
Debug - where result shape: (1,), values: [228]
Debug - scene_id: 48, view_id: 112
Debug - mask matches: 1
Debug - where result shape: (1,), values: [229]
Debug - scene_id: 48, view_id: 135
Debug - mask matches: 0
Debug - where result shape: (0,), values: []
0:00:00.912023 - Expected exactly one match, got 0 matches for scene_id=48, view_id=135

It turns out that the downloaded test set does not contain every frame listed in the dataset's keyframe.txt, so some keyframes cannot be found.
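A quick way to confirm this diagnosis is to compare the keyframe list against the frames that are actually on disk before running the evaluation. The sketch below assumes that each line of keyframe.txt has the form scene/view (for example 0048/000135); the function names are illustrative, and collecting the available frames from the dataset folder is left out.

```python
def parse_keyframes(lines):
    """Turn lines such as '0048/000135' into (scene_id, view_id) integer pairs."""
    keyframes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        scene, view = line.split("/")
        keyframes.append((int(scene), int(view)))
    return keyframes


def missing_keyframes(keyframes, available):
    """Return the keyframes that have no matching frame in the downloaded data."""
    available = set(available)
    return [kf for kf in keyframes if kf not in available]


keyframes = parse_keyframes(["0048/000112", "0048/000135", ""])
available = [(48, 1), (48, 36), (48, 112)]
missing = missing_keyframes(keyframes, available)
print(missing)
```

Every pair reported as missing would trigger the ValueError above, because np.where(mask)[0] is empty for it and .item() needs exactly one element.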

When a run is interrupted midway

To start a fresh run, clear local_data/joblib_cache

Framework

Prediction Script `cosypose.scripts.run_cosypose_eval`

AI explanation

The script predicts object poses based on multi-view input by following these steps:

Dataset Loading: It first loads the dataset using the make_scene_dataset function, which prepares the scene data for evaluation. The dataset is wrapped in a MultiViewWrapper to handle multiple views.
Model Loading: The script loads pre-trained models for pose prediction using the load_models function. It loads both coarse and refiner models based on the configuration specified in the command-line arguments.
Prediction Setup: The script sets up the prediction parameters, including the number of iterations for coarse and refiner models, and whether to skip multi-view processing based on the number of views specified.
Multi-view Prediction: The MultiviewScenePredictor is initialized with the mesh database, which is used to predict poses across multiple views. The MultiviewPredictionRunner is then used to run predictions on the dataset, leveraging the multi-view setup to improve pose estimation accuracy.
Pose Estimation: The script uses the loaded models to predict object poses. It processes detections from either pix2pose or posecnn depending on the dataset, and refines these predictions using the refiner model.
Evaluation: After predictions, the script evaluates the predicted poses using the PoseEvaluation class. It calculates various metrics like ADD-S and AUC to assess the accuracy of the pose predictions.
Results Logging: Finally, the script logs the results, including evaluation metrics, and saves them to a specified directory.

The multi-view approach allows the script to leverage information from different viewpoints, which can help resolve ambiguities and improve the robustness of the pose estimation.

Prediction Script `run_custom_scenario`

Terms

TCO

Transformation from Camera to Object.
It is the transformation matrix (or equivalent parameters) that describes the pose of an object relative to the camera’s coordinate system.

TWO

Transformation from World to Object.
It is the transformation matrix (or equivalent parameters) that describes the pose of an object relative to the world’s coordinate system.
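The two terms are linked through the camera pose TWC: under the convention assumed here (T_AB is the pose of frame B expressed in frame A), TCO = inverse(TWC) @ TWO. The sketch below checks this with plain Python lists on a translation-only example; the helper names are made up for illustration.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(T):
    """Invert a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    """Rigid transform with identity rotation and the given translation."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


TWC = translation(0.0, 0.0, 1.0)  # camera sits 1 unit up the world z axis
TWO = translation(1.0, 0.0, 1.0)  # object at the same height, 1 unit along x
TCO = matmul4(invert_rigid(TWC), TWO)
print([row[3] for row in TCO[:3]])  # object position seen from the camera
```

Seen from the camera, the object is one unit along x and at the same height, which matches the geometry of the example.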

Model dataset

class MeshDataBase:
    def __init__(self, obj_list):
        self.infos = {obj['label']: obj for obj in obj_list}
        self.meshes = {l: trimesh.load(obj['mesh_path']) for l, obj in self.infos.items()}

    @staticmethod
    def from_object_ds(object_ds):
        obj_list = [object_ds[n] for n in range(len(object_ds))]
        return MeshDataBase(obj_list)
...

The usual way to initialize it:

object_ds = BOPObjectDataset(scenario_dir / 'models')
mesh_db = MeshDataBase.from_object_ds(object_ds)

It can also be loaded together with the models via load_models:

predictor, mesh_db = load_models(coarse_run_id, refiner_run_id, n_workers=n_plotters, object_set=object_set)

Important Classes

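`MultiViewWrapper`

Purpose:
Reads scene_dataset and splits it into groups of n_views views, which makes it easy to iterate over the scene elements (everything here is ground truth).
Each item returned during iteration contains:

- the RGB images from n_views different viewpoints
- the n_views corresponding masks
- the n_views corresponding observations, each of which holds:
  - the poses and types of the recognized objects
  - the camera pose and intrinsics
  - frame_info, which is of little use

scene_ds_pred = MultiViewWrapper(scene_ds, n_views=n_views)
scene_ds_pred[0][2] # scene 48, multiview group 1: observations in five views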

[
 {'objects': 
  [
   {'label': 'obj_000001',
    'name': 'obj_000001',
    'TWO': array([[-0.02062261, -0.99870347, -0.04654345, -0.05380909],
           [ 0.99854439, -0.022895  ,  0.04883047,  0.00189095],
           [-0.04983272, -0.04546878,  0.9977229 ,  0.07060698],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'T0O': array([[-0.02062261, -0.99870347, -0.04654345, -0.05380909],
           [ 0.99854439, -0.022895  ,  0.04883047,  0.00189095],
           [-0.04983272, -0.04546878,  0.9977229 ,  0.07060698],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'visib_fract': 0.7769277845777234,
    'id_in_segm': 1,
    'bbox': [347, 210, 467, 374]},
   {'label': 'obj_000006',
    'name': 'obj_000006',
    'TWO': array([[-0.40056693,  0.91475543, -0.05262471,  0.03103553],
           [-0.91622629, -0.39934108,  0.03248866, -0.02365388],
           [ 0.00870386,  0.06123014,  0.9980863 ,  0.01391488],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'T0O': array([[-0.40056693,  0.91475543, -0.05262471,  0.03103553],
           [-0.91622629, -0.39934108,  0.03248866, -0.02365388],
           [ 0.00870386,  0.06123014,  0.9980863 ,  0.01391488],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'visib_fract': 0.9990349353406678,
    'id_in_segm': 2,
    'bbox': [328, 343, 422, 405]},
   {'label': 'obj_000014',
    'name': 'obj_000014',
    'TWO': array([[ 0.24178672, -0.96941339, -0.04215706, -0.05206396],
           [ 0.96977496,  0.2399519 ,  0.0442575 ,  0.0179453 ],
           [-0.03278805, -0.05158388,  0.99813144,  0.16636215],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'T0O': array([[ 0.24178672, -0.96941339, -0.04215706, -0.05206396],
           [ 0.96977496,  0.2399519 ,  0.0442575 ,  0.0179453 ],
           [-0.03278805, -0.05158388,  0.99813144,  0.16636215],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'visib_fract': 0.9938250428816466,
    'id_in_segm': 3,
    'bbox': [372, 143, 490, 241]},
   {'label': 'obj_000019',
    'name': 'obj_000019',
    'TWO': array([[-0.69888905,  0.1926738 , -0.68878937,  0.01412755],
           [ 0.711967  ,  0.27928957, -0.64428215,  0.05127768],
           [ 0.06823575, -0.94067797, -0.33237011,  0.06472594],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'T0O': array([[-0.69888905,  0.1926738 , -0.68878937,  0.01412755],
           [ 0.711967  ,  0.27928957, -0.64428215,  0.05127768],
           [ 0.06823575, -0.94067797, -0.33237011,  0.06472594],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'visib_fract': 0.9890470974808324,
    'id_in_segm': 4,
    'bbox': [419, 222, 527, 410]},
   {'label': 'obj_000020',
    'name': 'obj_000020',
    'TWO': array([[-0.74512542, -0.66691536,  0.00352083,  0.07854437],
           [-0.6669148 ,  0.74507458, -0.00940455, -0.15283599],
           [ 0.00364864, -0.00935569, -0.99995023,  0.01854317],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'T0O': array([[-0.74512542, -0.66691536,  0.00352083,  0.07854437],
           [-0.6669148 ,  0.74507458, -0.00940455, -0.15283599],
           [ 0.00364864, -0.00935569, -0.99995023,  0.01854317],
           [ 0.        ,  0.        ,  0.        ,  1.        ]]),
    'visib_fract': 0.9953060637992145,
    'id_in_segm': 5,
    'bbox': [92, 328, 288, 442]}],
  'camera': 
  {'T0C': array([[-0.0792652 ,  0.241296  , -0.967209  ,  0.946419  ],
          [ 0.996102  ,  0.0568396 , -0.0674529 , -0.02116569],
          [ 0.0386997 , -0.968786  , -0.244861  ,  0.36645836],
          [ 0.        ,  0.        ,  0.        ,  1.        ]]),
   'K': array([[1.066778e+03, 0.000000e+00, 3.129869e+02],
          [0.000000e+00, 1.067487e+03, 2.413109e+02],
          [0.000000e+00, 0.000000e+00, 1.000000e+00]]),
   'TWC': array([[-0.0792652 ,  0.241296  , -0.967209  ,  0.946419  ],
          [ 0.996102  ,  0.0568396 , -0.0674529 , -0.02116569],
          [ 0.0386997 , -0.968786  , -0.244861  ,  0.36645836],
          [ 0.        ,  0.        ,  0.        ,  1.        ]]),
   'resolution': torch.Size([480, 640])},
  'frame_info': 
  {
   'scene_id': 48,
   'cam_id': 'cam',
   'view_id': 1626,
   'cam_name': 'cam',
   'group_id': 0
   }
  },
  ... # other views
]

`MultiviewPredictionRunner`

Purpose:
Takes the MultiViewWrapper dataset as input and runs predictions on it.

First, the dataset is wrapped in a data loader:

dataloader = DataLoader(scene_ds, batch_size=batch_size,
						num_workers=n_workers,
						sampler=sampler,
						collate_fn=self.collate_fn)

collate_fn processes the raw data (the comments at the end note which data is actually used):

def collate_fn(self, batch):
	batch_im_id = -1

	cam_infos, K = [], []
	det_infos, bboxes = [], []
	for n, data in enumerate(batch): # normally only one batch
		assert n == 0
		images, masks, obss = data
		for c, obs in enumerate(obss): # iterate along different views
			batch_im_id += 1
			frame_info = obs['frame_info']
			im_info = {k: frame_info[k] for k in ('scene_id', 'view_id', 'group_id')} # info for the image
			im_info.update(batch_im_id=batch_im_id)
			cam_info = im_info.copy() # info for camera

			K.append(obs['camera']['K']) # camera intrinsics
			cam_infos.append(cam_info)

			for o, obj in enumerate(obs['objects']):
				obj_info = dict(
					label=obj['name'],
					score=1.0,
				)
				obj_info.update(im_info) # add key-value pair from im_info to obj_info
				bboxes.append(obj['bbox'])
				det_infos.append(obj_info)

	gt_detections = tc.PandasTensorCollection(
		infos=pd.DataFrame(det_infos),
		bboxes=torch.as_tensor(np.stack(bboxes)),
	) # basic info and bounding box of every ground-truth detection
	cameras = tc.PandasTensorCollection(
		infos=pd.DataFrame(cam_infos),
		K=torch.as_tensor(np.stack(K)),
	) # basic info of each view's camera (same as the detection info) and its intrinsics
	data = dict(
		images=images,
		cameras=cameras,
		gt_detections=gt_detections,
	)
	return data

The most important function: get_predictions

def get_predictions(self, pose_predictor, mv_predictor,
					detections=None,
					n_coarse_iterations=1, n_refiner_iterations=1,
					sv_score_th=0.0, skip_mv=True,
					use_detections_TCO=False):

Responsible for generating predictions for object poses in a scene using both single-view and multi-view approaches.

Input Parameters:
- pose_predictor: the single-view predictor; for the ycbv dataset, for example, the PoseCNN detection model is used.
- mv_predictor: An object or function that predicts scene states using multi-view information.
- detections: A collection of detected objects with associated information, pre-generated and saved in a .pkl file
- n_coarse_iterations, n_refiner_iterations: Number of iterations for coarse and refinement pose estimation.
- sv_score_th: Score threshold for single-view detections.
- skip_mv: A flag to skip multi-view predictions.
- use_detections_TCO: A flag to use detections for initial pose estimation.
Filtering Detections:
Note that the detections used here come directly from pre-saved detection results (not ground truth):
posecnn_detections = load_posecnn_results()
- The function filters the input detections based on the sv_score_th threshold.
- It assigns a unique detection ID to each detection and creates an index based on scene_id and view_id.
Iterating Over Data:
- The function iterates over batches of data from the dataloader.
- For each batch, it extracts images, camera information, and ground truth detections.
Matching Detections:
- It matches the detections with the current batch of data using the index created earlier.
- It filters and prepares the detections for processing.
Pose Prediction:
- If there are detections, it uses the pose_predictor to get single-view predictions.
- It registers the initial bounding boxes with the candidates.
Multi-View Prediction:
- If skip_mv is False, it uses the mv_predictor to predict the scene state using multi-view information.
Collecting Predictions:
- It collects the single-view and multi-view predictions into a dictionary.
Concatenating Results:
- It concatenates the predictions across all batches and returns the final predictions.
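The filtering and matching steps listed above can be sketched with plain dictionaries. This is a simplified illustration, not the cosypose code (which works on PandasTensorCollection objects); filter_and_index is a made-up name, and keeping detections whose score is greater than or equal to the threshold is an assumption.

```python
from collections import defaultdict


def filter_and_index(detections, score_th):
    """Drop low-score detections, assign a det_id, and index by (scene_id, view_id)."""
    kept = [d for d in detections if d["score"] >= score_th]
    index = defaultdict(list)
    for det_id, det in enumerate(kept):
        # Copy the detection so the input list is left untouched.
        det = dict(det, det_id=det_id)
        index[(det["scene_id"], det["view_id"])].append(det)
    return index


detections = [
    {"scene_id": 48, "view_id": 1, "label": "obj_000001", "score": 0.9},
    {"scene_id": 48, "view_id": 1, "label": "obj_000006", "score": 0.2},
    {"scene_id": 48, "view_id": 36, "label": "obj_000001", "score": 0.7},
]
index = filter_and_index(detections, score_th=0.3)

# Matching a batch: look up the detections for each view in the batch.
batch_views = [(48, 1), (48, 36), (48, 47)]
matched = [det for key in batch_views for det in index.get(key, [])]
print([d["det_id"] for d in matched])
```

Views without any surviving detection, such as (48, 47) here, simply contribute nothing to the batch.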

`MultiviewScenePredictor`

Purpose:
Used by MultiviewPredictionRunner.get_predictions.
In run_cosypose_eval we initialize MultiviewScenePredictor in this way:

mv_predictor = MultiviewScenePredictor(mesh_db)

In the MultiviewScenePredictor we use the mesh_db to initialize MultiviewRefinement and solve:

problem = MultiviewRefinement(candidates=candidates_n,
                    cameras=cameras,
	                pairs_TC1C2=pairs_TC1C2,
	                mesh_db=self.mesh_db_ba)
ba_outputs = problem.solve(
	n_iterations=ba_n_iter,
	optimize_cameras=not use_known_camera_poses,
)

The solve function of MultiviewRefinement:

def solve(self, sample_n_init=1, **lm_kwargs):
	timer_init = Timer()
	timer_opt = Timer()
	timer_misc = Timer()

	timer_init.start()
	TWO_9d_init, TCW_9d_init = self.robust_initialization_TWO_TCW(n_init=sample_n_init)
	timer_init.pause()

	timer_opt.start()
	TWO_9d_opt, TCW_9d_opt, history = self.optimize_lm(
		TWO_9d_init, TCW_9d_init, **lm_kwargs)
	timer_opt.pause()

	timer_misc.start()
	objects, cameras = self.make_scene_infos(TWO_9d_opt, TCW_9d_opt)
	objects_init, cameras_init = self.make_scene_infos(TWO_9d_init, TCW_9d_init)
	history = self.convert_history(history)
	timer_misc.pause()

	outputs = dict(
		objects_init=objects_init,
		cameras_init=cameras_init,
		objects=objects,
		cameras=cameras,
		history=history,
		time_init=timer_init.stop(),
		time_opt=timer_opt.stop(),
		time_misc=timer_misc.stop(),
	)
	return outputs

Adaptation

The plan is to base the modifications on run_custom_scenario.
How run_custom_scenario is used:

python -m cosypose.scripts.run_custom_scenario --scenario=example

Setting OMP and MKL num threads to 1.
pybullet build time: Jan 28 2022 20:13:03
0:00:00.000859 - -----------------------------------------------
---------------------------------
0:00:00.000921 - scenario: example
0:00:00.000942 - sv_score_th: 0.3
0:00:00.000956 - n_symmetries_rot: 64
0:00:00.000956 - n_symmetries_rot: 64
0:00:00.000968 - ransac_n_iter: 2000
0:00:00.000980 - ransac_dist_threshold: 0.02
0:00:00.001002 - nms_th: 0.04
0:00:00.001015 - no_visualization: False
0:00:00.001026 - -----------------------------------------------
---------------------------------
0:00:00.569089 - Loaded 796 candidates in 8 views.
0:00:00.570278 - Loaded cameras intrinsics.
0:00:00.690990 - Loaded 30 3D object models.
0:00:00.691047 - Running stage 2 and 3 of CosyPose...
0:00:01.145408 - Num candidates: 107
0:00:01.145468 - Num views: 8
0:00:01.145728 - Estimating camera poses using RANSAC.
0:00:04.588304 - Matched candidates: 49
0:00:04.588375 - RANSAC time_models: 0:00:02.390068
0:00:04.588398 - RANSAC time_score: 0:00:00.990740
0:00:04.588415 - RANSAC time_misc: 0:00:00.061626
0:00:04.902268 - BA time_init: 0:00:00.005349
0:00:04.902333 - BA time_opt: 0:00:00.091822
0:00:04.902351 - BA time_misc: 0:00:00.004793
0:00:04.491746 - Subscene 0 has 8 objects and 7 cameras.
0:00:04.512850 - Wrote predicted scene (objects+cameras): /home/cyl/cosypose/local_data/custom_scenarios/example/
results/subscene=0/predicted_scene.json
0:00:04.512906 - Wrote predicted objects with pose expressed in camera frame: /home/cyl/cosypose/local_data/custo
m_scenarios/example/results/subscene=0/scene_reprojected.csv

The script takes only the candidates, mesh_db and camera_K information and runs mv_predictor directly.

Write a function that builds the candidates from list input:

def read_list_candidates_cameras(self, data_list, cameras_K_list):
	"""
	Creates PandasTensorCollections for the candidates and the cameras from per-view lists.

	Args:
		data_list (list): One element per view. Each element is a dictionary with the key:
			- "candidates" (list of dict): Each candidate dictionary includes:
				- "label" (str): The label of the object.
				- "score" (float): The confidence score of the object.
				- "pose" (torch.Tensor): A [4, 4] torch.Tensor representing the pose matrix.
		cameras_K_list (list): One [3, 3] torch.Tensor of camera intrinsics per view, in the same order as data_list.

	Returns:
		tuple: (candidates collection with poses and infos, cameras collection with K and infos).
	"""
	all_poses = []
	all_infos = []
	all_K = []

	# Initialize view_id to be assigned automatically
	view_id = 0
	scene_id = 0  # Fixed value for scene_id

	for view, K in zip(data_list, cameras_K_list):
		all_K.append(K)
		for candidate in view["candidates"]:
			label = candidate["label"]
			score = candidate["score"]
			pose = candidate["pose"]

			# Append the pose tensor
			all_poses.append(pose)

			# Append the metadata
			all_infos.append({
				"view_id": view_id,
				"scene_id": scene_id,
				"score": score,
				"label": label
			})

		# Increment view_id for the next set of candidates
		view_id += 1

	K_tensor = torch.stack(all_K).to(dtype=torch.float32, device="cuda:0")

	# Stack poses into a single tensor
	poses_tensor = torch.stack(all_poses).to(dtype=torch.float32, device="cuda:0")

	# Create a Pandas DataFrame for infos
	infos_df = pd.DataFrame(all_infos)
	# Return the PandasTensorCollection-like structure
	ptc_candidate = tc.PandasTensorCollection(poses=poses_tensor, infos=infos_df)
	cam_info = infos_df.loc[:,["view_id"]]
	cam_info = cam_info.drop_duplicates()
	ptc_cam = tc.PandasTensorCollection(K=K_tensor, infos=cam_info)
	return ptc_candidate, ptc_cam

# Example usage:
example_data = [
    {
        "candidates": [
            {"label": "obj_000017", "score": 0.829675, "pose": torch.eye(4)},
            {"label": "obj_000010", "score": 0.820436, "pose": torch.eye(4) * 2},
        ]
    },
    {
        "candidates": [
            {"label": "obj_000005", "score": 0.104478, "pose": torch.eye(4) * 3},
        ]
    }
]
example_cameras_K = [
    torch.eye(3),
    torch.eye(3) * 2,
]

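# The function above is written as a method (it takes self), so call it on the
# object that owns it; `runner` is a placeholder name for that object.
cd, cam = runner.read_list_candidates_cameras(example_data, example_cameras_K)
cd, cam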

(PandasTensorCollection(
     poses: torch.Size([3, 4, 4]) torch.float32 cuda:0,
 ----------------------------------------
     infos:
    view_id  scene_id     score       label
 0        0         0  0.829675  obj_000017
 1        0         0  0.820436  obj_000010
 2        1         0  0.104478  obj_000005
 ),
 PandasTensorCollection(
     K: torch.Size([2, 3, 3]) torch.float32 cuda:0,
 ----------------------------------------
     infos:
    view_id
 0        0
 1        1
 ))

After that, call MultiviewScenePredictor.predict_scene_state() as usual to estimate the scene:

predictions = self.mv_predictor.predict_scene_state(candidates, cameras,
									   score_th=self.sv_score_th,
									   use_known_camera_poses=False,
									   ransac_n_iter= self.ransac_n_iter,
									   ransac_dist_threshold= self.ransac_dist_threshold,
									   ba_n_iter= self.ba_n_iter)

Then use Non-Maximum Suppression to merge objects that were detected more than once:

objects = predictions['scene/objects']
cameras = predictions['scene/cameras']
reproj = predictions['ba_output']
#print(predictions)
for view_group in np.unique(objects.infos['view_group']):
	objects_ = objects[np.where(objects.infos['view_group'] == view_group)[0]]
	cameras_ = cameras[np.where(cameras.infos['view_group'] == view_group)[0]]
	reproj_ = reproj[np.where(reproj.infos['view_group'] == view_group)[0]]
	objects_ = nms3d(objects_, th= self.nms_th, poses_attr='TWO')

The final output is objects_:

PandasTensorCollection(
    TWO: torch.Size([10, 4, 4]) torch.float32 cuda:0,
----------------------------------------
    infos:
   obj_id     score       label  n_cand  view_group  group_id  scene_id
0       2  5.469747  obj_000016       7           0         0        16
1       0  5.450335  obj_000017       8           0         0        16
2       4  4.098602  obj_000012       8           0         0        16
3       1  3.380887  obj_000010       6           0         0        16
4       5  2.771779  obj_000015       6           0         0        16
5       3  1.453180  obj_000011       4           0         0        16
6       9  1.183983  obj_000014       3           0         0        16
7       8  1.106775  obj_000013       2           0         0        16
)
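The idea behind the 3D non-maximum suppression step can be sketched as follows. This is not the cosypose nms3d implementation: it is a greedy version on plain dictionaries, and the rule used (objects with the same label whose positions are closer than th are duplicates, and the highest score wins) is an assumption about how such a filter typically works.

```python
import math


def nms3d_sketch(objects, th):
    """Greedy 3D NMS: keep the best-scoring object among same-label near-duplicates."""
    kept = []
    for obj in sorted(objects, key=lambda o: o["score"], reverse=True):
        duplicate = any(
            k["label"] == obj["label"] and math.dist(k["position"], obj["position"]) < th
            for k in kept
        )
        if not duplicate:
            kept.append(obj)
    return kept


objects = [
    {"label": "obj_000017", "score": 5.45, "position": (0.10, 0.00, 0.50)},
    {"label": "obj_000017", "score": 1.20, "position": (0.11, 0.00, 0.50)},  # 1 cm away: duplicate
    {"label": "obj_000010", "score": 3.38, "position": (0.10, 0.00, 0.50)},  # other label: kept
]
kept = nms3d_sketch(objects, th=0.04)
print([(o["label"], o["score"]) for o in kept])
```

The threshold plays the role of nms_th (0.04 in the log above), read here as a distance in the same unit as the poses.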

Usage

Please refer to the notebook custom_scene.ipynb.

Posted 2024-08-01 | Updated 2025-02-19 | Note | a minute read (About 210 words)

Program interface definition

Python->Unity

Code repository:

https://github.com/Chen-Yulin/Unity-Python-UDP-Communication

The transmitted data is a string:

def SendData(self, strToSend):
    # Use this function to send string to C#
    self.udpSock.sendto(bytes(strToSend,'utf-8'), (self.udpIP, self.udpSendPort))

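Object detection information

Required information: the object category, its position along the three axes, its rotation about the three axes, and its size along the three axes.

Format: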

{Object Detection}
category
position.x
position.y
position.z
eulerRotation.x
eulerRotation.y
eulerRotation.z
scale.x
scale.y
scale.z

Example:

{Object Detection}
Handle
0.1
0.052
0.026
0
90
0
0.5
0.3
1
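A message in this format can be built and parsed with two small helpers. The sketch assumes that fields are separated by newline characters, as the layout above suggests; the function names are illustrative and do not come from the repository.

```python
HEADER = "{Object Detection}"


def build_object_detection(category, position, euler_rotation, scale):
    """Serialize one detection into the newline-separated message format."""
    fields = [category, *position, *euler_rotation, *scale]
    return "\n".join([HEADER] + [str(f) for f in fields])


def parse_object_detection(message):
    """Parse a message back into its category and three 3-tuples of floats."""
    lines = message.split("\n")
    if len(lines) != 11 or lines[0] != HEADER:
        raise ValueError("not an Object Detection message")
    values = [float(v) for v in lines[2:]]
    return {
        "category": lines[1],
        "position": tuple(values[0:3]),
        "euler_rotation": tuple(values[3:6]),
        "scale": tuple(values[6:9]),
    }


msg = build_object_detection("Handle", (0.1, 0.052, 0.026), (0, 90, 0), (0.5, 0.3, 1))
parsed = parse_object_detection(msg)
print(parsed["category"], parsed["euler_rotation"])
```

The resulting string can be passed directly to SendData shown earlier.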

Real-time joint angles of the robot arm

Required information: 6 joint angles (in degrees)

Format:

{Current Joint}
j0
j1
j2
j3
j4
j5
j6

Unity->Python

Target joint angles for the robot arm

Required information: 6 joint angles (in degrees)

Format:

{Target Joint}
j0
j1
j2
j3
j4
j5
j6

Posted 2024-08-01 | Updated 2025-02-19 | Note | a few seconds read (About 30 words)

Python-Unity Network

Communication between Python and Unity

https://github.com/Siliconifier/Python-Unity-Socket-Communication

Posted 2024-08-01 | Updated 2025-02-19 | Note | a minute read (About 169 words)

Custom objects in Gen6D

Repository: https://github.com/liuyuan-pal/Gen6D

Guide: https://github.com/liuyuan-pal/Gen6D/blob/main/custom_object.md

Commands for each step:

python prepare.py --action video2image --input data/custom/part1/ref.mp4 --output data/custom/part1/images --frame_inter 10 --image_size 960  

python prepare.py --action sfm --database_name custom/part1 --colmap D:\COLMAP\COLMAP-3.7-windows-cuda\COLMAP.bat        

# do some cloudcompare thing

python predict.py --cfg configs/gen6d_pretrain.yaml --database custom/part1 --video data/custom/part1/test.mp4 --resolution 460 --output data/custom/part1/test --ffmpeg ffmpeg

ffmpeg -framerate 30 -i %d-bbox.jpg -c:v libx264 -r 30 -pix_fmt yuv420p output.mp4

How to deal with inaccurate predictions: https://github.com/liuyuan-pal/Gen6D/issues/29

Posted 2024-08-01 | Updated 2025-02-19 | Note | a minute read (About 129 words)

6d pose -- unity coordinate

6d pose -> unity coordinate

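Unity uses a left-handed coordinate system, while most 6D pose algorithms use a right-handed one, so after obtaining [R;t] a reflection about the y axis has to be applied.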

import torch


def right_to_left_hand_pose_R(R):
    # Reflection matrix that flips the y axis
    M = torch.tensor([
        [ 1,  0,  0],
        [ 0, -1,  0],
        [ 0,  0,  1]
    ], dtype=R.dtype)

    # Convert the rotation matrix
    R_prime = M @ R @ M

    return R_prime


def right_to_left_hand_pose_t(t):
    # Reflection matrix that flips the y axis
    M = torch.tensor([
        [ 1,  0,  0],
        [ 0, -1,  0],
        [ 0,  0,  1]
    ], dtype=t.dtype)

    # Convert the translation vector
    t_prime = M @ t

    return t_prime
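Why the rotation is multiplied by M on both sides while the translation is multiplied only once can be seen from a short derivation. It assumes that every point p in the right-handed frame maps to p' = Mp in the left-handed frame, with M = diag(1, -1, 1):

```latex
\begin{aligned}
M &= \operatorname{diag}(1,-1,1), \qquad M^{-1} = M, \qquad p' = M p \\
q &= R\,p + t \\
q' &= M q = M R\,p + M t = (M R M)(M p) + M t = R'\,p' + t' \\
R' &= M R M, \qquad t' = M t, \qquad \det R' = (\det M)^2 \det R = 1
\end{aligned}
```

Since the determinant of R' is still 1, R' remains a proper rotation matrix.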

The result looks good:

Posted 2024-08-01 | Updated 2025-02-19 | Note | 2 minutes read (About 317 words)

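Gripper control

Uses pyRobotiqGripper

However, it only works with the serial port of a Linux machine, so it is deployed on the laptop, which runs a LAN server for the desktop to call.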

import pyRobotiqGripper
import time
from flask import Flask, request

app = Flask(__name__)
gripper = pyRobotiqGripper.RobotiqGripper()
gripper.activate()

@app.route('/open_gripper', methods=['POST'])
def open_gripper():
    # Run the script
    # import subprocess
    # subprocess.call(['/path/to/your/script.sh'])
    gripper.open()
    return 'gripper open'
@app.route('/close_gripper', methods=['POST'])
def close_gripper():
    # Run the script
    # import subprocess
    # subprocess.call(['/path/to/your/script.sh'])
    gripper.close()
    return 'gripper closed'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

# gripper = pyRobotiqGripper.RobotiqGripper()
# gripper.activate()
# gripper.close()
# gripper.open()
# time.sleep(3)
# gripper.close()

(base) cyl@arch ~/450> python gripper_test.py
Activation completed. Activation time :  1.9049019813537598
 * Serving Flask app 'gripper_test'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production 
WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.2.111:5000
Press CTRL+C to quit
192.168.2.103 - - [20/Jul/2024 16:26:01] "POST /open_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:05] "POST /close_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:07] "POST /close_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:10] "POST /close_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:13] "POST /close_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:16] "POST /close_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:18] "POST /close_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:21] "POST /close_gripper HTTP/1.1" 200 -
192.168.2.103 - - [20/Jul/2024 16:26:24] "POST /close_gripper HTTP/1.1" 200 -

The desktop controls the gripper by sending POST requests with curl:

import subprocess

# Define the target URLs for the commands
laptop_ip = "192.168.2.111"
close_url = "http://"+laptop_ip+":5000/close_gripper"
open_url = "http://"+laptop_ip+":5000/open_gripper"

# Send the commands as POST requests with curl
curl_close_command = [
    'curl', 
    '-X', 'POST', 
    close_url, 
]

curl_open_command = [
    'curl', 
    '-X', 'POST', 
    open_url, 
]

print(subprocess.run(curl_open_command, capture_output=True, text=True))
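
The same requests can also be sent without shelling out to curl. The following is a minimal sketch using only the standard library; the function names and the `timeout` default are assumptions for illustration, while the routes and port are taken from the Flask server above.

```python
import urllib.request


def build_gripper_request(host, action, port=5000):
    """Build a POST request for the /open_gripper or /close_gripper route."""
    if action not in ("open", "close"):
        raise ValueError("action must be 'open' or 'close'")
    url = f"http://{host}:{port}/{action}_gripper"
    # An empty body with method="POST" roughly mirrors `curl -X POST <url>`.
    return urllib.request.Request(url, data=b"", method="POST")


def send_gripper_command(host, action, timeout=5.0):
    """Send the request and return the server's reply text."""
    request = build_gripper_request(host, action)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the laptop server running, `send_gripper_command("192.168.2.111", "open")` should return the text that the `/open_gripper` route sends back.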

Posted 2024-04-05 Updated 2025-02-19 8 minutes read (About 1242 words)

Wechat bot

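Build a WeChat Q&A bot by hooking it up to an existing large language model.

LLM

Choice

Without a phone number from a whitelisted region, I could not apply for the ChatGPT API. I then heard that Alibaba Cloud's Tongyi Qianwen has decent Q&A ability and that its API is fairly cheap (a few fen per exchange? 2M tokens are given for free at the start). So I decided to use Tongyi Qianwen.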

API KEY

Application/management page
Set the API KEY (as an environment variable)

export DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
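
On the Python side, the exported variable can be read back before any request is made. This is a small sketch; the helper name `load_api_key` is hypothetical and not part of dashscope.

```python
import os


def load_api_key(env_var="DASHSCOPE_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return key
```

Failing early with a clear message is easier to debug than an authentication error from a later API call.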

Code

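Detailed documentation

First install Alibaba Cloud's dashscope package

pip install dashscope

Since this is Q&A, it is a multi-round conversation.
Below is the multi-round conversation sample code from the official site
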

from http import HTTPStatus
from dashscope import Generation
from dashscope.api_entities.dashscope_response import Role


def conversation_with_messages():
    messages = [{'role': Role.SYSTEM, 'content': 'You are a helpful assistant.'},
                {'role': Role.USER, 'content': '如何做西红柿炖牛腩？'}]
    response = Generation.call(
        Generation.Models.qwen_turbo,
        messages=messages,
        # set the result to be "message" format.
        result_format='message',
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
        # append result to messages.
        messages.append({'role': response.output.choices[0]['message']['role'],
                         'content': response.output.choices[0]['message']['content']})
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
    messages.append({'role': Role.USER, 'content': '不放糖可以吗？'})
    # make second round call
    response = Generation.call(
        Generation.Models.qwen_turbo,
        messages=messages,
        result_format='message',  # set the result to be "message" format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    conversation_with_messages()

Notebook version

Basic imports and setting the api-key:

from http import HTTPStatus
from dashscope import Generation
from dashscope.aigc.generation import Message
from dashscope.api_entities.dashscope_response import Role
import dashscope

dashscope.api_key = "..."

Create the initial message:

messages = [Message(Role.SYSTEM, 'you are a cyl家的小女仆口牙')]

Question #1:

messages.append(Message(Role.USER, 'how to install archlinux'))
response = Generation.call(
    Generation.Models.qwen_turbo,
    messages=messages,
    # set the result to be "message" format.
    result_format='message',
)

response

GenerationResponse(status_code=<HTTPStatus.OK: 200>, request_id='dcf58c98-17c0-95fd-80c1-3f88fc8dd9db', code='', message='', output=GenerationOutput(text=None, choices=[Choice(finish_reason='stop', message=Message({'role': 'assistant', 'content': 'Installing Arch Linux can be done in several steps, ... Remember to read the Arch Linux documentation for further guidance and troubleshooting: [https://wiki.archlinux.org/](https://wiki.archlinux.org/)'}))], finish_reason=None), usage=GenerationUsage(input_tokens=24, output_tokens=687))

Receive the answer:

if response.status_code == HTTPStatus.OK:
	print(response)
else:
	print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
		response.request_id, response.status_code,
		response.code, response.message
	))

Append the answer to the context:

messages.append(Message(response.output.choices[0]['message']['role'], response.output.choices[0]['message']['content']))

Then go back to the Question #1 step.

A simple rewritten module

import os
from http import HTTPStatus
from dashscope import Generation
from dashscope.aigc.generation import Message
from dashscope.api_entities.dashscope_response import Role
import dashscope

messages = []

def setKey():
    # Read the key from the DASHSCOPE_API_KEY environment variable
    # instead of hard-coding it in the source.
    dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]

def setup(prompt:str):
    setKey()
    messages.append(Message(Role.SYSTEM, prompt))

def ask(question:str):
    messages.append(Message(Role.USER, question))
    response = Generation.call(
        Generation.Models.qwen_turbo,
        messages=messages,
        # set the result to be "message" format.
        result_format='message',
    )
    if response.status_code == HTTPStatus.OK:
        messages.append(Message(response.output.choices[0]['message']['role'],
                        response.output.choices[0]['message']['content']))
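    else:
        # Report the failure instead of silently ignoring it.
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))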

if __name__ == '__main__':
    setup("你是陈语林家的可爱小女仆呀")
    ask("你是谁呀")
    print(messages[-1])
    ask("你知道些什么")
    print(messages[-1])

{"role": "assistant", "content": "我是陈语林家的可爱小女仆，负责照顾主人和提供温馨的生活服务。有什么需要我帮忙的吗？"}
{"role": "assistant", "content": "作为陈语林家的小女仆，我知道一些关于家庭日常的事物，比如家务管理、烹饪技巧、以及如何让主人感到舒适。但请记住，我并非无所不知，对于超出这个设定范围的问题，我会尽力给出符合情境的回答。如果你有任何关于家居生活或角色扮演的问题，我很乐意帮忙。"}
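
The context handling in the module above can be sketched independently of dashscope. In the sketch below, `ChatSession` and `echo_backend` are hypothetical names, and the backend is a stand-in callable rather than the real `Generation.call`; the point is only to show how the history grows each round and stays consistent when a call fails.

```python
class ChatSession:
    """Keep multi-round context; `backend` is any callable(messages) -> str."""

    def __init__(self, system_prompt, backend):
        self.backend = backend
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, question):
        self.messages.append({"role": "user", "content": question})
        try:
            answer = self.backend(self.messages)
        except Exception:
            # Drop the unanswered question so the history stays consistent.
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": answer})
        return answer


def echo_backend(messages):
    """Stand-in for the real model call: reports how many turns it saw."""
    return f"seen {len(messages)} messages; last: {messages[-1]['content']}"


session = ChatSession("You are a helpful assistant.", echo_backend)
first = session.ask("hi")
second = session.ask("again")
```

Here the second call sees four messages (system, user, assistant, user), which is what makes the conversation multi-round. To use the real model, the backend would wrap the `Generation.call` request shown earlier.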

Wechaty

document

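Because I plan to use wechaty inside WSL2, and wechaty needs the Puppet docker service to be started first, install Docker Desktop for Windows.

To use docker inside WSL2, the user has to be added to the docker group:

sudo usermod -a -G docker chenyulin

Then restart WSL2 and restart the Docker Desktop service.

Pull the latest image and start the service in WSL2: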

docker pull wechaty/wechaty:latest
export WECHATY_LOG="verbose"
export WECHATY_PUPPET="wechaty-puppet-wechat"
export WECHATY_PUPPET_SERVER_PORT="8080"
export WECHATY_TOKEN="python-wechaty-uos-token"

docker run -ti \
--name wechaty_puppet_service_token_gateway \
--rm \
-e WECHATY_LOG \
-e WECHATY_PUPPET \
-e WECHATY_PUPPET_SERVER_PORT \
-e WECHATY_TOKEN \
-p "$WECHATY_PUPPET_SERVER_PORT:$WECHATY_PUPPET_SERVER_PORT" \
wechaty/wechaty:latest
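
Once the container is up, a quick way to check that something is listening on the published port is a plain TCP connection test. This is only a sketch: the helper name is hypothetical, it assumes the port is published on localhost, and a successful connection does not prove that the puppet service itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the setup above, the call would be `is_port_open("127.0.0.1", 8080)`, matching `WECHATY_PUPPET_SERVER_PORT`.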

Install the wechaty Python package:

pip install wechaty -i https://pypi.tuna.tsinghua.edu.cn/simple/

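Then, annoyingly, I found out that there is a simplified version of wechaty that is easier to use.

Ding-dong bot

A simplified version of wechaty

It turns out that a WeChat bot needs a real-name verified account and carries a risk of being banned, and the same goes for QQ, so this post is closed for now. I may later switch to a web-based Q&A.

QQ bot

Well, it feels like QQ could still save this, and QQ alt accounts are free anyway, so let's just do it!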
