
Deep Hierarchical Semantic Segmentation
Repository: https://github.com/owkin/FLamby
git clone https://github.com/owkin/FLamby.git
Fed-TCGA-BRCA
https://owkin.github.io/FLamby/fed_tcga_brca.html
import torch
Import several macros, datasets and metrics.
# Instantiation of the local train set (and data loader), baseline loss function, baseline model, and default optimizer
In this script, the pooled parameter is set to False when creating the FedDataset instances, meaning the dataset is not pooled: each client (center) keeps its own local dataset. This is a common setup in federated learning to simulate real-world scenarios where data is distributed across different locations or devices.
# Traditional PyTorch training loop
A standard training loop.
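The loop itself is plain PyTorch; a sketch building on the objects instantiated above (NUM_EPOCHS is a placeholder value):

NUM_EPOCHS = 10  # placeholder; FLamby also exports NUM_EPOCHS_POOLED
for epoch in range(NUM_EPOCHS):
    for X, y in train_dataloader:
        optimizer.zero_grad()
        outputs = model(X)
        loss = lossfunc(outputs, y)
        loss.backward()
        optimizer.step()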
# Evaluation
The evaluation metric used is lifelines.utils.concordance_index, which returns the C-index.
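For reference, lifelines' concordance_index takes (event_times, predicted_scores, event_observed). A rough evaluation sketch; test_dataloader and the (event, time) layout of the targets are assumptions, and the way FLamby's metric wraps the call may differ:

from lifelines.utils import concordance_index

model.eval()
scores, times, events = [], [], []
with torch.no_grad():
    for X, y in test_dataloader:       # test_dataloader: hypothetical held-out loader
        risk = model(X).squeeze(-1)
        scores.append(risk)
        events.append(y[:, 0])         # assumed layout: y[:, 0] = event indicator
        times.append(y[:, 1])          # assumed layout: y[:, 1] = survival time

# Higher predicted risk should mean shorter survival, hence the minus sign
c_index = concordance_index(
    torch.cat(times).numpy(),
    -torch.cat(scores).numpy(),
    torch.cat(events).numpy(),
)
print(f"C-index: {c_index:.3f}")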
import torch

# We loop on all the clients of the distributed dataset and instantiate associated data loaders

# Federated Learning loop
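A hedged sketch of the federated part, roughly following the FLamby quickstart; NUM_CLIENTS, get_nb_max_rounds and the FedAvg constructor arguments are assumptions based on the FLamby docs and may differ between versions:

import torch
from torch.utils.data import DataLoader

from flamby.datasets.fed_tcga_brca import (
    BATCH_SIZE, LR, NUM_CLIENTS, Baseline, BaselineLoss, get_nb_max_rounds,
)
from flamby.datasets.fed_tcga_brca import FedTcgaBrca as FedDataset
from flamby.strategies.fed_avg import FedAvg

# One data loader per client/center
training_dataloaders = [
    DataLoader(FedDataset(center=i, train=True, pooled=False),
               batch_size=BATCH_SIZE, shuffle=True)
    for i in range(NUM_CLIENTS)
]

# Federated Learning loop: FedAvg aggregates the clients' local updates each round
num_updates = 100
strategy = FedAvg(
    training_dataloaders=training_dataloaders,
    model=Baseline(),
    loss=BaselineLoss(),
    optimizer_class=torch.optim.SGD,
    learning_rate=LR,
    num_updates=num_updates,
    nrounds=get_nb_max_rounds(num_updates),
)
trained_model = strategy.run()[0]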
Convert raw RGB-D data into a tree-structured scene representation (maybe in Unity), for more …
The idea turns out to be very close to a recent paper by lff: https://arxiv.org/html/2410.07408v1
Comparison with scene understanding.
Repository: https://github.com/Simple-Robotics/cosypose
git clone --recurse-submodules https://github.com/Simple-Robotics/cosypose.git
Note: during this step pip will complain that setuptools and matplotlib-inline do not match Python 3.7.6; install compatible versions into the environment manually.
Download the data according to the README.
Note that the first block of commands fails to download. From https://bop.felk.cvut.cz/datasets/ we learn that the download links have moved to Hugging Face; the test set can be downloaded manually from https://huggingface.co/datasets/bop-benchmark/datasets/tree/main/ycbv and placed under local_data/bop_datasets/ycbv/test.
Set up the models used for testing.
np.where(mask)[0].item()
When running it, the following error appears:
Traceback (most recent call last):
After adding debug output, we get:
Debug - scene_id: 48, view_id: 1
It turns out that the downloaded test set does not contain all of the frames listed in the dataset's keyframe.txt, so some keyframes cannot be found.
To restart training from scratch: clear local_data/joblib_cache.
cosypose.scripts.run_cosypose_eval
The script predicts object poses based on multi-view input by following these steps:
1. Dataset Loading: It first loads the dataset using the make_scene_dataset function, which prepares the scene data for evaluation. The dataset is wrapped in a MultiViewWrapper to handle multiple views.
2. Model Loading: The script loads pre-trained models for pose prediction using the load_models function. It loads both coarse and refiner models based on the configuration specified in the command-line arguments.
3. Prediction Setup: The script sets up the prediction parameters, including the number of iterations for the coarse and refiner models, and whether to skip multi-view processing based on the number of views specified.
4. Multi-view Prediction: The MultiviewScenePredictor is initialized with the mesh database, which is used to predict poses across multiple views. The MultiviewPredictionRunner is then used to run predictions on the dataset, leveraging the multi-view setup to improve pose estimation accuracy.
5. Pose Estimation: The script uses the loaded models to predict object poses. It processes detections from either pix2pose or posecnn depending on the dataset, and refines these predictions using the refiner model.
6. Evaluation: After predictions, the script evaluates the predicted poses using the PoseEvaluation class. It calculates various metrics like ADD-S and AUC to assess the accuracy of the pose predictions.
7. Results Logging: Finally, the script logs the results, including evaluation metrics, and saves them to a specified directory.
The multi-view approach allows the script to leverage information from different viewpoints, which can help resolve ambiguities and improve the robustness of the pose estimation.
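A rough sketch of the first two steps; the module paths, dataset id and argument names are assumptions based on the cosypose repository layout and may not match exactly:

# Assumed module paths; check the cosypose repository for the exact locations.
from cosypose.datasets.datasets_cfg import make_scene_dataset
from cosypose.datasets.wrappers.multiview_wrapper import MultiViewWrapper

scene_ds = make_scene_dataset('ycbv.test.keyframes')    # dataset id is an example
scene_ds_multi = MultiViewWrapper(scene_ds, n_views=4)  # group frames into multi-view scenes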
run_custom_scenario
TCO: transformation from camera to object. It is the transformation matrix (pose) describing an object relative to the camera's coordinate system.
TWO: transformation from world to object. It is the transformation matrix (pose) describing an object relative to the world coordinate system.
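The two are related through the camera pose in the world frame. A small numpy sketch of the relation; TWC (the camera pose expressed in world coordinates) is an extra quantity introduced only for this illustration:

import numpy as np

# 4x4 homogeneous transforms, convention T_AB = pose of frame B expressed in frame A.
TWC = np.eye(4)               # camera pose in the world frame (example value)
TCO = np.eye(4)               # object pose in the camera frame (example value)
TCO[:3, 3] = [0.1, 0.0, 0.5]  # e.g. object 0.5 m in front of the camera

TWO = TWC @ TCO               # object pose in the world frame: T_WO = T_WC * T_CO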
class MeshDataBase:
The usual way to initialize it:
object_ds = BOPObjectDataset(scenario_dir / 'models')
mesh_db = MeshDataBase.from_object_ds(object_ds)
It can also be loaded together with the predictor via load_models:
predictor, mesh_db = load_models(coarse_run_id, refiner_run_id, n_workers=n_plotters, object_set=object_set)
Multiview_wrapper
Purpose: reads a scene_dataset and splits the data into different scenes according to the number of views n_views, making it convenient to iterate over the scene elements (all ground truth here).
Each iteration returns:
- n_views RGB images from different viewpoints
- n_views corresponding masks
- n_views corresponding observations
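A hedged iteration sketch based purely on the description above; the actual return structure of the wrapper in cosypose may be organized differently:

# scene_ds_multi is assumed to be the MultiViewWrapper instance from the earlier sketch.
for rgbs, masks, observations in scene_ds_multi:
    # Each item is expected to hold n_views entries, one per viewpoint of the same scene.
    assert len(rgbs) == len(masks) == len(observations)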
MultiviewPredictorRunner
Purpose: takes a Multiview_wrapper as input and makes predictions.
First, the dataset is taken in:
dataloader = DataLoader(scene_ds, batch_size=batch_size,
collate_fn is used to process the raw data (the comment at the end of it lists the data that is actually used).
def collate_fn(self, batch):
The most important function: get_predictions
def get_predictions(self, pose_predictor, mv_predictor,
Responsible for generating predictions for object poses in a scene using both single-view and multi-view approaches.
Input Parameters:
- pose_predictor: the single-view predictor; for the ycbv dataset this is the posecnn detection model.
- mv_predictor: an object or function that predicts scene states using multi-view information.
- detections: a collection of detected objects with associated information, pre-generated and saved in a .pkl file.
- n_coarse_iterations, n_refiner_iterations: number of iterations for coarse and refinement pose estimation.
- sv_score_th: score threshold for single-view detections.
- skip_mv: a flag to skip multi-view predictions.
- use_detections_TCO: a flag to use detections for initial pose estimation.

Processing steps:
- Filtering Detections: note that the detections used here come directly from the pre-saved detection data (not ground truth). The detections are filtered based on the sv_score_th threshold and organized by scene_id and view_id.
- Iterating Over Data: iterates over the dataloader.
- Matching Detections:
- Pose Prediction: runs the pose_predictor to get single-view predictions.
- Multi-View Prediction: if skip_mv is False, it uses the mv_predictor to predict the scene state using multi-view information.
- Collecting Predictions:
- Concatenating Results:
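For orientation, a hedged sketch of how such a call might look; pred_runner is assumed to be the MultiviewPredictionRunner built on the dataloader above, and the exact signature and values used in run_cosypose_eval may differ:

predictions = pred_runner.get_predictions(
    pose_predictor,
    mv_predictor,
    detections=detections,          # pre-saved detections loaded from the .pkl file
    n_coarse_iterations=1,
    n_refiner_iterations=4,
    sv_score_th=0.3,
    skip_mv=False,
    use_detections_TCO=False,
)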
MultiviewScenePredictor
Purpose: used by MultiviewPredictionRunner.get_predictions.
In run_cosypose_eval we initialize MultiviewScenePredictor in this way:
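Presumably something along these lines, given that the predictor is said to be initialized with the mesh database (a sketch, not the exact line from the script):

mv_predictor = MultiviewScenePredictor(mesh_db)  # mesh_db built earlier from the object dataset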
In the MultiviewScenePredictor we use the mesh_db to initialize MultiviewRefinement and solve:
problem = MultiviewRefinement(candidates=candidates_n,
The solve function of MultiviewRefinement:
def solve(self, sample_n_init=1, **lm_kwargs):
I plan to base my modifications on run_custom_scenario. How run_custom_scenario is used:
Setting OMP and MKL num threads to 1.
This script only takes the candidates, mesh_db and camera_K information, and then directly runs the mv_predictor.
Write a function that builds candidates from list inputs:
def read_list_candidates_cameras(self, data_list, cameras_K_list):
# Example usage:
(PandasTensorCollection(
Afterwards, call MultiviewScenePredictor.predict_scene_state() as usual to estimate the scene:
predictions = self.mv_predictor.predict_scene_state(candidates, cameras,
Then non-maximum suppression is used to merge objects that were detected more than once.
objects = predictions['scene/objects']
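As a rough illustration of the deduplication idea only (a generic sketch, not the script's actual implementation): objects with the same label whose estimated positions lie within a distance threshold are treated as duplicates, and only the highest-scoring one is kept.

import numpy as np

def dedup_objects(labels, t_vecs, scores, dist_th=0.02):
    """Greedy NMS on object translations: keep the best-scoring instance per cluster.

    labels: (N,) object labels; t_vecs: (N, 3) translations in a common frame;
    scores: (N,) confidence scores; dist_th: distance threshold in meters (assumed value).
    """
    labels = np.asarray(labels)
    t_vecs = np.asarray(t_vecs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = []
    suppressed = np.zeros(len(scores), dtype=bool)
    for i in np.argsort(-scores):          # highest confidence first
        if suppressed[i]:
            continue
        keep.append(i)
        same_label = labels == labels[i]
        close = np.linalg.norm(t_vecs - t_vecs[i], axis=1) < dist_th
        suppressed |= same_label & close   # suppress nearby duplicates of the same object
    return keep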
The final output is objects_:
PandasTensorCollection(
Please refer to the notebook custom_scene.ipynb.
CosyPose -- Consistent multi-view multi-object 6D pose estimation
Estimate accurate 6D poses of multiple known objects in a 3D scene captured by multiple cameras with unknown positions
AR2-D2 -- Training a Robot Without a Robot
Video datasets of robots performing tasks are very important, especially for visual imitation learning.
The traditional way to obtain such training videos is to manually guide a robot through the relevant motions and record them, which costs a great deal of labor and time. Most importantly, the robot is fixed inside a lab, so the objects and tasks it can reach are limited, and the training data therefore does not cover more everyday scenarios.
The paper proposes an iOS app that tracks the user's hand motion and renders an AR robot performing the corresponding actions in the video.
As shown in the figure above, the design and implementation of AR2-D2 consist of two main components. The first is a phone app that projects an AR robot into the real world, allowing users to interact with physical objects and the AR robot. The second converts the collected videos into a format that can be used to train different behavior-cloning agents, which can then be deployed on real robots.
Unity + AR Foundation kit (used to generate a virtual robot arm and place it in the scene).
Sensors: the Apple device's camera and built-in LiDAR.
Hand motions are captured using iOS's own hand-pose estimation together with the depth information; from these, the keypoints the robot arm needs to move to are derived, and the arm in the AR view can be driven to the specified positions.
After obtaining the videos generated by the app, the human hand is removed and the erased region is inpainted (E2FGVI), yielding videos of the robot arm manipulating objects that can be used as training data for vision-based imitation learning.
Demonstrations are collected around three common robot tasks: {press, push, pick up}.
Perceiver-Actor (PERACT) is used to train a Transformer-based, language-conditioned behavior-cloning policy.
PERACT takes a 3D voxel observation and a language goal (v, l) as input and produces discretized outputs for translation, rotation, and gripper state of the end-effector. These outputs, coupled with a motion planner, enable the execution of the task specified by the language goal.
Each agent performs one task ({press, push, pick up}); it is first trained for 3k iterations and then fine-tuned for another 3k iterations to reduce the gap between the iPhone camera and the Kinect v2 camera used by the agent.
Fine-tuning results
Test results
Human-robot interaction for robotic manipulator programming in Mixed Reality
This is very similar to my undergraduate thesis work, and it has already been published at ICRA?!
Although there has been plenty of research on AR for human-robot interaction in recent years, most of it lacks systematic analysis.
Recently, an increasing number of studies in HCI, HRI, and robotics have demonstrated how AR enables better interactions between people and robots. However, often research remains focused on individual explorations and key design strategies, and research questions are rarely analyzed systematically.
This paper mainly provides a classification of the current AR human-robot interaction field (based on 460 papers).
AR-based human-robot interaction is divided into the following research dimensions:
AR's biggest advantage is that it can provide rich visual feedback beyond physical constraints, reducing the worker's cognitive load.
The ultimate goal of this study is to provide a common foundation and understanding for the field.
Robotic systems here are not limited to traditional industrial robots; this study does not restrict itself to any single kind of robot.
Robotic interfaces mainly refers to "interfaces that use robots or other actuated systems as a medium for HCI".
The study presents a classification of the field through design-space dimensions.
It broadens the literature coverage of HCI and HRI.
It discusses open research questions and opportunities that can drive further research in the field.
There is an interactive website: https://ilab.ucalgary.ca/ar-and-robotics/
Based on where the AR hardware is placed (dimension 1), the approaches can be divided into:
This part mainly discusses the ways of presenting AR content.
Blog Template For New Hexo User
Add blog content locally (markdown files) -> Hexo generates the site's source from these files -> push it to GitHub via commands -> GitHub deploys the static pages.
https://www.cnblogs.com/xueweisuoyong/p/11914045.html
Because locally written content is pushed to GitHub, an SSH key needs to be bound to the account.
See: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
Add the public key to GitHub's SSH settings.
https://blog.csdn.net/lizhong2008/article/details/133844070
The latest version is fine.
Set the local git config:
git config --global user.email "guanshengyuanlu@163.com"
Clone the repository.
Install the environment.
Each article on the site is a markdown text file stored locally in source/_posts.
Create a new post with the command hexo new "article title".
Then just go to the corresponding file and edit it.
Typora is recommended as a markdown editor; of course, if you are hardcore enough, a plain text editor works just fine!
To add images, put the photos into the asset folder in the same directory (the folder with the same name as the post) and then insert the corresponding reference in the text.
You can refer to the Sample Blog.
After adding the content you want, deploy the site with ./deploy.sh; wait a moment and the latest changes will appear on the site.
Go to the themes/icarus/source/img folder 📁.
If nothing has changed after redeploying, press <Ctrl>+F5 in the browser.
As shown in the figure above, replace avatar.png.
See https://chen-yulin.github.io/2024/09/03/%5BOBS%5Dhexo-Hexo%20Comment%20System%20--%20Twikoo/