
Aixinwu check-in: aixinwu.sjtu.edu.cn/products/asw-store
Two GRISSO daily 💊
Starting from 2/20:
Time | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
---|---|---|---|---|---|---|---|
08:00 | | | | | | | |
09:00 | | | | | | | |
10:00 | Upgrade Linux | Computer Vision (DSY115) | | | | | |
11:00 | | | | | | | |
12:00 | | | | | | | |
13:00 | | | | | | | |
14:00 | Dialectics of Nature (DSY202) | | | | | | |
15:00 | | | | | | | |
16:00 | | | | | | | |
17:00 | | | | | | | |
18:00 | Data Mining (CRQ219) | | | | | | |
19:00 | | | | | | | |
20:00 | | | | | | | |
21:00 | | | | | | | |
22:00 | | | | | | | |
Credits: 3+3+1
Semantic-SAM Repository Application
My repository: https://github.com/Chen-Yulin/Semantic-SAM
Official steps:
    pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
According to [[Cuda+Torch]], cudatoolkit and cuda-toolkit need to be installed first:
    conda install nvidia/label/cuda-11.7.0::cuda-toolkit -c nvidia/label/cuda-11.7.0
Then use the install command from the official PyTorch site:
    conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
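To confirm that the environment really ended up with the intended build, a small check like the following can help. It is a minimal sketch: the helper names `torch_cuda_report` and `matches_expected` are made up for this note, and the expected versions (1.13.1 and 11.7) are simply the ones pinned in the command above.

```python
import importlib.util


def torch_cuda_report():
    """Describe the installed torch build, or return None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script still runs without torch

    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


def matches_expected(report, torch_version="1.13.1", cuda_version="11.7"):
    """Check a report against the versions pinned in this note (assumed defaults)."""
    if report is None:
        return False
    return (
        report["torch"].startswith(torch_version)
        and report["built_with_cuda"] == cuda_version
    )


if __name__ == "__main__":
    report = torch_cuda_report()
    print(report)
    print("matches pinned versions:", matches_expected(report))
```

If `built_with_cuda` comes back as `None`, the installed package is a CPU-only build and the CUDA ops will not compile against it.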
Running the second line directly may fail with a message that the system gcc version is too new; in that case install gcc=11.2.0:
    conda install -c conda-forge gcc=11.2.0
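To see which gcc the build will pick up before compiling, the version can be parsed from `gcc --version`. This is a rough sketch: the function names are hypothetical, and the default limit `max_major=11` is only an assumption taken from the gcc=11.2.0 pin above, not an authoritative compatibility table.

```python
import re
import shutil
import subprocess


def parse_gcc_version(text):
    """Extract (major, minor, patch) from the first line of `gcc --version` output."""
    lines = text.splitlines()
    if not lines:
        return None
    # Drop the parenthesised distro tag, which may contain other version-like strings.
    first_line = re.sub(r"\(.*?\)", "", lines[0])
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_too_new(version, max_major=11):
    """True if the gcc major version exceeds the assumed limit."""
    return version[0] > max_major


if __name__ == "__main__":
    if shutil.which("gcc") is None:
        print("gcc not found on PATH")
    else:
        output = subprocess.run(
            ["gcc", "--version"], capture_output=True, text=True
        ).stdout
        version = parse_gcc_version(output)
        print("gcc version:", version)
        if version is not None and is_too_new(version):
            print("gcc is newer than the assumed limit; consider gcc=11.2.0")
```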
If compilation fails with "ld: cannot find -lcudart: No such file or directory" followed by "collect2: error: ld returned 1 exit status", it is simply because cudatoolkit is not installed.
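A quick way to check whether the linker has any chance of finding `libcudart` is to look for it under the usual prefixes. The sketch below assumes the library lives under `lib`, `lib64`, or `targets/x86_64-linux/lib` of `CONDA_PREFIX`, `CUDA_HOME`, or `CUDA_PATH`; those locations and the helper names are assumptions for illustration, and other layouts exist.

```python
import os
from pathlib import Path

# Sub-directories that commonly hold CUDA runtime libraries (assumed, not exhaustive).
LIB_SUBDIRS = ("lib", "lib64", "targets/x86_64-linux/lib")


def candidate_roots(env=os.environ):
    """Collect install prefixes from environment variables that are set and non-empty."""
    keys = ("CONDA_PREFIX", "CUDA_HOME", "CUDA_PATH")
    return [env[key] for key in keys if env.get(key)]


def find_cudart(roots):
    """Return every libcudart* file found in the library folders of the given roots."""
    hits = []
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue
        for sub in LIB_SUBDIRS:
            folder = base / sub
            if folder.is_dir():
                hits.extend(sorted(folder.glob("libcudart*")))
    return hits


if __name__ == "__main__":
    found = find_cudart(candidate_roots())
    if found:
        for path in found:
            print(path)
    else:
        print("libcudart not found; the CUDA toolkit is probably not installed")
```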
^ ^
After installation, running import semantic_sam directly fails with ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
^ ^
The message says:
    Please compile MultiScaleDeformableAttention CUDA op with the following commands:
Mask2Former has to be built manually:
    cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
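Whether the compiled op is visible to Python can be checked without importing the whole package. The following is a small sketch using only the standard library; the two module names come from the error above, and the helper names are hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """True if Python can locate the module, without actually importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent or a broken spec.
        return False


def missing_modules(names):
    """Return the subset of names that cannot be located."""
    return [name for name in names if not is_importable(name)]


if __name__ == "__main__":
    required = ["semantic_sam", "MultiScaleDeformableAttention"]
    missing = missing_modules(required)
    if missing:
        print("still missing:", ", ".join(missing))
    else:
        print("all modules can be located")
```

Running it before and after the build shows whether the compile step actually installed `MultiScaleDeformableAttention` into the active environment.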
Some version issues:
    pip install gradio==3.37.0
    python demo.py --ckpt ./weights/swinl_only_sam_many2many.pth
Comment: compared with SAM, the results show more semantic consistency instead of segmenting by texture.
    python demo_auto_generation.py --ckpt ./weights/swinl_only_sam_many2many.pth
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
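The main idea is all in the pseudocode above: by expanding only part of the scene graph (which has a strictly hierarchical structure), the method controls the size of the scene graph that is fed to the LLM.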
A scalable approach to ground LLM-based task planners across environments spanning multiple rooms and floors
The scene graph is represented with networkx (a Python package).
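The partial-expansion idea can be illustrated with a toy hierarchy. This is only a sketch of the general technique, not SayPlan's implementation: it uses a plain dict of children instead of networkx to stay dependency-free, and the class name, method names, and node names are all made up for illustration.

```python
class CollapsedSceneGraph:
    """Strictly hierarchical graph where only expanded nodes reveal their children."""

    def __init__(self, children, root):
        self.children = children  # maps a node to the list of its child nodes
        self.root = root
        self.expanded = {root}  # start with only the top level opened

    def visible_nodes(self):
        """Breadth-first walk that descends only through expanded nodes."""
        visible = []
        queue = [self.root]
        while queue:
            node = queue.pop(0)
            visible.append(node)
            if node in self.expanded:
                queue.extend(self.children.get(node, []))
        return visible

    def expand(self, node):
        """Reveal the children of a node that is currently visible."""
        if node not in self.visible_nodes():
            raise KeyError(f"{node!r} is not visible yet")
        self.expanded.add(node)

    def contract(self, node):
        """Hide everything below a node, including deeper expansions."""
        stack = [node]
        while stack:
            current = stack.pop()
            self.expanded.discard(current)
            stack.extend(self.children.get(current, []))


if __name__ == "__main__":
    # Hypothetical building: floors contain rooms, rooms contain objects.
    hierarchy = {
        "building": ["floor_1", "floor_2"],
        "floor_1": ["kitchen", "office"],
        "floor_2": ["lab"],
        "kitchen": ["fridge", "counter"],
        "office": ["desk"],
    }
    graph = CollapsedSceneGraph(hierarchy, "building")
    print(graph.visible_nodes())
    graph.expand("floor_1")
    graph.expand("kitchen")
    print(graph.visible_nodes())
    graph.contract("floor_1")
    print(graph.visible_nodes())
```

Only the output of `visible_nodes()` would be serialized into the prompt, so the prompt size grows with the expanded part of the graph rather than with the whole environment.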
Clio: Real-time Task-Driven Open-Set 3D Scene Graphs
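Contributions:
The paper argues that different tasks need semantic information at different granularities, and achieves this by combining SAM with [[CLIP多模态预训练模型]] (CLIP, a multimodal pre-trained model). However, it ignores predicate relations and parent-child relations between objects. In essence it can still only perform basic operations: navigate, pick up, put down, navigate.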
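One simple way to picture task-driven selection is to score each segment embedding against a task embedding and keep only the close ones. The sketch below is an illustration under strong assumptions and is not Clio's actual algorithm: the three-dimensional vectors are toy stand-ins for CLIP embeddings of SAM segments, and the function names, labels, and threshold of 0.5 are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevant_segments(segment_embeddings, task_embedding, threshold=0.5):
    """Keep the labels whose embedding is at least `threshold` similar to the task."""
    return [
        label
        for label, embedding in segment_embeddings.items()
        if cosine_similarity(embedding, task_embedding) >= threshold
    ]


if __name__ == "__main__":
    # Toy vectors standing in for embeddings; real ones would come from a CLIP model.
    segments = {
        "mug": [0.9, 0.1, 0.0],
        "kettle": [0.8, 0.3, 0.1],
        "bookshelf": [0.0, 0.2, 0.9],
    }
    make_tea = [1.0, 0.2, 0.0]  # hypothetical embedding of the task "make tea"
    print(relevant_segments(segments, make_tea))
```

A different task embedding would keep a different subset of segments, which is the sense in which the granularity of the map depends on the task.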