Chen Yulin's Blog
3D-LLM
Posted 2025-02-13 | Updated 2025-05-12 | Review | 3 minutes read (about 505 words)


Intro

Recent works have explored aligning images and videos with LLMs, yielding a new generation of multi-modal LLMs that can understand and reason about 2D images.
However, models that can analyze the 3D physical world are still missing. Such analysis involves richer concepts such as spatial relationships, affordances, physics, and interaction.

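This motivates the proposal to inject the 3D world into large language models. The paper introduces a new family of 3D-LLMs that take 3D representations (i.e., 3D point clouds with their features) as input and perform a range of 3D-related tasks.
Advantages: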

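  • Long-term memory of the entire scene can be stored in a holistic 3D representation, rather than in episodic partial-view observations
  • 3D properties (such as affordances and spatial relationships) can be inferred from the 3D representation, which goes far beyond the scope of language-based or 2D image-based LLMs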

Challenges

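  • Data acquisition: the scarcity of 3D data hinders the development of 3D-based foundation models, and 3D data paired with language descriptions is even harder to obtain
    • The paper proposes a set of data generation pipelines that can produce large-scale 3D data paired with language.
  • Obtaining meaningful 3D features that align with language features for 3D-LLMs: one option is to train a 3D encoder from scratch with a contrastive paradigm similar to the one used to align 2D images and language. However, that paradigm consumes enormous amounts of data, time, and GPU resources.
    • The paper instead uses a 3D feature extractor that constructs 3D features from the 2D pretrained features of rendered multi-view images. Recently, quite a few vision-language models (e.g. BLIP-2, Flamingo) have also used 2D pretrained CLIP features to train their VLMs. Because the extracted 3D features live in the same feature space as the 2D pretrained features, a 2D VLM can be used seamlessly as the backbone and fed the 3D features, which makes training the 3D-LLM efficient.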
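To make the "3D features from multi-view 2D features" idea concrete, here is a minimal sketch of one way it could be done: project every 3D point into each rendered view and average the 2D features it lands on. This is an illustration, not the paper's extractor. The function name `lift_features_to_points`, the shared pinhole intrinsics, the 3x4 world-to-camera poses, and the nearest-pixel lookup are all assumptions, and occlusion is ignored (no depth test). The feature maps are assumed to come from some 2D pretrained encoder.

```python
def lift_features_to_points(points, feature_maps, intrinsics, world_to_cams):
    """Average 2D features over every view in which a 3D point projects inside the image.

    points:        list of (x, y, z) world-space points
    feature_maps:  one H x W grid of C-dim feature lists per rendered view
    intrinsics:    (fx, fy, cx, cy) pinhole parameters shared by all views (assumption)
    world_to_cams: one 3x4 row-major [R | t] matrix per view (assumption)
    Returns (features, hit_counts); points seen in no view keep an all-zero feature.
    """
    fx, fy, cx, cy = intrinsics
    dim = len(feature_maps[0][0][0])
    sums = [[0.0] * dim for _ in points]
    hits = [0] * len(points)

    for fmap, pose in zip(feature_maps, world_to_cams):
        height, width = len(fmap), len(fmap[0])
        for i, (x, y, z) in enumerate(points):
            # World frame -> camera frame.
            xc, yc, zc = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in pose)
            if zc <= 1e-6:
                continue  # point is behind the camera
            # Pinhole projection, snapped to the nearest pixel.
            u = round(fx * xc / zc + cx)
            v = round(fy * yc / zc + cy)
            if 0 <= u < width and 0 <= v < height:
                for k, value in enumerate(fmap[v][u]):
                    sums[i][k] += value
                hits[i] += 1

    # Mean over the views that actually saw each point.
    features = [
        [s / n for s in row] if n else row
        for row, n in zip(sums, hits)
    ]
    return features, hits
```

Because each point's feature is just a mean of 2D features, it stays in the same space as the 2D encoder's output, which is the property that lets a 2D VLM backbone consume it directly. A real pipeline would also need a visibility check so that occluded points do not pick up features from the surfaces in front of them.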

TODO

3D-LLM

http://chen-yulin.github.io/2025/02/13/[OBS]Reconstruct Anything-3D-LLM/

Author: Chen Yulin

Posted on: 2025-02-13

Updated on: 2025-05-12

Tags: Robotics, Research-paper, LLM, 3D-Scene