Chen Yulin's Blog
ALBEF
Posted 2025-03-04 | Updated 2026-03-01 | Review | a minute read (About 154 words)

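The backbone is BERT (trained with MLM).
The paper argues that the image encoder should be larger than the text encoder, so the text encoder uses only six self-attention layers to extract features, and the remaining six cross-attention layers serve as the multi-modal encoder.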

#Research-paper #CV #Image2Text #Contrastive-Learning #MultiModal #VLP #Image-Text
ViLT
Posted 2025-03-04 | Updated 2026-03-01 | Review | a few seconds read (About 3 words)

#Research-paper #CV #Transformer #Image2Text #MultiModal #VLP #Image-Text
ZegCLIP
Posted 2025-03-03 | Updated 2026-03-01 | Review | a few seconds read (About 0 words)

#Research-paper #CV #Semantic #CLIP #Open-Vocabulary #Segmentation
BLIP
Posted 2025-03-03 | Updated 2026-03-01 | Review | a few seconds read (About 108 words)

A vision-language model that unifies vision-language understanding and generation tasks.

#Research-paper #CV #Multi-modal #Semantic #CLIP #VLP #Image-Text
GLIP
Posted 2025-02-19 | Updated 2026-03-01 | Review | 2 minutes read (About 273 words)

GLIP is a model that learns object-level, language-aware, and semantic-rich visual representations.
It unifies object detection and phrase grounding for pre-training.

#Research-paper #CV #Multi-modal #Object-Detection #CLIP #Contrastive-Learning #VLP #Image-Grounding
Posted 2025-02-18 | Updated 2026-03-01 | Note | 25 minutes read (About 3683 words)

NSFC

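In human-robot collaborative work environments, accurately understanding and reasoning about the work scene is essential. Traditional methods often rely on static perception techniques and struggle to handle dynamically changing scene information. With advances in deep learning and large language models, multimodal reasoning that combines large scene models with knowledge graphs will give environment understanding stronger dynamic perception and intelligent reasoning capabilities.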

#CV
Extract Free Dense Labels from CLIP
Posted 2025-02-18 | Updated 2026-03-01 | Review | a few seconds read (About 0 words)

#Research-paper #CV #Semantic #CLIP #Open-Vocabulary #Segmentation
Posted 2025-02-17 | Updated 2026-03-01 | Note | 4 minutes read (About 558 words)

Docker

Docker Proxy configuration | FuYao Docker documentation


Building a custom image on the FuYao platform

Directory structure

~/fuyao/my-docker/
├── Dockerfile
├── nvim-config/ # ~/.config/nvim
└── nvim-data/ # ~/.local/share/nvim

Prepare the local configuration

mkdir -p ~/fuyao/my-docker && cd ~/fuyao/my-docker
cp -r ~/.config/nvim ./nvim-config
cp -r ~/.local/share/nvim ./nvim-data
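Before starting a build, it can help to confirm that the build context matches the directory structure above, since a missing COPY source only fails partway through the build. The following is a minimal sketch; `check_build_context` is a hypothetical helper written for this note, not part of the fuyao CLI.

```shell
# Hypothetical helper (not part of the fuyao CLI): verify that a build
# context contains the files the Dockerfile template expects.
check_build_context() {
    local ctx="${1:-.}"
    local missing=0
    local dir

    # The Dockerfile itself must be a regular file.
    if [ ! -f "$ctx/Dockerfile" ]; then
        echo "missing file: $ctx/Dockerfile" >&2
        missing=1
    fi

    # Both Neovim directories are COPY sources, so they must exist.
    for dir in nvim-config nvim-data; do
        if [ ! -d "$ctx/$dir" ]; then
            echo "missing directory: $ctx/$dir" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

A possible call is `check_build_context ~/fuyao/my-docker`, which returns a non-zero status and lists whatever is missing.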

Dockerfile template (with a Neovim environment)

FROM infra-registry-vpc.cn-wulanchabu.cr.aliyuncs.com/data-infra/fuyao:luome-250704-0233
ENV DEBIAN_FRONTEND=noninteractive

ENV MAX_JOBS=1
ENV TZ=Asia/Shanghai

ARG PIP_OPTIONS="-i https://nexus-wl.xiaopeng.link/repository/ai_infra_pypi_group/simple --timeout 120"

# Install basic tools
RUN apt-get update && apt-get install -y \
    ripgrep \
    fd-find \
    nodejs \
    npm \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install a pinned Neovim version (must match the local version).
# Asset names differ between releases (newer ones ship nvim-linux-x86_64.tar.gz),
# so check the release page for the pinned version.
RUN curl -LO https://github.com/neovim/neovim/releases/download/v0.10.4/nvim-linux64.tar.gz \
    && tar -xzf nvim-linux64.tar.gz \
    && mv nvim-linux64 /opt/nvim \
    && ln -s /opt/nvim/bin/nvim /usr/local/bin/nvim \
    && rm nvim-linux64.tar.gz

# Install pynvim
RUN pip install ${PIP_OPTIONS} pynvim

# Copy the nvim config and plugins (including the LSP servers installed by mason)
COPY nvim-config /root/.config/nvim
COPY nvim-data /root/.local/share/nvim

# Components required by FuYao
COPY --from=infra-registry.cn-wulanchabu.cr.aliyuncs.com/data-infra/public:fuyao-base-1.9.7 \
    /opt/data-infra /opt/data-infra
ENV PATH="${PATH}:/opt/data-infra:/root/.local/share/nvim/mason/bin"
ENTRYPOINT ["tini", "-s", "--"]

Build command

# Use the Hong Kong site (it can reach GitHub)
fuyao docker --push --site=fuyao_hk --image-name=my-nvim --image-tag=v1

# Other options
# --timeout 7200 build timeout
# -f ./custom.dockerfile path to the Dockerfile

Using the image

# Image address format
infra-registry-vpc.cn-wulanchabu.cr.aliyuncs.com/data-infra/fuyao:{image-name}-{image-tag}

# Start an interactive shell
fuyao kubernetes shell --image <image address>

# SSH into a running job
fuyao ssh <job_name>
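The image address is just the registry prefix followed by the name and tag joined with a hyphen, so it can be composed rather than typed by hand. This is a small sketch under that assumption; `fuyao_image_ref` and `FUYAO_REPO` are hypothetical names introduced here, not part of the fuyao CLI.

```shell
# Registry prefix copied from the image address format above.
FUYAO_REPO="infra-registry-vpc.cn-wulanchabu.cr.aliyuncs.com/data-infra/fuyao"

# Hypothetical helper: compose the full image address from the values
# passed to --image-name and --image-tag at build time.
fuyao_image_ref() {
    if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: fuyao_image_ref <image-name> <image-tag>" >&2
        return 1
    fi
    printf '%s:%s-%s\n' "$FUYAO_REPO" "$1" "$2"
}
```

For the build command shown earlier, `fuyao kubernetes shell --image "$(fuyao_image_ref my-nvim v1)"` would then start a shell in that image.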

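Common issues

Neovim plugin error: gsplit_plain nil

Cause: the Neovim version inside the container differs from the local one, so the plugin API is incompatible.
Fix: pin the same Neovim version in the Dockerfile as the one installed locally.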

# Check the local version
nvim --version | head -1
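The version comparison can also be automated. The sketch below assumes the first line of `nvim --version` looks like `NVIM v0.10.4` and that the Dockerfile downloads Neovim from a GitHub release URL as in the template; both function names are hypothetical.

```shell
# Extract the version tag (for example v0.10.4) from the first line
# of `nvim --version`, which looks like "NVIM v0.10.4".
nvim_version_tag() {
    printf '%s\n' "$1" | awk 'NR == 1 { print $2 }'
}

# Succeed only if the Dockerfile downloads exactly that release tag.
# The trailing slash keeps v0.10 from matching v0.10.4.
dockerfile_pins_version() {
    local dockerfile="$1"
    local tag="$2"
    grep -F -q "neovim/releases/download/$tag/" "$dockerfile"
}

# Example usage (needs a local nvim and a Dockerfile in the current directory):
#   local_tag="$(nvim_version_tag "$(nvim --version | head -1)")"
#   dockerfile_pins_version ./Dockerfile "$local_tag" \
#       || echo "Dockerfile does not pin $local_tag" >&2
```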

Mason LSP not working

Fix: make sure the mason bin directory is in PATH

ENV PATH="${PATH}:/root/.local/share/nvim/mason/bin"
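To confirm the setting took effect inside a running container, the PATH entries can be checked directly. This is a minimal sketch; `path_contains` is a hypothetical helper, not something provided by mason or FuYao.

```shell
# Return 0 if directory $1 is one of the colon-separated entries in $PATH.
# Wrapping both sides in colons avoids matching a mere prefix or suffix.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running `path_contains /root/.local/share/nvim/mason/bin || echo "mason bin is not on PATH"` in the container shell reports whether the directory is visible to Neovim's child processes.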

Build timeout

  • Use --site=fuyao_hk to reach GitHub
  • Or download the files in advance and use COPY instead of curl
  • Use a pip mirror hosted in mainland China

References

  • FuYao Docker documentation
  • Docker Proxy configuration
#Docker #FuYao #Neovim
ConceptFusion
Posted 2025-02-17 | Updated 2026-03-01 | Review | 2 minutes read (About 297 words)

The formula for aggregating the features from the different frames $X_t$ into the feature points of the map M:

#Research-paper #CV #Multi-modal #Reconstruct #3D-Scene #Semantic #CLIP
Grounding-DINO
Posted 2025-02-16 | Updated 2026-03-01 | Review | a minute read (About 216 words)

#Research-paper #CV #Transformer #Object-Detection #Open-Vocabulary #Contrastive-Learning #MultiModal #DINO #Image-Grounding
© 2026 Chen Yulin  Powered by Hexo & Icarus
