
One-Shot Visual Imitation Learning via Meta-Learning
A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges
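Imitation learning (IL) is an alternative to traditional manual programming for giving robots autonomous capabilities.
IL lets a machine learn the desired behavior from demonstrations (a human expert demonstrating the behavior), which removes the need for explicit programming or a task-specific reward function.
IL has two main categories, behavioral cloning (BC) and inverse reinforcement learning (IRL):
BC is an IL technique that treats learning a behavior as a supervised learning task. BC trains a model to replicate the expert by building a mapping from environment states to the corresponding expert actions. The expert's behavior is recorded as a set of state-action pairs, also called demonstrations. During training, the model uses these demonstrations to learn a function that maps the current state to the corresponding expert action. After training, the model uses this learned function to generate actions for the new states it encounters.
BC needs no knowledge of the environment's underlying dynamics, is computationally efficient, and is relatively simple.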
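The supervised-learning view of BC described above can be sketched in a few lines. This is a minimal illustration, not any particular paper's method: the linear policy, the ridge term, and the synthetic "expert" (`expert_W`, bias `0.1`) are all assumptions made up for the example.

```python
import numpy as np

def fit_bc_policy(states, actions, ridge=1e-6):
    """Fit a linear policy a = [s, 1] @ W by ridge-regularized least squares."""
    # Append a constant column so the policy can learn a bias term.
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ actions)

def act(W, state):
    """Map a single state to an action with the learned policy."""
    return np.append(state, 1.0) @ W

# Hypothetical demonstrations: state-action pairs from a made-up linear expert.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 3))
expert_W = np.array([[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]])
actions = states @ expert_W + 0.1

W = fit_bc_policy(states, actions)
predicted = act(W, np.array([0.2, -0.4, 1.0]))
print(predicted)  # close to the expert's action for this state
```

Nothing here uses the environment dynamics, which is why BC is cheap; it also means the policy has no signal about states that never appear in `states`.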
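The covariate shift problem: the state distribution observed at test time may differ from the one observed during training, so the agent is error-prone in unseen states and has no clear guidance on how to act. The issue with BC's supervised approach is that once the agent drifts into out-of-distribution states, it does not know how to return to the demonstrated states.
To address this problem:
IRL involves an apprentice agent whose task is to infer the reward function behind the observed demonstrations, which are assumed to come from an optimally performing expert. The inferred reward function is then used to train the learner's policy with RL.
To resolve the ambiguity of the policy -> reward function mapping, there are three kinds of IRL: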
The agent strives to deceive the discriminator by generating trajectories closely resembling those of the expert.
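Learning uses only image sequences and needs no concrete joint-action data.
Unlike the traditional methods, IfO presents a more organic approach to learning from experts, mirroring how humans and animals approach imitation. Humans often learn new behaviors by observing others without detailed knowledge of their actions (e.g., the muscle commands). People learn a diverse range of tasks, from weaving to swimming to playing games, by watching online videos. Despite differences in body shapes, sensory inputs, and timing, humans exhibit an impressive ability to apply knowledge gained from the online demonstrations.
This extends the available learning material to online video.
By analyzing the observed dynamics, LAPO infers the underlying structure of the action space, which supports training latent-action policies. These policies can then be fine-tuned efficiently to reach expert-level performance, so the method adapts to both offline and online settings. Offline fine-tuning is feasible with a small dataset of labeled actions, while online fine-tuning can be done with rewards. Instead of relying on labeled data to train an inverse dynamics model, LAPO derives latent action information directly from the observed environment dynamics, without any labels.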
。。。
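I glanced at last year's summary, and things feel so different now.
A few days ago I started a thread on 水源: "Sum up your 2024 state of mind in one line of poetry =w=." It got plenty of replies: some cursed the workhorse life, some sighed that the road to Shu is hard, and some said their light boat had already passed ten thousand mountains (openly jealous).
I also thought about which line I would pick. I think it could be "长夜无眠泪已尽,丽夏有期诗长存" (sleepless through the long night until the tears ran dry; the lovely summer has its term, but the poems endure).
Looking back many years from now, 2024 may turn out to be the turning point of my life. The cliché says a man only needs one breakup to grow up. I think that is true.
Honestly, I can barely remember daily life before May, apart from the MCM, the debate tournament, two arguments, and the breakup. At that intensity, the relationship crammed everything I wanted and did not want, everything I longed for and feared, into a me who was unarmed yet believed sincerity could overcome anything.
The outcome was predictable. I was baffled that people cannot understand each other, and worn out from untangling one misunderstanding after another that had already become dead knots. So when the request to break up finally came, I could only hold on to a last shred of dignity and say the most powerless line: "I respect your choice. You know I won't choose to get back together."
My first relationship left these regrets: being with me did not make the other person better, and I never truly understood who they were at the core, so a sense of security was never established.
For a long time after the breakup, I kept looking for ways to rescue myself.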
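First, I tried reading a lot. I started with psychology and read volumes one to three of The Road Less Traveled (《少有人走的路》). From its first chapter I got a definition of love: "wishing that a person goes further on the road to mental maturity." I don't think I ever felt that from the other person. After that I read Strait Is the Gate (《窄门》) and caught a glimpse of myself in it: someone who longs for the Platonic yet ends up at odds with the world, who refuses to compromise yet ends up losing every worldly happiness, or rather, the person I would become. Then came The Agony of Eros (《爱欲之死》), which scolded me, and it felt great. Gradually I could see my own heart clearly, and the things I had been through as well. Getting over a heartbreak does not mean falling for the next person; it means seeing clearly the self that was trapped by love.
I decided to become far better than before and to make some real friends, ideally a few of them of the opposite sex. I also hope to develop a calmer attitude, since I push too hard on too many things. And I hope to sort out my inner life more clearly and cut down on the self-inflicted mental drain.
So, at the end of 2024, did I make it after all?
Although I know other people may really read this, I won't avoid talking about my ex, because trading my yesterday is to wish my life away.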
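Here I want to talk about two very good friends I made in the second half of 2024.
We are both INFJs and really are very alike; in the words of a senior in my lab, man in the mirror. If we want to, we can talk forever, from the afternoon until dinner, and from dinner until we head back to the dorm at ten. We both know each other's boundaries very well, so we can roam freely across many, many topics inside the safe zone. We both love topics about the mind, though he seems a bit too obsessed with Jung's theories. Sometimes I fire back: I feel that most of the time you make friends just to validate that theory.
She was a classmate in my senior year at 上中, though we barely crossed paths back then; we actually got to know each other on Weibo in the second half of this year. I am captivated by her writing, as if my own inner voice had been distilled into words through her pen. So I gave her access to some of my private notes. We often talk about our pasts, the knots we once could not untie, and what shaped us, and we give each other emotional support. I am grateful she appeared at this point in time; she really helped me a lot and shaped most of who I am now.
The "lovely summer" refers to these two, along with the other new friends I met and the other warmth I received. Perhaps one day we too will go our separate ways, from chatting every day to once a week, then once a month, but I hope to stretch out our time together as far as I can. Warmth once felt can stay for a very long time, and so can words.
We will slowly forget, as if the past had never happened.
In moments like these, the words of the past take us back to those dust-covered ruins and tell the stories of that time.
So I need to thank the rambling notes I kept on and off this year, for letting me know these moments existed.
Some of my views may have been updated since, but looking at my past lines of thought, I know where I came from. Thank you. If I can, I will keep writing in the new year.
In fact a fairly large share of the later notes was posted on Weibo; some of them are collected here.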
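Since I read The Burnout Society (《倦怠社会》) quite a while ago, I reviewed its outline before the book club.
The Other and discipline have not disappeared from today's achievement society. In the achievement society, the Other/discipline takes the form of "being able to do/become blah blah", and what decides the "blah blah" part is disciplinary power. We often define who we want to become through the Other; if we cannot get there, or find that after every effort, burned out, we still have not reached that Other, we slide almost inevitably into self-aggression, that is, depression. In today's internet society, influencers and successful people put on what looks like a very real dream for us, so the audience feels: this is a living person in real life, they did it, and I want to be like them. In this very common situation, the self framed in terms of achievement has already departed from the true self.
I thought again of the sacred time that Byung-Chul Han mentions at the end of the book; perhaps a book club like this is that kind of sacred time for me. Some time carries little meaning for me personally: it may go to producing academic value for a company or to accumulating wealth, and during such time I am in something like a "racing standstill". Racing, because producing value requires my labor; standstill, because in that state I cannot feel time or any space of meaning. We need small gatherings like this, where coursework, jobs, or the urge to show off are no longer the reason we pick up a book. Let the book be a book, let the confusion or emotion felt after reading be what this small gathering exhibits, and let art return to life; then our lives are sacred.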
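Part of the way from the metro station to home is the road I took to and from middle school; because I was listening to music, I didn't want to cycle on the road. So I walked with earphones in, playing bass covers of βios and 春日影, backpack on, strolling down this street that never seems to change.
I often joke that I went to a lousy middle school, but being able to slip back ten years every now and then during a commute is surely a luxury.
That little hobby is like a blade of grass: it was never destined to bloom, never got much nourishment, and was always squeezed for room by other things. That it is still stubbornly alive is already good enough.
My three years as a modder have come to an end, seen through from start to finish. I was lucky to be part of such a pure subculture, to witness its former glory and its gradual decline, and to try to turn the tide, though in the end everyone inevitably dispersed. Is anything in this world truly eternal? Perhaps only passion stays faithful until death.
What needs to be cried over has to be cried over, just as the goodbyes that need saying must be said.
So long, and thanks for all the fish.
"In real life, some people habitually vent their stress on others through hot or cold violence, including but not limited to words, fists, and writing. Rock musicians denounce this, resist the injustice they suffer, draw their blade against a stronger, irresistible fate, and turn their own stress into music that encourages and lifts people up."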
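Rewatching The Legend of 1900 (海上钢琴师) left me with some different thoughts:
Memory is a finite world, while the future is so vast that no end is in sight.
So finite memories give birth to infinite music, while the endless future only collapses gradually into a single path. Over time, that single path will give birth to infinite music too.
Choose your course well, and protect your ship.
"Everyone is the master of their own life," and this has nothing to do with gender. Slacking off, giving more than you get back, being a house husband: if it makes us feel life can go on this happily and lets us find our own meaning, then it is a good thing.
By the way: without rock, this world really could not last a single day!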
Along the way, the old blame has never let me go for a single day.
But it happens to be just a nightmare that can disappear if I pray.
Therefore, I wish I made everyday count and let the phantom go away.
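For me, imaginary friends really are everywhere, just as loneliness always follows like a shadow.
When a voice drifts out of an mp3, I often imagine it coming from a concrete face; on a solo city walk, I imagine footsteps keeping pace beside me, looking, hurriedly and slowly at once, at scenery we are both seeing for the first time; in the large, quiet office, I bring them a cup of hot chocolate, then I write code while they lie nearby with their legs crossed, reading, and maybe sometimes feel like looking after the pothos on the bookshelf for me.
They are always the same "they", but their appearance often changes. They turn into the people who shaped me and whom I had no choice but to part from. That is exactly why they let me bear those partings. They are always reborn from a farewell and let me see the future.
I am always lonely in a crowd, yet when alone I am with everyone.
I'm in tears. How can someone like this exist.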
I have been called "a transparent person". I think if I cannot be transparent to others, I probably cannot be transparent to myself either.
Stay sensitive, stay hurt, stay pure.
2024 really was that kind of emotional year, but I don't dislike it. Happy 2025!
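https://validator.w3.org/ : an online tool provided by the World Wide Web Consortium (W3C) that checks whether a page's HTML, XHTML, or other markup conforms to the relevant standards and specifications. It helps developers improve the quality and compatibility of their pages so that they display correctly across browsers and devices.
As an intern you have to build websites under the following requirements (not important):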
During the next few hours you are explained in details all the requirements you have to match. Among
the most emphasized points you learn that you must (i) use the last version of Microsoft Front Page
Express to write the websites; (ii) include as many buttons as possible (even if one is enough); (iii) when a
hidden box is expanded, do so as high as possible above the button which opened it and do not notify the
user; (iv) as much as possible do not disable or hide irrelevant information, simply include it the middle
of the useful content; (v) use and abuse pop ups; (vi) feel free to include Chinese text in the middle of
an English text; (vii) if a page includes videos, ensure they are all fully downloaded before the user can
do anything; You also learn that they pay much attention to the quality of their product, and as such
you should never forget to test your website, IE 6 being recommended.
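Probably 马牛's trademark sarcasm... In short, it is a list of things you should never do in web development.
Because of these requirements (in particular, the web development environment only runs on Windows), we, as an intern with a single dual-boot computer, hit an obstacle:
Every time we develop a webpage we have to switch to Windows and pause the tasks running on Linux. But there are tasks on Linux that we need to run and finish as early as possible.
马牛 seems to have split the Linux work into T independent tasks; judging from the output, each task can also be paused so that we can switch to Windows for development.
We can remove some webpage development tasks from the schedule by handing them over to a friend, at most two webpages. Each webpage has an associated amount of homework used for the trade, and the total homework must not exceed the threshold that triggers HC.
Note that the time a computational task needs, $t_i$, and the deadline of a webpage, $t_i$, are different things; it is advisable to label them distinctly.
Information about the computational tasks, the webpage development, and the outside help:
P H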
To solve this problem, you need to combine optimization, scheduling, and greedy strategies. Here’s a step-by-step guide:
For each computational task, output the minimum completion time determined in the previous step.
Overall complexity: $O((P + T) \log (P + T))$.
This complexity is acceptable for the given input range.
Here’s a high-level pseudocode that can be adapted into OCaml:
(* Input Parsing *)
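The full input format and the delegation rule are not reproduced in these notes, so the following is only a sketch of one scheduling ingredient, written in Python rather than OCaml. It assumes, hypothetically, that each webpage is a `(deadline, duration)` pair, that all work can start at time 0, and that nothing is delegated. Webpage work is pushed as late as its deadlines allow, which leaves the most free time early on for a preemptible computational task.

```python
def latest_busy_intervals(pages):
    """Schedule deadline jobs as late as possible on one machine.

    pages: list of (deadline, duration) pairs (assumed format).
    Returns non-overlapping (start, end) busy intervals sorted by start.
    """
    intervals = []
    limit = float("inf")  # the next job must finish before this time
    for deadline, duration in sorted(pages, reverse=True):
        end = min(deadline, limit)
        start = end - duration
        if start < 0:
            raise ValueError("webpage deadlines cannot all be met")
        intervals.append((start, end))
        limit = start
    intervals.reverse()
    return intervals

def completion_time(compute_time, intervals):
    """Finish time of a preemptible task that only runs in the free gaps."""
    now, remaining = 0, compute_time
    for start, end in intervals:
        free = start - now
        if free >= remaining:
            return now + remaining
        remaining -= free  # use the whole gap, then wait out the busy block
        now = end
    return now + remaining

busy = latest_busy_intervals([(10, 3), (6, 2)])
print(busy)                      # [(4, 6), (7, 10)]
print(completion_time(5, busy))  # 7
```

Sorting dominates the cost, which is in line with the stated $O((P + T) \log (P + T))$ bound, but whether the original solution works this way is not confirmed by the notes.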
The algorithm is designed to solve the problem of scheduling computational tasks on a single computer while meeting constraints related to webpage deadlines and delegation. Below is a formal proof of its correctness.
The problem involves:
To ensure correctness, we need to prove:
The algorithm is correct as it:
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that “brute force” search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers’ initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge—knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked—they tried to put that knowledge in their systems—but it proved ultimately counterproductive, and a colossal waste of researcher’s time, when, through Moore’s law, massive computation became available and a means was found to put it to good use.
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
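nvidia-smi reports the newest CUDA version that the driver can support.
The system-wide CUDA version can be anything; torch prefers the CUDA version installed in the virtual environment.
Install a specific version of cuda-toolkit:
conda install nvidia/label/cuda-12.4.0::cuda-toolkit -c nvidia/label/cuda-12.4.0
Install the latest version:
conda install cuda-toolkit
Some repositories need the CUDA path to be set before their packages can be compiled:
conda env config vars set LD_LIBRARY_PATH="/home/cyl/miniconda3/envs/gsam/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/:$LD_LIBRARY_PATH"
Note: after the library path is changed, the LSP in nvim reports errors, so it is advisable to change it back afterwards:
conda env config vars set LD_LIBRARY_PATH=""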
Note: To find the correct path for CUDA_HOME use which nvcc. In my case, output of the command was:
>>> which nvcc
Therefore, I set the CUDA_HOME as /home/user/miniconda3/envs/py12/.
Note: To find the correct path for LD_LIBRARY_PATH use find ~ -name cuda_runtime_api.h. In my case, output of the command was:
>>> find ~ -name cuda_runtime_api.h
So I set the LD_LIBRARY_PATH as /home/user/miniconda3/envs/py12/targets/x86_64-linux/lib/ and CPATH as /home/user/miniconda3/envs/py12/targets/x86_64-linux/include/. If you have multiple CUDA installations, the output of find ~ -name cuda_runtime_api.h will display multiple paths. Make sure to choose the path that corresponds to the environment you have created.
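The path bookkeeping in the two notes above can be written down as a small helper. This is a sketch under assumptions: `cuda_env_from_paths` is a made-up name, the `nvcc` location `.../py12/bin/nvcc` is inferred from the stated `CUDA_HOME` rather than taken from the command output, and the `lib` directory is assumed to sit next to `include`, as in the paths quoted above.

```python
from pathlib import PurePosixPath

def cuda_env_from_paths(nvcc_path, header_path):
    """Derive CUDA-related variables from the outputs of
    `which nvcc` and `find ~ -name cuda_runtime_api.h`."""
    nvcc = PurePosixPath(nvcc_path)
    include_dir = PurePosixPath(header_path).parent
    return {
        # nvcc lives in <env>/bin, so the environment root is two levels up.
        "CUDA_HOME": str(nvcc.parent.parent),
        # Directory that contains cuda_runtime_api.h.
        "CPATH": str(include_dir),
        # Sibling lib directory of the include directory (assumed layout).
        "LD_LIBRARY_PATH": str(include_dir.parent / "lib"),
    }

env = cuda_env_from_paths(
    "/home/user/miniconda3/envs/py12/bin/nvcc",  # assumed nvcc location
    "/home/user/miniconda3/envs/py12/targets/x86_64-linux/include/cuda_runtime_api.h",
)
for name, value in env.items():
    print(f"{name}={value}")
```

When `find` returns several headers, pass the one that belongs to the active environment, as the note above says.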
ref: https://github.com/IDEA-Research/GroundingDINO/issues/355
Note: Always reboot the computer after CUDA is upgraded.
Note: changing LD_LIBRARY_PATH may stop pyright from running in neovim, so it is advisable to reset the variable once compilation is done:
conda env config vars set LD_LIBRARY_PATH=""
cudatoolkit and cuda-toolkit can be installed side by side.
Without cudatoolkit, compilation may fail with: ld: cannot find -lcudart: No such file or directory collect2: error: ld returned 1 exit status
Use the following command to get version information:
python -c 'import torch;print(torch.__version__);print(torch.version.cuda)'
2.0.0+cu117
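A version string such as `2.0.0+cu117` carries the CUDA build in its local suffix. The helper below is a small sketch for reading it; `split_torch_version` is a made-up name, and the rule "last digit is the minor version" is an assumption that matches tags like `cu117` and `cu121` but is not an official parsing API.

```python
def split_torch_version(version):
    """Split '2.0.0+cu117' into ('2.0.0', '11.7').

    Returns None for the CUDA part when the suffix is missing or is not
    of the form 'cu<digits>' (for example a CPU-only build).
    """
    base, _, local = version.partition("+")
    digits = local[2:]
    if local.startswith("cu") and digits.isdigit() and len(digits) >= 2:
        # Assumed convention: all but the last digit form the major version.
        return base, f"{digits[:-1]}.{digits[-1]}"
    return base, None

print(split_torch_version("2.0.0+cu117"))
```

Comparing this value with what `nvidia-smi` reports is a quick way to check that the driver supports the CUDA build that torch was compiled against.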
Use SSH to Connect to JupyterLab
Use ssh as the remote command-line tool, start JupyterLab on the remote machine, and open it in the local browser.
Run on the remote machine:
jupyter lab --no-browser --port=8080
--no-browser is very important
output:
...
Run locally:
ssh -L 8080:localhost:8080 bohanfeng@192.168.2.102
Open in the local browser:
http://127.0.0.1:8080/lab?token=0061d1eb31396b1bc3cd77a7161b2084da1dedcdeca0600c
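The three pieces above (remote port, tunnel, local URL) have to agree on the port numbers. The sketch below only builds the strings and does not open any connection; `tunnel_command` and `local_url` are made-up helper names, and `<token>` stands for whatever token JupyterLab prints on the remote side.

```python
import shlex

def tunnel_command(user, host, remote_port=8080, local_port=8080):
    """Build the ssh command that forwards local_port to the remote JupyterLab."""
    forward = f"{local_port}:localhost:{remote_port}"
    return shlex.join(["ssh", "-L", forward, f"{user}@{host}"])

def local_url(token, local_port=8080):
    """URL to open in the local browser once the tunnel is up."""
    return f"http://127.0.0.1:{local_port}/lab?token={token}"

print(tunnel_command("bohanfeng", "192.168.2.102"))
print(local_url("<token>", local_port=8080))
```

If port 8080 is already taken locally, only `local_port` changes: the browser URL follows the local side of `-L`, while the remote `--port` stays the same.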
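ACDC: Automated Creation of Digital Cousins for Robust Policy Learning
A digital twin (DT) is a very precise mapping of the real world and can be used for high-fidelity training, but producing DT assets is too laborious and does not generalize, so zero-shot transfer is out of reach.
A digital cousin (DC) is chosen by comparing model features: similar cousin models are selected from an asset library and used to reconstruct the scene for training a robot arm, so that the arm generalizes to scenes it sees for the first time.
(a) It reduces the need for manual fine-tuning to guarantee a given fidelity, so digital cousins can be created fully automatically; (b) it provides a set of augmented scenes for training robot policies, which helps cope with variations in the original scene.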
ACDC is our automated pipeline for generating fully interactive simulated scenes from a single RGB image, and is broken down into three steps:
(1) an extraction step, in which relevant object masks are extracted from the raw input image;
(2) a matching step, in which we select digital cousins for individual objects extracted from the original scene;
(3) a generation step, in which the selected digital cousins are post-processed and compiled together to form a fully-interactive, physically-plausible digital cousin scene.
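The matching step can be pictured as a nearest-neighbor lookup over feature vectors. The sketch below is only an illustration of that idea: cosine similarity, the tiny hand-written vectors, and the name `select_cousins` are assumptions, and the notes do not say which feature extractor or distance ACDC actually uses.

```python
import numpy as np

def select_cousins(query, library, k=3):
    """Return indices and scores of the k library assets most similar to query.

    query:   (d,) feature vector of the object cropped from the input image.
    library: (n, d) feature vectors of candidate assets.
    """
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q                   # cosine similarity to every asset
    order = np.argsort(-sims)[:k]    # best matches first
    return order.tolist(), sims[order].tolist()

# Hypothetical 2-D features for four assets and one extracted object.
library = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.1])
indices, scores = select_cousins(query, library, k=2)
print(indices)  # [0, 2]
```

Keeping several matches per object, not just the single closest one, is what yields a set of varied cousin scenes for training.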