basic

Reference

Deep-Reinforcement-Learning-With-Python

Types


Supervised learning

In supervised learning, the machine learns from training data. The training data consists of labeled pairs of inputs and outputs. We train the model (agent) on this data in such a way that it can generalize its learning to new, unseen data. It is called supervised learning because the labeled training data acts as a supervisor, guiding the model in learning the given task.

Regression

Quantitative response
predict a quantitative variable from a set of features

Classification

Categorical response
predict a categorical variable
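
A minimal sketch of these two supervised settings, assuming scikit-learn as the library (the notes do not prescribe one) and toy data made up for illustration:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Training data: labeled pairs of inputs X and outputs y (the "supervisor").
X = [[1.0], [2.0], [3.0], [4.0]]

# Regression: the response is quantitative (a real number).
y_quantitative = [1.1, 2.0, 2.9, 4.2]
reg = LinearRegression().fit(X, y_quantitative)
print(reg.predict([[5.0]]))      # a real-valued prediction

# Classification: the response is categorical (a class label).
y_categorical = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y_categorical)
print(clf.predict([[5.0]]))      # a predicted class label
```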


Unsupervised learning

Similar to supervised learning, in unsupervised learning, we train the model (agent) based on the training data. But in the case of unsupervised learning, the training data does not contain any labels; that is, it consists of only inputs and not outputs. The goal of unsupervised learning is to determine hidden patterns in the input. There is a common misconception that RL is a kind of unsupervised learning, but it is not. In unsupervised learning, the model learns the hidden structure, whereas, in RL, the model learns by maximizing the reward.
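
As a concrete illustration of "inputs only, no labels", a minimal clustering sketch; the choice of scikit-learn's KMeans (and the toy data) is an assumption, just one way of uncovering hidden structure:

```python
from sklearn.cluster import KMeans

# Inputs only -- the training data has no output labels.
X = [[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]]

# The model discovers a hidden grouping (two clusters) from the inputs alone.
kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
print(kmeans.labels_)    # e.g. [0 0 0 1 1 1]
```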


Reinforcement learning

Action space

The set of all possible actions in the environment is called the action space. For example, in a grid world environment where the agent can move up, down, left, or right, the action space is [up, down, left, right]. We can categorize action spaces into two types:

  • Discrete action space
    When the action space consists of discrete actions, it is called a discrete action space. For instance, in the grid world environment the action space consists of four discrete actions: up, down, left, and right.
  • Continuous action space
    When the action space consists of continuous-valued actions, it is called a continuous action space. For instance, if we train an agent to drive a car, the actions take continuous values, such as the speed at which to drive the car, the number of degrees to rotate the steering wheel, and so on (see the code sketch after this list).
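
A minimal sketch of both kinds of action space, assuming Gymnasium's space objects (any RL toolkit exposes similar abstractions; the bounds below are made up):

```python
import numpy as np
from gymnasium.spaces import Discrete, Box

# Discrete action space: four actions, e.g. up/down/left/right in a grid world.
grid_actions = Discrete(4)
print(grid_actions.sample())          # an integer in {0, 1, 2, 3}

# Continuous action space: e.g. steering angle (degrees) and speed (km/h),
# each a real number within some bounds.
car_actions = Box(low=np.array([-30.0, 0.0]),
                  high=np.array([30.0, 120.0]),
                  dtype=np.float32)
print(car_actions.sample())           # a real-valued vector
```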

Policy

A policy defines the agent’s behavior in an environment. The policy tells the agent what action to perform in each state.
Over a series of iterations, the agent learns a good policy, that is, one that yields a high reward.
The optimal policy tells the agent to perform the correct action in each state so that the agent receives the maximum reward.

  • Deterministic policy
    A deterministic policy tells the agent to perform one particular action in a state. Thus, a deterministic policy maps each state to one particular action.

  • Stochastic policy
    A stochastic policy maps each state to a probability distribution over the action space.

    • Categorical policy
      Used when the action space is discrete.
      The stochastic policy uses a categorical probability distribution over the action space to select actions.
    • Gaussian policy
      Used when the action space is continuous.
      The stochastic policy uses a Gaussian probability distribution over the action space to select actions (both are sketched in code after this list).
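
A minimal sketch of these policy types in plain NumPy; the states, actions, probabilities, and Gaussian parameters are made up for illustration:

```python
import numpy as np

# Deterministic policy: maps each state to one particular action.
deterministic_policy = {"s0": "up", "s1": "right"}
action = deterministic_policy["s0"]          # always "up" in state s0

# Categorical (stochastic) policy: a probability distribution over a
# discrete action space; the probabilities would depend on the state.
actions = ["up", "down", "left", "right"]
probs = [0.7, 0.1, 0.1, 0.1]
action = np.random.choice(actions, p=probs)

# Gaussian (stochastic) policy: for a continuous action space, sample the
# action from a Gaussian whose mean and standard deviation depend on the state.
mean, std = 0.5, 0.1                         # e.g. a steering angle
action = np.random.normal(mean, std)
```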

Episode

The agent interacts with the environment by performing actions, starting from the initial state and reaching the final state. This agent-environment interaction from the initial state to the final state is called an episode. For instance, in a car racing video game, the agent plays the game starting from the initial state (the starting point of the race) and reaching the final state (the endpoint of the race); this is one episode. An episode is also often called a trajectory (the path taken by the agent).
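
A minimal sketch of one episode as an interaction loop, assuming Gymnasium and its CartPole environment as a stand-in (the example above is a car racing game), with a random policy:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
state, info = env.reset()                 # initial state
done = False
while not done:
    action = env.action_space.sample()    # random policy, for illustration
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated        # final (terminal) state reached
env.close()                               # one complete episode (trajectory)
```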

  • Episodic task
    As the name suggests, an episodic task is one that has a terminal state. That is, episodic tasks are made up of episodes and thus have a terminal state. Example: a car racing game.
  • Continuous task
    Unlike episodic tasks, continuous tasks do not contain episodes, so they have no terminal state. For example, a personal assistance robot does not have a terminal state.

Horizon

The horizon is the time step up to which the agent interacts with the environment. We can classify the horizon into two types:

  • Finite horizon
    If the agent-environment interaction stops at a particular time step, it is called a finite horizon. For instance, in episodic tasks the agent interacts with the environment starting from the initial state at time step t = 0 and reaches the final state at time step T. Since the interaction stops at time step T, it is considered a finite horizon.
  • Infinite horizon
    If the agent-environment interaction never stops, it is called an infinite horizon. For instance, a continuous task has no terminal state, so the agent-environment interaction never stops, and it is considered an infinite horizon.

Return

Return is the sum of rewards received by the agent in an episode.
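
Written out for an episodic task, with one common indexing convention, the return of a trajectory $\tau$ is

$$R(\tau) = r_1 + r_2 + \dots + r_T = \sum_{t=1}^{T} r_t$$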

Value function

The value function, or the value of a state, is the expected return that the agent would obtain starting from state $s$ and following the policy $\pi$.
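
In symbols, this definition reads

$$V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\left[\, R(\tau) \mid s_0 = s \,\right]$$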

Q function

The Q function, or state-action value function, gives the expected return that the agent would obtain starting from state $s$, performing action $a$, and then following the policy $\pi$.
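
In symbols,

$$Q^{\pi}(s, a) = \mathbb{E}_{\tau \sim \pi}\left[\, R(\tau) \mid s_0 = s,\ a_0 = a \,\right]$$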