Cross-Domain Few-shot Learning: A Study Index

Yanwei Liu
6 min read · Nov 11, 2020


Closed-set classification (the closed-set problem): every class in both training and testing has a concrete label, and no unknown (unseen) categories appear. Well-known datasets such as MNIST and ImageNet are of this kind, where every contained class is fixed. Take MNIST (digit classification) as an example: it contains the digit classes 0–9, testing also uses only the classes 0–9, and no unknown classes such as the letters A–Z ever appear. The closed-set problem is: distinguish these 10 classes.

Open-set classification (the open-set problem): besides the digit classes 0–9, other unknown classes such as A–Z may also appear, but these unknown classes carry no labels, so the classifier cannot know the specific class of an image from an unknown category (e.g., whether it is an A). All of these many different classes together form a single class, the unknown class; in detection this is called the background class. The open-set problem is: distinguish the 10 known classes and reject all other unknown classes.
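
As a concrete illustration (my own sketch, not part of the original post), the simplest common baseline for the rejection step in open-set classification is to threshold the classifier's maximum softmax probability; the threshold of 0.5 and the random logits below are assumptions for demonstration only:

```python
import numpy as np

def predict_open_set(logits, threshold=0.5):
    """Predict among the known classes, but reject low-confidence
    inputs as 'unknown' (open-set rejection by softmax thresholding)."""
    # Softmax over the known-class logits (numerically stable form)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    # Reject samples whose top probability falls below the threshold
    preds[probs.max(axis=1) < threshold] = -1  # -1 = unknown / background
    return preds

# Example: 3 samples, 10 known classes (the digits 0-9)
logits = np.random.randn(3, 10)
print(predict_open_set(logits))
```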

[Recommended first read][Done] few-shot learning, zero-shot learning, one-shot learning, any-shot learning, C-way K-shot, meta-learning
https://blog.csdn.net/Strive_For_Future/article/details/108940405

As the title suggests, this article gives a very accessible introduction to these technical terms and works well as an entry point for researchers new to the field. Once these concepts are clear, later reading is much less likely to be confused by similar-sounding terms.

My own understanding of these terms:

few-shot learning: learning from only a few samples.
zero-shot learning: learning without any samples of the target classes.
one-shot learning: learning from a single sample.
any-shot learning: an umbrella term covering few-shot and zero-shot learning; it can also mean learning with an arbitrary number of samples.
C-way K-shot: C classes with K samples each; the standard episode notation in few-shot learning.
meta learning: learning how to learn. It can be divided into two stages:

1. meta-training (support set): the model first learns "how to classify images" on, say, ImageNet.
2. meta-testing (query set): the knowledge acquired during meta-training (how to classify images, i.e., the model weights) is then applied to the training of the current few-shot task.
The concepts of meta-training and meta-testing are quite easy to confuse. An example: I have only one photo of a panda, but many photos of cats, dogs, ducks, birds, and elephants. Training first on the cat/dog/duck/bird/elephant photos is meta-training; once the model has learned how to classify images, training the panda classification model is meta-testing.
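
To make the C-way K-shot notation concrete, here is a minimal sketch of how one training episode is usually sampled (my own illustration; the dict-based dataset layout and the toy data are assumptions):

```python
import random

def sample_episode(dataset, c_way=5, k_shot=1, q_queries=15):
    """Sample one C-way K-shot episode.

    dataset: dict mapping class name -> list of samples (assumed layout).
    Returns a support set (C*K samples) and a query set (C*Q samples).
    """
    classes = random.sample(list(dataset), c_way)  # pick C classes at random
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(dataset[cls], k_shot + q_queries)
        support += [(x, label) for x in picks[:k_shot]]  # K samples per class
        query += [(x, label) for x in picks[k_shot:]]    # Q queries per class
    return support, query

# Toy dataset: 6 classes with 20 samples each
toy = {f"class_{i}": [f"img_{i}_{j}" for j in range(20)] for i in range(6)}
support, query = sample_episode(toy, c_way=5, k_shot=1, q_queries=15)
print(len(support), len(query))  # 5 75
```
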
Zero-shot learning (ZSL) vs. Generalized zero-shot learning (GZSL):
During training, both ZSL and GZSL use the images and attributes of the seen classes.
At test time, ZSL evaluates only on unseen classes, while GZSL evaluates on both seen and unseen classes.

Inductive ZSL vs. Transductive ZSL

Reference: https://www.zhihu.com/question/68275921

Inductive setting: unlabelled test features are not used during training; ordinary supervised learning belongs to this category.
Transductive setting: unlabelled test features (the images only, without labels) are used during training.
The difference between transductive and inductive lies in whether the samples we want to predict have already been seen (used) during training. Transductive methods usually perform better than inductive ones, because inductive methods never get to see the test data.

Evaluation metric for Generalized Zero-shot learning:
harmonic mean accuracy (H), a concept similar to the F1 score; it balances the accuracies on the seen and unseen classes.
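
Written out (a restatement; acc_S and acc_U denote the per-class average accuracies on the seen and unseen test classes, notation slightly simplified from the GBU paper):

```latex
H = \frac{2 \cdot acc_S \cdot acc_U}{acc_S + acc_U}
```

Like the F1 score, H is high only when both accuracies are high, so a model cannot score well by being accurate on the seen classes alone.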

[Done]Zero-Shot Learning — A Comprehensive Evaluation of the Good, the Bad and the Ugly

  • Proposed the harmonic mean of seen- and unseen-class accuracy as a unified performance measure for the GZSL setting
  • Proposed a new zero-shot learning dataset, Animals with Attributes 2 (AWA2)
  • Compared the harmonic-mean (HM) results of several ZSL methods that have public source code

https://arxiv.org/abs/1707.00600

[A community-curated list of few-shot learning papers] Small Data Paper

Standard tasks vs. random tasks in few-shot learning (figure taken from Mutual-Information Based Few-Shot Classification).

How are the .mat files in zero-shot learning datasets built?

========================================
Zero-Shot Learning — A Comprehensive Evaluation of the Good, the Bad and the Ugly
========================================

res101.mat
Our image embeddings are 2048-dim top-layer pooling units of the 101-layered ResNet. We use the original ResNet-101 that is pre-trained on ImageNet with 1K classes, i.e. the balanced subset, and we do not fine-tune it for any of the mentioned datasets.

att_splits.mat
For aPY, AWA1, AWA2, CUB and SUN, we use the per-class attributes between values 0 and 1 that are provided with the datasets.

For ImageNet as attributes of 21K classes are not available, we use Word2Vec [27] trained on Wikipedia provided by [14]. Note that an evaluation of class embeddings is out of the scope of this paper. We refer the reader to [9] for more details on the topic.
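
As a practical answer to the question above, the released benchmark files can be inspected directly with SciPy. A minimal sketch, assuming the file layout of the public GBU release (the key names in the comments are commonly documented but should be treated as assumptions):

```python
from scipy.io import loadmat

# Paths are placeholders; point them at the downloaded benchmark files
res101 = loadmat("res101.mat")
splits = loadmat("att_splits.mat")

print(res101.keys())  # typically: 'features' (2048 x N), 'labels' (N x 1)
print(splits.keys())  # typically: 'att' (attributes x classes), plus the
                      # 'trainval_loc' / 'test_seen_loc' / 'test_unseen_loc' splits

features = res101["features"].T      # one 2048-dim ResNet-101 feature per image
labels = res101["labels"].squeeze()  # integer class label per image
attributes = splits["att"].T         # one attribute vector per class
print(features.shape, labels.shape, attributes.shape)
```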

[27] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in NIPS, 2013.

[14] S. Changpinyo, W.-L. Chao, B. Gong, and F. Sha, "Synthesized classifiers for zero-shot learning," in CVPR, 2016.

========================================
Feature Generating Networks for Zero-Shot Learning (cvpr18xian)
========================================

res101.mat

As real CNN features, we extract 2048-dim top-layer pooling units of the 101-layered ResNet [21] from the entire image. We do not do any image pre-processing such as cropping or use any other data augmentation techniques. ResNet is pre-trained on ImageNet 1K and not fine-tuned. As synthetic CNN features, we generate 2048-dim CNN features using our f-xGAN model.

att_splits.mat

As the class embedding, unless it is stated otherwise, we use per-class attributes for AWA (85-dim), CUB (312-dim) and SUN (102-dim). Furthermore, for CUB and Flowers, we extract 1024-dim character-based CNN-RNN [35] features from fine-grained visual descriptions (10 sentences per image). None of the sentences are seen during training the CNN-RNN. We build per-class sentences by averaging the CNN-RNN features that belong to the same class.

Implementation details. In all f-xGAN models, both the generator and the discriminator are MLPs with LeakyReLU activation. The generator consists of a single hidden layer with 4096 hidden units. Its output layer is ReLU because we aim to learn the top max-pooling units of ResNet-101. While the discriminator of f-GAN has one hidden layer with 1024 hidden units in order to stabilize the GAN training, the discriminators of f-WGAN and f-CLSWGAN have one hidden layer with 4096 hidden units, as WGAN [19] does not have instability issues and thus a stronger discriminator can be applied here. We do not apply batch normalization; our empirical evaluation showed a significant degradation of the accuracy when batch normalization is used. The noise z is drawn from a unit Gaussian with the same dimensionality as the class embedding. We use λ = 10 as suggested in [19] and β = 0.01 across all the datasets.
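
A minimal PyTorch sketch of the generator and discriminator described in the quoted implementation details (the layer sizes and activations follow the text; the feature/attribute dimensions, the LeakyReLU slope of 0.2, and the batch size are my assumptions):

```python
import torch
import torch.nn as nn

FEAT_DIM = 2048  # ResNet-101 top-layer pooling units (per the paper)
ATT_DIM = 85     # e.g. AWA per-class attributes; dataset-dependent assumption

class Generator(nn.Module):
    """G(z, a): noise + class embedding -> synthetic CNN feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # z is drawn with the same dimensionality as the class embedding
            nn.Linear(ATT_DIM + ATT_DIM, 4096),
            nn.LeakyReLU(0.2),
            nn.Linear(4096, FEAT_DIM),
            nn.ReLU(),  # ReLU output: the targets are non-negative pooling units
        )

    def forward(self, z, att):
        return self.net(torch.cat([z, att], dim=1))

class Discriminator(nn.Module):
    """D(x, a): (real or synthetic) feature + class embedding -> critic score."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + ATT_DIM, 4096),  # 4096 units for the WGAN critic
            nn.LeakyReLU(0.2),
            nn.Linear(4096, 1),  # no sigmoid: a WGAN critic outputs a raw score
        )

    def forward(self, x, att):
        return self.net(torch.cat([x, att], dim=1))

# Noise from a unit Gaussian with the class-embedding dimensionality
z = torch.randn(8, ATT_DIM)
att = torch.rand(8, ATT_DIM)
print(Generator()(z, att).shape)  # torch.Size([8, 2048])
```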

========================================
Latent Embeddings for Zero-shot Classification
========================================

res101.mat

Image and class embeddings. In our latent embedding (LatEm) model, the image embeddings (image features) and class embeddings (side information) are two essential components. To facilitate direct comparison with the state-of-the-art, we use the embeddings provided by [2]. Briefly, as image embeddings we use the 1,024-dimensional outputs of the top-layer pooling units of the pre-trained GoogleNet [36] extracted from the whole image. We do not do any task-specific pre-processing on images such as cropping foreground objects.

att_splits.mat

As class embeddings we evaluate four different alternatives, i.e. attributes (att), word2vec (w2v), glove (glo) and hierarchies (hie). Attributes [20, 9] are distinguishing properties of objects that are obtained through human annotation. For fine-grained datasets such as CUB and Dogs, as objects are visually very similar to each other, a large number of attributes are needed. Among the three datasets used, CUB contains 312 attributes, AWA contains 85 attributes while Dogs does not contain annotations for attributes. Our attribute class embedding is a vector per-class measuring the strength of each attribute based on human judgment.
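
For context (my paraphrase of the LatEm paper, not part of the excerpt above): LatEm scores an image embedding θ(x) against a class embedding φ(y) with a piecewise-linear compatibility built from K latent bilinear maps:

```latex
F(x, y) = \max_{1 \le i \le K} \theta(x)^{\top} W_i \, \varphi(y)
```

Taking the maximum over several W_i lets different latent matrices specialize to different visual "modes" of the data, which a single bilinear map cannot capture.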

Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts

M-VAE

https://arxiv.org/abs/2106.14082

[Done] Will Multi-modal Data Improves Few-shot Learning?
https://arxiv.org/abs/2107.11853

[Done]Leveraging the Feature Distribution in Transfer-based Few-Shot Learning(PT+MAP)

https://arxiv.org/abs/2006.03806

https://github.com/yhu01/PT-MAP

[Done]Zero-sample surface defect detection and classification based on semantic feedback neural network
Performs zero-shot learning on cylinder liner defect data.

https://arxiv.org/abs/2106.07959

[Done]Anomaly Detection for Solder Joints Using β-VAE (uses a β-VAE for anomaly detection on solder joints; roughly one percentage point better than a plain VAE)
https://arxiv.org/abs/2104.11927

[Done]Adaptive and Generative Zero-Shot Learning
http://www.csie.ntu.edu.tw/~htlin/paper/doc/iclr21agzsl.pdf

[Done]Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders(CADA-VAE)
https://arxiv.org/abs/1812.01784

[Done]Concept Learners for Few-Shot Learning(COMET)
https://arxiv.org/abs/2007.07375

[Done]From Generalized zero-shot learning to long-tail with class descriptors(DRAGON)
https://arxiv.org/abs/2004.02235

[Done]Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions
http://arxiv.org/abs/2103.09669

[Done]f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning(f-VAEGAN)
https://arxiv.org/abs/1903.10132

[Done]Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification(tf-VAEGAN)
https://arxiv.org/abs/2003.07833
[Introduction in Chinese]
https://blog.csdn.net/weixin_39704651/article/details/108803378

[Done]A Survey of Zero-Shot Learning: Settings, Methods, and Applications
https://www.ntulily.org/wp-content/uploads/journal/A_Survey_of_Zero-Shot_Learning_Settings_Methods_and_Applications_accepted.pdf

[Done]Learning from Few Samples: A Survey
https://arxiv.org/abs/2007.15484

[Done]Generalizing from a Few Examples: A Survey on Few-Shot Learning
https://arxiv.org/abs/1904.05046

[Done][Introductory article][Recommended first read]Zero-Shot Learning
https://cetinsamet.medium.com/zero-shot-learning-53080995d45f
Besides a very accessible introduction to zero-shot learning, this article also includes a code implementation.

[Done][Introductory article]What Is Zero-Shot Learning?
https://analyticsindiamag.com/what-is-zero-shot-learning/

[Done][Paper in Chinese]A decadal survey of zero-shot image classification
https://engine.scichina.com/publisher/scp/journal/SSI/49/10/10.1360/N112018-00312?slug=fulltext

[Done][Paper in Chinese]Research and Development on Zero-Shot Learning
https://engine.scichina.com/publisher/zhongkeqikan/journal/AAS2/46/1/10.16383/j.aas.c180429?slug=fulltext

[Done][Lecture slides]Meta Learning (Part 1)
http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2019/Lecture/Meta1%20(v6).pdf

[Done][Lecture slides]Meta Learning (Part 2)
http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2019/Lecture/Meta2%20(v4).pdf

[Done][Lecture slides]ML 108–2 Domain Adaptation.pdf
https://drive.google.com/file/d/15wlfUtTmnb4cEAHZtNJ9_jJE26nSNhAX/view

[Done][Lecture slides]Meta Learning & More
http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML2020/Meta_learning_and_more.pdf

[Done][Lecture slides]Recurrent Neural Networks & Transformer (I); Meta-Learning; Few-Shot and Zero-Shot Classification (I)
http://vllab.ee.ntu.edu.tw/uploads/1/1/1/6/111696467/dlcv_w11.pdf

[Done][Lecture slides]Meta-Learning; Few-Shot and Zero-Shot Classification (II)
http://vllab.ee.ntu.edu.tw/uploads/1/1/1/6/111696467/dlcv_w12.pdf

[Done][Lecture slides]From Domain Adaptation to Domain Generalization
http://vllab.ee.ntu.edu.tw/uploads/1/1/1/6/111696467/dlcv_w13.pdf

[Done] [2010.03522] A Survey of Deep Meta-Learning
https://arxiv.org/abs/2010.03522

[Done] [1904.04232] A Closer Look at Few-shot Classification
https://arxiv.org/abs/1904.04232

[Done][2101.11461] Machine learning with limited data
https://arxiv.org/abs/2101.11461

[Done][2001.08735] Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation
https://arxiv.org/abs/2001.08735

[Done][Paper notes][meta learning] Cross-domain few-shot classification via learned feature-wise transformation
https://blog.csdn.net/weixin_44919895/article/details/105374274

[Done] A summary of cross-domain problems in few-shot learning — Zhihu
https://zhuanlan.zhihu.com/p/257805250

[Done] A Survey of Cross-Domain Few-shot Learning — Zhihu
https://zhuanlan.zhihu.com/p/277709444

[Done] The evolution of few-shot learning (FSL) methods
https://zhuanlan.zhihu.com/p/149983811

[Done] An introduction to zero-shot learning — Zhihu
https://zhuanlan.zhihu.com/p/34656727

[1710.03463] Learning to Generalize: Meta-Learning for Domain Generalization
https://arxiv.org/abs/1710.03463

[1912.07200] A Broader Study of Cross-Domain Few-Shot Learning
https://arxiv.org/abs/1912.07200

[2004.14164] MICK: A Meta-Learning Framework for Few-shot Relation Classification with Little Training Data
https://arxiv.org/abs/2004.14164

[2005.10544] Cross-Domain Few-Shot Learning with Meta Fine-Tuning
https://arxiv.org/abs/2005.10544

[2006.11384] A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning
https://arxiv.org/abs/2006.11384

[2010.06498] Cross-Domain Few-Shot Learning by Representation Fusion
https://arxiv.org/abs/2010.06498

[2011.00179] Combining Domain-Specific Meta-Learners in the Parameter Space for Cross-Domain Few-Shot Classification
https://arxiv.org/abs/2011.00179

Explain and Improve: Cross-Domain Few-Shot-Learning Using Explanations
http://interpretable-ml.org/icml2020workshop/pdf/22.pdf
