Domain Adaptation Study Notes
Domain adaptation is a special case of transfer learning, specifically a type of transductive transfer learning. It generally seeks to learn a model from labeled source data that generalizes to a target domain by minimizing the difference between the domain distributions.
Inductive transfer learning refers to the situation where the source and target tasks differ, regardless of whether the domains differ. In this setting, the source domain may or may not include annotated data, but a small amount of labeled target-domain data is required for training.
In transductive transfer learning, the tasks remain the same while the domains differ, and labeled data are available only in the source domain. However, part of the unlabeled target-domain data is required at training time to estimate its marginal probability distribution.
Unsupervised transfer learning refers to the scenario where, as in inductive transfer learning, the tasks differ; however, only unlabeled data are available in both the source and target domains.
Unsupervised domain adaptation is the case in which both labeled source data and unlabeled target data are available.
Semi-supervised domain adaptation is the case in which labeled source data plus a small amount of labeled target data are available. If some labeled target examples exist, it may be better to perform semi-supervised adaptation than to use those examples only for hyperparameter tuning of an unsupervised adaptation method.
Supervised domain adaptation is the case in which both labeled source and labeled target data are available.
Domain generalization is the setting in which a model is trained on multiple source domains with labeled data and then tested on a separate target domain that was not seen during training. This contrasts with domain adaptation, where target examples (possibly unlabeled) are available during training.
Multi-source domain adaptation is the setting in which there are multiple source domains but still only one target domain.
Domain adaptation can be divided into four main categories: closed-set, open-set, partial, and universal domain adaptation.
Closed set: the source and target domains share the same classes, while a domain gap still exists between them.
Open set: the related domains share some labels in a common label set, and each may also have private labels.
Partial DA: the source domain can be viewed as a generic domain with an abundant number of classes, while the target label set is only a subset of the source label set with fewer classes.
Universal DA: the source and target domains may share a common label set, and each domain may also have a private label set or outlier classes.
Domain shift can mainly be categorized into three classes: prior shift, covariate shift, and concept shift.
Prior shift: the class prior distributions differ between domains (class imbalance is a typical cause) while the class-conditional distributions stay the same.
Covariate shift: the marginal input distributions differ between domains while the conditional label distributions stay the same; sample selection bias and missing data are two common causes.
Concept shift: also known as data drift, the scenario where the data distributions remain unchanged while the conditional distributions differ between domains.
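In distributional terms, these three shifts can be summarized as follows (a standard formalization, not quoted from the surveys listed below):

```latex
% Prior shift: label priors differ, class-conditionals match
p_S(y) \neq p_T(y), \qquad p_S(x \mid y) = p_T(x \mid y)
% Covariate shift: input marginals differ, conditionals match
p_S(x) \neq p_T(x), \qquad p_S(y \mid x) = p_T(y \mid x)
% Concept shift: marginals match, conditionals differ
p_S(x) = p_T(x), \qquad p_S(y \mid x) \neq p_T(y \mid x)
```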
Maximum mean discrepancy (MMD) [83, 84] is a two-sample statistical test of the hypothesis that two distributions are equal, based on observed samples from each. The test statistic is the difference between the mean values of a smooth (kernel) function evaluated on the two domains' samples; if the means differ, the samples are likely not drawn from the same distribution.
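As a rough sketch (not code from the cited works), a biased estimate of the squared MMD with an RBF kernel can be computed from two sample sets like this; the bandwidth `sigma` is a free parameter:

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of a and b.
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared MMD between the sample sets x and y.
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 5))
tgt = rng.normal(0.5, 1.0, size=(200, 5))
# Same distribution -> value near 0; shifted distribution -> clearly larger.
print(mmd2(src[:100], src[100:]), mmd2(src[:100], tgt[:100]))
```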
Instance-Based Adaptation:
These methods deal with the shift between data distributions by minimizing the target risk based on the labeled source data. A typical solution to the covariate-shift problem is importance weighting: samples in the source domain are re-weighted according to the ratio of the target and source domain densities to compensate for the bias, as sketched below.
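One common way to estimate these density ratios, assuming a probabilistic domain classifier is adequate, is to train a classifier to distinguish source from target samples and convert its probabilities into importance weights. The sketch below uses scikit-learn and illustrative names:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(x_src, x_tgt):
    # Train a domain classifier (0 = source, 1 = target) and turn its
    # probabilities into estimates of w(x) ~ p_T(x) / p_S(x) for source samples.
    x = np.vstack([x_src, x_tgt])
    d = np.concatenate([np.zeros(len(x_src)), np.ones(len(x_tgt))])
    clf = LogisticRegression(max_iter=1000).fit(x, d)
    p_tgt = np.clip(clf.predict_proba(x_src)[:, 1], 1e-6, 1 - 1e-6)
    # The odds ratio is proportional to the density ratio, up to the sample-size factor.
    return (p_tgt / (1.0 - p_tgt)) * (len(x_src) / len(x_tgt))

# The weights can then be passed to any learner that accepts sample_weight, e.g.
# LogisticRegression().fit(x_src, y_src, sample_weight=importance_weights(x_src, x_tgt)).
```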
Feature-Based Adaptation:
These methods map the source data toward the target data by learning a transformation that extracts a domain-invariant feature representation. A new feature representation is created by transforming the original features into a new feature space, and the gap between domains is then minimized in this new space through an optimization procedure, while preserving the underlying structure of the original data.
Subspace-based: discover a common intermediate representation that is shared between domains. A low-dimensional representation of the original data is created in the form of a linear subspace for each domain, and the discrepancy between the subspaces is then reduced to construct the intermediate representation.
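Subspace alignment is one classic instance of this idea; the sketch below uses PCA subspaces, with the dimension `d` as a free choice and mean-centering omitted for brevity (illustrative code, not taken from the cited surveys):

```python
import numpy as np
from sklearn.decomposition import PCA

def subspace_align(x_src, x_tgt, d=10):
    # Fit a d-dimensional PCA subspace per domain.
    p_s = PCA(n_components=d).fit(x_src).components_.T   # shape: (n_features, d)
    p_t = PCA(n_components=d).fit(x_tgt).components_.T
    m = p_s.T @ p_t                 # alignment matrix between the two bases
    z_src = x_src @ p_s @ m         # source features expressed in the aligned subspace
    z_tgt = x_tgt @ p_t             # target features projected onto its own subspace
    return z_src, z_tgt
```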
Transformation-based: transform the original features into a new representation that minimizes the discrepancy between the marginal and conditional distributions while preserving the underlying structure and characteristics of the original data.
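Correlation alignment (CORAL), which whitens the source features and then re-colors them with the target covariance, is one well-known transformation of this kind (it is not named in these notes; the sketch below is illustrative):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral(x_src, x_tgt, eps=1.0):
    # Whiten the source features, then re-color them with the target covariance.
    c_s = np.cov(x_src, rowvar=False) + eps * np.eye(x_src.shape[1])
    c_t = np.cov(x_tgt, rowvar=False) + eps * np.eye(x_tgt.shape[1])
    transform = fractional_matrix_power(c_s, -0.5) @ fractional_matrix_power(c_t, 0.5)
    return np.real(x_src @ transform)   # transformed source; the target is left unchanged
```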
Reconstruction-based: reduce the disparity between domain distributions by reconstructing samples in an intermediate feature representation.
Deep domain adaptation was proposed to address the lack of sufficient labeled data while boosting model performance by combining the representational power of deep networks with domain adaptation techniques.
Discrepancy-based: the Deep Adaptation Network (DAN) applies deep neural networks to the domain adaptation setting to learn transferable features across domains; the discrepancy between domain distributions is reduced by measuring and minimizing an MMD-based objective function.
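A minimal sketch of a discrepancy-based objective is shown below: a source classification loss plus an MMD term on the features. For simplicity it uses a linear-kernel MMD on a single layer, whereas DAN itself applies a multi-kernel MMD across several task-specific layers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
classifier = nn.Linear(16, 10)

def linear_mmd2(f_src, f_tgt):
    # Squared MMD with a linear kernel: squared distance between feature means.
    delta = f_src.mean(dim=0) - f_tgt.mean(dim=0)
    return (delta * delta).sum()

def loss_fn(x_src, y_src, x_tgt, lam=0.5):
    f_src, f_tgt = feature_extractor(x_src), feature_extractor(x_tgt)
    return F.cross_entropy(classifier(f_src), y_src) + lam * linear_mmd2(f_src, f_tgt)

# Forward/backward pass on random tensors; shapes are for illustration only.
loss = loss_fn(torch.randn(8, 32), torch.randint(0, 10, (8,)), torch.randn(8, 32))
loss.backward()
```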
Reconstruction-based: use an autoencoder to align the domains by minimizing the reconstruction error and learning an invariant, transferable representation across domains. The idea is to learn the parameters of the encoder on samples from one domain (the source) and adapt the decoder to reconstruct samples from the other domain (the target).
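A minimal sketch of this idea (not the exact architecture of any particular paper): a shared encoder trained with a classification loss on labeled source samples and a reconstruction loss on unlabeled target samples:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
decoder = nn.Linear(16, 32)
classifier = nn.Linear(16, 10)

def loss_fn(x_src, y_src, x_tgt, lam=1.0):
    cls_loss = F.cross_entropy(classifier(encoder(x_src)), y_src)   # supervised on source
    rec_loss = F.mse_loss(decoder(encoder(x_tgt)), x_tgt)           # reconstruct target samples
    return cls_loss + lam * rec_loss

loss = loss_fn(torch.randn(8, 32), torch.randint(0, 10, (8,)), torch.randn(8, 32))
```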
Adversarial-based: adversarial domain adaptation approaches minimize the distribution discrepancy between domains to obtain transferable, domain-invariant features. The main idea was inspired by Generative Adversarial Nets (GAN) [61], which minimize the cross-domain discrepancy through an adversarial objective.
Visual adversarial domain adaptation also uses generative adversarial nets (GAN) to reduce the shift between domains. In generative adversarial domain adaptation, the generator G aims to synthesize plausible images, while the discriminator D tries to distinguish synthesized samples from real ones. Visual domain adaptation techniques that use GANs adopt representations at the pixel level, the feature level, or both.
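One common realization of the adversarial objective is a DANN-style gradient reversal layer, sketched below: the feature extractor is pushed to produce features the domain discriminator cannot separate (illustrative code, not from the cited works):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips (and scales) the gradient in the backward pass.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

feature_extractor = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
label_classifier = nn.Linear(16, 10)
domain_discriminator = nn.Linear(16, 2)

def loss_fn(x_src, y_src, x_tgt, lam=1.0):
    f_src, f_tgt = feature_extractor(x_src), feature_extractor(x_tgt)
    cls_loss = F.cross_entropy(label_classifier(f_src), y_src)
    feats = torch.cat([f_src, f_tgt])
    domains = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))]).long()
    # The reversed gradient makes the feature extractor fool the domain discriminator.
    dom_loss = F.cross_entropy(domain_discriminator(GradReverse.apply(feats, lam)), domains)
    return cls_loss + dom_loss

loss_fn(torch.randn(8, 32), torch.randint(0, 10, (8,)), torch.randn(8, 32)).backward()
```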
Pseudo-Labeling: a diverse ensemble trained on source data can be used to label target data. If the ensemble is highly confident, those now-labeled target examples can then be used to train a classifier for the target domain.
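A rough sketch of this recipe with scikit-learn; the ensemble construction (random-forest seeds) and the 0.9 confidence threshold are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def pseudo_label_then_train(x_src, y_src, x_tgt, threshold=0.9):
    # A small ensemble (different seeds) labels the target domain.
    ensemble = [RandomForestClassifier(random_state=s).fit(x_src, y_src) for s in range(5)]
    probs = np.mean([m.predict_proba(x_tgt) for m in ensemble], axis=0)
    confident = probs.max(axis=1) >= threshold
    # Only highly confident pseudo-labels are used to train the target classifier.
    return LogisticRegression(max_iter=1000).fit(x_tgt[confident], probs[confident].argmax(axis=1))
```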
Saito et al. [206] combine elements of adversarial domain-invariant feature learning, ensemble methods, and target discriminative features in their maximum classifier discrepancy (MCD) method.
Losses:
Distance functions play a variety of roles in domain adaptation losses. A distance loss can be used to align two distributions by minimizing a distance function (e.g., MMD).
Promote Differences. Methods that rely on multiple networks learning different features (such as to make an ensemble diverse) do so by promoting differences between the networks.
Cycle Consistency / Reconstruction. A cycle consistency loss or reconstruction loss is commonly used in domain mapping methods to avoid requiring a dataset of corresponding images to be available in both domains.
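For mappings G: source → target and F: target → source, one standard form of the cycle-consistency loss (as popularized by CycleGAN) is:

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
    \mathbb{E}_{x \sim p_S}\!\left[\lVert F(G(x)) - x \rVert_1\right] +
    \mathbb{E}_{y \sim p_T}\!\left[\lVert G(F(y)) - y \rVert_1\right]
```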
Semantic consistency loss requires that a classifier's output (or semantic segmentation labeling) on the original source image is the same as that classifier's output on the pixel-level mapped image in the target style.
The task loss is generally a cross-entropy loss, or more specifically the negative log-likelihood of a softmax distribution [80] when using a softmax output layer.
Adversarial. Domains are aligned by forcing a network (either a feature extractor or a generator) to produce outputs that are indistinguishable between two domains (source and target, or real and fake).
Balancing Classes
Hoffman et al. [96] note that the frequency-weighted intersection-over-union results in their paper were very close to the accuracy of the target-only model (an approximate upper bound). They therefore conclude that domain mapping followed by domain-invariant feature learning is very effective for the common classes in the SYNTHIA dataset (season adaptation on a synthetic driving dataset). Additional class balancing might then help the less common classes perform better. In addition, data augmentation that occludes parts of the images may improve class balance, as would the adversarial spatial dropout network of Wang et al. [248], since the two best-performing classes (road and sky) were likely present in almost every image.
Different types of Domain Adaptation
1. Discrepancy-based: these methods minimize the distance between the source domain and the target domain using various statistically defined distance functions.
2. Adversarial-based: these methods identify domain-invariant features via two competing networks.
3. Pseudo-labeling-based: these methods generate pseudo labels for the target domain to reduce the domain divergence.
4. Reconstruction-based: these methods map the two domains into a shared domain while preserving domain-specific features.
5. Representation-based: these methods use a trained network to extract intermediate representations as input for a new network.
6. Attention-based: these methods attend to regions of interest (ROIs) that carry information shared by the source and target domains.
AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation
AdaMatch extends the FixMatch algorithm by (1) addressing the distribution shift between source and target domains present in the batch-norm statistics, (2) adjusting the pseudo-label confidence threshold on the fly, and (3) using a modified version of distribution alignment.
This paper is essentially an extension of FixMatch: through three techniques, random logit interpolation, distribution alignment, and relative confidence thresholding, it improves accuracy in the SSL, SSDA, and UDA settings. The paper's "Illustrating examples" section introduces distribution alignment and relative confidence thresholding.
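Roughly, and simplified (the paper maintains running estimates over batches rather than per-batch means), the modified distribution alignment rescales the target pseudo-label probabilities toward the source prediction distribution:

```python
import numpy as np

def align_distribution(target_probs, source_probs, eps=1e-6):
    # Rescale target pseudo-label probabilities toward the source class
    # distribution, then renormalize each row to sum to one.
    ratio = (source_probs.mean(axis=0) + eps) / (target_probs.mean(axis=0) + eps)
    aligned = target_probs * ratio
    return aligned / aligned.sum(axis=1, keepdims=True)
```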
Relative confidence thresholding example:
Suppose we have a dataset X on which a given model's average top-1 confidence on labeled data is 0.7. Using the default confidence threshold of τ = 0.9 for pseudo-labels would exclude almost all unlabeled data, since their confidences are unlikely to exceed the maximum labeled-data confidence. In this scenario, relative confidence thresholding is particularly useful: by multiplying the default threshold τ by the average top-1 labeled confidence, we obtain a relative threshold of cτ = 0.9 × 0.7 = 0.63, which is much more likely to capture a meaningful fraction of the unlabeled data. In the case of CIFAR-10, the softmax confidence of most labeled training examples typically reaches 1.0; when the average top-1 confidence on labeled data is 1.0, the relative threshold cτ and the default threshold τ coincide. Relative confidence thresholding can therefore be seen as a generalization of confidence thresholding.
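The worked example maps directly to a couple of lines of code (a sketch; function and variable names are illustrative):

```python
import numpy as np

def pseudo_label_mask(source_probs, target_probs, tau=0.9):
    # Keep target pseudo-labels whose top-1 confidence exceeds tau times the
    # mean top-1 confidence on the labeled (source) batch, e.g. 0.9 * 0.7 = 0.63.
    c_tau = tau * source_probs.max(axis=1).mean()
    return target_probs.max(axis=1) >= c_tau
```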
Preventing gradient back-propagation through the guessed (pseudo) labels is a standard practice in SSL work that favors convergence.
References
A Brief Review of Domain Adaptation
https://arxiv.org/abs/2010.03978
A survey of Unsupervised Deep Domain Adaptation
https://arxiv.org/abs/1812.02849
A Survey of Unsupervised Domain Adaptation for Visual Recognition
http://arxiv.org/abs/2112.06745