We first introduce the notation used throughout this survey. We refer to the training domain, where labeled data is abundant, as the source domain, and to the test domain, where labeled data is unavailable or scarce, as the target domain. Let $X$ denote the input variable (i.e. an observation) and $Y$ the output variable (i.e. a class label). We use $P(X, Y)$ to denote the true underlying joint distribution of $X$ and $Y$, which is unknown. In domain adaptation, this joint distribution in the target domain differs from that in the source domain. We therefore use $P_t(X, Y)$ to denote the true underlying joint distribution in the target domain, and $P_s(X, Y)$ to denote that in the source domain. We use $P_t(Y)$, $P_s(Y)$, $P_t(X)$ and $P_s(X)$ to denote the true marginal distributions of $Y$ and $X$ in the target and the source domains, respectively. Similarly, we use $P_t(X \mid Y)$, $P_s(X \mid Y)$, $P_t(Y \mid X)$ and $P_s(Y \mid X)$ to denote the true conditional distributions in the two domains. We use lowercase $x$ to denote a specific value of $X$, and lowercase $y$ to denote a specific class label. A specific $x$ is also referred to as an observation, an unlabeled instance or simply an instance. A pair $(x, y)$ is referred to as a labeled instance. Here, $x \in \mathcal{X}$, where $\mathcal{X}$ is the input space, i.e. the set of all possible observations. Similarly, $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the class label set. Where no ambiguity arises, $P(X = x, Y = y)$ or simply $P(x, y)$ refers to the joint probability of $X = x$ and $Y = y$. Similarly, $P(X = x)$ (or $P(x)$), $P(Y = y)$ (or $P(y)$), $P(X = x \mid Y = y)$ (or $P(x \mid y)$) and $P(Y = y \mid X = x)$ (or $P(y \mid x)$) also refer to probabilities rather than distributions.
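As a reminder of how the joint, marginal and conditional distributions relate within each domain, the standard product rule can be written as below. The domain index $d$ is a shorthand assumed here for illustration and is not part of the survey's notation.

```latex
P_d(X, Y) \;=\; P_d(Y \mid X)\, P_d(X) \;=\; P_d(X \mid Y)\, P_d(Y),
\qquad d \in \{s, t\}
```

Read this way, a difference between the joint distributions of the two domains implies a difference in at least one factor of each factorization.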
We assume that a relatively large amount of labeled data is always available in the source domain. We use $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ to denote this set of labeled instances in the source domain. In the target domain, we assume that we always have access to a large amount of unlabeled data, and we use $D_{t,u} = \{x_i^{t,u}\}_{i=1}^{N_{t,u}}$ to denote this set of unlabeled instances. Sometimes, we may also have a small amount of labeled data from the target domain, which is denoted as $D_{t,l} = \{(x_i^{t,l}, y_i^{t,l})\}_{i=1}^{N_{t,l}}$. When $D_{t,l}$ is not available, we refer to the problem as unsupervised domain adaptation; when $D_{t,l}$ is available, we refer to the problem as supervised domain adaptation.
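The data setup above can be sketched in Python as follows. This is a minimal illustration, not code from the survey: the class `AdaptationData`, its field names, and the toy text instances are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class AdaptationData:
    """Hypothetical container for the three data sets described in the text."""

    # Labeled (x, y) pairs from the source domain.
    source_labeled: List[Tuple[Any, Any]]
    # Unlabeled observations x from the target domain.
    target_unlabeled: List[Any]
    # Optional labeled (x, y) pairs from the target domain; empty if unavailable.
    target_labeled: List[Tuple[Any, Any]] = field(default_factory=list)

    def setting(self) -> str:
        """Name the problem setting using the convention in the text."""
        return "supervised" if self.target_labeled else "unsupervised"


# Toy data for illustration only.
data = AdaptationData(
    source_labeled=[("good movie", 1), ("dull plot", 0)],
    target_unlabeled=["battery lasts long", "screen cracked"],
)
print(data.setting())  # unsupervised
```

Leaving `target_labeled` empty by default mirrors the text's assumption that labeled target data is only sometimes available.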