We first introduce some notation needed in the discussion in this survey. We refer to the training domain, where labeled data is abundant, as the *source* domain, and to the test domain, where labeled data is unavailable or scarce, as the *target* domain. Let $X$ denote the input variable (i.e. an observation) and $Y$ the output variable (i.e. a class label). We use $P(X, Y)$ to denote the true underlying joint distribution of $X$ and $Y$, which is unknown. In domain adaptation, this joint distribution in the target domain differs from that in the source domain. We therefore use $P_t(X, Y)$ to denote the true underlying joint distribution in the target domain, and $P_s(X, Y)$ to denote that in the source domain. We use $P_t(X)$, $P_t(Y)$, $P_s(X)$ and $P_s(Y)$ to denote the true marginal distributions of $X$ and $Y$ in the target and the source domains, respectively. Similarly, we use $P_t(Y \mid X)$, $P_t(X \mid Y)$, $P_s(Y \mid X)$ and $P_s(X \mid Y)$ to denote the true conditional distributions in the two domains. We use lowercase $x$ to denote a specific value of $X$, and lowercase $y$ to denote a specific class label. A specific $x$ is also referred to as an observation, an unlabeled instance or simply an instance. A pair $(x, y)$ is referred to as a labeled instance. Here, $x \in \mathcal{X}$, where $\mathcal{X}$ is the input space, i.e. the set of all possible observations. Similarly, $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the class label set. Without any ambiguity, $P(X = x, Y = y)$, or simply $P(x, y)$, refers to the joint probability of $X = x$ and $Y = y$. Similarly, $P_t(x)$ (or $P_s(x)$), $P_t(y)$ (or $P_s(y)$), $P_t(y \mid x)$ (or $P_s(y \mid x)$) and $P_t(x \mid y)$ (or $P_s(x \mid y)$) also refer to probabilities rather than distributions.

We assume that there is always a relatively large amount of *labeled* data available in the source domain. We use $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ to denote this set of labeled instances in the source domain. In the target domain, we assume that we always have access to a large amount of *unlabeled* data, and we use $D_{t,u} = \{x_i^{t,u}\}_{i=1}^{N_{t,u}}$ to denote this set of unlabeled instances. Sometimes, we may also have a small amount of labeled data from the target domain, which is denoted as $D_{t,l} = \{(x_i^{t,l}, y_i^{t,l})\}_{i=1}^{N_{t,l}}$. When $D_{t,l}$ is not available, we refer to the problem as *unsupervised domain adaptation*, while when $D_{t,l}$ is available, we refer to the problem as *supervised domain adaptation*.
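The data setup described above can be sketched as follows. All instances, labels and variable names here are hypothetical placeholders, and the small helper only encodes the distinction drawn in the text: the setting is supervised exactly when some labeled target-domain data is available.

```python
# Source domain: a relatively large set of labeled instances (x, y).
# The strings and labels are hypothetical placeholders.
D_source_labeled = [
    ("source text 1", "pos"), ("source text 2", "neg"),
    ("source text 3", "pos"), ("source text 4", "neg"),
]

# Target domain: a large set of unlabeled instances x.
D_target_unlabeled = ["target text 1", "target text 2", "target text 3"]

# Target domain: optionally, a small set of labeled instances (x, y).
D_target_labeled = [("target text 4", "pos")]

def adaptation_setting(target_labeled):
    # Supervised domain adaptation if any labeled target data exists,
    # unsupervised domain adaptation otherwise.
    return "supervised" if target_labeled else "unsupervised"

print(adaptation_setting(D_target_labeled))  # supervised
print(adaptation_setting([]))                # unsupervised
```

In practice the labeled target set, when it exists, is much smaller than both the labeled source set and the unlabeled target set, which is what makes adaptation necessary rather than simply training on target data alone.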