
2.1 Notations

We first introduce the notation needed for the discussion in this survey. We refer to the training domain, where labeled data is abundant, as the source domain, and to the test domain, where little or no labeled data is available, as the target domain. Let $X$ denote the input variable (i.e., an observation) and $Y$ the output variable (i.e., a class label). We use $P(X, Y)$ to denote the true underlying joint distribution of $X$ and $Y$, which is unknown. In domain adaptation, this joint distribution in the target domain differs from that in the source domain. We therefore use $P_t(X, Y)$ to denote the true underlying joint distribution in the target domain, and $P_s(X, Y)$ to denote that in the source domain. We use $P_t(Y)$, $P_s(Y)$, $P_t(X)$ and $P_s(X)$ to denote the true marginal distributions of $Y$ and $X$ in the target and the source domains, respectively. Similarly, we use $P_t(X \vert Y)$, $P_s(X \vert Y)$, $P_t(Y \vert X)$ and $P_s(Y \vert X)$ to denote the true conditional distributions in the two domains. We use lowercase $x$ to denote a specific value of $X$, and lowercase $y$ to denote a specific class label. A specific $x$ is also referred to as an observation, an unlabeled instance, or simply an instance. A pair $(x, y)$ is referred to as a labeled instance. Here, $x \in \mathcal{X}$, where $\mathcal{X}$ is the input space, i.e., the set of all possible observations. Similarly, $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the class label set. When there is no ambiguity, $P(X = x, Y = y)$, or simply $P(x, y)$, refers to the joint probability that $X = x$ and $Y = y$. Similarly, $P(X = x)$ (or $P(x)$), $P(Y = y)$ (or $P(y)$), $P(X = x \vert Y = y)$ (or $P(x \vert y)$) and $P(Y = y \vert X = x)$ (or $P(y \vert x)$) also refer to probabilities rather than distributions.
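The reason for naming the marginal and conditional distributions separately is that the joint distribution in each domain factorizes in two standard ways (this is the usual chain-rule identity, not a result of this survey): for each domain $d \in \{s, t\}$,

$P_d(X, Y) = P_d(Y) \, P_d(X \vert Y) = P_d(X) \, P_d(Y \vert X)$.

A difference between $P_s(X, Y)$ and $P_t(X, Y)$ can therefore be attributed to a difference in one or more of these factors; for example, the two domains may share the same conditional $P(Y \vert X)$ while differing only in the marginal $P(X)$.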

We assume that a relatively large amount of labeled data is always available in the source domain. We use $D_s = \{(x^s_i, y^s_i)\}_{i = 1}^{N_s}$ to denote this set of labeled instances in the source domain. In the target domain, we assume that we always have access to a large amount of unlabeled data, and we use $D_{t, u} = \{x^{t, u}_i\}_{i = 1}^{N_{t, u}}$ to denote this set of unlabeled instances. Sometimes we may also have a small amount of labeled data from the target domain, denoted $D_{t, l} = \{(x^{t, l}_i, y^{t, l}_i)\}_{i = 1}^{N_{t, l}}$. When $D_{t, l}$ is not available, we refer to the problem as unsupervised domain adaptation; when $D_{t, l}$ is available, we refer to it as supervised domain adaptation.
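To make this data setup concrete, below is a minimal Python sketch (not from the survey) of a container for the three data sets; the class name, field names, and toy sentiment examples are illustrative assumptions.

```python
# Illustrative sketch of the data setup defined above; names are
# hypothetical, not notation from the survey.
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class DomainAdaptationData:
    """Mirrors D_s, D_{t,u}, and the optional D_{t,l}."""
    source_labeled: Sequence[Tuple[str, str]]      # D_s: pairs (x^s_i, y^s_i)
    target_unlabeled: Sequence[str]                # D_{t,u}: instances x^{t,u}_i
    target_labeled: Optional[Sequence[Tuple[str, str]]] = None  # D_{t,l}

    @property
    def setting(self) -> str:
        # Supervised domain adaptation iff a (small) labeled target set exists.
        return "supervised" if self.target_labeled else "unsupervised"


# Toy usage: x is a document, y its sentiment label.
data = DomainAdaptationData(
    source_labeled=[("great book", "+"), ("dull plot", "-")],  # abundant in practice
    target_unlabeled=["sturdy laptop", "battery died fast"],
    target_labeled=None,  # no labeled target data
)
print(data.setting)  # -> "unsupervised"
```

The optional `target_labeled` field directly encodes the distinction drawn above: its absence corresponds to the unsupervised setting, its presence to the supervised one.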

