
3.1 Class Imbalance

One simple assumption we can make about the connection between the distributions of the source and the target domains is that given the same class label, the conditional distributions of $ X$ are the same in the two domains. However, the class distributions may be different in the source and the target domains. Formally, we assume that $ P_s(X \vert Y = y) = P_t(X \vert Y = y)$ for all $ y \in \mathcal{Y}$, but $ P_s(Y) \ne P_t(Y)$. This difference is referred to as the class imbalance problem in some work (Japkowicz and Stephen, 2002).

When this class imbalance assumption is made, the ratio $ \frac{P_t(x, y)}{P_s(x, y)}$ that we derived in Equation ([*]) can be rewritten as follows:

$$\begin{aligned}
\frac{P_t(x, y)}{P_s(x, y)} &= \frac{P_t(y)}{P_s(y)} \, \frac{P_t(x \vert y)}{P_s(x \vert y)} \\
&= \frac{P_t(y)}{P_s(y)}.
\end{aligned}$$

Therefore, we only need the ratio $ \frac{P_t(y)}{P_s(y)}$ to weight the instances. This approach has been explored by Lin et al. (2002). Alternatively, we can re-sample the training instances from the source domain so that the re-sampled data has roughly the same class distribution as the target domain. In re-sampling methods, under-represented classes are over-sampled and over-represented classes are under-sampled (Chawla et al., 2002; Zhu and Hovy, 2007; Kubat and Matwin, 1997).
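As an illustration of both options, the following minimal sketch computes the class-ratio weights and uses them to re-sample source instances. It assumes that $ P_s(y)$ is estimated from source label counts and that the target class distribution is supplied by the caller. The function names `class_ratio_weights` and `resample_to_target` and the toy labels are hypothetical and do not come from the cited work.

```python
from collections import Counter
import random


def class_ratio_weights(source_labels, target_prior):
    """Return {label: P_t(y) / P_s(y)}, with P_s(y) estimated from label counts."""
    n = len(source_labels)
    counts = Counter(source_labels)
    # A label absent from target_prior is treated as having P_t(y) = 0.
    return {y: target_prior.get(y, 0.0) / (counts[y] / n) for y in counts}


def resample_to_target(source_labels, target_prior, size, seed=0):
    """Draw source indices with replacement, with probability proportional to r(y_i)."""
    weights = class_ratio_weights(source_labels, target_prior)
    per_instance = [weights[y] for y in source_labels]
    rng = random.Random(seed)
    return rng.choices(range(len(source_labels)), weights=per_instance, k=size)


# Toy example (assumed data): the source is 80% "pos", the target is balanced.
source_labels = ["pos"] * 80 + ["neg"] * 20
target_prior = {"pos": 0.5, "neg": 0.5}

weights = class_ratio_weights(source_labels, target_prior)
indices = resample_to_target(source_labels, target_prior, size=10000)
neg_fraction = sum(source_labels[i] == "neg" for i in indices) / len(indices)
```

Here the under-represented class "neg" receives the larger weight, so it is over-sampled, and the re-sampled class proportions should approach the assumed target distribution.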

For classification algorithms that directly model the probability distribution $ P(Y \vert X)$ such as logistic regression classifiers, it can be shown theoretically that the estimated probability $ P_s(y \vert x)$ can be transformed into $ P_t(y \vert x)$ in the following way (Lin et al., 2002; Chan and Ng, 2005):

$$P_t(y \vert x) = \frac{r(y) \, P_s(y \vert x)}{\sum_{y' \in \mathcal{Y}} r(y') \, P_s(y' \vert x)},$$

where $ r(y)$ is defined as
$$r(y) = \frac{P_t(y)}{P_s(y)}.$$
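A short sketch of why this transformation holds, using only Bayes' rule and the assumption $ P_s(x \vert y) = P_t(x \vert y)$ stated earlier:

$$\begin{aligned}
P_t(y \vert x) &= \frac{P_t(x \vert y) \, P_t(y)}{P_t(x)} = \frac{P_s(x \vert y) \, P_t(y)}{P_t(x)} \\
&= \frac{P_s(y \vert x) \, P_s(x)}{P_s(y)} \cdot \frac{P_t(y)}{P_t(x)} = \frac{P_s(x)}{P_t(x)} \, r(y) \, P_s(y \vert x).
\end{aligned}$$

The factor $ \frac{P_s(x)}{P_t(x)}$ does not depend on $ y$, so requiring $ \sum_{y' \in \mathcal{Y}} P_t(y' \vert x) = 1$ replaces it with the normalizer $ \sum_{y' \in \mathcal{Y}} r(y') P_s(y' \vert x)$.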

We can therefore first estimate $ P_s(y \vert x)$ from source-domain data, and then derive $ P_t(y \vert x)$ from it using $ P_s(Y)$ and $ P_t(Y)$.
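A minimal sketch of this posterior adjustment is given below. It assumes the classifier's output for one instance is available as a dictionary mapping each label to $ P_s(y \vert x)$, and that both class distributions are known. The function name `adjust_posterior` and the numbers are hypothetical.

```python
def adjust_posterior(source_posterior, source_prior, target_prior):
    """Map P_s(y|x) to P_t(y|x) by scaling with r(y) = P_t(y) / P_s(y) and renormalizing."""
    scaled = {
        y: p * target_prior[y] / source_prior[y]
        for y, p in source_posterior.items()
    }
    total = sum(scaled.values())
    return {y: value / total for y, value in scaled.items()}


# Toy example (assumed numbers): a classifier trained on a source domain that is
# 80% "pos" is applied to a target domain with balanced classes.
adjusted = adjust_posterior(
    source_posterior={"pos": 0.7, "neg": 0.3},
    source_prior={"pos": 0.8, "neg": 0.2},
    target_prior={"pos": 0.5, "neg": 0.5},
)
```

In this example the adjustment moves probability mass toward "neg", the class that is more frequent in the target domain than in the source domain.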

For other classification algorithms that do not directly model $ P(Y \vert X)$, such as naive Bayes classifiers and support vector machines, if $ P(Y \vert X)$ can be obtained through careful calibration, the same trick can be applied. Chan and Ng (2006) applied this method to the domain adaptation problem in word sense disambiguation (WSD) using naive Bayes classifiers.

In practice, one needs to know the class distribution in the target domain in order to apply the methods described above. In some studies, it is assumed that this distribution is known a priori (Lin et al., 2002). However, in reality, we may not have this information. Chan and Ng (2005) proposed to use the EM algorithm to estimate the class distribution in the target domain.
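The sketch below shows one common way to set up such an EM estimate. It is an illustration under stated assumptions, not necessarily the exact procedure of Chan and Ng (2005): the source-trained classifier is assumed to provide $ P_s(y \vert x)$ for each unlabeled target instance, and the estimate is initialized with the source class distribution. The function name `em_target_prior` and its parameters are hypothetical.

```python
def em_target_prior(source_posteriors, source_prior, n_iter=100, tol=1e-8):
    """Estimate P_t(y) from source-model posteriors on unlabeled target instances.

    source_posteriors: list of {label: P_s(y|x)} dictionaries, one per target instance.
    source_prior: {label: P_s(y)}, also used as the initial estimate of P_t(y).
    """
    labels = list(source_prior)
    prior = dict(source_prior)
    for _ in range(n_iter):
        # E-step: adjust each posterior with the current estimate of P_t(y).
        totals = {y: 0.0 for y in labels}
        for post in source_posteriors:
            scaled = {y: post[y] * prior[y] / source_prior[y] for y in labels}
            z = sum(scaled.values())
            for y in labels:
                totals[y] += scaled[y] / z
        # M-step: the new estimate is the average adjusted posterior.
        new_prior = {y: totals[y] / len(source_posteriors) for y in labels}
        change = max(abs(new_prior[y] - prior[y]) for y in labels)
        prior = new_prior
        if change < tol:
            break
    return prior


# Toy example (assumed data): three target instances the classifier is certain
# are "pos" and one it is certain is "neg".
certain = [{"pos": 1.0, "neg": 0.0}] * 3 + [{"pos": 0.0, "neg": 1.0}]
estimated = em_target_prior(certain, {"pos": 0.5, "neg": 0.5})

# If every posterior equals the source prior, the estimate does not move.
flat = em_target_prior([{"pos": 0.8, "neg": 0.2}] * 5, {"pos": 0.8, "neg": 0.2})
```

The resulting estimate can then be used in place of a known $ P_t(Y)$ in the weighting and posterior adjustment described above.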

Jing Jiang 2008-03-06