Next: 7 Multi-Task Learning Up: A Literature Survey on Previous: 5 Change of Representation   Contents


6 Bayesian Priors

Most of the work reviewed in the previous sections does not require labeled data from the target domain. In this section and the next, we review two kinds of methods designed for supervised domain adaptation, i.e., the setting in which a small amount of labeled data from the target domain is available.

When we use the maximum a posteriori (MAP) estimation approach for supervised learning, we can encode some prior knowledge about the classification model into a Bayesian prior distribution $ P(\theta)$, where $ \theta$ is the model parameter. More specifically, instead of maximizing

$\displaystyle \prod_{i = 1}^N P(y_i \vert x_i; \theta),$      

we maximize
$\displaystyle P(\theta) \prod_{i = 1}^N P(y_i \vert x_i; \theta).$      
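As a concrete (and standard) instantiation of this MAP objective, not one taken from the survey itself: if the classifier is logistic regression and the prior is a zero-mean Gaussian $ N(0, \sigma^2 I)$, then maximizing $ P(\theta) \prod_i P(y_i \vert x_i; \theta)$ is equivalent to minimizing the negative log-likelihood plus an L2 penalty. A minimal sketch (the function name `fit_map` and the plain gradient-descent optimizer are illustrative choices):

```python
import numpy as np

def fit_map(X, y, sigma2=1.0, lr=0.1, steps=500):
    """MAP estimate for logistic regression with a Gaussian prior N(0, sigma2 I).

    Maximizing P(theta) * prod_i P(y_i | x_i; theta) is equivalent to
    minimizing the negative log-likelihood plus ||theta||^2 / (2 * sigma2),
    i.e. L2-regularized logistic regression.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # P(y=1 | x; theta)
        # Gradient of the negative log posterior: data term + prior term.
        grad = X.T @ (p - y) + theta / sigma2
        theta -= lr * grad
    return theta
```

A smaller $ \sigma^2$ encodes a stronger prior belief that the parameters are near zero, shrinking the fitted weights accordingly.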

In domain adaptation, the prior knowledge can be drawn from the source domain. More specifically, we first construct a Bayesian prior $ P(\theta \vert D_s)$, which depends on the labeled instances $ D_s$ from the source domain. We then maximize the following objective function over the labeled target instances $ D_{t, l}$:

$\displaystyle P(\theta \vert D_s) P(D_{t, l} \vert \theta) = P(\theta \vert D_s) \prod_{i = 1}^{N_{t, l}} P(y^t_i \vert x^t_i; \theta).$
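One simple way to realize a source-dependent prior $ P(\theta \vert D_s)$, in the spirit of (though not identical to) the methods cited below, is a Gaussian centered at the source model's parameters $ \theta_s$: the target model is pulled toward the source model unless the target data says otherwise. A hedged sketch building on the logistic-regression setup above (`fit_adapted` and the Gaussian instantiation are assumptions for illustration):

```python
import numpy as np

def fit_adapted(X_t, y_t, theta_s, sigma2=1.0, lr=0.1, steps=500):
    """MAP adaptation with a source-centered Gaussian prior.

    Maximizes P(theta | D_s) * prod_i P(y_i^t | x_i^t; theta), where the
    prior is taken to be N(theta_s, sigma2 I) with theta_s the parameters
    fit on the source domain. Small sigma2 keeps theta close to theta_s;
    large sigma2 lets the (small) labeled target set dominate.
    """
    theta = theta_s.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X_t @ theta)))
        # Target-data gradient plus a pull back toward the source parameters.
        grad = X_t.T @ (p - y_t) + (theta - theta_s) / sigma2
        theta -= lr * grad
    return theta
```

The hyperparameter $ \sigma^2$ thus controls the trade-off between trusting the source model and fitting the limited target labels.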

Li and Bilmes (2007) proposed a general Bayesian divergence prior framework for domain adaptation. They then showed how this general prior can be instantiated for both generative and discriminative classifiers. Chelba and Acero (2004) applied this kind of Bayesian prior to the task of adapting a maximum entropy capitalizer across domains.


Jing Jiang 2008-03-06