Most of the work reviewed in the previous sections does not require labeled data from the target domain. In this section and the next, we review two kinds of methods for supervised domain adaptation, i.e., the setting in which a small amount of labeled data from the target domain is available.
When we use the maximum a posteriori (MAP) estimation approach for supervised learning, we can encode prior knowledge about the classification model into a Bayesian prior distribution $p(\theta)$, where $\theta$ is the model parameter. More specifically, instead of maximizing the likelihood $p(D \mid \theta)$ of the labeled data $D$ alone, we maximize the posterior $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$, yielding the MAP estimate $\hat{\theta} = \arg\max_{\theta} p(D \mid \theta)\, p(\theta)$.
In domain adaptation, the prior knowledge can be drawn from the source domain. More specifically, we first construct a Bayesian prior $p(\theta \mid D_s)$, which is dependent on the labeled instances $D_s$ from the source domain. We then maximize the following objective function:
$$\hat{\theta} = \arg\max_{\theta} p(D_{t,l} \mid \theta)\, p(\theta \mid D_s),$$
where $D_{t,l}$ denotes the small set of labeled target-domain instances.
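A common instantiation of this scheme uses a Gaussian prior centered at the parameters learned on the source domain, so that MAP training on the few labeled target instances reduces to maximizing the target log-likelihood plus an L2 penalty pulling the parameters toward the source model. The sketch below illustrates this for logistic regression; the data, the gradient-ascent optimizer, and the hyperparameters (`prior_var`, `lr`) are illustrative assumptions, not part of any cited method.

```python
# Sketch: MAP adaptation of logistic regression with a Gaussian prior
# centered at source-trained weights (synthetic data; assumed hyperparameters).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, prior_mean=None, prior_var=1.0, lr=0.1, n_iter=2000):
    """Gradient ascent on sum_i log p(y_i|x_i; theta) - ||theta - mu||^2 / (2 var).

    With prior_mean=None (i.e. mu = 0) this is ordinary L2-regularized MAP
    training; with prior_mean = source weights it instantiates the
    source-derived Bayesian prior p(theta | D_s) described above.
    """
    n, d = X.shape
    theta = np.zeros(d)
    mu = np.zeros(d) if prior_mean is None else prior_mean
    for _ in range(n_iter):
        grad = X.T @ (y - sigmoid(X @ theta))   # log-likelihood gradient
        grad -= (theta - mu) / prior_var        # log-prior (Gaussian) gradient
        theta += lr * grad / n
    return theta

rng = np.random.default_rng(0)

# Source domain: plenty of labeled data around a "true" weight vector.
w_true = np.array([2.0, -1.0])
Xs = rng.normal(size=(500, 2))
ys = (sigmoid(Xs @ w_true) > rng.random(500)).astype(float)
theta_src = fit_logreg(Xs, ys)

# Target domain: only a handful of labeled instances, with shifted weights.
Xt = rng.normal(size=(10, 2))
yt = (sigmoid(Xt @ (w_true + 0.3)) > rng.random(10)).astype(float)

# MAP adaptation: the prior p(theta | D_s) is Gaussian, centered at theta_src.
theta_adapted = fit_logreg(Xt, yt, prior_mean=theta_src, prior_var=0.5)
```

The prior variance controls the trade-off: a small `prior_var` keeps the adapted model close to the source model, while a large one lets the scarce target data dominate.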
Li and Bilmes (2007) proposed a general Bayesian divergence prior framework for domain adaptation and showed how this general prior can be instantiated for both generative and discriminative classifiers. Chelba and Acero (2004) applied this kind of Bayesian prior to the task of adapting a maximum entropy capitalizer across domains.