

7 Multi-Task Learning

Multi-task learning, sometimes also referred to as transfer learning, is a machine learning topic closely related to domain adaptation. The original definition of multi-task learning considers a setting different from that of domain adaptation. In multi-task learning, there is a single distribution of the observations, i.e. a single $P(X)$. There are, however, a number of different output variables $Y_1, Y_2, \ldots, Y_M$, corresponding to $M$ different tasks, and therefore $M$ different joint distributions $\{P(X, Y_k)\}_{k = 1}^M$. Note that the class label sets differ across these $M$ tasks. We assume that the tasks are related, so when learning the $M$ conditional models $\{P(Y_k \vert X, \theta_k)\}_{k = 1}^M$, we impose a common component shared by $\{\theta_k\}_{k = 1}^M$. There have been a number of studies on multi-task learning (Ben-David and Schuller, 2003; Micchelli and Pontil, 2005; Xue et al., 2007; Caruana, 1997).
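As a simplified illustration of this shared-component idea, one common formulation, roughly in the spirit of regularized multi-task learning, decomposes each $\theta_k$ into a shared part $\theta_c$ and a task-specific part $\theta_k'$ and jointly minimizes a regularized training loss. Here $D_k$ denotes the labeled data for task $k$, $\ell$ a loss function, $f$ the predictor, and $\lambda_c$, $\lambda$ regularization weights; this notation is ours, and the exact objectives in the cited works differ in their details:

$\displaystyle \min_{\theta_c,\, \{\theta_k'\}_{k=1}^M} \; \sum_{k=1}^{M} \sum_{(x, y) \in D_k} \ell\big(y, f(x;\, \theta_c + \theta_k')\big) \;+\; \lambda_c \Vert \theta_c \Vert^2 \;+\; \lambda \sum_{k=1}^{M} \Vert \theta_k' \Vert^2$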

Strictly speaking, domain adaptation is a different problem from multi-task learning because we have only a single task but different domains. However, domain adaptation can be treated as a special case of multi-task learning in which there are two tasks, one on the source domain and one on the target domain, and the class label sets of the two tasks are identical. If we have some labeled data from the target domain, we can then directly apply an existing multi-task learning algorithm.

Indeed, some domain adaptation methods proposed recently are essentially multi-task learning algorithms. Daumé III (2007) proposed a simple method for domain adaptation based on feature duplication. The idea is to make a domain-specific copy of the original features for each domain. An instance from domain $k$ is then represented by both the original features and the features specific to domain $k$. It can be shown that when linear classification algorithms are used, this feature duplication based method is equivalent to decomposing the model parameter $\theta_k$ for domain $k$ into $\theta_c + \theta_k'$, where $\theta_c$ is shared by all domains (see the sketch below). This formulation is very similar to the regularized multi-task learning method proposed by Evgeniou and Pontil (2004). Similarly, Jiang and Zhai (2007b) proposed a two-stage domain adaptation method. In the first, generalization stage, labeled instances from $K$ different source training domains are used together to train $K$ different models; these models share a common component, and this common component applies only to a subset of features that are considered generalizable across domains.
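To make the feature duplication concrete, the following is a minimal sketch in Python (using NumPy; the function name and interface are ours, not taken from the cited papers) of how an instance from domain $k$ can be mapped to an augmented representation containing a shared copy and a domain-specific copy of its features. A linear classifier trained on such augmented vectors effectively learns a weight vector of the form $(\theta_c, \theta_1', \ldots, \theta_K')$, so its score on a domain-$k$ instance depends only on $\theta_c + \theta_k'$, matching the decomposition above.

import numpy as np

def augment_features(x, domain_index, num_domains):
    # Feature duplication for domain adaptation (an illustrative sketch,
    # not the authors' code). The augmented vector has one shared block
    # followed by num_domains domain-specific blocks; only the block of
    # the instance's own domain receives a copy of the original features,
    # the other blocks remain zero.
    d = x.shape[0]
    augmented = np.zeros(d * (num_domains + 1))
    augmented[:d] = x                           # shared copy (weights theta_c)
    start = d * (domain_index + 1)
    augmented[start:start + d] = x              # domain-specific copy (weights theta_k')
    return augmented

# Example: a 3-dimensional instance from the first of two domains.
x = np.array([1.0, 2.0, 3.0])
print(augment_features(x, domain_index=0, num_domains=2))
# [1. 2. 3. 1. 2. 3. 0. 0. 0.]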

