Hui Fang
------------------------------------------------------------
Web domain:
One interesting research topic is how to let users query the web with a more sophisticated query language instead of only keyword queries. Keyword queries have the advantage of simplicity, but they do not allow users to specify their information needs precisely. For example, suppose you are a new DAIS Ph.D. student preparing for the qualifying exam, and you want to find all the courses related to the database and information systems area. You can send a query such as "related course database information system" to Google. However, you will be very disappointed with the results returned by Google, which supports only keyword queries. For this topic, the user can be any web user, and the data can be any indexed web pages. Many techniques will play an important role in solving this problem, such as text summarization, text categorization, and information extraction. One of the most challenging problems is the query language itself. Unlike a traditional database, web data has no schema, which makes defining a query language very difficult. It is also still worth discussing whether we should have one universal query language or several specialized query languages for different domains.

Email domain:
One real-world topic is how to mine interesting knowledge from emails. For example, as a consumer, you are usually interested in the products of certain websites. You therefore subscribe to their mailing lists and receive a lot of information, including when their products are on sale and when new products arrive. If a system could automatically detect the pattern in a website's promotions and recommend a good time to purchase a product, you would save money. For this topic, the user can be any newsletter subscriber, and the data are the newsletters received from a particular website.
The challenges include how to distinguish emails that contain sale information from those that do not, how to extract the useful information (e.g., 50% off the original price or 20% off the sale price), and how to analyze the extracted information to decide whether the user should purchase.

Literature domain:
One topic is how to automatically build a "literature network" for a topic. In a literature network, every node is a paper related to the topic, and every edge between two nodes is annotated with the relation between the two papers (i.e., why one paper cites the other and how the two papers are related). Such a literature network gives users a whole picture of the area, which makes a literature survey easier. Note that there are two major types of citation between papers: one refers to a known technique (not necessarily related to the topic), and the other refers to previous and related work. The first type of citation should not be included in the network. The users are researchers. The data are conference papers, journal papers, and books. The major challenge is how to identify the relation between two papers, which involves techniques from information extraction, information summarization, and text categorization.
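
The literature network described above can be pictured as a small annotated graph. The following is only a minimal sketch in Python, not a proposed implementation: the class names, the two citation labels, and the example papers are all hypothetical, and the citation type is assumed to be already known, although identifying it automatically is exactly the hard part of the problem.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two citation types described in the text.
KNOWN_TECHNIQUE = "known_technique"
RELATED_WORK = "related_work"


@dataclass(frozen=True)
class Citation:
    citing: str    # id of the paper that cites
    cited: str     # id of the paper being cited
    kind: str      # KNOWN_TECHNIQUE or RELATED_WORK (assumed to be given)
    relation: str  # annotation: why one paper cites the other


@dataclass
class LiteratureNetwork:
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # (citing, cited) -> relation

    def add_citation(self, c: Citation) -> None:
        # Citations of known techniques are left out of the network.
        if c.kind != RELATED_WORK:
            return
        self.nodes.update((c.citing, c.cited))
        self.edges[(c.citing, c.cited)] = c.relation


def build_network(citations):
    """Build a network from an iterable of already-labeled citations."""
    net = LiteratureNetwork()
    for c in citations:
        net.add_citation(c)
    return net


# Made-up example data, for illustration only.
example = [
    Citation("paperA", "paperB", RELATED_WORK,
             "extends the retrieval model of paperB"),
    Citation("paperA", "paperC", KNOWN_TECHNIQUE,
             "uses a standard optimization method"),
]
network = build_network(example)
```

In this sketch only the citation from paperA to paperB becomes an edge, because the citation to paperC is labeled as a known technique. In a real system the `kind` and `relation` fields would have to come from information extraction and text categorization over the citing text.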