There are two modes of information access, "pull" and "push", depending on whether the user initiates the process. In the pull mode, a user searches for information by using a search engine (e.g., Google) or browses information items through structures available in the information space (e.g., Yahoo directory). In the push mode, an information management system keeps track of a user's interests and recommends relevant incoming information items to that user. My specific interests in intelligent information access are centered on information retrieval (leading to better search engine technologies such as personalized search), information organization (creating structures to assist a user in browsing), and information filtering (i.e., information recommendation).
In text data mining, I am especially interested in comparative text mining, which is concerned with extracting common and unique themes from a set of comparable text collections. Depending on the sets being compared, comparative text mining potentially covers spatiotemporal text mining, cross-language text mining, novelty detection, and many other interesting text mining problems as special cases. It has many applications, such as opinion summarization, business intelligence, text federation, and customer relationship management.
I believe that natural language processing is crucial to all kinds of text management tasks, and I am especially interested in the development of algorithms that exploit language technologies, such as statistical language models (i.e., probabilistic models of text).
I have a broad interest in all kinds of applications of text information management, such as Web search, digital libraries, and email management.
I am becoming more and more interested in bioinformatics for several reasons. First, it is obviously an excellent application domain for text information management techniques. Second, the methods used in bioinformatics (e.g., Hidden Markov Models) tend to be similar to those used for processing text. Third, bioinformatics is growing fast and presents many interesting and challenging new computational problems. Finally, and most importantly, I like the fact that research in bioinformatics brings computer science closer to scientific discovery.
My current interests in bioinformatics include: (1) Biology literature analysis: the goal is to use natural language processing and text mining techniques to extract useful information from the literature that can benefit a biologist, either directly or indirectly through a combination of literature analysis and other biological data analysis. (2) Gene regulatory pattern analysis: the goal is to use machine learning and data mining techniques to find regulatory motifs and other TF binding site characteristics in the upstream subsequences of co-regulated genes, in order to understand gene function and regulation. (3) Massive protein motif analysis: the goal is to build a dictionary of "elementary" motifs with structural and functional information through statistical analysis of protein sequences and Gene Ontology annotations.