Han Liu
------------------------------------------------------------

TOPIC 3: Bioinformatics (my preferred topic)

Topic: Developing Statistical Models for Peptide Tandem Mass Spectrometry Data Analysis

Description: Molecular biology has been revolutionized by the advent of high-throughput experimental methods that can investigate thousands of genes or proteins in parallel. Following the great success of microarray analysis techniques in genomics, mass spectrometry based proteomics has become the next active area in the literature. However, unlike the reliable microarray-based analysis methods for genes, interpreting high-throughput peptide tandem mass spectrometry data is still an open problem. The large volume of data generated by these experiments is noisy, and the underlying biochemical principles are not fully understood. How to extract useful information and knowledge from these data remains an open question.

In this project, our long-term research goal is twofold:
1. Decide which proteins are present in the tissue samples.
2. Decide the quantitative ratios of the different proteins.

Many subgoals can be derived from these two directions, such as how to design efficient and effective scoring functions for sequence database searching, how to design probabilistic models to simulate interactions between different proteins, and how to derive useful features from raw peptide sequences and spectrum data.

The rough outline for this project is:
1. Conduct a literature survey and write a review report summarizing what other researchers are currently doing.
2. Identify one or two promising topics from the literature survey.
3. Conduct the research work and obtain some initial results.
4. Finish the course project paper.
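One of the subgoals named in the description is a scoring function for sequence database searching. As a minimal sketch of what such a function might look like (not necessarily the approach this project would adopt), the Python snippet below scores a candidate peptide by counting how many of its theoretical singly charged b- and y-ion fragments fall near an observed peak. The function names, the 0.5 Da tolerance, the reduced residue table, and the use of Python rather than MATLAB or C are all assumptions made for illustration. The residue masses are commonly tabulated monoisotopic values and should be checked against a reference table before real use.

```python
# Monoisotopic residue masses in Da for a small subset of amino acids.
# Illustrative subset only; a real search needs the full table.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.00728   # mass of a proton (Da)
WATER = 18.01056   # mass of H2O (Da)


def fragment_mz(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide string."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b-ion: N-terminal prefix plus one proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y-ion: C-terminal suffix plus water plus one proton.
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions + y_ions


def shared_peak_count(peptide, observed_mz, tol=0.5):
    """Count theoretical fragments that match some observed peak within tol."""
    return sum(
        any(abs(theo - obs) <= tol for obs in observed_mz)
        for theo in fragment_mz(peptide)
    )


def best_match(candidates, observed_mz, tol=0.5):
    """Pick the candidate peptide with the highest shared peak count."""
    return max(candidates, key=lambda p: shared_peak_count(p, observed_mz, tol))
```

A shared peak count ignores peak intensities and noise, which is exactly the weakness a statistical scoring model would aim to address. It is shown here only as a baseline against which such models could be compared.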
Group: This could be a one-person project, since it requires both biological background on peptide tandem mass spectrometry experiments and machine learning knowledge. However, if other students are really interested, it may be expanded to a two-member team.

Programming Language: MATLAB or C. MATLAB is preferred since it makes the results easy to visualize.

Knowledge Needed: Some machine learning background and biological background on tandem mass spectrometry experiments.


TOPIC 1: WEB

Topic: A More Intelligent Search Engine

Description: Current search engines such as Google, though powerful, are not "smart" enough: they can only conduct exact searches by keyword matching. This works only under the assumption that the user can specify the "best" keywords. If the user has only a vague idea of what to look for, Google may not be good enough. A functional component for "vague searching" could therefore be added.

An example: when a user wants to buy a cheap computer, the user may enter a batch of keywords: "computer", "PC", "cheap", "personal computer". If we use Google directly, it may return results that contain "Computer + PC + cheap + personal computer". However, the desired result may be the sales information page for DELL computers.

The desired techniques should be natural language processing and the semantic web.


TOPIC 2: EMAIL

Topic: Email Linkage Analysis

Description: As the capacity of email systems grows, more and more emails can be stored. A Ph.D. student may receive more than 2,000 emails per year, while a professor may receive more than 10,000 emails per year. These emails need to be better organized for efficient information retrieval. Content-based linkage analysis of historical emails may be an interesting topic to investigate. With the corresponding links, when the user receives a new email, the related content in other emails can be found easily.
The desired techniques should include natural language processing and probabilistic graphical models.
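
As a rough illustration of content-based linkage, the Python sketch below ranks archived emails by TF-IDF cosine similarity to a newly received one. This is a simple bag-of-words baseline swapped in for illustration, not the probabilistic graphical model the topic ultimately has in mind. The function names, the tokenizer, and the smoothed IDF formula are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_emails(new_email, archive, top_k=3):
    """Return (archive index, similarity) pairs for the most related emails."""
    vectors = tfidf_vectors(archive + [new_email])
    new_vec = vectors[-1]
    scored = [(i, cosine(new_vec, v)) for i, v in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]
```

Calling `related_emails(new_email, archive)` on a list of stored message bodies returns the indices of the closest matches, which could then serve as the links shown to the user. A fuller system would presumably also use sender, recipient, and thread information rather than body text alone.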