Suggested Project Topics for CS598CXZ
You are encouraged to come up with other new ideas!!!
Theme 1: Better support for searching our department website
In our brainstorming classes, several of the challenges nominated relate to improving search engine capabilities, including focusing on a domain, supporting more structured/semantic queries, exploiting a domain ontology, and personalizing search, among others. It would have more real-world impact if you could explore
some of these directions with a goal like providing better support for searching our department website. Currently, search on the department website is powered by Google. The high-level challenge is simply to do better than Google on our local website.
The following are some concrete ideas:
- Server-side implicit feedback: By tracking the IP address, we may assume that the webpage requests and/or queries from the same IP address are from the same user and exploit the short-term history information (e.g., queries and requested documents) to improve search results (e.g., by reranking Google's results).
- Domain-specific search result summarization: Can we exploit the fact that all the pages are from a CS department to summarize the search results more effectively? The baseline is the ranked list presentation from Google. Better presentation of the results may involve clustering/classifying/annotating pages according to a simple CS-domain ontology. It would be very interesting to explore methods for generating artificial hyperlinks that would allow a user to navigate through the search results more efficiently. E.g., we may index segments/blocks of pages and add artificial links to the blocks so that a user can jump to a related block easily.
- Web site summarization: Can we automatically generate a semantic site map for our department website? The semantic site map can be as simple as a network of clusters of webpages. Can we summarize the department website to generate a set of "What's new in CS?" highlights?
- Web structure optimization: Can we mine the web access log to suggest a better organization of our website? For example, if we see a sequence of requests for pages A, B, C, D, E occurring frequently, we may consider adding a shortcut from A to E.
- People profile construction: Can we automatically construct a "profile page" for each person in the CS domain, containing pointers to all kinds of information about that person? The simplest solution is to search with the person's name as a query and then cluster the results. Such profile pages can then be used to support a special search function -- searching for information about a person.
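As a rough illustration of the server-side implicit feedback idea above, the following is a minimal sketch, not a tested method. It assumes we already have the engine's ranked results as (URL, snippet) pairs and a list of earlier queries or viewed-page titles from the same IP address. The function names, the scoring formula, and the `weight` parameter are all hypothetical choices made for illustration.

```python
def tokenize(text):
    # Lowercased alphanumeric tokens; deliberately crude.
    return [w for w in text.lower().split() if w.isalnum()]

def rerank(results, history, weight=1.0):
    """Rerank results using short-term history from the same IP address.

    results: list of (url, snippet) in the engine's original order.
    history: list of strings (earlier queries, titles of requested pages).
    weight:  how strongly history overlap may override the original order
             (an assumed tuning knob, not taken from any real system).
    """
    history_terms = set()
    for item in history:
        history_terms.update(tokenize(item))

    n = len(results)
    scored = []
    for rank, (url, snippet) in enumerate(results):
        base = (n - rank) / n  # 1.0 for the top result, decreasing with rank
        terms = set(tokenize(snippet))
        overlap = len(terms & history_terms) / len(terms) if terms else 0.0
        # -rank breaks ties in favor of the engine's original order
        scored.append((base + weight * overlap, -rank, url))

    scored.sort(reverse=True)
    return [url for _, _, url in scored]
```

With an empty history the original order is returned unchanged, so the baseline is always recoverable. A real project would need a better notion of "same user" than the IP address and a more principled way to combine the two scores.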
We have also discussed other ideas in the class.
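For the web structure optimization idea in the list above, one possible starting point is to count fixed-length click paths across sessions and propose a link from the first page to the last page of each frequent path. The sketch below assumes the access log has already been split into per-visitor sessions (lists of page identifiers in request order). The `length` and `min_support` thresholds are illustrative assumptions.

```python
from collections import Counter

def suggest_shortcuts(sessions, length=5, min_support=2):
    """Suggest (start, end, count) shortcuts from frequent click paths.

    sessions:    list of page lists, one per visitor session, in request order.
    length:      number of pages in a path (5 matches the A, B, C, D, E example).
    min_support: minimum number of sessions that must contain the path.
    """
    counts = Counter()
    for pages in sessions:
        # Count each distinct path at most once per session.
        paths = {
            tuple(pages[i:i + length])
            for i in range(len(pages) - length + 1)
        }
        counts.update(paths)

    return [
        (path[0], path[-1], n)
        for path, n in counts.most_common()
        if n >= min_support
    ]
```

This sketch does not check whether a direct link from the start page to the end page already exists, and it only considers contiguous paths of one fixed length. Both would need attention in a real project.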
Theme 2: Improving Productivity of the Technology Service Group (TSG)
A specific application domain for better email management techniques
is the Technology Service Group (TSG) in our department, i.e., the group of people who respond to
email requests sent to the userhelp account. There are several levels of support:
- "Eliminating" questions: Can we mine an email archive to extract frequently asked questions and their answers? If we can do this and post an automatically compiled FAQ on the web, people would not need to ask some commonly asked questions, since they can see the results immediately.
- Finding "similar questions": Can we search the email archive to find similar questions, and thus their answers? This could provide some very limited question answering.
- Thread-based email organization: Can we better organize email messages based on themes? E.g., cluster threads to generate a hierarchical structure. Can we index and search a thread, instead of searching
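As one simple baseline for the "similar questions" idea above, the sketch below ranks archived questions by TF-IDF cosine similarity to a new question. It uses only the Python standard library. The tokenizer, the smoothed IDF formula, and all function names are assumptions made for illustration, and a real archive would need to separate questions from replies first.

```python
import math
from collections import Counter

PUNCT = ".,?!:;()\"'"

def tokenize(text):
    # Lowercase, strip surrounding punctuation, drop empty tokens.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def build_index(questions):
    """Return (idf, vectors) for a list of archived question strings."""
    docs = [Counter(tokenize(q)) for q in questions]
    df = Counter()
    for d in docs:
        df.update(d.keys())
    n = len(docs)
    # Smoothed IDF so that no term gets zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return idf, vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query, questions, idf, vectors, top_k=3):
    """Return up to top_k (question, score) pairs with nonzero similarity."""
    q = {t: tf * idf[t]
         for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)),
                    reverse=True)
    return [(questions[i], s) for s, i in scored[:top_k] if s > 0]
```

The answer to a retrieved question could then be taken from the reply in the same thread. The challenge for a project is to do better than this bag-of-words baseline.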
Theme 3: Illinois Smart Email Assistant (ISEA)
A really high-impact research topic is to develop a better personal email manager.
Such an email manager should go beyond its regular functions as an email reader/composer
and support integration of email handling, web access, and perhaps
personal information collection. It should support advanced information management capabilities (thread-based management, summarization, task support, etc.). In the class, we have discussed many ideas about how to better manage personal email. This project may need a significant amount of development work,
but if successfully carried out, the potential impact may be huge.
Theme 4: Improving Research Productivity: Literature Assistant for Scientists
There are many ways we can build tools to help a researcher improve research productivity.
- Finding related papers: Given a topic or a paragraph in a paper, find, from the web, all related papers and then summarize them by extracting reference information. There are at least two challenges here:
(1) How to find related papers (not just related pages)? Google Scholar could help solve this problem, but its archive may not include all the papers on the web. (2) How can we automatically extract
the reference information?
- Glossary compilation: Can we mine the literature to compile a list of terms and their definitions?
When there are multiple definitions, we need to cluster them.
- Tutorial generation: Given a topic, use CiteSeer to collect all related articles and then generate
a tutorial on this topic. The tutorial can be as simple as some reorganization of search results.
- "What's hot in CS?" generation: Can we mine a relatively large subset of the literature to discover
"what's hot in CS"? A simple method is to cluster paragraphs from the literature, and then compare clusters at different times to reveal any emerging clusters.
- "Google dictionary": Mine the web for English usages as we discussed in the class.
- Entity summarization for biology literature: Can we generate a good summary of functional information about a gene or protein? The simplest summary can be a clustering view of search results using a gene name as a query. The challenge is to do better than this simple solution.
- Literature recommender: Most researchers nowadays put their publications on the Web. Their home pages
are often the best sources for getting their newest publications. A scientist is often aware of those active
researchers in the same field, so one can imagine that we can rely on a program to visit the home pages
of all the related researchers regularly and automatically detect any new publications in the field. Develop such
a "literature recommender". It should
take as input a list of researchers' homepage URLs and some keywords that describe the user's research interests. It will
then periodically crawl a certain number of pages linked from these homepages, identify any new publications, and
decide whether these new publications are relevant to the user's interest. Any interesting publications
will be sent to the user.
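The core filtering step of the literature recommender described above might look roughly like the sketch below. It assumes a separate crawler (not shown) has already extracted publication titles from each homepage. The function name, the data shapes, and the plain substring keyword matching are all illustrative assumptions.

```python
def new_relevant_publications(crawled, seen, keywords, min_hits=1):
    """Return (url, title) pairs that are both new and relevant.

    crawled:  dict mapping a homepage URL to the list of publication
              titles found there on the current crawl.
    seen:     set of (url, title) pairs from earlier crawls; updated in
              place so that each publication is reported at most once.
    keywords: phrases describing the user's research interests.
    min_hits: minimum number of keywords that must occur in the title.
    """
    keys = [k.lower() for k in keywords]
    alerts = []
    for url, titles in crawled.items():
        for title in titles:
            if (url, title) in seen:
                continue  # already known from an earlier crawl
            seen.add((url, title))
            hits = sum(1 for k in keys if k in title.lower())
            if hits >= min_hits:
                alerts.append((url, title))
    return alerts
```

Calling the function again with the same crawl results returns an empty list, since every title is then in `seen`. Substring matching on titles is a very weak relevance test. Deciding relevance from abstracts or full text is where the research questions lie.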