William Lee
--------------------------------------------------------------

Web
===
Challenge: To find in-depth knowledge about a particular entity on the web.
    Example: I type in "Microsoft", and I want to find the earnings,
    revenue, and locations of the Microsoft corporation.
Users: Researchers, stock analysts, people who work in human resources,
    and anyone who wants to research a person, a company, or a
    particular topic.
Data: The complete web
Method: The semantic web may be the solution to this problem. However, it
    may still be useful to disambiguate among entities with common names,
    cluster the pages for each entity, and then summarize each cluster.

Email
=====
Challenge: To extract the major discussion topics, a list of FAQs, and the
    most active and knowledgeable contributors in a newsgroup or mailing
    list.
Users: People who use a technical newsgroup, a company that wants to gather
    a list of the most commonly occurring problems in a product, and
    technical support departments that want to construct knowledge bases
    for better customer support.
Data: Emails from the newsgroup or mailing list
Method: Clustering the messages and summarizing each cluster is the obvious
    approach, but newsgroups and mailing lists pose more specific
    challenges and offer interesting ways to exploit IR techniques.

Literature
==========
Challenge: To find and discover how a topic has evolved through time.
Users: Researchers in different fields, managers who want to streamline the
    company's processes by looking for inefficiencies, etc.
Data: Scientific literature, company documents
Method: In the simplest sense, it may be interesting to find the function
    parent_of(A,B), where A and B are documents and much of the content of
    B comes from (or is influenced by) A. With this function and a
    timestamp for each document, it should be possible to create a
    timeline that shows the lineage of a concept.
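
A minimal sketch of the parent_of(A,B) idea from the Literature section, under stated assumptions. The notes do not say how "much of the content of B comes from A" should be measured, so this sketch assumes word-shingle containment (the fraction of B's three-word windows that also appear in A) with an arbitrary threshold of 0.5. The Document class, the shingles, containment, and lineage helpers, and the sample documents are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    doc_id: str
    timestamp: date
    text: str


def shingles(text, k=3):
    """Return the set of k-word windows in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(a, b, k=3):
    """Fraction of B's shingles that also occur in A (0.0 if B has none)."""
    sb = shingles(b.text, k)
    if not sb:
        return 0.0
    return len(sb & shingles(a.text, k)) / len(sb)


def parent_of(a, b, threshold=0.5, k=3):
    """Assumed definition: A is older than B and covers enough of B's content."""
    return a.timestamp < b.timestamp and containment(a, b, k) >= threshold


def lineage(docs, target, threshold=0.5):
    """Follow the best-matching parent back in time; return the chain oldest first."""
    chain = [target]
    current = target
    while True:
        candidates = [d for d in docs if parent_of(d, current, threshold)]
        if not candidates:
            break
        # Timestamps strictly decrease at each step, so the loop terminates.
        current = max(candidates, key=lambda d: containment(d, current))
        chain.append(current)
    return list(reversed(chain))


# Made-up sample documents, for illustration only.
base = "latent semantic indexing maps documents into a low dimensional space"
d1 = Document("d1", date(2001, 1, 1), base)
d2 = Document("d2", date(2003, 1, 1), base + " using singular value decomposition")
d3 = Document("d3", date(2005, 1, 1),
              base + " using singular value decomposition for retrieval")
d4 = Document("d4", date(2002, 1, 1),
              "support vector machines classify text with large margins")
docs = [d1, d2, d3, d4]

for d in lineage(docs, d3):
    print(d.timestamp.year, d.doc_id)
```

Containment is used rather than a symmetric similarity because the relation is directional: B may add a great deal of new material and still derive most of its reused text from A. Influence without textual overlap (paraphrase, citation) would need a different signal, which this sketch does not attempt.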