Multi-Faceted Comparative Text Summarization
[ Team ]
[ Results ]
[ Publications ]
[ Funding ]
Text summarization generates a concise summary of documents to
help a user quickly digest information, and it has become increasingly important with the
explosive growth of information. Most existing summarization methods
can only generate an unstructured summary consisting of a simple list of sentences.
In many applications, however, we want to generate a (more structured)
multi-faceted comparative summary, in which sentences are grouped into
multiple facets and compared across different views. For example, a summary about laptop opinions
may group sentences into
facets such as "battery life" and "memory" and separate sentences
with positive and negative opinions in each facet.
This project aims to systematically study this new summarization problem (called multi-faceted
comparative summarization). We will develop general probabilistic approaches that can
be applied to multiple instances of the problem in different domains.
The basic idea is to use probabilistic mixture models to model and extract
the multiple facets and multiple views of each facet in a set of text documents to be summarized.
The extracted facets and views are then used to generate facet labels and select sentences for
different facet-view combinations.
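As a rough illustration of the mixture-model idea, the sketch below fits a PLSA-style model with EM on a toy term-count matrix and prints the top words of each facet as candidate facet labels. It is a minimal sketch under stated assumptions, not the project's actual models: the function name `plsa`, the toy vocabulary, and the counts are all hypothetical, and the sketch models facets only, without the multiple views of each facet.

```python
import random


def plsa(counts, n_facets, n_iter=50, seed=0):
    """Fit a PLSA-style mixture model with EM (illustrative only).

    counts: list of per-document word-count lists (n_docs x n_words).
    Returns (p_w_z, p_z_d): a word distribution per facet and
    facet proportions per document.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random, strictly positive initialisation.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_facets)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_facets)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Tiny starting mass keeps every row normalisable.
        word_acc = [[1e-12] * n_words for _ in range(n_facets)]
        doc_acc = [[1e-12] * n_facets for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior p(z | d, w) for this (document, word) pair.
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_facets)]
                total = sum(joint) or 1e-12
                for z in range(n_facets):
                    expected = counts[d][w] * joint[z] / total
                    word_acc[z][w] += expected
                    doc_acc[d][z] += expected
        # M-step: renormalise the expected counts.
        p_w_z = [normalize(row) for row in word_acc]
        p_z_d = [normalize(row) for row in doc_acc]
    return p_w_z, p_z_d


# Hypothetical toy data: two documents about battery, two about memory.
vocab = ["battery", "charge", "hours", "memory", "ram", "storage"]
counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
]
p_w_z, p_z_d = plsa(counts, n_facets=2)
for z, dist in enumerate(p_w_z):
    top = sorted(range(len(vocab)), key=lambda w: -dist[w])[:3]
    print("facet", z, [vocab[w] for w in top])
```

In a full system, sentences could then be assigned to the facet under which they are most probable; that step is left out here.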
2. Team members
3. Major Research Results
- Opinion integration as comparative summarization: Integrating and summarizing all the opinions expressed about an entity (e.g., a product or a person) is necessary to help people digest opinions. Opinions are generally expressed either
in a well-written review article (e.g., a product review) or in fragments scattered across many other articles (e.g., blog articles). We framed opinion integration as a comparative summarization problem and developed general probabilistic approaches to integrate scattered opinions with a well-written review article, showing promising results [Lu & Zhai WWW 08].
- Combining social network analysis with topic modeling: Social networks can be leveraged to
better infer topics in text. We have developed a regularized probabilistic latent semantic analysis model (NetPLSA) that can be applied to combine social network analysis with probabilistic topic modeling in a general way.
Experimental results show that social networks can indeed improve the quality of the extracted topic models
[Mei et al. WWW 08].
- Rated aspect summarization: Web 2.0 applications have led to dramatic growth in user-generated content. We studied how to summarize a set of short comments on an entity and defined a novel
rated aspect summarization problem, in which we generate a summary with multiple aspects (facets) together with a rating for each aspect. Experiments on data from eBay show promising results [Lu et al. WWW 09].
- Comparative summary of contradictory opinions: Opinions about a topic are often mixed, with both positive and negative comments. To help users digest such mixed or contradictory opinions, we defined a novel comparative summarization problem in which we aim to compare opinions about the same topic but with different sentiment polarities. We proposed an optimization framework and heuristic algorithms for solving this problem. Experimental results on product review data show that the proposed algorithms are effective
for generating a comparative summary of mixed opinions [Kim & Zhai CIKM 09].
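To make the contradictory-opinion setting more concrete, here is a minimal sketch of one possible greedy heuristic: pair each positive sentence with a negative sentence that shares the most content words, so the two sides of a pair discuss the same thing. This is an assumption-laden illustration, not the optimization framework or the algorithms from the cited paper; the function names, the Jaccard similarity, the stopword list, and the example sentences are all hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def contrastive_pairs(positive, negative, k, stopwords=frozenset()):
    """Greedily pick up to k (positive, negative) sentence pairs with the
    highest content overlap, using each sentence at most once."""
    def tokens(sentence):
        return {w for w in sentence.lower().split() if w not in stopwords}

    # Score every cross-polarity pair; ties are broken by input order.
    scored = sorted(
        ((jaccard(tokens(p), tokens(n)), i, j)
         for i, p in enumerate(positive)
         for j, n in enumerate(negative)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    used_pos, used_neg, pairs = set(), set(), []
    for score, i, j in scored:
        if len(pairs) == k:
            break
        # Skip pairs with no shared content or an already-used sentence.
        if score == 0.0 or i in used_pos or j in used_neg:
            continue
        used_pos.add(i)
        used_neg.add(j)
        pairs.append((positive[i], negative[j]))
    return pairs


# Hypothetical example sentences, already split by sentiment polarity.
positive = ["the battery life is great", "the screen is bright and sharp"]
negative = ["the screen is too dim", "the battery life is short"]
stop = frozenset({"the", "is", "and", "too"})

for pos, neg in contrastive_pairs(positive, negative, k=2, stopwords=stop):
    print(pos, "<->", neg)
```

The sketch assumes sentiment polarity is already known for each sentence; a real system would also need to balance how well the selected pairs represent the full set of opinions.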
4. Publications
- Qiaozhu Mei, Deng Cai, Duo Zhang, ChengXiang Zhai,
Topic Modeling with Network Regularization,
Proceedings of the World Wide Web Conference 2008 (WWW'08), pages 101-110. (12% acceptance)
- Yue Lu, ChengXiang Zhai,
Opinion Integration Through Semi-supervised Topic Modeling,
Proceedings of the World Wide Web Conference 2008 (WWW'08), pages 121-130. (12% acceptance)
- Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce R. Schatz, Mining multi-faceted overviews of arbitrary topics in a text collection,
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'08), pages 497-505, 2008.
(20% acceptance)
- Duo Zhang, ChengXiang Zhai, Jiawei Han, Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases,
Proceedings of 2009 SIAM International Conference on Data Mining (SDM'09), pages 1123-1134, 2009. (16% acceptance)
- Yue Lu, ChengXiang Zhai, Neel Sundaresan, Rated Aspect Summarization of Short Comments,
Proceedings of the World Wide Web Conference 2009 (WWW'09), pages 131-140.
(12% acceptance)
- Hyun Duk Kim, ChengXiang Zhai, Generating Comparative Summaries of Contradictory Opinions in Text,
Proceedings of the 18th ACM International Conference on Information and Knowledge Management (CIKM'09), to appear.
(full paper, 14.5% acceptance)
5. Funding Support
- National Science Foundation, grant IIS-0713571
- IBM UIMA Innovation Award
- Yahoo! Ph.D. Fellowship