Tutorials
Daxin Jiang Microsoft Research Asia
Jian Pei Simon Fraser University
Hang Li Microsoft Research Asia
Daxin Jiang is a Researcher at Microsoft Research Asia. His research focuses on data mining and information retrieval. He received his Ph.D. in computer science from the State University of New York at Buffalo. He has published extensively in prestigious conferences and journals, and has served as a PC member of many conferences. He received the Best Application Paper Award of SIGKDD'08 and the Runner-up for Best Application Paper Award of SIGKDD'04.

Jian Pei is an Associate Professor and the Associate Director, Research, of the School of Computing Science, Simon Fraser University. His research focuses on data mining and analytic queries in databases. With prolific publications in refereed journals and conferences, he is the recipient of several prestigious awards. He is an associate editor of ACM Transactions on Knowledge Discovery from Data (TKDD) and IEEE Transactions on Knowledge and Data Engineering (TKDE). He has served regularly on the organization committees and the program committees of numerous international conferences and workshops. He is a senior member of both ACM and IEEE.

Hang Li is a Senior Researcher and Research Manager at Microsoft Research Asia. His research areas include natural language processing, information retrieval, statistical machine learning, and data mining. He graduated from Kyoto University and holds a PhD in computer science from the University of Tokyo. Hang has about 80 publications in international conferences and journals. He is an associate editor of ACM Transactions on Asian Language Information Processing and an area editor of the Journal of Computer Science and Technology, among others. His recent academic activities include serving as senior PC member of SIGIR 2010, WSDM 2010, and KDD 2010, area chair of ACL 2010, and PC member of WWW 2010.
Abstract:
Huge amounts of search log data have accumulated in various search engines. Currently, a commercial search engine receives billions of queries and collects terabytes of log data every day. In addition to search log data, browse logs can be collected by client-side browser plug-ins, which record browsing information when users grant permission. Such massive amounts of search and browse log data, on the one hand, provide great opportunities to mine the wisdom of crowds and improve search results as well as online advertising. On the other hand, designing effective and efficient methods to clean, model, and process large-scale log data also presents great challenges. In this tutorial, we focus on mining search and browse log data for search engines. We will start with an introduction to search and browse log data and an overview of frequently used data summarizations in log mining. We then elaborate on how log mining applications enhance the five major components of a search engine, namely, query understanding, document understanding, query-document matching, user understanding, and monitoring and feedback. For each aspect, we will survey the major tasks, fundamental principles, and state-of-the-art methods. Finally, we will discuss the challenges and future trends of log data mining.
Wei Wang The University of New South Wales, Australia
Dr. Wei Wang is a Senior Lecturer in the School of Computer Science and Engineering, The University of New South Wales, Australia. His current research interests include integration of database and information retrieval technologies (DB+IR), data cleaning and integration, query processing and optimization, and spatial databases. He has published over fifty research papers in these areas in major international journals (TODS, VLDB J, TKDE) and conferences (SIGMOD, VLDB, ICDE, WWW).
Abstract:
A similarity join between two sets of objects returns the pairs of objects whose similarity is above a given threshold. Similarity joins find applications in many areas, including near-duplicate object detection and data integration and cleansing. A key algorithmic challenge is how to perform the similarity join efficiently, as the naive algorithm that examines the similarity value of every possible pair of objects incurs prohibitively high cost. The objectives of this talk are to provide an introduction to and categorization of similarity joins in different application domains, discuss the powerful algorithmic ideas in existing work, and suggest directions for future research.
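To make the cost argument in this abstract concrete, here is a minimal sketch of the naive all-pairs join. It is not taken from the tutorial: it assumes objects are token sets and that similarity is Jaccard similarity, and the function names are hypothetical. A second function adds a simple size filter (a pair can only reach Jaccard threshold t if the smaller set has at least t times as many elements as the larger one) as one illustration of how candidate pairs can be skipped without computing their similarity.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_similarity_join(left, right, threshold):
    """Compare every pair: len(left) * len(right) similarity computations."""
    return [
        (i, j)
        for (i, a), (j, b) in product(enumerate(left), enumerate(right))
        if jaccard(a, b) >= threshold
    ]


def size_filtered_join(left, right, threshold):
    """Same result as the naive join, but skips pairs whose sizes rule them out.

    Jaccard(a, b) can never exceed min(|a|, |b|) / max(|a|, |b|), so a pair
    with min < threshold * max cannot qualify and needs no set operations.
    """
    result = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            small, large = sorted((len(a), len(b)))
            if small < threshold * large:
                continue  # pruned by the size filter
            if jaccard(a, b) >= threshold:
                result.append((i, j))
    return result


if __name__ == "__main__":
    # Hypothetical toy records, tokenized into sets of words.
    left_records = [{"data", "mining", "search", "log"}, {"similarity", "join"}]
    right_records = [
        {"data", "mining", "search", "logs"},
        {"similarity", "join", "algorithms"},
        {"social", "computing"},
    ]
    print(naive_similarity_join(left_records, right_records, 0.5))
    print(size_filtered_join(left_records, right_records, 0.5))
```

Both functions return index pairs `(i, j)` into the two input lists. The size filter only reduces the number of similarity computations; the number of pairs considered is still quadratic, which is why more aggressive techniques are needed at scale.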
Irwin King Department of Computer Science and Engineering, The Chinese University of Hong Kong
Irwin King is with the Chinese University of Hong Kong. He received his B.Sc. degree in Engineering and Applied Science from the California Institute of Technology, Pasadena, and his M.Sc. and Ph.D. degrees in Computer Science from the University of Southern California, Los Angeles. His research interests include machine learning, web intelligence and social computing, and multimedia processing. In these research areas, he has over 200 technical publications in journals and conferences. In addition, he has contributed over 20 book chapters and edited volumes and has received over 30 research and applied grants. He is an Associate Editor of the IEEE Transactions on Neural Networks (TNN) and IEEE Computational Intelligence Magazine (CIM). He is also on the editorial board of several journals and book projects. He is a member of the Board of Governors of INNS and a Vice-President and Governing Board Member of APNNA.
Abstract:
With the advent of Web 2.0, Social Computing has recently emerged as a hot research topic. Social Computing involves collecting, extracting, accessing, processing, computing, and visualizing social signals and information. More specifically, this tutorial places special emphasis on machine learning, data mining, information retrieval, and other computational techniques involved in collective intelligence processing of social behavior data collected from blogs, wikis, click-through data, query logs, tags, etc., and from areas such as social networks, social search, social media, social bookmarks, social news, social knowledge sharing, and social games. In this tutorial, I plan to give an introduction to Social Computing and elaborate on how its various characteristics and aspects are involved in social platforms for collective intelligence. The topics include social network theory and modeling, graph mining, query log processing, learning to rank, recommender systems, human computation, etc. The tutorial is prepared for machine learning, web mining, and information retrieval researchers who are interested in computational approaches to social computing.


