Special Session on New Database Technologies
Irwin King Department of Computer Science and Engineering, The Chinese University of Hong Kong
|
Irwin King is with the Chinese University of Hong Kong. He received his B.Sc. degree in Engineering and Applied Science from the California Institute of Technology, Pasadena, and his M.Sc. and Ph.D. degrees in Computer Science from the University of Southern California, Los Angeles. His research interests include machine learning, web intelligence and social computing, and multimedia processing. In these research areas, he has over 200 technical publications in journals and conferences. In addition, he has contributed over 20 book chapters and edited volumes and has received over 30 research and applied grants. He is an Associate Editor of the IEEE Transactions on Neural Networks (TNN) and IEEE Computational Intelligence Magazine (CIM). He is also on the editorial board of several journals and book projects. He is a member of the INNS Board of Governors and a Vice-President and Governing Board Member of APNNA. |
| Abstract: | |
| The Web has changed the landscape of how humans interact socially. With the advent of Web 2.0, Social Computing has emerged as a new and innovative paradigm that changes the way we communicate, interact, and learn. Social Computing involves the investigation of collective intelligence by using computational techniques such as machine learning, data mining, natural language processing, etc. on social behavioral data collected from blogs, wikis, emails, instant messages, clickthrough data, query logs, social bookmarks, tags, etc. In this talk, I will first introduce Social Computing by outlining some of the unique characteristics and aspects that are found on the various social platforms. Applications in each of the platforms will be presented to further demonstrate the use of these new technologies to enhance and enrich our lives. Lastly, I will conclude with some current challenges and potential future promises of Social Computing. | |
Bin He IBM Almaden Research
|
Bin He is a Research Scientist at IBM Almaden Research. He received his Ph.D. from the Department of Computer Science at the University of Illinois at Urbana-Champaign in 2006. He also received an M.S. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2002, and M.S. and B.S. degrees in Mathematics from Peking University, China, in 2000 and 1998, respectively. His research mainly focuses on large-scale databases, data warehousing, and data integration. He has authored more than 30 publications in top data management conferences and journals, filed more than 20 patents, and received internal and external awards including the IBM Outstanding Technical Achievement Award, IBM ASR Best Paper Runner Up, IBM Invention Achievement Awards, IBM Invention Plateau Awards, and the ComputerWorld Horizon Award. |
| Abstract: | |
| Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and evolving the data to the updated schema (data evolution). It is often desired or necessary when the data or the query workload changes, when the initial schema was not carefully designed, or when more is known about the database and a better schema can be derived. The Wikipedia database, for example, has had more than 170 versions in the past 5 years [1]. Unfortunately, although much research has been done on schema evolution, data evolution has long been a prohibitively expensive process, which essentially evolves the data by executing SQL queries and reconstructing indexes. This prevents databases from being changed flexibly and frequently as needed, and forces schema designers, who cannot afford mistakes, to be highly cautious. Techniques that enable efficient data evolution will undoubtedly make life much easier. In this paper, we study the efficiency of data evolution and discuss techniques for data evolution on column-oriented databases, which store each attribute, rather than each tuple, contiguously. We show that column-oriented databases have better potential than traditional row-oriented databases for supporting data evolution, and propose a novel data-level data evolution framework on column-oriented databases. Our approach, as suggested by experimental evaluations on real and synthetic data, is much more efficient than query-level data evolution on both row-oriented and column-oriented databases, which involves unnecessary access to irrelevant data, materialization of intermediate results, and index reconstruction. | |
Ji-Rong Wen (文继荣) Microsoft Research Asia
|
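Dr. Ji-Rong Wen is currently a Senior Researcher at Microsoft Research Asia and the head of its Internet Data Management Group. He received his bachelor's and master's degrees in engineering from Renmin University of China. In 1999, he received his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, and subsequently joined Microsoft Research Asia. His main research interests are Internet data management, information retrieval (especially web search), data mining, and machine learning. During his 10 years at Microsoft Research Asia, Dr. Wen has been granted more than 50 U.S. patents related to web search and is one of the main inventors of block-based web search, deep web search, and object-level search; some of these results have been used in important Microsoft products (for example, Bing). He has published more than one hundred papers in well-known international conferences and journals, including WWW, SIGIR, VLDB, ICDE, CIDR, ICML, SIGKDD, AAAI, ACM Multimedia, ACM TOIS, and IEEE TKDE. He is also active in the related academic communities and has served as a program committee member and chair for many international conferences and workshops. He served as co-chair of the "WWW in China" track at the 17th International World Wide Web Conference, held in Beijing. |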
| Abstract: | |
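| Over the past decade or so, the main goal of web search has been to organize and manage the data and information in the content web (pages, web graph, logs, etc.). In recent years, we have been working on search technologies that are fundamentally different from today's. One main idea is the shift from "managing the data on the web" to "managing the knowledge on the web" and "directly satisfying users' information needs"; associated with this is the evolution from the "Content Web" to the "Knowledge Web" and the "Query Web". I will report on some of the thinking and technical innovations of my research group and myself in this area. | |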
Wei Wang The University of New South Wales, Australia
|
Dr. Wei Wang is a Senior Lecturer in the School of Computer Science and Engineering, The University of New South Wales, Australia. His current research interests include integration of database and information retrieval technologies (DB+IR), data cleaning and integration, query processing and optimization, and spatial databases. He has published over fifty research papers in these areas in major international journals (TODS, VLDB J, TKDE) and conferences (SIGMOD, VLDB, ICDE, WWW). |
| Abstract: | |
| Keyword queries enable casual users to search XML documents easily without much knowledge of the document structure or the query syntax. However, the inherent ambiguity of a keyword query may produce a large number of results that can be classified into different types. For users, each result type implies a possible search intention. To improve the performance of keyword queries, it is desirable to efficiently identify the most relevant result type for a given keyword query. In this talk, we propose an estimation-based approach to compute the promising result types for a keyword query, which can help a user quickly narrow down to her specific information need. To speed up the computation, we designed efficient algorithms based on a set of new indexes built for the task. Finally, we present a set of experimental results that evaluate the proposed algorithms and show the potential of this work. |
|
Cuiping Li Renmin University of China
|
Dr. Cuiping Li is currently an Associate Professor in the Information School, Renmin University of China. She received her Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 2003, and her B.E. and M.E. from Xi'an Jiao Tong University in 1994 and 1997, respectively. Her current research interests include profit-based data mining, large-scale information network analysis, and data warehousing. She has published over 30 research papers in major international conferences and journals including SIGMOD, VLDB, and KDD. |
| Abstract: | |
| Recently there has been a lot of interest in graph-based analysis. One of the most important aspects of graph-based analysis is measuring the similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph-theoretical model. However, existing methods for SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied to static graphs. In this talk, we propose to exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPUs) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov chain, we propose to use iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied to dynamic graphs. Moreover, it can handle not only the link-updating problem but also the node-updating problem. | |
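
For readers unfamiliar with SimRank, the following is a minimal sketch of the standard naive fixed-point iteration, in which two distinct nodes are similar to the extent that their in-neighbors are similar. It is not the GPU-based or iterative-aggregation method described in the talk; it only illustrates the baseline computation whose cost those methods aim to reduce. The function name `simrank`, the `decay` and `iterations` parameters, and the toy graph are assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank (illustrative sketch, not the talk's method).

    in_neighbors maps every node to the list of nodes that link to it.
    Every node that appears as an in-neighbor must also be a key.
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, 0 to all others.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors has similarity 0 to other nodes.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by decay.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: one university page links to two professor pages.
graph = {"univ": [], "profA": ["univ"], "profB": ["univ"]}
scores = simrank(graph)
print(scores[("profA", "profB")])
```

Each iteration touches every node pair and every pair of their in-neighbors, which is why the cost grows quickly on large graphs and why parallel hardware is attractive for this computation.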
