Academic Talk: Normalization of Duplicate Records from Multiple Sources

Date: 2018-10-08

Speaker: Weiyi Meng, Professor, Department of Computer Science, State University of New York at Binghamton, U.S.A.





Data consolidation is a challenging issue in data integration. The value of data grows sharply when it is linked and fused with data from numerous other (Web) sources. The promise of Big Data hinges on addressing several big data integration challenges, such as record linkage at scale, real-time data fusion, and integration of the Deep Web. Although much work has been done on these problems, little has addressed how to create a uniform, standard record from a group of records that correspond to the same real-world entity. Such a record, referred to as a normalized record, is important for both front-end and back-end applications. We refer to the task of producing it as record normalization. In this talk, I will introduce our recent work on formalizing the record normalization problem and present an in-depth analysis of normalization granularity levels (e.g., record, field, and field-value-component) and of normalization forms (e.g., typical versus complete). I will also introduce a comprehensive framework for computing the final normalized record. The proposed framework includes a large number of record normalization strategies.


Weiyi Meng is currently a professor and the chair of the Department of Computer Science at the State University of New York at Binghamton. He previously served as Associate Dean for Research and Graduate Studies of the Thomas J. Watson School of Engineering and Applied Science. He received his bachelor’s degree in mathematics from Sichuan University as a member of class 77, and his MS and Ph.D. in computer science from the University of Illinois at Chicago in 1988 and 1992, respectively. His research interests include metasearch engines, Web database integration systems, Internet-based information retrieval, information trustworthiness analysis, Web data quality, Web information extraction, sentiment analysis, and database management systems. He is the co-author of three books: “Deep Web Query Interface Understanding and Integration”, “Advanced Metasearch Engine Technology”, and “Principles of Database Query Processing for Advanced Applications”. He has over 150 research publications, has served as general chair and PC chair of several international conferences, and has served on the editorial boards of several journals.




