Tutorial Program

Tutorial 1 Monday, Mar 27 2000, afternoon (14:30 - 18:00)
Tuesday, Mar 28 2000, morning (9:00 - 12:30)
Content Based Image Retrieval Systems (CBIR):
Architecture, Query Processing and Indexing Issues

M.V. Ramakrishna (Monash University, Australia)

Abstract. Content Based Image Retrieval (CBIR) systems have received a lot of attention from the academic and commercial development communities in recent years. The aim of such systems is to enable users to pose queries such as ``retrieve images of sunset'' against a large image database. CBIR systems need to extract image features, index them using appropriate structures, and efficiently process user queries to provide the required answers.

There are a few commercial systems, such as QBIC from IBM, but new database tools need to be developed to meet the requirements of CBIR systems. In this tutorial, we discuss the issues of image data modeling, query processing techniques, and the necessary high dimensional index structures. We present the current state of the art in each area and directions for further research.


  1. Introduction, problems and requirements
  2. Existing CBIR systems, such as QBIC and Virage
  3. Image Features (such as color, texture and shape) used in CBIR systems
  4. Image data modeling, architecture of CBIR systems, and the four-level data model used in the CHITRA system
  5. Query processing: basic techniques, the nature of the problems encountered, advanced optimisation, quality versus processing cost trade-offs, and performance evaluation of query processing algorithms
  6. Indexing of features: high dimensional index structures, R-tree based methods, inherent limitations of these techniques, emerging techniques, and similarity measures
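The feature-extraction and similarity-search steps listed above can be sketched in miniature. This is a toy illustration only: the grayscale histogram feature, Euclidean distance, and linear-scan query below are our own simplifying assumptions, not the tutorial's material.

```python
import math

def histogram_feature(pixels, bins=8):
    """Map an image (here: a flat list of 0-255 gray values) to a
    normalized histogram, a simple fixed-length feature vector."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    n = len(pixels)
    return [h / n for h in hist]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_query(db, query_feat, k=2):
    """Linear scan over all feature vectors; real CBIR systems replace
    this with a high-dimensional index (e.g. an R-tree variant)."""
    return sorted(db, key=lambda item: euclidean(item[1], query_feat))[:k]

# Three synthetic "images": dark, bright, and mid-gray pixel lists.
images = {"dark": [10, 20, 30, 40] * 25,
          "bright": [200, 210, 220, 230] * 25,
          "mid": [120, 130, 125, 135] * 25}
db = [(name, histogram_feature(px)) for name, px in images.items()]

query = histogram_feature([15, 25, 35, 45] * 25)   # another dark image
print([name for name, _ in knn_query(db, query, k=1)])  # ['dark']
```

The linear scan makes the cost of naive query processing obvious, which is exactly what the index structures in item 6 aim to avoid.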

M.V. Ramakrishna received a Ph.D. in Computer Science from the University of Waterloo, Canada. He is currently a senior lecturer at Monash University, Australia. His research is concerned with image and multimedia databases and file structures, with current work focusing on the modeling, querying and indexing of image databases. A prototype system for content based image retrieval is being built, and the design of high dimensional indexing structures is part of this project.

Tutorial 2 Monday, Mar 27 2000, afternoon (14:30 - 18:00)
Tuesday, Mar 28 2000, morning (9:00 - 12:30)
Data and Web Warehousing: Design and Maintenance

Mukesh Mohania (Western Michigan U, USA), Sanjay Kumar Madria (Purdue U, USA)

Abstract (Part I - Data Warehousing). Data warehousing is a recent information technology that allows information to be accessed easily and efficiently for decision-making activities. On-Line Analytical Processing (OLAP) tools are well suited to complex data analysis, such as multidimensional analysis and decision support, and access data from a separate repository, called a data warehouse, that draws data from many operational, legacy, and possibly heterogeneous data sources. In this part of the tutorial, we review the current state of the art in data warehousing technology. In particular, we start with the architecture of a data warehouse system and the cleaning of data for warehousing, and discuss the main steps in designing and maintaining a data warehouse. We further discuss the multidimensional model, multidimensional query languages, indexing, and implementation schemes. We also discuss the OLAP architecture, query operations and the metadata repository. A number of technical issues for exploratory research are also discussed.
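The flavor of the multidimensional query operations mentioned above can be illustrated with a tiny roll-up over a fact table. This is a minimal sketch with made-up data; real OLAP engines evaluate such aggregations over indexed cubes rather than Python loops.

```python
from collections import defaultdict

# A tiny fact table: (product, region, year, sales) -- made-up data.
facts = [
    ("tv",    "east", 1999, 100),
    ("tv",    "west", 1999, 150),
    ("radio", "east", 1999, 80),
    ("tv",    "east", 2000, 120),
]

def roll_up(facts, dims):
    """Aggregate the sales measure over the chosen dimensions,
    i.e. compute one group-by of the data cube."""
    totals = defaultdict(int)
    for product, region, year, sales in facts:
        key = tuple({"product": product, "region": region,
                     "year": year}[d] for d in dims)
        totals[key] += sales
    return dict(totals)

# Roll up from (product, region, year) to product alone.
print(roll_up(facts, ["product"]))   # {('tv',): 370, ('radio',): 80}
```

Each choice of dimensions corresponds to one cuboid of the cube; OLAP "drill-down" is simply moving to a roll-up with more dimensions.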

Abstract (Part II - Web Warehousing). Information on the WWW is important not only to individual users, but also to business organizations. This information is placed independently by different organizations; thus, documents containing related information may appear at different web sites. To provide users with a powerful and friendly mechanism for querying, manipulating, reusing and analyzing information on the web, the critical problem is to find effective ways to build web data models, query languages, change detection systems, and knowledge discovery and web mining tools. In this part of the tutorial, we study current web data models and query languages for dealing with web data. The key objective is to design and implement a web warehouse that materializes and manages useful information from the web. In particular, we discuss building a web warehouse using a database approach to managing and manipulating web data.

This tutorial is designed for academics and researchers working in the area of data and web warehousing. It will also help computer and database professionals and business analysts, such as database and system administrators, designers, project and technical managers, and those involved in planning, designing, developing, implementing and administering a data warehouse. It is also suitable for students of computer and information science who are pursuing, or planning to pursue, a higher research degree. The tutorial will additionally address many research problems related to data and web warehousing technology.

Mukesh Mohania received his Ph.D. in Computer Science & Engineering from the Indian Institute of Technology, Mumbai, India in 1995. He worked as a Research Fellow in the Computer Science Department, University of Melbourne from July 1995 to March 1996. He is presently a senior lecturer at the School of Computer and Information Science, University of South Australia. He has published in the areas of distributed deductive databases, data warehousing, mobile databases, and data mining. He has been associated with many journals as a reviewer and editorial board member, has served as a PC member for many international conferences on database systems, and has been PC chair for several workshops and conferences on data warehousing and data mining. He has offered tutorials on data warehousing at two international conferences, which were well attended and appreciated by attendees.

Sanjay Madria received his Ph.D. in Computer Science from the Indian Institute of Technology, Delhi, India in 1995. He is currently a visiting Assistant Professor in the Department of Computer Science, Purdue University, West Lafayette, USA. In the past, he was with the Center for Advanced Information Systems, Nanyang Technological University, Singapore. He was the organizer and PC chair of the "Internet Data Management" workshop held in Florence, Italy in September 1999. He is guest editor of the WWW Journal for special issues on web data management, and Program Chair for the EC&WEB 2000 conference to be held in London, UK in September 2000. He has given tutorials on web warehousing at several international conferences (ADBIS'99 and SBBD'99), has been an invited panelist for the NSF, and was an invited keynote speaker at the Annual Computing Congress in October 1999.

Tutorial 3 Monday, Mar 27 2000, afternoon (14:30 - 18:00)
Tuesday, Mar 28 2000, morning (9:00 - 12:30)
Knowledge Discovery in Databases and Data Mining:
Techniques, Methodologies and Experiences

Fosca Giannotti (CNUCE-CNR, Italy), Dino Pedreschi (U Pisa, Italy)

Abstract. The tutorial covers the basics of the knowledge discovery process, and the main data mining tools and algorithms employed in this process. Emphasis is placed on the integration of KDD/DM and database query languages, as well as the methodological problems posed by data analysis and knowledge extraction. The KDD paradigm is exemplified with two real world experiences from market basket analysis and fraud detection, including demos with commercially available systems. Finally, the open issues related to the convergence of knowledge discovery, data mining and databases are discussed, and a logic-based approach to designing a knowledge discovery support environment (KDSE) is highlighted. The tutorial is moderately advanced, with emphasis on methodology and practical experiences, as well as open research issues. Prior knowledge of KDD and data mining is not needed, although basic knowledge of data warehousing and OLAP would be beneficial.
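As a small taste of the market basket analysis mentioned above, frequent itemsets can be mined from toy transaction data. This is a naive enumeration sketch with invented baskets; Apriori-style algorithms of the kind such tutorials cover prune the candidate space instead of enumerating it all.

```python
from itertools import combinations

# Toy transaction data for market basket analysis (made-up baskets).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def frequent_itemsets(baskets, min_support=2, max_size=2):
    """Naive frequent-itemset miner: count every candidate itemset
    up to max_size and keep those meeting the support threshold."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            support = sum(1 for b in baskets if set(cand) <= b)
            if support >= min_support:
                result[cand] = support
    return result

print(frequent_itemsets(baskets))
```

Frequent itemsets such as {bread, milk} are the raw material from which association rules (e.g. bread => milk) are then derived.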

Fosca Giannotti was born in 1958 in Italy. She graduated in Computer Science (Laurea in Scienze dell'Informazione, summa cum laude) in 1982 from the University of Pisa. From 1982 to 1985 she was a research assistant at Dip. Informatica, Univ. Pisa. From 1985 to 1989 she was a Senior Researcher at the R&D Lab of Sipe Optimization, Pisa, and at the R&D Lab of Systems and Management, Pisa (responsible for the Logiform project). In 1989/90 she was a visiting researcher at MCC, Austin, Texas, USA, involved in the LDL (Logic Database Language) project. She is currently a senior researcher at CNUCE, an institute of CNR (the Italian National Research Council) in Pisa. Her current research interests include knowledge discovery and data mining, spatio-temporal reasoning, and the design, implementation and formal semantics of database programming languages, especially logic database languages. She has been involved in several research projects at both national and international level, holding both management and research positions. She has taught classes on databases at universities in Italy and abroad, and in 1999 is teaching a course on data mining at the Faculty of Economics of the University of Pisa and a short course on data mining at the Faculty of Engineering of the University of Bologna. She has participated in the organization and scientific committees of various conferences in the area of Logic Programming and Databases.

Dino Pedreschi was born in 1958 in Italy, and holds a Ph.D. in Computer Science from the University of Pisa, obtained in 1987. He is currently an associate professor at the Dipartimento di Informatica of the University of Pisa, serving as the coordinator of the undergraduate studies in CS. He has been a visiting scientist and professor at the University of Texas at Austin (1989/90), at CWI Amsterdam (1993) and at UCLA (1995). He has a long-standing collaboration with K. R. Apt (CWI) on verification methods for logic programming, and with C. Zaniolo (UCLA) and V. S. Subrahmanian (Univ. of Maryland) on various topics of logic in databases. His current research interests are in logic in databases, and particularly in data analysis, in deductive databases, in the integration of data mining and database querying, in spatio-temporal reasoning, and in formal methods for deductive computing. He has taught classes on programming languages and databases in universities in Italy and abroad, and is collaborating with F. Giannotti in a course on data mining at the faculty of Economics at the University of Pisa. He participated in the scientific committee of various conferences in the area of Logic Programming and Databases, including the LID'96 Workshop on Logic in Databases, where he was program co-chair, and the LPNMR'99 Workshop on Logic Programming and Non Monotonic Reasoning, 1999.

Tutorial 4 Tuesday, Mar 28 2000, afternoon (14:30 - 18:00)
Using XML to Interoperate Distributed Data

Andrea Zisman (UCL, United Kingdom)

Abstract. The tutorial introduces the eXtensible Markup Language (XML) by presenting its main features, and focuses on its use for interoperating autonomous database systems and distributed data in general. The objective of the tutorial is to introduce and develop the attendees' skills in XML and its related technologies, and to present different ways of using XML to support interoperability of distributed data. We aim to present technical details and an overview of the related specifications: XML, XPointer, XLink, XSL, XML-QL, Namespaces, DOM, and XML-Data. We also intend to present the main issues and challenges associated with multidatabase systems and distributed data, and the use of XML to alleviate these problems. On completion of the tutorial attendees should be able to: understand the new World Wide Web technology; create XML documents and DTDs; build rules allowing the use of XSL stylesheets; construct links between XML documents using XLink and XPointer; and use XML to deal with distributed data. The tutorial is intended for software practitioners, managers, teachers, researchers, and students in Computer Science at all levels.
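As a small taste of the document-creation skills described above, Python's standard xml.etree module can build and re-parse a well-formed XML document. The element and attribute names below are invented for illustration; DTDs, XSL and XLink are beyond this sketch.

```python
import xml.etree.ElementTree as ET

# Build a small XML document describing distributed data sources
# (element and attribute names here are invented for illustration).
catalog = ET.Element("catalog")
for name, url in [("staff_db", "http://siteA/staff"),
                  ("sales_db", "http://siteB/sales")]:
    src = ET.SubElement(catalog, "source", name=name)
    src.text = url

xml_text = ET.tostring(catalog, encoding="unicode")
print(xml_text)

# Parse it back and query the tree (a DOM-like view of the data).
root = ET.fromstring(xml_text)
names = [s.get("name") for s in root.findall("source")]
print(names)   # ['staff_db', 'sales_db']
```

The round trip (serialize, then parse and navigate) is the basic mechanism by which XML lets heterogeneous sites exchange self-describing data.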

Andrea Zisman is currently a Post Doctoral Research Fellow in the Software Systems Engineering Group, Department of Computer Science, University College London. In January 2000, she will be joining City University as a lecturer. She obtained her PhD, on the interoperability of distributed databases, from the Department of Computing at Imperial College, London, UK. She also holds an MSc degree in Applied Mathematics to Computer Science from the University of Sao Paulo, Brazil, on B-trees. She has published extensively in the areas of distributed databases and B-trees. She is currently working on the problem of consistency management of distributed documents using XML and related technologies, and has presented tutorials on XML at Software Engineering conferences (e.g. RE'99 and ESEC/FSE'99).

Tutorial 5 Tuesday, Mar 28 2000, afternoon (14:30 - 18:00)
Geo-Spatially Referenced Digital Libraries

Michael Freeston (U Aberdeen, Scotland), Linda Hill (UC Santa Barbara, USA)

Tutorial 6 Tuesday, Mar 28 2000, afternoon (14:30 - 18:00)
Designing Spatio-Temporal Databases

Stefano Spaccapietra (EPFL, Switzerland), Christine Parent (U Lausanne, Switzerland),
Esteban Zimanyi (U Libre de Bruxelles, Belgium)

Abstract. Current spatio-temporal modeling approaches do not cope satisfactorily with designers' requirements. In this tutorial we first identify the goals of a spatio-temporal conceptual model: which properties it has to satisfy to be a good, powerful and effective conceptual model. Among these criteria, we highlight the orthogonality principle, which allows the three modeling dimensions fundamental to GIS applications (the structural, spatial, and temporal dimensions) to be dealt with autonomously. We show how obeying orthogonality allows building a model that achieves both simplicity (as concepts are independent of each other) and expressive power (as concepts may be freely associated). We describe all the spatio-temporal features of the proposed model, which has been implemented and can be translated into the operational models of existing products. The tutorial briefly describes the architecture we defined to provide users with a set of conceptual interfaces for defining and accessing spatio-temporal information systems. Finally, the tutorial reports on the results of an experiment that allowed us to assess the qualities of the model.

Stefano Spaccapietra is a full professor at the Computer Science Department, Swiss Federal Institute of Technology, Lausanne, Switzerland, where he chairs the database laboratory. He has held academic positions since 1969, when he started teaching database systems at the University of Paris VI, and moved to the University of Burgundy, Dijon, in 1983 to take up a professorship at its Institute of Technology. He obtained his PhD from the University of Paris VI in 1978. Since joining EPFL in 1988, he has developed R&D activities on visual user interfaces, semantic interoperability, spatio-temporal data modelling and multimedia databases. He currently chairs the steering committee of the IFIP Visual Database Systems conferences, chaired the steering committee of the Entity-Relationship conferences for 5 years, and chairs the Database Group of the Swiss Informatics Society.

Christine Parent is a full professor at the Computer Science Department, University of Burgundy at Dijon (France). She is currently on leave as an associate professor at the University of Lausanne. She obtained her PhD from the University of Paris VI in 1987. Since 1983 she has been deeply involved in the development of an extended entity-relationship model (ERC+) and of its associated data manipulation languages: an algebra, a calculus and an extended SQL. She is currently working in two main areas: a cooperative design methodology relying on the integration of existing heterogeneous databases, and the modeling of spatio-temporal databases. She has recently co-authored several papers on schema integration methodologies and spatio-temporal database modeling, and has given several tutorials on "Advanced Entity-Relationship Databases" and "Database Integration in Federated Databases".

Esteban Zimányi is a professor at the Computer Science Department of the Université Libre de Bruxelles, where he obtained a "Licence" (1988) and a doctorate (1992) in Computer Science. His research interests relate to advanced database models and cover the manipulation of imperfect information, as well as strategies for database design and methods for software development. During 1997, he was an invited researcher and lecturer at the Ecole Polytechnique Fédérale de Lausanne, where he worked on geographical and temporal databases.