Data Integration by Describing Sources with Constraint Databases
Xun Cheng,
Department of Computing Science,
University of California, Santa Barbara
Guozhu Dong,
Department of Computer Science and Engineering,
Wright State University,
gdong@cs.wright.edu
Tzekwan Lau,
Department of Computing Science,
University of California, Santa Barbara
Jianwen Su,
Department of Computing Science,
University of California, Santa Barbara,
su@cs.ucsb.edu
IEEE International Conference on Data Engineering (ICDE), Sydney, March 1999.
Abstract
We develop a data integration approach for the efficient
evaluation of queries over autonomous source databases. The
approach is based on novel applications and extensions
of constraint database techniques. We assume the existence
of a global database schema. The contents of each data
source are described using a set of constraint tuples over
the global schema; each such tuple indicates possible
contributions from the source. The "source description
catalog" (SDC) of a global relation consists of its
associated constraint tuples. This form of description is
advantageous because it makes it easy to add new sources and
to modify existing ones. In our framework, to evaluate a
conjunctive query over the global schema, a plan generator
first identifies relevant data sources by "evaluating" the
query against the SDCs using techniques of constraint query
evaluation; it then formulates an evaluation plan,
consisting of specialized queries over different paths.
The evaluation of a query associated with a path is done by
a sequence of partial evaluations at data sources along the
path, similar to sideways information passing in Datalog;
the partially evaluated queries travel along their
associated paths. Our SDC-based query planning is efficient
since it avoids the NP-complete query rewriting process.
Further optimization is possible using techniques such as
emptiness tests.
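The source-selection step described above (evaluating a query against an SDC, with an emptiness test to prune sources) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's algorithm: each constraint tuple is assumed to restrict attributes of a single global relation to closed numeric intervals, and the query is assumed to be a single-relation conjunctive selection of the same form. Under these assumptions, a source is relevant exactly when the conjunction of its constraint tuple and the query constraints is non-empty, which for intervals reduces to checking lo <= hi on every attribute. The names `ConstraintTuple`, `intersect`, `is_empty`, and `relevant_sources`, and the example sources S1 to S3, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumption: constraints are closed numeric intervals [lo, hi] per attribute.
Interval = Tuple[float, float]


@dataclass
class ConstraintTuple:
    """Hypothetical SDC entry: the named source may contribute tuples of the
    global relation satisfying every listed interval constraint.
    Attributes that are not listed are unconstrained."""
    source: str
    bounds: Dict[str, Interval] = field(default_factory=dict)


def intersect(a: Optional[Interval], b: Optional[Interval]) -> Optional[Interval]:
    """Conjoin two interval constraints; None stands for 'unconstrained'."""
    if a is None:
        return b
    if b is None:
        return a
    return (max(a[0], b[0]), min(a[1], b[1]))


def is_empty(bounds: Dict[str, Interval]) -> bool:
    """Emptiness test: a conjunction of intervals is unsatisfiable
    iff some attribute ends up with lo > hi."""
    return any(lo > hi for lo, hi in bounds.values())


def relevant_sources(
    sdc: List[ConstraintTuple], query: Dict[str, Interval]
) -> List[Tuple[str, Dict[str, Interval]]]:
    """'Evaluate' a selection query against an SDC.

    For each constraint tuple, conjoin it with the query constraints. If the
    result is non-empty, the source is kept together with the narrowed
    (specialized) selection that would be sent to it; otherwise it is pruned.
    """
    plan: List[Tuple[str, Dict[str, Interval]]] = []
    for ct in sdc:
        attrs = set(ct.bounds) | set(query)
        # Each attribute appears in at least one side, so the result is never None.
        combined = {a: intersect(ct.bounds.get(a), query.get(a)) for a in attrs}
        if not is_empty(combined):
            plan.append((ct.source, combined))
    return plan


if __name__ == "__main__":
    sdc = [
        ConstraintTuple("S1", {"year": (1990, 1995)}),
        ConstraintTuple("S2", {"year": (1996, 1999), "price": (0, 100)}),
        ConstraintTuple("S3", {"price": (50, 500)}),
    ]
    query = {"year": (1997, 1999), "price": (80, 300)}
    for source, selection in relevant_sources(sdc, query):
        print(source, selection)
```

With the example data, S1 is pruned because its year range cannot overlap the query's, while S2 and S3 are kept; S2 receives the narrowed price range (80, 100). No source is contacted during this step, which is the point of planning against the SDC rather than against the sources themselves.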