LegoDB Project

The LegoDB XML-to-Relational Mapping Engine

LegoDB is a cost-based XML storage mapping engine that explores a space of possible XML-to-relational mappings and selects the best mapping for a given application. LegoDB leverages current XML and relational technologies: 1) it models the target application with an XML Schema, XML data statistics, and an XQuery workload; 2) it generates the space of configurations through XML Schema rewritings; and 3) it selects the best of the derived configurations using cost estimates obtained from a standard relational optimizer. Experiments show that the LegoDB mapping engine is very effective in practice and can reduce query running times by more than 50% compared to previous mapping techniques.
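The three steps above can be illustrated with a small sketch. It is not the LegoDB implementation: the element names, widths, workload, and cost formula are all made up, the toy cost function merely stands in for a relational optimizer, and the greedy search strategy is an assumption chosen for brevity. A configuration is modeled as the set of elements stored in their own table, and a "rewriting" toggles one element between inlined and outlined.

```python
from typing import Callable, FrozenSet, Iterator, Sequence, Tuple

# Hypothetical model: a configuration is the set of elements stored in their
# own table ("outlined"); all other elements are inlined into the parent table.
Config = FrozenSet[str]

ELEMENTS = ["title", "review", "aka"]           # made-up element names
WIDTH = {"title": 1, "review": 8, "aka": 3}     # made-up column widths
JOIN_COST = 4                                   # made-up cost of one join
# Made-up workload: (query weight, elements the query reads).
WORKLOAD = [(10, {"title"}), (1, {"title", "review"})]


def toy_cost(config: Config) -> int:
    """Stand-in for a relational optimizer's cost estimate (an assumption)."""
    total = 0
    for weight, reads in WORKLOAD:
        for element in ELEMENTS:
            if element in config:
                # Outlined: pay a join only when the query reads the element.
                if element in reads:
                    total += weight * JOIN_COST
            else:
                # Inlined: every query scans the wider parent row.
                total += weight * WIDTH[element]
    return total


def neighbors(config: Config, elements: Sequence[str]) -> Iterator[Config]:
    """Configurations one rewriting away: toggle one element's placement."""
    for element in elements:
        yield config ^ {element}


def greedy_search(
    start: Config, elements: Sequence[str], cost: Callable[[Config], int]
) -> Tuple[Config, int]:
    """Repeatedly move to the cheapest neighbor until none is cheaper."""
    best, best_cost = start, cost(start)
    while True:
        candidate = min(neighbors(best, elements), key=cost, default=None)
        if candidate is None or cost(candidate) >= best_cost:
            return best, best_cost
        best, best_cost = candidate, cost(candidate)


# Start from the fully inlined configuration and search.
best, best_cost = greedy_search(frozenset(), ELEMENTS, toy_cost)
print(sorted(best), best_cost)
```

With these made-up numbers, the search outlines review and aka and keeps title inlined, because the frequent query reads only title and would otherwise scan wide rows on every execution.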

StatiX: Making XML Count

The availability of summary data for XML documents has many applications, from providing users with quick feedback about their queries, to cost-based storage design and query optimization. StatiX is a novel XML Schema-aware statistics framework that exploits the structure derived by regular expressions (which define elements in an XML Schema) to pinpoint places in the schema that are likely sources of structural skew, and builds concise, yet accurate, statistical summaries for XML data. StatiX leverages standard XML technology for gathering statistics, notably XML Schema validators, and it uses histograms to summarize both the structure and values in an XML document.
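As a rough illustration of a structural histogram, the sketch below counts how many children of each tag every parent element has and tabulates the distribution of those counts. It is a simplification under stated assumptions, not the StatiX algorithm: it works on tag names rather than schema types, it does not use an XML Schema validator, and the sample document and the function name fanout_histograms are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict
from typing import Dict, Tuple

# Hypothetical sample document with skewed fan-out for <review>.
SAMPLE = (
    "<imdb>"
    "<show><title>A</title><review/><review/><review/></show>"
    "<show><title>B</title></show>"
    "<show><title>C</title><review/></show>"
    "</imdb>"
)


def fanout_histograms(xml_text: str) -> Dict[Tuple[str, str], Counter]:
    """Map (parent tag, child tag) to a histogram {fan-out: number of parents}.

    Parents with zero children of a given tag are not recorded, since
    without a schema we cannot tell which children were allowed.
    """
    root = ET.fromstring(xml_text)
    histograms: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for parent in root.iter():
        # Count direct children of this parent instance, grouped by tag.
        per_tag = Counter(child.tag for child in parent)
        for tag, count in per_tag.items():
            histograms[(parent.tag, tag)][count] += 1
    return histograms


hist = fanout_histograms(SAMPLE)
for edge, buckets in sorted(hist.items()):
    print(edge, dict(buckets))
```

In the sample, every show has exactly one title, while the number of review children varies from parent to parent; that kind of uneven fan-out is the structural skew a statistics framework needs to capture.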

External Publications and Presentations

LegoDB: Customizing Relational Storage for XML Documents
(by Philip Bohannon, Juliana Freire, Jayant Haritsa, Maya Ramanath, Prasan Roy and Jerome Simeon)
Demo to appear in VLDB 2002.

StatiX: Making XML Count

(by Juliana Freire, Jayant Haritsa, Maya Ramanath, Prasan Roy and Jerome Simeon)
Proceedings of SIGMOD 2002.

From XML Schema to Relations: A Cost-Based Approach to XML Storage

(by Philip Bohannon, Juliana Freire, Prasan Roy and Jerome Simeon)
Proceedings of ICDE 2002.

Presentation at the Dagstuhl Workshop on Foundations of Semi-Structured Data, September 2001
(by Juliana Freire)

People


Created by juliana@research.bell-labs.com
Last modified: Fri Jun 28 13:50:47 EDT 2002