-----------------------------------------------------------
README for the source code published by Nikolaus Augsten on
http://www.inf.unibz.it/~augsten/src.
-----------------------------------------------------------

=========
CONTENTS:
=========

This ZIP contains three Java projects:

- approxlib: Library that contains tree distance algorithms,
  approximate join algorithms, matching algorithms, etc.
  The other two projects depend on this library.

- test: Test classes for "approxlib". Uses JUnit 4
  (http://www.junit.org). Depends on "approxlib".

- experiments: Programs used in the experiments of the
  publications [ABG-VLDB-05, ABG-VLDB-06, ABDG-ICDE-08] and
  some tools that we used to load, clean, and explore data.
  Depends on "approxlib".

The source code of each project is contained in its "src" directory.

======
SETUP:
======

We compiled the code with Java 1.5.0 by Sun (*) and we use the
relational database MySQL 4.1.15 (http://dev.mysql.com). We access
MySQL with the JDBC driver v3.0.11 (included in the ZIP).

(*) The SAX parser in Java 1.6 has a bug: it loads the whole XML
document into main memory before parsing it. This makes Java 1.6
unusable for our experiments, where the XML files are too large for
main memory.

===========
REFERENCES:
===========

[ABDG-ICDE-08] N. Augsten, M. Böhlen, C. Dyreson, and J. Gamper.
Approximate Joins for Data Centric XML.
In Proceedings of the International Conference on Data Engineering
(ICDE-08), Cancún, Mexico, April 2008. IEEE Computer Society.

[ABG-VLDB-06] N. Augsten, M. Böhlen, and J. Gamper.
An incrementally maintainable index for approximate lookups in
hierarchical data.
In Proceedings of the 32nd International Conference on Very Large
Data Bases (VLDB-06), pages 247-258, Seoul, Korea, Sep. 2006.
Morgan Kaufmann Publishers, Inc.

[ABG-VLDB-05] N. Augsten, M. Böhlen, and J. Gamper.
Approximate matching of hierarchical data using pq-grams.
In Proceedings of the 31st International Conference on Very Large
Data Bases (VLDB-05), pages 141-152, Trondheim, Norway,
Aug.-Sep. 2005. Morgan Kaufmann Publishers, Inc.
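
========
EXAMPLE:
========

The SETUP note depends on SAX being a streaming API: the parser
reports events while it reads the input, so the document never has
to fit into main memory. The following is a minimal sketch of that
pattern using only the standard javax.xml.parsers and org.xml.sax
classes. It is not taken from "approxlib"; the class name
StreamingElementCounter, the handler CountingHandler, and the method
countElements are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only; not part of the published source code.
public class StreamingElementCounter {

    // Hypothetical handler: counts start tags and keeps no tree in memory.
    static class CountingHandler extends DefaultHandler {
        long elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements++; // called once per opening tag as the input is read
        }
    }

    // Parses the stream with SAX and returns the number of elements seen.
    public static long countElements(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // Small in-memory document for demonstration; a real run would
        // pass a FileInputStream over a large XML file instead.
        byte[] xml = "<a><b/><b><c/></b></a>".getBytes("UTF-8");
        System.out.println(countElements(new ByteArrayInputStream(xml)));
    }
}
```

With a correctly streaming parser, the memory needed by such a
handler depends on what the handler stores, not on the size of the
XML file.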