S. Abiteboul, P. Buneman, and D. Suciu, Data on the Web: From Relations to Semistructured Data and XML, 2000.

AGAVE - Architecture for Ge Visualization and Exchange

D. Ballou, S. Madnick, and R. Wang, Assuring Information Quality, Journal of Management Information Systems, vol.20, issue.3, pp.3-9, 2004.

BSML: Bioinformatic Sequence Markup Language, Available at bsml.org/

P. Buneman, Semistructured Data, Proc. PODS '97, 1997.

P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu, A query language and optimization techniques for unstructured data, Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.505-516, 1996.

D. Calvanese, G. De Giacomo, and M. Lenzerini, Modeling and Querying Semi-Structured Data, Networking and Information Systems Journal, vol.2, issue.2, pp.253-273, 1999.

DDBJ: DNA Data Bank of Japan.

EMBL Nucleotide Sequence Database.

GenBank, Available at gov/Genbank/index.html

J. Hammer and C. Pluempitiwiriyawej, Element matching across XML sources using a multi-strategy clustering technique, Data and Knowledge Engineering (DKE), vol.48, p.297, 2004.

Y. W. Lee and D. M. Strong, Knowing-Why About Data Processes and Data Quality, Journal of Management Information Systems, vol.20, issue.3, pp.13-39, 2003.

Y. W. Lee, D. M. Strong, B. K. Kahn, and R. Y. Wang, AIMQ: A methodology for information quality assessment, Information & Management, pp.133-146, 2002.

J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom, Lore: A database management system for semistructured data, SIGMOD Record, vol.26, issue.3, 1997.

M. Mecella, M. Scannapieco, A. Virgillito, R. Baldoni, T. Catarci, and C. Batini, Managing Data Quality in Cooperative Information Systems, LNCS 2800, 2003.
DOI : 10.1007/3-540-36124-3_28

G. Mihaila, L. Raschid, and M. E. Vidal, Querying "quality of data" metadata, Proc. of the Third IEEE Meta-Data Conference, pp.526-531, 1999.

P. Missier and C. Batini, A Multidimensional Model for Information Quality in Cooperative Information Systems, Proceedings of the Eighth International Conference on Information Quality, pp.25-40, 2003.

H. Müller, F. Naumann, and J. C. Freytag, Data Quality in Genome Databases, Proceedings of Conference on Information Quality, pp.269-284, 2003.

F. Naumann, J. C. Freytag, and U. Leser, Completeness of integrated information sources, Information Systems, vol.29, issue.7, pp.29-36, 2004.
DOI : 10.1016/j.is.2003.12.005
URL : https://edoc.hu-berlin.de/bitstream/18452/9840/1/20FEAMVhCTWek.pdf

NCBI Reference Sequences (RefSeq), Available at http://www.ncbi.nlm.nih.gov/RefSeq/

K. Orr, Data quality and systems theory, Communications of the ACM, vol.41, issue.2, pp.66-71, 1998.
DOI : 10.1145/269012.269023

L. L. Pipino, Y. W. Lee, and R. Y. Wang, Data Quality Assessment, Communications of the ACM, pp.211-218, 2002.

M. Scannapieco, A. Virgillito, M. Marchetti, M. Mecella, and R. Baldoni, The DaQuinCIS Architecture: a Platform for Exchanging and Improving Data Quality in Cooperative Information Systems.

R. Y. Wang, M. P. Reddy, and H. B. Kon, Toward quality data: An attribute-based approach, Decision Support Systems, vol.13, issue.3-4, pp.349-372, 1995.
DOI : 10.1016/0167-9236(93)E0050-N

D. J. Abadi and D. Carney,

G. Alonso, F. Casati, H. Kuno, and V. Machiraju, Web Services: Concepts, Architectures and Applications, The VLDB Journal, pp.120-139, 2003.
DOI : 10.1007/978-3-662-10876-5

S. Babu and J. Widom,

Apache Software Foundation, Axis, Available at http://ws.apache.org/axis

S. Ceri and J. Widom, Deriving Production Rules for Incremental View Maintenance, Proc. VLDB, pp.109-120, 1991.

New Developments in Oracle Data Warehousing.

Proc. 5th Intl. Workshop on the Design and Management of Data Warehouses.

Y. Cui, J. Widom, H. Florescu, D. Shasha, D. Simon et al., Lineage tracing for general data warehouse transformations, Proc. ACM SIGMOD, pp.41-58, 2000.
DOI : 10.1007/s00778-002-0083-8

D. Gross and C. Harris, Fundamentals of Queueing Theory, 1998.

H. Gupta and I. S. Mumick, Incremental maintenance of aggregate and outerjoin expressions, Information Systems, vol.31, issue.6, 2004.
DOI : 10.1016/j.is.2004.11.011

A. Gupta and I. S. Mumick, Maintenance of Materialized Views: Problems, Techniques, and Applications, pp.3-18, 1995.

Q. Jiang and S. Chakravarthy, Queueing analysis of relational operators for continuous data streams, Proceedings of the twelfth international conference on Information and knowledge management, CIKM '03, pp.271-278, 2003.
DOI : 10.1145/956863.956916

W. Labio, J. Yang, Y. Cui, H. Garcia-Molina, and J. Widom, Performance Issues in Incremental Warehouse Maintenance, Proc. VLDB, pp.461-472, 2000.

W. Labio and H. Garcia-Molina, Efficient Snapshot Differential Algorithms for Data Warehousing, Proc. VLDB, pp.63-74, 1996.

D. Lomet and J. Gehrke, Special Issue on Data Stream Processing, Data Engineering Bulletin, vol.26, issue.1, 2003.

W. Labio, J. L. Wiener, H. Garcia-Molina, and V. Gorelik, Efficient Resumption of Interrupted Warehouse Loads, Proc. of ACM SIGMOD, pp.46-57, 2000.

On-Time Data Warehousing with Oracle10g: Information at the Speed of your Business, An Oracle White Paper, Available at http://www.oracle.com/technology/products/bi/pdf/10gr1_twp_bi_ontime_etl.pdf

P. Graf, The Program Base Library, Publicly available through http, 2003.

V. Raman and J. M. Hellerstein, Potter's Wheel: An Interactive Data Cleaning System, Proc. VLDB, pp.381-390, 2001.

C. White, Intelligent Business Strategies: Real-Time Data Warehousing Heats Up


R. Ananthakrishna, S. Chaudhuri, and V. Ganti, Eliminating Fuzzy Duplicates in Data Warehouses, VLDB, 2002.
DOI : 10.1016/B978-155860869-6/50058-5

I. Bhattacharya and L. Getoor, Iterative record linkage for cleaning and integration, Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, DMKD '04, 2004.
DOI : 10.1145/1008694.1008697

M. Bilenko and R. Mooney, Adaptive duplicate detection using learnable string similarity measures, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, 2003.
DOI : 10.1145/956750.956759

S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, Robust and efficient fuzzy match for online data cleaning, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, 2003.
DOI : 10.1145/872757.872796

P. Christen, T. Churches, and J. X. Zhu, Probabilistic name and address cleaning and standardisation, The Australasian Data Mining Wshp, 2002.

W. W. Cohen and J. Richman, Learning to match and cluster large high-dimensional data sets for data integration, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '02, 2002.
DOI : 10.1145/775047.775116

X. Dong, A. Halevy, and J. Madhavan, Reference reconciliation in complex information spaces, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, SIGMOD '05, 2005.
DOI : 10.1145/1066157.1066168

M. G. Elfeky and V. S. Verykios, On search enhancement of the record linkage process, KDD-2003 Wshp on Data Cleaning, Record Linkage, and Object Consolidation, 2003.

C. Faloutsos, K. McCurley, and A. Tomkins, Fast discovery of connection subgraphs, Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '04, 2004.
DOI : 10.1145/1014052.1014068

I. Fellegi and A. Sunter, A Theory for Record Linkage, Journal of the American Statistical Association, vol.63, issue.328, pp.1183-1210, 1969.
DOI : 10.1080/01621459.1969.10501049

H. Garcia-Molina, J. Ullman, and J. Widom, Database systems: the complete book, 2002.

L. Gravano, P. Ipeirotis, H. Jagadish, N. Koudas, S. Muthukrishnan et al., Approximate string joins in a database (almost) for free, Proc. VLDB, 2001.

M. Hernandez and S. Stolfo, The merge/purge problem for large databases, SIGMOD, 1995.

D. Kalashnikov and S. Mehrotra, Exploiting relationships for domain-independent data cleaning, SIAM SDM, 2005.
DOI : 10.1137/1.9781611972757.24

D. V. Kalashnikov and S. Mehrotra, RelDC project

D. V. Kalashnikov and S. Mehrotra, Learning importance of relationships for reference disambiguation, UCI Technical Report RESCUE-04-23, 2004.

D. V. Kalashnikov, S. Mehrotra, and Z. Chen, Exploiting relationships for domain-independent data cleaning, SIAM International Conference on Data Mining (SIAM SDM 2005), 2005.
DOI : 10.1137/1.9781611972757.24

M. Lee, W. Hsu, and V. Kothari, Cleaning the spurious links in data, IEEE Intelligent Systems, 2004.

A. K. McCallum, K. Nigam, and L. Ungar, Efficient clustering of high-dimensional data sets with application to reference matching, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '00, 2000.
DOI : 10.1145/347090.347123

M. Michalowski, S. Thakkar, and C. Knoblock, Exploiting secondary sources for automatic object consolidation, KDD-2003 Wshp on Data Cleaning, Record Linkage, and Object Consolidation, 2003.

A. E. Monge and C. P. Elkan, An efficient domain-independent algorithm for detecting approximately duplicate database records, In SIGMOD Wshp on Research Issues on Data Mining and Knowledge Discovery, 1997.

M. Neiling and S. Jurk, The object identification framework, KDD-2003 Wshp on Data Cleaning, Record Linkage, and Object Consolidation, 2003.

H. B. Newcombe, J. M. Kennedy, S. J. Axford, and A. P. James, Automatic Linkage of Vital Records: Computers can be used to extract "follow-up" statistics of families from files of routine records, Science, vol.130, issue.3381, pp.954-959, 1959.
DOI : 10.1126/science.130.3381.954

H. Pasula, B. Marthi, B. Milch, S. Russell, and I. Shpitser, Identity uncertainty and citation matching, Advances in Neural Information Processing Systems 15, 2002.

D. Quass and P. Starkey, Record linkage for genealogical databases, KDD-2003 Wshp on Data Cleaning, 2003.

L. De Raedt, Three Companions for Data Mining in First Order Logic, Relational Data Mining, 2001.
DOI : 10.1007/978-3-662-04599-2_5

S. Sarawagi and A. Bhamidipaty, Interactive deduplication using active learning, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '02, 2002.
DOI : 10.1145/775047.775087
URL : http://www.it.iitb.ac.in/~sunita/papers/kdd02.pdf

C. E. Shannon, The Mathematical Theory of Communication, 1949.
DOI : 10.1063/1.3067010

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.
DOI : 10.1017/CBO9780511809682

J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.8, 2000.

S. Tejada, C. A. Knoblock, and S. Minton, Learning domain-independent string transformation weights for high accuracy object identification, SIGKDD, 2002.
DOI : 10.1145/775047.775099
URL : http://www.cse.unsw.edu.au/~qzhang/papers/143.pdf

V. Verykios, G. V. Moustakides, and M. Elfeky, A Bayesian decision model for cost optimal record matching, The VLDB Journal, vol.12, issue.1, pp.28-40, 2003.
DOI : 10.1007/s00778-002-0072-y
URL : http://www.ssp.ece.upatras.gr/moustakides/downloads/journals/db2003.pdf

S. White and P. Smyth, Algorithms for estimating relative importance in networks, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, 2003.
DOI : 10.1145/956750.956782
URL : http://www.datalab.uci.edu/papers/white_smyth.pdf

G. Wiederhold,

Information sharing across private databases, Proceedings of ACM SIGMOD, pp.86-97, 2003.

R. Baxter, P. Christen, and T. Churches, A comparison of fast blocking methods for record linkage, Proceedings of 9th ACM SIGKDD Workshop on Data Cleaning, Record Linkage and Object Consolidation, 2003.

R. Canetti, U. Feige, O. Goldreich, and M. Naor, Adaptively secure multi-party computation, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, STOC '96, pp.639-648, 1996.
DOI : 10.1145/237814.238015

T. Churches and P. Christen, Some methods for blindfolded record linkage, BMC Medical Informatics and Decision Making, vol.4, 2004.
DOI : 10.1186/1472-6947-4-9
URL : https://bmcmedinformdecismak.biomedcentral.com/track/pdf/10.1186/1472-6947-4-9

W. Cohen, P. Ravikumar, and S. Fienberg, A comparison of string distance metrics for matching names and records, KDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation, 2003.

W. Cohen, P. Ravikumar, and S. E. Fienberg, A secure protocol for computing string distance metrics, Proceedings of ICDM Workshop on Privacy and Security Aspects of Data Mining, 2004.

W. Du and M. Atallah, Protocols for secure remote database access with approximate matching, 1st Workshop on Security and Privacy in E-Commerce, 2000.
DOI : 10.1007/978-1-4615-1467-1_6
URL : http://www.cerias.purdue.edu/ssl/techreports-ssl/2000-15.pdf

A. Evfimievski, J. Gehrke, and R. Srikant, Limiting privacy breaches in privacy preserving data mining, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '03, 2003.
DOI : 10.1145/773153.773174
URL : http://www.cs.cornell.edu/johannes/papers/2003/pods03-privacy.pdf

A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke, Privacy preserving mining of association rules, Proceedings of 8th ACM SIGKDD, pp.217-228, 2002.
DOI : 10.1145/775047.775080
URL : http://www.almaden.ibm.com/cs/people/srikant/papers/kdd02.ps.gz

I. P. Fellegi and A. B. Sunter, A Theory for Record Linkage, Journal of the American Statistical Association, vol.63, issue.328, pp.1183-1210, 1969.
DOI : 10.1080/01621459.1969.10501049

L. Gravano, P. G. Ipeirotis, H. Jagadish, N. Koudas, S. Muthukrishnan et al., Approximate string joins in a database (almost) for free, Proceedings of 27th VLDB, pp.491-500, 2001.

L. Gravano, P. G. Ipeirotis, N. Koudas, and D. Srivastava, Text joins in an RDBMS for web data integration, Proceedings of the twelfth international conference on World Wide Web, WWW '03, 2003.
DOI : 10.1145/775152.775166
URL : http://www1.cs.columbia.edu/~gravano/Papers/2003/www03.pdf

M. A. Hernández and S. J. Stolfo, The merge/purge problem for large databases, Proceedings of ACM SIGMOD, pp.127-138, 1995.

H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, Random-data perturbation techniques and privacy-preserving data mining, Proceedings of ICDM, pp.160-164, 2003.
DOI : 10.1007/BF00353874

D. Malkhi, N. Nisan, B. Pinkas, and Y. Sella, Fairplay: a secure two-party computation system, Proceedings of 11th USENIX Security Symposium, 2004.

H. Newcombe, J. Kennedy, S. Axford, and A. James, Automatic Linkage of Vital Records: Computers can be used to extract "follow-up" statistics of families from files of routine records, Science, vol.130, issue.3381, pp.954-959, 1959.
DOI : 10.1126/science.130.3381.954

H. Polat and W. Du, Privacy-preserving collaborative filtering using randomized perturbation techniques, Third IEEE International Conference on Data Mining, 2003.
DOI : 10.1109/ICDM.2003.1250993
URL : http://www.cis.syr.edu/~wedu/Research/./paper/icdm2003.pdf

C. Quantin, H. Bouzelat, F. Allaert, A. Benhamiche, J. Faivre et al., How to ensure data security of an epidemiological follow-up: quality assessment of an anonymous record linkage procedure, International Journal of Medical Informatics, vol.49, issue.1, pp.117-122, 1998.
DOI : 10.1016/S1386-5056(98)00019-7

C. Quantin, H. Bouzelat, and L. Dusserre, A computerized record hash coding and linkage procedure to warrant epidemiological follow-up data security, Studies in Health Technology and Informatics, vol.43, pp.339-342, 1997.

R. Rivest, Chaffing and winnowing: Confidentiality without encryption, MIT internal paper, 1998.

G. Salton, Automatic Text Processing, 1989.

S. Tejada, C. A. Knoblock, and S. Minton, Learning object identification rules for information integration, Information Systems, vol.26, issue.8, pp.607-633, 2001.
DOI : 10.1016/S0306-4379(01)00042-4

J. Vaidya and C. Clifton, Secure set intersection cardinality with application to association rule mining, Journal of Computer Security, 2004.
DOI : 10.3233/jcs-2005-13401
URL : http://www.cerias.purdue.edu/ssl/techreports-ssl/2005-136.pdf

W. E. Winkler, Matching and record linkage, Business Survey Methods, pp.355-384, 1995.
DOI : 10.1002/wics.1317
URL : http://www.census.gov/srd/papers/pdf/rr93-8.pdf

L. Xiong, S. Chitti, and L. Liu, Top-k Queries across Multiple Private Databases, 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05), 2005.
DOI : 10.1109/ICDCS.2005.82

A. C. Yao, Protocols for secure computations, Proceedings of the 23rd Symposium on FOCS, pp.160-164, 1982.

R. Ananthakrishna, S. Chaudhuri, and V. Ganti, Eliminating Fuzzy Duplicates in Data Warehouses, VLDB, 2002.
DOI : 10.1016/B978-155860869-6/50058-5
URL : http://www.cs.ust.hk/vldb2002/VLDB2002-papers/S17P01.pdf

M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg, Adaptive name matching in information integration, IEEE Intelligent Systems, vol.18, issue.5, pp.16-23, 2003.
DOI : 10.1109/MIS.2003.1234765

V. R. Borkar, K. Deshmukh, and S. Sarawagi, Automatic Segmentation of Text into Structured Records, ACM SIGMOD, 2001.
DOI : 10.1145/376284.375682
URL : http://ranger.uta.edu/~alp/ix/readings/p175-borkar-auto-classify-text-into-structured-records.pdf

W. Cohen, P. Ravikumar, and S. Fienberg, A Comparison of String Distance Metrics for Name-matching tasks, IIWeb Workshop held in conjunction with IJCAI, 2003.

N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, 2000.

J. M. Cruz, N. J. Klink, and T. Krichel, Personal Data in a Large Digital Library, 2000.

P. T. Davis, D. K. Elson, and J. L. Klavans, Methods for precise named entity matching in digital collections, Proceedings of the 2003 Joint Conference on Digital Libraries, 2003.
DOI : 10.1109/JCDL.2003.1204852
URL : http://www.cs.columbia.edu/~delson/pubs/jcdl03.pdf

I. P. Fellegi and A. B. Sunter, A Theory for Record Linkage, Journal of the American Statistical Association, vol.63, issue.328, pp.1183-1210, 1969.
DOI : 10.1080/01621459.1969.10501049

L. Gravano, P. G. Ipeirotis, N. Koudas, and D. Srivastava, Text joins in an RDBMS for web data integration, Proceedings of the twelfth international conference on World Wide Web, WWW '03, 2003.
DOI : 10.1145/775152.775166
URL : http://www1.cs.columbia.edu/~gravano/Papers/2003/www03.pdf

H. Han, C. L. Giles, H. Zha, C. Li, and K. Tsioutsiouliklis, Two supervised learning approaches for name disambiguation in author citations, Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries, JCDL '04, 2004.
DOI : 10.1145/996350.996419
URL : http://clgiles.ist.psu.edu/pubs/JCDL-2004-author-disambiguation.pdf

M. A. Hernandez and S. J. Stolfo, The Merge/Purge Problem for Large Databases, ACM SIGMOD, 1995.

Y. Hong, B. On, and D. Lee, System Support for Name Authority Control Problem in Digital Libraries: OpenDBLP Approach, ECDL, 2004.
DOI : 10.1007/978-3-540-30230-8_13

J. A. Hylton, Identifying and Merging Related Bibliographic Records

M. A. Jaro, Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, Journal of the American Statistical Association, vol.84, issue.406, 1989.
DOI : 10.2307/2289924

R. P. Kelley, Blocking Considerations for Record Linkage Under Conditions of Uncertainty, Proc. of Social Statistics Section, pp.602-605, 1984.

S. Lawrence, C. L. Giles, and K. Bollacker, Digital libraries and autonomous citation indexing, Computer, vol.32, issue.6, pp.67-71, 1999.
DOI : 10.1109/2.769447
URL : http://www.neci.nj.nec.com/homepages/giles/papers/IEEE.Computer.DL-ACI.ps.Z

A. McCallum, Efficient clustering of high-dimensional data sets with application to reference matching, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '00, 2000.
DOI : 10.1145/347090.347123

B. On, D. Lee, J. Kang, and P. Mitra, Comparative study of name disambiguation problem using a scalable blocking-based framework, Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, JCDL '05, 2005.
DOI : 10.1145/1065385.1065463
URL : http://nike.psu.edu/publications/jcdl05.pdf

H. Pasula, Advances in Neural Information Processing Systems, 2003.

M. M. Snyman and M. Rensburg, Revolutionizing name authority control, Proceedings of the fifth ACM conference on Digital libraries, DL '00, 2000.
DOI : 10.1145/336597.336660

J. W. Warner and E. W. Brown, Automated name authority control, Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries, JCDL '01, 2001.
DOI : 10.1145/379437.379441

W. E. Winkler and Y. Thibaudeau, An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, 1991.

J. Albert, Algebraic properties of bag data types, International Conference on Very Large Data Bases, pp.211-219, 1991.

S. Bell and P. Brockhausen, Discovery of data dependencies in relational databases, ML-Net Familiarization Workshop, 1995.

S. Bergamaschi, S. Castano, M. Vincini, and D. Beneventano, Semantic integration of heterogeneous information sources, Data & Knowledge Engineering, vol.36, issue.3, pp.215-249, 2001.
DOI : 10.1016/S0169-023X(00)00047-1

M. Bilenko and R. J. Mooney, Employing trainable string similarity metrics for information integration, Proc. of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), pp.67-72, 2003.

M. A. Casanova, R. Fagin, and C. H. Papadimitriou, Inclusion dependencies and their interaction with functional dependencies, Proceedings of ACM Conference on Principles of Database Systems (PODS), pp.171-176, 1982.

W. W. Cohen, Integration of heterogeneous databases without common domains using queries based on textual similarity, ACM SIGMOD Record, vol.27, issue.2, pp.201-213, 1998.
DOI : 10.1145/276305.276323

W. W. Cohen and H. Hirsh, Joins that generalize: Text classification using WHIRL, Proc. of 4th Intl. Conf. on Knowl. Discovery and Data Mining (KDD), p.169, 1998.

F. De Marchi, S. Lopes, and J. Petit, Efficient Algorithms for Mining Inclusion Dependencies, Proceedings of International Conference on Extending Database Technology (EDBT), pp.464-476, 2002.
DOI : 10.1007/3-540-45876-X_30
URL : https://hal.archives-ouvertes.fr/hal-00113375

F. De Marchi and J. Petit, Zigzag: a new algorithm for mining large inclusion dependencies in databases, Third IEEE International Conference on Data Mining, pp.27-34, 2003.
DOI : 10.1109/ICDM.2003.1250899

A. Doan, P. Domingos, and A. Halevy, Learning source description for data integration, Proceedings of the Third International Workshop on the Web and Databases (WebDB), pp.81-86, 2000.

P. A. Flach and I. Savnik, Database dependency discovery: a machine learning approach, AI Communications, vol.12, issue.3, pp.139-160, 1999.

A. Halevy and J. Madhavan, Corpus-based knowledge representation, Proc. of 18th Intl. Joint Conf. on Artificial Intelligence (IJCAI'03), pp.1567-1572, 2003.

Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen, Efficient discovery of functional and approximate dependencies using partitions, Proceedings 14th International Conference on Data Engineering, pp.392-401, 1998.
DOI : 10.1109/ICDE.1998.655802

J. Kang and J. F. Naughton, On schema matching with opaque column names and data values, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pp.205-216, 2003.
DOI : 10.1145/872757.872783

A. J. Knobbe and P. W. Adriaans, Discovering foreign key relations in relational databases, Proceedings of the Thirteenth European Meeting on Cybernetics and Systems Research, pp.961-966, Austrian Soc. Cybernetic Studies, 1996.

A. Koeller and E. A. Rundensteiner, Discovery of high-dimensional inclusion dependencies, Proceedings 19th International Conference on Data Engineering, pp.683-685, 2003.
DOI : 10.1109/ICDE.2003.1260834

A. Koeller and E. A. Rundensteiner, Heuristic Strategies for Inclusion Dependency Discovery, Proc. of 3rd Intl. Conf. on Ontologies, Databases, and Applications of Semantics, pp.891-908, 2004.
DOI : 10.1007/978-3-540-30469-2_5

J. A. Larson, S. B. Navathe, and R. Elmasri, A theory of attribute equivalence in databases with application to schema integration, IEEE Transactions on Software Engineering, vol.15, issue.4, pp.449-463, 1989.
DOI : 10.1109/32.16605

W. Li and C. Clifton, SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks, Data & Knowledge Engineering, vol.33, issue.1, pp.49-84, 2000.
DOI : 10.1016/S0169-023X(99)00044-0

W. Lim and J. Harrison, Discovery of constraints from data for information system reverse engineering, Proc. of Australian Software Engineering Conference (ASWEC '97), 1997.

J. C. Mitchell, Inference rules for functional and inclusion dependencies, Proceedings of the 2nd ACM SIGACT-SIGMOD symposium on Principles of database systems, PODS '83, pp.58-69, 1983.
DOI : 10.1145/588058.588067

E. Rahm and P. A. Bernstein, A survey of approaches to automatic schema matching, VLDB Journal: Very Large Data Bases, pp.334-350, 2001.
DOI : 10.1007/s007780100057

G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, 1989.

E. Schallehn, K. Sattler, and G. Saake, Efficient similarity-based operations for data integration, Data & Knowledge Engineering, vol.48, issue.3, pp.361-387, 2004.
DOI : 10.1016/j.datak.2003.08.004

Q. Wei and G. Chen, Efficient discovery of functional dependencies with degrees of satisfaction, International Journal of Intelligent Systems, vol.121, issue.11, pp.1089-1110, 2004.
DOI : 10.1007/978-1-4615-4068-7

YMR186W (SM > 0.06; 50% of genes not having CV high enough … gene total)

The reason the other subclusters yielded low significance was that a majority of their genes had a high average(CV) over their CAs, so most genes were assigned on the basis of categorical similarity rather than numerical similarity. The dominant factor in the significance metric was therefore low, and so was the overall result. We next needed to identify the CAs in these clusters with the highest average(CV) throughout the entire cluster.

YBR029C (SM > 1; 0% of genes not having CV high enough …

We identified the following CAs for each of the subclusters listed above: 1) copper binding, avg(CV) 0.5; cytosol, avg(CV) 1.0; cell cycle, avg(CV) 1.0; fatty acid biosynthesis, avg(CV) 1.0; vacuole (sensu Fungi), avg(CV) 0.8; vacuole inheritance, avg(CV) 0.8; thiol-disulfide exchange intermediate, avg(CV) 0.5; plasma membrane, avg(CV) 1.0; tricarboxylic acid cycle, avg(CV) …

cytoplasm, avg(CV) 1.0

3-beta-glucan synthase, avg(CV) 0.55

long-chain-fatty-acid-CoA-ligase, avg(CV) 0.55; lipid metabolism, avg(CV) 0.75; lipid particle, avg(CV) 1

7) nuclear membrane, avg(CV) 1

… avg(CV) 0.95; folic acid and derivative biosynthesis, avg(CV) 0.95; pantothenate biosynthesis, avg(CV) 0.8; allantoin catabolism, avg(CV) 0.8; purine nucleotide biosynthesis, avg(CV) 0.95; helicase, avg(CV) 0.5; spore wall assembly, avg(CV) 0.8; RAB-protein geranylgeranyltransferase, avg(CV) 0.55; protein amino acid geranylgeranylation, avg(CV) 1.0; RAB-protein geranylgeranyltransferase complex; glyoxylate cycle, avg(CV) 1.0; peroxisomal matrix, avg(CV) …

response to stress, avg(CV) 0.75

phosphatidate cytidylyltransferase, avg(CV) 1.0; phosphatidylserine metabolism, avg(CV) 1.0; mitochondrion, avg(CV) 1

B. Andreopoulos, A. An, and X. Wang, BILCOM: Bi-level Clustering of Mixed Categorical and Numerical Biological Data, 2005.
DOI : 10.1504/ijdmb.2006.009920

B. Andreopoulos, A. An, and X. Wang, MULIC: Multi-Layer Increasing Coherence Clustering of Categorical Data Sets, 2004.

B. Andreopoulos, A. An, and X. Wang, Significance Metrics for Clusters of Mixed Numerical and Categorical Yeast Data, 2003.

B. Adryan and R. Schuh, Gene-Ontology-based clustering of gene expression data, Bioinformatics, vol.20, issue.16, pp.2851-2852, 2004.
DOI : 10.1093/bioinformatics/bth289

A. Ben-dor, R. Shamir, and Z. Yakhini, Clustering Gene Expression Patterns, Journal of Computational Biology, vol.6, issue.3-4, pp.281-297, 1999.
DOI : 10.1089/106652799318274

M. P. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet et al., Knowledge-based analysis of microarray gene expression data by using support vector machines, Proceedings of the National Academy of Sciences, vol.97, issue.1, pp.262-267, 2000.
DOI : 10.1073/pnas.97.1.262

V. Cherepinsky, J. Feng, M. Rejali, and B. Mishra, Shrinkage-based similarity metric for cluster analysis of microarray data, Proceedings of the National Academy of Sciences, vol.100, issue.17, pp.9668-9673, 2003.

S. Dwight, M. Harris, K. Dolinski, C. Ball, G. Binkley et al., Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO), Nucleic Acids Research, vol.30, issue.1, pp.69-72, 2002.
DOI : 10.1093/nar/30.1.69

M. B. Eisen and P. O. Brown, [12] DNA arrays for analysis of gene expression, Methods Enzymol, vol.303, pp.179-205, 1999.
DOI : 10.1016/S0076-6879(99)03014-1

M. Eisen, P. Spellman, P. Brown, and D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proceedings of the National Academy of Sciences, vol.95, issue.25, pp.14863-14868, 1998.
DOI : 10.1073/pnas.95.25.14863

D. Fasulo, An Analysis of Recent Work on Clustering Algorithms, 1999.

The Gene Ontology Consortium, Creating the gene ontology resource: design and implementation, Genome Research, vol.11, pp.1425-1433, 2001.

M. Goebel and L. Gruenwald, A survey of data mining and knowledge discovery software tools, ACM SIGKDD Explorations Newsletter, vol.1, issue.1, pp.20-33, 1999.
DOI : 10.1145/846170.846172

T. R. Golub, Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science, vol.286, issue.5439, pp.531-537, 1999.
DOI : 10.1126/science.286.5439.531

J. Grabmeier and A. Rudolph, Techniques of Cluster Algorithms in Data Mining, Data Mining and Knowledge Discovery, vol.6, pp.303-360, 2002.

S. Guha, R. Rastogi, and K. Shim, Rock: A robust clustering algorithm for categorical attributes, Information Systems, vol.25, issue.5, pp.345-366, 2000.
DOI : 10.1016/S0306-4379(00)00022-3
URL : http://www.cs.uiuc.edu/class/fa05/cs591han/papers/guha99.pdf

J. A. Hartigan, Classification and Clustering, Journal of Marketing Research, vol.18, issue.4, 1975.
DOI : 10.2307/3151350

Z. Huang, Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, Data Mining and Knowledge Discovery, vol.2, issue.3, pp.283-304, 1998.
DOI : 10.1023/A:1009769707641

Z. Huang, Clustering Large Data Sets with Mixed Numeric and Categorical Values. Knowledge discovery and data mining: techniques and applications, 1997.

P. W. Lord, R. D. Stevens, A. Brass, and C. A. Goble, Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation, Bioinformatics, vol.19, issue.10, pp.1275-1283, 2003.
DOI : 10.1093/bioinformatics/btg153
URL : https://academic.oup.com/bioinformatics/article-pdf/19/10/1275/449739/btg153.pdf

C. Pasquier, F. Girardot, K. Jevardat de Fombelle, and R. Christen, THEA: ontology-driven analysis of microarray data, Bioinformatics, vol.20, issue.16, pp.2636-2643, 2004.
DOI : 10.1093/bioinformatics/bth295
URL : https://hal.archives-ouvertes.fr/hal-00170450

D. K. Slonim, P. Tamayo, J. P. Mesirov, T. R. Golub, and E. S. Lander, Class prediction and discovery using gene expression data, Proceedings of the fourth annual international conference on Computational molecular biology, RECOMB '00, pp.263-272, 2000.
DOI : 10.1145/332306.332564
URL : http://18.52.0.92/~slonim/STMGL00.ps.gz

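P. T. Spellman et al., Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization, Molecular Biology of the Cell, vol.9, issue.12, pp.3273-3297, 1998.
DOI : 10.1091/mbc.9.12.3273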

J. Stutz and P. Cheeseman, Bayesian Classification (AutoClass): Theory and Results, Advances in Knowledge Discovery and Data Mining, pp.153-180, 1995.

L. F. Wu, T. R. Hughes, A. P. Davierwala, M. D. Robinson, R. Stoughton et al., Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters, Nature Genetics, vol.31, issue.3, pp.255-265, 2002.
DOI : 10.1038/ng906

M. Kerr and G. Churchill, Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments, Proceedings of the National Academy of Sciences, vol.98, issue.16, pp.8961-8966, 2001.
DOI : 10.1073/pnas.161273698
URL : http://www.pnas.org/content/98/16/8961.full.pdf

L. McShane, M. Radmacher, B. Freidlin, R. Yu, M. Li et al., Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data, Bioinformatics, vol.18, issue.11, pp.1462-1471, 2002.
DOI : 10.1093/bioinformatics/18.11.1462

K. Bennett, A. Demiriz, and R. Maclin, Exploiting unlabeled data in ensemble methods, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pp.289-296, 2002.
DOI : 10.1145/775047.775090
URL : http://www.cse.unsw.edu.au/~qzhang/papers/264.pdf

R. Chellappa and A. Jain, Markov Random Fields: Theory and Application, Academic Press, 1993.

A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong, Model-Driven Data Acquisition in Sensor Networks, Proc. of the 30th Int'l Conf. on Very Large Data Bases (VLDB 04), 2004.
DOI : 10.1016/B978-012088469-8.50053-X
URL : http://www.cs.virginia.edu/~son/cs851/papers/vldb04.amol.pdf

S. Kirkpatrick, C. Gelatt, and M. Vecchi, Optimization by simulated annealing, Science, vol.220, issue.4598, pp.671-680, 1983.
DOI : 10.1126/science.220.4598.671
URL : http://www.cs.virginia.edu/cs432/documents/sa-1983.pdf

K. Murphy, Y. Weiss, and M. Jordan, Loopy belief propagation for approximate inference: an empirical study, Proc. Uncertainty in Artificial Intelligence (UAI), 1999.

J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, 1988.

C. Peterson and J. Anderson, A mean-field theory learning algorithm for neural networks, Complex Systems, vol.1, 1987.

G. Salton and M. McGill, Introduction to Modern Information Retrieval, 1983.

R. Schultz and R. Stevenson, A Bayesian approach to image expansion for improved definition, IEEE Transactions on Image Processing, vol.3, issue.3, pp.233-242, 1994.
DOI : 10.1109/83.287017

Y. Yang, X. Wu, and X. Zhu, Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources, Proc. of the 8th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD 04), 2004.
DOI : 10.1007/978-3-540-30116-5_43
URL : https://link.springer.com/content/pdf/10.1007%2F978-3-540-30116-5_43.pdf

J. Yedidia, W. Freeman, and Y. Weiss, Generalized belief propagation, Advances in Neural Information Processing Systems (NIPS), pp.689-695, 2000.

X. Zhu, X. Wu, and Q. Chen, Eliminating class noise in large datasets, Proc. of the 20th Int'l Conf. Machine Learning (ICML 03), 2003.

D. Buttler, M. Coleman, T. Critchlow, R. Fileto, W. Han et al., Querying multiple bioinformatics information sources, ACM SIGMOD Record, vol.31, issue.4, pp.59-64, 2002.
DOI : 10.1145/637411.637421
URL : http://www.cc.gatech.edu/~buttler/pubs/10.Buttler.pdf

A. Rudra and E. Yeo, Issues in user perceptions of data quality and satisfaction in using a data warehouse: an Australian experience, Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, 2000.
DOI : 10.1109/HICSS.2000.926904

I. N. Chengalur-Smith, D. P. Ballou, and H. L. Pazer, The impact of data quality information on decision making: an exploratory analysis, IEEE Transactions on Knowledge and Data Engineering, vol.11, issue.6, pp.853-864, 1999.
DOI : 10.1109/69.824597

R. A. Dillard, Using data quality measures in decision-making algorithms, IEEE Expert, vol.7, issue.6, pp.63-72, 1992.
DOI : 10.1109/64.180410

F. Naumann, From databases to information systems: information quality makes the difference, presented at the International Conference on Information Quality, 2001.

M. Gertz, M. T. Özsu, G. Saake, and K. U. Sattler, Report on the Dagstuhl Seminar "Data Quality on the Web", ACM SIGMOD Record, vol.33, issue.1, pp.127-132, 2004.
DOI : 10.1145/974121.974144

Figure 4: Network of agreeing data sources

T. Critchlow, L. Liu, D. Buttler, D. Rocco, and C. Pu, Towards Automatic Discovery and Identification of Bioinformatics Web Interfaces, 2003.

Special Issue on Sensor Network Technology and Sensor Data Management, SIGMOD Record, vol.32, 2003.

F. Donovan, Army to deploy hand-held devices to make every soldier into a sensor, 2004.

F. S. Collins, E. D. Green, A. E. Guttmacher, and M. S. Guyer, A vision for the future of genomics research, Nature, vol.422, issue.6934, pp.835-847, 2003.
DOI : 10.1038/nature01626

M. Mecella, M. Scannapieco, A. Virgillito, R. Baldoni, T. Catarci et al., Managing Data Quality in Cooperative Information Systems, Lecture Notes in Computer Science 2519, pp.486-502, 2002.
DOI : 10.1007/3-540-36124-3_28

M. Scannapieco, A. Virgillito, C. Marchetti, M. Mecella, and R. Baldoni, The DaQuinCIS architecture: a platform for exchanging and improving data quality in cooperative information systems, Information Systems, vol.29, issue.7, pp.551-582, 2004.
DOI : 10.1016/j.is.2003.12.004

L. De Santis, M. Scannapieco, and T. Catarci, Trusting Data Quality in Cooperative Information Systems, 2003.
DOI : 10.1007/978-3-540-39964-3_23

J. Widom, Trio: a system for integrated management of data, accuracy, and lineage, 2005.

G. A. Mihaila, L. Raschid, and M. Vidal, Using quality of data metadata for source selection and ranking, Third International Workshop on the Web and Databases, 2000.

G. A. Mihaila, L. Raschid, and M. Vidal, Source selection and ranking in the websemantics architecture using quality of data metadata, Advances in Computers, vol.55, pp.87-118, 2002.
DOI : 10.1016/S0065-2458(01)80027-9

M. Gertz, Managing data quality and integrity in federated databases, presented at IFIP TC11 Working Group 11.5, Second Working Conference on Integrity and Internal Control in Information Systems: Bridging Business Requirements and Research Results, 1998.
DOI : 10.1007/978-0-387-35396-8_11
URL : https://link.springer.com/content/pdf/10.1007%2F978-0-387-35396-8_11.pdf

F. Naumann, J. C. Freytag, and U. Leser, Completeness of integrated information sources, Information Systems, vol.29, issue.7, pp.583-615, 2004.

F. Naumann, Quality-Driven Query Answering for Integrated Information Systems, Lecture Notes in Computer Science, vol.2261, p.166, 2002.
DOI : 10.1007/3-540-45921-9

A. Motro and I. Rakov, Estimating the quality of databases, Conference on Information Quality, 1996.
DOI : 10.1007/BFb0056011
URL : http://www.isse.gmu.edu/~ami/research/papers/fqas98.ps

M. Bobrowski, M. Marre, and D. Yankelevich, A homogeneous framework to measure data quality, 1999.

B. Pernici and M. Scannapieco, Data quality in web information systems, 2002.
DOI : 10.1007/3-540-45816-6_37
URL : http://www.dis.uniroma1.it/~dq/docs/PS_DQWIS.pdf

Y. W. Lee, D. M. Strong, B. K. Kahn, and R. Y. Wang, AIMQ: a methodology for information quality assessment, Information & Management, vol.40, issue.2, pp.133-146, 2002.
DOI : 10.1016/S0378-7206(02)00043-5
URL : http://mitiq.mit.edu/documents/publications/TDQMpub/AIMQ.pdf

S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, presented at 7th World Wide Web Conference, 1998.
DOI : 10.1016/S0169-7552(98)00110-X
URL : http://www.his.se/upload/51108/google.pdf

J. Hicklin, C. Moler, P. Webb, R. F. Boisvert, B. Miller et al., JAMA: Java Matrix Package, 2005.

J. Cho and A. Ntoulas, Effective Change Detection Using Sampling, VLDB Conference, 2002.
DOI : 10.1016/B978-155860869-6/50052-4
URL : http://oak.cs.ucla.edu/~cho/papers/cho-sampling.pdf

L. Cholvy and C. Garion, Querying several conflicting databases, ECSQARU-03 Workshop on Uncertainty, Incompleteness, Imprecision, and Conflict in Multiple Data Sources, 2003.
DOI : 10.1145/176567.176571
URL : http://oatao.univ-toulouse.fr/522/1/Garion_522.pdf

JUNG Framework Development Team, JUNG: Java Universal Network/Graph Framework, 2005.

S. Staab, P. Domingos, P. Mika, J. Golbeck, L. Ding et al., Social Networks Applied, IEEE Intelligent Systems, vol.20, issue.1, pp.80-93, 2005.
DOI : 10.1109/MIS.2005.16

Author Index

A. Al-Lawati, p.9

A. An, p.7

B. Andreopoulos, p.7

J. Bugajski, p.0

A. F. Cardenas

Z. Chen, p.7

F. Chu, p.9

S. Embury

H. Garcia-Molina

R. L. Grossman, p.0

J. Hammer, p.6

D. V. Kalashnikov, p.7

J. Kang, p.9

A. Karakasidis, p.8

V. Keelara, p.7

A. Koeller, p.7

D. Lee, p.9

A. Martinez, p.6

P. McDaniel, p.9

S. Mehrotra, p.7

P. Missier

B.-W. On, p.9

S. Park, p.9

D. S. Parker, p.9

E. Pitoura, p.8

R. K. Pon

E. Sumner, p.0

Z. Tang, p.0

P. Vassiliadis, p.8

X. Wang, p.7

Y. Wang, p.9

W. E. Winkler

C. Zaniolo, p.9