Outlier Mining, Handling Missing and Duplicate Data, Change Detection
References: Books
BATINI, Carlo; SCANNAPIECO, Monica. Data Quality: Concepts, Methodologies and Techniques. Data-Centric Systems and Applications, vol. 3, 2006.
Outliers in Statistical Data, 1994.
Exploratory Data Mining and Data Cleaning, 2003.
Identification of Outliers, 1980.
Data Quality and Record Linkage Techniques, 2007.
The Data Warehouse ETL Toolkit, 2004.
Driven Query Answering for Integrated Information Systems. Lecture Notes in Computer Science, vol. 2261, 2002.
Exploratory Data Analysis, 1977.
Data Quality. Advances in Database Systems, vol. 23, 2002.
Anomaly Detection: A Survey. ACM Computing Surveys, 2009.
Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 19, no. 1, pp. 1-16, 2007.
Quantitative Data Cleaning for Large Databases. White paper, United Nations Economic Commission for Europe.
A Guided Tour to Approximate String Matching. ACM Computing Surveys, vol. 33, no. 1, pp. 31-88, 2001.
Overview of Record Linkage and Current Research Directions. A Survey of Data Quality Issues in Cooperative Systems. Tech. Rep. of U.S. Census Bureau, 2004.
Record Linkage: Similarity Measures and Algorithms, 2006.
Anomaly Detection: A Tutorial. Tutorial, SIAM Conf. on Data Mining, 2008.
Outlier Detection Techniques. Tutorial, PAKDD.
Telcordia's Database Reconciliation and Data Quality Analysis Tool. Proc. VLDB, pp. 615-618, 2000.
Mining Database Structure; Or, How to Build a Data Quality Browser. Proc. SIGMOD, 2002.
Data Quality Mining: Making a Virtue of Necessity. Proc. Workshop DMKD, 2001.
Systematic Development of Data Mining-Based Data Quality Tools. Proc. VLDB 2003, pp. 548-559, 2003.
Data Preparation and Screening. Chapter 3, Principles and Practice of Structural Equation Modeling, pp. 45-62, 2005.
STATNOTES: Topics in Multivariate Analysis. Retrieved 10
Automatic Data Fusion with HumMer. Proc. VLDB.
A Primitive Operator for Similarity Joins in Data Cleaning. Proc. ICDE, 1254.
Febrl: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface, pp. 1065-1068, 2008.
Probabilistic Name and Address Cleaning and Standardization. Proc. Australasian Data Mining Workshop.
Declarative Data Cleaning: Language, Model, and Algorithms. Proc. VLDB Conf., pp. 371-380, 2001.
Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem. Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
Data Cleaning: Problems and Current Approaches. Data Engineering Bulletin, vol. 23, no. 4, pp. 3-13, 2000.
Potter's Wheel: An Interactive Data Cleaning System. Proc. VLDB, pp. 381-390, 2001.
ARKTOS: A Tool for Data Cleaning and Transformation in Data Warehouse Environments. Bulletin of the Technical Committee on Data Engineering, vol. 23, no. 4, pp. 42-47, 2000.
Towards a Benchmark for ETL Workflows. Proc. QDB, pp. 49-60, 2007.
XClean in Action (Demo), pp. 259-262, 2007.
References: Record Linkage and Duplicate Detection
Eliminating Fuzzy Duplicates in Data Warehouses. Proc. VLDB, pp. 586-597, 2002.
Correlation Clustering. Machine Learning, pp. 89-113, 2004.
A Comparison of Fast Blocking Methods for Record Linkage. Proc. KDD'03 Workshop on Data Cleaning, Record Linkage and Object Consolidation, pp. 27-29, 2003.
Iterative Record Linkage for Cleaning and Integration. Proc. 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD '04), pp. 11-18, 2004. DOI: 10.1145/1008694.1008697
Collective Entity Resolution in Relational Data. ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, 2007. DOI: 10.1145/1217299.1217304
Adaptive Duplicate Detection Using Learnable String Similarity Measures. Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 39-48, 2003. DOI: 10.1145/956750.956759
Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping. Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 58-65, 2005. DOI: 10.1109/ICDM.2005.18
Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification. ACM SIGKDD Conf., 2008.
Proc. 18th International Conf. on Data Engineering, pp. 17-28, 2002.
Duplicate Record Detection: A Survey. IEEE Trans. Knowl. Data Eng., vol. 19, no. 1, pp. 1-16, 2007.
A Theory for Record Linkage. Journal of the American Statistical Association, vol. 64, pp. 1183-1210, 1969.
Using q-grams in a DBMS for Approximate String Processing. IEEE Data Eng. Bull., vol. 24, no. 4, pp. 28-34, 2001.
Text Joins for Data Cleansing and Integration in an RDBMS. Proc. 19th International Conference on Data Engineering (Cat. No.03CH37405), pp. 729-731, 2003. DOI: 10.1109/ICDE.2003.1260850
The Merge/Purge Problem for Large Databases. Proc. SIGMOD Conf., pp. 127-135, 1995.
A Knowledge-Based Approach for Duplicate Elimination in Data Cleaning. Inf. Syst., vol. 26, no. 8, pp. 585-606, 2001.
Interactive Entity Resolution in Relational Data: A Visual Analytic Tool and Its Evaluation. IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 5, pp. 999-1014, 2008. DOI: 10.1109/TVCG.2008.55
Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Proc. 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '00), pp. 169-178, 2000. DOI: 10.1145/347090.347123
Matching Algorithms within a Duplicate Detection System. IEEE Data Eng. Bull., vol. 23, no. 4, pp. 14-20, 2000.
Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification. Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 350-359, 2002. DOI: 10.1145/775047.775099
A Duplicate Detection Benchmark for XML (and Relational) Data. Proc. ACM SIGMOD 2006 Workshop on Information Quality in Information Systems, 2006.
Methods for Evaluating and Creating Data Quality. Inf. Syst., vol. 29, no. 7, pp. 531-550, 2004.
An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census. U.S. Bureau of the Census, 1991.
Conditional Functional Dependencies for Data Cleaning. Proc. ICDE, pp. 746-755, 2007.
Extending Dependencies with Conditions. Proc. VLDB, pp. 243-254, 2007.
Mining Constraint Violations. ACM Trans. Database Syst., vol. 32, no. 1, p. 6, 2007.
Fast Identification of Relational Constraint Violations. Proc. ICDE, 2007.
Conditional Functional Dependencies for Capturing Data Inconsistencies. TODS, no. 2, p. 33, 2008.
Semandaq: A Data Quality System Based on Conditional Functional Dependencies, p. 8, 2008.
Discovering Conditional Functional Dependencies. Proc. ICDE, pp. 1231-1234, 2009.
On Generating Near-Optimal Tableaux for Conditional Functional Dependencies. PVLDB, vol. 1, no. 1, pp. 376-390, 2008.
Checks and Balances: Monitoring Data Quality Problems in Network Traffic Databases. Proc. VLDB 2003, pp. 536-547.
A Framework for Diagnosing Changes in Evolving Data Streams. Proc. ACM SIGMOD, 2003.
Change (Detection) You Can Believe In: Finding Distributional Shifts in Data Streams. Proc. IDA'09, 2009.
An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams. Proc. Interface'06, 2006.
Statistical Change Detection for Multidimensional Data. Proc. ACM SIGKDD'07, pp. 667-676.
References: Outlier Detection
Detecting Anomalies in Cross-Classified Streams: A Bayesian Approach. Know. Inf. Syst., vol. 11, no. 1, pp. 29-44, 2006.
Fast Outlier Detection in High Dimensional Spaces. Proc. Conf. on Principles of Data Mining and Knowledge Discovery, pp. 15-26, 2002.
Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule. Proc. KDD, 2003.
LOF: Identifying Density-Based Local Outliers. Proc. 2000 ACM SIGMOD International Conf. on Management of Data, pp. 93-104, 2000.
Outlier Detection with the Kernelized Spatial Depth Function. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Finding Frequent Items in Data Streams. Proc. VLDB, 2008.
Anomaly Detection over Noisy Data Using Learned Probability Distributions. Proc. ICML, pp. 255-262, 2000.
Outlier Detection in High Dimensions. Computational Statistics and Data Analysis, vol. 52, pp. 1694-1711, 2008.
Outlier Detection in Multivariate Time Series by Projection Pursuit. Journal of the American Statistical Association, vol. 101, no. 474, pp. 654-669, 2006.
Odabk: An Effective Approach to Detecting Outlier in Data Stream. Discovering Cluster-Based Local Outliers. Proc. Intl. Conf. on Mach. Learn. and Cybernetics, pp. 1036-10419, 2003.
Outlier Detection for Skewed Data. Journal of Chemometrics, vol. 22, pp. 235-246, 2007.
GLOF: A New Approach for Mining Local Outlier.
Mining Top-n Local Outliers in Large Databases. Proc. KDD, pp. 157-162, 2001.
Detecting Changes in Data Streams. Proc. VLDB 2004, pp. 180-191, 2004.
Algorithms for Mining Distance-Based Outliers in Large Datasets. Proc. VLDB, pp. 392-403, 1998.
DDMA-charts: Nonparametric Multivariate Moving Average Control Charts Based on Data Depth. Advances in Statistical Analysis, pp. 235-258, 2004.
Angle-Based Outlier Detection. Proc. ACM SIGKDD, 2008.
Robust Estimates of Location and Dispersion for High-Dimensional Data Sets. Technometrics, vol. 44, no. 4, pp. 307-317, 2002.
LOCI: Fast Outlier Detection Using the Local Correlation Integral. Tech. Rep., Intel Research Lab, 2002.
Multivariate Outlier Detection and Robust Covariance Matrix Estimation. Technometrics, vol. 43, no. 3, pp. 286-310, 2001.
Efficient Algorithms for Mining Outliers from Large Data Sets. Proc. ACM SIGMOD, pp. 427-438, 2000.
A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics, vol. 41, no. 3, pp. 212-223, 1999.
Unmasking Multivariate Outliers and Leverage Points. Journal of the American Statistical Association, vol. 85, pp. 633-639, 1990.
Kernel Methods for Pattern Analysis, 2005.
A Novel Anomaly Detection Scheme Based on Principal Component Classifier. Proc. ICDM 2003, pp. 353-365, 2003.
Continuous Adaptive Outlier Detection on Distributed Data Streams. HPCC, LNCS 4782, pp. 74-85, 2007.
Online Outlier Detection in Sensor Data Using Non-Parametric Models. Proc. VLDB, pp. 187-198, 2006.
Enhancing Effectiveness of Outlier Detections for Low Density Patterns. Proc. PAKDD 2002, LNAI 2336, 2002.
SPOT: A System for Detecting Projected Outliers from High-Dimensional Data Streams. Proc. ICDE, pp. 1628-1631, 2008.
References: Missing Values
The Treatment of Missing Values and Its Effect in the Classifier Accuracy. Classification, Clustering and Data Mining Applications, pp. 639-648, 2004.
An Analysis of Four Missing Data Treatment Methods for Supervised Learning. Applied Artificial Intelligence, vol. 17, pp. 519-533, 2003.
Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, vol. 39, pp. 1-38, 1977.
Relative Information Completeness.
Impact of Imputation of Missing Values on Classification Error for Discrete Data. Pattern Recognition, vol. 41, no. 9, pp. 3692-3705, 2008.
A SVM Regression Based Approach to Filling in Missing Values. Knowledge-Based Intelligent Information and Engineering Systems (KES05), LNCS 3683, pp. 581-587, 2005.
Cleaning Disguised Missing Data: A Heuristic Approach. Proc. KDD, 2007.
Towards Missing Data Imputation: A Study of Fuzzy K-means Clustering Method. Rough Sets and Current Trends in Computing, LNCS, vol. 3066, 2004.
Statistical Analysis with Missing Data. Population (French Edition), vol. 43, no. 6, 1987. DOI: 10.2307/1533221
Missing Data: A Gentle Introduction, 2007.
The Problem of Disguised Missing Data. SIGKDD Explorations, vol. 8, no. 1, pp. 83-92, 2006.
Analysis of Incomplete Multivariate Data, 1997.
Different Approaches to Fuzzy Clustering of Incomplete Datasets. International Journal of Approximate Reasoning, vol. 35, 2003.
Using Association Rules for Completing Missing Data. Proc. Hybrid Intelligent Systems, pp. 236-241.
Allison. Missing Data. Series: Quantitative Applications in the Social Sciences. Thousand Oaks, 2002.
Multiple Imputation for Missing Data: Concepts and New Development. Proc. Twenty-fifth Annual SAS Users Group International Conference, SAS Institute, 2000.
Multiple Imputation for Missing Data. Sociological Methods & Research, vol. 87, no. 3, pp. 301-309, 2000. DOI: 10.1080/01621459.1986.10478280