Data Preparation for Big Data Analytics: Methods and Experiences, Enterprise Big Data Engineering, Analytics, and Management, pp.157-170, 2016. ,
Research directions on data wrangling: Visualizations and transformations, Information Visualization, pp.271-288, 2011. ,
DOI : 10.1177/1473871611415994
URL : https://napier-surface.worktribe.com/202874/1/KandelEtAlDataWrangling.pdf
ActiveClean, Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, 2016. ,
DOI : 10.14778/1952376.1952378
pandas: a Foundational Python Library for DataAnalysis and Statistics, NEM (Networked & Electronic Media), 2011. ,
Survey on Programming Models and Environments for Cluster, Cloud, and Grid Computing that Defends Big Data, Procedia Computer Science ( 2nd International Symposium on Big Data and Cloud Computing (ISBCC'15), pp.517-523, 2015. ,
DOI : 10.1016/j.procs.2015.04.025
URL : https://doi.org/10.1016/j.procs.2015.04.025
Enterprise Data Analysis and Visualization: An Interview Study, IEEE Transactions on Visualization and Computer Graphics, vol.18, issue.12, pp.2917-2926, 2012. ,
DOI : 10.1109/TVCG.2012.219
URL : http://vis.stanford.edu/files/2012-EnterpriseAnalysisInterviews-VAST.pdf
Tabular Data Cleaning and Linked Data Generation with Grafterizer, ESWC, vol.59, issue.10, pp.134-139, 2016. ,
DOI : 10.1007/978-3-642-41338-4_11
DataGraft: One-stop-shop for open data management1, To appear in the Semantic Web Journal (SWJ) ? Interoperability, pp.1570-0844 ,
DOI : 10.3233/SW-150191
DataGraft: A Platform for Open Data Publishing, the Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop, 2016. ,
DOI : 10.1007/978-3-319-47602-5_21
Comparison of Distributed Data- Parallelization Patterns for Big Data Analysis: A Bioinformatics Case Study, Proceedings of the Fourth International Workshop on Data Intensive Computing in the Clouds (DataCloud), 2013. ,
Twister, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, 2010. ,
DOI : 10.1145/1851476.1851593
Big-ETL: Extracting-Transforming-Loading Approach for Big Data, Proceedings of International Conference on Parallel and Distributed Processing Techniques and Applications, 2015. ,
DOI : 10.4018/ijdsst.2016100104
Big and Open Data Position Paper, 2013. ,
The dataflow model, Proceedings of the 41st International Conference on Very Large Data Bases Kohala Coast, pp.1792-1803, 2015. ,
DOI : 10.14778/2824032.2824076
Streaming versus batch processing of sensor data in a hazardous weather detection system, 2005 Second Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2005. IEEE SECON 2005., 2005. ,
DOI : 10.1109/SAHCN.2005.1557074
Beyond Batch Processing: Towards Real-Time and Streaming Big Data, Computers, vol.3, issue.4, pp.117-129, 2014. ,
DOI : 10.1007/s11227-014-1185-y
URL : https://doi.org/10.3390/computers3040117
Data wrangling for big data: Towards a lingua franca for data wrangling, 2016. ,
Spark SQL, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.1383-1394, 2015. ,
DOI : 10.1007/3-540-59451-5_2
Resilient Distributed Datasets, Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, pp.2-2, 2012. ,
DOI : 10.1145/2886107.2886110
Tabular Data Anomaly Patterns, 3rd International Conference on Big Data Innovations and Applications. Innovate-Data 2017, 2017. ,
SparkGalaxy: Workflow-based Big Data Processing, 2016. ,
Cleanix, ACM SIGMOD Record, vol.44, issue.4, pp.35-40, 2016. ,
DOI : 10.1007/978-3-642-14246-8_68
Performance Comparison of Map Reduce and Apache Spark on, International Journal of Computer Sciences and Engineering, vol.3, issue.11, 2015. ,