M. Atzmueller, S. Oussena, and T. Roth-Berghofer, Data Preparation for Big Data Analytics: Methods and Experiences, Enterprise Big Data Engineering, Analytics, and Management, pp.157-170, 2016.

S. Kandel, J. Heer, C. Plaisant, J. Kennedy, F. van Ham et al., Research directions on data wrangling: Visualizations and transformations, Information Visualization, pp.271-288, 2011.
DOI : 10.1177/1473871611415994
URL : https://napier-surface.worktribe.com/202874/1/KandelEtAlDataWrangling.pdf

S. Krishnan, M. J. Franklin, K. Goldberg, and E. Wu, ActiveClean, Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, 2016.
DOI : 10.14778/1952376.1952378

W. McKinney, pandas: a Foundational Python Library for Data Analysis and Statistics, NEM (Networked & Electronic Media), 2011.

C. J. Jackson, V. Vijayakumar, A. M. Quadir, and C. Bharathi, Survey on Programming Models and Environments for Cluster, Cloud, and Grid Computing that Defends Big Data, Procedia Computer Science (2nd International Symposium on Big Data and Cloud Computing, ISBCC'15), pp.517-523, 2015.
DOI : 10.1016/j.procs.2015.04.025
URL : https://doi.org/10.1016/j.procs.2015.04.025

S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer, Enterprise Data Analysis and Visualization: An Interview Study, IEEE Transactions on Visualization and Computer Graphics, vol.18, issue.12, pp.2917-2926, 2012.
DOI : 10.1109/TVCG.2012.219
URL : http://vis.stanford.edu/files/2012-EnterpriseAnalysisInterviews-VAST.pdf

D. Sukhobok, N. Nikolov, A. Pultier, X. Ye, A. Berre et al., Tabular Data Cleaning and Linked Data Generation with Grafterizer, ESWC, vol.59, issue.10, pp.134-139, 2016.
DOI : 10.1007/978-3-642-41338-4_11

D. Roman, N. Nikolov, A. Pultier, D. Sukhobok, B. Elvesaeter et al., DataGraft: One-stop-shop for open data management, To appear in the Semantic Web Journal (SWJ): Interoperability, ISSN 1570-0844.
DOI : 10.3233/SW-150191

D. Roman, M. Dimitrov, N. Nikolov, A. Pultier, B. Elvesaeter et al., DataGraft: A Platform for Open Data Publishing, Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop, 2016.
DOI : 10.1007/978-3-319-47602-5_21

J. Wang, D. Crawl, I. Altintas, K. Tzoumas, and V. Markl, Comparison of Distributed Data-Parallelization Patterns for Big Data Analysis: A Bioinformatics Case Study, Proceedings of the Fourth International Workshop on Data Intensive Computing in the Clouds (DataCloud), 2013.

J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. Bae et al., Twister, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, 2010.
DOI : 10.1145/1851476.1851593

M. Bala, O. Boussaid, and Z. Alimazighi, Big-ETL: Extracting-Transforming-Loading Approach for Big Data, Proceedings of International Conference on Parallel and Distributed Processing Techniques and Applications, 2015.
DOI : 10.4018/ijdsst.2016100104

A. Krukowski, Y. Kompatsiaris, and S. Papadopoulos, Big and Open Data Position Paper, 2013.

T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. Lax et al., The dataflow model, Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, pp.1792-1803, 2015.
DOI : 10.14778/2824032.2824076

M. Sims, J. F. Kurose, and V. R. Lesser, Streaming versus batch processing of sensor data in a hazardous weather detection system, Second Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (IEEE SECON 2005), 2005.
DOI : 10.1109/SAHCN.2005.1557074

S. Shahrivari, Beyond Batch Processing: Towards Real-Time and Streaming Big Data, Computers, vol.3, issue.4, pp.117-129, 2014.
DOI : 10.3390/computers3040117
URL : https://doi.org/10.3390/computers3040117

T. Furche, G. Gottlob, B. Neumayr, and E. Sallinger, Data wrangling for big data: Towards a lingua franca for data wrangling, 2016.

M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu et al., Spark SQL, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.1383-1394, 2015.
DOI : 10.1145/2723372.2742797

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Resilient Distributed Datasets, Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, pp.2-2, 2012.
DOI : 10.1145/2886107.2886110

D. Sukhobok, N. Nikolov, and D. Roman, Tabular Data Anomaly Patterns, 3rd International Conference on Big Data Innovations and Applications (Innovate-Data 2017), 2017.

S. Riazi, SparkGalaxy: Workflow-based Big Data Processing, 2016.

H. Wang, M. Li, Y. Bu, J. Li, H. Gao et al., Cleanix, ACM SIGMOD Record, vol.44, issue.4, pp.35-40, 2016.
DOI : 10.1007/978-3-642-14246-8_68

M. Kaur and G. Dhaliwal, Performance Comparison of Map Reduce and Apache Spark on, International Journal of Computer Sciences and Engineering, vol.3, issue.11, 2015.