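A: $R_f \subseteq R \Rightarrow \mu(R_f, \varphi) \subseteq \mu(R, \varphi)$, because $\mu(R, \varphi) = \mu(R_f \cup R', \varphi) = \mu(R_f, \varphi) \cup \mu(R', \varphi)$.

$A \subseteq \mu(R_f, \varphi)$: Let $s \in A$, that is, $s \in \mu(R, \varphi)$ and $c(\pi_a(s)) = \text{true}$. So $\exists r \in R\ \exists n \in \mathbb{N}$ such that $\pi_a(s) = \pi_a(r)$ (*) and $s \in \varphi^{(n)}(\{r\})$. So $c(\pi_a(r)) = c(\pi_a(s)) = \text{true}$. So $r \in R_f$, which means $\mathrm{distinct}(\varphi^{(n)}(\{r\})) \subseteq \mu(R_f, \varphi)$ (see fixpoint related proofs in Appendix C.1).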

O. Repeat(x, count) = (R,empty,d,c) STEP (setDiff((SELECT DISTINCT (x, t) FROM (x,y) <-X, (z,t) <-R WHERE y == z), O), setUnion(O,X)

X. , O. , ). , and ). Step-(, SELECT DISTINCT (x, t,l+m) FROM (x,y,l) <-X, (z,t,m) <-R WHERE y == z).coalesce(p).cache(), setUnion(O,X), count, O.count()) UNTIL count == oldCount, UNTIL count == oldCount Shortest paths: REPEAT

). From-(x,y, City(n1,l1), cd) <-routes WHERE y == n1).coalesce(p).cache(), setUnion(O,X)

O. Repeat(x, count) = (startMovieNames,empty,d,c) STEP (setDiff((SELECT DISTINCT m FROM (lm,e) <-( SELECT (lu, x) from User(u,lu) <-userslocal, x <-X WHERE lu.map(_.name).contains(x)), Movie(m) <-lm ),O)