J.-F. Abramatic, R. Di Cosmo, and S. Zacchiroli, Building the Universal Archive of Source Code, Commun. ACM, vol.61, pp.29-31, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02157125

M. Biazzini and B. Baudry, May the fork be with you: novel metrics to analyze collaboration on GitHub, Proceedings of the 5th International Workshop on Emerging Trends in Software Metrics. ACM, pp.37-43, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01085400

P. Boldi, A. Pietri, S. Vigna, and S. Zacchiroli, Ultra-Large-Scale Repository Analysis via Graph Compression, SANER 2020: The 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2020.

P. Boldi and S. Vigna, The webgraph framework I: compression techniques, Proceedings of the 13th international conference on World Wide Web, pp.595-602, 2004.

P. Boldi and S. Vigna, The WebGraph Framework II: Codes For The World-Wide Web, Data Compression Conference, vol.528, pp.23-25, 2004.

L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb, Leveraging transparency, IEEE Software, vol.30, pp.37-43, 2012.

L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb, Social coding in GitHub: transparency and collaboration in an open software repository, Proceedings of the ACM 2012 conference on computer supported cooperative work, pp.1277-1286, 2012.

R. Di Cosmo, M. Gruenpeter, and S. Zacchiroli, Identifiers for Digital Objects: the Case of Software Source Code Preservation, Proceedings of the 15th International Conference on Digital Preservation, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01865790

R. Di Cosmo and S. Zacchiroli, Software Heritage: Why and How to Preserve Software Source Code, Proceedings of the 14th International Conference on Digital Preservation, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01590958

K. Fogel, Producing open source software: How to run a successful free software project, 2005.

G. Gousios, M. Pinzger, and A. van Deursen, An exploratory study of the pull-based software development model, Proceedings of the 36th International Conference on Software Engineering, pp.345-355, 2014.

G. Gousios and D. Spinellis, GHTorrent: Github's data from a firehose, 9th IEEE Working Conference of Mining Software Repositories, pp.12-21, 2012.

G. Gousios, A. Zaidman, M. Storey, and A. van Deursen, Work practices and challenges in pull-based development: the integrator's perspective, Proceedings of the 37th International Conference on Software Engineering, vol.1, pp.358-368, 2015.

Open Source Systems: Long-Term Sustainability, 8th IFIP WG 2.13 International Conference, vol.378, 2012.

J. Jiang, D. Lo, J. He, X. Xia, P. S. Kochhar et al., Why and how developers fork what from whom in GitHub, Empirical Software Engineering, vol.22, pp.547-578, 2017.

E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German et al., The promises and perils of mining GitHub, Proceedings of the 11th working conference on mining software repositories, pp.92-101, 2014.

A. Lima, L. Rossi, and M. Musolesi, Coding together at scale: GitHub as a collaborative social network, Eighth International AAAI Conference on Weblogs and Social Media, 2014.

Y. Ma, C. Bogart, S. Amreen, R. Zaretzki, and A. Mockus, World of code: an infrastructure for mining the universe of open source VCS data, Proceedings of the 16th International Conference on Mining Software Repositories, pp.143-154, 2019.

R. C. Merkle, A Digital Signature Based on a Conventional Encryption Function, Advances in Cryptology - CRYPTO '87, A Conference on the Theory and Applications of Cryptographic Techniques, vol.293, pp.369-378, 1987.

L. Nyman, Hackers on Forking, Proceedings of The International Symposium on Open Collaboration, D. Riehle, J. M. González-Barahona, G. Robles, K. M. Möslein, I. Schieferdecker, and U. Cress (eds.), ACM, article 6, pp.1-6, 2014.

L. Nyman and M. Laakso, Notes on the History of Fork and Join, IEEE Annals of the History of Computing, vol.38, pp.84-87, 2016.

L. Nyman and T. Mikkonen, To Fork or Not to Fork: Fork Motivations in SourceForge Projects, IJOSSP, vol.3, pp.1-9, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01570771

L. Nyman, T. Mikkonen, J. Lindman, and M. Fougère, Perspectives on Code Forking and Sustainability in Open Source Software, pp.274-279, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01519038

R. Padhye, S. Mani, and V. S. Sinha, A study of external community contribution to open-source projects on GitHub, Proceedings of the 11th Working Conference on Mining Software Repositories, pp.332-335, 2014.

A. Pietri, D. Spinellis, and S. Zacchiroli, The Software Heritage graph dataset: public software development under one roof, Proceedings of the 16th International Conference on Mining Software Repositories, MSR 2019, pp.138-142, 2019.

A. Rastogi and N. Nagappan, Forking and the Sustainability of the Developer Community Participation-An Empirical Investigation on Outcomes and Reasons, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol.1, pp.102-111, 2016.

D. Rattan, R. Bhatia, and M. Singh, Software clone detection: A systematic review, Information and Software Technology, vol.55, pp.1165-1199, 2013.

G. Robles and J. M. González-Barahona, A Comprehensive Study of Software Forks: Dates, Reasons and Outcomes, pp.1-14, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01519079

G. Rousseau, R. Di Cosmo, and S. Zacchiroli, Growth and Duplication of Public Source Code over Time: Provenance Tracking at Scale, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02158292

C. K. Roy and J. R. Cordy, A survey on software clone detection research, 2007.

D. Spinellis, Version control systems, IEEE Software, vol.22, pp.108-109, 2005.

S. Stanciulescu, S. Schulze, and A. Wasowski, Forked and integrated variants in an open-source firmware project, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp.151-160, 2015.

F. Thung, T. F. Bissyandé, D. Lo, and L. Jiang, Network structure of social coding in GitHub, 2013 17th European Conference on Software Maintenance and Reengineering, pp.323-326, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00790772

J. Tsay, L. Dabbish, and J. Herbsleb, Influence of social and technical factors for evaluating contribution in GitHub, Proceedings of the 36th international conference on Software engineering, pp.356-366, 2014.

Y. Yu, H. Wang, V. Filkov, P. Devanbu, and B. Vasilescu, Wait for it: determinants of pull request evaluation latency on GitHub, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp.367-371, 2015.

S. Zhou, B. Vasilescu, and C. Kästner, What the fork: a study of inefficient and efficient forking practices in social coding, Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp.350-361, 2019.