Tradeoff Exploration between Reliability, Power Consumption, and Execution Time, Proceedings of Computer Safety, Reliability and Security Conference (SAFECOMP), 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00655478
Source code and data for tri-criteria scheduling, http://gaupy.org/tri-criteria-scheduling ,
Complexity and Approximation, 1999. ,
DOI : 10.1007/978-3-642-58412-1
URL : https://hal.archives-ouvertes.fr/hal-00906941
Energy-aware partitioning for multiprocessor real-time systems, Proceedings International Parallel and Distributed Processing Symposium, pp.113-121, 2003. ,
DOI : 10.1109/IPDPS.2003.1213225
Fault-tolerant platforms for automotive safety-critical applications, Proc. of Int. Conf. on Compilers, Architectures and Synthesis for Embedded Systems, pp.170-177, 2003. ,
Speed scaling to manage energy and temperature, Journal of the ACM, vol.54, issue.1, pp.1-39, 2007. ,
DOI : 10.1145/1206035.1206038
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.550.7426
An asynchronous power aware and adaptive NoC based circuit, Proceedings of the Symposium on VLSI Circuits, pp.190-191, 2008. ,
A Localized Power Control mixing hopping and Super Cut-Off techniques within a GALS NoC, 2008 IEEE International Conference on Integrated Circuit Design and Technology and Tutorial ,
DOI : 10.1109/ICICDT.2008.4567241
Silent error detection in numerical time-stepping schemes, International Journal of High Performance Computing Applications, vol.29, issue.4, 1312. ,
DOI : 10.1177/1094342014532297
Exascale computing study: Technology challenges in achieving exascale systems, Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Tech. Rep, 2008. ,
Scheduling Divisible Loads in Parallel and Distributed Systems, 1996. ,
Petascale computing: Impact on future NASA missions, pp.29-46, 2007. ,
Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009. ,
DOI : 10.1016/j.jpdc.2008.12.002
Unified model for assessing checkpointing protocols at extreme-scale, Concurrency and Computation: Practice and Experience, 2013. ,
DOI : 10.1002/cpe.3173
URL : https://hal.archives-ouvertes.fr/hal-00696154
Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011. ,
DOI : 10.1145/2063384.2063428
URL : https://hal.archives-ouvertes.fr/hal-00738504
A Flexible Checkpoint/Restart Model in Distributed Systems, International Conference on Parallel Processing and Applied Mathematics (PPAM), ser. LNCS, pp.206-215, 2010. ,
DOI : 10.1007/978-3-642-14390-8_22
URL : https://hal.archives-ouvertes.fr/hal-00788926
Complexity Analysis of Checkpoint Scheduling with Variable Costs, IEEE Transactions on Computers, vol.62, issue.6, 2012. ,
DOI : 10.1109/TC.2012.57
URL : https://hal.archives-ouvertes.fr/hal-00788101
Improving the Computing Efficiency of HPC Systems Using a Combination of Proactive and Preventive Checkpointing, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.501-512, 2013. ,
DOI : 10.1109/IPDPS.2013.74
Coordinated checkpoint versus message log for fault tolerant MPI, Proceedings IEEE International Conference on Cluster Computing CLUSTR-03, pp.242-250, 2003. ,
DOI : 10.1109/CLUSTR.2003.1253321
Convex Optimization, 2004. ,
Soft error vulnerability of iterative linear algebra methods, Proceedings of the 22nd annual international conference on Supercomputing, ICS '08, pp.155-164, 2008. ,
DOI : 10.1145/1375527.1375552
Soft Real-Time Systems: Predictability vs. Efficiency, Springer Series in Computer Science, 2005. ,
Preventive Migration vs. Preventive Checkpointing for Extreme Scale Supercomputers, Parallel Processing Letters, vol.21, issue.02, pp.111-132, 2011. ,
DOI : 10.1142/S0129626411000126
URL : https://hal.archives-ouvertes.fr/hal-00945068
Toward Exascale Resilience, International Journal of High Performance Computing Applications, vol.23, issue.4, pp.374-388, 2009. ,
DOI : 10.1177/1094342009347767
Proactive management of software aging, IBM Journal of Research and Development, vol.45, issue.2, pp.311-332, 2001. ,
DOI : 10.1147/rd.452.0311
Jouletrack: A web based tool for software energy profiling, Design Automation Conference, pp.220-225, 2001. ,
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems (TOCS), pp.63-75, 1985. ,
DOI : 10.1145/214451.214456
Reducing Power with Performance Constraints for Parallel Sparse Applications, 19th IEEE International Parallel and Distributed Processing Symposium, p.8, 2005. ,
DOI : 10.1109/IPDPS.2005.378
Multiprocessor energy-efficient scheduling for real-time tasks, Proceedings of International Conference on Parallel Processing, pp.13-20, 2005. ,
Energy-Efficient Scheduling for Real-Time Systems on Dynamic Voltage Scaling (DVS) Platforms, 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007), pp.28-38, 2007. ,
DOI : 10.1109/RTCSA.2007.37
Random graph generation for scheduling simulations, Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques, 2010. ,
DOI : 10.4108/ICST.SIMUTOOLS2010.8667
URL : https://hal.archives-ouvertes.fr/hal-00471255
Introduction to algorithms, 2009. ,
A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006. ,
DOI : 10.1016/j.future.2004.11.016
Soft errors issues in low-power caches, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.13, issue.10, pp.1157-1166, 2005. ,
DOI : 10.1109/TVLSI.2005.859474
Energy considerations in checkpointing and fault tolerance protocols, IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012), pp.1-6, 2012. ,
DOI : 10.1109/DSNW.2012.6264670
URL : https://hal.archives-ouvertes.fr/hal-00748006
Ecofit: A framework to estimate energy consumption of fault tolerance protocols for HPC applications, Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid), pp.522-529, 2013. ,
The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community, International Journal of High Performance Computing Applications, vol.23, issue.4, pp.309-322, 2009. ,
DOI : 10.1177/1094342009347714
Revisiting the Double Checkpointing Algorithm, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, 2013. ,
DOI : 10.1109/IPDPSW.2013.11
URL : https://hal.archives-ouvertes.fr/hal-00925168
Divisible load, in Scheduling for Parallel Processing, ser. Computer Communications and Networks, pp.301-365, 2009. ,
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002. ,
DOI : 10.1145/568522.568525
Evaluating the viability of process replication reliability for exascale systems, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011. ,
DOI : 10.1145/2063384.2063443
Detection and correction of silent data corruption for large-scale high-performance computing, Proceedings of the ACM/IEEE conference on SuperComputing (SC), 2012. ,
Predicting computer system failures using support vector machines, Proceedings of the First USENIX conference on Analysis of system logs. USENIX Association, 2008. ,
Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012. ,
DOI : 10.1109/IPDPS.2012.107
Fault prediction under the microscope: A closer look into HPC systems, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012. ,
DOI : 10.1109/SC.2012.57
Computers and Intractability: A Guide to the Theory of NP-Completeness, 1990. ,
Performance-constrained Distributed DVS Scheduling for Scientific Applications on Power-aware Clusters, ACM/IEEE SC 2005 Conference (SC'05), p.34, 2005. ,
DOI : 10.1109/SC.2005.57
On the Optimum Checkpoint Interval, Journal of the ACM, vol.26, issue.2, pp.259-270, 1979. ,
DOI : 10.1145/322123.322131
Performance of rollback recovery systems under intermittent failures, Communications of the ACM, vol.21, issue.6, pp.493-499, 1978. ,
DOI : 10.1145/359511.359531
Optimum checkpoints with age dependent failures, Acta Informatica, vol.27, issue.6, pp.519-531, 1990. ,
DOI : 10.1007/BF00277388
Reliability versus performance for critical applications, Journal of Parallel and Distributed Computing, vol.69, issue.3, pp.326-336, 2009. ,
DOI : 10.1016/j.jpdc.2008.11.002
URL : https://hal.archives-ouvertes.fr/hal-00753169
Energy dissipation in general purpose microprocessors, IEEE Journal of Solid-State Circuits, vol.31, issue.9, pp.1277-1284, 1996. ,
DOI : 10.1109/4.535411
Methods for power optimization in SOC-based data flow systems, ACM Transactions on Design Automation of Electronic Systems, vol.14, issue.3, pp.1-38, 2009. ,
DOI : 10.1145/1529255.1529260
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.989-1000, 2011. ,
DOI : 10.1109/IPDPS.2011.95
URL : https://hal.archives-ouvertes.fr/hal-01121937
Improving cluster availability using workstation validation, SIGMETRICS Perf. Eval. Rev, vol.30, issue.1, 2002. ,
DOI : 10.1145/511399.511362
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.8437
Modeling and tolerating heterogeneous failures in large parallel systems, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011. ,
DOI : 10.1145/2063384.2063444
Fault-tolerant iterative methods via selective reliability, Sandia National Laboratories, 2011. ,
On the choice of checkpoint interval using memory usage profile and adaptive time series analysis, Proceedings of the Pacific Rim International Symposium on Dependable Computing (PRDC), 2001. ,
Profile-based optimization of power performance by using dynamic voltage scaling on a PC cluster, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006. ,
DOI : 10.1109/IPDPS.2006.1639597
Algorithm-based fault tolerance for matrix operations, IEEE Transactions on Computers, vol.33, issue.6, pp.518-528, 1984. ,
Cosmic rays don't strike twice, ACM SIGARCH Computer Architecture News, vol.40, issue.1, pp.111-122, 2012. ,
DOI : 10.1145/2189750.2150989
Leakage aware dynamic voltage scaling for real-time embedded systems, Proceedings of the 41st annual conference on Design automation, DAC '04, pp.275-280, 2004. ,
DOI : 10.1145/996566.996650
Optimizing HPC Fault-Tolerant Environment: An Analytical Approach, 2010 39th International Conference on Parallel Processing, pp.525-534, 2010. ,
DOI : 10.1109/ICPP.2010.80
An LSI for VDD-Hopping and MPEG4 system based on the chip, Proceedings of the International Symposium on Circuits and Systems (ISCAS), 2001. ,
Superposition of renewal processes and an application to multi-server queues, Statistics & Probability Letters, vol.76, issue.17, pp.1914-1924, 2006. ,
DOI : 10.1016/j.spl.2006.04.041
Power Aware Scheduling of Bag-of-Tasks Applications with Deadline Constraints on DVS-enabled Clusters, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), pp.541-548, 2007. ,
DOI : 10.1109/CCGRID.2007.85
Exascale computing trends: Adjusting to the "new normal" in computer architecture, 2013. ,
Software rejuvenation: Analysis, module and applications, International Symposium on Fault-Tolerant Computing (FTCS), p.381, 1995. ,
The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp.398-407, 2010. ,
DOI : 10.1109/CCGRID.2010.71
URL : https://hal.archives-ouvertes.fr/inria-00433523
Battery-driven system design: a new frontier in low power design, Proceedings of ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conference and 15th International Conference on VLSI Design, pp.261-267, 2002. ,
DOI : 10.1109/ASPDAC.2002.994932
Leakage-Aware Multiprocessor Scheduling, Journal of Signal Processing Systems, vol.74, issue.8, pp.73-88, 2009. ,
DOI : 10.1007/s11265-008-0176-8
Fault-aware runtime strategies for high-performance computing, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.4, pp.460-473, 2009. ,
Failure Prediction in IBM BlueGene/L Event Logs, Seventh IEEE International Conference on Data Mining (ICDM 2007), pp.583-588, 2007. ,
DOI : 10.1109/ICDM.2007.46
A variational calculus approach to optimal checkpoint placement, IEEE Transactions on Computers, pp.699-708, 2001. ,
An optimal checkpoint/restart model for a large scale high performance computing system, Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), 2008. ,
When is multi-version checkpointing needed?, Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, FTXS '13, 2013. ,
DOI : 10.1145/2465813.2465821
The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962. ,
DOI : 10.1147/rd.62.0200
The interplay of power management and fault recovery in real-time systems, IEEE Transactions on Computers, vol.53, issue.2, 2003. ,
DOI : 10.1109/TC.2004.1261830
Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012. ,
DOI : 10.1109/SBAC-PAD.2012.12
A power supply selector for energy- and area-efficient local dynamic voltage scaling, in Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation, pp.556-565, 2007. ,
The internet begins with coal, Environment and Climate News, 1999. ,
Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005. ,
DOI : 10.1017/CBO9780511813603
Design, modeling, and evaluation of a scalable multi-level checkpointing system, Proceedings of the ACM/IEEE conference on SuperComputing (SC), pp.1-11, 2010. ,
Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm, 2012 IEEE International Conference on Cluster Computing, 2012. ,
DOI : 10.1109/CLUSTER.2012.82
Numerical Optimization, 2006. ,
DOI : 10.1007/b98874
Software energy reduction techniques for variable-voltage processors, IEEE Design & Test of Computers, vol.18, issue.2, pp.31-41, 2001. ,
DOI : 10.1109/54.914613
Modeling the Impact of Checkpoints on Next-Generation Systems, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007), pp.30-46, 2007. ,
DOI : 10.1109/MSST.2007.4367962
Fault-aware job scheduling for BlueGene/L systems, Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), pp.64-73, 2004. ,
DOI : 10.1109/ipdps.2004.1302991
Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle, IEEE Transactions on Dependable and Secure Computing, vol.3, issue.2, pp.130-140, 2006. ,
DOI : 10.1109/TDSC.2006.22
Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems, Journal of Parallel and Distributed Computing, vol.61, issue.11, p.1590, 2001. ,
DOI : 10.1006/jpdc.2001.1757
Scheduling and voltage scaling for energy/reliability trade-offs in fault-tolerant time-triggered embedded systems, Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, CODES+ISSS '07, pp.233-238, 2007. ,
DOI : 10.1145/1289816.1289873
Energy efficient scheduling techniques for real-time embedded systems, 2004. ,
Speed scaling of tasks with precedence constraints, Theory of Computing Systems, pp.67-80, 2008. ,
A 1 PB/s file system to checkpoint three million MPI tasks, Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, HPDC '13, pp.143-154, 2013. ,
DOI : 10.1145/2493123.2462908
Scheduling parallel programs assuming preallocation, 1995. ,
On the complexity of scheduling checkpoints for computational workflows, in Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS), 2012. ,
Introduction to Probability Models, Tenth Edition, 2009. ,
Principles of mathematical analysis, International Series in Pure and Applied Mathematics, 1976. ,
Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013. ,
DOI : 10.1145/2530268.2530272
Exascale software study: Software challenges in extreme scale systems, 2009. ,
Design and modeling of a non-blocking checkpointing system, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012. ,
DOI : 10.1109/SC.2012.46
A large-scale study of failures in high-performance computing systems, Proceedings of the International Conference on Dependable Systems and Networks (DSN), pp.249-258, 2006. ,
Exascale Computing Technology Challenges, International Conference on High Performance Computing for Computational Science (VECPAR), pp.1-25, 2011. ,
DOI : 10.1109/MM.2009.5
Models and algorithms for reliability-oriented task-allocation in redundant distributed-computer systems, IEEE Transactions on Reliability, vol.38, issue.1, pp.16-27, 1989. ,
DOI : 10.1109/24.24570
Temperature-aware microarchitecture, ACM Transactions on Architecture and Code Optimization, vol.1, issue.1, pp.94-125, 2004. ,
DOI : 10.1145/980152.980157
Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms, 1998. ,
DOI : 10.1007/978-1-4615-5535-3
On the Optimum Checkpoint Selection Problem, SIAM Journal on Computing, vol.13, issue.3, pp.630-649, 1984. ,
DOI : 10.1137/0213039
Modeling Coordinated Checkpointing for Large-Scale Supercomputers, 2005 International Conference on Dependable Systems and Networks (DSN'05), pp.812-821, 2005. ,
DOI : 10.1109/DSN.2005.67
Towards Energy Aware Scheduling for Precedence Constrained Parallel Tasks in a Cluster with DVFS, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp.368-377, 2010. ,
DOI : 10.1109/CCGRID.2010.19
Checkpoint space reclamation for uncoordinated checkpointing in message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.6, issue.5, pp.546-554, 1995. ,
DOI : 10.1109/71.382324
Overcoming The Difficulties Created by the Volatile Nature of Desktop Grids Through Understanding, Prediction and Redundancy, 2009. ,
On-Line and Off-Line DVS for Fixed Priority with Preemption Threshold Scheduling, 2009 International Conference on Embedded Software and Systems, pp.273-280, 2009. ,
DOI : 10.1109/ICESS.2009.50
A scheduling model for reduced CPU energy, Proceedings of IEEE 36th Annual Foundations of Computer Science, p.374, 1995. ,
DOI : 10.1109/SFCS.1995.492493
A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974. ,
DOI : 10.1145/361147.361115
Practical online failure prediction for BlueGene/P: Period-based vs event-driven, Proceedings of the International Conference on Dependable Systems and Networks Workshops, pp.259-264, 2011. ,
DOI : 10.1109/dsnw.2011.5958823
Energy-aware adaptive checkpointing in embedded real-time systems, 2003 Design, Automation and Test in Europe Conference and Exhibition, p.10918, 2003. ,
DOI : 10.1109/DATE.2003.1253723
Task scheduling and voltage selection for energy minimization, Proceedings of the 39th conference on Design automation, DAC '02, pp.183-188, 2002. ,
DOI : 10.1145/513918.513966
A scalable double in-memory checkpoint and restart scheme towards exascale, IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012), 2012. ,
DOI : 10.1109/DSNW.2012.6264677
FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI, Cluster Computing, 2004. ,
Reliability-aware scalability models for high performance computing, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009. ,
DOI : 10.1109/CLUSTR.2009.5289177
A practical failure prediction with location and lead time for BlueGene/P, Proceedings of the International Conference on Dependable Systems and Networks Workshops, pp.15-22, 2010. ,
Reliability-aware dynamic energy management in dependable embedded real-time systems, Real-Time and Embedded Technology and Applications Symposium, pp.397-407, 2006. ,
DOI : 10.1145/1880050.1880062
Energy management for real-time embedded systems with reliability requirements, Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design, ICCAD '06, pp.528-534, 2006. ,
DOI : 10.1145/1233501.1233608
The effects of energy management on reliability in real-time embedded systems, Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.35-40, 2004. ,
Reclaiming the energy of a schedule: models and algorithms, Concurrency and Computation: Practice and Experience, pp.1505-1523, 2013. ,
DOI : 10.1002/cpe.2889
URL : https://hal.archives-ouvertes.fr/inria-00584944
Power-aware replica placement in tree networks with multiple servers per client, Sustainable Computing: Informatics and Systems, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01059365
On the number of binary-minded individuals required to compute 1 ,
Checkpointing algorithms and fault prediction, Journal of Parallel and Distributed Computing, vol.74, issue.2, pp.2048-2064, 2014. ,
DOI : 10.1016/j.jpdc.2013.10.010
URL : https://hal.archives-ouvertes.fr/hal-00788313
Speed Scaling to Manage Temperature, Theory and Practice of Algorithms in (Computer) Systems (TAPAS), pp.9-20, 2011. ,
DOI : 10.1007/978-3-642-19754-3_4
URL : https://hal.archives-ouvertes.fr/hal-00786200
Energy-aware scheduling under reliability and makespan constraints, 2012 19th International Conference on High Performance Computing, 2012. ,
DOI : 10.1109/HiPC.2012.6507482
URL : https://hal.archives-ouvertes.fr/hal-00763384
Brief announcement, Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA '11, pp.135-136, 2011. ,
DOI : 10.1145/1989493.1989512
URL : https://hal.archives-ouvertes.fr/hal-00857268
Optimal Checkpointing Period: Time vs. Energy, Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), ser. LNCS, 2013. ,
DOI : 10.1007/978-3-319-10214-6_10
URL : https://hal.archives-ouvertes.fr/hal-00926199
On the Combination of Silent Error Detection and Checkpointing, 2013 IEEE 19th Pacific Rim International Symposium on Dependable Computing, 2013. ,
DOI : 10.1109/PRDC.2013.10
URL : https://hal.archives-ouvertes.fr/hal-00836871
Power-aware replica placement in tree networks with multiple servers per client, Proceedings of Euro-Par: Parallel Processing, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01059365
Energy-aware checkpointing of divisible tasks with soft or hard deadlines, 2013 International Green Computing Conference Proceedings, pp.1-8, 2013. ,
DOI : 10.1109/IGCC.2013.6604467
URL : https://hal.archives-ouvertes.fr/hal-00857244
Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC, Workshop on Productivity and Performance (PROPER), 2013. ,
DOI : 10.1007/978-3-642-54420-0_64
URL : https://hal.archives-ouvertes.fr/hal-00879248
Checkpointing strategies with prediction windows Approximation algorithms for energy, reliability and makespan optimization problems, Proceedings of the Pacific Rim International Symposium on Dependable Computing (PRDC), 2012. ,
Reclaiming the energy of a schedule: models and algorithms, Concurrency and Computation: Practice and Experience, vol.24, issue.9, 2011. ,
DOI : 10.1002/cpe.2889
URL : https://hal.archives-ouvertes.fr/inria-00584944
Power-aware replica placement in tree networks with multiple servers per client, INRIA, Research Report RR-8474, 2014. ,
Impact of fault prediction on checkpointing strategies, INRIA, Tech. Rep. RR-8023 (superseded by RR-8237 and RR-8239, which together cover the whole of this report more precisely), 2012. ,
Co-scheduling algorithms for high-throughput workload execution, Journal of Scheduling, vol.23, issue.2, 2013. ,
DOI : 10.1109/DATE.2012.6176641
URL : https://hal.archives-ouvertes.fr/hal-01252366
Scheduling the I/O of HPC applications under congestion, INRIA, Research Report RR-8519, 2014. ,