TRANQUIL: a language for an array processing computer, AFIPS. ACM, pp.57-73, 1969. ,
Compilers: Principles, Techniques, and Tools, 2006.
Barrier inference, POPL, pp.342-354, 1998. ,
SSA is functional programming, SIGPLAN Notices, vol.33, pp.17-20, 1998. ,
An adaptive performance modeling tool for GPU architectures, pp.105-114, 2010. ,
A study of replacement algorithms for a virtual storage computer, IBM Systems Journal, vol.5, pp.78-101, 1966. ,
Vcode: A data-parallel intermediate language, FMPC. ACM, pp.471-480, 1990. ,
Memory system on Fusion APUs, AMD Fusion Developer Summit. AMD, 2011. ,
Control structures for data-parallel SIMD languages: semantics and implementation, Future Generation Computer Systems, vol.8, pp.363-378, 1992. ,
The Illiac IV system, Proceedings of the IEEE, vol.60, no.4, pp.369-388, 1972. ,
Rematerialization, PLDI. ACM, pp.311-321, 1992.
Efficient oblivious parallel sorting on the MasPar MP-1. ICSS 1, 0200.
Fast copy coalescing and live-range identification, PLDI. ACM, pp.25-32, 2002. ,
Static memory access pattern analysis on a massively parallel GPU ,
A control-structure splitting optimization for GPGPU, Computing frontiers, pp.147-150, 2009. ,
GPU-quicksort: A practical quicksort algorithm for graphics processors, Journal of Experimental Algorithmics, vol.14, pp.4-24, 2009. ,
Rodinia: A benchmark suite for heterogeneous computing, pp.44-54, 2009. ,
Automatic construction of sparse data flow evaluation graphs, POPL. ACM, pp.55-66, 1991. ,
Dynamic detection of uniform and affine vectors in GPGPU computations, HPPC, pp.46-55, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00396719
Automatic discovery of linear restraints among variables of a program, POPL. ACM, pp.84-96, 1978. ,
Divergence analysis and optimizations, PACT. IEEE, pp.320-329, 2011. ,
Profiling divergences in GPU applications, Concurrency and Computation: Practice and Experience, vol.25, pp.775-789, 2013. ,
Efficiently computing static single assignment form and the control dependence graph, TOPLAS, vol.13, pp.451-490, 1991. ,
A single-program-multiple-data computational model for EPEX/FORTRAN, Parallel Computing, vol.7, pp.11-24, 1988. ,
Ocelot, a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems, PACT. IEEE, pp.354-364, 2010. ,
Formal specification of parallel SIMD execution, Theo. Comp. Science, vol.169, pp.39-65, 1996. ,
The program dependence graph and its use in optimization, TOPLAS, vol.9, pp.319-349, 1987. ,
Some computer organizations and their effectiveness, IEEE Trans. Comput. C, vol.21, 1972. ,
Dynamic warp formation and scheduling for efficient GPU control flow, pp.407-420, 2007. ,
Understanding throughput-oriented architectures, Commun. ACM, vol.53, pp.58-66, 2010. ,
Addressing GPU on-chip shared memory bank conflicts using elastic pipeline, International Journal of Parallel Programming, vol.41, pp.400-429, 2013. ,
Variance analysis for translating CUDA code for execution by a general purpose processor, 2009. ,
Optimal register allocation for SSA-form programs in polynomial time, Information Processing Letters, vol.98, pp.150-155, 2006. ,
Reducing branch divergence in GPU programs, GPGPU-4. ACM, vol.3, pp.1-3, 2011. ,
Whole-function vectorization, CGO. IEEE, pp.141-150, 2011. ,
Improving performance of OpenCL on CPUs, CC, pp.1-20, 2012. ,
POMP, or how to design a massively parallel machine with small developments, PARLE, pp.83-100, 1991. ,
URL : https://hal.archives-ouvertes.fr/hal-01166357
Wavefront array processor: Language, architecture, and applications, IEEE Trans. Comput, vol.31, pp.1054-1066, 1982. ,
Performance in GPU architectures: Potentials and distances, WDDD. IEEE, pp.75-81, 2011. ,
Glypnir-a programming language for Illiac IV, Commun. ACM, vol.18, pp.157-164, 1975. ,
OpenMP to GPGPU: a compiler framework for automatic translation and optimization, PPoPP. ACM, pp.101-110, 2009. ,
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators, ISCA. ACM, pp.129-140, 2011. ,
Convergence and scalarization for data-parallel architectures, CGO. ACM, pp.1-11, 2013. ,
Extending a C-like language for portable SIMD programming, PPoPP. ACM, pp.65-74, 2012. ,
Dynamic warp subdivision for integrated branch and memory divergence tolerance, ISCA. ACM, pp.235-246, 2010. ,
The octagon abstract domain, Higher Order Symbol. Comput, vol.19, pp.31-100, 2006. ,
IP routing processing with graphic processors, pp.93-98, 2010. ,
The GPU computing era, IEEE Micro, vol.30, pp.56-69, 2010. ,
Computer Organization and Design, (Patterson and Hennessy) 4th Ed, 2009. ,
Principles of program analysis, 2005. ,
The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages, PLDI. ACM, pp.257-271, 1990. ,
A language for array and vector processors, TOPLAS, vol.1, pp.177-195, 1979. ,
ISPC: a SPMD compiler for high-performance CPU programming, 2012. ,
Linear scan register allocation, TOPLAS, vol.21, pp.895-913, 1999. ,
EigenCFA: Accelerating flow analysis with GPUs, 2011. ,
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, pp.73-82, 2008. ,
Programming model for a heterogeneous x86 platform, PLDI. ACM, pp.431-440, 2009. ,
Adaptive input-aware compilation for graphics engines, 2012. ,
Divergence analysis with affine constraints, SBAC-PAD, pp.137-146, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00650235
Spill code placement for SIMD machines, SBLP. SBC, pp.12-26, 2012. ,
CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences, PPoPP. ACM, pp.137-146, 2010. ,
User-input dependence analysis via graph reachability, 2008. ,
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs, pp.111-119, 2010. ,
The parboil report, 2012. ,
Efficient building and placing of gating functions, PLDI. ACM, pp.47-55, 1995. ,
Program slicing, ICSE. IEEE, pp.439-449, 1981. ,
A GPGPU compiler for memory optimization and parallelism management, PLDI. ACM, pp.86-97, 2010. ,
Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping, ICS. ACM, pp.115-126, 2010. ,
On-the-fly elimination of dynamic irregularities for GPU computing, ASPLOS. ACM, pp.369-380, 2011. ,
A quantitative performance analysis model for GPU architectures, HPCA. ACM, pp.382-393, 2011. ,