W. Handler, A. Bode, G. Fritsch, W. Henning, and J. Volkert, A tightly coupled and hierarchical multiprocessor architecture, Computer Physics Communications, pp.87-93, 1985.
DOI : 10.1016/0010-4655(85)90139-0

A. Van Tilborg and L. Wittie, Wave scheduling - decentralized scheduling of task forces in multicomputers, IEEE Trans. Computers, pp.835-844, 1984.

H. Sullivan, T. Bashkow, and D. Klappholz, A large-scale, homogeneous, fully distributed parallel machine, II, Proc. 4th Ann. Int'l Symp. Computer Architecture, pp.118-124, 1977.

. Watson, The design and prototyping of the PASM reconfigurable parallel processing system, Parallel Computing: Paradigms and Applications, pp.78-114, 1996.

G. J. Lipovski and M. Malek, Parallel computing: Theory and comparisons, 1987.

P. Duclos, F. Boeri, M. Auguin, and G. Giraudon, Image processing on a SIMD/SPMD architecture: OPSILA, [1988 Proceedings] 9th International Conference on Pattern Recognition, pp.430-433, 1988.
DOI : 10.1109/ICPR.1988.28259

X. Xu, A hierarchically-controlled SIMD machine for 2D DCT on FPGAs, SOC Conference, pp.276-279, 2005.

X. Wang and S. G. Ziavras, Exploiting mixed-mode parallelism for matrix operations on the HERA architecture through reconfiguration, IEE Proceedings - Computers and Digital Techniques, pp.249-260, 2006.
DOI : 10.1049/ip-cdt:20045136

H. Krichene, M. Baklouti, P. Marquet, J. L. Dekeyser, and M. Abid, Broadcast with mask on a massively parallel processing on a chip, 2012 International Conference on High Performance Computing & Simulation (HPCS), pp.275-280, 2012.
DOI : 10.1109/HPCSim.2012.6266924
URL : https://hal.archives-ouvertes.fr/hal-00688418

R. H. Arpaci, D. E. Culler, A. Krishnamurthy, S. G. Steinberg, and K. Yelick, Empirical evaluation of the CRAY-T3D: a compiler perspective, The 22nd annual international symposium on Computer architecture (ISCA95), pp.320-331, 1995.

D. S. Pierre, M. C. Wells, S. W. Wong-Chan, R. Yang, and . Zak, The network architecture of the Connection Machine CM-5, pp.145-158, 1996.

D. K. Panda, Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms, The 1st IEEE Symposium on High-Performance Computer Architecture (HPCA95), 1995.
DOI : 10.1016/0167-739X(95)00026-O

S. L. Scott, Synchronization and communication in the T3E multiprocessor, The 7th international conference on Architectural support for programming languages and operating systems: ASPLOS-VII, New York, pp.26-36, 1996.

R. Haskell and D. M. Hanna, A VHDL-Forth Core for FPGAs, Microprocessors and Microsystems, vol.28, issue.3, pp.115-125, 2004.
DOI : 10.1016/j.micpro.2004.01.002

M. Baklouti, A rapid design method of a massively parallel system on chip: from modeling to FPGA implementation, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00527894

P. Chen, K. Dai, D. Wu, J. Rao, and X. Zou, Parallel Algorithms for FIR Computation Mapped to ESCA Architecture, 2010 WASE International Conference on Information Engineering, pp.123-126, 2010.
DOI : 10.1109/ICIE.2010.37

J. Oberg and P. Ellervee, Revolver: a high-performance MIMD architecture for collision free computing, Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204), pp.301-308, 1998.
DOI : 10.1109/EURMIC.1998.711814

C. Xiaoyi, Y. Qingdong, and L. Peng, Data bypassing architecture and circuit design for 32-bit digital signal processor, Journal of Electronics, 2005.