N. Adami, R. Leonardi, and P. Migliorati, An overview of multi-modal techniques for the characterization of sport programmes, Proc. of SPIE-VCIP'03, pp.1296-1306, 2003.

A. A. Alatan, A. N. Akansu, and W. Wolf, Multi-modal dialog scene detection using hidden Markov models for content-based multimedia indexing, Multimedia Tools and Applications, vol.14, issue.2, pp.137-151, 2001.
DOI : 10.1023/A:1011395131992

J. Assfalg, M. Bertini, A. Del Bimbo, W. Nunziati, and P. Pala, Soccer highlights detection and recognition using HMMs, Proceedings. IEEE International Conference on Multimedia and Expo, pp.825-828, 2002.
DOI : 10.1109/ICME.2002.1035909

N. Babaguchi, Y. Kawai, and T. Kitahashi, Event based video indexing by intermodal collaboration, Proceedings of IEEE International Conference on Multimedia Computing and Systems (ICMCS'99), pp.782-786, 1999.

N. Babaguchi, Y. Kawai, and T. Kitahashi, Event based indexing of broadcasted sports video by intermodal collaboration, IEEE Transactions on Multimedia, vol.4, issue.1, pp.68-75, 2002.
DOI : 10.1109/6046.985555

N. Babaguchi and N. Nitta, Intermodal collaboration: a strategy for semantic content analysis for broadcasted sports video, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429), pp.13-16, 2003.
DOI : 10.1109/ICIP.2003.1246886

T. M. Bae, S. H. Jin, and Y. M. Ro, Video Segmentation Using Hidden Markov Model with Multimodal Features, Proceedings of the International Conference on Image and Video Retrieval, pp.401-409, 2004.
DOI : 10.1007/978-3-540-27814-6_48

S. Bengio, An asynchronous hidden Markov model for audio-visual speech recognition, Advances in Neural Information Processing Systems, NIPS 15, pp.1237-1244, 2003.

A. Berger, S. A. Della Pietra, and V. J. Della Pietra, A maximum entropy approach to natural language processing, Computational Linguistics, vol.22, issue.1, pp.39-71, 1996.

M. Betser and G. Gravier, Multiple events tracking in sound tracks, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763), 2004.
DOI : 10.1109/ICME.2004.1394646

M. Betser, G. Gravier, and R. Gribonval, Extraction of information from video sound tracks - Can we detect simultaneous events?, Proc. of Conference on Content-Based Multimedia Indexing, pp.71-78, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00576209

J. Boreczky and L. Wilcox, A hidden Markov model framework for video segmentation using audio and image features, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.3741-3744, 1998.
DOI : 10.1109/ICASSP.1998.679697

H. Bourlard and S. Dupont, A new ASR approach based on independent processing and recombination of partial frequency bands, Proc. ICSLP '96, pp.426-429, 1996.

P. Bouthemy, M. Gelgon, and F. Ganansia, A unified approach to shot change detection and camera motion characterization, IEEE Transactions on Circuits and Systems for Video Technology, vol.9, issue.7, pp.1030-1044, 1999.
DOI : 10.1109/76.795057
URL : https://hal.archives-ouvertes.fr/hal-00450210

M. Brand, N. Oliver, and A. Pentland, Coupled hidden Markov models for complex action recognition, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.994-999, 1997.
DOI : 10.1109/CVPR.1997.609450
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.611

R. Brunelli, O. Mich, and C. M. Modena, A Survey on the Automatic Indexing of Video Data, Journal of Visual Communication and Image Representation, vol.10, issue.2, pp.78-112, 1999.
DOI : 10.1006/jvci.1997.0404

J. Calic, N. Campbell, S. Dasiopoulou, and Y. Kompatsiaris, An overview of multimodal video representation for semantic analysis, Proceedings of the European Workshop on the Integration of Knowledge, Semantics and Digital Media Technologies, 2005.

P. Chang, M. Han, and Y. Gong, Extract highlights from baseball game video with hidden Markov models, Proceedings. International Conference on Image Processing, pp.609-612, 2002.
DOI : 10.1109/ICIP.2002.1038097

Y. Chang, W. Zeng, I. Kamel, and R. Alonso, Integrated image and speech analysis for content-based video indexing, Proceedings of the International Conference on Multimedia Computing and Systems, pp.306-313, 1996.

S. Chen, M. Shyu, M. Chen, and C. Zhang, A multimodal data mining framework for soccer goal detection based on decision tree logic, Proceedings of the IEEE International Conference on Multimedia and Expo, pp.265-268, 2004.
DOI : 10.1504/IJCAT.2006.012001

F. Coldefy, P. Bouthemy, M. Betser, and G. Gravier, Tennis video abstraction from audio and visual cues, IEEE 6th Workshop on Multimedia Signal Processing, 2004.
DOI : 10.1109/MMSP.2004.1436457
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.6305

R. Dahyot, A. Kokaram, N. Rea, and H. Denman, Joint audio visual retrieval for tennis broadcasts, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), pp.561-564, 2003.
DOI : 10.1109/ICASSP.2003.1199536

J. N. Darroch and D. Ratcliff, Generalized Iterative Scaling for Log-Linear Models, The Annals of Mathematical Statistics, vol.43, issue.5, pp.1470-1480, 1972.
DOI : 10.1214/aoms/1177692379

N. Dimitrova, L. Agnihotri, and G. Wei, Video classification based on HMM using text and faces, Proceedings of the European Signal Processing Conference, 2000.

S. Eickeler, A. Kosmala, and G. , Hidden Markov model based continuous online gesture recognition, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170), pp.1206-1208, 1998.
DOI : 10.1109/ICPR.1998.711914
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.319.6156

S. Eickeler and S. Muller, Content-based video indexing of TV broadcast news using hidden Markov models, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), pp.2997-3000, 1999.
DOI : 10.1109/ICASSP.1999.757471

A. M. Ferman and A. M. Tekalp, Probabilistic analysis and extraction of video content, Proceedings of the IEEE International Conference on Image Processing (ICIP 99), 1999.

S. Fine, Y. Singer, and N. Tishby, The hierarchical hidden Markov model: Analysis and applications, Machine Learning, pp.41-62, 1998.

S. Fischer, R. Lienhart, and W. Effelsberg, Automatic recognition of film genres, Proceedings of the third ACM international conference on Multimedia, MULTIMEDIA '95, pp.295-304, 1995.
DOI : 10.1145/217279.215283

U. Gargi, R. Kasturi, and S. Strayer, Performance characterization of video-shot-change detection methods, IEEE Transactions on Circuits and Systems for Video Technology, vol.10, issue.1, pp.1-13, 2000.
DOI : 10.1109/76.825852

F. Gers, J. Schmidhuber, and F. Cummins, Learning to Forget: Continual Prediction with LSTM, Neural Computation, vol.12, issue.10, pp.2451-2471, 2000.
DOI : 10.1162/089976600300015015

X. Gibert, H. Li, and D. Doermann, Sports video classification using HMMs, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003.
DOI : 10.1109/ICME.2003.1221624
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.331.1181

H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, Weighting schemes for audio-visual fusion in speech recognition, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001.
DOI : 10.1109/ICASSP.2001.940795

Y. Gong, M. Han, W. Hua, and W. Xu, Maximum entropy model-based baseball highlight detection and classification, Computer Vision and Image Understanding, vol.96, issue.2, pp.181-199, 2004.
DOI : 10.1016/j.cviu.2004.02.002

M. Han, W. Hua, W. Xu, and Y. Gong, An integrated baseball digest system using maximum entropy method, Proceedings of the tenth ACM international conference on Multimedia, MULTIMEDIA '02, pp.347-350, 2002.
DOI : 10.1145/641007.641081

A. Hanjalic, R. L. Lagendijk, and J. Biemond, Automated high-level movie segmentation for advanced video-retrieval systems, IEEE Transactions on Circuits and Systems for Video Technology, vol.9, issue.4, pp.580-588, 1999.
DOI : 10.1109/76.767124

S. Haykin, Neural Networks: A Comprehensive Foundation, 1999.

S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, A Field Guide to Dynamical Recurrent Neural Networks, 2001.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

W. Hsu, S. Chang, C. Huang, L. Kennedy, C. Lin et al., Discovery and fusion of salient multi-modal features towards news story segmentation, IS&T/SPIE Symposium on Electronic Imaging: Science and Technology - SPIE Storage and Retrieval of Image/Video Database, 2004.
DOI : 10.1117/12.533037

W. H. Hsu and S. Chang, Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763), pp.1091-1094, 2004.
DOI : 10.1109/ICME.2004.1394400
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.233.5493

J. Huang, Z. Liu, Y. Wang, Y. Chen, and E. K. Wong, Integration of multimodal features for video scene classification based on HMM, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No.99TH8451), pp.53-58, 1999.
DOI : 10.1109/MMSP.1999.793797

Q. Huang, Z. Liu, A. Rosenberg, D. Gibbon, and B. Shahraray, Automated generation of news content hierarchy by integrating audio, video, and text information, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.3025-3028, 1999.

U. Iurgel, R. Meermeier, S. Eickeler, and G. , New approaches to audio-visual segmentation of TV news for automatic topic retrieval, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), pp.1397-1400, 2001.
DOI : 10.1109/ICASSP.2001.941190

R. Jasinschi, N. Dimitrova, T. McGee, L. Agnihotri, J. Zimmerman et al., Integrated multimedia processing for topic segmentation and classification, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205), 2001.
DOI : 10.1109/ICIP.2001.958127
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.5638

V. Kettnaker, Time-dependent HMMs for visual intrusion detection, 2003 Conference on Computer Vision and Pattern Recognition Workshop, pp.1-8, 2003.
DOI : 10.1109/CVPRW.2003.10035

E. Kijak, Structuration Multimodal des Vidéos de Sports par Modèles Stochastiques, 2003.

E. Kijak, G. Gravier, P. Gros, L. Oisel, and F. Bimbot, HMM based structuring of tennis videos using visual and audio cues, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), pp.309-312, 2003.
DOI : 10.1109/ICME.2003.1221310

E. Kijak, G. Gravier, L. Oisel, and P. Gros, Audiovisual integration for tennis broadcast structuring, Multimedia Tools and Applications, pp.289-311, 2006.
DOI : 10.1007/s11042-006-0031-5
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.107.3587

K. Kim, J. Choi, N. Kim, and P. K. Kim, Extracting Semantic Information from Basketball Video Based on Audio-Visual Features, CIVR '02: Proceedings of the International Conference on Image and Video Retrieval, pp.278-288, 2002.
DOI : 10.1007/3-540-45479-9_30

K. Lang, A. Waibel, and G. Hinton, A time-delay neural network architecture for isolated word recognition, Neural Networks, vol.3, issue.1, pp.23-43, 1990.
DOI : 10.1016/0893-6080(90)90044-L

Y. LeCun, A theoretical framework for back-propagation, Proceedings of the 1988 Connectionist Models Summer School, pp.21-28, 1988.

R. Leonardi, P. Migliorati, and M. Prandini, Semantic Indexing of Soccer Audio-Visual Sequences: A Multimodal Approach Based on Controlled Markov Chains, IEEE Transactions on Circuits and Systems for Video Technology, vol.14, issue.5, pp.634-643, 2004.
DOI : 10.1109/TCSVT.2004.826751

M. Lew, N. Sebe, C. Djeraba, and R. Jain, Content-based multimedia information retrieval, ACM Transactions on Multimedia Computing, Communications, and Applications, pp.1-19, 2006.
DOI : 10.1145/1126004.1126005

W. N. Lie and C. K. Su, News video classification based on multi-modal information fusion, Proceedings of the International Conference on Image Processing, pp.1213-1216, 2005.

R. Lienhart, Reliable transition detection in videos: a survey and practitioner's guide, International Journal of Image and Graphics, vol.1, issue.3, pp.469-486, 2001.
DOI : 10.1142/S021946780100027X

W. Lin and A. Hauptmann, News video classification using SVM-based multimodal classifiers and combination strategies, Proceedings of the tenth ACM international conference on Multimedia, MULTIMEDIA '02, pp.323-326, 2002.
DOI : 10.1145/641007.641075
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8225

Z. Liu and Q. Huang, Detecting news reporting using audio/visual information, Proceedings of the International Conference on Image Processing, pp.324-328, 1999.

Z. Liu and Y. Wang, Major cast detection in video using both audio and visual information, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001.

C. Lu, M. Drew, and J. Au, Classification of summarized videos using hidden Markov models on compressed chromaticity signatures, Proceedings of the ninth ACM international conference on Multimedia, MULTIMEDIA '01, pp.479-482, 2001.
DOI : 10.1145/500141.500217

J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen et al., Speech and language technologies for audio indexing and retrieval, Proceedings of the IEEE, pp.1338-1353, 2000.
DOI : 10.1109/5.880087
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.208.1129

J. Makhoul, T. Starner, R. Schwartz, and G. Chou, On-line cursive handwriting recognition using hidden Markov models and statistical grammars, Proceedings of the workshop on Human Language Technology, HLT '94, pp.432-436, 1994.
DOI : 10.3115/1075812.1075912

I. McCowan, D. Gatica-Perez, S. Bengio, G. Lathoud, M. Barnard et al., Automatic analysis of multimodal group actions in meetings, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.3, pp.305-317, 2005.
DOI : 10.1109/TPAMI.2005.49

K. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, 2002.

J. Nam, M. Alghoniemy, and A. Tewfik, Audio-visual content-based violent scene characterization, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269), pp.353-357, 1998.
DOI : 10.1109/ICIP.1998.723496

M. Naphade and T. S. Huang, A probabilistic framework for semantic video indexing, filtering, and retrieval, IEEE Transactions on Multimedia, vol.3, issue.1, pp.141-151, 2001.
DOI : 10.1109/6046.909601

A. Nefian, A Hidden Markov Model-Based Approach for Face Detection and Recognition, 1999.

A. Nefian, L. H. Liang, X. Pi, X. X. Liu, and K. Murphy, Dynamic Bayesian Networks for Audio-Visual Speech Recognition, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.11, pp.1274-1288, 2002.
DOI : 10.1155/S1110865702206083
URL : http://doi.org/10.1155/s1110865702206083

S. Nepal, U. Srinivasan, and G. Reynolds, Automatic detection of 'Goal' segments in basketball videos, Proceedings of the ninth ACM international conference on Multimedia, MULTIMEDIA '01, pp.261-269, 2001.
DOI : 10.1145/500141.500181

N. Oliver, A. Garg, and E. Horvitz, Layered representations for learning and inferring office activity from multiple sensory channels, Computer Vision and Image Understanding, vol.96, issue.2, pp.163-180, 2004.
DOI : 10.1016/j.cviu.2004.02.004
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.4490

N. Oliver and E. Horvitz, A Comparison of HMMs and Dynamic Bayesian Networks for Recognizing Office Activities, Proceedings of User Modeling, pp.199-209, 2005.
DOI : 10.1007/11527886_26

M. Ostendorf, V. Digalakis, and O. Kimball, From HMM's to segment models: a unified view of stochastic modeling for speech recognition, IEEE Transactions on Speech and Audio Processing, vol.4, issue.5, pp.360-378, 1996.
DOI : 10.1109/89.536930

E. Osuna, R. Freund, and F. Girosi, Training support vector machines: an application to face detection, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.130-136, 1997.
DOI : 10.1109/CVPR.1997.609310

L. Peshkin and M. S. Gelfand, Segmentation of yeast DNA using hidden Markov models, Bioinformatics, vol.15, issue.12, pp.980-986, 1999.
DOI : 10.1093/bioinformatics/15.12.980

M. Petkovic, V. Mihajlovic, W. Jonker, and S. Djordjevic-Kajan, Multi-modal extraction of highlights from TV Formula 1 programs, Proceedings. IEEE International Conference on Multimedia and Expo, pp.817-820, 2002.
DOI : 10.1109/ICME.2002.1035907

M. Petkovic, Z. Zivkovic, and W. Jonker, Recognizing strokes in tennis videos using hidden Markov models, Proceedings of the IASTED International Conference Visualization, Imaging and Image Processing, 2001.

S. Pfeiffer, R. Lienhart, and W. Effelsberg, Scene determination based on video and audio features, Multimedia Tools and Applications, vol.15, issue.1, pp.59-81, 2001.
DOI : 10.1023/A:1011315803415

D. Q. Phung, T. V. Duong, S. Venkatesh, and H. H. Bui, Topic transition detection using hierarchical hidden Markov and semi-Markov models, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, pp.11-20, 2005.
DOI : 10.1145/1101149.1101153
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.394.2646

G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, Recent advances in the automatic recognition of audiovisual speech, Proceedings of the IEEE, pp.1306-1326, 2003.

W. Qi, L. Gu, H. Jiang, X. R. Chen, and H. Zhang, Integrating visual, audio and text analysis for news video, Proceedings of the International Conference on Image Processing, pp.520-523, 2000.

L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, pp.257-285, 1989.
DOI : 10.1016/B978-0-08-051584-7.50027-9

M. Roach, J. Mason, L. Xu, and F. Stentiford, Recent trends in video analysis: a taxonomy of video classification problems, Proceedings of the International Conference on Internet and Multimedia Systems and Applications, 2002.

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain., Psychological Review, vol.65, issue.6, pp.386-408, 1958.
DOI : 10.1037/h0042519

P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, 1987.
DOI : 10.1002/0471725382

Y. Rubner, J. Puzicha, C. Tomasi, and J. Buhmann, Empirical Evaluation of Dissimilarity Measures for Color and Texture, Computer Vision and Image Understanding, vol.84, issue.1, pp.25-43, 2001.
DOI : 10.1006/cviu.2001.0934

Y. Rui, T. S. Huang, and S. Mehrotra, Constructing table-of-content for videos, Multimedia Systems, vol.7, issue.5, pp.359-368, 1999.
DOI : 10.1007/s005300050138

D. Rumelhart, G. Hinton, and R. Williams, Learning representations by back-propagating errors, Nature, vol.323, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0

D. A. Sadlier and N. E. O'Connor, Event detection in field sports video using audio-visual features and a support vector machine, IEEE Transactions on Circuits and Systems for Video Technology, vol.15, issue.10, pp.1225-1233, 2005.
DOI : 10.1109/TCSVT.2005.854237

C. Saraceno and R. Leonardi, Identification of story units in audio-visual sequences by joint audio and video processing, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269), 1998.
DOI : 10.1109/ICIP.1998.723500

S. Satoh, Y. Nakamura, and T. Kanade, Name-It: naming and detecting faces in news videos, IEEE Multimedia, vol.6, issue.1, pp.22-35, 1999.
DOI : 10.1109/93.752960

N. Sebe, M. Lew, and A. Smeulders, Video retrieval and summarization, Computer Vision and Image Understanding, vol.92, issue.2-3, pp.141-145, 2003.
DOI : 10.1016/j.cviu.2003.08.003

T. Sejnowski and C. Rosenberg, Parallel networks that learn to pronounce English text, Complex Systems, vol.1, pp.145-168, 1987.

A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.12, pp.1349-1380, 2000.
DOI : 10.1109/34.895972

C. Snoek and M. Worring, Multimedia event-based video indexing using time intervals, IEEE Transactions on Multimedia, vol.7, issue.4, pp.638-647, 2005.
DOI : 10.1109/TMM.2005.850966
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.196.5607

C. Snoek and M. Worring, Multimodal video indexing: A review of the state-of-the-art, Multimedia Tools and Applications, pp.5-35, 2005.

C. Snoek, M. Worring, and A. Smeulders, Early versus late fusion in semantic video analysis, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, pp.399-402, 2005.
DOI : 10.1145/1101149.1101236

T. Starner and A. Pentland, Real-time American Sign Language recognition from video using hidden Markov models, ISCV '95: Proceedings of the International Symposium on Computer Vision, p.265, 1995.
DOI : 10.1109/iscv.1995.477012
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.6485

G. Sudhir, J. C. Lee, and A. K. Jain, Automatic classification of tennis video for high-level content-based retrieval, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database, pp.81-90, 1998.
DOI : 10.1109/CAIVD.1998.646036

E. Trentin and M. Gori, A survey of hybrid ANN/HMM models for automatic speech recognition, Neurocomputing, vol.37, issue.1-4, pp.91-126, 2001.
DOI : 10.1016/S0925-2312(00)00308-8

A. Tritschler, A segmentation-enabled speech recognition application using the BIC criterion, 1998.

B. T. Truong, C. Dorai, and S. Venkatesh, New enhancements to cut, fade, and dissolve detection processes in video segmentation, Proceedings of the eighth ACM international conference on Multimedia, MULTIMEDIA '00, pp.219-227, 2000.
DOI : 10.1145/354384.354481

S. Tsekeridou and I. Pitas, Content-based video parsing and indexing based on audio-visual interaction, IEEE Transactions on Circuits and Systems for Video Technology, vol.11, issue.4, pp.522-535, 2001.
DOI : 10.1109/76.915358

V. N. Vapnik, The Nature of Statistical Learning Theory, 1995.

F. Wang, Y. Ma, H. Zhang, and J. Li, A generic framework for semantic sports video analysis using dynamic Bayesian networks, Proceedings of the 11th International Conference on Multimedia Modeling (MMM 2005), pp.115-122, 2005.

P. Wang, R. Cai, and S. Yang, Tennis Video Analysis Based on Transformed Motion Vectors, Proceedings of the Third International Conference on Image and Video Retrieval, pp.79-87, 2004.
DOI : 10.1007/978-3-540-27814-6_13

Y. Wang, Z. Liu, and J. Huang, Multimedia content analysis-using both audio and visual clues, IEEE Signal Processing Magazine, vol.17, issue.6, pp.12-36, 2000.
DOI : 10.1109/79.888862

R. J. Williams and D. Zipser, Gradient-based learning algorithms for recurrent networks and their computational complexity, Backpropagation: Theory, Architectures and Applications, pp.433-486, 1995.

W. Wolf, Hidden Markov model parsing of video programs, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.2609-2611, 1997.
DOI : 10.1109/ICASSP.1997.595323

J. Xi, X. Hua, X. Chen, L. Wenyin, and H. Zhang, A video text detection and recognition system, Proceedings of ICME, pp.1080-1083, 2001.

L. Xie, S. Chang, A. Divakaran, and H. Sun, Unsupervised discovery of multilevel statistical video structures using hierarchical hidden Markov models, Proceedings of the IEEE Intl. Conf. Multimedia and Expo (ICME), 2003.

L. Xie, P. Xu, S. Chang, A. Divakaran, and H. Sun, Structure analysis of soccer video with domain knowledge and hidden Markov models, Pattern Recognition Letters, vol.25, issue.7, pp.767-775, 2004.
DOI : 10.1016/j.patrec.2004.01.005

Z. Xiong, Audio-visual sports highlights extraction using Coupled Hidden Markov Models, Pattern Analysis and Applications, vol.10, issue.2, pp.62-71, 2005.
DOI : 10.1007/s10044-005-0244-7

G. Xu, Y. F. Ma, H. J. Zhang, and S. Q. Yang, An HMM-based framework for video semantic analysis, IEEE Transactions on Circuits and Systems for Video Technology, vol.15, issue.11, pp.1422-1433, 2005.
DOI : 10.1109/TCSVT.2005.856903

B. L. Yeo and B. Liu, Rapid scene analysis on compressed video, IEEE Transactions on Circuits and Systems for Video Technology, vol.5, issue.6, pp.533-544, 1995.

M. Yeung, B. Yeo, and B. Liu, Extracting Story Units from Long Programs for Video Browsing and Navigation, Proceedings of International Conference on Multimedia Computing and Systems, 1996.
DOI : 10.1016/B978-155860651-7/50117-0

R. Zabih, J. Miller, and K. Mai, A feature-based algorithm for detecting and classifying scene breaks, Proceedings of the third ACM international conference on Multimedia, MULTIMEDIA '95, pp.189-200, 1995.
DOI : 10.1145/217279.215266

D. Zhang, D. Gatica-Perez, S. Bengio, I. McCowan, and G. Lathoud, Modeling Individual and Group Actions in Meetings: A Two-Layer HMM Framework, 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp.117-124, 2004.
DOI : 10.1109/CVPR.2004.399

H. J. Zhang, A. Kankanhalli, and S. Smoliar, Automatic partitioning of full-motion video, Multimedia Systems, vol.1, issue.1, pp.10-28, 1993.
DOI : 10.1007/BF01210504

D. Zhong and S. Chang, Structure analysis of sports video using domain models, IEEE International Conference on Multimedia and Expo (ICME 2001), 2001.
DOI : 10.1109/ICME.2001.1237820