7-8 January, 2024 • Langkawi, Malaysia

Keynote speakers:

Invited speaker:

From Visual Perception to Interpretable Visual Knowledge

Prof. Lin Feng

Speaker: Prof. Lin Feng, Nanyang Technological University, Singapore

Abstract: We use the problem of fine-grained recognition of human actions as an example to briefly describe how visual perception can be translated into visual knowledge. In general, data-driven machine learning is about the induction of general rules from specific observed cases. When the number of cases, such as human actions, increases, induction, or inductive learning, often becomes computationally prohibitive. Therefore, transduction, or transductive learning, from labelled to unlabelled actions is a computationally affordable approximation. First, a Context-Free Grammar and a Push-Down Automaton are introduced for automatically generating and labelling task-specific human action videos. The capability of automatically generating large amounts of hierarchically labelled videos as training samples is the key to success in deep learning. Second, a Spatio-Temporal DNN architecture is designed to be trained with these hierarchically labelled videos, to provide discriminating and complementary semantic features. Finally, a Transductive Inferencing Digraph is developed to fully exploit the discriminating features extracted from the labelled and unlabelled action videos. This digraph addresses transductive learning with incomplete and dynamic graph data, in contrast to conventional machine learning with complete and static datasets. It outputs the discrimination of subtle motions between similar action videos and their correlations.

Biography: Dr Lin Feng is an Associate Professor with the School of Computer Science and Engineering, and the Senate Chair of the Academic Council, Nanyang Technological University. His research interests include biomedical informatics and artificial intelligence. He has published about 300 research papers.

Current Status of AI Support in Medical Imaging and Diagnosis

Prof. Hiroshi Fujita

Speaker: Prof. Hiroshi Fujita, Gifu University, Japan

Abstract: With AI “deep learning” technology, a type of “machine learning” (learning functions and rules) in which computers learn by themselves, the accuracy of image recognition has reached a level that exceeds that of humans. Computer-aided diagnosis of medical images, so-called CAD, has been rapidly entering the mainstream of practical medicine. Especially in the detection of breast cancer by mammography (breast imaging), it is a part of daily clinical work. In this case, the computer output is used as a “second opinion” to help the doctor interpret the image. However, recent powerful AI technologies, including deep learning, have taken CAD development and performance to the next level: traditional CAD has diversified, and even autonomous diagnostic AI is emerging. This is sometimes called AI-CAD and is gradually shifting from a mere R&D level to commercialization, verification at the actual clinical stage, and insurance reimbursement. In this talk, we would like to examine and discuss the current state of AI-CAD, and the problems that need to be solved in order to make AI-CAD more practical in clinical practice.

References: H. Fujita, “AI-based computer-aided diagnosis (AI-CAD): The latest review to read first,” Radiological Physics and Technology, vol. 13, no. 1, pp. 6-19, 2020.

G. Lee and H. Fujita (Eds.), “Deep Learning in Medical Image Analysis: Challenges and Applications,” Springer, 2020.

Biography: Prof. Hiroshi Fujita received his Ph.D. degree from Nagoya University in 1983. He was a visiting researcher at the K. Rossmann Radiologic Image Laboratory, University of Chicago, from 1983 to 1986. He became an associate professor in 1991 and a professor in 1995 in the Faculty of Engineering, Gifu University. He has been a professor and chair of intelligent image information since 2002 at the Graduate School of Medicine, Gifu University. He is now a Research Professor of Gifu University. He is a member of the Society for Medical Image Information (Honorary President), the Institute of Electronics, Information and Communication Engineers (IEICE, Fellow), and the Japan Society for Medical Imaging Technology (Honorary member). His research interests include computer-aided diagnosis systems and image analysis, processing, and evaluation in medicine. He has received numerous awards, such as the Medical Imaging Information Society Award (2018), RSNA (2001, 6 others), and SPIE (1995, 8 others). He has co-published over 1000 papers in journals, proceedings, book chapters and scientific magazines.

Gradient centralization and feature gradient descent for deep neural network optimization

Prof. Lei Zhang

Speaker: Prof. Lei Zhang, The Hong Kong Polytechnic University, Hong Kong

Abstract: Normalization methods are very important for the effective and efficient training of deep neural networks (DNNs). Many popular normalization methods operate on weights, such as weight normalization and weight standardization. We propose a very simple yet effective DNN optimization technique, namely gradient centralization (GC), which operates on the gradients of weights directly. GC simply centralizes the gradient vectors to have zero mean. It can be easily embedded into current gradient-based optimization algorithms with just one line of code. GC demonstrates various desired properties, such as accelerating the training process, improving the generalization performance, and compatibility with fine-tuning pre-trained models. On the other hand, existing DNN optimizers such as stochastic gradient descent (SGD) mostly perform gradient descent on weights to minimize the loss, while the final goal of DNN model learning is to obtain a good feature space for data representation. Instead of performing gradient descent on weights, we propose a method, namely feature SGD (FSGD), to approximate the output feature with one-step gradient descent for linear layers. FSGD only needs to store an additional second-order statistic matrix of input features, and uses its inverse to adjust the gradient descent of weights. FSGD demonstrates much better generalization performance than SGD in classification tasks.
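As a rough illustration of the "one line of code" remark, the sketch below centralizes a weight gradient inside a plain SGD step. It is not the speaker's implementation: the function names, the row-per-output-unit layout of the gradient, and the learning rate are assumptions made only for this example.

```python
def centralize_gradient(grad):
    """Subtract each row's mean so every row of the gradient has zero mean.

    Assumption: `grad` is a list of rows, one row per output unit, each row
    holding the gradients of that unit's incoming weights.
    """
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


def sgd_step(weights, grad, lr=0.1, use_gc=True):
    """One plain SGD update; the GC call is the single extra line."""
    if use_gc:
        grad = centralize_gradient(grad)  # the extra line added to the optimizer
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grad)
    ]


# Hypothetical toy values, chosen only to exercise the functions.
weights = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
grad = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
new_weights = sgd_step(weights, grad)
```

Because each centralized gradient row sums to zero, the update leaves the sum of every weight row unchanged, which is one way to see that GC constrains the weight space as well as the gradient.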

Biography: Prof. Lei Zhang (M’04, SM’14, F’18) joined the Department of Computing, The Hong Kong Polytechnic University, as an Assistant Professor in 2006. Since July 2017, he has been a Chair Professor in the same department. His research interests include Computer Vision, Image and Video Analysis, Pattern Recognition, and Biometrics. Prof. Zhang has published more than 200 papers in those areas. As of 2021, his publications have been cited more than 65,000 times in the literature. Prof. Zhang is a Senior Associate Editor of IEEE Trans. on Image Processing, and is/was an Associate Editor of IEEE Trans. on Pattern Analysis and Machine Intelligence, SIAM Journal of Imaging Sciences, IEEE Trans. on CSVT, and Image and Vision Computing, among others. He was listed as a “Clarivate Analytics Highly Cited Researcher” consecutively from 2015 to 2020. More information can be found on his homepage http://www4.comp.polyu.edu.hk/~cslzhang/.

Towards Reliable Point Cloud Quality Assessment

Prof. Joao Ascenso

Speaker: Prof. João Ascenso, Instituto Superior Técnico, Portugal

Abstract: Nowadays, 3D visual representation models such as light fields and point clouds are very popular due to their capability to represent the real world in a more complete, realistic and immersive way, enabling new and more advanced visual experiences. The point cloud representation model is able to efficiently represent the surface of objects and even entire scenes by means of a set of 3D points and associated attributes and is increasingly being used in autonomous cars and augmented reality applications. Nowadays, research and standardization efforts in point cloud processing have become much more intense, especially in the fields of coding, scene classification, object detection, semantic segmentation, super-resolution, reconstruction and so on.

In this context, quality assessment of point cloud data is fundamental to evaluate the impact and performance of several processing steps in a point-cloud-based communication system, notably denoising, coding and rendering. In this keynote, some of the recent subjective quality assessment studies will be presented and their main findings discussed. Moreover, some of the key advances in objective quality metrics for point cloud data will be surveyed, highlighting their importance in the design of efficient point-cloud-based systems, thus paving the way for better and more immersive experiences. New research directions in point cloud quality assessment will be discussed, with respect to topics such as no-reference objective quality assessment, perceptual optimization of point cloud coding solutions and quality assessment for other forms of point cloud data.

Biography: João Ascenso is a professor at the Department of Electrical and Computer Engineering of Instituto Superior Técnico, University of Lisbon, and is with the Multimedia Signal Processing Group of Instituto de Telecomunicações, Lisbon, Portugal. He received the E.E., M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from Instituto Superior Técnico, in 1999, 2003 and 2010, respectively. He coordinates the IT participation in several national and international research projects, in the areas of coding, analysis and description of visual information. He is also very active in the ISO/IEC MPEG and JPEG standardization activities and currently chairs the JPEG AI ad-hoc group that targets the evaluation and development of learning-based image compression. He has published more than 100 papers in international conferences and journals and has more than 3400 citations over 35 papers (h-index of 26). He is an associate editor of IEEE Transactions on Multimedia and IEEE Transactions on Image Processing, and was an associate editor of the IEEE Signal Processing Letters. He is an elected member of the IEEE Multimedia Signal Processing Technical Committee. He has served as a member of the organizing committees of well-known international conferences, such as IEEE ICME 2020, IEEE MMSP 2020 and IEEE ISM 2020, among others. He has received two Best Paper Awards, at the 31st Picture Coding Symposium 2015, Cairns, Australia, and at the IEEE International Conference on Multimedia and Expo 2020, Shanghai, China. His current research interests include visual coding, quality assessment, light-fields, point clouds and holography processing, indexing and searching of multimedia content and visual sensor networks.

UHDTV – Present and Future

Prof. Yoshiaki Shishikui

Speaker: Prof. Yoshiaki Shishikui, Meiji University, Japan

Abstract: The era of Ultra High-Definition (UHD) TV, or Super Hi-Vision, has arrived. By 2020 more than half the TV sets shipped worldwide were 4K-UHDTV capable. There are more than 150 4K-UHDTV services available and the journey to 8K-UHDTV continues. 8K broadcasting has already been launched in Japan. In Tokyo 2020, various competition scenes were delivered to homes at 8K every day. UHDTV is a major part of the future of television with additional applications in other areas, such as medicine and surveillance.

The design of 8K-UHDTV targets the realization of the “ultimate 2D image.” The 8K-UHD system parameters were determined based on psychophysical evidence of the perceptual limits of human vision. However, the benefits offered by 8K for daily TV viewing and the degree to which the design goals have been achieved have not been sufficiently verified. Recent studies investigate the psychological effects induced by 8K-UHDTV images through subjective evaluation experiments and validate the hypothesis that viewers experience strong psychological effects when watching 8K videos. These studies should elucidate the new values delivered by UHDTV, provide a better understanding of the potential of UHDTV services, and enable optimization at each stage of the UHDTV ecosystem.

Biography: Yoshiaki Shishikui received B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1981, 1983, and 1997, respectively.

He joined NHK (Japan Broadcasting Corporation), Tokyo, in 1983. From 1986 to 2014, he worked at NHK Science and Technology Research Laboratories, where he was engaged in research on digital signal processing, picture coding, HDTV broadcasting systems, IPTV systems, advanced data broadcasting systems, and UHDTV research activities. He led the Super Hi-Vision public viewing project at the London 2012 Olympics. From 2001 to 2003, he was with NHK Engineering Services Inc. on loan, where he helped develop video archives and video-on-demand systems. In April 2014, he was appointed Professor in the Department of Frontier Media Science of the School of Interdisciplinary Mathematical Sciences at Meiji University. Prof. Shishikui is a fellow of IEICE Japan, ITE Japan, and SMPTE, and a senior member of IEEE. He was actively involved in standardization activities at SMPTE and ISO/IEC (MPEG).

Data Analytics for Intelligent Transportation

Speaker: Prof. Lap-Pui Chau, Nanyang Technological University, Singapore

Abstract: .


Biography: Lap-Pui Chau received the Bachelor's degree from Oxford Brookes University, and the Ph.D. degree from The Hong Kong Polytechnic University, in 1992 and 1997, respectively. He is Assistant Chair (Academic) of the School of Electrical and Electronic Engineering, Nanyang Technological University. His research interests include fast visual signal processing algorithms, light-field imaging, video analytics for intelligent transportation systems, and human motion analysis. He is an IEEE Fellow.

He has served as general chair and program chair of several international conferences, and as an associate editor of several IEEE journals.