Keynote Speakers

4-6 January, 2022 • Hong Kong


Invited speaker:

Gradient centralization and feature gradient descent for deep neural network optimization

Speaker: Prof. Lei Zhang, The Hong Kong Polytechnic University, Hong Kong

Abstract: Normalization methods are important for the effective and efficient training of deep neural networks (DNNs). Many popular normalization methods operate on weights, such as weight normalization and weight standardization. We propose a very simple yet effective DNN optimization technique, namely gradient centralization (GC), which operates directly on the gradients of weights. GC simply centralizes the gradient vectors to have zero mean. It can be embedded into current gradient-based optimization algorithms with just one line of code. GC demonstrates several desirable properties, such as accelerating the training process, improving generalization performance, and compatibility with fine-tuning pre-trained models. On the other hand, existing DNN optimizers such as stochastic gradient descent (SGD) mostly perform gradient descent on the weights to minimize the loss, while the final goal of DNN model learning is to obtain a good feature space for data representation. Instead of performing gradient descent on the weights, we propose a method, namely feature SGD (FSGD), to approximate the output features with one-step gradient descent for linear layers. FSGD only needs to store an additional second-order statistic matrix of the input features, and uses its inverse to adjust the gradient descent of the weights. FSGD demonstrates much better generalization performance than SGD in classification tasks.
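As a concrete illustration, the centralization step described in the abstract can be sketched in a few lines of NumPy. This is a generic sketch based on the abstract's description, not the speaker's reference implementation; the function names, and the choice of averaging over all non-output axes of the gradient tensor, are assumptions.

```python
import numpy as np

def centralize_gradient(grad):
    """Gradient centralization (GC): subtract the mean of each
    weight slice's gradient so the result has zero mean.

    For a gradient tensor whose first axis indexes output
    units/channels, the mean is taken over all remaining axes.
    One-dimensional gradients (e.g. biases) are left unchanged.
    """
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))
        grad = grad - grad.mean(axis=axes, keepdims=True)
    return grad

def sgd_step_with_gc(w, grad, lr=0.1):
    # Plain SGD, with GC applied to the raw gradient first -- the
    # "one line of code" embedding the abstract refers to.
    return w - lr * centralize_gradient(grad)
```

Because GC only transforms the gradient before the update, the same one-line substitution can be dropped into any gradient-based optimizer (momentum SGD, Adam, etc.) without changing the rest of the update rule.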

Biography: Prof. Lei Zhang (M’04, SM’14, F’18) joined the Department of Computing, The Hong Kong Polytechnic University, as an Assistant Professor in 2006. Since July 2017, he has been a Chair Professor in the same department. His research interests include computer vision, image and video analysis, pattern recognition, and biometrics. Prof. Zhang has published more than 200 papers in these areas. As of 2021, his publications had been cited more than 65,000 times in the literature. Prof. Zhang is a Senior Associate Editor of IEEE Transactions on Image Processing, and is/was an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence, SIAM Journal on Imaging Sciences, IEEE Transactions on Circuits and Systems for Video Technology, and Image and Vision Computing. He has been listed as a “Clarivate Analytics Highly Cited Researcher” every year from 2015 to 2020. More information can be found on his homepage.

Current Status of AI Support in Medical Imaging and Diagnosis

Speaker: Prof. Hiroshi Fujita, Gifu University, Japan

Abstract: With AI “deep learning” technology, a type of “machine learning” in which computers learn functions and rules by themselves, the accuracy of image recognition has reached a level that exceeds that of humans. Computer-aided diagnosis of medical images, so-called CAD, has rapidly entered the mainstream of practical medicine. In the detection of breast cancer by mammography (breast imaging) in particular, it is part of daily clinical work. In this setting, the computer output is used as a “second opinion” to help the doctor interpret the image. However, recent powerful AI technologies, including deep learning, have taken CAD development and performance to the next level: traditional CAD has diversified, and even autonomous diagnostic AI is emerging. This is sometimes called AI-CAD, and it is gradually shifting from the R&D level to commercialization, verification at the actual clinical stage, and insurance reimbursement. In this talk, we examine and discuss the current state of AI-CAD and the problems that need to be solved to make AI-CAD more practical in clinical practice.

References: H. Fujita, “AI-based computer-aided diagnosis (AI-CAD): The latest review to read first,” Radiological Physics and Technology, vol. 13, no. 1, pp. 6-19, 2020.

G. Lee and H. Fujita (Eds.), “Deep Learning in Medical Image Analysis: Challenges and Applications,” Springer, 2020.

Biography: Prof. Hiroshi Fujita received his Ph.D. degree from Nagoya University in 1983. He was a visiting researcher at the K. Rossmann Radiologic Image Laboratory, University of Chicago, from 1983 to 1986. He became an associate professor in 1991 and a professor in 1995 in the Faculty of Engineering, Gifu University. Since 2002, he has been a professor and chair of intelligent image information at the Graduate School of Medicine, Gifu University, and he is now a Research Professor of Gifu University. He is a member of the Society for Medical Image Information (Honorary President), the Institute of Electronics, Information and Communication Engineers (IEICE, Fellow), and the Japan Society for Medical Imaging Technology (Honorary Member). His research interests include computer-aided diagnosis systems and image analysis, processing, and evaluation in medicine. He has received numerous awards, including the Medical Imaging Information Society Award (2018) and awards from RSNA (2001 and six others) and SPIE (1995 and eight others). He has co-published over 1,000 papers in journals, proceedings, book chapters, and scientific magazines.

From Visual Perception to Interpretable Visual Knowledge

Speaker: Prof. Lin Feng, Nanyang Technological University, Singapore

Abstract: We use the problem of fine-grained recognition of human actions as an example to briefly describe how visual perception can be translated into visual knowledge. In general, data-driven machine learning is the induction of general rules from specific observed cases. When the number of cases, such as human actions, increases, induction (inductive learning) often becomes computationally prohibitive. Transduction (transductive learning) from labelled to unlabelled actions is therefore a computationally affordable approximation. First, a Context-Free Grammar and Push-Down Automaton are introduced for automatically generating and labelling task-specific human action videos. The capability to automatically generate a large number of hierarchically labelled videos as training samples is the key to success in deep learning. Secondly, a Spatio-Temporal DNN architecture is designed to be trained with these hierarchically labelled videos, providing discriminating and complementary semantic features. Finally, a Transductive Inferencing Digraph is developed to fully exploit the discriminating features extracted from the labelled and unlabelled action videos. This digraph supports transductive learning with incomplete and dynamic graph data, in contrast to conventional machine learning with complete and static datasets. It outputs the discrimination of subtle motions between similar action videos, together with their correlations.
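The induction-versus-transduction contrast in the abstract can be illustrated with a classic label-propagation sketch: rather than inducing a global classifier, labels spread from labelled to unlabelled nodes along graph edges. This is a generic textbook technique, not the speaker's Transductive Inferencing Digraph, whose details are not given here; the graph, function name, and parameter values are assumptions for illustration.

```python
import numpy as np

def label_propagation(W, Y, labeled_mask, alpha=0.9, iters=50):
    """Transductive inference on a graph of samples.

    W: symmetric affinity matrix over all (labelled + unlabelled)
       samples; assumes every node has at least one edge.
    Y: one-hot label matrix, zero rows for unlabelled samples.
    labeled_mask: boolean mask of labelled rows, clamped each step.
    Returns the predicted class index for every node.
    """
    D = W.sum(axis=1)
    S = W / np.sqrt(np.outer(D, D))      # symmetric normalization
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y   # diffuse labels
        F[labeled_mask] = Y[labeled_mask]       # clamp known labels
    return F.argmax(axis=1)
```

The transductive flavor is that the unlabelled samples participate in inference directly: predictions come from the structure of the specific labelled-plus-unlabelled set at hand, not from a classifier trained once on a complete, static dataset.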

Biography: Dr. Lin Feng is an Associate Professor with the School of Computer Science and Engineering, and the Senate Chair of the Academic Council, Nanyang Technological University. His research interests include biomedical informatics and artificial intelligence. He has published about 300 research papers.

UHDTV – Present and Future

Speaker: Prof. Yoshiaki Shishikui, Meiji University, Japan

Abstract: The era of Ultra High-Definition (UHD) TV, or Super Hi-Vision, has arrived. By 2020, more than half of the TV sets shipped worldwide were 4K-UHDTV capable. More than 150 4K-UHDTV services are available, and the journey to 8K-UHDTV continues. 8K broadcasting has already been launched in Japan, and during the Tokyo 2020 Olympics, various competition scenes were delivered to homes in 8K every day. UHDTV is a major part of the future of television, with additional applications in other areas such as medicine and surveillance.

The design of 8K-UHDTV targets the realization of the “ultimate 2D image”: the 8K-UHD system parameters were determined based on psychophysical evidence of the perceptual limits of human vision. However, the benefits offered by 8K for daily TV viewing, and the degree to which the design goals have been achieved, have not been sufficiently verified. Recent studies investigate the psychological effects induced by 8K-UHDTV images through subjective evaluation experiments and validate the hypothesis that viewers experience strong psychological effects when watching 8K videos. These studies should elucidate the new values delivered by UHDTV, provide a better understanding of the potential of UHDTV services, and enable optimization at each stage of the UHDTV ecosystem.

Biography: Yoshiaki Shishikui received B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1981, 1983, and 1997, respectively.

He joined NHK (Japan Broadcasting Corporation), Tokyo, in 1983. From 1986 to 2014, he worked at the NHK Science and Technology Research Laboratories, where he was engaged in research on digital signal processing, picture coding, HDTV broadcasting systems, IPTV systems, advanced data broadcasting systems, and UHDTV. He led the Super Hi-Vision public viewing project at the London 2012 Olympics. From 2001 to 2003, he was on loan to NHK Engineering Services Inc., where he helped develop video archive and video-on-demand systems. In April 2014, he was appointed Professor in the Department of Frontier Media Science of the School of Interdisciplinary Mathematical Sciences at Meiji University. Prof. Shishikui is a Fellow of IEICE Japan, ITE Japan, and SMPTE, and a Senior Member of IEEE. He has been actively involved in standardization activities at SMPTE and ISO/IEC (MPEG).