Academic Report by Dr. Zhao He of Monash University, Australia

Date: 2021-07-07

Title: Short Text Analysis and Topic Structure Mining Based on Metadata Topic Model

Abstract: A topic model is a probabilistic generative model that is mainly applied to discrete data and assumes that the observed data are generated by a number of “latent factors”. In text analysis, each latent factor usually carries a specific meaning that can be described by a set of characteristic terms, so these latent factors are also called “topics”. The term frequencies in a document can therefore be regarded as generated by several topics with different meanings, with each topic contributing a different proportion of the document. Over the past two decades, topic models have been widely used, with considerable success, in Machine Learning, Data Mining, and Natural Language Processing.
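The generative view described above can be illustrated with a minimal sketch in the style of Latent Dirichlet Allocation (LDA), a standard topic model; it is not one of the metadata-based models presented in this report. The two topics, their vocabularies and term probabilities, the function names, and the Dirichlet concentration `alpha` are all hypothetical toy choices made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy topics: each topic is a probability distribution over terms.
TOPICS = {
    "sports": {"game": 0.40, "team": 0.35, "score": 0.25},
    "finance": {"market": 0.40, "stock": 0.35, "price": 0.25},
}


def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet sample via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(length, alpha=0.5, seed=None):
    """Generate the term frequencies of one document, LDA-style.

    1. Draw the document's topic proportions from a Dirichlet(alpha) prior.
    2. For each word position, pick a topic according to those proportions.
    3. Emit a term from the chosen topic's term distribution.
    """
    rng = random.Random(seed)
    names = list(TOPICS)
    proportions = sample_dirichlet(alpha, len(names), rng)
    counts = Counter()
    for _ in range(length):
        topic = rng.choices(names, weights=proportions, k=1)[0]
        terms = list(TOPICS[topic])
        weights = list(TOPICS[topic].values())
        counts[rng.choices(terms, weights=weights, k=1)[0]] += 1
    return dict(zip(names, proportions)), counts


if __name__ == "__main__":
    # A short document (few words) versus a longer one from the same model.
    for n in (5, 200):
        props, counts = generate_document(length=n, seed=0)
        print(n, props, counts.most_common())
```

With a small `length`, the term counts are only a handful of draws from the hidden topic proportions, which gives an intuition for why inferring topics from short texts is harder than from long ones.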


However, traditional topic models rely only on term-frequency information in the text to mine topics, which limits their application in short text analysis and topic structure discovery. Specifically, in recent years the Internet, social networks, mobile applications, and similar sources have generated large amounts of short text data, such as microblogs, product reviews, and news headlines. Because these short texts contain too little term-frequency information, traditional topic models may fail to mine meaningful topics. On the other hand, many existing models assume that topics are independent of one another, yet it is easy to see that different topics are semantically related and may even form a structure. To mine this topic structure, we usually need to increase the complexity of the model, and training the more complex model then requires more term-frequency information. In both areas, the key issue is to balance model complexity against the limited richness of term-frequency information. Besides term frequency, text generated on the Internet comes with many types of metadata, such as the author, category, and time of an article, as well as word similarity and word embeddings. This metadata can enrich the term-frequency information and help address the problems that topic models face in the areas described above.


To improve the performance and interpretability of topic models in short text analysis and topic structure mining by exploiting various kinds of metadata, we have proposed several theoretical methods. These methods have achieved good results and have been widely applied in topic-model-based text analysis tasks such as text classification, clustering, and visualization, and they have been published at ICML, NeurIPS, ACL, and ICDM. In this report, I will systematically introduce our work in this area.


Biography: Dr. Zhao He is currently a researcher at the School of IT, Monash University, Australia. He received his bachelor’s and master’s degrees from Nankai University and Nanjing University, respectively, and his PhD from Monash University in 2019. His main research interests are statistical Machine Learning, especially Bayesian modeling and statistical inference for large-scale complex data, and its applications in Natural Language Processing, Graph Models, Collaborative Filtering, and Computer Vision. His research aims to automate representation learning and uncertainty analysis for complex data, and to understand how such data are generated and how they change over time. He currently focuses on using Deep Learning to improve the performance, efficiency, robustness, and scalability of probabilistic modeling and inference on big data. His work has been published at leading Machine Learning, Natural Language Processing, and Data Mining conferences, such as ICML, NeurIPS, ACL, AISTATS, and ICDM. He serves as a program committee member for several international conferences, including ICML, NeurIPS, ICLR, AISTATS, and AAAI, and as a reviewer for journals including IEEE Transactions on Pattern Analysis and Machine Intelligence and the Machine Learning journal.

Time: 10:00-11:30, Thursday, November 28, 2019

Venue: Room 601, Administration Building, Central Campus, Jilin University

Organizer: School of Artificial Intelligence, Jilin University