Oceania Stata Conference 2024 – Virtual

1 February 2024

Oceania Stata Conference Presentations

Speakers in order of presentation

Giovanni Cerulli

Running Machine Learning in Stata: Performance and usability evaluation

This presentation provides a comprehensive survey of machine learning (ML) commands in Stata. It systematically categorizes and summarizes the available ML commands and evaluates their performance and usability for tasks such as classification, regression, clustering, and dimension reduction. The presentation also provides examples of how to use these commands with real-world datasets and compares their performance. This review aims to help researchers and practitioners choose appropriate ML methods and related Stata tools for their specific research questions and datasets, and to improve the efficiency and reproducibility of ML analyses using Stata. It concludes by discussing some limitations and future directions for ML research in Stata.

About the speaker

Giovanni Cerulli is research director at IRCrES-CNR, Research Institute on Sustainable Economic Growth, National Research Council of Italy, Unit of Rome. His research interest is in applied econometrics, with a special focus on causal inference, program evaluation, and machine learning applied to various fields of the social and epidemiological sciences. Giovanni has developed original causal inference models, such as dose-response and treatment models with social interaction, and has provided Stata implementations of them. He has developed around twenty Stata commands for causal inference and machine learning, and has also worked on Stata/Python/R integration for this purpose. Giovanni is the author of the book Econometric Evaluation of Socio-Economic Programs: Theory and Applications (Springer, 2015; second edition 2022), and of the forthcoming book Fundamentals of Supervised Machine Learning: with Applications in Python, R, and Stata (Springer). He has published papers in several high-quality scientific journals and is currently editor-in-chief of the International Journal of Computational Economics and Econometrics.


Mark Schaffer

pystacked and ddml: Machine learning for prediction and causal inference in Stata

About the speaker

Mark Schaffer is Professor of Economics at Heriot-Watt University, Edinburgh, UK. Prof. Schaffer graduated magna cum laude from Harvard University in 1982, and holds degrees in economics from Stanford University (MA, 1985) and the London School of Economics (MSc, 1983 and PhD, 1990). His fields of research include transition and emerging economies, labour markets, applied econometrics, economic history, quantitative criminology and energy economics.

Prof. Schaffer is also a Fellow of the Royal Society of Edinburgh and of the Royal Society of Arts, Manufactures and Commerce, and a Research Fellow of the Centre for Economic Policy Research in London and IZA, the Institute for the Study of Labor. He has worked in the past for organizations such as the IMF, the World Bank, EBRD, the United Nations, and the Department for International Development of the UK Government.


Meghan Cain

Bayesian model averaging

Are you unsure which predictors to include in your model? Rather than choosing one model, aggregate results across all candidate models to account for model uncertainty with Bayesian model averaging (BMA). Which predictors are important given the observed data? Which models are more plausible? How do predictors relate to each other across different models? BMA can answer these questions and many more.

Stata 18 introduced the bma suite of commands to perform BMA in linear regression models. In this talk, you will learn how to explore influential models, make inferences, and obtain better predictions with BMA. I will demonstrate the utility of BMA for any researcher—Bayesian, frequentist, and everyone in between! No prior knowledge of the Bayesian framework is required.
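
The core idea of averaging over candidate models can be sketched numerically. The Python snippet below is only an illustration: it uses the common BIC approximation to posterior model probabilities with equal prior model probabilities, which is not necessarily what Stata's bma commands compute, and the function names, candidate models, and BIC values are all invented for the example.

```python
import math

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every candidate model.
    """
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def inclusion_probabilities(models, weights):
    """Posterior inclusion probability of each predictor:
    the summed weight of the models that contain it."""
    pip = {}
    for predictors, w in zip(models, weights):
        for name in predictors:
            pip[name] = pip.get(name, 0.0) + w
    return pip

# Hypothetical candidate models and made-up BIC values, for illustration only.
models = [("x1",), ("x1", "x2"), ("x2",), ()]
bics = [100.0, 102.0, 110.0, 120.0]

weights = bma_weights(bics)
pip = inclusion_probabilities(models, weights)
```

Here the weights indicate which models are more plausible, and the inclusion probabilities indicate which predictors matter once model uncertainty is taken into account.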

About the speaker

Meghan Cain is the Assistant Director, Educational Services at StataCorp LLC. She earned her PhD in quantitative psychology from the University of Notre Dame, where her research focused on structural equation modeling, multilevel modeling, and Bayesian statistics. At Stata, she develops and presents training on these and other topics. She also conducts webinars, works with developers to produce Stata documentation, and contributes to Stata blogs.


Sectoral reallocation and income growth in the labour market during the COVID-19 pandemic

This paper investigates the effects of the COVID-19 pandemic on the labour market in New Zealand. Utilizing a comprehensive administrative dataset, we delve into the intricacies of labour reallocation during the pandemic, while establishing links between these reallocations and two distinct measures of income growth. Our findings reveal that COVID-19 presented as an atypical and relatively persistent reallocation shock to the New Zealand labour market. Notably, the surge in job-to-job transitions primarily stemmed from transitions between industries, rather than those within industries. Moreover, it is these between-industry transitions that exhibited a positive correlation with overall income growth in the labour market.

Marea Sing

About Marea Sing

Marea is the acting manager of the Modelling team in the RBNZ’s Economics Directorate. She has an MPhil in Economics from the University of Oxford, and an MIT in Big Data Science from the University of Pretoria. Her career has focussed on forecasting and modelling for monetary and macroprudential policy.

Guanyu Zheng

About Guanyu Zheng

Guanyu Zheng is a Principal Analyst at the Ministry of Business, Innovation and Employment. He is currently working towards a Master of Philosophy degree in Economics at Auckland University of Technology. He specialises in applied econometrics using administrative data on firm performance and labour market dynamics.

Arul Earnest

Machine Learning Techniques to Predict Timeliness of Care among Lung Cancer Patients

Delays in the assessment, management, and treatment of lung cancer patients may adversely impact prognosis and survival. This study is the first to use machine learning techniques to predict the quality and timeliness of care among lung cancer patients, utilising data from the Victorian Lung Cancer Registry (VLCR) between 2011 and 2022, in Victoria, Australia.

About the speaker

Professor Earnest is a senior biostatistician with the Biostatistics Unit and deputy head of the Clinical Outcomes data Reporting and Research Program (CORRP) at Monash University, where he leads the analytics group for several clinical registries. His research interests include Bayesian spatio-temporal models and machine learning. He enjoys conducting workshops in Stata.


Andrew Gray

ChatGPT and other large language models: How useful are they to statisticians using Stata?

Some statisticians, including Stata users, are already using ChatGPT and other LLMs for answers to questions about statistics, for code generation, or for data processing (e.g., sentiment analysis). Some researchers may already be using the technology to perform their analyses automatically. This presentation explores these four uses through examples and brief case studies.

About the speaker

Andrew Gray is a biostatistician in the Biostatistics Centre, University of Otago, where he collaborates on a wide range of health-related research projects as well as pursuing his own research. Prior to this, Andrew worked in a knowledge engineering research group in the Department of Information Science, University of Otago.


Nyi Nyi Naing

Beauty of STATA: Relevant and plausible

STATA makes work easier for researchers in the medical and health sciences: data transfer from other databases is easy, intermediate and advanced statistical methods are available through both command and menu options, and the output is relevant and meaningful for inference, interpretation, and conclusions in both interventional studies (clinical and community trials) and observational studies (for example, cohort, case-control, and cross-sectional studies). It is also a user-friendly tool for determining the minimum required sample size with appropriate power for those studies. Various regression methods, general linear models, and cross-sectional time series are frequently used by these researchers. Step-by-step procedures for statistical analysis using STATA are taught at basic, intermediate, and advanced levels to academic staff in universities, researchers at research institutes, clinicians and health personnel at the ministry of health, biostatisticians, epidemiologists, and pharmaceutical company staff. Users' favourite features of STATA, based on their feedback, include the log file, do file, and ado file. The output for epidemiological studies is much superior to that of other software in terms of relevance and biological plausibility. The features regularly added in new versions of STATA keep users loyal to the software because they bring up-to-date applications to our particular field of research.

About the speaker

Nyi Nyi Naing works as a lecturer at a faculty of medicine. He is medically trained, specialized in public health medicine, and then sub-specialized in biostatistics. His core teaching includes medical statistics, research methodology, and statistical software application in medical research. He utilizes STATA software in his teaching and research.


Facilitated Panel Session - Teaching Stata

Stata, a globally recognized software, is pivotal in teaching statistics and data analysis across diverse university disciplines, including biostatistics, economics, econometrics, epidemiology, health sciences, and social sciences. This panel session offers a unique opportunity to delve into the experiences of three distinguished lecturers who have extensively utilized Stata in their teaching endeavors for many years.

Tai Bee Choo

About Tai Bee Choo

A/Prof Tai Bee Choo is an Associate Professor at the Saw Swee Hock School of Public Health with a joint appointment at the Yong Loo Lin School of Medicine, National University of Singapore. She is the course director of several courses which use STATA as the statistical tool of application.

Siew-Pang Chan

About Siew-Pang Chan

Dr Siew-Pang Chan holds postgraduate degrees in Decision Sciences, Medical Statistics, Industrial Engineering and Financial Engineering. He has taught Analytics, Calculus and Statistics in Singapore, Australia and the United States. As an ardent enthusiast of Stata, he has 25 years of experience in using the all-purpose statistics software package.

Chris Erwin

About Chris Erwin

Dr. Erwin is an economist and lecturer at Auckland University of Technology. His research focuses on applied microeconomics, including the economics of higher education and labor economics. He is currently using Stata to teach Economic Policy Evaluation at AUT and has used Stata in other economics courses in both New Zealand and the United States.

Mark Chatfield

Nice log (and log-like) scaled axes

In this presentation, Mark will show how to i) create graph commands which nicely label a log-scaled axis and ii) produce a nice log-like-scaled axis showing 0 and ∞.

With the exception of meta forestplot, Stata does not automatically label a log-scaled axis with multiplicative labels, e.g. 1/4, 1/2, 1, 2, 4. With a twoway graph, specifying yscale(log) will create a log-scaled y-axis but with additive labels, e.g. 1, 2, 3, 4. The niceloglabels command (Cox 2018) can suggest a variety of nice multiplicative labels, which can benefit community-contributed graph commands that use log-scaled axes. However, decisions still need to be made such as when to choose which set of labels. There is no log-scale equivalent of _natscale to do this for you. I will show how I overcame this for my blandaltman and box_logscale commands (Chatfield 2023). The latter is an example of working with log-transformed data but labelling the axis with multiplicative, original-scale labels. The mylabels command (Cox 2022) is helpful here. I will also show how to use other transformations such as asinh(y/#) or logistic(#*log(y/#)) to produce a nice log-like-scaled axis showing 0 and ∞.
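
As a rough numerical illustration of the ideas in this abstract (sketched in Python rather than Stata, and not taken from the blandaltman or box_logscale code), the snippet below shows why multiplicative labels are evenly spaced on a log scale and how the two log-like transformations behave at 0 and ∞. The function names and the default values chosen for the # placeholders (c and k) are assumptions made only for this example.

```python
import math

def asinh_scale(y, c=1.0):
    """asinh(y/c): close to linear near 0 and close to log(2*y/c) for y >> c."""
    return math.asinh(y / c)

def logistic_log_scale(y, k=1.0, c=1.0):
    """logistic(k*log(y/c)) for y >= 0 and k > 0, with the limits at 0 and inf filled in."""
    if y == 0:
        return 0.0  # limit as y -> 0
    if math.isinf(y):
        return 1.0  # limit as y -> infinity
    z = k * math.log(y / c)
    # Numerically stable form of the logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Multiplicative labels sit at equal steps of log(2) on a log scale.
labels = [0.25, 0.5, 1, 2, 4]
log_positions = [math.log(v) for v in labels]
gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

# Axis positions of 0, the labels, and (for the logistic version) infinity.
asinh_positions = [asinh_scale(v) for v in [0] + labels]
logistic_positions = [logistic_log_scale(v) for v in [0] + labels + [math.inf]]
```

With these defaults, the logistic version places 0 at position 0, y = c at 0.5, and ∞ at 1, so both endpoints fit on a finite axis, whereas the asinh version places 0 at position 0 but grows without bound.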

About the speaker

Mark Chatfield is a biostatistician. He has enjoyed using Stata for 20+ years. He is currently thinking hard about log scales, geometric means, geometric SDs, and various ways of defining relative differences.


Answering Stata assignments using Generative Artificial Intelligence: An Example

ChatGPT and Bard are now part of the research landscape. They are tools being used daily by students, professionals, academics and researchers. We can choose to ignore them or acknowledge that they have a part in our practice. In this presentation, Amy and David demonstrate how these tools can be used (ineffectively and effectively) to develop answers to real assignment questions using Stata.

David White

About David White

David is a director with SDAS. He enjoys helping his clients drive value through their data. He has been performing data analytics using various tools since the early 1990s and now primarily uses Stata for audit analytics, business analysis and data management. He is part of the team at SDAS presenting the SDAS Stata Webinar Series.

Amy Grant

About Amy Grant

Amy is part of the team at Survey Design and Analysis Services helping with administration, technical support and content creation. She is also studying a Bachelor of Statistics at the Australian National University.

Zumin Shi


Exporting the results of a multivariable model to a Word document can be time-consuming. This presentation covers the epitable2 and epitable3 packages, developed to create the Table 2 and Table 3 used in epidemiological studies.

About the speaker

Dr. Zumin Shi is a Professor of Nutrition at Qatar University, with expertise in biostatistics, epidemiology, nutrition, and public health. He has over 300 peer-reviewed publications and is listed in Stanford University’s list of the top 2% of scientists in the world.