Oceania Stata Conference: Robust statistics, powerful data presentation, machine learning, models and predictions. Using Stata in 2024 to advance your research.

Oceania Stata Conference Presentations

Giovanni Cerulli, Research Institute on Sustainable Economic Growth, National Research Council of Italy, Unit of Rome

Running Machine Learning in Stata: Performance and usability evaluation

This presentation surveys the machine learning (ML) commands available in Stata. It systematically categorizes and summarizes these commands and evaluates their performance and usability for tasks such as classification, regression, clustering, and dimension reduction. The presentation also provides examples of how to use these commands with real-world datasets and compares their performance. This review aims to help researchers and practitioners choose appropriate ML methods and related Stata tools for their specific research questions and datasets and to improve the efficiency and reproducibility of ML analyses using Stata. It concludes by discussing some limitations and future directions for ML research in Stata.

Mark Schaffer, Heriot-Watt University

pystacked and ddml: Machine learning for prediction and causal inference in Stata

This presentation explores the pystacked and ddml commands in Stata.

Meghan Cain, StataCorp

Bayesian model averaging

Are you unsure which predictors to include in your model? Rather than choosing one model, aggregate results across all candidate models to account for model uncertainty with Bayesian model averaging (BMA). Which predictors are important given the observed data? Which models are more plausible? How do predictors relate to each other across different models? BMA can answer these questions and many more. Stata 18 introduced the bma suite of commands to perform BMA in linear regression models. In this talk, you will learn how to explore influential models, make inferences, and obtain better predictions with BMA. I will demonstrate the utility of BMA for any researcher—Bayesian, frequentist, and everyone in between! No prior knowledge of the Bayesian framework is required.
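As a brief sketch of the idea behind BMA, in standard textbook notation rather than anything taken from the talk: given candidate models $M_1, \dots, M_K$ (here, different subsets of predictors), each model is weighted by its posterior probability, and those weights are used both to average estimates and to score how important each predictor is. The symbols $M_k$, $\beta$, and $\mathrm{PIP}$ (posterior inclusion probability) are notation assumed for this illustration.

```latex
\begin{align*}
P(M_k \mid y) &= \frac{p(y \mid M_k)\,P(M_k)}{\sum_{j=1}^{K} p(y \mid M_j)\,P(M_j)}, \\
E[\beta \mid y] &= \sum_{k=1}^{K} P(M_k \mid y)\, E[\beta \mid y, M_k], \\
\mathrm{PIP}(x_j) &= \sum_{k \,:\, x_j \in M_k} P(M_k \mid y).
\end{align*}
```

The first line answers "which models are more plausible?", the second gives the model-averaged estimate, and the third answers "which predictors are important given the observed data?".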

Marea Sing, RBNZ’s Economics Directorate, and Guanyu Zheng, NZ Ministry of Business, Innovation and Employment

Sectoral reallocation and income growth in the labour market during the COVID-19 pandemic

This paper investigates the effects of the COVID-19 pandemic on the labour market in New Zealand. Utilizing a comprehensive administrative dataset, we delve into the intricacies of labour reallocation during the pandemic, while establishing links between these reallocations and two distinct measures of income growth. Our findings reveal that COVID-19 presented as an atypical and relatively persistent reallocation shock to the New Zealand labour market. Notably, the surge in job-to-job transitions primarily stemmed from transitions between industries, rather than those within industries. Moreover, it is these between-industry transitions that exhibited a positive correlation with overall income growth in the labour market.

Arul Earnest, Monash University

Machine Learning Techniques to Predict Timeliness of Care among Lung Cancer Patients

Delays in the assessment, management, and treatment of lung cancer patients may adversely impact prognosis and survival. This study is the first to use machine learning techniques to predict the quality and timeliness of care among lung cancer patients, utilising data from the Victorian Lung Cancer Registry (VLCR) between 2011 and 2022, in Victoria, Australia.

Andrew Gray, University of Otago

ChatGPT and other large language models: How useful are they to statisticians using Stata?

Some statisticians, including Stata users, are already using ChatGPT and other LLMs for answers to questions about statistics, for code generation, or for data processing (e.g., sentiment analysis). Some researchers may already be using the technology to automatically perform their analyses. This presentation explores these four uses through examples and brief case studies.

Nyi Nyi Naing, Universiti Sultan Zainal Abidin

Beauty of STATA: Relevant and plausible

Stata makes research easier for users in the medical and health sciences: data transfer from other databases is straightforward, intermediate and advanced statistical methods are available through both command and menu options, and the output is relevant and meaningful for inference, interpretation, and conclusions in both interventional studies (clinical and community trials) and observational studies (for example, cohort, case-control, and cross-sectional studies). It is also a user-friendly tool for determining the minimum required sample size with appropriate power for those studies. Various regression methods, general linear models, and cross-sectional time series are frequently used by these researchers. Step-by-step procedures for statistical analysis using Stata are taught, from basic through intermediate to advanced levels, to academic staff in universities, researchers at research institutes, clinicians and health personnel at ministries of health, biostatisticians, epidemiologists, and pharmaceutical company staff. The output for epidemiological studies is much superior to that of other software in terms of relevance and biological plausibility.

Mark Chatfield, University of Queensland

Nice log (and log-like) scaled axes

In this presentation, Mark will show how to i) create graph commands which nicely label a log-scaled axis and ii) produce a nice log-like-scaled axis showing 0 and ∞. With the exception of meta forestplot, Stata does not automatically label a log-scaled axis with multiplicative labels, e.g. 1/4, 1/2, 1, 2, 4. With a twoway graph, specifying yscale(log) will create a log-scaled y-axis but with additive labels, e.g. 1, 2, 3, 4. The niceloglabels command (Cox 2018) can suggest a variety of nice multiplicative labels, which can benefit community-contributed graph commands that use log-scaled axes. However, decisions still need to be made such as when to choose which set of labels. There is no log-scale equivalent of _natscale to do this for you. I will show how I overcame this for my blandaltman and box_logscale commands (Chatfield 2023). The latter is an example of working with log-transformed data but labelling the axis with multiplicative, original-scale labels. The mylabels command (Cox 2022) is helpful here. I will also show how to use other transformations such as asinh(y/#) or logistic(#*log(y/#)) to produce a nice log-like-scaled axis showing 0 and ∞.
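The transformations named in the abstract can be illustrated with a minimal Python sketch. This is not the implementation behind blandaltman or box_logscale; the function names and the parameters `center`, `slope`, and `scale` are hypothetical stand-ins for the unspecified `#` constants in asinh(y/#) and logistic(#*log(y/#)). The sketch shows why the logistic form can place both 0 and ∞ on a finite axis, and how multiplicative labels such as 1/4, 1/2, 1, 2, 4 can be generated as powers of a base.

```python
import math


def logistic_loglike(y, center=1.0, slope=1.0):
    """Map y in [0, inf] to [0, 1] via logistic(slope * log(y / center)).

    y = 0 maps to 0, y = center maps to 0.5, and y = inf maps to 1,
    so both endpoints can be drawn on a finite axis.
    """
    if y == 0:
        return 0.0
    if math.isinf(y):
        return 1.0
    z = slope * math.log(y / center)
    # Numerically stable logistic: avoid exp() of a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def asinh_loglike(y, scale=1.0):
    """asinh(y / scale): roughly linear near 0 and roughly logarithmic for large y."""
    return math.asinh(y / scale)


def multiplicative_labels(lo, hi, base=2):
    """Return the powers of `base` lying within [lo, hi] (lo must be > 0)."""
    eps = 1e-9  # tolerance for floating-point error in the logarithms
    kmin = math.ceil(math.log(lo, base) - eps)
    kmax = math.floor(math.log(hi, base) + eps)
    return [float(base) ** k for k in range(kmin, kmax + 1)]


if __name__ == "__main__":
    # Axis positions for original-scale labels, including 0 and infinity.
    for label in [0.0] + multiplicative_labels(0.25, 4) + [math.inf]:
        print(label, round(logistic_loglike(label), 4))
```

With `center=1` and `slope=1`, a value and its reciprocal sit symmetrically about the midpoint of the axis, which is the property that makes ratio-type quantities read naturally on such a scale.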

Facilitated Panel Session - Teaching Stata

With Tai Bee Choo, National University of Singapore, Siew-Pang Chan, National University of Singapore, and Chris Erwin, Auckland University of Technology

Stata, a globally recognized software, is pivotal in teaching statistics and data analysis across diverse university disciplines, including biostatistics, economics, econometrics, epidemiology, health sciences, and social sciences. This panel session offers a unique opportunity to delve into the experiences of three distinguished lecturers who have extensively utilized Stata in their teaching endeavors for many years.

David White and Amy Grant, SDAS

Answering Stata assignments using Generative Artificial Intelligence: An Example

ChatGPT and Bard are now part of the research landscape. They are tools being used daily by students, professionals, academics and researchers. We can choose to ignore them or acknowledge that they have a part in our practice. In this presentation, Amy and David demonstrate how these tools can be used (ineffectively and effectively) to develop answers to real assignment questions using Stata.

Zumin Shi, Qatar University

EpiTable

Exporting the results of a multivariable model to a Word document can be time-consuming. This presentation covers the epitable2 and epitable3 packages, developed to create the Table 2 and Table 3 typically used in epidemiological studies.

Oceania Stata Conference 2024 – Virtual

1 February 2024
