“An introduction to precisely and ggdag: Tools for modern methods in R” – a summary by Ana Sofia Oliveira Gonçalves
On 4 September 2019, Malcolm Barrett gave a lecture titled “An introduction to precisely and ggdag: Tools for modern methods in R”. Malcolm Barrett is a PhD student in Epidemiology at the University of Southern California. He has experience in epidemiology and has worked with RStudio.
During his lecture, he introduced two R packages that he has developed: “precisely” and “ggdag”. He then wrapped up his talk by sharing best practices for creating software for epidemiological analysis.
Malcolm first introduced “precisely”, an R package that calculates sample size based on precision rather than power. It lets researchers calculate sample sizes for common epidemiological measures such as risk differences, risk ratios and odds ratios. The motivation for developing it came from an article by Rothman and Greenland on planning study size based on precision. To use it, researchers specify the desired precision, the expected proportions in the exposed and unexposed groups, the group ratio and the coverage of the confidence interval. The package can also work in the other direction and calculate the precision expected from a given sample size. It can be used from R or as a calculator on the web: a web app built with the shiny package means that people who do not work with R can still use precisely. The package goes hand in hand with the recent discussion regarding statistical significance. Malcolm highlighted common misinterpretations of confidence intervals and, during the discussion, commented that the move away from p-values will still take some time.
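To make the idea of precision-based sample size concrete, here is a minimal sketch in base R for the risk difference. It is not the package's own code: it simply inverts the width of a standard Wald confidence interval, and the function name `n_for_risk_difference`, its argument names and the input values are hypothetical choices made for illustration. Here "precision" is taken to mean the full width of the confidence interval, and the group ratio is taken to mean unexposed to exposed.

```r
# Sketch only: sample size for a risk difference, chosen so that the Wald
# confidence interval has a given total width. All names are illustrative.
n_for_risk_difference <- function(width, risk_exposed, risk_unexposed,
                                  group_ratio = 1, ci = 0.95) {
  # Standard normal quantile for a two-sided interval with the given coverage
  z <- qnorm(1 - (1 - ci) / 2)

  # Var(RD) = p1(1 - p1)/n1 + p0(1 - p0)/n0, with n0 = group_ratio * n1.
  # Setting width = 2 * z * sqrt(Var(RD)) and solving for n1 gives:
  n_exposed <- 4 * z^2 *
    (group_ratio * risk_exposed * (1 - risk_exposed) +
       risk_unexposed * (1 - risk_unexposed)) /
    (group_ratio * width^2)

  # Round up, since a fractional participant cannot be recruited
  c(
    n_exposed = ceiling(n_exposed),
    n_unexposed = ceiling(group_ratio * n_exposed)
  )
}

# Illustrative inputs: risks of 0.4 and 0.3, three unexposed per exposed,
# and a 90% interval that should be 0.08 wide
n_for_risk_difference(
  width = 0.08,
  risk_exposed = 0.4,
  risk_unexposed = 0.3,
  group_ratio = 3,
  ci = 0.90
)
```

Narrowing the target width increases the required sample size quadratically, which is why the desired precision has to be stated up front.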
Malcolm proceeded to introduce his package “ggdag”, which is used to create causal diagrams (directed acyclic graphs, DAGs) in R. Dagitty does not always produce attractive plots, whereas ggplot2 is arguably the best data visualization tool at the moment. Hence, ggdag aims to integrate dagitty with ggplot2 (and with ggraph, a separate package that extends ggplot2 to network graphs). Dagitty contributes powerful, robust algorithms and ggplot2 contributes great flexibility in plotting. Ggdag also shows graphically which variables need to be adjusted/controlled for.
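As a rough illustration of that workflow, the sketch below builds a small DAG, plots it and asks which variables should be adjusted for. The variables (smoking, cancer, age) and the causal structure are invented for this example and were not part of the lecture. It assumes the ggdag and dagitty packages are installed.

```r
library(ggdag)

# Hypothetical DAG: age affects both smoking and cancer,
# and smoking affects cancer
dag <- dagify(
  cancer ~ smoking + age,
  smoking ~ age,
  exposure = "smoking",
  outcome = "cancer"
)

# Plot the diagram with a ggplot2-based theme suited to DAGs
dag_plot <- ggdag(dag) + theme_dag()

# Show graphically which variables need to be adjusted for
adjustment_plot <- ggdag_adjustment_set(dag)

# The same question answered as plain text by dagitty
adjustment_sets <- dagitty::adjustmentSets(dag)
print(adjustment_sets)
```

Because the plots are ordinary ggplot2 objects, they can be customised further with the usual ggplot2 layers and themes.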
Later on, he gave some insights on designing software for epidemiology. He said that such software should be 1) very flexible, automating the tedious parts of an analysis while being very loud about the difficult parts, 2) expressive (modular code is better than monolithic functions), and 3) able to fit into the existing ecosystem. He finished his lecture by describing the package he is currently creating, which will be a tool to help clone datasets.