
Program

Saturday, November 12, 2022 (All times are Central Time)

  • 11:00 am – 11:10 am: Welcoming Remarks (Chair: Jun Yan, University of Connecticut)
  • 11:10 am – 12:10 pm: Keynote Address (Chair: Linglong Kong, University of Alberta)
    • Douglas Bates, Emeritus Professor of Statistics, University of Wisconsin-Madison:
      Cross-language technologies for statistics and data science
      To some extent, statisticians and data scientists currently suffer from an "embarrassment of riches" in that there are many high-quality, open-source tools written in R, Python, Julia and other languages available for our work. However, having so many tools available in different languages creates the burden of needing to gain familiarity with many languages and packages, a not-inconsiderable task. This makes finding and learning about cross-language technologies particularly valuable. I will discuss three such tools that I have used with R, Python and Julia and that make the transition between languages much easier. The first is the Arrow storage format, which can be considered a language-neutral format for storing and easily reading column-oriented tabular data. (Think of it as a language-neutral binary storage format for data frames.) Another is Quarto, which can be considered a "next-generation RMarkdown" format and processor for literate programming. Those currently using knitr/RMarkdown can transition, more or less effortlessly, to Quarto, and those who currently use Python or Julia now have an RMarkdown for their environment. (Quarto uses Jupyter, another cross-language technology, for evaluation of Julia and Python code blocks.) Finally, the VS Code editor provides editing and code evaluation for the data science languages I mentioned, for Jupyter notebooks, and for several other languages and data formats. By gaining familiarity with these tools, a data scientist can make the transition between analysis languages less traumatic.
  • 12:15 pm – 02:00 pm: Data Jamboree (Chair: Lucy D’Agostino McGowan, Wake Forest University): Each language, in alphabetical order, will be allocated 25 minutes to tackle the same problems from the NYC Crash data, followed by questions.
    • Julia: Josh Day, Senior Research Scientist, Julia Computing. Code
    • Python: Dan Chen, Postdoctoral Research and Teaching Fellow, University of British Columbia. Code
    • R: Sam Tyner, Data Scientist, Tritura. Code
  • 02:00 pm – 03:30 pm: Panel Discussion, Frontliners and Next Frontiers of Statistical Computing in Data Science (Moderator: Kun Chen, University of Connecticut)
    • Hannah Frick, Software Engineer, Posit
    • Haoda Fu, Associate Vice President Enterprise Lead Machine Learning and AI, Eli Lilly.
    • Eric Kolaczyk, Professor of Statistics and Inaugural Director of the Computational and Data Systems Initiative (CDSI), McGill University.
    • Teresa Filshtein Sönmez, Biostatistician, Quantitative Scientist, 23andMe.