Building this document

To build this README, run build_readme.R. The talk data is stored in the CSV file talks_table.csv.

Workshops

Talks - Day 1

Coline Zeballos (Roche)
R Package Validation at Roche

Abstract

R package validation has been on all our minds since the pharmaceutical industry started moving away from SAS to R for its statistical analysis and regulatory submissions. Opting for open source programming requires us to revisit how we validate code, both internally and in a cross-pharma effort when it comes to CRAN. Roche will present its approach to R package validation and share some material for you to apply.

Slides Youtube

Juliane Manitz (EMD Serono)
Statistical Analysis and Pathway to a Risk-based Assessment of R packages at Merck KGaA/EMD Serono

Abstract

Like many other companies, Merck KGaA/EMD Serono has embarked on their journey to enable the use of R for regulatory submissions. Following the framework introduced by the R validation hub (Nicholls et al., 2020), we started to develop an algorithm to qualify a CRAN package as a Merck standard package. In a nutshell: If an R package passes the installation qualification and successfully executes available tests, the package will be made available to the user. Then, an automated risk assessment of R packages is performed based on the test coverage score (more is better) and the riskmetric score generated from the meta-information (smaller is better). If pre-defined thresholds are fulfilled, the package is qualified as a Merck standard package; otherwise an explicit (manual) risk assessment is needed. In this presentation, we introduce our pathway to a risk-based assessment of R packages at Merck. We provide relevant details on the statistical analysis which led to the definition of thresholds supporting a robust classification of CRAN packages as Merck standard packages. We want to inspire other companies and seek feedback from the community.
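
The decision rule described in this abstract can be sketched roughly as follows. This is only an illustration, written in Python rather than the R tooling used in the talk. The class name, field names, and the threshold values 0.80 and 0.25 are hypothetical placeholders, not the thresholds derived in the presentation.

```python
from dataclasses import dataclass


@dataclass
class PackageAssessment:
    """Inputs to the automated assessment (all names are illustrative)."""
    installs_ok: bool   # installation qualification passed
    tests_ok: bool      # available tests executed successfully
    coverage: float     # test coverage score in [0, 1]; more is better
    risk_score: float   # riskmetric-style score in [0, 1]; smaller is better


# Hypothetical thresholds, chosen only for this example.
MIN_COVERAGE = 0.80
MAX_RISK = 0.25


def classify(pkg: PackageAssessment) -> str:
    """Return the qualification outcome for one package."""
    # Step 1: the package is only made available if it installs and its tests run.
    if not (pkg.installs_ok and pkg.tests_ok):
        return "not available"
    # Step 2: both thresholds must hold for automatic qualification.
    if pkg.coverage >= MIN_COVERAGE and pkg.risk_score <= MAX_RISK:
        return "standard package"
    # Otherwise a human has to perform an explicit risk assessment.
    return "manual risk assessment"
```

For instance, under these placeholder thresholds, `classify(PackageAssessment(True, True, 0.9, 0.1))` would fall into the "standard package" branch, while a package with low coverage would be routed to manual assessment.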

Slides Youtube

Thomas Neitmann and Teckla Akinyi (Roche | GSK)
Introducing {admiral} - The ADaM in R Asset Library

Abstract

Slides Youtube

Volha Tryputsen, Wilson Tendong and Kathy Mutambanengwe (Janssen | Open Analytics)
R/Shiny tools for Immune Fitness Exploration

Abstract

Slides Youtube

Pawel Rucki (Roche)
5 testing packages to make your package better!

Abstract

In this short talk I will present a few packages that can be used inside a package testing framework to help increase the overall quality of a package. The main focus will be on static R code analysis tools such as the well-known {codetools} or {lintr}, and also less popular packages such as {prefixer}. For each of them, I will give a short introduction, present its configuration capabilities, and show how to use it within the {testthat} framework.

Slides Youtube

Charlotte Soneson, Kevin Rue-Albrecht, Federico Marini, Aaron Lun (Friedrich Miescher Institute)
Interactive and collaborative exploration of large-scale transcriptomics data with iSEE

Abstract

Detailed exploration of large transcriptomics datasets, increasingly available at single-cell resolution, is a time-consuming task which often requires the complementary skill sets of data analysts and experimental scientists to complete analyses and interpretation in an efficient manner. The iSEE (Interactive SummarizedExperiment Explorer) R/Bioconductor software package (https://bioconductor.org/packages/iSEE/), built on the shiny R framework, provides a general-purpose graphical interface for exploring any rectangular dataset with additional sample and feature annotations, such as single-cell RNA-seq data. Users can create, configure, and interact with the iSEE interface, enabling quick iterations of data visualization. This facilitates generation of new scientific hypotheses and insights into biological phenomena, and empowers a wide range of researchers to explore their data in depth. iSEE also guarantees the reproducibility of the analysis, by reporting the code generating all the output elements as well as the layout and configuration of the user interface. The combination of interactivity and reproducibility makes iSEE an ideal candidate to bridge and complement the expertise of researchers, who are able to design flexible, accessible, and robust dashboards that can also be directly shared and deployed in collaborative contexts - connecting large data collections to broad audiences, thus further increasing the value of generated research data.

Slides Youtube

Kevin Snyder (FDA)
SENDing Toxicology Study Data Analysis into the 21st Century with a New R Package: sendigR.

Abstract

The CDISC-SEND data standard has created new opportunities for collaborative development of open-source software solutions to facilitate cross-study analyses of toxicology study data. A public-private partnership between BioCelerate and FDA/CDER was established in part to develop and publicize novel methods of extracting value from SEND datasets. As part of this work in collaboration with PHUSE, an R package, sendigR, has been developed to enable end users to easily construct a relational database from any collection of SEND datasets and then query that database to perform cross-study analyses. The package includes an R Shiny application with a graphical user interface, allowing users who are not familiar with the R programming language to perform cross-study analysis. Experienced R programmers, on the other hand, will be able to integrate the package functions into their own custom scripts/packages and potentially contribute improvements to the functionality of sendigR.

Slides Youtube

Talks - Day 2

Regis James (Regeneron)
Intermap: An integrative multiomics approach to generating therapeutic target hypotheses

Abstract

In this talk, we will be discussing an architecturally and bioinformatically multi-layered integrative multiomic approach to the development of target hypotheses.

Scientists work to help pharmaceutical companies advance towards the identification of potent therapeutics on a daily basis. In some scenarios, biological scientists can develop therapeutic tools without a specific target in mind. In this case, they would like to generate a list of potential targets for their tools, within a given set of parameters for the delivery. However, generating this list manually is very difficult: it means combing through all of the appropriate databases to find targets that have the appropriate molecular biology characteristics, viable mouse models that recapitulate the human disease phenotypes, and pathologies in the tissues of interest.

This work requires making recursive decisions from the present wealth of biological literature and its data at scale. Such decision-making is a herculean task that requires the simultaneous propagated joins of annotated entity catalogs (genes, knockout mice, diseases, structured vocabulary terms, etc.) and, orthogonally, recursive filtration of hierarchical associations between those entities and controlled biomedical vocabularies.

To streamline and accelerate this process, we used public data repositories (Uniprot, National Center for Biotechnology Information, International Mouse Phenotyping Consortium, Online Mendelian Inheritance in Man), ontologies (Gene Ontology, Mammalian Phenotype Ontology, Human Phenotype Ontology), and their multi-species (mouse, human) entity annotations to populate and index a MySQL relational database and a Neo4j graph database with their descriptive and relational properties.

We then built an API (application programming interface) via the plumber package for R to dynamically generate optimized SQL and Neo4j Cypher queries that interact with the MySQL database, via the RMariaDB package for R, and the Neo4j graph database, via the neo4r package for R, to fuse data across the ingested biomedical repository data and use the yielded results to generate parseable JSON objects.

Finally, we built a user-friendly shiny app for constructing and submitting queries via the API, parsing the JSON API outputs, and providing interactive network visualizations of the queries via the VisNet package for R, in-depth explanations of how the results were generated, and links to external resources for further relevant scientific data. We delivered this app to fellow scientist collaborators via RStudio Connect, enabling these biologists to, within milliseconds, leverage high-dimensional, multi-species relationships to identify potential targets.

[Slides](https://github.com/rinpharma/2021_presentations/tree/master/talks_folder/2021-James-Intermap.pdf) [Youtube](https://youtu.be/fHO6PpJoL2k)

Monika Huhn, Saleha Patel, Kalliopi Tsafou, Björn Magnusson, Elke Ericson, Wen Yu, Eliseo Papa, Stefano Bragaglia, Claire Donoghue, Faisal Khan, Hitesh Sanganee, Shameer Khader, Adrian Freeman (AstraZeneca)
OneView - A Shiny app to unlock the full potential of drug repositioning investigations

Abstract

Drug repositioning is an area of growing interest in drug development that can accelerate the discovery of new treatment options to benefit patients worldwide. Briefly, drug repositioning refers to the systematic investigation of a novel disease indication for a drug molecule. Drug repositioning can be accelerated using various tools and technologies, including intelligent dashboards, data integration and human-in-the-loop machine learning. A typical drug repositioning investigation generates a large amount of data that often needs to be linked and interpreted using a visual grammar familiar to the various scientific groups leading drug repositioning investigations. We developed OneView - a shiny app that enables seamless integration, computing and visualization to accelerate drug repositioning investigations. As in many clinical and pre-clinical projects, the problem that OneView tries to solve is to connect biologists and clinicians with the data in a meaningful way. The core data behind the dashboard are from an analysis comparing transcriptomic signatures of drug molecules with hundreds of disease transcriptomic signatures, creating connections between a compound and diseases based on an inverse correlation between the transcriptomic signatures. To fully understand the significance of the relationships, OneView provides a dynamic dashboard enabling scientists to filter/search within the data, follow connections through multiple datasets, and provide meaningful interactive visualizations. We have incorporated additional data from several internal knowledge repositories to find further evidence to substantiate potential links between a compound and a disease.

From a technical perspective, the most challenging part has been visualizing the data in the best way. A lot of the interesting information is in the standard connections of different elements in the data - such as common genes in multiple mappings between compound and disease signatures. In many cases, network plots were too busy to display those connections meaningfully. Instead, UpSet plots were found to be the best way to visualize interactions between multiple sets. While several packages implement UpSet plots in R, none of them allowed for interactive visualizations. To allow interaction with the visualization and further drilling down into the data by selecting bars in the graph, we implemented our own version of UpSet plots using the JavaScript library D3.
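
To make the idea behind UpSet plots concrete: each bar counts the elements that belong to exactly one combination of sets. The Python sketch below shows only that counting step. It is not the authors' D3 implementation, and the function name, gene names, and mapping names are made up for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def upset_counts(sets: Dict[str, Set[str]]) -> "Counter[FrozenSet[str]]":
    """Count elements per exclusive combination of set memberships."""
    counts: "Counter[FrozenSet[str]]" = Counter()
    all_items = set().union(*sets.values())
    for item in all_items:
        # The combination is the full list of sets that contain this item.
        membership = frozenset(name for name, members in sets.items() if item in members)
        counts[membership] += 1
    return counts


# Made-up example: genes shared between three compound-disease mappings.
mappings = {
    "mapping_A": {"TP53", "EGFR", "MYC"},
    "mapping_B": {"EGFR", "MYC", "KRAS"},
    "mapping_C": {"MYC", "BRCA1"},
}
counts = upset_counts(mappings)

# One line per bar of the plot: the combination and its size.
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" & ".join(sorted(combo)), n)
```

Selecting a bar in an interactive plot then amounts to filtering the data down to the items whose membership equals that bar's combination.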

Slides Youtube

Annekathrin Ludt (IMBEI)
GeneTonic: enjoying the interpretation of your RNA-seq data analysis

Abstract

RNA-seq transcriptome analysis workflows often generate the essential information (data and results) distributed among a variety of different tabular files and formats, e.g. raw and normalized expression values, results of differential gene expression analysis, or functional enrichment analysis. The efficient interpretation of the results can be hampered due to this fragmentation, and the same can happen even when providing static analysis reports.

We developed the GeneTonic package (https://bioconductor.org/packages/GeneTonic/), containing a Shiny application which provides an efficient and interactive solution to combine the results of RNA-seq analysis. GeneTonic assists users in the identification of relevant functional patterns, as well as their contextualization in the data and results at hand, with interactivity (to make the analysis simple and accessible) and reproducibility (via RMarkdown reports) to simplify the integration of all components and communication of results.

With GeneTonic, researchers can generate a variety of visualizations, including bird's-eye perspective summaries (with interactive bipartite gene-geneset graphs or enrichment maps) as well as detailed information and visualizations of individual genes and gene-sets. These can be further inspected via drill-down actions that display additional content in specific elements of the user interface, streamlining analysis, interpretation, and knowledge extraction of transcriptome data for a broad spectrum of collaborating scientists.

(https://doi.org/10.1101/2021.05.19.444862)

Slides Youtube

Ning Leng, Adrian Waddell, Kate Ostbye, Andy Nicholls (R/Consortium | Roche | SCHARP | GSK)
R Consortium Pharma Working Groups Overview and Updates

Abstract

In this talk, we would like to provide updates on the four biopharmaceutical-industry-focused R Consortium cross-industry working groups. These working groups share the overall objective of supporting the use of R within the biopharmaceutical industry, with complementary scopes. We would also like to call for volunteers for these working groups, which are open to everyone. R-based submission pilots to FDA: provide example R-submission materials to the public and identify potential gaps in R-based submissions (presenter: Ning Leng, Roche). R tables for regulatory reporting: develop packages and white papers for generating tables in R to fulfill regulatory requirements (presenter: Adrian Waddell, Roche). R certificates: R training and certification for the SAS-to-R transition (presenter: Kate Ostbye, SCHARP). R adoption series: a series of webinars focusing on the adoption of R (presenter: Andy Nicholls, GSK).

Slides Youtube

Adam Skubala (Bayer)
Industrialized Machine Learning and Explainable AI for Late Phase Trials

Abstract

Terms like "digitalization", "machine learning (ML)" or "artificial intelligence (AI)" are more than just buzzwords these days. Databases are analyzed worldwide with modern algorithms and entire industries are making data-driven decisions at an ever faster pace. In Pharma, it is not enough to get the prediction (the what). The model must also explain how it came to the prediction (the why). ML models can only be debugged and audited when they can be interpreted, which then allows for fairness, robustness and trust. Presently, however, the amount, complexity, variety, and speed of clinical data run the risk of leaving us knowing less about our compounds than regulatory bodies. While the capabilities of ML and AI have received much attention, their role in clinical development has now moved from the theoretical to the practical application stage. Using industrialized ML/AI tools, we can detect clinically relevant, highly complex safety/efficacy signals that are not identifiable via classical approaches that force hypotheses on the data. By deriving the best hypothesis given the data, ML is currently the best available methodology to create holistic mathematical models of complex (biological) systems using all available data and variables while complementing findings from classical approaches. We, the Biomarker & Data Insight Group at Bayer, have developed an MLAI pipeline in R. Our MLAI pipeline comprises four core modules (data preprocessing, modeling / hyperparameter tuning, higher order interaction analysis and reporting) using most of the available data of late phase trials covering standard endpoint types (time-to-event, class and continuous). Each core module has its own internal R package integrating several R packages (e.g. tidyverse, tidymodels, mlr3, iml, Rmarkdown, Shiny, etc.). The pipeline is an industrialized, mature and validated software product with continuous delivery and continuous deployment.
Something special about this pipeline is that we make the effort to open the "black box" using explainable AI. With these extra tools, we can better understand why a certain variable is relevant for the prediction, reveal the nature of its relationship (monotonic or non-monotonic) with the outcome, and make the ML results more understandable and meaningful for clinicians.

Slides Youtube

Robert Kirk DeLisle (Somalogic)
Making Python Objects R-like with PyR

Abstract

R and Python are the fundamental tools used by data scientists across industries including pharma and biotech. With a rich set of analytical packages in both language domains, analysts who are able to work with both possess a significantly larger selection of tools in their toolbox compared to single-language analysts. To consolidate these camps, the reticulate package has played a fundamental and critical role in enabling the direct use of Python from the R console. Additionally, integration of Python capabilities into the RStudio IDE allows a single point of access to both languages and their integration. Once a Python module or class is imported, however, accessing methods and attributes from R requires the usage of the $ operator in a way that is not completely consistent with typical R code and creates challenges for integration of objects or models developed in both languages. The result can become a mixture of R-esque and Python-like code that can resemble two different language structures, despite the efforts to combine them. In order to provide analysts an environment in which Python modules and classes can be used as though they were R-native objects, SomaLogic developed the PyR package. This package consists of a set of Python classes that wrap Python objects and a set of S3 methods providing wrappers to those imported classes. A model object hierarchy defining the expected interfaces for the Python components provides an overall architecture enabling introduction of new Python capabilities in a way that appears to the user to be native R code.

Slides Youtube

David Benkeser (Emory University)
R at Warp Speed: Reproducible coding for COVID vaccine trials

Abstract

(The) Operation (formerly known as) Warp Speed is a joint venture between pharma and government to bring COVID-19 vaccines to market at unprecedented speed. A key tenet of the program is to generate the data needed to establish correlates of vaccine protection -- immune responses that predict the level of protective efficacy of the vaccines. Our team was tasked with designing an analysis plan and the code needed to analyze the data and produce results that answered these key questions. However, lacking full FDA approval of their products, some vaccine manufacturers were highly protective of their data. Thus, our team was faced with the challenge of building an analysis pipeline capable of analyzing data that we have never seen, on servers that we do not have access to, all under the extreme time pressure associated with COVID vaccine development. In this talk, I will describe the R-based set of tools that we used to achieve this goal and some lessons learned along the way.

Slides Youtube

Talks - Day 3

Heather Crandall and Paul Schuette (FDA)
Submitting Data to CDER: What Comes Next?

Abstract

Slides Youtube

David Granjon (Novartis)
Outstanding User Interfaces with Shiny

Abstract

In recent years, R users' understanding of Shiny has greatly increased, but so have client expectations. While one of Shiny's greatest strengths is that it allows producing web applications solely from R code, meeting clients' more delicate expectations will often involve going beyond R code and working with HTML, CSS, and JavaScript. We recognize that R developers tend not to be familiar with the latter, as they generally do not have a significant background in web development, so these technologies may appear daunting at first. In this talk, I'll present my journey toward the creation of the RinteRface organization, powering many Shiny extensions like {bs4Dash} or {shinyMobile}, as well as the work-in-progress "Outstanding user interfaces with Shiny" book (https://divadnojnarg.github.io/outstanding-shiny-ui/), exposing some keys to designing amazing user experiences.

Slides Youtube

James Black (Roche)
Smoothing out our path to open source and pan-pharma code collaboration

Abstract

In recent years late stage Pharma has begun to transition from a consumer of open source, and a sporadic creator, to a heavily invested collaborator on open source tools like R packages. In this short talk, James will discuss our recent focus on open source collaboration in post-competitive tools, and some important lessons we've learned.

Slides Youtube

Afshin Mashadi-Hossein, Julie Rytlewski, Garth McGrath (BMS)
Data-as-a-Product: A data science framework for data collaborations

Abstract

For data science teams, data preparation takes substantial investment of time, data science expertise and subject matter proficiency. However, as the name implies, data preparation is typically viewed merely as a means to an end, encouraging creation of expensive but often single-use and fragile elements in data analysis workflows.

Rather than seeing data preparation as an obstacle to be removed, we propose a framework that recognizes the time and expertise invested in data preparation and seeks to maximize the value that can be derived from it. Viewing analysis-ready data as a multi-purpose, modularly built product that should lend itself to collaborative development and maintenance, the framework of Data-as-a-Product (DaaP) aims to remove barriers to version tracking and collaborative data development and maintenance. Specifically, the framework, which is entirely implemented in R, enables joint code and data versioning based on git, standardizes metadata capture, tracks R packages used, and encourages best practices such as adherence to functional programming and use of data testing. Collectively, the patterns established by the DaaP framework can help data science teams transition from developing expensive, single-use "wrangled" datasets to building maintainable, version-controlled, and extendable data products that could serve as reliable components of their data analysis workflows.

Slides Youtube

Nan Xiao (Merck)
Reimagine the R package distribution system for reproducible research and submissions

Abstract

In this talk, we will discuss an infrastructure-free R package exchange and distribution system. The components include: {pkglite} for compact package representations, {cleanslate} for portable R environments, and {pkglink} for runtime dependency resolution. We will also discuss its potential applications in reproducible research and submissions.

Slides Youtube

Keaven M Anderson (Merck)
The gsDesign Shiny app for clinical trial design

Abstract

Slides Youtube

