Package version: CDSE2016 0.0.3

Contents

R is a very popular open-source statistical programming language, with many interesting features and some challenging quirks. It is used throughout data analysis, by people ranging from students in small academic groups to professional engineers at the largest social media companies. This tutorial takes us from the basics of R to advanced features that are particularly interesting to scientists and engineers. We’ll talk about ‘atomic’ vectors, functional and vectorized computations, R’s unique class systems, visualization, extending R to process large data, and literate programming. That is a lot of material for a couple of hours, but it’ll be fun.

1 Introduction: R and Bioconductor

Today

R

Bioconductor

Basics of R

2 Literate Programming

R-flavored markdown, .Rmd files

3 Class systems

S3

S4

Advanced: what about R itself?

4 Large Data

4.1 Basics

Efficient R code

Algorithms

4.2 Scalability

4.3 Parallel evaluation

Clusters & clouds

4.4 ‘Native’ implementation

Rcpp

5 (Visualization)

6 Case Study: Cancer Genome Atlas Gene Expression

Background

Upstream

Reduction

Analysis

Summarize results

Making complicated configurations accessible

7 Bioconductor Job / Career Opportunity

This position offers a challenging and creative opportunity for a talented and independent Web / Systems Administrator. The successful applicant will participate in many end-user-facing activities of the successful open-source Bioconductor project for the analysis and comprehension of high-throughput genomic data. Initial duties involve management of cloud-based computing resources, including our web site, https://bioconductor.org, and our support facilities, https://support.bioconductor.org. Responsibilities will grow to include day-to-day oversight of our software build system, as well as troubleshooting and management of user-contributed packages. There are considerable opportunities to develop innovative, modern, containerized (e.g., Docker, AMI) and cloud-based (commercial and in-house) solutions that enable use of our software, and to employ modern automation software to deploy and manage in-house computational resources. This person will join our small team of on- and off-site members. The size of our team means that the successful applicant will become an expert in these areas, relying on close collaboration with other team members for support. The successful applicant will of necessity become familiar with the R programming language and the Bioconductor ecosystem, and should be comfortable with the challenges and opportunities that implies.

Interested? Contact martin.morgan@roswellpark.org

8 Acknowledgements

Bioconductor Core Team (Current and Recent)

Technical and Scientific Advisory Boards

Funding