BICF R for Beginners 2: Data Science with R
Have you already taken R for Beginners 1 and now want to build up your skills? Do you want to create interactive plots or perform complex genomics analyses with Bioconductor? Do you understand data frames, matrices, and vectors but need more practice with more sophisticated analyses?
We will cover:
 Data Importing and Cleaning with tidyr & stringr
 Data Manipulation and Data Joining with dplyr
 Correlations and Simple Regression
 Data Visualization with ggplot2
 R Scripting and Markdown
Students will also have a chance to present their own data challenges and come up with analysis strategies.
R is a freely available language and programming environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.
If you already have RStudio, please update your installation of R before the course.
Contacts
 Course Coordinator: Brandi Cantarel
 Course Administration: Rebekah Craig
Schedule
Time                     Topic                                                   Instructor

January 11th, 2019, Room NB2.100A
9:00 a.m. - 12:00 p.m.   Data Importing and Cleaning (with Answers); Data Files  Brandi Cantarel
1:00 - 2:00 p.m.         Data Manipulation and Data Joining                      Spencer Barnes
2:00 - 3:00 p.m.         Workshop II
3:00 - 4:00 p.m.         Correlations and Linear Regression                      Rong Lu
4:00 - 5:00 p.m.         Workshop III; Workshop III Solutions

January 18th, 2019, Room NB2.100A
9:00 - 10:00 a.m.        Data Visualization with ggplot2                         Jeon Lee
10:00 - 11:00 a.m.       Workshop IV
11:00 a.m. - 12:00 p.m.  Scripting and Markdown                                  Chris Bennett
1:00 - 2:30 p.m.         Scripting Workshop; Scripting Data
2:30 - 5:00 p.m.         R Therapy: Student Case Studies                         All instructors
R Therapy
Here is an opportunity to apply what you have learned to your own research! Students should present a question, along with some possible solutions, to discuss as a class.
Here are some example questions:
 Pick a dataset from the class or from your own work. Plot 2 continuous variables and add a trend line. Create boxplots using a continuous variable and a categorical variable. Add text to indicate the mean of each group on the plot (hint: use mtext). Present the summary statistics for the comparison between the continuous variable and the categorical variable.
 Calculate logCPM from RNA-seq read count data and make a heatmap of a subset of genes (choose 10 or 20 genes). Choose 2 genes and make boxplots comparing their expression across the sample groups. Create a 3D plot showing the expression of 3 genes.
 Create 3 vectors using the random number functions for classic distributions (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html). Plot each vector as a histogram, and plot its cumulative distribution function and density function.
 Pick a package in Bioconductor and prepare 5-10 slides showing the other students how to use it, from installation to a final plot.
 Pick a plot from a recent publication and determine how to make that plot in R.