BICF R for Beginners 2: Data Science with R
Have you already taken R for Beginners 1 and now want to build on your skills? Do you want to create interactive plots or perform complex genomics analyses with Bioconductor? Do you understand data frames, matrices, and vectors but need more practice with sophisticated analyses?
We will cover:
- Data Importing and Cleaning with tidyr & stringr
- Data Manipulation and Data Joining with dplyr
- Correlations and Simple Regression
- Data Visualization with ggplot2
- R Scripting and Markdown
Students will also have a chance to present their own data challenges and come up with analysis strategies.
R is a freely available language and programming environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.
If you already have RStudio installed, please update your R installation here
|January 11th 2019||Room NB2.100A|
|9:00 a.m. - 12:00 p.m.||Data Importing and Cleaning (with Answers)|
|1:00 - 2:00 p.m.||Data Manipulation and Data Joining||Spencer Barnes|
|2:00 - 3:00 p.m.||Workshop II|
|3:00 - 4:00 p.m.||Correlations and Linear Regression||Rong Lu|
|4:00 - 5:00 p.m.||Workshop III (with Solutions)|
|January 18th 2019||Room NB2.100A|
|9:00 - 10:00 a.m.||Data Visualization with ggplot2||Jeon Lee|
|10:00 - 11:00 a.m.||Workshop IV|
|11:00 a.m. - 12:00 p.m.||Scripting and Markdown||Chris Bennett|
|1:00 - 2:30 p.m.||Scripting Workshop|
|2:30 - 5:00 p.m.||R Therapy: Student Case Studies||All instructors|
Here is an opportunity to apply what you have learned to your own research! Students should present a question, along with some possible solutions, to discuss as a class.
Here are some example questions:
- Pick a dataset from the class or from your own work. Plot 2 continuous variables and add a trend line. Create boxplots using a continuous variable and a categorical variable. Add text indicating the mean of each group on the plot (see ?mtext). Present the summary statistics for the comparison between a continuous variable and a categorical variable.
- Calculate logCPM from RNA-Seq read count data and make a heatmap of a subset of genes (choose 10 or 20 genes). Choose 2 genes and make boxplots comparing their expression across the sample groups. Create a 3-D plot showing the expression of 3 genes.
- Create 3 vectors using the random-number functions for classic distributions (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html). Plot these vectors as histograms, cumulative distribution functions, and density functions.
- Pick a Bioconductor package and prepare 5-10 slides showing the other students how to use it, from installation to a final plot.
- Pick a plot from a recent publication and determine how to make that plot in R.
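For the random-distribution exercise, a minimal base R sketch might look like the following. It is one possible starting point, not the instructors' solution: the choice of normal, Poisson, and exponential distributions, the sample size of 1,000, and the parameter values are illustrative assumptions. The CDF and density panels are empirical estimates from the samples (`ecdf()` and `density()`), not the theoretical curves.

```r
# Sketch for the random-distribution exercise.
# Assumed choices: normal, Poisson, and exponential distributions, n = 1000 each.
set.seed(42)  # make the random draws reproducible

n <- 1000
samples <- list(
  Normal      = rnorm(n, mean = 0, sd = 1),
  Poisson     = rpois(n, lambda = 4),
  Exponential = rexp(n, rate = 1)
)

# One row per distribution: histogram, empirical CDF, kernel density estimate
op <- par(mfrow = c(length(samples), 3))
for (nm in names(samples)) {
  x <- samples[[nm]]
  hist(x, main = paste(nm, "histogram"), xlab = "value")
  plot(ecdf(x), main = paste(nm, "empirical CDF"), xlab = "value")
  plot(density(x), main = paste(nm, "density"), xlab = "value")
}
par(op)  # restore the previous graphics settings
```

To compare against theory, the matching `p*` and `d*` functions from the same help page can be overlaid, for example `curve(dnorm(x), add = TRUE)` right after the normal density panel is drawn.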