BICF Data Science for Biologists
Do you want to be able to do simple statistical analyses yourself? Do you find yourself spending time and effort generating the same plots and statistics for each project? R is a freely available language and programming environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, statistical tests, time series analysis, classification, and clustering.
If you already have RStudio, please update your R installation here
Here are datasets that you will need for this course:
- Course Coordinator Brandi Cantarel
- Course Administration Neha Sinha
- Mingzhu Nie (Jan 10)
- Erika Villa (Jan 10)
- Holly Ruess (Jan 17)
- Krishna Kanth Chitta (Jan 17)
- Jaideep Chaudhary (Jan 24 and Jan 31)
|Introduction to R and R Data Structures
|Data Importing and Cleaning with Tidyverse
|Data Manipulation and Data Joining with dplyr and tidyverse
|Introduction to Statistical Tests
|Correlations and Linear Regression
|Plotting with ggplot2 and plotly
|Loops and Looping functions with Apply, Scripting and Markdown
|R Package Repositories
|Accessing Public Data Through Bioconductor
Here is an opportunity to apply what you have learned to your own research! Students should present a question with some possible solutions to discuss as a class.
Here are some example questions:
- Pick a dataset from the class or from your own work. Plot 2 continuous variables and add a trend line. Create box plots using a continuous variable and a categorical variable. Add text to indicate the mean of each group on the plot (see the mtext function). Present the summary statistics for the comparison between a continuous variable and a categorical variable.
- Calculate logCPM from RNA-seq read count data and make a heatmap of a subset of genes (choose 10 or 20 genes). Choose 2 genes to make boxplots comparing expression across the sample groups. Create a 3-D plot to show the expression of 3 genes.
- Create 3 vectors using the random functions for classic distributions (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html). Plot each vector as a histogram, a cumulative distribution function, and a density function.
- Pick a package in Bioconductor and prepare 5-10 slides showing the other students in the class how to use it, from installation to a final plot.
- Pick a plot from a recent publication and determine how to make that plot in R.
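
As a starting point for the example question on classic distributions, here is a minimal sketch using only base R. It is one possible approach, not a model answer. The choice of distributions (normal, uniform, Poisson), the sample size, the seed, and the names `n` and `samples` are all assumptions made for illustration.

```r
# Minimal sketch: draw 3 random vectors and plot each one three ways.
# The seed, sample size, and distribution parameters are arbitrary choices.
set.seed(42)
n <- 1000

samples <- list(
  normal  = rnorm(n, mean = 0, sd = 1),
  uniform = runif(n, min = 0, max = 1),
  poisson = rpois(n, lambda = 4)
)

# One row per vector; columns: histogram, empirical CDF, kernel density estimate.
op <- par(mfrow = c(3, 3))
for (name in names(samples)) {
  x <- samples[[name]]
  hist(x, main = paste(name, "histogram"), xlab = name)
  plot(ecdf(x), main = paste(name, "empirical CDF"), xlab = name)
  plot(density(x), main = paste(name, "density estimate"), xlab = name)
}
par(op)  # restore the previous plotting layout
```

Note that `ecdf` and `density` are estimates computed from the sampled values, so they will only approximate the theoretical curves. For a discrete distribution such as the Poisson, the kernel density estimate is a smoothed picture, and a bar plot of counts may be a better fit.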