Objects in R

R Data Structures

R Variables are Objects

  • The variables in R are technically known as objects
    • Objects should have meaningful names
    • Try to avoid common function names such as mean and sqrt or else it gets confusing
  • Object names CANNOT start with a number
  • Object names CAN have “.” and numbers within them
  • Avoid “_”
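The naming rules above can be checked from within R. The following is a minimal sketch that uses the base function make.names(), which converts a string into a syntactically valid name; the candidate names are hypothetical examples, not taken from this notebook.

```r
# Candidate object names (hypothetical examples)
candidates <- c("sample.count", "x2", "2x", "mean value")

# make.names() returns a syntactically valid version of each name;
# a candidate is already valid when it comes back unchanged
fixed <- make.names(candidates)
is.valid <- candidates == fixed

# "2x" becomes "X2x" (names cannot start with a number)
# "mean value" becomes "mean.value" (names cannot contain spaces)
data.frame(candidates, fixed, is.valid)
```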
In [89]:
# Variables (objects) can be created by = or <- 
x <- 'abc'
x = 'abc'
x
'abc'
In [90]:
# Determine data type of a variable
typeof(x)
'character'
In [91]:
# Determine the number of elements in a variable
length(x)
1
In [92]:
# Determine the number of characters in a variable
nchar(x)
3

R Data Types

In [93]:
x <- 4
typeof(x)
y <- 4.8 

#Determine data types
typeof(y)
x+y
'double'
'double'
8.8
In [94]:
x <- 'a'
typeof(x)
y <- 'Hello There'
typeof(y)
# These data types allow you to know what sort of functions can be performed
# For example numbers can be added but characters can’t be
x+y
'character'
'character'
Error in x + y: non-numeric argument to binary operator
Traceback:
In [95]:
x <- TRUE
typeof(x)
y <- FALSE
typeof(y)
'logical'
'logical'
In [96]:
x <- 1+4i
typeof(x)
'complex'
In [97]:
x <- c(1:4)
typeof(x)
'integer'
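The cells above show one type per object. As a small sketch of how the types interact, the example below shows the L suffix for integers, the implicit coercion that happens when types are mixed in one vector, and explicit conversion with the as.* functions. The object name mixed is chosen only for illustration.

```r
# Whole numbers are stored as double unless the L suffix is used
typeof(4)     # "double"
typeof(4L)    # "integer"

# Mixing types in one vector coerces every element to the most general type
mixed <- c(1, "a", TRUE)
typeof(mixed) # "character"
mixed         # "1" "a" "TRUE"

# Explicit conversion with the as.* functions
as.numeric("4.8") + 1   # 5.8
as.integer(TRUE)        # 1
```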

Basic Arithmetic

In [98]:
# Numbers in R need not be given object names

20+3
20-3
20*3
23
17
60
In [99]:
20*3
20/3
20^3
60
6.66666666666667
8000
In [100]:
20 %% 3 #(remainder of the division)
2
In [101]:
20 %/% 3 #(integer quotient of the division)
6
In [102]:
z <- 3*3
sqrt(z)
3
In [103]:
x <- 20
y <- 3
x + y
x - y
x * y
x / y
x ^ y
x %% y  #(remainder of the division)
x %/% y #(integer quotient of the division)
23
17
60
6.66666666666667
8000
2
6
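The two division operators fit together: the integer quotient times the divisor, plus the remainder, gives back the original number. A short sketch using the same values as above (the names quotient and remainder are illustrative):

```r
x <- 20
y <- 3

quotient  <- x %/% y   # 6
remainder <- x %% y    # 2

# Reassemble the original value from the two parts
quotient * y + remainder == x   # TRUE
```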

Vectors and Matrices

A vector can be a “collection” of values or a single value

  • Atomic vector
    • a collection of values
  • Factors
    • special vectors that represent categorical data
  • Matrix
    • special vector with rows and columns
  • Data frame
    • a special data structure of rows and columns, the default structure for reading in “excel-like” files
  • List
    • a vector of different data types (including other vectors)
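To make the list above concrete, here is a minimal sketch that builds one object of each kind and reports its class. The object names and values are made up for illustration.

```r
v <- c(1.5, 2, 3.5)                    # atomic vector
f <- factor(c("WT", "KO", "WT"))       # factor
m <- matrix(1:6, nrow = 2)             # matrix
d <- data.frame(id = 1:3, score = v)   # data frame
l <- list(v, "text", m)                # list holding different types

# Report the first class of each object
# (a matrix reports more than one class in recent versions of R)
objects <- list(v = v, f = f, m = m, d = d, l = l)
classes <- sapply(objects, function(obj) class(obj)[1])
classes
```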
In [104]:
# Vectors
# An example of a numeric vector
x <- 1
x
y <- c(1:10) 
y
length(y)
typeof(y)
1
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
10
'integer'
In [105]:
# Vectors can be any datatype (character, logical, complex)
x <- c('a','b','c','d')
x
typeof(x)
  1. 'a'
  2. 'b'
  3. 'c'
  4. 'd'
'character'
In [106]:
# For Numerical Vectors you can do comparisons

x <- c(1:10)
x
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
In [107]:
x > 3
(x > 3) & (x < 8)
x[x > 3]
  1. FALSE
  2. FALSE
  3. FALSE
  4. TRUE
  5. TRUE
  6. TRUE
  7. TRUE
  8. TRUE
  9. TRUE
  10. TRUE
  1. FALSE
  2. FALSE
  3. FALSE
  4. TRUE
  5. TRUE
  6. TRUE
  7. TRUE
  8. FALSE
  9. FALSE
  10. FALSE
  1. 4
  2. 5
  3. 6
  4. 7
  5. 8
  6. 9
  7. 10
In [108]:
#Comparisons are logical objects

typeof((x > 3) & (x < 8))
'logical'
In [109]:
# Combining Vectors

x <- c(1:10)
x <- c(x,11:20)
x
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
In [110]:
#Performing Math on Vectors
x <- c(1:10)
x*3
  1. 3
  2. 6
  3. 9
  4. 12
  5. 15
  6. 18
  7. 21
  8. 24
  9. 27
  10. 30
In [111]:
x[4]
4

Factors Are Nominal Vectors

A factor stores the nominal values as a vector of integers in the range [1, ..., k], where k is the number of distinct values, together with an internal vector of character strings (the original values) mapped to these integers.
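The integer codes and the character labels can be inspected separately. This is a small sketch with a made-up genotype vector; f is a hypothetical name.

```r
f <- factor(c("WT", "KO", "WT", "KO"))

# The character strings, stored once each (alphabetical order by default)
levels(f)       # "KO" "WT"

# The integer codes that point into levels(f)
as.integer(f)   # 2 1 2 1

# Looking the codes up in the levels recovers the original values
recovered <- levels(f)[as.integer(f)]
recovered       # "WT" "KO" "WT" "KO"
```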

In [112]:
# Use straight quotes; curly quotes (“ ”) cause a parse error
genotype <- c(rep("WT",5),rep("KO",5))
factor(genotype)   # levels default to alphabetical order: KO, WT
In [113]:
# Set the order of the levels explicitly
genotype <- factor(genotype,levels=c("WT","KO"))
genotype
summary(genotype)  # counts per level: WT 5, KO 5
In [114]:
# An ordered factor supports comparisons such as min() and max()
genotype <- factor(genotype,levels=c("WT","KO"),ordered=TRUE)
min(genotype)      # WT, the lowest level

What is a Function

  • Built-in functions are operations available in R that one can “perform” on objects
  • User-defined functions are functions written by the user
  • Packages are collections of R functions written by the R community; a package must be loaded before its functions can be used
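As a sketch of the difference between the first two kinds, the example below calls two built-in functions and then defines a user-defined one. The function std.error is a hypothetical name introduced here for illustration; it is not part of base R.

```r
# Built-in functions: available as soon as R starts
values <- 1:20
mean(values)
sd(values)

# User-defined function: standard error of the mean
# (std.error is a made-up name for this example)
std.error <- function(x) {
  sd(x) / sqrt(length(x))
}

std.error(values)
```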

Getting Help with R functions

  • R Help
    • The help() function and ? help operator in R provide access to the documentation pages for R functions, data sets, and other objects, both for packages in the standard R distribution and for contributed packages.
    • help()
    • ?
  • To access documentation for the standard lm (linear model)
    • help(lm)
    • help("lm")
    • ?lm
    • ?"lm"
    • quotes are optional
  • To access help for a function in a package that’s not currently loaded, specify in addition the name of the package:
    • for the rlm() (robust linear model) function in the MASS package:
    • help(rlm, package="MASS")

Math Functions

In [115]:
# absolute value
x <- -2
abs(x)
2
In [116]:
# square root
x <- 4
sqrt(x)
2
In [117]:
#Rounding and Creating Integers
x <- 4.693959
n <- 3
ceiling(x)
floor(x)
trunc(x)
round(x, digits=n)
signif(x, digits=n)
5
4
4
4.694
4.69
In [118]:
# Trigonometry (arguments are in radians, so x below is 60 radians, not 60 degrees)
x <- 60
cos(x)
sin(x)
tan(x)
-0.952412980415156
-0.304810621102217
0.320040389379563
In [119]:
#Logarithms and Exponents
x <- 3.2
log(x)
log10(x)
exp(x)
1.16315080980568
0.505149978319906
24.5325301971094
In [120]:
# Divide a continuous variable into a factor with n levels
y <- cut(1:20, 20)
y
  1. (0.981,1.95]
  2. (1.95,2.9]
  3. (2.9,3.85]
  4. (3.85,4.8]
  5. (4.8,5.75]
  6. (5.75,6.7]
  7. (6.7,7.65]
  8. (7.65,8.6]
  9. (8.6,9.55]
  10. (9.55,10.5]
  11. (10.5,11.4]
  12. (11.4,12.4]
  13. (12.4,13.3]
  14. (13.3,14.3]
  15. (14.3,15.2]
  16. (15.2,16.2]
  17. (16.2,17.1]
  18. (17.1,18.1]
  19. (18.1,19.1]
  20. (19.1,20]
Levels:
  1. '(0.981,1.95]'
  2. '(1.95,2.9]'
  3. '(2.9,3.85]'
  4. '(3.85,4.8]'
  5. '(4.8,5.75]'
  6. '(5.75,6.7]'
  7. '(6.7,7.65]'
  8. '(7.65,8.6]'
  9. '(8.6,9.55]'
  10. '(9.55,10.5]'
  11. '(10.5,11.4]'
  12. '(11.4,12.4]'
  13. '(12.4,13.3]'
  14. '(13.3,14.3]'
  15. '(14.3,15.2]'
  16. '(15.2,16.2]'
  17. '(16.2,17.1]'
  18. '(17.1,18.1]'
  19. '(18.1,19.1]'
  20. '(19.1,20]'

String Manipulation

In [124]:
# generate a sequence
indices <- seq(1,10,2)

# Repeat a string

rep('a',3)
  1. 'a'
  2. 'a'
  3. 'a'
In [125]:
# Extract or replace substrings in a character vector.
#substr(x, start=n1, stop=n2)
x <- "abcdef" 
substr(x, 2, 4)
substr(x, 2, 4) <- "22222"

a <- "Hello"
substring(a,1,2)
substring(a,2,5)
'bcd'
'He'
'ello'
In [126]:
# Search for pattern in x. If fixed =FALSE then pattern is a regular expression. 
# If fixed=TRUE then pattern is a text string. Returns matching indices.
#grep(pattern, x , ignore.case=FALSE, fixed=FALSE)

grep("A", c("b","A","c"), fixed=TRUE)
2
In [127]:
# Find pattern in x and replace with replacement text. If fixed=FALSE then pattern is a regular expression.
# If fixed = T then pattern is a text string. 
#sub(pattern, replacement, x, ignore.case =FALSE, fixed=FALSE)

sub("\\s",".","Hello There")
'Hello.There'
In [128]:
#Split the elements of character vector x at split. 
#strsplit(x, split)

strsplit("abc", "")
    1. 'a'
    2. 'b'
    3. 'c'
In [129]:
# Concatenate strings, using the sep string to separate them.
#paste(..., sep="")

paste('x',1:3,sep="")
paste('x',1:3,sep="M")
paste('Today is', date())

a <- "Hello"
b <- 'How'
c <- "are you?"
print(paste(a,b,c))
print(paste(a,b,c, sep = "-"))
print(paste(a,b,c, sep = "", collapse = ""))
  1. 'x1'
  2. 'x2'
  3. 'x3'
  1. 'xM1'
  2. 'xM2'
  3. 'xM3'
'Today is Wed Jul 4 11:54:04 2018'
[1] "Hello How are you?"
[1] "Hello-How-are you?"
[1] "HelloHoware you?"
In [130]:
# Create Upper and Lowercase strings
x <- 'Hello There'
toupper(x)
tolower(x)
'HELLO THERE'
'hello there'
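One detail worth noting about sub() is that it replaces only the first match. The sketch below contrasts it with gsub(), which replaces every match; the input string is made up for illustration.

```r
s <- "Hello There, How Are You"

# sub() replaces only the first whitespace character
first.only <- sub("\\s", ".", s)
first.only   # "Hello.There, How Are You"

# gsub() replaces every whitespace character
all.matches <- gsub("\\s", ".", s)
all.matches  # "Hello.There,.How.Are.You"
```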

Built In Statistical Functions

Means, Medians, Ranges and Other Basic Functions on Number Sets

In [131]:
x <- 1:20
mean(x)
sd(x)
median(x)
min(x)
max(x)
10.5
5.91607978309962
10.5
1
20
In [132]:
quantile(x)
0%
1
25%
5.75
50%
10.5
75%
15.25
100%
20
In [133]:
range(x)
sum(x)
diff(x, lag=1)
y <- scale(x, center=TRUE, scale=TRUE)
plot(x,y)
  1. 1
  2. 20
210
  1. 1
  2. 1
  3. 1
  4. 1
  5. 1
  6. 1
  7. 1
  8. 1
  9. 1
  10. 1
  11. 1
  12. 1
  13. 1
  14. 1
  15. 1
  16. 1
  17. 1
  18. 1
  19. 1

Normal Distribution

  • dnorm(x)
  • pnorm(q)
  • qnorm(p)
  • rnorm(n, mean=0, sd=1)
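The four functions are related: pnorm() and qnorm() are inverses of each other, and rnorm() draws samples whose mean and standard deviation approach the requested values. A minimal sketch, with an arbitrary seed chosen only so the draws are reproducible:

```r
# pnorm() gives the area to the left of q; qnorm() maps it back
q <- 1.96
p <- pnorm(q)
round.trip <- qnorm(p)
round.trip             # 1.96

# Area within 1.96 standard deviations of the mean (about 0.95)
central <- pnorm(1.96) - pnorm(-1.96)
central

# Random draws; the seed value is arbitrary
set.seed(1)
draws <- rnorm(1000, mean = 50, sd = 10)
mean(draws)
sd(draws)
```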
In [134]:
# normal density function (by default mean=0, sd=1)
# plot standard normal curve
x <- pretty(c(-3,3), 30)
y <- dnorm(x)
plot(x, y, type='l', xlab="Normal Deviate", ylab="Density", yaxs="i")
In [135]:
#cumulative normal probability for q 
#(area under the normal curve to the left of q)
pnorm(1.96)
0.97500210485178
In [136]:
#normal quantile. 
#value at the p percentile of normal distribution 
qnorm(.9)
1.2815515655446
In [137]:
# n random normal deviates with the given mean and standard deviation sd.
# 50 random normal variates with mean=50, sd=10
x <- rnorm(50, mean=50, sd=10)
x
  1. 56.6593084122963
  2. 60.155317653088
  3. 55.9083150728
  4. 42.7956776967685
  5. 48.1538238204272
  6. 61.2491412601765
  7. 42.1120560973629
  8. 29.8359394706739
  9. 45.8338910253642
  10. 58.2448515583633
  11. 38.6109669417492
  12. 62.0962376168555
  13. 68.5203253007431
  14. 36.2190694151207
  15. 60.1039979053207
  16. 42.2388179502121
  17. 27.2779567858732
  18. 53.9431384260507
  19. 43.418241734822
  20. 49.5180042989959
  21. 49.904999082633
  22. 39.8515765295455
  23. 56.8724909716531
  24. 52.5112200161407
  25. 58.4144867894071
  26. 56.1534851022861
  27. 64.2174994728223
  28. 54.4695716892177
  29. 63.2337816079386
  30. 26.8765566849792
  31. 63.5395107191654
  32. 56.4947166745401
  33. 38.5082922785383
  34. 44.984078651159
  35. 51.0389344888757
  36. 51.7443141145394
  37. 50.3855085642358
  38. 62.4231746924192
  39. 53.415130743734
  40. 36.5536444124325
  41. 39.3893628531487
  42. 31.915579161233
  43. 59.171492260718
  44. 44.7482169887281
  45. 50.8543498764551
  46. 45.4870324232751
  47. 50.1850815937212
  48. 39.8624749665707
  49. 61.4501816873907
  50. 50.4804472120022

Binomial Distribution

Binomial distribution, where size is the number of trials and prob is the probability of success on each trial (for example, heads in a coin toss)

  • dbinom(x, size, prob)
  • pbinom(q, size, prob)
  • qbinom(p, size, prob)
  • rbinom(n, size, prob)
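As a quick check of how dbinom() and pbinom() relate, the sketch below uses the same size and prob as the plotting cell that follows. The variable names are illustrative.

```r
size <- 50
prob <- 0.2
k <- 0:size

# Probabilities of every possible number of successes sum to 1
probs <- dbinom(k, size, prob)
sum(probs)

# pbinom() is the running total of dbinom()
cumulative <- pbinom(10, size, prob)          # P(X <= 10)
summed     <- sum(dbinom(0:10, size, prob))   # same value

# The mean of the distribution is size * prob
binom.mean <- sum(k * probs)                  # 10
```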
In [138]:
x <- seq(0,50,by=1)
y <- dbinom(x,50,0.2)
plot(x,y)

Poisson Distribution

Poisson distribution, where the mean and the variance both equal lambda

  • dpois(x, lambda)
  • ppois(q, lambda)
  • qpois(p, lambda)
  • rpois(n, lambda)
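For a Poisson distribution the mean and the variance are both equal to the rate parameter. The sketch below checks this numerically from dpois(); the support is truncated at 100, which is an assumption that holds here because the remaining tail probability is negligible for a rate of 6.

```r
rate <- 6
k <- 0:100                      # truncated support (tail is negligible)
probs <- dpois(k, lambda = rate)

# Mean and variance computed directly from the probabilities
pois.mean <- sum(k * probs)
pois.var  <- sum((k - pois.mean)^2 * probs)

pois.mean   # 6
pois.var    # 6
```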
In [139]:
x <- 0:20
y <- dpois( x=0:20, lambda=6 )
plot(x, y, xlim=c(-2,20))

Uniform Distribution

  • dunif(x, min=0, max=1)
  • punif(q, min=0, max=1)
  • qunif(p, min=0, max=1)
  • runif(n, min=0, max=1)
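A brief sketch of the uniform functions, including the die-rolling idea used in the next cell. The seed is arbitrary, and floor() is used here in place of as.integer(); for positive values the two give the same result.

```r
# For the default range [0, 1], the cumulative probability equals q
punif(0.25)                    # 0.25

# Quantile on a custom range: 1 + 0.9 * (7 - 1)
qunif(0.9, min = 1, max = 7)   # 6.4

# Simulated die rolls: uniform draws on (1, 7) rounded down to 1..6
set.seed(42)                   # arbitrary seed
rolls <- floor(runif(10000, min = 1, max = 7))
table(rolls)
```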
In [140]:
numcases <- 10000
lo <- 1   # named lo/hi rather than min/max to avoid masking the built-in functions
hi <- 6
x <- as.integer(runif(numcases,lo,hi+1))
hist(x,main=paste(numcases,"rolls of a single die"),breaks=seq(lo-.5,hi+.5,1))

Matrix

special vector with rows and columns

  • All columns in a matrix must have the same mode (numeric, character, etc.) and the same length. The general format is:
    • mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE)
  • byrow=TRUE indicates that the matrix should be filled by rows.
  • byrow=FALSE indicates that the matrix should be filled by columns (the default)
In [141]:
y <- matrix(1:20, nrow=5,ncol=4)
y 
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2") 
x <- matrix(cells, nrow=2, ncol=2, byrow=TRUE, dimnames=list(rnames, cnames))
x
1  6 11 16
2  7 12 17
3  8 13 18
4  9 14 19
5 10 15 20
   C1 C2
R1  1 26
R2 24 68
In [142]:
x[,1] # 1st column of matrix
x[2,] # 2nd row of matrix 
x[2,1] # row 2, column 1
x[1,2]
R1
1
R2
24
C1
24
C2
68
24
26

Functions to Combine and Calculate Statistics on Matrices

  • cbind(A,B,...)
    • Combine matrices(vectors) horizontally. Returns a matrix.
  • rbind(A,B,...)
    • Combine matrices(vectors) vertically. Returns a matrix.
  • rowMeans(A)
    • Returns vector of row means.
  • rowSums(A)
    • Returns vector of row sums.
  • colMeans(A)
    • Returns vector of column means.
  • colSums(A)
    • Returns vector of column sums.
  • t(A)
    • Transpose
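The functions listed above can be tried on two small matrices. This is a minimal sketch; A and B are made-up 2 x 2 matrices.

```r
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)

wide <- cbind(A, B)          # side by side: 2 rows, 4 columns
tall <- rbind(A, B)          # stacked: 4 rows, 2 columns
dim(wide)
dim(tall)

rowSums(A)                   # 4 6
colMeans(A)                  # 1.5 3.5
t(A)                         # rows and columns swapped
```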
In [143]:
y <- matrix(1:20, nrow=5,ncol=4,byrow = FALSE)
y
y <- matrix(1:20, nrow=5,ncol=4,byrow = TRUE)
y
t(y) #transpose
rowSums(y)
colMeans(y)
1  6 11 16
2  7 12 17
3  8 13 18
4  9 14 19
5 10 15 20

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
17 18 19 20

1 5  9 13 17
2 6 10 14 18
3 7 11 15 19
4 8 12 16 20
  1. 10
  2. 26
  3. 42
  4. 58
  5. 74
  1. 9
  2. 10
  3. 11
  4. 12
In [144]:
y*4
y*y
y/y
 4  8 12 16
20 24 28 32
36 40 44 48
52 56 60 64
68 72 76 80

  1   4   9  16
 25  36  49  64
 81 100 121 144
169 196 225 256
289 324 361 400

1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
In [145]:
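# Two small example matrices (A and B are not defined earlier in the notebook)
A <- matrix(1:4, nrow=2)
B <- matrix(5:8, nrow=2)

# Matrix multiplication (element-wise multiplication is A * B)
A %*% B

# Outer product of A and B

A %o% B

# A'B and A'A respectively.

crossprod(A,B)
crossprod(A)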
In [146]:
# diag(x) on a vector x creates a diagonal matrix with the elements of x on the principal diagonal
# diag(y) on a matrix y returns a vector containing the elements of the principal diagonal

diag(y)

#If k is a scalar, this creates a k x k identity matrix. Go figure.
diag(4)
  1. 1
  2. 6
  3. 11
  4. 16
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

Data Frames

  • A special data structure of rows and columns, the default structure for reading in “excel-like” files
  • A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.
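The difference from a matrix can be seen by checking the class of each column. The sketch below builds a small data frame with made-up values; stringsAsFactors = FALSE is set explicitly because the default differs between R versions.

```r
df <- data.frame(
  ID     = 1:3,
  Color  = c("red", "white", NA),
  Passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Each column keeps its own type
col.classes <- sapply(df, class)
col.classes   # ID "integer", Color "character", Passed "logical"

# Converting to a matrix forces a single type for every cell
m <- as.matrix(df)
typeof(m)     # "character"
```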
In [147]:
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
x <- data.frame(d,e,f)
names(x) <- c("ID","Color","Passed")
x
ID Color Passed
 1 red   TRUE
 2 white TRUE
 3 red   TRUE
 4 NA    FALSE
In [148]:
setwd("~/Desktop/")
tbl <- read.csv(file="sample_example_R1_data_structures.csv",header=TRUE)
tbl
SampleID Tissue SampleGroup SubjectID Organism Race SampleName Gender FullPathToFqR1 FullPathToFqR2
SRR1551074 Whole.Blood whole_blood 53 Homo sapiens White 53_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551074_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551074_2.fastq.gz
SRR1551073 Whole.Blood NK 53 Homo sapiens White 53_NK female /project/BICF/s166458/pipeline_devel/SRR1551073_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551073_2.fastq.gz
SRR1551072 Whole.Blood CD8 53 Homo sapiens White 53_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551072_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551072_2.fastq.gz
SRR1551071 Whole.Blood CD4 53 Homo sapiens White 53_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551071_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551071_2.fastq.gz
SRR1551070 Whole.Blood B-Cells 53 Homo sapiens White 53_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551070_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551070_2.fastq.gz
SRR1551069 Whole.Blood monocytes 53 Homo sapiens White 53_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551069_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551069_2.fastq.gz
SRR1551068 Whole.Blood neutrophils 53 Homo sapiens White 53_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551068_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551068_2.fastq.gz
SRR1551060 Whole.Blood whole_blood 21 Homo sapiens White 21_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551060_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551060_2.fastq.gz
SRR1551059 Whole.Blood NK 21 Homo sapiens White 21_NK female /project/BICF/s166458/pipeline_devel/SRR1551059_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551059_2.fastq.gz
SRR1551058 Whole.Blood CD8 21 Homo sapiens White 21_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551058_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551058_2.fastq.gz
SRR1551057 Whole.Blood CD4 21 Homo sapiens White 21_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551057_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551057_2.fastq.gz
SRR1551056 Whole.Blood B-Cells 21 Homo sapiens White 21_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551056_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551056_2.fastq.gz
SRR1551055 Whole.Blood monocytes 21 Homo sapiens White 21_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551055_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551055_2.fastq.gz
SRR1551054 Whole.Blood neutrophils 21 Homo sapiens White 21_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551054_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551054_2.fastq.gz
SRR1551053 Whole.Blood whole_blood 20 Homo sapiens White 20_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551053_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551053_2.fastq.gz
SRR1551052 Whole.Blood NK 20 Homo sapiens White 20_NK female /project/BICF/s166458/pipeline_devel/SRR1551052_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551052_2.fastq.gz
SRR1551051 Whole.Blood CD8 20 Homo sapiens White 20_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551051_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551051_2.fastq.gz
SRR1551050 Whole.Blood CD4 20 Homo sapiens White 20_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551050_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551050_2.fastq.gz
SRR1551049 Whole.Blood B-Cells 20 Homo sapiens White 20_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551049_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551049_2.fastq.gz
SRR1551048 Whole.Blood monocytes 20 Homo sapiens White 20_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551048_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551048_2.fastq.gz
SRR1551047 Whole.Blood neutrophils 20 Homo sapiens White 20_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551047_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551047_2.fastq.gz
SRR1550991 Whole.Blood NK 44 Homo sapiens Hispanic 44_NK female /project/BICF/s166458/pipeline_devel/SRR1550991_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550991_2.fastq.gz
SRR1550990 Whole.Blood CD8 44 Homo sapiens Hispanic 44_CD8T female /project/BICF/s166458/pipeline_devel/SRR1550990_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550990_2.fastq.gz
SRR1550989 Whole.Blood CD4 44 Homo sapiens Hispanic 44_CD4T female /project/BICF/s166458/pipeline_devel/SRR1550989_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550989_2.fastq.gz
SRR1550988 Whole.Blood B-Cells 44 Homo sapiens Hispanic 44_Bcells female /project/BICF/s166458/pipeline_devel/SRR1550988_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550988_2.fastq.gz
SRR1550987 Whole.Blood monocytes 44 Homo sapiens Hispanic 44_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1550987_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550987_2.fastq.gz
SRR1550986 Whole.Blood neutrophils 44 Homo sapiens Hispanic 44_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1550986_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550986_2.fastq.gz
SRR1550981 Whole.Blood whole_blood 44 Homo sapiens Hispanic 44_Tempus female /project/BICF/s166458/pipeline_devel/SRR1550981_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550981_2.fastq.gz
In [149]:
tbl[3:5]
SampleGroup SubjectID Organism
whole_blood 53 Homo sapiens
NK 53 Homo sapiens
CD8 53 Homo sapiens
CD4 53 Homo sapiens
B-Cells 53 Homo sapiens
monocytes 53 Homo sapiens
neutrophils 53 Homo sapiens
whole_blood 21 Homo sapiens
NK 21 Homo sapiens
CD8 21 Homo sapiens
CD4 21 Homo sapiens
B-Cells 21 Homo sapiens
monocytes 21 Homo sapiens
neutrophils 21 Homo sapiens
whole_blood 20 Homo sapiens
NK 20 Homo sapiens
CD8 20 Homo sapiens
CD4 20 Homo sapiens
B-Cells 20 Homo sapiens
monocytes 20 Homo sapiens
neutrophils 20 Homo sapiens
NK 44 Homo sapiens
CD8 44 Homo sapiens
CD4 44 Homo sapiens
B-Cells 44 Homo sapiens
monocytes 44 Homo sapiens
neutrophils 44 Homo sapiens
whole_blood 44 Homo sapiens
In [150]:
tbl[c("SampleID","Tissue")]
SampleID Tissue
SRR1551074 Whole.Blood
SRR1551073 Whole.Blood
SRR1551072 Whole.Blood
SRR1551071 Whole.Blood
SRR1551070 Whole.Blood
SRR1551069 Whole.Blood
SRR1551068 Whole.Blood
SRR1551060 Whole.Blood
SRR1551059 Whole.Blood
SRR1551058 Whole.Blood
SRR1551057 Whole.Blood
SRR1551056 Whole.Blood
SRR1551055 Whole.Blood
SRR1551054 Whole.Blood
SRR1551053 Whole.Blood
SRR1551052 Whole.Blood
SRR1551051 Whole.Blood
SRR1551050 Whole.Blood
SRR1551049 Whole.Blood
SRR1551048 Whole.Blood
SRR1551047 Whole.Blood
SRR1550991 Whole.Blood
SRR1550990 Whole.Blood
SRR1550989 Whole.Blood
SRR1550988 Whole.Blood
SRR1550987 Whole.Blood
SRR1550986 Whole.Blood
SRR1550981 Whole.Blood
In [151]:
tbl$Gender
  1. female
  2. female
  3. female
  4. female
  5. female
  6. female
  7. female
  8. female
  9. female
  10. female
  11. female
  12. female
  13. female
  14. female
  15. female
  16. female
  17. female
  18. female
  19. female
  20. female
  21. female
  22. female
  23. female
  24. female
  25. female
  26. female
  27. female
  28. female
Levels: 'female'
In [152]:
tbl[tbl$SampleGroup == 'monocytes',]
SampleID Tissue SampleGroup SubjectID Organism Race SampleName Gender FullPathToFqR1 FullPathToFqR2
6 SRR1551069 Whole.Blood monocytes 53 Homo sapiens White 53_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551069_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551069_2.fastq.gz
13 SRR1551055 Whole.Blood monocytes 21 Homo sapiens White 21_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551055_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551055_2.fastq.gz
20 SRR1551048 Whole.Blood monocytes 20 Homo sapiens White 20_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551048_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551048_2.fastq.gz
26 SRR1550987 Whole.Blood monocytes 44 Homo sapiens Hispanic 44_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1550987_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550987_2.fastq.gz
In [153]:
subset(x=tbl,SampleGroup == 'monocytes',select=c('Tissue','SampleID'))
   Tissue      SampleID
 6 Whole.Blood SRR1551069
13 Whole.Blood SRR1551055
20 Whole.Blood SRR1551048
26 Whole.Blood SRR1550987
In [154]:
tbl1 <- read.csv(file="sample_example_R1_data_structures.csv",header=TRUE)
tbl2 <- read.csv(file="table2.csv",header=TRUE)
tbl2
SampleID SubjectID BMI
SRR1551074 53 23
SRR1551073 53 23
SRR1551072 53 23
SRR1551071 53 23
SRR1551070 53 23
SRR1551069 53 23
SRR1551068 53 23
SRR1551060 21 28
SRR1551059 21 28
SRR1551058 21 28
SRR1551057 21 28
SRR1551056 21 28
SRR1551055 21 28
SRR1551054 21 28
SRR1551053 20 35
SRR1551052 20 35
SRR1551051 20 35
SRR1551050 20 35
SRR1551049 20 35
SRR1551048 20 35
SRR1551047 20 35
SRR1550991 44 40
SRR1550990 44 40
SRR1550989 44 40
SRR1550988 44 40
SRR1550987 44 40
SRR1550986 44 40
SRR1550981 44 40

Functions to Inspect Data Frames

  • gives a very brief description of the data
    • str(df)
  • gives the name of each variable
    • names(df)
  • gives some very basic summary statistics for each variable
    • summary(df)
  • shows the first few rows
    • head(df)
  • shows the last few rows.
    • tail(df)
  • looks at duplicated elements and returns a logical vector. You can use table() to summarize this vector.
    • duplicated()
  • keeps only the unique lines in a dataset
    • unique()
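The functions in the list above can be tried on a small data frame. This is a minimal sketch with made-up sample IDs and BMI values, including one deliberately duplicated row.

```r
df <- data.frame(
  SampleID = c("S1", "S2", "S2", "S3"),
  BMI      = c(23, 28, 28, 35),
  stringsAsFactors = FALSE
)

str(df)        # compact description: dimensions and column types
names(df)      # column names
summary(df)    # basic summary statistics per column
head(df, 2)    # first two rows
tail(df, 2)    # last two rows

# Flag rows that repeat an earlier row, then summarize the flags
dup <- duplicated(df)
dup            # FALSE FALSE TRUE FALSE
table(dup)

# Keep only the distinct rows
distinct <- unique(df)
nrow(distinct) # 3
```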
In [155]:
merge.tbl <- merge(tbl1,tbl2,by='SampleID')
merge.tbl
SampleID Tissue SampleGroup SubjectID.x Organism Race SampleName Gender FullPathToFqR1 FullPathToFqR2 SubjectID.y BMI
SRR1550981 Whole.Blood whole_blood 44 Homo sapiens Hispanic 44_Tempus female /project/BICF/s166458/pipeline_devel/SRR1550981_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550981_2.fastq.gz 44 40
SRR1550986 Whole.Blood neutrophils 44 Homo sapiens Hispanic 44_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1550986_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550986_2.fastq.gz 44 40
SRR1550987 Whole.Blood monocytes 44 Homo sapiens Hispanic 44_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1550987_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550987_2.fastq.gz 44 40
SRR1550988 Whole.Blood B-Cells 44 Homo sapiens Hispanic 44_Bcells female /project/BICF/s166458/pipeline_devel/SRR1550988_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550988_2.fastq.gz 44 40
SRR1550989 Whole.Blood CD4 44 Homo sapiens Hispanic 44_CD4T female /project/BICF/s166458/pipeline_devel/SRR1550989_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550989_2.fastq.gz 44 40
SRR1550990 Whole.Blood CD8 44 Homo sapiens Hispanic 44_CD8T female /project/BICF/s166458/pipeline_devel/SRR1550990_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550990_2.fastq.gz 44 40
SRR1550991 Whole.Blood NK 44 Homo sapiens Hispanic 44_NK female /project/BICF/s166458/pipeline_devel/SRR1550991_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550991_2.fastq.gz 44 40
SRR1551047 Whole.Blood neutrophils 20 Homo sapiens White 20_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551047_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551047_2.fastq.gz 20 35
SRR1551048 Whole.Blood monocytes 20 Homo sapiens White 20_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551048_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551048_2.fastq.gz 20 35
SRR1551049 Whole.Blood B-Cells 20 Homo sapiens White 20_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551049_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551049_2.fastq.gz 20 35
SRR1551050 Whole.Blood CD4 20 Homo sapiens White 20_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551050_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551050_2.fastq.gz 20 35
SRR1551051 Whole.Blood CD8 20 Homo sapiens White 20_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551051_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551051_2.fastq.gz 20 35
SRR1551052 Whole.Blood NK 20 Homo sapiens White 20_NK female /project/BICF/s166458/pipeline_devel/SRR1551052_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551052_2.fastq.gz 20 35
SRR1551053 Whole.Blood whole_blood 20 Homo sapiens White 20_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551053_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551053_2.fastq.gz 20 35
SRR1551054 Whole.Blood neutrophils 21 Homo sapiens White 21_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551054_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551054_2.fastq.gz 21 28
SRR1551055 Whole.Blood monocytes 21 Homo sapiens White 21_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551055_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551055_2.fastq.gz 21 28
SRR1551056 Whole.Blood B-Cells 21 Homo sapiens White 21_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551056_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551056_2.fastq.gz 21 28
SRR1551057 Whole.Blood CD4 21 Homo sapiens White 21_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551057_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551057_2.fastq.gz 21 28
SRR1551058 Whole.Blood CD8 21 Homo sapiens White 21_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551058_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551058_2.fastq.gz 21 28
SRR1551059 Whole.Blood NK 21 Homo sapiens White 21_NK female /project/BICF/s166458/pipeline_devel/SRR1551059_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551059_2.fastq.gz 21 28
SRR1551060 Whole.Blood whole_blood 21 Homo sapiens White 21_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551060_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551060_2.fastq.gz 21 28
SRR1551068 Whole.Blood neutrophils 53 Homo sapiens White 53_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551068_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551068_2.fastq.gz 53 23
SRR1551069 Whole.Blood monocytes 53 Homo sapiens White 53_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551069_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551069_2.fastq.gz 53 23
SRR1551070 Whole.Blood B-Cells 53 Homo sapiens White 53_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551070_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551070_2.fastq.gz 53 23
SRR1551071 Whole.Blood CD4 53 Homo sapiens White 53_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551071_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551071_2.fastq.gz 53 23
SRR1551072 Whole.Blood CD8 53 Homo sapiens White 53_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551072_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551072_2.fastq.gz 53 23
SRR1551073 Whole.Blood NK 53 Homo sapiens White 53_NK female /project/BICF/s166458/pipeline_devel/SRR1551073_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551073_2.fastq.gz 53 23
SRR1551074 Whole.Blood whole_blood 53 Homo sapiens White 53_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551074_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551074_2.fastq.gz 53 23
In [88]:
help('merge')
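The .x and .y suffixes in the merged table appear because SubjectID exists in both tables but was not part of the join key. The sketch below reproduces this on two small made-up data frames (left and right are hypothetical names) and shows two variations: joining on both shared columns, and keeping unmatched rows with all = TRUE.

```r
left <- data.frame(
  SampleID  = c("S1", "S2", "S3"),
  SubjectID = c(53, 21, 20),
  stringsAsFactors = FALSE
)
right <- data.frame(
  SampleID  = c("S1", "S2", "S4"),
  SubjectID = c(53, 21, 44),
  BMI       = c(23, 28, 40),
  stringsAsFactors = FALSE
)

# Join on SampleID only: the shared SubjectID column is kept twice
inner <- merge(left, right, by = "SampleID")
names(inner)   # "SampleID" "SubjectID.x" "SubjectID.y" "BMI"

# Join on both shared columns: no suffixes are needed
both <- merge(left, right, by = c("SampleID", "SubjectID"))
names(both)    # "SampleID" "SubjectID" "BMI"

# Keep rows without a match in the other table (filled with NA)
outer <- merge(left, right, by = "SampleID", all = TRUE)
nrow(outer)    # 4
```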

dplyr

dplyr is a package for data manipulation that uses intuitive commands

  • add new variables that are functions of existing variables
    • mutate()
  • pick variables based on their names.
    • select() 
  • pick cases based on their values
    • filter()
  • reduce multiple values down to a single summary
    • summarise()
  • change the ordering of the rows
    • arrange()
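The verbs listed above can be sketched on a small data frame. This example assumes the dplyr package is installed; the data frame samples and its columns WeightKg and HeightM are made up for illustration and do not come from the CSV files used in this notebook.

```r
library(dplyr)

samples <- data.frame(
  SampleID    = c("S1", "S2", "S3", "S4"),
  SampleGroup = c("monocytes", "NK", "monocytes", "NK"),
  WeightKg    = c(70, 82, 65, 90),
  HeightM     = c(1.75, 1.80, 1.60, 1.85),
  stringsAsFactors = FALSE
)

# mutate(): add a new column computed from existing columns
with.bmi <- mutate(samples, BMI = WeightKg / HeightM^2)

# select(): keep columns by name
id.bmi <- select(with.bmi, SampleID, BMI)

# arrange(): reorder rows, here from highest to lowest BMI
by.bmi <- arrange(with.bmi, desc(BMI))

# summarise(): reduce a column to a single value
overall <- summarise(with.bmi, MeanBMI = mean(BMI))

id.bmi
by.bmi
overall
```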
In [156]:
library(dplyr) 
filter(tbl1,SampleGroup=='monocytes')
filter(tbl1,SampleGroup=='monocytes',SubjectID==53)
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Warning message:
“package ‘bindrcpp’ was built under R version 3.4.4”
SampleID Tissue SampleGroup SubjectID Organism Race SampleName Gender FullPathToFqR1 FullPathToFqR2
SRR1551069 Whole.Blood monocytes 53 Homo sapiens White 53_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551069_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551069_2.fastq.gz
SRR1551055 Whole.Blood monocytes 21 Homo sapiens White 21_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551055_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551055_2.fastq.gz
SRR1551048 Whole.Blood monocytes 20 Homo sapiens White 20_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551048_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551048_2.fastq.gz
SRR1550987 Whole.Blood monocytes 44 Homo sapiens Hispanic 44_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1550987_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550987_2.fastq.gz
SampleID Tissue SampleGroup SubjectID Organism Race SampleName Gender FullPathToFqR1 FullPathToFqR2
SRR1551069 Whole.Blood monocytes 53 Homo sapiens White 53_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551069_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551069_2.fastq.gz
In [157]:
arrange(tbl1,SampleGroup,SubjectID)
SampleID Tissue SampleGroup SubjectID Organism Race SampleName Gender FullPathToFqR1 FullPathToFqR2
SRR1551049 Whole.Blood B-Cells 20 Homo sapiens White 20_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551049_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551049_2.fastq.gz
SRR1551056 Whole.Blood B-Cells 21 Homo sapiens White 21_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551056_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551056_2.fastq.gz
SRR1550988 Whole.Blood B-Cells 44 Homo sapiens Hispanic 44_Bcells female /project/BICF/s166458/pipeline_devel/SRR1550988_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550988_2.fastq.gz
SRR1551070 Whole.Blood B-Cells 53 Homo sapiens White 53_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551070_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551070_2.fastq.gz
SRR1551050 Whole.Blood CD4 20 Homo sapiens White 20_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551050_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551050_2.fastq.gz
SRR1551057 Whole.Blood CD4 21 Homo sapiens White 21_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551057_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551057_2.fastq.gz
SRR1550989 Whole.Blood CD4 44 Homo sapiens Hispanic 44_CD4T female /project/BICF/s166458/pipeline_devel/SRR1550989_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550989_2.fastq.gz
SRR1551071 Whole.Blood CD4 53 Homo sapiens White 53_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551071_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551071_2.fastq.gz
SRR1551051 Whole.Blood CD8 20 Homo sapiens White 20_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551051_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551051_2.fastq.gz
SRR1551058 Whole.Blood CD8 21 Homo sapiens White 21_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551058_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551058_2.fastq.gz
SRR1550990 Whole.Blood CD8 44 Homo sapiens Hispanic 44_CD8T female /project/BICF/s166458/pipeline_devel/SRR1550990_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550990_2.fastq.gz
SRR1551072 Whole.Blood CD8 53 Homo sapiens White 53_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551072_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551072_2.fastq.gz
SRR1551048 Whole.Blood monocytes 20 Homo sapiens White 20_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551048_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551048_2.fastq.gz
SRR1551055 Whole.Blood monocytes 21 Homo sapiens White 21_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551055_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551055_2.fastq.gz
SRR1550987 Whole.Blood monocytes 44 Homo sapiens Hispanic 44_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1550987_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550987_2.fastq.gz
SRR1551069 Whole.Blood monocytes 53 Homo sapiens White 53_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551069_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551069_2.fastq.gz
SRR1551047 Whole.Blood neutrophils 20 Homo sapiens White 20_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551047_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551047_2.fastq.gz
SRR1551054 Whole.Blood neutrophils 21 Homo sapiens White 21_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551054_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551054_2.fastq.gz
SRR1550986 Whole.Blood neutrophils 44 Homo sapiens Hispanic 44_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1550986_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550986_2.fastq.gz
SRR1551068 Whole.Blood neutrophils 53 Homo sapiens White 53_Neutrophils female /project/BICF/s166458/pipeline_devel/SRR1551068_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551068_2.fastq.gz
SRR1551052 Whole.Blood NK 20 Homo sapiens White 20_NK female /project/BICF/s166458/pipeline_devel/SRR1551052_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551052_2.fastq.gz
SRR1551059 Whole.Blood NK 21 Homo sapiens White 21_NK female /project/BICF/s166458/pipeline_devel/SRR1551059_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551059_2.fastq.gz
SRR1550991 Whole.Blood NK 44 Homo sapiens Hispanic 44_NK female /project/BICF/s166458/pipeline_devel/SRR1550991_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550991_2.fastq.gz
SRR1551073 Whole.Blood NK 53 Homo sapiens White 53_NK female /project/BICF/s166458/pipeline_devel/SRR1551073_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551073_2.fastq.gz
SRR1551053 Whole.Blood whole_blood 20 Homo sapiens White 20_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551053_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551053_2.fastq.gz
SRR1551060 Whole.Blood whole_blood 21 Homo sapiens White 21_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551060_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551060_2.fastq.gz
SRR1550981 Whole.Blood whole_blood 44 Homo sapiens Hispanic 44_Tempus female /project/BICF/s166458/pipeline_devel/SRR1550981_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1550981_2.fastq.gz
SRR1551074 Whole.Blood whole_blood 53 Homo sapiens White 53_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551074_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551074_2.fastq.gz
In [158]:
select(tbl1,SampleGroup,SubjectID)
SampleGroup SubjectID
whole_blood 53
NK 53
CD8 53
CD4 53
B-Cells 53
monocytes 53
neutrophils 53
whole_blood 21
NK 21
CD8 21
CD4 21
B-Cells 21
monocytes 21
neutrophils 21
whole_blood 20
NK 20
CD8 20
CD4 20
B-Cells 20
monocytes 20
neutrophils 20
NK 44
CD8 44
CD4 44
B-Cells 44
monocytes 44
neutrophils 44
whole_blood 44
In [159]:
head(tbl1)
SampleID Tissue SampleGroup SubjectID Organism Race SampleName Gender FullPathToFqR1 FullPathToFqR2
SRR1551074 Whole.Blood whole_blood 53 Homo sapiens White 53_Tempus female /project/BICF/s166458/pipeline_devel/SRR1551074_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551074_2.fastq.gz
SRR1551073 Whole.Blood NK 53 Homo sapiens White 53_NK female /project/BICF/s166458/pipeline_devel/SRR1551073_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551073_2.fastq.gz
SRR1551072 Whole.Blood CD8 53 Homo sapiens White 53_CD8T female /project/BICF/s166458/pipeline_devel/SRR1551072_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551072_2.fastq.gz
SRR1551071 Whole.Blood CD4 53 Homo sapiens White 53_CD4T female /project/BICF/s166458/pipeline_devel/SRR1551071_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551071_2.fastq.gz
SRR1551070 Whole.Blood B-Cells 53 Homo sapiens White 53_Bcells female /project/BICF/s166458/pipeline_devel/SRR1551070_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551070_2.fastq.gz
SRR1551069 Whole.Blood monocytes 53 Homo sapiens White 53_Monocytes female /project/BICF/s166458/pipeline_devel/SRR1551069_1.fastq.gz /project/BICF/s166458/pipeline_devel/SRR1551069_2.fastq.gz
In [163]:
tbl <- read.csv(file='heightweight.csv')
In [164]:
head(tbl)
sex ageYear ageMonth heightIn weightLb
f 11.91667 143 56.3 85.0
f 12.91667 155 62.3 105.0
f 12.75000 153 63.3 108.0
f 13.41667 161 59.0 92.0
f 15.91667 191 62.5 112.5
f 14.25000 171 62.5 112.0
In [165]:
head(mutate(tbl,weightKg=weightLb/2.2))
sex ageYear ageMonth heightIn weightLb weightKg
f 11.91667 143 56.3 85.0 38.63636
f 12.91667 155 62.3 105.0 47.72727
f 12.75000 153 63.3 108.0 49.09091
f 13.41667 161 59.0 92.0 41.81818
f 15.91667 191 62.5 112.5 51.13636
f 14.25000 171 62.5 112.0 50.90909
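
The cell above divides by 2.2, a common rounding of the pounds-per-kilogram factor. A minimal base R sketch of the same idea follows, using the exact definition 1 lb = 0.45359237 kg. The data frame `toy` and its values are made up for illustration and are not taken from heightweight.csv; `transform()` is used here as a base R stand-in for `mutate()`.

```r
# Hypothetical toy data; not the heightweight.csv file used above
toy <- data.frame(sex = c("f", "m"), weightLb = c(85.0, 110.0))

# transform() returns a copy of the data frame with the new column added,
# leaving the original object unchanged (like mutate())
toy2 <- transform(toy, weightKg = weightLb * 0.45359237)
toy2

# Largest difference between the exact factor and the rounded 2.2 divisor
approx_kg <- toy$weightLb / 2.2
max(abs(toy2$weightKg - approx_kg))
```

As with `mutate()`, the result has to be assigned to a name if the new column should be kept.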
In [166]:
summarize(tbl,mean.height=mean(heightIn))
summarize(group_by(tbl,sex),mean.height=mean(heightIn))
summarize(group_by(tbl,sex),mean.height=mean(heightIn),mean.weight=mean(weightLb))
mean.height
61.36456
sex mean.height
f 60.52613
m 62.10317
sex mean.height mean.weight
f 60.52613 98.87838
m 62.10317 103.44841
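
One way to cross-check grouped summaries like these is to compute them in base R. The sketch below uses `aggregate()` on a small made-up data frame (`toy` and its values are hypothetical, standing in for heightweight.csv); it is an alternative route to the same kind of result, not what the cells above run.

```r
# Hypothetical toy data standing in for heightweight.csv
toy <- data.frame(
  sex      = c("f", "f", "m", "m"),
  heightIn = c(60, 62, 61, 65),
  weightLb = c(90, 100, 95, 115)
)

# Overall mean, comparable to summarize(tbl, mean.height = mean(heightIn))
overall <- mean(toy$heightIn)
overall

# Per-group means of both columns, comparable to
# summarize(group_by(tbl, sex), mean.height = ..., mean.weight = ...)
by_sex <- aggregate(cbind(heightIn, weightLb) ~ sex, data = toy, FUN = mean)
by_sex
```

The result is one row per value of `sex`, with the mean of each summarized column.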

Lists

A list is an ordered collection of objects (components). It lets you gather a variety of (possibly unrelated) objects under one name.

In [ ]:
# a and y must already exist in the workspace (for example a vector and a matrix)
w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)
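
A short sketch of how such a list can be built and its components retrieved. The values given to `a` and `y` here are assumptions for illustration only, so that the block runs on its own.

```r
# a and y are assumed here for illustration; the notebook defines its own
a <- c(1, 2, 3)
y <- matrix(1:4, nrow = 2)

w <- list(name = "Fred", mynumbers = a, mymatrix = y, age = 5.3)

names(w)          # component names
w$name            # access a component by name
w[["mynumbers"]]  # same idea with double brackets
w[[3]][1, 2]      # row 1, column 2 of the matrix component
length(w)         # number of components
str(w)            # compact overview of the whole list
```

Note that single brackets, as in `w[1]`, return a list containing the component, while `w[[1]]` and `w$name` return the component itself.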

Workspace Functions

  • list the objects in your workspace
    • ls()
  • remove objects from your workspace
    • rm(object1,object2)
  • remove all objects from your workspace
    • rm(list=ls())
  • save R objects to a file
    • save(object1,object2,file="file.RData")
  • load R objects from a file
    • load("file.RData")
  • find the current working directory
    • getwd()
  • set the working directory
    • setwd('C:/workingDirectory')
  • quit R
    • quit()
  • list all packages available to load
    • library()
  • load package
    • library(package)
    • require(package)
  • install packages