HELLO! This file analyzes an ELISA dataset retrieved from the gtools package in R.

Setup

R has different ways of keeping a record of code and analysis, such as Shiny or R Markdown; this project is recorded using R Markdown. The analysis is split into code chunks that reference main.R, and each chunk has a brief explanation of what it does .. just like this one! Calling source() lets me pull code from a separate file called main.R. The knitr package neatly incorporates and formats code into the text, and can be used to specify settings for all code chunks or just a specific one. This file can be rendered to several document formats, such as Word, PowerPoint, or web pages. For this example, I have decided to report the code in HTML.

source("main.R") # load all our wonderful functions
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
knitr::opts_chunk$set(echo = TRUE, fig.height = 10) 

Load the data using load_data() and store it in a variable

The original data was in .rda format; .rda files allow users to save R data structures such as vectors, matrices, and data frames. Saving and loading data in .rda format can be very useful when working with large datasets that you want to clear from memory but keep for later. In this example, I loaded the .rda file with the load() function and saved it as a CSV file called ELISA.csv.
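The one-time conversion is not shown in this report, but it would look roughly like the sketch below, run end-to-end on a toy data frame (the object name elisa_df and the single example row are illustrative, not the real gtools data):

```r
# Round-trip sketch: save an R object as .rda, load it back, export as CSV
rda_path <- tempfile(fileext = ".rda")
csv_path <- tempfile(fileext = ".csv")

elisa_df <- data.frame(PlateDay = "Plate 1 (Day 1)", Read = 1,
                       Description = "Standard",
                       Concentration = 500, Signal = 2.75)

save(elisa_df, file = rda_path)   # serialize the object (and its name)
rm(elisa_df)                      # clear it from memory
load(rda_path)                    # restores `elisa_df` into the workspace
write.csv(elisa_df, csv_path, row.names = FALSE)  # re-save as plain CSV
```

Note that load() restores the object under the name it was saved with, which is why no assignment is needed after loading.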

Below, ELISA.csv is passed to load_data() in main.R, which loads the CSV into a tibble and then merges the PlateDay and Read columns into a column called platedayNO. Looking at the first couple of rows of the dataset, technical duplicates are stored row-wise…

datax <- load_data("ELISA.csv") 
## Rows: 504 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): PlateDay, Description
## dbl (3): Read, Concentration, Signal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(DT)
datatable(datax)

Pivot Columns

Because the data is stored row-wise, it is difficult to analyze and to compute means for the duplicates. This function pivots the data wider, giving each duplicate its own column, such as Concentration_A and Concentration_B.
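The data_wrangle() helper itself lives in main.R and is not echoed here, so below is a minimal sketch of how such a row-wise-to-wide reshape could be done with tidyr::pivot_wider on toy data (the column values and two-row example are illustrative; only the grouping-vector idea is taken from the chunk that follows):

```r
library(dplyr)
library(tidyr)

# Toy data: one replicate pair stored row-wise, as in ELISA.csv
toy <- tibble(
  platedayNO    = rep("Plate 1 (Day 1) 1", 2),
  Description   = rep("Standard", 2),
  Concentration = c(500, 500),
  Signal        = c(2.75, 2.59)
)
toy$grp_dup <- rep(1:(nrow(toy) / 2), each = 2)        # pair the replicates
toy$rep_id  <- rep(c("A", "B"), length.out = nrow(toy)) # label each half

# Pivot wider: each replicate gets its own *_A / *_B column
wide <- toy %>%
  pivot_wider(id_cols = grp_dup,
              names_from = rep_id,
              values_from = c(platedayNO, Description, Concentration, Signal))
```

With this layout, per-pair statistics (means, CVs) become simple row-wise operations instead of grouped ones.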

gr_dup <- rep(1:(nrow(datax)/2), each = 2) #group technical replicates together  
gr_AB <- rep(c('A', 'B'), len = nrow(datax)) #separate replicate sets called A and B
colnames <- c('platedayNO', 'Description', 'Concentration','Signal') # col names to keep 
datax <- data_wrangle(datax, gr_dup, gr_AB, colnames) 
## Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
## ℹ Please use `all_of()` or `any_of()` instead.
##   # Was:
##   data %>% select(colnames)
## 
##   # Now:
##   data %>% select(all_of(colnames))
## 
## See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
datax <- datax %>% 
  rename(
    concdil_A = Concentration_A,
    concdil_B = Concentration_B)
datax
## # A tibble: 252 × 7
##    grp_dup platedayNO_A      Description_A concdil_A concdil_B Signal_A Signal_B
##      <int> <chr>             <chr>             <dbl>     <dbl>    <dbl>    <dbl>
##  1       1 Plate 1 (Day 1) 1 Standard         500       500       2.75     2.59 
##  2       2 Plate 1 (Day 1) 1 Standard         200       200       1.38     1.35 
##  3       3 Plate 1 (Day 1) 1 Standard          80        80       0.713    0.746
##  4       4 Plate 1 (Day 1) 1 Standard          32        32       0.462    0.466
##  5       5 Plate 1 (Day 1) 1 Standard          12.8      12.8     0.375    0.356
##  6       6 Plate 1 (Day 1) 1 Standard           5.12      5.12    0.335    0.335
##  7       7 Plate 1 (Day 1) 1 Standard           2.05      2.05    0.318    0.287
##  8       8 Plate 1 (Day 1) 1 BLANK              0         0       0.284    0.295
##  9       9 Plate 1 (Day 1) 1 Quality Cont…    312.      312.      1.90     1.97 
## 10      10 Plate 1 (Day 1) 1 Quality Cont…    156.      156.      1.01     1.07 
## # ℹ 242 more rows

Find Mean Concentration

The header explains it all…

datax <- datax %>%  
    group_by(grp_dup) %>% 
    mutate(mean_Signal = mean(c(Signal_A, Signal_B))) 
datax
## # A tibble: 252 × 8
## # Groups:   grp_dup [252]
##    grp_dup platedayNO_A      Description_A concdil_A concdil_B Signal_A Signal_B
##      <int> <chr>             <chr>             <dbl>     <dbl>    <dbl>    <dbl>
##  1       1 Plate 1 (Day 1) 1 Standard         500       500       2.75     2.59 
##  2       2 Plate 1 (Day 1) 1 Standard         200       200       1.38     1.35 
##  3       3 Plate 1 (Day 1) 1 Standard          80        80       0.713    0.746
##  4       4 Plate 1 (Day 1) 1 Standard          32        32       0.462    0.466
##  5       5 Plate 1 (Day 1) 1 Standard          12.8      12.8     0.375    0.356
##  6       6 Plate 1 (Day 1) 1 Standard           5.12      5.12    0.335    0.335
##  7       7 Plate 1 (Day 1) 1 Standard           2.05      2.05    0.318    0.287
##  8       8 Plate 1 (Day 1) 1 BLANK              0         0       0.284    0.295
##  9       9 Plate 1 (Day 1) 1 Quality Cont…    312.      312.      1.90     1.97 
## 10      10 Plate 1 (Day 1) 1 Quality Cont…    156.      156.      1.01     1.07 
## # ℹ 242 more rows
## # ℹ 1 more variable: mean_Signal <dbl>

Blank Correction

In order to account for background noise and extract a true absorbance value, a blank correction has to be applied. Below is an embedded R code chunk doing just that: the dataset datax is passed to blank_correction(), which subtracts the blank from the remaining signal values.
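blank_correction() is defined in main.R and not echoed here; a minimal sketch of the idea, on toy data, might look like the following (the per-plate grouping and the BLANK label are taken from the dataset shown above; the two example rows are illustrative):

```r
library(dplyr)

# Toy data for one plate: a BLANK row plus one standard row
toy <- tibble(
  platedayNO_A  = "Plate 1 (Day 1) 1",
  Description_A = c("BLANK", "Standard"),
  mean_Signal   = c(0.29, 2.67)
)

# Subtract each plate's BLANK signal from every signal on that plate
corrected <- toy %>%
  group_by(platedayNO_A) %>%
  mutate(mean_Signal = mean_Signal - mean_Signal[Description_A == "BLANK"]) %>%
  ungroup()
```

After the subtraction the BLANK row itself becomes zero, matching the zeroed BLANK rows visible in the tables later in this report.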

datax <- blank_correction(datax) 
datatable(datax)

Simple Linear Regression Model to Find the Best Fitting Line

To find the best-fitting line for the standard set, a linear regression model is fit with R's lm() function, which estimates the coefficients with minimal error. The code block below fits a linear model for each plate, where x is the predictor variable and y is the outcome variable.

Predictor Variable X = Theoretical Concentration

Outcome Variable Y = Mean Signal

The resulting table shows the intercept (b) and slope for each plate.

standards <- datax %>% 
  filter(Description_A == 'Standard') %>%  
  group_by(platedayNO_A)

lin_mod <- standards %>% 
  do(lin_coeff = coef(lm(mean_Signal ~ concdil_A, data = .))) %>% 
  summarise(plateday_lm = platedayNO_A, b=lin_coeff[1], slope=lin_coeff[2])

lin_mod
## # A tibble: 12 × 3
##    plateday_lm            b   slope
##    <chr>              <dbl>   <dbl>
##  1 Plate 1 (Day 1) 1 0.0341 0.00477
##  2 Plate 1 (Day 1) 2 0.0358 0.00476
##  3 Plate 1 (Day 1) 3 0.0359 0.00476
##  4 Plate 2 (Day 1) 1 0.0238 0.00478
##  5 Plate 2 (Day 1) 2 0.0267 0.00472
##  6 Plate 2 (Day 1) 3 0.0288 0.00467
##  7 Plate 3 (Day 2) 1 0.0253 0.00557
##  8 Plate 3 (Day 2) 2 0.0267 0.00552
##  9 Plate 3 (Day 2) 3 0.0298 0.00545
## 10 Plate 4 (Day 2) 1 0.0418 0.00486
## 11 Plate 4 (Day 2) 2 0.0415 0.00486
## 12 Plate 4 (Day 2) 3 0.0403 0.00484

Solving for the Concentration

After finding the intercept and slope, we can solve for the concentration; because our coefficients were fit with a linear model, we can assume the line equation y = mx + b.

Plugging our values from above into the equation, we can now add the back-calculated concentration results.
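solve_for_x() is defined in main.R; the underlying algebra is just the inversion of the fitted line, x = (y − b) / m. A small numeric sketch (intercept and slope taken from the Plate 1 row of the table above; the signal value is hypothetical):

```r
# Invert y = m*x + b to back-calculate concentration from signal
b     <- 0.0341    # intercept from lin_mod, Plate 1 (Day 1) 1
slope <- 0.00477   # slope from lin_mod, Plate 1 (Day 1) 1

mean_signal <- 2.41                 # hypothetical blank-corrected signal
conc <- (mean_signal - b) / slope   # back-calculated concentration (ng/mL)
```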

datax <- solve_for_x(datax, lin_mod) %>% 
  mutate_if(is.double,  round, 3)
## `mutate_if()` ignored the following grouping variables:
## • Columns `grp_dup`, `platedayNO_A`
datatable(datax)

Calculate mean_Concentration and find coefficient of variation

Below is an embedded R code chunk computing the mean concentration of groups A and B.
The dataset datax is passed to find_cv(), which calculates the mean of the two groups, then the standard deviation of the technical replicates and its variation relative to the mean, in other words the coefficient of variation.
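find_cv() lives in main.R; the calculation for one replicate pair reduces to a few base R lines (the two concentration values below are illustrative, not taken from the dataset):

```r
# CV sketch for one replicate pair: sd of the two back-calculated
# concentrations divided by their mean, expressed as a percent
conc_A <- 498.2
conc_B <- 464.7                 # illustrative values
m  <- mean(c(conc_A, conc_B))   # mean_Concentration
s  <- sd(c(conc_A, conc_B))     # sd_c
cv <- 100 * s / m               # coefficient of variation (%)
```

A CV on the order of a few percent between technical replicates is typically what one hopes to see.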

datax <- find_cv(datax)
datax <- datax %>% mutate(across(where(is.numeric), round, 3))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(where(is.numeric), round, 3)`.
## ℹ In group 1: `grp_dup = 1`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))
datax
## # A tibble: 252 × 13
## # Groups:   grp_dup [252]
##    grp_dup platedayNO_A      Description_A concdil_A concdil_B Signal_A Signal_B
##      <int> <chr>             <chr>             <dbl>     <dbl>    <dbl>    <dbl>
##  1       1 Plate 1 (Day 1) 1 Standard         500       500       2.47     2.29 
##  2       2 Plate 1 (Day 1) 1 Standard         200       200       1.10     1.06 
##  3       3 Plate 1 (Day 1) 1 Standard          80        80       0.429    0.451
##  4       4 Plate 1 (Day 1) 1 Standard          32        32       0.178    0.171
##  5       5 Plate 1 (Day 1) 1 Standard          12.8      12.8     0.091    0.061
##  6       6 Plate 1 (Day 1) 1 Standard           5.12      5.12    0.051    0.04 
##  7       7 Plate 1 (Day 1) 1 Standard           2.05      2.05    0.034    0.008
##  8       8 Plate 1 (Day 1) 1 BLANK              0         0       0        0    
##  9       9 Plate 1 (Day 1) 1 Quality Cont…    312.      312.      1.62     1.67 
## 10      10 Plate 1 (Day 1) 1 Quality Cont…    156.      156.      0.724    0.773
## # ℹ 242 more rows
## # ℹ 6 more variables: mean_Signal <dbl>, Concentration_A <dbl>,
## #   Concentration_B <dbl>, mean_Concentration <dbl>, sd_c <dbl>, cv <dbl>

Standards Plot

Below is a facet grid plotting the standards for all plates in this dataset.

Calculate LCL and UCL

Shewhart charts (Levey-Jennings) are used to determine whether variables remain within the upper and lower control range.

If the mean concentration is the average variable for each subgroup (QC level), then we can find the grand mean by adding all the QC-level averages and dividing by the number of QC levels, which is 7.

Although this dataset contains enough data points to invoke the CLT and assume normality, we cannot use the population distribution to determine the standard deviation from the mean because we do not know the population variance, so an estimated parameter is used instead.

After getting the standard deviations for each subgroup, we can calculate the average standard deviation by adding them and dividing by the total number of subgroups. The next challenge is to correct the standard deviation using a theoretical factor (an). Finally, the upper and lower control limits are computed for each QC level and are shown below in the table.
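calculate_LUCL() is defined in main.R; a base R sketch of the limit calculation for a single QC level, under the assumptions described above, could look like this (the subgroup means and standard deviations are illustrative; for subgroups of size 2 the usual bias-correction constant, called an in main.R and c4 in most control-chart references, is sqrt(2/pi) ≈ 0.798):

```r
# Illustrative per-plate QC means and sds for one QC level
subgroup_means <- c(312, 318, 305, 321)
subgroup_sds   <- c(4.1, 3.8, 4.4, 3.9)

an <- sqrt(2 / pi)                    # c4 bias-correction for n = 2 replicates
grand_mean <- mean(subgroup_means)    # center line
s_bar <- mean(subgroup_sds) / an      # bias-corrected average sd

UCL <- grand_mean + 3 * s_bar         # upper control limit
LCL <- grand_mean - 3 * s_bar         # lower control limit
```

The ±3-sigma convention here is the standard Shewhart choice; main.R may use a different multiplier or correction factor.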

# extract all QC rows across the four plates
QC_all <- extract_qc(datax, 'Plate 1|Plate 2|Plate 3|Plate 4')
# count how many plates were run (k = subgroups)
N <- unique(QC_all$platedayNO_A) %>% length()
# assign a group to each QC level (1 = 312.500, 2 = 156.250, 3 = 78.125, 4 = 39.100, 5 = 19.500, 6 = 13.000, 7 = 9.750 ng/mL)
QC_all$qc_lvl <- rep(1:(nrow(QC_all)/N), length = nrow(QC_all))
CL <- calculate_LUCL(QC_all, N)
CL
## # A tibble: 7 × 4
##       low grand_mean   high qc_lvl
##     <dbl>      <dbl>  <dbl>  <int>
## 1 327.        339.   351.        1
## 2 157.        160.   164.        2
## 3  75.3        79.8   84.3       3
## 4  33.4        36.5   39.6       4
## 5  12.5        15.7   18.9       5
## 6   5.61        7.30   9.00      6
## 7  -0.390       7.81  16.0       7
QC_all <- QC_all %>% left_join(CL, by = c('qc_lvl') )

QC Plot

Below is a 7-figure plot, one for each QC level (1 = 312.500, 2 = 156.250, 3 = 78.125, 4 = 39.100, 5 = 19.500, 6 = 13.000, 7 = 9.750 ng/mL), with the data points from each plate. Some QCs fall outside the upper and lower control limit range; they should be omitted from the data, and the LCL and UCL should be recomputed.

Recalculate LCL and UCL after Filter

Before recalculating the LCL and UCL, the dataset should be filtered to exclude any QCs above or below the control range. After doing so, only 52 QC data points remain in the dataset.

QC_filter <- QC_all %>% filter(mean_Concentration >= low & mean_Concentration <= high)
CL <- calculate_LUCL(QC_filter, N)
QC_filter <- QC_filter %>% left_join(CL, by = c('qc_lvl') )
QC_filter
## # A tibble: 52 × 20
## # Groups:   platedayNO_A [12]
##    grp_dup platedayNO_A      Description_A concdil_A concdil_B Signal_A Signal_B
##      <int> <chr>             <chr>             <dbl>     <dbl>    <dbl>    <dbl>
##  1       9 Plate 1 (Day 1) 1 Quality Cont…    312.      312.      1.62     1.67 
##  2      12 Plate 1 (Day 1) 1 Quality Cont…     39.1      39.1     0.195    0.208
##  3      13 Plate 1 (Day 1) 1 Quality Cont…     19.5      19.5     0.113    0.096
##  4      14 Plate 1 (Day 1) 1 Quality Cont…     13        13       0.074    0.048
##  5      15 Plate 1 (Day 1) 1 Quality Cont…      9.75      9.75    0.054    0.023
##  6      30 Plate 1 (Day 1) 2 Quality Cont…    312.      312.      1.63     1.67 
##  7      33 Plate 1 (Day 1) 2 Quality Cont…     39.1      39.1     0.195    0.209
##  8      34 Plate 1 (Day 1) 2 Quality Cont…     19.5      19.5     0.114    0.096
##  9      35 Plate 1 (Day 1) 2 Quality Cont…     13        13       0.074    0.056
## 10      36 Plate 1 (Day 1) 2 Quality Cont…      9.75      9.75    0.056    0.025
## # ℹ 42 more rows
## # ℹ 13 more variables: mean_Signal <dbl>, Concentration_A <dbl>,
## #   Concentration_B <dbl>, mean_Concentration <dbl>, sd_c <dbl>, cv <dbl>,
## #   qc_lvl <int>, low.x <dbl>, grand_mean.x <dbl>, high.x <dbl>, low.y <dbl>,
## #   grand_mean.y <dbl>, high.y <dbl>

Recalculated QC Plot

Below is the recalculated plot with the new LCL and UCL. Although the range is tighter for each QC level, there are not enough data points to qualify the QCs; future work should aim to collect more data and compute error probabilities such as false positives or false negatives.

Note that the echo = TRUE parameter was added to the code chunk to print the R code that generated the plot.