# Timings

In [1]:
library("ggpubr")
library(readr)
library(ggplot2)
library(tidyverse)
library(ARTool)
library(emmeans)
library(multcomp)
library(car)
library(rstatix)

Loading required package: ggplot2

-- Attaching packages --------------------------------------- tidyverse 1.3.1 --

v tibble  3.1.5     v dplyr   1.0.7
v tidyr   1.1.4     v stringr 1.4.0
v purrr   0.3.4     v forcats 0.5.1

-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

Loading required package: mvtnorm

Loading required package: survival

Loading required package: TH.data

Loading required package: MASS


Attaching package: 'MASS'


The following object is masked from 'package:dplyr':

    select



Attaching package: 'TH.data'


The following object is masked from 'package:MASS':

    geyser


Loading required package: carData


Attaching package: 'car'


Th

In [2]:
timings <- read_csv("timings.csv") %>%
    rename(question = `...1`) %>%
    pivot_longer(!question, names_to=c("retriever", "reader", "method"), names_sep="[._]", values_to="time")

timings$retriever <- as.factor(timings$retriever)
timings$reader    <- as.factor(timings$reader)
timings$method    <- as.factor(timings$method)

head(timings)

New names:
* `` -> ...1

Rows: 59 Columns: 9
-- Column specification --------------------------------------------------------
Delimiter: ","
dbl (9): ...1, faiss_dpr.retrieve, faiss_dpr.read, faiss_longformer.retrieve...

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.


question,retriever,reader,method,time
<dbl>,<fct>,<fct>,<fct>,<dbl>
0,faiss,dpr,retrieve,0.30384302
0,faiss,dpr,read,4.56640005
0,faiss,longformer,retrieve,0.92279482
0,faiss,longformer,read,5.76836824
0,es,dpr,retrieve,0.01930094
0,es,dpr,read,2.7453649


In [3]:
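# Split by phase and drop the now-constant `method` column.
# select() is namespaced because MASS::select masks dplyr::select
# (see the package load messages above).
timings_read <- timings %>%
    filter(method == "read") %>%
    dplyr::select(!method)
timings_retrieve <- timings %>%
    filter(method == "retrieve") %>%
    dplyr::select(!method)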

To decide which statistical tests are appropriate, we first check for normality with the Shapiro-Wilk test, applied to the timings grouped by retriever, by reader, and by method. As the results below show, every $p$-value is below 0.001, so we reject the null hypothesis of normality in each case: none of the groups of timings is normally distributed.

In [4]:
# Shapiro-Wilk statistic (W) and p-value per level of each factor.
timings %>%
    group_by(retriever) %>%
    summarise(sw.stat = shapiro.test(time)$statistic,
              sw.p = shapiro.test(time)$p.value)

timings %>%
    group_by(reader) %>%
    summarise(sw.stat = shapiro.test(time)$statistic,
              sw.p = shapiro.test(time)$p.value)

timings %>%
    group_by(method) %>%
    summarise(sw.stat = shapiro.test(time)$statistic,
              sw.p = shapiro.test(time)$p.value)

retriever,sw.stat,sw.p
<fct>,<dbl>,<dbl>
es,0.7534261,1.667341e-18
faiss,0.7585727,2.563192e-18


reader,sw.stat,sw.p
<fct>,<dbl>,<dbl>
dpr,0.7639005,4.029344e-18
longformer,0.8116362,3.381683e-16


method,sw.stat,sw.p
<fct>,<dbl>,<dbl>
read,0.8838182,1.779766e-12
retrieve,0.6237773,1.838892e-22

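To illustrate how the Shapiro-Wilk test used above reacts to skew, here is a minimal sketch on synthetic data. The samples are generated for illustration only and are unrelated to the timings in this notebook; the variable names are hypothetical.

```r
set.seed(1)

# Synthetic samples, not the notebook's timings.
skewed <- rexp(200, rate = 1)           # right-skewed, like many timing data
normal <- rnorm(200, mean = 5, sd = 1)  # drawn from a normal distribution

sw_skewed <- shapiro.test(skewed)
sw_normal <- shapiro.test(normal)

# W close to 1 is consistent with normality; a small p-value rejects it.
data.frame(
  sample  = c("skewed", "normal"),
  sw.stat = c(sw_skewed$statistic, sw_normal$statistic),
  sw.p    = c(sw_skewed$p.value, sw_normal$p.value)
)
```

The W statistics reported for the timings (roughly 0.62 to 0.88) are far from 1, which matches the very small $p$-values in the tables above.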

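Since the timings are not normally distributed, the normality assumption of a parametric ANOVA is not met. We therefore use the aligned rank transform (ART), a non-parametric alternative to a factorial ANOVA, and fit it separately to the read timings and the retrieve timings. Note that the models below contain only the retriever and reader factors; they have no per-question term, so the repeated measurements on each question are treated as independent observations.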

In [5]:
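# ART model for the read timings: main effects and interaction.
model.read <- art(time ~ retriever * reader, data = timings_read)
anova(model.read)
# Post-hoc contrasts for each main effect.
art.con(model.read, ~ retriever)
art.con(model.read, ~ reader)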

Term,Df,Df.res,Sum Sq,Sum Sq.res,F value,Pr(>F)
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
retriever,1,232,41088.97,1037631.8,9.18692,0.002714084
reader,1,232,790427.81,301414.1,608.39633,8.80273e-67
retriever:reader,1,232,101903.46,983331.4,24.04235,1.771995e-06


NOTE: Results may be misleading due to involvement in interactions



 contrast   estimate   SE  df t.ratio p.value
 es - faiss     26.4 8.71 232   3.031  0.0027

Results are averaged over the levels of: reader 

NOTE: Results may be misleading due to involvement in interactions



 contrast         estimate   SE  df t.ratio p.value
 dpr - longformer     -116 4.69 232 -24.666  <.0001

Results are averaged over the levels of: retriever 

In [6]:
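# ART model for the retrieve timings: main effects and interaction.
model.retrieve <- art(time ~ retriever * reader, data = timings_retrieve)
anova(model.retrieve)
# Post-hoc contrasts for each main effect.
art.con(model.retrieve, ~ retriever)
art.con(model.retrieve, ~ reader)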

Term,Df,Df.res,Sum Sq,Sum Sq.res,F value,Pr(>F)
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
retriever,1,232,821516,240071.9,793.8944,7.630526e-77
reader,1,232,821516,214935.3,886.7398,3.256422e-81
retriever:reader,1,232,821516,215501.6,884.4096,4.148583e-81


NOTE: Results may be misleading due to involvement in interactions



 contrast   estimate   SE  df t.ratio p.value
 es - faiss     -118 4.19 232 -28.176  <.0001

Results are averaged over the levels of: reader 

NOTE: Results may be misleading due to involvement in interactions



 contrast         estimate   SE  df t.ratio p.value
 dpr - longformer     -118 3.96 232 -29.778  <.0001

Results are averaged over the levels of: retriever 
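The two models above use `retriever * reader` only, so the fact that each question is timed under every combination is not modelled. As a hedged sketch of how a repeated measures version could look, ARTool's formula interface accepts a grouping term such as `(1 | question)`. The data below are simulated as a stand-in for the timings; `sim`, `model.rm`, and the effect sizes are assumptions made for illustration, and the results would differ from the tables above.

```r
library(ARTool)

set.seed(42)

# Simulated stand-in for the timing data: every question is measured
# under all four retriever x reader combinations.
n_questions <- 30
sim <- expand.grid(
  question  = factor(seq_len(n_questions)),
  retriever = factor(c("es", "faiss")),
  reader    = factor(c("dpr", "longformer"))
)

# Right-skewed times with a per-question offset and a made-up reader effect.
question_offset <- rexp(n_questions, rate = 2)
sim$time <- question_offset[as.integer(sim$question)] +
  ifelse(sim$reader == "longformer", 2, 0) +
  rexp(nrow(sim), rate = 1)

# (1 | question) adds a random intercept per question, so measurements
# taken on the same question are not treated as independent.
model.rm <- art(time ~ retriever * reader + (1 | question), data = sim)
result <- anova(model.rm)
result
```

Because the interaction is significant in both analyses above, the main-effect contrasts carry the "may be misleading" note; with ARTool, pairwise comparisons among the four retriever and reader combinations can be requested with `art.con(model.rm, "retriever:reader")`.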