Introduction
Libraries
The sampling distribution of the mean revisited
- Estimating the sampling error from the sample standard deviation
What happens when our sample size is small?
The t-distribution is the appropriate distribution for describing the sampling distribution of the mean when $n$ is small
Confidence intervals for the mean based on the t-distribution
- Critical values of t for confidence intervals of the mean
In-class Assignment #1
Hypothesis test for the mean using the t-distribution
- One sample t-test
- The t.test function in R
In-class Assignment #2
- Two sample t-test
In-class Assignment #3
In-class Assignment #4
- Paired t-test

Introduction

Today we’re going to explore how to use the t-distribution and t-tests to calculate confidence intervals and carry out hypothesis tests related to means.

In a previous lecture we talked about confidence intervals for the mean, but we made an important assumption – that our samples were relatively large (>30 observations). Today we’re going to relax that assumption and see how this change makes the t-distribution more appropriate than the normal distribution for calculating confidence intervals for the mean and carrying out hypothesis tests related to the mean.

Libraries

library(tidyverse)
library(magrittr)

The sampling distribution of the mean revisited

Under the assumption that:

The observations in are sample are independent
The observations come from a population with a nearly normal distribution, $\sim N(\mu,\sigma)$
The sample size is relatively large ($n \geq 30$)

We learned that the sampling distribution of the mean for samples of size $n$ is $\sim N(\mu,\sigma/\sqrt{n})$.

The standard deviation of the sampling distribution of the mean, $\sigma/\sqrt{n}$, is the “standard error of the mean” (${SE}_\mu$).

Estimating the sampling error from the sample standard deviation

In general, we do not know the population paramters, $\mu$ and $\sigma$, so instead we use the information we do have:

the sample mean $\overline{x}$ is our estimator of $\mu$
the sample standard deviation $s_x$ is our estimate of $\sigma$

We can then use our sample standard deviation to estimate the standard error of the mean as so:

\[ SE_\overline{x} = \frac{s_x}{\sqrt{n}} \]

Such estimates of the standard error feed into our calculations of confidence intervals and p-values. For example, we calculate confidence intervals in terms of multiples of the SE:

\[ {CI}_\overline{x} = \overline{x} \pm (z \times {SE}_\overline{x}) \]

What happens when our sample size is small?

One of the important assumptions underlying the discussion above was that the size of our sample was relatively large.

We would hope that, regardless of sample size, the sampling distributions of both the mean and standard deviation should be centered around the true population value, $\mu$ and $\sigma$ respectively.

Let’s use simulation to explore how well this is expectation is met for small samples – say 3, 5, 10, or 20 observations?

Simulating sampling distributions

We first setup some parameters for doing our simulations:

# set seed for random number generations
set.seed(20161107)

# setup the parameters of the popn distribution we're sampling from
mu <- 0
sigma <- 1

# list of sample sizes we'll generate
ssizes <- c(2, 3, 4, 5, 7, 10, 20, 30)

# number of simulations to carry out *for each sample size*
nsims <- 2500

We then simulate the sampling distribution of the mean and standard deviation for samples of varying size (we also calculate some other values which will be useful for our exposition below):

df.combined <- data_frame(sample.size = double(), sample.mean = double(), sample.sd = double(),
                          estimated.SE = double(), sample.zscore = double())

for (i in ssizes) {
  # create empty vectors to hold simulation stats (for efficiency)
  s.means <- rep(NA, nsims)
  s.sds <- rep(NA, nsims)
  s.SEs <- rep(NA, nsims)
  s.zscores <- rep(NA, nsims)
  for (j in 1:nsims) {
    s <- rnorm(i, mean = mu, sd = sigma)  # draw random sample
    s.means[j] <- mean(s)
    s.sds[j] <- sd(s)
    SE <- sd(s)/sqrt(i)
    s.SEs[j] <- SE
    s.zscores[j] <- (mean(s) - mu)/SE
  }
  df <- data.frame(sample.size = i, sample.mean = s.means, sample.sd = s.sds, 
                   estimated.SE = s.SEs, sample.zscore = s.zscores)
  
  df.combined <- bind_rows(df.combined, df)
}

df.combined %<>%
  mutate(sample.sz = as.factor(sample.size))

For small samples, sample standard deviations systematically underestimate the population standard deviation

In our simulation we specified that the mean and the standard deviation of the underlying population, were $\mu = 0$ ad $\sigma = 1$, respectively. Let’s examine how the well centered the sampling distributions of the two statistiscs are around these true values, as a function of sample size.

First a table summarized

by.sample.size <-
  df.combined %>%
  group_by(sample.size) %>%
  summarize(mean.of.means = mean(sample.mean),
            mean.of.sds = mean(sample.sd))

by.sample.size

# A tibble: 8 x 3
  sample.size mean.of.means mean.of.sds
        <dbl>         <dbl>       <dbl>
1           2      0.006602      0.7950
2           3      0.017028      0.8658
3           4     -0.007289      0.9141
4           5      0.003658      0.9446
5           7      0.002972      0.9612
6          10      0.003743      0.9721
7          20      0.002456      0.9843
8          30     -0.002114      0.9922

We see that the sampling distributions of means are well centered around the true mean (zero), and there is no systematic bias one way or the other. By contrast the sampling distribution of standard deviations tends to underestimate the true standard deviation.

We can visualize this bias as shown here:

ggplot(by.sample.size, aes(x = sample.size, y = mean.of.sds)) +
  geom_point(color = 'red') +
  geom_line(color = 'red') +
  geom_hline(yintercept = 1, color = 'black', linetype='dashed') +
  labs(x = "Sample Size", y = "Mean of Sampling Distn of Std Dev") + 
  ylim(0, 1.1)

The source of this bias is clear if we look at the sampling distribution of the standard deviation for samples of size 3, 5, and 30.

filtered.df <- 
  df.combined %>%
  filter(sample.size %in% c(3, 5, 30))

ggplot(filtered.df, aes(x = sample.sd, y = ..density.., fill = sample.sz)) + 
  geom_histogram(bins=50, alpha=0.65) +
  facet_wrap(~sample.size, nrow = 1) +
  geom_vline(xintercept = 1, linetype = 'dashed') +
  labs(x = "Std Deviations", y = "Density",
       title = "Sampling distributions of the standard deviation\nAs a function of sample size")

There’s very clear indication that the the sampling distribution of standard deviations is not centered around 1 for $n=3$ and for $n=5$, however with samples of size 30 the sampling distribution of the standard deviation appears fairly well centered around the true value of the underlying population.

Underestimates of the standard deviation given small $n$ lead to understimates of the SE of the mean

If when sample sizes are small, sample standard deviations, $s_x$, tend to underestime the true standard deviation, $\sigma$, then it follows that sample estimates of the standard error of the mean, $SE_\overline{x} = \frac{s_x}{\sqrt{n}}$, must tend to understimate the true standard error of the mean, $SE_\mu = \frac{\sigma}{\sqrt{n}}$.

This means that when we use $SE_\overline{x}$ to estimate confidence intervals as follows: \[ {CI}_\overline{x} = \overline{x} \pm (z \times {SE}_\overline{x}) \] If we use $z$ values based on the standard Normal distribution, our confidence intervals will be too small for the desired level of confidence. Instead we must turn to an alternate distribution called the “t-distribution” to get appropriate multiples of the standard error.

The t-distribution is the appropriate distribution for describing the sampling distribution of the mean when $n$ is small

The problem associated with estimating the standard error of the mean were recognized in the early 20th century by William Gosset, an employee at the Guinness Brewing Company. He published a paper, under the pseudonym “Student”, giving the appropriate distribution for describing the standard error of the mean as a function of the sample size $n$. Gosset’s distribution is known as the “t-distribution” or “Student’s t-distribution”.

The t-distribution is specified by a single parameter, called degrees of freedom ($df$) where ${df} = n - 1$. As $df$ increases, the t-distribution becomes more and more like the normal such that when $n \geq 30$ it’s nearly indistinguishable from the standard normal distribution.

In the figures below we compare the t-distribution and the normal distribution for sample sizes $n = {4, 8, 32}$.

x <- seq(-6, 6, length.out = 200)
n <- c(5, 10, 30) # sample sizes

distns.df <- data_frame(sample.size = double(), z.or.t = double(),
                        norm.density = double(), t.density = double())
for (i in n) {
  norm.density <- dnorm(x, mean = mu, sd = 1)
  t.density <- dt(x, df = i - 1)
  df.temp <- data_frame(sample.size = i, z.or.t = x,
                        norm.density = norm.density, t.density  = t.density)
  distns.df <- bind_rows(distns.df, df.temp)
}

distns.df %<>% mutate(df = as.factor(sample.size - 1))
  
ggplot(distns.df, aes(x = z.or.t, y = t.density, color = df)) + 
  geom_line() +
  geom_line(aes(y = norm.density), color='black', linetype="dotted") + 
  labs(x = "z or t value", y = "Probablity density",
       title = "Standard normal distribution (black dotted line)\nversus t-distributions for different degrees of freedom")

Confidence intervals for the mean based on the t-distribution

With our knowledge of the t-distribution, we now revisit how to calculate a confidence interval for the mean. To calculate a $(1-\alpha)\%$ confidence interval for the mean, use the equation:

\[ CI_\overline{x} = \overline{x} \pm t_{1-\alpha/2, df} \times SE_\overline{x} \]

Where $t_{1-\alpha/2, df}$ is the critical t-value, corresponding to the $1-\alpha/2$ cutoff of the t-distribution with $df$ degrees of freedom. We use $1-\alpha/2$ to account for the fact that for our uncertainty includes both the right and left tails of the t-distribution.

t.values <- seq(-4, 4, length.out = 200)
t.density <- dt(t.values, df = 4)

left.interval <- seq(-4, qt(0.025, df = 4), length.out = 50)
right.interval <- seq(qt(0.975, df = 4), 4, length.out = 50)

ggplot() +
  geom_line(aes(x = t.values, y = t.density)) +
  geom_area(aes(x = left.interval, y = dt(left.interval, df = 4)), fill = "firebrick") +  
  geom_area(aes(x = right.interval, y = dt(right.interval, df = 4)), fill = "firebrick") +  
  annotate("text", x = -3.25, y = 0.05, label = "2.5% left") +
  annotate("text", x = 3.25, y = 0.05, label = "2.5% right") +
  labs(x = "t", y = "Probability density",
       title = "Critical values of t-distribution with df = 4\ncorresponding to 5% cutoff" )

Critical values of t for confidence intervals of the mean

The function qt() (quantile function for t-distribution) calculates the appropriate critical t-value. For example, to calculate $t_{0.975, 4}$, which corresponds to the left border of the region labeled “2.5% right” in the figure above, we do:

qt(0.975, 4)

[1] 2.776

In-class Assignment #1

For each of the following, the mean, standard deviation, sample size, and desired confidence interval are given. Find the critical values of $t$ required for a confidence interval of the mean. [0.5 pts each]

$\overline{Y} = 14$, $s_y = 32$, $n = 12$, 95% confidence interval
$\overline{Y} = -23$, $s_y = 12$, $n = 32$, 95% confidence interval
$\overline{Y} = 144$, $s_y = 2.1$, $n = 101$, 99% confidence interval
$\overline{Y} = 3.21$, $s_y = 1.83$, $n = 23$, 95% confidence interval
$\overline{Y} = -152$, $s_y = 38$, $n = 8$, 99% confidence interval

Example: gene expression in mice following administration of a drug

You treat a sample of mice with a drug (Drug X) and measure the expression of the gene YFG1 following treatment. For a sample of five mice you observe the following expression values:

YFG1 = {11.25, 10.5, 12, 11.75, 10}

What is the 95% confidence interval for the mean expression of YFG1?

alpha = 0.05
yfg1.1sample <- data_frame(YFG1 = c(11.25, 10.5, 12, 11.75, 10))

yfg1.1sample %>%
  summarize(mean.YFG1 = mean(YFG1),
            sd.YFG1 = sd(YFG1),
            se.mean.YFG1 = sd.YFG1/sqrt(n()),
            df = n() - 1,
            t = qt(1 - alpha/2, df),
            left.CI.95 = mean.YFG1 - t * se.mean.YFG1,
            right.CI.95 = mean.YFG1 + t * se.mean.YFG1)

# A tibble: 1 x 7
  mean.YFG1 sd.YFG1 se.mean.YFG1    df     t left.CI.95 right.CI.95
      <dbl>   <dbl>        <dbl> <dbl> <dbl>      <dbl>       <dbl>
1      11.1  0.8404       0.3758     4 2.776      10.06       12.14

Hypothesis test for the mean using the t-distribution

We consider three different situations for hypothesis tests regarding means.

One sample t-test

A one sample t-test is appropriate when you want to compare an observed sample mean to some a priori hypothesis about what the mean should be.

$H_0$: The mean of variable $X$ equals $\mu_0$ (some a priori value for the mean)
$H_A$: The mean of variable $X$ does not equal $\mu_0$

One sample t-test, test statistic

Calculate the test statistic:

\[ t^{\star} = \frac{\overline{x} - \mu_0}{SE_\overline{x}} \]

where $\overline{x}$ is the sample mean and $SE_\overline{x}$ is the sample standard error of the mean.

Compare the test statistic to the t-distribution to calculate the probability that you’d observe a mean value at least as extreme as $\overline{x}$ if the null hypothesis was true:

\[ P = \text{Pr}[t < -|t^\star|)] + \text{Pr}[t > |t^\star|] \]

Since the t-distribution is symmetric, this simplifies to:

\[ P = 2 \times \text{Pr}[t > |t^\star|] \]

Example: gene expression in mice, cont.

You have established from previous studies that the expression level of YFG1 in control mice is 10. You wish to determine whether the average expression of YFG1 in mice treated with Drug X differs from control mice.

$H_0$: the mean expression of YFG1 is 10 $ $H_A$: the mean expression of YFG1 does not equal 10

mu_0 = 10

yfg1.1sample %>%
  summarize(mean.YFG1 = mean(YFG1),
            sd.YFG1 = sd(YFG1),
            se.mean.YFG1 = sd.YFG1/sqrt(n()),
            df = n() - 1,
            t.star = (mean.YFG1 - mu_0)/se.mean.YFG1,
            P.value = 2 * pt(abs(t.star), df = df, lower.tail = FALSE))

# A tibble: 1 x 6
  mean.YFG1 sd.YFG1 se.mean.YFG1    df t.star P.value
      <dbl>   <dbl>        <dbl> <dbl>  <dbl>   <dbl>
1      11.1  0.8404       0.3758     4  2.927 0.04295

Using a type I error cutoff of $\alpha = 0.05$, we reject the null hypothesis that the mean expression of YFG1 in mice treated with Drug X is the same as the mean expression of YFG1 in control mice. Our data support the hypothesis that Drug X increases the expression of YFG1.

The `t.test` function in R

The built-in t.test() function will take care of the all the calcuations we did by hand above. Take a moment to read the t.test documentation.

For a one sample t-test t.test we need to specify the a priori mean value, mu, we are comparing to:

t.test(yfg1.1sample$YFG1, mu = 10)


    One Sample t-test

data:  yfg1.1sample$YFG1
t = 2.9, df = 4, p-value = 0.04
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
 10.06 12.14
sample estimates:
mean of x 
     11.1

In-class Assignment #2

Global warming is predicted to lead to shifts in the geographic ranges of species. Chen et al. (2011) studied recent changes in the highest elevation at which species occur. Typically, higher elevations are coolor than lower elevations. The CSV file ABD-range-shifts.csv gives the changes in highest elevation (measured in meters) for a wide variety of taxa, as measure in the over the last 30 or so years. Positive numbers indicate upward shifts in elevation, and negative numbers indicate shifts to lower elevations.

Each of the following questions is worth 0.5 pts:

Draw a histogram showing the distribution of changes in elevation. Make sure to follow good practice (appropriate bins, axis labels, etc) in creating your figure.
What is the sample size for this data set?
What is the mean of the elevation shift data?
What is the standard deviation of the elevational shifts?
What is the standard error of the mean for these data?
How many degrees of freedome will be associated with a confidence interval and one-sample t-test for the mean elevational shift?
What is the value of $\alpha$ needed for a a 95% confidence interval?
What is the critical value of $t$ for this $alpha$ and number of degrees of freedom?
Calculate the 95% confidence interval for the mean using these data
For the one-sample t-test, write the appropriate null and alternative hypotheses
Carry out a one sample t-test using t.test()
What is your conclusions whether species have changed their highest elevation on average?

Two sample t-test

We use a two sample t-test to analyze the different between the means of the same variable measured in two different groups or treatments. It is assumed that the two groups are independent samples from two populations.

$H_0$: The mean of variable $X$ in group 1 is the same as the mean of $X$ in group 2, i.e. $\overline{X}_1 = \overline{X}_2$. This is equivalent to $overline{X}_1 - overline{X}_2 = 0$.
$H_A$: The mean of variable $X$ in group 1 is not the same as the mean of $X$ in group 2, i.e. $\overline{X}_1 \neq \overline{X}_2$. This is equivalent to $overline{X}_1 - overline{X}_2 \neq 0$.

Standard error for the difference in means between groups

In a two-sample t-test, we have to account for the uncertainty associated with the means of both groups, which we express in terms of the standard error of the difference in the means between the groups:

\[ SE_{\overline{X}_1 - \overline{X}_2} = \sqrt{s^2_p\left(\frac{1}{n_1} + \frac{1}{n_2} \right)} \]

where

\[ s^2_p = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2} \]

$s^2_p$ is called the “pooled sample variance” and is a weighted average of the sample variances, $s_1^2$ and $s_2^2$, of the two groups.

Two sample t-test, test statistic

Given the standard error for the difference in means between groups as defined above, we define our test statistic for a two sample t-test as:

\[ t^\star = \frac{(\overline{X}_1 - \overline{X}_2)}{SE_{\overline{X}_1 - \overline{X}_2}} \]

The degrees of freedom for this test statistic is:

\[ df = df_1 + df_2 = n_1 + n_2 - 2 \]

Example: gene expression in mice, comparing the effects of two drugs

You treat samples of mice with two drugs, X and Y. We want to know if the two drugs have the same average effect on expression of the gene YFG1. The measurements of YFG1 in samples treated with X and Y are as follows:

X = {11.25, 10.5, 12, 11.75, 10}
Y = {8.75, 10, 11, 9.75, 10.5}

For simplicity, we skip the “by-hand” calculations and simply use the built in t.test function:

yfg1.2sample <- data_frame(YFG1.X = c(11.25, 10.5, 12, 11.75, 10),
                           YFG1.Y = c(8.75, 10, 11, 9.75, 10.5))

t.test(yfg1.2sample$YFG1.X, yfg1.2sample$YFG1.Y)


    Welch Two Sample t-test

data:  yfg1.2sample$YFG1.X and yfg1.2sample$YFG1.Y
t = 2.1, df = 8, p-value = 0.07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1311  2.3311
sample estimates:
mean of x mean of y 
     11.1      10.0

Using a type I error cutoff of $\alpha = 0.05$, we fail to reject the null hypothesis that the mean expression of YFG1 is different in mice treated with Drug X versus those treated with Drug Y.

In-class Assignment #3

Notice that the mean of YFG1 in the Drug Y mice is 10.0. This is exactly the same value for the mean of YFG1 we used as our a priori hypothesis in the one-sample test. We rejected the null hypothesis in the one sample test, but we failed to reject the null hypothesis in the two-sample case. What explains the different outcomes even though the values of the means we’re comparing are the same? [2 pts]

In-class Assignment #4

It has been hypothesized that many North Americans dreamt in black and white in the era of black-and-white television and shifted to color after the introduction of color media (movies and TV). To test this hypothesis Murzyn (2008) queried 30 older individuals who grew up in the black-and-whte era and 30 younger individuals who grew up with color media. She recorded the percentage of color dreams for each individuals. In younger individuals, a mean of 68.4% their dreams were in color, with a standard deviation of 31.8%. For the older individuals, the mean was 33.9% with a standard deviation of 36.9%. The scores were approximately normally distributed in each group. Is the difference between the two means statistically significant?

Each of the following questions is worth 0.5 pts:

State the null and alternate hypotheses for this test.
What are the sample variances for each group?
How many degrees of freedom are associated with each group?
Calculate the pooled variance for these data.
What is the standard error of the difference between means?
Calculate the test statistic, t, for these data
What is the critical value of $t$ with $\alpha = 0.05$ for this test?
Calculate a p-value for this comparison.
What can you conclude about the difference between the two groups in how often they dream in color? Interpret the conlusions of the test in terms of the original question.

Paired t-test

In a paired t-test there are two groups/treatments, but the samples in the two groups are paired or matched. This typically arises in “before-after” studies where the same individual/object is measured at different time points, before and after application of a treatment. The repeated measurement of the same individual/object means that we can’t treat the two sets of observations as independent. Null and alternative hypotheses are thus typically framed in terms of a mean diffrenes between time points/conditions

$H_0$: The mean difference of variable $X$ in the paired measurements is zero, i.e. $\overline{D} = 0$ where $D = X_\text{after} - X_\text{before}$
$H_A$: The mean difference of variable $X$ in the paired measurements is not zero, i.e. $\overline{D} \neq 0$

Paired t-test, test statistic

Let the variable of interest for indiviual $i$ in the paired conditions be designated $x_{i,\text{before}}$ and $x_{i, \text{after}}$
Let ( D_i = x_{i, } - x_{i, } ) be the paired difference for individual $i$
Let $\overline{D}$ be the mean difference and $s_D$ be the standard deviation of the differences
The standard error of mean difference is $SE(\overline{D}) = \frac{s_D}{\sqrt{n}}$

The test statistic is thus: \[ t^\star = \frac{\overline{D}}{SE(\overline{D})} \]

Under the null hypothesis, this statistic follows a t-distribution with $n-1$ degrees of freedom.

Paired t-test, example

You measure the expression of the gene YFG1 in five mice. You then treat those five mice with drug Z and measure gene expression again.

YFG1 expression before treatment = {12, 11.75, 11.25, 10.5, 10}
YFG1 expression after treatment = {11, 10, 10.50, 8.75, 9.75}

yfg1.paired <- data_frame(YFG1.before = c(12, 11.75, 11.25, 10.5, 10),
                          YFG1.after = c(11, 10, 10.50, 8.75, 9.75))

t.test(yfg1.paired$YFG1.before, yfg1.paired$YFG1.after,
       paired = TRUE)


    Paired t-test

data:  yfg1.paired$YFG1.before and yfg1.paired$YFG1.after
t = 3.8, df = 4, p-value = 0.02
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.2905 1.9095
sample estimates:
mean of the differences 
                    1.1

Using a type I error cutoff of $\alpha = 0.05$, we reject the null hypothesis of no difference in the average expression of YFG1 before and after treatment with Drug Z.

The t-distribution and t-tests

Paul M. Magwene

Introduction

Libraries

The sampling distribution of the mean revisited

Estimating the sampling error from the sample standard deviation

What happens when our sample size is small?

Simulating sampling distributions

For small samples, sample standard deviations systematically underestimate the population standard deviation

Underestimates of the standard deviation given small \(n\) lead to understimates of the SE of the mean

The t-distribution is the appropriate distribution for describing the sampling distribution of the mean when \(n\) is small

Confidence intervals for the mean based on the t-distribution

Critical values of t for confidence intervals of the mean

In-class Assignment #1

Example: gene expression in mice following administration of a drug

Hypothesis test for the mean using the t-distribution

One sample t-test

One sample t-test, test statistic

Example: gene expression in mice, cont.

The `t.test` function in R

In-class Assignment #2

Two sample t-test

Standard error for the difference in means between groups

Two sample t-test, test statistic

Example: gene expression in mice, comparing the effects of two drugs

In-class Assignment #3

In-class Assignment #4

Paired t-test

Paired t-test, test statistic

Paired t-test, example

The t-distribution and t-tests

Paul M. Magwene

Introduction

Libraries

The sampling distribution of the mean revisited

Estimating the sampling error from the sample standard deviation

What happens when our sample size is small?

Simulating sampling distributions

For small samples, sample standard deviations systematically underestimate the population standard deviation

Underestimates of the standard deviation given small \(n\) lead to understimates of the SE of the mean

The t-distribution is the appropriate distribution for describing the sampling distribution of the mean when \(n\) is small

Confidence intervals for the mean based on the t-distribution

Critical values of t for confidence intervals of the mean

In-class Assignment #1

Example: gene expression in mice following administration of a drug

Hypothesis test for the mean using the t-distribution

One sample t-test

One sample t-test, test statistic

Example: gene expression in mice, cont.

The t.test function in R

In-class Assignment #2

Two sample t-test

Standard error for the difference in means between groups

Two sample t-test, test statistic

Example: gene expression in mice, comparing the effects of two drugs

In-class Assignment #3

In-class Assignment #4

Paired t-test

Paired t-test, test statistic

Paired t-test, example

The `t.test` function in R