MA121 Study Guide

Unit 5: Hypothesis Testing


5a. Differentiate between type I and type II errors

  • What is an error in the context of hypothesis testing? What is the difference between a Type I and Type II error? How do they relate to each other? Which is more serious?
  • How are the Type I and Type II errors calculated or determined?

No hypothesis test is perfect, and none can produce results that are 100 percent reliable. This is because a sample is never exactly like the population.

  • A Type I Error results when the null hypothesis is incorrectly rejected. For our jury example, this means the jury just convicted an innocent defendant.
  • A Type II Error is the opposite: failing to detect a difference from the null and incorrectly failing to reject the null hypothesis. For our jury example, the jury failed to convict a guilty defendant.

Remember: the term error does not necessarily mean the researcher made a mistake in their calculation. In statistics, an error can occur simply because we do not have access to the whole population: by random chance, we may draw a sample that does not properly represent it.

For example, a drug study may show that a drug is ineffective simply because a large percentage of the sample happened to have a genetic trait that made the drug less effective for them. The drug may actually be effective; no mistake caused the error. The sample was unusual. By random chance, the researcher simply chose a group that was less helped by the drug.

Type I and Type II Errors are related in that, all other things being equal, they are inversely related. Researchers choose the Type I error rate (α) in advance. They then calculate the Type II error rate (β) based on possible alternative values for the mean, or whatever other parameter they are estimating. When you decrease one, all else being equal, you inevitably increase the other.

Calculating a Type II Error is beyond the scope of this course. However, this relationship shows why researchers do not simply choose a tiny number for the Type I error rate: doing so would increase the Type II error rate, making a true difference from the null harder to detect. Which error is more serious depends on the situation.

For example, in our jury trial scenario, we want to avoid sending an innocent person to prison. Since convicting an innocent defendant is a Type I error, we might consider the Type I error more serious and thus lower α, accepting the chance that this raises β.

If, in a drug study, the null hypothesis is that a drug is safe for consumption, a Type II error would mean failing to find that the drug is dangerous, so that error would be more serious. In this case, you might choose a larger value for α, even though it increases the probability that a safe drug will be rejected.
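To see the tradeoff numerically, here is a small sketch that computes β for a right-tailed z-test of a mean at several choices of α. All numbers here are hypothetical (H0: μ = 100, a true mean of 103, σ = 10, n = 25), and the calculation itself is beyond the scope of the course; the pattern is the point: as α shrinks, β grows.

```python
from statistics import NormalDist

def type_ii_error(alpha, mu0, mu1, sigma, n):
    """beta = P(fail to reject H0 | true mean is mu1) for a right-tailed z-test."""
    se = sigma / n ** 0.5
    # Rejection threshold: sample means above this cutoff reject H0.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta: probability the sample mean falls below the cutoff
    # when the true mean is actually mu1.
    return NormalDist(mu1, se).cdf(cutoff)

# Hypothetical numbers: H0: mu = 100, true mean 103, sigma = 10, n = 25.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> beta = {type_ii_error(alpha, 100, 103, 10, 25):.3f}")
```

Running this shows β climbing as α is lowered, which is exactly the inverse relationship described above.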

Review

To review, see:


5b. Calculate p-values to be used in hypothesis testing

  • What is the p-value in hypothesis testing, and what does it represent?
  • What does the p-value tell you about accepting or rejecting the null hypothesis?

The p-value of a hypothesis test provides the key to getting the conclusion of the test. The p-value refers to the probability of obtaining a sample value equal to, or more extreme (see note below) than, the one we got if we assume the null hypothesis is true.

A very low p-value (such as 0.005) means that "if the defendant really is innocent, the probability that we could have obtained the blood and DNA evidence we did is extremely small". This is why a smaller p-value will cause us to reject the null hypothesis.

A very large p-value (usually greater than 0.10) means that, for example, "we assume by default the drug is ineffective; there is more than a 10 percent chance we could have gotten the results we did even if the drug is ineffective". This is less impressive and would lead us to fail to reject the null hypothesis, since we have not found enough evidence that the drug is effective.

The proper cutoff (below which we reject the null, and above which we fail to reject) is subjective. The standard is 0.05, but it can be as low as 0.01 for a stricter, more conservative test or as high as 0.10 for a more lenient test.

Note that the definition of extreme depends on whether we are conducting a right-tail test (the probability of a result higher than the result we got), a left-tail test (a lower result), or a two-tail test (higher or lower).
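The three cases in the note above can be sketched in a few lines of Python using the standard library's normal distribution. The test value 1.85 below is just an illustrative number, not from any particular study.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value for a z test statistic; tail is 'right', 'left', or 'two'."""
    phi = NormalDist().cdf  # cumulative probability P(Z < z)
    if tail == "right":
        return 1 - phi(z)          # probability of a higher result
    if tail == "left":
        return phi(z)              # probability of a lower result
    return 2 * (1 - phi(abs(z)))   # two-tailed: both extremes

z = 1.85
print(f"right-tailed: {p_value(z, 'right'):.4f}")  # about 0.0322
print(f"two-tailed:   {p_value(z, 'two'):.4f}")    # about 0.0643
```

Notice that the two-tailed p-value is double the one-tailed value, which is why a two-tailed test needs a more extreme test statistic to reject the null at the same α.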

Review

To review, see:


5c. Conduct hypothesis tests for a single population mean and population proportion

  • What is hypothesis testing? How is it related to confidence intervals?
  • What are a null and an alternative hypothesis?
  • How do you know when to use the z- or t-distributions in a hypothesis test for the mean? What about for a proportion?
  • Under what circumstances could neither z nor t be used?

Hypothesis testing is a form of inferential statistics similar to confidence intervals. We have a null hypothesis (H0), which we assume to be true by default, and an alternative hypothesis (H1 or Ha), for which we either find or fail to find evidence based on sample data.

We have three types of alternative hypotheses:

  • A right-tailed test has an alternative hypothesis that is greater than the null.
  • A left-tailed test has an alternative hypothesis that is less than the null.
  • A two-tailed test tests for both higher and lower values than the null. While this may seem more convenient, the downside of a two-tailed test is that it reduces the power of the test, making it less likely to detect a change from the null hypothesis.

The conclusion of a hypothesis test is to reject or fail to reject the null hypothesis. In other words, we assume the null is true, then we either find evidence (based on finding the p-value) that it is false or fail to find evidence that the null is false.

You can think about this situation as a jury trial. The null hypothesis is innocence, and the prosecutor tries to get a guilty verdict by providing evidence that makes the null hypothesis unlikely if all the evidence is true.

You would use either the z- or t-distribution based on the same criteria you would use to build a confidence interval. If you know the population standard deviation and have a reasonably large sample size, you use the z-distribution form of the given equations. If the population standard deviation is unknown or the sample size is small, you use t. A hypothesis test for a single proportion uses the z-distribution, provided the sample is large enough (both np and n(1 − p) sufficiently large).

Remember that the Central Limit Theorem still applies. In other words, if the sample size is small, you must have a normally distributed population, or you cannot use either z or t.

The steps for performing a p-value test are:

  1. Decide on (or be given) the Type I error rate you will use (α).
  2. Use the appropriate formula to calculate the test statistic (or test value), a calculated value that measures how far the sample data deviates from what would be expected under the null hypothesis. The correct formula is determined by the parameter you are testing (such as a mean or a proportion) and, for a mean, by whether you are using the z- or t-distribution.
  3. Use technology or a distribution table to look up the probability of getting a value more than the test value (right-tailed), less than the test value (left-tailed), or a combination of higher and lower (two-tailed). If you are running a two-tailed test, for example, and you get a test value of 1.85, you want to find p(z > 1.85) + p(z < −1.85). This probability is your p-value.
  4. Compare the p-value to α. If the p-value is lower, reject the null hypothesis; if it is higher, fail to reject it.
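The four steps above can be sketched in code. The following is a minimal illustration using a one-proportion z-test with made-up numbers (340 successes in 600 trials, testing H0: p = 0.50 against a right-tailed alternative Ha: p > 0.50); it uses only the Python standard library.

```python
from statistics import NormalDist

# Hypothetical data: 340 successes in n = 600 trials.
# H0: p = 0.50  vs  Ha: p > 0.50 (right-tailed)
alpha = 0.05                       # Step 1: choose the Type I error rate
p0, n, successes = 0.50, 600, 340
p_hat = successes / n              # sample proportion

# Step 2: test statistic for a one-proportion z-test
se = (p0 * (1 - p0) / n) ** 0.5    # standard error assuming H0 is true
z = (p_hat - p0) / se

# Step 3: right-tailed p-value = P(Z > z)
p_value = 1 - NormalDist().cdf(z)

# Step 4: compare the p-value to alpha
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.5f} -> {decision}")
```

Here the p-value comes out far below 0.05, so we reject the null; with a sample proportion closer to 0.50, the same code would report "fail to reject H0" instead.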

Remember, we never accept the alternate hypothesis. Just like in a court case, we either find the person guilty (reject the null) or not guilty (fail to reject), but the court never calls the person innocent.

Review

To review, see:


Unit 5 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • alternative hypothesis
  • left-tailed test
  • null hypothesis
  • p-value
  • right-tailed test
  • test statistic
  • two-tailed test
  • Type I Error
  • Type II Error