MA121: Introduction to Statistics
Section outline
-
If you invest in financial markets, you may want to predict the price of a stock six months from now based on company performance measures and other economic factors. As a college student, you may be interested in knowing how the mean starting salary of college graduates depends on GPA. These are just two examples that highlight how statistics are used in modern society. To answer either question, you need data to analyze.
The purpose of this course is to introduce you to the subject of statistics as a science of data. Data abounds in this information age; extracting useful knowledge and gaining a sound understanding of complex data sets has become more of a challenge. In this course, we will focus on the fundamentals of statistics, broadly described as the techniques to collect, clarify, summarize, organize, analyze, and interpret numerical information.
This course will begin with a brief overview of the discipline of statistics and will then quickly focus on descriptive statistics, introducing graphical methods of describing data. You will learn about combinatorial probability and probability distributions, which are the foundation for statistical inference. Within inference, we will focus on estimation and hypothesis testing. We will also examine regression, a set of techniques for studying the relationship between two or more variables.
By the end of this course, you should understand what statistics represent, how to use statistics to organize and display data, and how to draw valid inferences based on data by using appropriate statistical tools.
-
Today, we have access to large volumes of data. The first step of data analysis is to accurately summarize this data, both graphically and numerically, so that we can understand what it reveals. Using and interpreting data correctly is essential to making informed decisions. For instance, when you see an opinion survey about a certain TV program, you may be interested in the proportion of respondents who actually like the program. In this unit, you will learn about descriptive statistics used to summarize and display data. After completing this unit, you will know how to present your findings once you have collected data. For example, suppose you want to buy a new mobile phone with a particular type of camera. You are unsure about the prices of phones with this feature, so you access a website with a sample data set of prices for phones that have it. Looking at every price in a sample can be confusing. A better way to understand this data might be to look at the median price and the variation in prices. The median and the variation are two of several ways to describe data.
You can also graph the data so that it is easier to see the price distribution. In this unit, you will learn numerical and graphical ways to describe and display your data. You will understand the essentials of calculating common descriptive statistics for measuring center, variability, and skewness in data, and you will learn to calculate and interpret these measurements and graphs. Descriptive statistics are, as their name suggests, descriptive: they illustrate what the data shows and do not generalize beyond the data considered. Numerical descriptive measures computed from sample data are called statistics, while numerical descriptive measures of the population are called parameters. Inferential statistics, by contrast, generalize the findings from sample data to a broader population.
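As a rough illustration of the phone-price example, the sketch below summarizes a small sample with measures of center and variability. The prices are made up for illustration and the variable names are assumptions; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical sample of phone prices in dollars (invented for illustration).
prices = [299, 349, 349, 399, 429, 449, 499, 549, 799]

mean_price = statistics.mean(prices)      # arithmetic average
median_price = statistics.median(prices)  # middle value of the sorted data
stdev_price = statistics.stdev(prices)    # sample standard deviation (divides by n - 1)
price_range = max(prices) - min(prices)   # simplest measure of spread

print(f"mean   = {mean_price:.2f}")
print(f"median = {median_price}")
print(f"stdev  = {stdev_price:.2f}")
print(f"range  = {price_range}")
```

In this made-up sample the mean is larger than the median because one expensive phone pulls the average up, which is a simple sign of right skew.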
Completing this unit should take you approximately 3 hours.
-
Probabilities affect our everyday lives. In this unit, you will learn about probability and its properties, how probability behaves, and how to calculate and use it. You will study the fundamentals of probability and work through examples covering different types of probability questions. These basic probability concepts will provide a foundation for understanding more advanced statistical concepts, such as interpreting polling results. Though you may have already encountered concepts of probability, after this unit, you will be able to formally and precisely calculate the likelihood of an event occurring given certain constraints. Probability theory is a discipline that was created to deal with chance phenomena. For instance, before undergoing surgery, you want to know the chances that the surgery might fail; before taking medication, you want to know the chances of side effects; before leaving your house, you want to know the chances that it will rain today.
Probability is a measure of likelihood that takes on values between 0 and 1, inclusive, with 0 representing impossible events and 1 representing certainty. The chances of events occurring fall between these two values. The skill of calculating probability allows us to make better decisions. Whether you are evaluating how likely it is to get more than 50% of the questions correct on a quiz if you guess randomly; predicting the chance that the next storm will arrive by the end of the week; or exploring the relationship between the number of hours students spend at the gym and their performance on an exam, an understanding of the fundamentals of probability is crucial.
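The quiz-guessing question can be sketched as a binomial calculation. The quiz length (10 questions) and the number of answer choices (4, so a 0.25 chance per guess) are assumptions chosen for illustration, and `prob_more_than_half` is a hypothetical helper name.

```python
from math import comb

def prob_more_than_half(n_questions, p_correct):
    """Probability of getting strictly more than half the questions right
    when each answer is an independent guess with success chance p_correct."""
    threshold = n_questions // 2  # need at least threshold + 1 correct answers
    return sum(
        comb(n_questions, k) * p_correct**k * (1 - p_correct) ** (n_questions - k)
        for k in range(threshold + 1, n_questions + 1)
    )

# Assumed setup: 10 multiple-choice questions with 4 options each.
p = prob_more_than_half(10, 0.25)
print(f"P(more than 50% correct by guessing) = {p:.4f}")
```

Under these assumptions the probability is just under 2%, so passing by pure guessing is possible but unlikely.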
We will also talk about random variables. A random variable describes the outcomes of a random experiment. A statistical distribution describes the number of times each possible outcome occurs in a sample. The values of a random variable can vary with each repetition of an experiment. Intuitively, a random variable, summarizing a certain chance phenomenon, takes on values with certain probabilities. A random variable can be classified as discrete or continuous, depending on the values it assumes. Suppose you count the number of people who go to a coffee shop between 4 p.m. and 5 p.m. and the amount of waiting time they spend in that hour. In this case, the number of people is an example of a discrete random variable, and the amount of waiting time they spend is an example of a continuous random variable.
Completing this unit should take you approximately 3 hours.
-
The concept of a sampling distribution lies at the very foundation of statistical inference. The idea is best introduced with an example. Suppose you want to estimate a population parameter, say the population mean. There are two natural estimators: 1. the sample mean, which is the average value of the data set; and 2. the sample median, which is the middle number when the measurements are arranged in ascending (or descending) order. In particular, for a sample of even size n, the median is the mean of the middle two numbers. But which one is better, and in what sense? Answering this involves repeated sampling: you want to choose the estimator that would do better on average.
Different samples may give different sample means and medians; some may be closer to the truth than others. Consequently, we cannot compare these two sample statistics, or any others, based on their performance with a single sample. Instead, you should recognize that sample statistics are random variables; therefore, they have distributions of their own, obtained by considering all possible samples. In this unit, you will study the sampling distribution of several sample statistics. This unit will show you how the central limit theorem can help to approximate sampling distributions in general.
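Repeated sampling can be imitated with a short simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with mean 50 and standard deviation 10, each sample has 25 observations, and `simulate_sampling_distributions` is a hypothetical helper name.

```python
import random
import statistics

def simulate_sampling_distributions(n=25, repetitions=2000, seed=0):
    """Draw many samples and record each sample's mean and median."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(repetitions):
        # Assumed population: normal with mean 50 and standard deviation 10.
        sample = [rng.gauss(50, 10) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_sampling_distributions()
sd_means = statistics.stdev(means)      # spread of the sample means
sd_medians = statistics.stdev(medians)  # spread of the sample medians

print(f"spread of sample means:   {sd_means:.2f}")
print(f"spread of sample medians: {sd_medians:.2f}")
```

For a normal population like the one assumed here, the sample means cluster more tightly around the true mean than the sample medians do, which is the sense in which the mean does better on average. For other populations, such as heavy-tailed ones, the comparison can come out differently.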
Completing this unit should take you approximately 2 hours.
-
In this unit, you will learn how to use the central limit theorem and confidence intervals, which enable you to estimate unknown population parameters. The central limit theorem allows us to make inferences from samples of non-normal populations. It states that for any population with a finite mean and variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases. This powerful theorem allows us to treat the sampling distribution of the mean as approximately normal, given a large enough sample. You will also learn about confidence intervals, which provide a way to estimate a population parameter. Instead of giving just a one-number estimate of a parameter, a confidence interval gives a range of plausible values for it. This is useful because point estimates vary from sample to sample, so an interval with a stated confidence level conveys more information than a single point estimate. After completing this unit, you will know how to construct such confidence intervals and interpret their level of confidence.
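A confidence interval for a population mean can be sketched as follows. This is a minimal large-sample version that relies on the normal approximation from the central limit theorem; for small samples a t-based interval is normally preferred. The data and the helper name `mean_confidence_interval` are assumptions made for illustration.

```python
import math
import statistics

def mean_confidence_interval(data, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    n = len(data)
    sample_mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # Critical value from the standard normal, e.g. about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample of 50 measurements (invented for illustration).
measurements = [1, 2, 3, 4, 5] * 10
low, high = mean_confidence_interval(measurements)
print(f"95% confidence interval for the mean: ({low:.3f}, {high:.3f})")
```

Raising the confidence level, say to 0.99, widens the interval: more confidence is bought with less precision.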
Completing this unit should take you approximately 1 hour.
-
A hypothesis test involves collecting and evaluating data from a sample. The sample data is then used to decide whether the evidence supports a claim about the population. This unit will teach you how to conduct hypothesis tests and how to identify and differentiate between the errors associated with them. Often, you need answers to questions in order to make sound decisions. For example, a restaurant owner might claim that his restaurant's food costs 30% less than food at other restaurants in the area, or a phone company might claim that its phones last at least one year longer than phones from other companies. To decide whether it would be more affordable to eat at the restaurant that "costs 30% less" or at another restaurant in the area, or to determine which phone company to choose based on durability, you must collect data to evaluate these claims.
Hypothesis testing is a decision-making process. In this unit, you will learn to set up null and alternative hypotheses. The null hypothesis is the hypothesis that is assumed to be true at the outset and that you hope to nullify. In contrast, the alternative hypothesis is the research hypothesis you want to support. You then conduct the appropriate test to decide whether to reject or fail to reject the null hypothesis; strictly speaking, a test never proves the null hypothesis true. You will learn how to compare sample characteristics to see whether there is enough evidence to reject the null hypothesis.
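The decision process can be sketched with a one-sample z-test of a mean. This is an illustrative large-sample version that uses the sample standard deviation in place of the population value; it is one of several tests and not necessarily the one used in the course materials. The ratings data, the hypothesized mean of 2.5, the 0.05 significance level, and the helper name `one_sample_z_test` are all assumptions.

```python
import math
import statistics

def one_sample_z_test(data, mu0):
    """Two-sided large-sample z-test of H0: population mean equals mu0."""
    n = len(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    z = (statistics.mean(data) - mu0) / standard_error
    # Two-sided p-value: chance of a result at least this extreme if H0 is true.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical customer ratings on a 1-5 scale (invented for illustration).
ratings = [1, 2, 3, 4, 5] * 10
alpha = 0.05  # assumed significance level

z, p_value = one_sample_z_test(ratings, mu0=2.5)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}: {decision}")
```

A small p-value means the sample would be unusual if the null hypothesis were true, so the null hypothesis is rejected; a large p-value means the data is compatible with the null hypothesis, which is not the same as proving it.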
Completing this unit should take you approximately 3 hours.
-
In this unit, we will discuss situations where the mean of a population, treated as a variable, depends on the value of another variable. One of the main reasons we conduct such analyses is to understand how two variables are related. The most common type of relationship is a linear relationship. For example, you may want to know what happens to one variable when you increase or decrease the other variable. You want to answer questions such as, "Does one variable increase as the other increases, or does the variable decrease?" For example, you may want to determine how the mean reaction time of rats depends on the amount of drug in the bloodstream.
You will also learn to measure the degree of a relationship between two or more variables. Both correlation and regression are tools for studying how variables relate. Correlation quantifies the strength and direction of a linear relationship between two variables and is a measure of existing data. Regression, on the other hand, describes the form of a linear relationship between an independent and a dependent variable by fitting a line to the data. It can be used to predict the value of the dependent variable when the value of the independent variable is known.
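The difference between the two ideas can be seen in a short calculation. The sketch below computes the correlation coefficient and the least-squares line directly from their textbook formulas; the dose and reaction-time numbers are invented for illustration, and `correlation_and_line` is a hypothetical helper name.

```python
import math

def correlation_and_line(x, y):
    """Return (r, slope, intercept) for paired data using least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)  # variation in x
    syy = sum((b - mean_y) ** 2 for b in y)  # variation in y
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-variation
    r = sxy / math.sqrt(sxx * syy)           # correlation coefficient
    slope = sxy / sxx                        # least-squares slope
    intercept = mean_y - slope * mean_x      # line passes through the means
    return r, slope, intercept

# Hypothetical data: drug dose (x) and mean reaction time (y).
dose = [1, 2, 3, 4, 5]
reaction_time = [2, 4, 5, 4, 5]

r, slope, intercept = correlation_and_line(dose, reaction_time)
predicted = intercept + slope * 6  # predicted reaction time at dose 6
print(f"r = {r:.3f}, line: y = {intercept:.1f} + {slope:.1f}x, prediction = {predicted:.1f}")
```

Here `r` summarizes how strongly the points follow a line, while the slope and intercept give the line itself, which is what makes prediction possible. Predicting at dose 6 extrapolates slightly beyond the observed doses, so it should be read with caution.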
Completing this unit should take you approximately 3 hours.
-
This optional subunit will teach you about "Analysis of Variance" (abbreviated ANOVA), which is used for hypothesis tests involving more than two means. ANOVA examines the amount of variability in the response (y) variable and tries to determine where that variability is coming from. You will study the simplest form of ANOVA, called single factor or one-way ANOVA. Finally, you will briefly study the F distribution, used for ANOVA, and the test of two variances.
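The idea of splitting variability into sources can be sketched by computing the one-way ANOVA F statistic by hand. The three groups below are invented for illustration and `one_way_anova_f` is a hypothetical helper name; turning F into a p-value requires the F distribution, which this sketch leaves out.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variability: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variability: how far values sit from their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for three treatment groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(groups)
print(f"F = {f_statistic:.2f}")
```

A large F means the group means differ by more than the scatter inside the groups would explain, which is evidence against the hypothesis that all group means are equal.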
-
This study guide will help you get ready for the final exam. It discusses the key topics in each unit, walks through the learning outcomes, and lists important vocabulary terms. It is not meant to replace the course materials!
-
Take this exam if you want to earn a free Course Completion Certificate.
To receive a free Course Completion Certificate, you will need to earn a grade of 70% or higher on this final exam. Your grade for the exam will be calculated as soon as you complete it. If you do not pass the exam on your first try, you can take it again as many times as you want, with a 7-day waiting period between each attempt. Once you pass this final exam, you will be awarded a free Course Completion Certificate.
-
Please take a few minutes to give us feedback about this course. We appreciate your feedback, whether you completed the whole course or even just a few resources. Your feedback will help us make our courses better, and we use your feedback each time we make updates to our courses. If you come across any urgent problems, email contact@saylor.org.