Confidence Interval. MarketCheese

A confidence interval (sometimes abbreviated to CI) is a range of values, calculated from sample data, that is likely to contain the true value of a population parameter; its width shows how much uncertainty surrounds the estimate in a statistical study. A confidence interval is closely connected with the confidence level, which describes how often intervals built in the same way would contain the true parameter value if the same experiment or survey were repeated many times. A confidence level is expressed as a percentage and can take different values, but the common ones are 90%, 95% and 99%, with 95% being the most popular.

Confidence Interval significance for statistics

When preparing a statistical study, it’s almost impossible to cover an entire population, so statisticians use sampling when working with large populations. The key principle of sampling is the following: measurements are taken on randomly chosen groups from the population (samples) rather than on every member. This raises a question of certainty, as the results from a sample might not give complete and accurate information about the whole population. Confidence levels and confidence intervals are used to measure that uncertainty. An interval is calculated for each sample; because the results typically differ from sample to sample, the intervals differ too, and most of them, but not all, include the true value of the population parameter.

A confidence interval is a range of values centered on the point estimate from the sample (for example, the sample mean), extending above and below it by a margin of error. The confidence level is the percentage of such intervals that would contain the true value of the parameter if the survey were repeated many times under the same conditions. This is important to remember: a 95% confidence level doesn’t mean that 95% of all data points fall within the interval, or that a repeated survey must produce a result inside the same range. It means that about 95% of the intervals calculated in this way would contain the true value.
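The repeated-sampling meaning of a confidence level can be checked with a small simulation. The sketch below is illustrative only: the population mean of 500, the standard deviation of 100, the sample size of 30 and the function name `coverage` are all assumptions made for the example, and the population standard deviation is treated as known so that a simple normal (z) interval applies.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=500.0, sigma=100.0, n=30, level=0.95, trials=2000, seed=0):
    """Return the fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Margin of error; sigma is assumed to be known in this sketch.
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one random sample and build an interval around its mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = mean(sample)
        if m - margin <= true_mean <= m + margin:
            hits += 1
    return hits / trials

print(coverage())  # expected to be close to 0.95, not exactly 0.95
```

Each simulated sample produces a different interval; the confidence level describes the share of those intervals that capture the true mean, not the share of data points inside any one interval.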

Confidence Interval use example

Let’s imagine that it’s necessary to find out an average salary, but only a random ten percent of employees are available to provide the information. The point estimate calculated from this sample is $500, but on its own this result is not particularly informative: it’s unknown whether the rest of the employees have similar salaries, or how far the sample result is from the result for all employees. If the amount of uncertainty is known, the estimate is easier to interpret, so finding a confidence interval turns out to be more useful than just finding the average of the given group.

If a researcher determines a confidence interval around the sample’s mean with a 95% confidence level (to do this, the sample’s standard deviation is used, and the data should be approximately normally distributed, that is, follow a bell curve, or be transformed so that they are), the result is the lower and upper limits of a range. Let’s assume that these limits are $250 and $750. If the researcher repeated the survey with 100 different samples of employees and calculated an interval from each, about 95 of those intervals would be expected to contain the true average salary.

It’s also possible to set a higher confidence level of 99%. The confidence interval then becomes wider, because a broader range of values is needed to capture the true value more often; the gain in certainty comes at the cost of precision. In this case, about 99 of 100 intervals calculated from different samples would be expected to contain the true value.
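The trade-off between confidence level and interval width can be illustrated with the salary example. This is a sketch under stated assumptions: it uses a normal approximation, the helper `z_interval` is a name introduced here, and the standard error is not given in the example, so it is back-calculated so that the 95% interval runs from roughly $250 to $750.

```python
from statistics import NormalDist

def z_interval(estimate, std_error, level):
    """Two-sided normal-approximation interval around a point estimate."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * std_error
    return estimate - margin, estimate + margin

# Hypothetical standard error, chosen so the 95% margin of error is $250.
std_error = 250 / NormalDist().inv_cdf(0.975)

for level in (0.90, 0.95, 0.99):
    low, high = z_interval(500, std_error, level)
    print(f"{level:.0%}: ${low:.0f} to ${high:.0f}")
```

With the same data, the 90% interval comes out narrower than the 95% one and the 99% interval wider, because a larger critical value multiplies the same standard error.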

Determining Confidence Interval

There are several parameters necessary to determine a confidence interval:

the size of the studied sample;
the standard deviation of the sample (can be calculated by most statistical software, or manually by taking the square root of the sample variance);
the critical value (based on the chosen alpha value, which is one minus the confidence level);
the point estimate (that might be a mean, a proportion, etc.).
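The four inputs listed above can be combined into an interval for a mean as: point estimate ± critical value × standard deviation / √(sample size). The sketch below assumes a normal-approximation (z) critical value for simplicity; for a sample this small, a critical value from the t-distribution would usually be preferred and would give a somewhat wider interval. The salary figures and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)                  # size of the studied sample
    point_estimate = mean(sample)    # the point estimate (here, a mean)
    s = stdev(sample)                # sample standard deviation
    alpha = 1 - level                # chosen alpha value
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    margin = critical * s / sqrt(n)  # margin of error
    return point_estimate - margin, point_estimate + margin

# Hypothetical salaries reported by the sampled employees.
salaries = [420, 510, 480, 530, 610, 450, 390, 560, 500, 550]
low, high = mean_confidence_interval(salaries)
print(f"mean = {mean(salaries):.0f}, 95% CI = ({low:.1f}, {high:.1f})")
```

The interval is symmetric around the sample mean, and it narrows as the sample size grows or the standard deviation shrinks.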

Because different statistical estimates come with different conditions and peculiarities, the way a confidence interval is calculated might vary, though the main principles and source data remain the same.

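Confidence intervals are usually calculated using specific statistical techniques, such as those based on the t-distribution. The related t-test requires three main data values: the difference between the sample means, the standard deviation of each sample, and the number of observations in each sample. A t-test assesses whether there is a statistically significant difference between samples that might be connected to a specific factor.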
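As a rough illustration, a two-sample t statistic can be computed from the difference between the sample means, the standard deviation of each sample, and the size of each sample. The sketch below uses the Welch form, which does not assume equal variances; that choice, the function name `welch_t_statistic` and the two groups of figures are assumptions made for the example. Converting the statistic into a p-value or a confidence interval requires the t-distribution, which is not part of the Python standard library, so that step is left out.

```python
from math import sqrt
from statistics import mean, stdev

def welch_t_statistic(a, b):
    """t statistic for the difference between two sample means (Welch form)."""
    mean_diff = mean(a) - mean(b)
    # Standard error of the difference, built from each group's variance and size.
    std_error = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return mean_diff / std_error

# Hypothetical salaries from two departments.
group_a = [420, 510, 480, 530, 610, 450, 390, 560, 500, 550]
group_b = [470, 590, 620, 540, 660, 510, 580, 600]
print(welch_t_statistic(group_a, group_b))
```

A statistic far from zero suggests that the difference between the sample means is large relative to the sampling variability.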

When presenting a confidence interval, it’s important to include the lower and upper limits of the range. These are usually reported together with the corresponding confidence level, though sometimes only the standard deviation of the sample is reported. Confidence intervals are often shown in statistical graphs, as they are easy to present that way and convey the needed data clearly.