Descriptive Statistics
Descriptive statistics are brief summary figures that describe a specific set of data, such as a population or a sample drawn from a population. They fall into two main groups: measures of central tendency (mean, median, and mode) and measures of variability, or dispersion (standard deviation, variance, and the minimum and maximum values). Skewness and kurtosis are usually reported alongside them, although strictly speaking they describe the shape of a distribution rather than its spread.
Understanding Descriptive Statistics
Descriptive statistics bring together the characteristics of a concrete set of data and describe it. The measures of center (mean, median, and mode) are the most widely used in mathematics and statistics. The mean, or average value, is obtained by adding up all the values in a data set and dividing the result by the number of values. For example, the sum of the five values 2, 3, 4, 5, 6 is 20, so the mean is 4 (20/5). The mode is the value that occurs most frequently in the data set. The median is the value in the middle of the data set once it is sorted; it separates the higher half of the values from the lower half. Beyond these measures, there are others that are also important in descriptive statistics.
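The three measures of center can be checked with a short Python sketch that uses only the standard library. The list `data` holds the five values from the example; the second list, `repeated`, is a hypothetical data set added only because every value in the original example occurs exactly once, so it has no single most frequent value.

```python
import statistics

data = [2, 3, 4, 5, 6]  # the five values from the example

total = sum(data)                 # 2 + 3 + 4 + 5 + 6 = 20
mean = total / len(data)          # 20 / 5 = 4.0
median = statistics.median(data)  # middle value of the sorted data: 4

# Hypothetical data set (not from the text) in which one value repeats,
# so that the mode is well defined.
repeated = [2, 3, 3, 5, 6]
mode = statistics.mode(repeated)  # 3 occurs twice, more than any other value

print(total, mean, median, mode)  # prints: 20 4.0 4 3
```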
A large set of quantitative data is difficult to understand without a brief summary. For example, descriptive statistics can be used to calculate a student's grade point average (GPA) from the grades earned across a number of exams and classes; the GPA then shows the student's overall performance.
Descriptive statistics are often presented with plots, graphs, histograms, and stem-and-leaf displays. They are used especially widely in medicine.
Types of Descriptive Statistics
There are two main components of descriptive statistics: measures of central tendency and measures of variability (measures of dispersion). The frequency distribution of the data is often described alongside them.
Central tendency. Measures of central tendency reflect the middle or average values of a data set, while measures of dispersion reflect how much the data vary. Both can be shown in graphs and tables, or discussed in words, to convey briefly what the data being analyzed mean.
Measures of central tendency describe the position of the center of a data set's distribution. The mean, the median, and the mode each summarize that center with a single typical value: the arithmetic average, the middle value, and the most frequent value, respectively. Together they identify the most common tendencies in the data set.
Variability (spread). Measures of variability (measures of spread) help to analyze how widely the values in a data set are dispersed. Measures of central tendency show the typical value of the data set, whereas measures of variability describe how the data are distributed around it. For example, the average may be 55 on a scale of 1 to 100, yet the set may still contain values as extreme as 1 and 100. Measures of variability capture the range of the set and help describe its shape. Range, quartiles, absolute deviation, and variance are examples of measures of variability.
Consider the data set 5, 12, 27, 68, 97, 100. Its range is the largest value (100) minus the smallest value (5), which is 95.
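As a rough sketch, the range calculation, together with two of the other measures of variability named earlier, can be reproduced in Python. The sketch assumes the six values are the whole population, so it uses the population formulas (`statistics.pvariance` and `statistics.pstdev`); if the values were a sample, `statistics.variance` and `statistics.stdev` would be the usual choice.

```python
import statistics

data = [5, 12, 27, 68, 97, 100]

data_range = max(data) - min(data)     # 100 - 5 = 95

# Assumption: the six values are treated as a complete population.
variance = statistics.pvariance(data)  # mean squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(data_range)
print(variance, std_dev)
```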
Distribution (frequency distribution). This is the number of times each value, or each group of values, occurs in the data set. For example, take the data blue, blue, red, red, red, other. This data can be distributed as follows:
- The number of blue colors in the data set is 2.
- The number of red colors in the data set is 3.
- The number of other colors in the data set is 1.
- The number of not blue colors in the data set is 4.
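The counts listed above can be reproduced with a small Python sketch built on `collections.Counter` from the standard library. The variable names are illustrative only.

```python
from collections import Counter

colors = ["blue", "blue", "red", "red", "red", "other"]

counts = Counter(colors)  # occurrences of each distinct value

# "Not blue" is everything except the blue entries.
not_blue = len(colors) - counts["blue"]

print(counts["blue"], counts["red"], counts["other"], not_blue)  # prints: 2 3 1 4
```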
Univariate vs. Bivariate
Univariate analysis in descriptive statistics examines the characteristics of a single variable. It does not analyze associations or cause-and-effect relationships between variables.
For example, suppose it is necessary to find the average age of the students in a dormitory. The ages form a univariate data set. The average age is determined by adding up the ages of all the students and dividing the sum by the number of students.
Bivariate analysis in descriptive statistics helps to find the relationship between two variables, for example through correlation. Two sets of data are collected and the relationship between them is analyzed. When more than two variables are involved, the analysis is called multivariate.
Let's return to the earlier example. Suppose each student in the dormitory also takes a knowledge assessment test. We then need each student's age and test result, which gives two variables per student. Data analysis can then produce a mathematical or graphical representation of whether a relationship exists between the students' ages and their test results.
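One common mathematical representation of such a relationship is the Pearson correlation coefficient. The following is a minimal sketch only: the ages and scores are hypothetical numbers invented for illustration, and `pearson` is a helper written here from the textbook formula rather than a function taken from any particular library.

```python
import math

# Hypothetical data, invented for illustration only:
# one age and one test score per student.
ages = [18, 19, 20, 21, 22]
scores = [62, 70, 68, 75, 80]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and the sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

r = pearson(ages, scores)
print(round(r, 3))
```

A value of `r` near 1 or -1 suggests a strong linear relationship, and a value near 0 suggests little or none. A correlation describes association only and does not by itself show cause and effect.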
Descriptive Statistics vs. Inferential Statistics
Descriptive statistics is the collection, preparation, and presentation of data. Inferential statistics is the analysis of the data obtained in order to draw conclusions that go beyond it, such as predictions for the future. In other words, characteristics found in one set of data are used to make decisions about, or are applied to, another.
For example, consider a company that makes sauces. It collects descriptive information about the number of sales, the average number of items purchased per transaction, and the average number of sales on each day of the week. This data records what happened in the past, which makes it descriptive statistics. If a new sauce is coming to market, the company will again draw on sales data, but this time it will use the information to predict and model the sales of the new sauce, applying data from one set to another. This is inferential statistics.