Module Summary
The table below displays a selection of variables from a study dataset.
ID | Age | Gender | Height | Blood group | LDL† | Feeling happy? | Number of children | Smoke? | Social class |
1 | 25 | F | 1.62 | B | 150 | Agree | 0 | No | I |
2 | 35 | F | 1.58 | O | 123 | Strongly agree | 1 | Yes | II |
3 | 44 | M | 1.35 | A | 178 | Disagree | 3 | Yes | I |
4 | 28 | F | 1.54 | AB | 205 | Disagree | 0 | No | III |
5 | 35 | M | 1.35 | O | 229 | Indifferent | 2 | Yes | I |
6 | 42 | M | 1.21 | B | 215 | Agree | 2 | Yes | IV |
7 | 36 | F | 1.76 | A | 130 | Strongly disagree | 1 | No | IV |
8 | 38 | M | 1.57 | A | 175 | Disagree | 1 | Yes | V |
9 | 30 | M | 1.47 | AB | 240 | Indifferent | 0 | No | III |
10 | 40 | F | 1.18 | B | 167 | Strongly agree | 6 | No | I |
: | : | : | : | : | : | : | : | : | : |
The table below shows three variables that have been transformed or recoded. Age was log-transformed, LDL was put into different groups according to clinical guidelines and Number of children was recoded.
ID | Age | Ln Age | LDL | LDL group | Number of children | Number of children group |
1 | 25 | 3.22 | 150 | Borderline high LDL level | 0 | 0-1 |
2 | 35 | 3.56 | 123 | Near optimal LDL level | 1 | 0-1 |
3 | 44 | 3.78 | 178 | High LDL level | 3 | 2+ |
4 | 28 | 3.33 | 205 | Very high LDL level | 0 | 0-1 |
5 | 35 | 3.56 | 229 | Very high LDL level | 2 | 2+ |
6 | 42 | 3.74 | 215 | Very high LDL level | 2 | 2+ |
7 | 36 | 3.58 | 130 | Borderline high LDL level | 1 | 0-1 |
8 | 38 | 3.64 | 175 | High LDL level | 1 | 0-1 |
9 | 30 | 3.40 | 240 | Very high LDL level | 0 | 0-1 |
10 | 40 | 3.69 | 167 | High LDL level | 6 | 2+ |
: | : | : | : | : | : | : |
The data type of these three newly transformed variables may be different from the original variable.
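The recoding in the table above can be sketched in Python as follows. This is only an illustration: the LDL cut-offs (100, 130, 160, 190) and the label for values below 100 are assumptions chosen because they reproduce the ten rows shown; the source says only that the groups follow clinical guidelines. The function names `ldl_group` and `children_group` are hypothetical.

```python
import math

# First ten cases from the table: (ID, Age, LDL, Number of children)
cases = [
    (1, 25, 150, 0), (2, 35, 123, 1), (3, 44, 178, 3), (4, 28, 205, 0),
    (5, 35, 229, 2), (6, 42, 215, 2), (7, 36, 130, 1), (8, 38, 175, 1),
    (9, 30, 240, 0), (10, 40, 167, 6),
]


def ldl_group(ldl):
    """Continuous LDL -> ordinal category (cut-offs are an assumption)."""
    if ldl < 100:
        return "Optimal LDL level"  # label assumed; not shown in the table
    if ldl < 130:
        return "Near optimal LDL level"
    if ldl < 160:
        return "Borderline high LDL level"
    if ldl < 190:
        return "High LDL level"
    return "Very high LDL level"


def children_group(n_children):
    """Discrete count -> binary category."""
    return "0-1" if n_children <= 1 else "2+"


# Ln Age stays continuous; the other two variables become categorical.
recoded = [
    (case_id, round(math.log(age), 2), ldl_group(ldl), children_group(kids))
    for case_id, age, ldl, kids in cases
]

for row in recoded:
    print(row)
```

The output types make the point of the sentence above: Ln Age is still a number, whereas LDL group and Number of children group are now text labels.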
The table below shows the first 10 cases of a data set.
 | Variables | | |
ID | V1 | V2 | V3 |
1 | Red | 25 | 1.62 |
2 | Blue | 35 | 1.58 |
3 | Yellow | 44 | 1.35 |
4 | Green | 28 | 1.54 |
5 | Black | 35 | 1.35 |
6 | Brown | 42 | 1.21 |
7 | Blue | 36 | 1.76 |
8 | Pink | 38 | 1.57 |
9 | Green | 30 | 1.47 |
10 | Purple | 40 | 1.18 |
: | : | : | : |
Welcome to the "Summary Statistics Module" quiz. There are 20 questions to answer.
Please remember to click the Submit button for each separate question, and read the feedback comments!
Click the Next button to begin the quiz.
Q1. Which of the above variable(s) are classified as quantitative variable(s)?
The correct answers are b), d), f) and h). These variables take numerical values only and the values reflect the actual measurement (with units) of the subjects or objects we are measuring.
Q2. Which of the above variable(s) are classified as qualitative variable(s)?
The correct answers are c), e), g), i) and j). These variables are represented by categories and each category represents a particular characteristic of interest within a group of subjects or objects.
Q3. Which of the above variable(s) are classified as continuous variable(s)?
The correct answers are b), d) and f). These variables can take any value within a range, including decimal parts. The precision of the measurement will depend on the measuring device used.
Q4. Which of the above variable(s) are classified as discrete variable(s)?
The correct answers are a) and h). These variables take integer values: ID is the subject or case number, and Number of children is a count. Note that the Statistics for the Terrified package defines discrete variables differently: there, the term covers nominal, ordinal and count variables, and only its count variable is equivalent to our definition of a discrete variable. Under the Statistics for the Terrified definition, Gender, Blood group, Feeling happy, Number of children, Smoke and Social class would all be discrete variables.
Q5. Which of the above variable(s) are classified as ordinal variable(s)?
The correct answers are g) and j). These variables consist of categories that are mutually exclusive and have a ranked order. Thus, for example, the category “strongly agree” may precede “agree”. Note that the “interval” between categories may not be numerically equal.
Q6. Which of the above variable(s) are classified as nominal variable(s)?
The correct answers are c), e) and i). These variables consist of categories that are mutually exclusive but have no ranked order, e.g. Male / Female.
Q7. Which of the above variable(s) are classified as binary variable(s)?
The correct answers are c) and i). These variables have exactly two mutually exclusive categories, e.g. Male / Female or Yes / No.
Q8. Which of the following key word(s) best describe the type of variable Ln Age?
The correct answers are a), c) and e). After log transforming the numeric, quantitative and continuous variable Age, the variable Ln Age is still numeric, quantitative and continuous, since the log transformation is a monotonic (order-preserving) re-expression of the original values on a different scale. The distributions of Age and Ln Age may be different.
Q9. Which of the following key word(s) best describe the type of variable Number of children group?
The correct answers are b), d), h) and i). After categorising the numeric, quantitative and discrete variable Number of children, the variable Number of children group has become categorical and qualitative, since each category represents a range of values. With only 2 categories, the transformed variable is binary. The ranked order is not important here, hence it is treated as nominal rather than ordinal.
Q10. Which of the following key word(s) best describe the type of variable LDL group?
The correct answers are b), d) and g). After categorising the numeric, quantitative and continuous variable LDL according to clinical guidelines, the variable LDL group has become categorical, qualitative and ordinal, since each category represents a range of values and the categories have a ranked order.
Q11. Which of the following are measures of central tendency?
The correct answers are a), f) and h). A measure of central tendency is an ‘expected’ or ‘average’ value of a distribution that is used to help summarise a variable. A measure of central tendency is often presented alongside an appropriate measure of dispersion. The mean is quoted for normally distributed data and the median where the data are not normally distributed (non-parametric). The mode is rarely reported.
Q12. The height (cm) of 6 children were measured as 141, 155, 130, 146, 141, 134.
What is the mean height (cm) of these children?
The correct answer is c). Add up all of the numbers and then divide by the number of values. (141 + 155 + 130 + 146 + 141 + 134)/ 6 = 847 / 6 = 141.16666. The mean is therefore 141.17cm to 2 decimal places (2dp).
Q13. The height (cm) of 6 children were measured as 141, 155, 130, 146, 141, 134.
What is the median height (cm) of these children?
The correct answer is b). The median is the middle value once the data have been sorted from smallest to largest. The sorted data would look like this: 130, 134, 141, 141, 146, 155. When we have an odd number of values the median is simply the middle value. Here we have an even number of values, so the median is the mean of the middle two values: (141 + 141) / 2 = 282 / 2 = 141. The median is therefore 141cm.
Q14. The height (cm) of 6 children were measured as 141, 155, 130, 146, 141, 134.
What is the mode height (cm) of these children?
The correct answer is b). The mode is the most frequently occurring value. Here we have two children who are 141 cm tall, whereas all of the other heights only occur once. The mode is therefore 141cm.
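As a quick check of the three worked answers above, the same calculations can be sketched with Python's standard `statistics` module. The variable names are illustrative only.

```python
import statistics

# Heights (cm) of the six children from Q12 to Q14
heights_cm = [141, 155, 130, 146, 141, 134]

mean_height = statistics.mean(heights_cm)      # 847 / 6
median_height = statistics.median(heights_cm)  # mean of the two middle sorted values
mode_height = statistics.mode(heights_cm)      # most frequently occurring value

print(f"mean   = {mean_height:.2f} cm")
print(f"median = {median_height} cm")
print(f"mode   = {mode_height} cm")
```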
Q15. The bar chart illustrates the distribution of variable V1 from this study. Using the data in the table and graph, choose the most suitable measure of location to report the central tendency of variable “V1”.

The correct answer is c). The bar chart is used to display the frequencies of items for a categorical variable, in this example colour. The most suitable measure of location for this type of variable is the mode, which indicates the most frequently occurring category.
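For a categorical variable the mode is found by counting categories. The sketch below uses only the first 10 cases of V1 listed in the table, so its result need not match the full dataset behind the bar chart; it simply illustrates the counting step.

```python
from collections import Counter
import statistics

# First ten values of V1 from the table (the full dataset is not shown)
v1 = ["Red", "Blue", "Yellow", "Green", "Black",
      "Brown", "Blue", "Pink", "Green", "Purple"]

counts = Counter(v1)              # frequency of each colour
modes = statistics.multimode(v1)  # every category that shares the highest count

print(counts.most_common(3))
print("mode(s):", modes)
```

In these ten cases two colours tie for the highest count, which is why `multimode` is used here rather than `mode`.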
Q16. The histogram illustrates the distribution of variable V2 from this study. Using the data in the table and graph, choose the most suitable measure of location to report the central tendency of variable “V2”.

The correct answer is a). The histogram is used to display the distribution of a continuous variable. It is showing a bell-shaped distribution, suggesting it has a Normal distribution. The mean is the most suitable measure of location for this type of variable with such a distribution.
Q17. Using the data in the table and graph, choose the most suitable measure of location to report the central tendency of variable “V3”.

The correct answer is b). The histogram is used to display the distribution of a continuous variable. It is showing a skewed distribution, suggesting there are some outliers in the data. The median is the most suitable measure of location for this type of variable with such a distribution since it is calculated by taking the middle value of the ranked data and is not affected by outliers or extreme values whereas the mean is.
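The effect of an extreme value on the mean and the median can be sketched as follows. The outlier (255 in place of 155) is a made-up value for illustration, applied to the children's heights used elsewhere in this quiz; it is not part of the study data.

```python
import statistics

heights_cm = [141, 155, 130, 146, 141, 134]
# Hypothetical extreme value: 155 replaced by 255
with_outlier = [141, 255, 130, 146, 141, 134]

mean_before = statistics.mean(heights_cm)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(heights_cm)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")  # pulled towards the outlier
print(f"median: {median_before} -> {median_after}")      # unchanged
```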
Q18. Which of the following statements is true?
The correct answer is a). When a variable has an even number of observations, the median is the average of the middle two values of the ordered data. The mode is normally used to represent the most frequent observation in a categorical variable. The median is the middle value of the ordered data, and it is not affected by exceptionally small or large values in the data, hence it is not sensitive to outliers and extreme values.
Q19. Which of the following statements are true?
The correct answers are a), c) and d). b) The mean is highly affected by the outliers and extreme values when the distribution is skewed; therefore, it should not be reported as the measure of central tendency for data with a skewed distribution. c) The median is not sensitive to outliers and extreme values, so it should be reported as the measure of central tendency for data with a left (or right) skewed distribution.
Q20. Which of the following statements are true?
The correct answers are a) and d). a) is correct, but note that this is only true when the distribution is symmetric and Normal. b) A variable with a negatively skewed distribution is one that has some relatively small values in the data compared to the rest of the data. The mean will therefore tend to be smaller than the median. c) It is not necessary to report all the measures of location. The types of data should first be identified. If it is categorical data, then use the mode. If it is continuous data, then the distribution of the variable should be assessed. d) If it is normally distributed data, report the mean; however, if the data is skewed, then both mean and median could be reported.
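The relationship between skewness, mean and median described in b) can be illustrated with two small made-up samples (these numbers are hypothetical and do not come from the study data): one with a tail of small values, and its mirror image with a tail of large values.

```python
import statistics

# Hypothetical negatively skewed sample: most values are high, one is very low
left_skewed = [2, 8, 9, 9, 10, 10, 10]
# Mirror image (12 - x): most values are low, one is very high
right_skewed = [12 - x for x in left_skewed]

left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)
right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)

print(f"negative skew: mean {left_mean:.2f} < median {left_median}")
print(f"positive skew: mean {right_mean:.2f} > median {right_median}")
```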
Q21. Which of the following are measures of dispersion?
The correct answers are b), d), e) and i). A measure of dispersion quantifies the spread of the variable. A measure of dispersion is often presented alongside an appropriate measure of central tendency.
Q22. The heights (cm) of 6 children were measured as 141, 155, 130, 146, 141, 134.
What is the range of the data?
The correct answer is d). The range is the difference between the minimum and the maximum. The minimum is 130 and the maximum is 155. Therefore the range is: 155-130 = 25.
Q23. The heights (cm) of 6 children were measured as 141, 155, 130, 146, 141, 134.
What is the standard deviation of the data? Use 141.1667 as the value for the mean to keep the precision. Use Excel or a calculator.
The correct answer is a). The standard deviation shows how much variation or dispersion there is from the mean. It is expressed in the same units as the data. The standard deviation in this case is 8.84 (2 d.p.). The sample standard deviation (s) is the square root of the sum of squared deviations from the mean divided by n − 1.
Q24. As in the previous questions, the heights (cm) of 6 children were measured as 141, 155, 130, 146, 141, 134
What is the variance of the data? Use 141.1667 as the value for the mean to keep the precision. Use Excel or a calculator.
The correct answer is c). The variance is the square of the standard deviation. Therefore the variance is 78.17 (2 d.p.).
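The range, sample standard deviation and sample variance for the six heights can be checked with a short sketch using Python's standard `statistics` module. Both `stdev` and `variance` use the sample (n − 1) divisor.

```python
import statistics

heights_cm = [141, 155, 130, 146, 141, 134]

data_range = max(heights_cm) - min(heights_cm)  # maximum minus minimum
sample_sd = statistics.stdev(heights_cm)        # sample SD, divisor n - 1
sample_var = statistics.variance(heights_cm)    # sample variance, the square of the SD

print(f"range    = {data_range}")
print(f"SD       = {sample_sd:.2f}")
print(f"variance = {sample_var:.2f}")
```

Squaring the unrounded standard deviation gives the variance; squaring the rounded value 8.84 gives a slightly different figure, so it is safer to compute the variance directly.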
Q25. The bar chart illustrates the distribution of variable “V1” from this study. Using the data in the table and graph, choose the most suitable measures of dispersion to report the spread of variable “V1”.

The correct answer is h). The bar chart is used to display the frequency of a categorical variable. This categorical variable is purely nominal. It does not make sense to report a measure of dispersion for a variable of this type.
Q26. The histogram illustrates the distribution of variable “V2” from this study. Using the data in the table and graph, choose the most suitable measure(s) of dispersion to report the spread of variable “V2”

The correct answers are a), c), d) and e). The histogram is used to display the distribution of a continuous variable. It is showing a bell-shaped distribution, suggesting it has a Normal distribution. The standard deviation and the range (or the minimum and maximum) are the most suitable measures of dispersion for this type of variable with such a distribution.
Q27. The histogram illustrates the distribution of variable “V3” from this study. Using the data in the table and graph, choose the TWO most suitable and informative measure(s) of dispersion to report the spread of variable “V3”.

The correct answers are f) and g). The histogram is used to display the distribution of a continuous variable. It is showing a skewed distribution, suggesting there are some outliers in the data. The lower quartile and upper quartile are the most suitable and informative measures of dispersion for a variable with this type of distribution. These are not affected by outliers and extreme values. Note that the minimum and the maximum could be reported here to highlight the skewness / extreme values but they are not the most suitable measures. Likewise, the IQR is less informative than the lower and upper quartiles and so these are preferable.
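The robustness of the quartiles can be sketched with `statistics.quantiles`. Quartile definitions differ between software packages; the `method="inclusive"` option used here is one convention among several, so other tools may give slightly different values. The data are the children's heights from earlier questions, and the outlier (255 in place of 155) is hypothetical.

```python
import statistics

heights_cm = [141, 155, 130, 146, 141, 134]
with_outlier = [141, 255, 130, 146, 141, 134]  # hypothetical extreme value


def quartile_summary(values):
    """Return (lower quartile, upper quartile, IQR) using the inclusive method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


summary = quartile_summary(heights_cm)
summary_outlier = quartile_summary(with_outlier)
sd = statistics.stdev(heights_cm)
sd_outlier = statistics.stdev(with_outlier)

print("quartiles and IQR:", summary, "->", summary_outlier)  # unchanged
print(f"standard deviation: {sd:.2f} -> {sd_outlier:.2f}")   # inflated by the outlier
```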
Q28. Which of the following statements are true?
The correct answers are a) and b). c) The standard deviation is sensitive to outliers and extreme values in the same way as the mean. d) The upper quartile is the 75th percentile of the ordered data. The 25th percentile is the lower quartile.
Q29. Which of the following statements are true?
The correct answers are b) and c). a) The inter-quartile range is the difference between the 25th percentile and the 75th percentile of the data. c) The box in a box and whisker plot represents the central 50% of the data and is bounded by the lower and the upper quartile.
Q30. Which of the following statements are true?
The correct answers are b) and c). a) The standard deviation is sensitive to outliers and extreme values in the same way as the mean.
Q31. What is the name of this type of graph?

The correct answer is b). A bar chart is a very simple graph to construct which allows you to display the frequency counts or the percentages of a categorical variable. It is the most effective way to compare frequency counts or percentages graphically. In a bar chart, the horizontal axis represents the levels or categories of the variable of interest (categorical), while the vertical axis represents the frequency counts or the percentages. The special feature of this graph is that there is always a gap between the bars.
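The numbers behind a bar chart are simply the frequency counts (or percentages) per category. A minimal sketch, using the Blood group values of the first ten cases in the study table and a text bar in place of a real plot:

```python
from collections import Counter

# Blood group for the first ten cases in the study table
blood_group = ["B", "O", "A", "AB", "O", "B", "A", "A", "AB", "B"]

counts = Counter(blood_group)
total = sum(counts.values())
percentages = {group: 100 * n / total for group, n in counts.items()}

# One text "bar" per category, listed separately as in a bar chart
for group in sorted(counts):
    bar = "#" * counts[group]
    print(f"{group:<3} {bar:<4} {counts[group]} ({percentages[group]:.0f}%)")
```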
Q32. What is the name of this type of graph?

The correct answer is d). The most common form of a histogram is obtained by splitting the range of the data into equal-sized intervals or bins (i.e. classes). In a histogram, the horizontal axis represents the bins of the variable of interest (continuous), while the vertical axis represents the frequency (i.e. counts for each bin). The number of bins used is important; generally between 5 and 15 classes should be used. This graph is a very useful tool for spotting outliers or potential errors in a dataset and visualising the distribution of the variable of interest. The special feature of this graph is that there is no gap between the bars.
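The binning step behind a histogram can be sketched as below, using the Age values of the first ten cases in the study table. The starting point (25), bin width (4) and number of bins (5) are arbitrary choices made for this illustration.

```python
# Age for the first ten cases in the study table
ages = [25, 35, 44, 28, 35, 42, 36, 38, 30, 40]

start, bin_width, n_bins = 25, 4, 5  # illustrative choices covering 25 to 45

bin_counts = [0] * n_bins
for age in ages:
    # Index of the equal-width bin; the top edge (45) falls in the last bin
    index = min((age - start) // bin_width, n_bins - 1)
    bin_counts[index] += 1

for i, count in enumerate(bin_counts):
    low = start + i * bin_width
    print(f"{low}-{low + bin_width}: {'#' * count} ({count})")
```

Because the bins are adjacent intervals of a continuous scale, the bars of the resulting histogram touch, unlike those of a bar chart.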
Q33. What is the name of this type of graph?

The correct answer is a). A pie chart is built up from a number of wedges that are each used to illustrate the percentage of observations in one category. The size of the angle for each wedge at the centre of the pie indicates the proportion of subjects within the corresponding category with respect to the whole sample size. The bigger the angle of the wedge, the larger the proportion of subjects in the category. The special feature of this graph is very obvious; it looks like a round cake or pizza. If there are too many categories it can be difficult to understand data in the chart.
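The wedge angle for each category is its proportion of the sample multiplied by 360 degrees. A small sketch, again using Blood group for the first ten cases of the study table:

```python
from collections import Counter

# Blood group for the first ten cases in the study table
blood_group = ["B", "O", "A", "AB", "O", "B", "A", "A", "AB", "B"]

counts = Counter(blood_group)
total = sum(counts.values())

# Angle at the centre of the pie = proportion of the sample x 360 degrees
angles = {group: 360 * n / total for group, n in counts.items()}

for group, angle in sorted(angles.items()):
    print(f"{group}: {counts[group]}/{total} of the sample -> {angle:.0f} degrees")
```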
Q34. What is the name of this type of graph?

The correct answer is c). Alongside a histogram, a dot plot is an alternative way to display the distribution of a continuous variable. It can also be used to compare a continuous variable between groups. It plots one point for every observation, with similar values placed next to each other. It is also good at showing outliers or extreme values. This type of plot is impractical when there are too many observations, as the points will overlap. The special feature of this graph is that it is constructed with dots, allowing you to identify each individual observation.
Q35. Select the variable(s) that could be displayed by the chart shown.

The correct answers are b), d) and f). Age, Height and LDL are continuous variables and a histogram is an effective graph to illustrate the distribution of the variables.
Q36. Select the variable(s) that could be displayed by the chart shown.

The correct answers are c), e), g), h), i) and j). They are categorical variables, and a bar chart is an effective graph to illustrate either the frequency or percentage (or proportion) of each category for this type of variable.
Q37. Select the variable(s) that could be displayed by the chart shown.

The correct answer is k). This chart is generated when the individual values of a continuous variable are plotted as bars. It is not a histogram but could be considered a bar chart if you had a categorical variable with numerous categories. If you ticked Gender, Blood group, Feeling happy, Number of children, Smoke and Social class, then this is acceptable.
Q38. Select the variable(s) that could be displayed by the chart shown.

The correct answers are b), d) and f). Age, Height and LDL are continuous variables and a box plot is an effective graph to illustrate the distribution of each of these variables, though a histogram may reveal more information.
Q39. Select the variable(s) that could be displayed by the chart shown.

The correct answers are c), e), g), h), i) and j). Gender, Blood group, Feeling happy, Number of children, Smoke and Social class are categorical variables. A pie chart may be used to illustrate the percentage (or proportion) of each category for this type of variable, though a bar chart is often preferred. Pie charts are rarely found in journal publications, but appear in presentations or posters.
Q40. Select the variable(s) that could be displayed by the chart shown.

The correct answers are b), d) and f). Age, Height and LDL are continuous variables and a dot plot may be used to illustrate the distribution of each of these variables, though a histogram is more frequently encountered.