Statistical Principles: Displaying Two or More Variables
Study the following table, which displays a selection of variables from a study dataset.
ID | Age | Gender | Height | Blood group | LDL† | Feeling happy? | Number of children | Smoke? | Social class |
1 | 25 | F | 1.62 | B | 150 | Agree | 0 | No | I |
2 | 35 | F | 1.58 | O | 123 | Strongly agree | 1 | Yes | II |
3 | 44 | M | 1.35 | A | 178 | Disagree | 3 | Yes | I |
4 | 28 | F | 1.54 | AB | 205 | Disagree | 0 | No | III |
5 | 35 | M | 1.35 | O | 229 | Indifferent | 2 | Yes | I |
6 | 42 | M | 1.21 | B | 215 | Agree | 2 | Yes | IV |
7 | 36 | F | 1.76 | A | 130 | Strongly disagree | 1 | No | IV |
8 | 38 | M | 1.57 | A | 175 | Disagree | 1 | Yes | V |
9 | 30 | M | 1.47 | AB | 240 | Indifferent | 0 | No | III |
10 | 40 | F | 1.18 | B | 167 | Strongly agree | 6 | No | I |
: | : | : | : | : | : | : | : | : | : |
Welcome to the "Displaying Two or More Variables" quiz. There are 11 questions to answer.
Q1. What is the name of this type of graph?

The correct answer is d). This is a clustered bar chart, an extension of the simple bar chart: instead of displaying the frequency counts or percentages of one categorical variable, it displays two, with one categorical variable nested within the other. In a clustered bar chart, the horizontal axis represents the levels or categories of the nested variable, the bars within each cluster represent the categories of the other variable, and the vertical axis represents the frequency count or percentage. The special feature of this graph is that there is no gap between the bars within a cluster, but there is a gap between clusters. It is often clearer to use different colours on the bars, one colour for each category within a cluster.
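As a rough illustration of what a clustered bar chart displays, the Python sketch below tabulates Smoke? against Gender for the ten rows visible in the table and prints one cluster of text bars per smoking category. The function name `cluster_counts` and the text rendering are illustrative assumptions rather than part of the quiz; a real chart would normally be drawn with a plotting package.

```python
from collections import Counter

# Gender and "Smoke?" values for the ten rows shown in the table (IDs 1-10).
gender = ["F", "F", "M", "F", "M", "M", "F", "M", "M", "F"]
smoke = ["No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No"]


def cluster_counts(cluster_var, bar_var):
    """Count observations for every (cluster category, bar category) pair."""
    return Counter(zip(cluster_var, bar_var))


counts = cluster_counts(smoke, gender)

# One cluster per smoking category; one bar per gender inside each cluster.
for smoke_level in sorted(set(smoke)):
    print(f"Smoke? = {smoke_level}")
    for gender_level in sorted(set(gender)):
        n = counts[(smoke_level, gender_level)]
        print(f"  {gender_level} {'#' * n} ({n})")
    print()  # the blank line plays the role of the gap between clusters
```

Swapping the two arguments of `cluster_counts` would cluster by gender instead, which is the other common way of arranging the same counts.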
Q2. What is the name of this type of graph?

The correct answer is b). A box and whisker plot (or box plot) illustrates the location and spread of a continuous variable and highlights any extreme values. The central box, which contains the middle 50% of the observations, extends from the lower quartile (LQ) of the data to the upper quartile (UQ), with the median marked by an internal line across the box. The median divides the middle 50% of observations into two groups of 25% each. The difference between the upper and lower quartiles is known as the inter-quartile range (IQR). Whiskers are drawn from each end of the box, extending as far as 1.5*IQR beyond it, or only as far as the furthest observation within that range if this is less. Observations lying beyond the whiskers but within 3*IQR of the box are known as outliers, and observations lying more than 3*IQR from the box are known as extreme values; both are drawn as separate dots. Two box plots, each representing one level of a categorical variable, can be drawn side by side to allow comparison.
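The quantities described above can be computed directly. The sketch below is a minimal Python illustration, assuming the "inclusive" quartile convention of the standard `statistics` module (statistical packages use several slightly different conventions, so quartiles may differ a little between tools). The function name `box_plot_summary` and the second data list are hypothetical; the first list holds the LDL values of the ten rows visible in the table.

```python
import statistics


def box_plot_summary(values):
    """Return the quantities a box and whisker plot displays."""
    # "inclusive" is one of several quartile conventions; packages differ.
    lq, median, uq = statistics.quantiles(values, n=4, method="inclusive")
    iqr = uq - lq
    inner_low, inner_high = lq - 1.5 * iqr, uq + 1.5 * iqr
    outer_low, outer_high = lq - 3 * iqr, uq + 3 * iqr

    within = [v for v in values if inner_low <= v <= inner_high]
    outliers = [v for v in values
                if outer_low <= v < inner_low or inner_high < v <= outer_high]
    extremes = [v for v in values if v < outer_low or v > outer_high]
    return {
        "lq": lq, "median": median, "uq": uq, "iqr": iqr,
        # Whiskers stop at the furthest observations inside 1.5*IQR.
        "whiskers": (min(within), max(within)),
        "outliers": outliers,
        "extreme_values": extremes,
    }


# LDL values for the ten rows shown in the table (IDs 1-10).
ldl = [150, 123, 178, 205, 229, 215, 130, 175, 240, 167]
print(box_plot_summary(ldl))

# Hypothetical values, chosen so that one outlier and one extreme value appear.
demo = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 60]
print(box_plot_summary(demo))
```

Calling the function once per category (for example, once for each Gender group) gives the numbers needed for side-by-side box plots.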
Q3. What is the name of this type of graph?

The correct answer is c). A scatter plot is used to display the association or relationship between 2 continuous variables. One variable is represented on the X axis and the other on the Y axis. Each point is plotted at the pair of values recorded for one observation, i.e. at the point where the two values coincide. The overall trend of the points indicates the direction of the relationship between the two variables. The more tightly the points cluster around this trend, the stronger the relationship. The special feature of this graph is a cloud of dots floating between the two axes.
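The direction and tightness that a scatter plot shows are commonly summarised by a correlation coefficient, such as the one quoted in a later question. The sketch below is a minimal Python illustration, assuming Pearson's coefficient is the measure of interest; the function name `pearson_r` is hypothetical, and the data are the Age and Height values of the ten rows visible in the table, so the result says nothing about the full dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Age and Height for the ten rows shown in the table (IDs 1-10).
age = [25, 35, 44, 28, 35, 42, 36, 38, 30, 40]
height = [1.62, 1.58, 1.35, 1.54, 1.35, 1.21, 1.76, 1.57, 1.47, 1.18]

r = pearson_r(age, height)
print(f"r = {r:.2f}")  # the sign gives the direction, the size the strength
```

A value near +1 or -1 corresponds to points lying close to a straight line, while a value near 0 corresponds to a shapeless cloud.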
Q4. What is the name of this type of graph?

The correct answer is a). A line plot is usually used when a time variable is involved. It shows the trend of a continuous variable over time. The Y axis represents the continuous variable, while the X axis represents the time variable. The chart is produced by modifying a scatter plot so that the points are joined by lines in the chronological order of the time variable. Some software packages offer a Line Chart, but this is different: the X axis in a line chart is treated as categorical and not as a true continuous or scaled axis. If the intervals between items on the X axis of a line chart are equal (e.g. 0, 1, 2, 3, 4, 5, 6), the plot will look correct. If they are not equally spaced (e.g. 0, 1, 3, 6, 12), the plot will be incorrect, because the points are still drawn at equal distances.
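The difference between a scaled and a categorical X axis can be seen by comparing where each kind of axis would place the points. The sketch below is a small Python illustration; the function names `x_positions` and `gaps` are hypothetical and do not correspond to any particular charting package.

```python
def x_positions(times, categorical):
    """Horizontal positions of the points on the X axis."""
    if categorical:
        # A categorical axis spaces the labels evenly, whatever their values.
        return list(range(len(times)))
    # A scaled axis places each point at its actual time value.
    return list(times)


def gaps(positions):
    """Horizontal distance between consecutive points."""
    return [b - a for a, b in zip(positions, positions[1:])]


unequal_times = [0, 1, 3, 6, 12]  # the unequally spaced example from the text

print(gaps(x_positions(unequal_times, categorical=False)))  # true spacing
print(gaps(x_positions(unequal_times, categorical=True)))   # forced equal
```

On the categorical axis every gap has the same width, so a change that took six time units is drawn as steeply as one that took a single unit.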
Q5. Which type(s) of graph could be used to illustrate the relationship between the variables Age and Height?
The correct answer is a). The relationship between the variables Age and Height can be illustrated by a scatter plot, since both variables are continuous and each axis can be used to represent one variable. Each point on the plot represents one observation, positioned at its values of the two variables.
Q6. Which type(s) of graphs from the above could be used to illustrate the relationship between the variables Height and Gender?
The correct answer is c). The relationship between the variables Height and Gender can be illustrated by a box and whisker plot, since each of the boxes represents the distribution of Height for one of the Gender groups.
Q7. Which type(s) of graphs from the above could be used to illustrate the relationship between the variables Smoke and Gender?
The correct answer is d). The relationship between the variables Smoke and Gender can be illustrated by a clustered bar chart, since both variables are categorical. This chart displays the number (or proportion) of observations in each combination of categories.
Q8. If we were trying to predict LDL from Age, which graph would be the best one for displaying this data?
Graph a:

Graph b:

Graph c:

Graph d:

The correct answer is b). A scatter plot should be used to illustrate the relationship between two continuous variables.
Q9. Read the statement below and select the most appropriate chart to represent the description of the data.
Statement: The relationship between the body mass index (BMI) and age of subjects was positive and strong. The correlation coefficient is 0.84.
The correct answer is c). This statement describes the strength and direction of the relationship between two continuous variables (BMI and age). A scatter plot is the best graph to use since it displays the association between 2 quantitative, generally continuous, variables. One variable is represented on the X axis and the other on the Y axis. Each point is plotted at the pair of values recorded for one observation, i.e. at the point where the two values coincide. The overall trend of the points indicates the direction of the relationship between the two variables. The closer the points are to forming a line, the stronger the relationship between the two variables.
Q10. Read the statement below and select the most appropriate chart to represent the description of the data.
Statement: About 21% of the respondents were smokers, 27% were ex-smokers and 52% had never smoked. Among the male group, 23% were smokers, whilst 19% of the female group were smokers.
The correct answer is c). This statement describes the proportion of subjects in each of the categories formed by the variables smoking status and gender. One categorical variable is nested within another (smoking status nested within gender). A clustered bar chart is the best graph to use since it can display the frequency counts or percentages of two categorical variables, and it is an effective way to compare these counts or percentages between the categories graphically. In a clustered bar chart, the horizontal axis represents the levels or categories of the nested variable (e.g. smoking status), the bars within each cluster represent the categories of the other variable (e.g. gender), and the vertical axis represents the frequency count or percentage.
Q11. Read the statement below and select the most appropriate chart to represent the description of the data.
Statement: The distribution of the post-treatment pain score for patients with lower back pain was skewed. The median (lower quartile, upper quartile) pain score was 4.5 (3.1, 5.9) in the under-65 age group and 4.8 (2.5, 7.1) in the 65-and-over age group.
The correct answer is a). This statement describes the distribution of a continuous variable (pain score) broken down by a categorical variable (age group). A box and whisker plot is the best graph to use since it illustrates the median, the lower quartile and the upper quartile of the data. It also highlights any observations that are extreme values. The central box, which contains the middle 50% of the observations, extends from the lower quartile (LQ) of the data to the upper quartile (UQ), with the median marked by an internal line across the box. The median divides the middle 50% of observations into two groups of 25% each. The difference between the upper and lower quartiles is known as the inter-quartile range (IQR). Whiskers are drawn from each end of the box, extending as far as 1.5*IQR beyond it, or only as far as the furthest observation within that range if this is less. Observations lying beyond the whiskers but within 3*IQR of the box are known as outliers, and observations lying more than 3*IQR from the box are known as extreme values; both are drawn as separate dots. Two box plots, each representing one level of a categorical variable, can be drawn side by side to allow comparison.