A study was set up to look at whether there was a difference in the mean arterial blood pressure between two groups of volunteers, after 6 weeks of following one of two treatment programs. One group of volunteers were given an exercise regimen to follow for the 6 weeks and the other group were given the same exercise regimen with the addition of an experimental tablet.
The table below shows the number of patients who developed an infection while in hospital as well as those who remained infection free. The results from two different hospitals over the same time period are displayed.
| | Infection | No Infection | Total |
| Hospital A | 16 | 237 | 253 |
| Hospital B | 27 | 594 | 621 |
A study was set up to look at whether overweight patients were able to meet their target weight loss, after 6 weeks of following one of two treatment programs. One group of patients were given an exercise regimen to follow for the 6 weeks and the other group were given the same exercise regimen with the addition of an experimental tablet.
A study was set up to look at whether overweight patients were able to meet their target weight loss, after 6 weeks of following one of three treatment programs. One group of patients were given an exercise regimen to follow for the 6 weeks, another group was given the same exercise regimen with the addition of an experimental tablet and the third group were given a placebo tablet.
In a study patients had their BMI classified as either normal (18.5 to 25) or not normal. These patients were then all asked to follow a 6 week exercise regimen. Their BMI was then classified again (in the same way) after the intervention had finished.
A study was set up to look at counts of CD4+ T helper cells in a group of 17 healthy volunteers and a separate group of 7 immunocompromised patients. The following table is a snapshot of the data:
| Subject | Group | CD4+ count (cells/mm³) |
| 1 | Healthy volunteer | 1024 |
| 2 | Healthy volunteer | 789 |
| 3 | Patient | 337 |
| 24 | Patient | 243 |
The following Histograms display the distribution of reported alcohol consumption (units) in patients diagnosed with alcoholic liver disease before an intervention and after the intervention has been completed. A histogram of the difference (before minus after) is also presented.
The table below shows the results from looking at the diagnostic accuracy of a new rapid test for HIV in 100,000 subjects, compared to the Reference standard ELISA test. The rows of the table represent the test result and the columns the true disease status (as confirmed by ELISA).
| | HIV+ | HIV- | Total |
| Test + | 378 | 397 | 775 |
| Test - | 2 | 98,823 | 98,825 |
| Total | 380 | 99,220 | 100,000 |
A linear regression analysis of Birth Weight (grams) and Gestational Age (weeks) gave the following output.
| Model | Beta Coefficient | 95% CI | p-value |
| Gestational Age | 96.56 | 14.41 to 178.72 | 0.02 |
| Constant | -230.34 | -3340.0 to 3180.30 | 0.39 |
Welcome to the "Statistical Tests Module" quiz. There are 20 questions to answer.
Q1. Should a Chi-square test be used in this situation?
The answer is b). We want to compare the mean arterial blood pressure between the two groups, so a test for continuous outcomes is required. A Chi-square test is therefore not appropriate here.
Q2. If we wanted to produce a graphical display to summarise all of this data then which of the following chart types could be used (select all that apply)?
The answer is f). As we want to display infection and non-infection separately for each hospital then we need to use a clustered bar chart. If we only had one hospital that we were interested in, or if we only wanted to display infections or non-infections then a normal bar chart would be suitable.
Q3. Should a Chi-square test be considered in this situation?
The answer is a). We want to compare the proportion of individuals meeting their weight loss targets following treatment, so a test for categorical outcomes is required. As we are also comparing two independent groups of patients, a Chi-square test should be considered here.
Q4. Should a Chi-square test be considered in this situation?
The answer is a). We want to compare the proportion of individuals meeting their weight loss targets following treatment, so a test for categorical outcomes is required. As we are comparing three independent groups of patients, a Chi-square test should still be considered here.
Q5. If we are interested in the change in state from baseline to post intervention then should a Chi-square test be considered in this situation?
The answer is b). We would be looking at BMI classifications, which have two levels, so a test for categorical outcomes is required. However, because we are looking at paired data (we want to see whether an individual's classification changes or not), the Chi-square test is not appropriate in this case. McNemar's test would be more appropriate here.
Q6. If we wanted to compare the proportion of patients who developed an infection between the two hospitals then which of the following statistical tests should we consider (select all that apply)?
The answers are c) and e).
Q7. If we wanted to compare the proportion of patients who developed an infection between the two hospitals then which of the following statistical tests should we use (select the best answer)?
The answer is c).
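As a rough illustration of the comparison asked for in Q6 and Q7, the sketch below computes the infection proportion in each hospital and a Pearson Chi-square statistic (without continuity correction) from the hospital infection table. It assumes that a Chi-square test is among the tests under consideration, since the answer options themselves are not reproduced here, and the function name `chi_square_2x2` is a hypothetical helper rather than anything from the module.

```python
import math

def chi_square_2x2(table):
    """Pearson Chi-square statistic and p-value (1 df) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: Hospital A, Hospital B; columns: Infection, No Infection.
counts = [[16, 237], [27, 594]]
proportions = [row[0] / sum(row) for row in counts]
statistic, p_value = chi_square_2x2(counts)
print(proportions, statistic, p_value)
```

The expected counts calculated inside the loop are also what one would inspect when judging whether the sample is large enough for the Chi-square approximation to be reasonable.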
Q8. A study was set up to look at whether there was a difference in the mean arterial blood pressure between two groups of volunteers, after 6 weeks of following one of two treatment programs. One group of volunteers were given an exercise regimen to follow for the 6 weeks and the other group were given the same exercise regimen with the addition of an experimental tablet.
Which type of t-test should be used in this situation?
The answer is b). We want to compare the mean arterial blood pressure between the two groups, so a test for continuous outcomes is required. As the two groups are independent of each other, an Independent samples t-test would be the test to use.
Q9. If we wanted to produce a graphical display to summarise this data separately by group then which of the following chart types could be used (select all that apply)?
The answers are c) and g). As the outcome variable is continuous and we additionally want to produce separate summaries for a couple of groups, we have a few options of plots we could use, with Box & Whisker and Dot Plots the best options. You will also see Bar charts presented for this type of data with error bars included. Although not the clearest plot, they are commonly seen in laboratory scenarios. If these plots are used then it is crucial to label what the error bars represent, as they could be 95% confidence intervals, SEs, SDs, 2×SEs etc.
Q10. A histogram of these CD4+ cell counts has shown that the distribution is negatively skewed. If we wanted to test for differences between the average values in Healthy volunteers compared to immunocompromised patients which type of t-test should be used?
The answer is d). As the outcome variable is skewed and the sample size is small then a t-test would not be appropriate here and we should use a non-parametric test. If we were to take a transformation of the data (such as a log) that happened to normalise the data then we could use a t-test on the transformed data.
Q11. If we were interested in testing to see if there had been a significant change in reported alcohol consumption then we could use which of the following t-tests and for what reason?
The answer is d). We are interested in the change from “Before” to “After” and, as this “Difference” is normally distributed, a paired samples t-test would be the test to use.
Q12. Which of the following statements do you believe to be True when considering the Mann-Whitney U test?
The correct answers are a) and c). a) is True. Data values are ranked during the calculation. b) is False. The Mann-Whitney U test is the non-parametric equivalent of the independent samples t-test. c) is True. If the data is skewed, the assumption of Normality for the independent samples t-test is not met and the non-parametric alternative, which is distribution free, is required. d) is False. Calculation of standard deviations and considerations of variance are not required for non-parametric tests.
Q13. The best use of the Mann-Whitney U test will be for the comparison of which of the following types of data.
The answer is c). The Mann-Whitney U is a non-parametric test for assessing whether two independent or unpaired samples of observations come from the same distribution.
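To make the ranking step concrete, here is a minimal sketch that computes the U statistics from rank sums. It uses only the four snapshot rows shown in the CD4+ table (not the full 24 subjects), so it illustrates the mechanics rather than a real analysis; the helper name `mann_whitney_u` is hypothetical and the code assumes there are no tied values.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from rank sums; assumes no tied values."""
    pooled = sorted(group_a + group_b)
    # Rank 1 goes to the smallest value in the pooled sample.
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    rank_sum_a = sum(rank[v] for v in group_a)
    rank_sum_b = sum(rank[v] for v in group_b)
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = rank_sum_b - n_b * (n_b + 1) / 2
    return u_a, u_b

healthy = [1024, 789]   # subjects 1 and 2 in the snapshot
patients = [337, 243]   # subjects 3 and 24 in the snapshot
u_healthy, u_patients = mann_whitney_u(healthy, patients)
print(u_healthy, u_patients)
```

The two U values always add up to the product of the group sizes, which is a handy check on the ranking.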
Q14. The accuracy of diagnosis of femoral hernia in referrals to a district general hospital over a period of 5 years was studied and related to clinical outcome. A correct diagnosis was made in only 36 of 98 cases (60 urgent, 38 routine) before admission to hospital. The median length of post-operative stay of urgent admissions was 7 days (range 4-50) when a correct initial diagnosis was made and 10 days (range 4-50) when the initial diagnosis was incorrect (P = 0.07, Mann-Whitney test). (Corder, A.P. Postgrad Med J (1992) 68, 26 – 28). Which of the following statements, if any, are true?
The correct answers are a) and d). a) This is True. However the Null hypothesis does not need to mention the statistical test. This would be mentioned in a statistical analysis plan for the study. In doing so we are also deciding in advance that the length of stay data will be skewed. This would be justified if we had existing data to support this assumption. b) This is False. The Alternative hypothesis should state that there is a difference in the length of stay. It should not say if this will be greater or less and should not quantify it, even if we had preliminary data from elsewhere. c) This is False. It is the distribution of the data that is important and the median and range demonstrate that the data is skewed so a non-parametric test is required. d) This is True. The P-value was 0.07 so it did not reach the usual level of significance of 0.05.
Q15. The following chart shows triglyceride readings collected from male and female subjects.
Which of the following options would you choose to test whether there was a difference between the triglyceride levels in male and female subjects?

The correct answer is c). The best option would be to transform the data using a logarithmic transformation and plot a histogram of the transformed data. Then, if the transformed data is Normally distributed, we could undertake an independent samples t-test. We would not do a t-test (a) on skewed data. We could perform a Mann-Whitney U test (b), but as the t-test is more powerful we should explore the effects of transformation first. Note that if the Mann-Whitney U test is significant on the raw data, then we would expect a t-test on transformed data that was Normally distributed to be significant as well. Option (d) is not sensible because the Mann-Whitney U test does not assume anything about the underlying distribution of the data; it is based on ranking, so transformation will not add anything. Option (e) does not allow for visualising the transformed data. The transformation may not result in a variable that follows a Normal distribution, in which case the t-test would be inappropriate.
Q16. The best use of Wilcoxon signed-ranks test will be for comparison of which of the following types of data.
The answer is b). The Wilcoxon signed-ranks test is a non-parametric test for assessing whether two related or paired samples of observations come from the same distribution. It is a non-parametric alternative to the paired Student’s t-test.
Q17. A histogram of white blood cell (WBC) counts in 15 sick patients showed that the distribution was negatively skewed. If we wanted to test for differences between the published WBC count for a healthy population compared to the WBC values in these patients which type of test should be used?
The answer is a). As the outcome variable is skewed and the sample size small, a non-parametric test is required. The Wilcoxon signed-ranks test is the test to use. The Mann-Whitney U test would be used if we had two independent groups, where we wished to compare the WBC counts from a sample of sick patients with the WBC counts from a sample of healthy subjects.
Q18. Select all of the following statements which you believe to be True about the Wilcoxon signed-ranks test.
The answer is b). a) is False. There is no assumption about the distribution, so you could use the Wilcoxon signed-ranks test on Normally distributed differences of paired observations; however, a paired t-test would be more powerful. b) is True. The Wilcoxon signed-ranks test makes no assumptions about the distribution of the data. c) is False. The Mann-Whitney U test considers the differences in medians of two independent groups. The Wilcoxon signed-ranks test considers the difference between paired observations where the two groups are not independent. d) is False. The Sum-Rank Test (more usually called the Wilcoxon rank-sum test) is the same as the Mann-Whitney U test: it tests for differences in the medians of independent groups. However, because of the confusion between Sum-Rank and Signed-Rank, most people use the terms Mann-Whitney for independent groups and Wilcoxon (Signed-Rank Test) for paired groups of data.
Q19. A study was carried out of the impact of protease inhibitors (PIs) on the health of 19 patients infected with both hepatitis C virus (HCV) and human immunodeficiency virus (HIV). Baseline CD4 counts were compared with values taken 6 weeks after treatment commenced using the Wilcoxon signed ranks test. Over the six weeks, CD4 counts had increased significantly (P-value = 0.002). Select all the following statements which you believe to be true.
The answer is d). a) is False. If the differences in CD4 counts (value at 6 weeks minus value at baseline) were Normally distributed we would expect a paired t-test to be used. Since it was not, we must conclude the differences showed a skewed distribution and this is why the Wilcoxon signed-ranks test was chosen. Note that if the distribution of the CD4 counts taken at baseline was skewed, or the distribution at 6 weeks was skewed, this would not be a reason to choose the Wilcoxon signed-ranks test; it is the distribution of the difference that is important. b) is False. This is because it does not consider that we are using paired observations. It implies we will use a Mann-Whitney U test because we will be comparing medians. However, we should consider the difference, so a better hypothesis would be ‘There is no median difference in the value of CD4 counts after six weeks of treatment with a PI in patients with HCV and HIV.’ c) is False. The values were not independent, they were paired. If they were independent we would use the Mann-Whitney U test. d) is True. The P-value is less than 0.05 so the result is statistically significant. What we are not told is by how much the CD4 counts increased and whether this was clinically significant. e) is False. We assume that the distribution of the differences is skewed, so a t-test would not be appropriate because the assumption of Normality of the differences is not satisfied.
Q20. Which of the following statements is true regarding correlation?
The correct answer is c). If we want to assess the relationship between other types of variables, then other statistical techniques must be used.
Q21. Which of the following pairs of variables should not be analysed using Correlation?
The answer is b). We should not use categorical variables, e.g. Gender, when calculating a correlation coefficient, only continuous variables as in cases a), c) and d).
Q22. When would it be appropriate to calculate the Spearman Rank Correlation coefficient?
The answer is c). We calculate the Spearman rank correlation coefficient when both variables have a skewed distribution. We calculate the Pearson correlation coefficient when one or both of the variables are reasonably Normally distributed. We should not use correlation if the relationship is not linear.
Q23. Which of the correlation coefficients best describes the relationship between the X and Y variables?

The answer is d). There is a trend from top left to bottom right so the correlation coefficient will be negative. If the data points were randomly organised with no trend then options a) or c) would have been possible.
Q24. Which of the correlation coefficients best describes the relationship between the X and Y variables?

The answer is c). The correlation coefficient will be close to zero, which means there is no linear association. However, this illustrates an important point: always plot the data to see what it tells you, then calculate the correlation coefficient if appropriate. There is no linear association here, but there is a mathematical relationship. The plot is a parabola where Y = (X-10)².
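The point about plotting first can be checked numerically. The sketch below computes a Pearson correlation coefficient for points lying exactly on the parabola; the X values (the integers 0 to 20, symmetric about 10) are an assumption, since the original plot is not reproduced here, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from first principles."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = list(range(0, 21))            # assumed X values, symmetric about 10
ys = [(x - 10) ** 2 for x in xs]   # Y is fully determined by X
r = pearson_r(xs, ys)
print(round(r, 6))
```

Even though Y is completely determined by X, the coefficient comes out at essentially zero, because the falling and rising halves of the curve cancel each other out.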
Q25. What is the Sensitivity of the new rapid test for HIV? Report the answer to 3 decimal places.
The answer is 0.995. The sensitivity of the test focuses on the HIV+ column of the table. It is the number of truly HIV+ patients who tested positive with the new rapid test divided by the total number of HIV+ patients. Therefore it is 378/380=0.995 to 3 decimal places.
Q26. What is the Specificity of the new rapid test for HIV? Report the answer to 3 decimal places.
The answer is 0.996. The specificity of the test focuses on the HIV- column of the table. It is the number of truly HIV- patients who tested negative with the new rapid test divided by the total number of HIV- patients. Therefore it is 98823/99220=0.996 to 3 decimal places.
Q27. What is the Positive predictive value (PPV) for the new rapid test for HIV in this cohort? Report the answer to 3 decimal places.
The answer is 0.488. The PPV of the test focuses on the Test+ row of the table. It is the number of truly HIV+ patients who tested positive with the new rapid test divided by the total number of positive tests using the new rapid test. Therefore it is 378/775=0.488 to 3 decimal places.
Q28. What is the Negative predictive value (NPV) for the new rapid test for HIV in this cohort? Report the answer to 5 decimal places.
The answer is 0.99998. The NPV of the test focuses on the Test- row of the table. It is the number of truly HIV- patients who tested negative with the new rapid test divided by the total number of negative tests using the new rapid test. Therefore it is 98823/98825=0.99998 to 5 decimal places.
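The four calculations in Q25 to Q28 can be reproduced in a few lines. This is a minimal sketch using the cell counts from the rapid test table; the variable names (`tp`, `fp`, `fn`, `tn`) are illustrative labels for the cells rather than anything defined in the module.

```python
# Cell counts from the rapid test table (columns: true status by ELISA).
tp, fp = 378, 397      # Test + row: HIV+, HIV-
fn, tn = 2, 98823      # Test - row: HIV+, HIV-

sensitivity = tp / (tp + fn)   # HIV+ column: positives detected
specificity = tn / (tn + fp)   # HIV- column: negatives correctly cleared
ppv = tp / (tp + fp)           # Test + row: positive tests that are true
npv = tn / (tn + fn)           # Test - row: negative tests that are true

print(round(sensitivity, 3), round(specificity, 3), round(ppv, 3), round(npv, 5))
```

Sensitivity and specificity are read down the columns, whereas PPV and NPV are read across the rows, which is why the predictive values depend on how common the disease is in the cohort.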
Q29. Which of the following statements is NOT true regarding linear regression?
The answer is b).
Q30. We have a regression equation where Y = 10X + 20, if X is 5.3 what is Y?
The answer is d). We start with Y = 10X +20, replace X with 5.3, so we get Y=10*5.3 + 20 which becomes Y = 53+20 and then Y=73.
Q31. If the correlation coefficient output from linear regression is 0.64, how much of the variation of the Y axis variable is explained by the X axis variable?
The answer is c). The proportion of variation explained is given by r², where r is the correlation coefficient. In this example r = 0.64, so r² = 0.64 * 0.64 = 0.41 = 41%. So the X (independent) variable explains 41% of the variation in the Y (dependent) variable, and 59% of the variation in the Y variable is unexplained in this simple linear regression model.
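The same arithmetic as a short sketch, with the percentages rounded to whole numbers as in the feedback; the variable names are illustrative only.

```python
r = 0.64                      # correlation coefficient from the question
r_squared = r ** 2            # proportion of variation in Y explained by X

explained_pct = round(100 * r_squared)
unexplained_pct = 100 - explained_pct
print(r_squared, explained_pct, unexplained_pct)
```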
Q32. What would the Null hypothesis be for a linear regression model of Gestational Age (independent variable) and Birth Weight (dependent variable)?
The answer is c). Our Null hypothesis would state that there is no relationship between our independent and dependent variables. It is saying that β (the beta coefficient) is zero. The Alternative Hypothesis would state that there is a relationship but would not specify the direction, so it could be a positive or negative linear relationship.
Q33. What does the following plot tell us about our regression model?

The answer is c). The plot of predicted value against residual for the Dependent variable is used to check the assumption of constant variance for the regression model. In the plot the data points are scattered randomly, there is no obvious trend so the assumption is met.
Q34. Calculate the predicted birth weight of a baby born at 40 weeks gestational age.
The answer is a). The prediction equation is Birthweight = α + β * Gestational Age, with α = -230.34 and β = 96.56. So we have:
Birthweight = -230.34 + 96.56 * Gestational Age
Birthweight = -230.34 + 96.56 * 40
Birthweight = -230.34 + 3862.4 (remember to subtract 230.34, as it is negative)
Birthweight = 3632 grams
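Both prediction questions (Q30 and Q34) are the same straight-line calculation, which can be sketched as follows. The function name `predict` is hypothetical; the coefficients are taken from the regression output table and from the equation Y = 10X + 20.

```python
def predict(intercept, slope, x):
    """Predicted outcome from a simple linear regression equation."""
    return intercept + slope * x

# Q34: birth weight (grams) at 40 weeks gestational age.
birth_weight = predict(-230.34, 96.56, 40)

# Q30: Y = 10X + 20 evaluated at X = 5.3.
y = predict(20, 10, 5.3)

print(round(birth_weight, 2), round(y, 2))
```

Keeping the intercept as a signed number, rather than subtracting it by hand, avoids the sign slip that the feedback warns about.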
Q35. If we wished to test whether there was any association between Gender and uptake of Flu vaccination which would be the best test to choose?
The answer is c). Both variables are categorical so the Chi-square test is used.
Q36. If we wished to test if there was a difference between the gestational age of babies at birth and the use of a nutritional supplement by their mothers during pregnancy which would be the best test to choose?
The answer is d). We have two independent groups defined by a categorical and a continuous variable. The decision to be made is whether the continuous variable is Normally distributed. Gestational age is likely to be negatively skewed because pregnancies rarely go beyond 42 weeks. So a non-parametric test is required for independent samples which is the Mann-Whitney U. If the data followed a Normal distribution or could be transformed to follow one then an Independent Samples t-test could be used.
Q37. The cotinine level was measured in women at the beginning and end of their pregnancy. The change in cotinine was presented on a Log scale. Which would be the best test to use to see if there had been a change in cotinine level during pregnancy?
The answer is g). We have paired observations from the same subject taken on different occasions. When data is presented on a Log scale it is usually because the transformation results in the data following a Normal distribution, so a parametric test can be used, in this case the Paired Samples t-test. A Wilcoxon signed-ranks test would be used when the differences do not follow a Normal distribution.
Q38. If we wished to estimate the strength of the linear relationship between the weight of a mother and the weight of her baby at birth which would be the best test to choose?
The answer is b). Pearson Correlation estimates the strength of a linear relationship. Linear Regression estimates the nature of the relationship and assumes that one of the variables, the independent variable can be used to predict the nature of the dependent or outcome variable. If the relationship was not linear then Spearman Rank Correlation could be used.
Q39. The pre-operative and post-operative anxiety levels of adolescent patients undergoing orthopaedic surgery were measured using the State-Trait Anxiety Inventory STAI scale. The authors reported the pre-operative anxiety levels as mean = 33.8 (SD = 5.1) and post-operative anxiety levels as mean = 38.8 (SD = 7.2). Which would have been the most appropriate test to assess the relationship between pre and post operative anxiety?
The answer is g). We have paired data from the same subjects so a Paired Samples t-test is appropriate assuming the differences in the data followed a Normal distribution which is implied by the presentation of the means and standard deviations. The authors of this study did not present the mean difference of the pre - post values which should be quoted together with 95% confidence intervals.
Q40. Which test would you choose to compare two groups with skewed unpaired continuous data?
The answer is d).
Q41. Which test would you choose to compare the relationship between two continuous variables which had skewed distributions?
The answer is a). The Spearman Rank Correlation coefficient is used. Pearson correlation assumes that at least one of the variables follows a Normal distribution.
Q42. Which test would you choose to compare a single sample of non-parametric values with a published value?
The answer is e). You would use the Wilcoxon Signed-Ranks test to compare the data values from your sample with the published value where the data does not follow a Normal distribution. You pair each sample value with the published value. If the data was Normally distributed you could use the one sample t-test.