Introduction
Prediction is the process of determining the magnitude of the effect of predictors on a response variable; it allows a researcher to estimate the future value of the outcome variable from the predictors or factors included in the study. Multiple linear regression is a statistical test used to assess the relationship between a response variable and more than one predictor variable (Keith, 2019). In addition, Cacoullos (2014) argued that discriminant analysis, which is used to test the equality of group centroids, is associated with multivariate analysis of variance, since it uses the Wilks' lambda statistic applied in the GLM multivariate procedure.
In this regard, multiple linear regression and discriminant analysis are the most appropriate statistical techniques: the former predicts the outcome of a dependent variable at different values of the predictor variables and assesses each predictor's contribution to the outcome, while the latter assesses the association between the groups defined by the dependent variable (Keith, 2019). The researcher can also assess how much of the variation in the outcome is explained by the independent variables included in the model. Discriminant analysis was used to assess the association between two tests developed in a firm to examine employee performance. The study used a sample of 43 employees, each classified as either successful or unsuccessful in a given position, who took both tests.
Before conducting either multiple linear regression or discriminant analysis, the variables should be checked against the necessary assumptions. For multiple linear regression, the dependent variable must be continuous and approximately normally distributed, while the predictor variables can be either continuous or categorical. For discriminant analysis, the dependent variable must be divided into two or more groups. For both techniques, the predictor variables should not be correlated with each other; that is, there should be no multicollinearity (Alin, 2010). This can be assessed using the variance inflation factor (VIF), which should be less than ten, or through correlation coefficients between the predictor variables. The current study assessed the relationship between the cost of constructing an LWR plant and the three predictor variables S, N, and CT, and assessed the association between the two tests used to examine employee performance.
Assumption of Regression Analysis
Multicollinearity
Correlation analysis is used in determining the strength and direction of association between two variables (Puth et al., 2014). The Pearson correlation coefficient is used for testing the strength and direction of association between two variables with a continuous level of measurement. However, when the variables have an ordinal level of measurement, we use the Spearman rank correlation coefficient (Puth et al., 2014). The Pearson correlation coefficient ranges from -1 to 1, with -1 or 1 indicating perfect correlation and zero indicating no correlation.
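As a brief illustration of how such a coefficient is obtained, a Pearson correlation and its two-tailed p-value can be computed with SciPy. The data below are hypothetical values for illustration only, not the study's actual S and N observations.

```python
# Sketch: Pearson correlation with a two-tailed significance test.
import numpy as np
from scipy import stats

# Hypothetical illustrative data (not the study's actual S and N values).
s = np.array([600.0, 750.0, 820.0, 910.0, 1050.0, 1130.0, 1250.0, 1300.0])
n = np.array([1.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0, 5.0])

r, p = stats.pearsonr(s, n)  # r in [-1, 1]; p is the two-tailed p-value
print(f"r = {r:.3f}, p = {p:.3f}")
```

A coefficient near zero with a large p-value, as in Table 1 below, indicates a weak and non-significant association.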
Table 1: Correlation Analysis

                             S        N
S    Pearson Correlation     1        .193
     Sig. (2-tailed)                  .289
     N                       32       32
N    Pearson Correlation     .193     1
     Sig. (2-tailed)         .289
     N                       32       32
The correlation analysis in Table 1 above indicates that the correlation between the two predictor variables (S and N) is weak, positive, and not significant at the 0.05 level of significance (r = 0.193, p = 0.289). This suggests that multicollinearity does not exist and the assumption of no multicollinearity is not violated. Furthermore, the variance inflation factor (VIF) is less than 10 (Daoud, 2017), again suggesting that the multicollinearity assumption is not violated.
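With only two predictors, the VIF reduces to 1 / (1 − r²), so it can be checked directly from the correlation in Table 1; a minimal sketch:

```python
# Sketch: with exactly two predictors, VIF_j = 1 / (1 - r^2),
# where r is the correlation between the two predictors.
r = 0.193                   # correlation between S and N (Table 1)
vif = 1.0 / (1.0 - r**2)
print(f"VIF = {vif:.3f}")   # ~1.039, consistent with the VIF column in Table 4
```

A VIF this close to 1 (far below the usual cutoff of 10) confirms the absence of multicollinearity.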
Normality test
Table 2: Tests of Normality

          Kolmogorov-Smirnov(a)       Shapiro-Wilk
          Statistic   df   Sig.       Statistic   df   Sig.
ln_C      .104        32   .200*      .967        32   .414

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
The normality test for the dependent variable revealed that, after transforming the variable C using the natural log, the normality assumption is not violated at the 0.05 level of significance (Shapiro-Wilk = 0.967, p = 0.414).
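The transform-then-test workflow can be sketched as follows; the cost values here are simulated from a skewed distribution for illustration and are not the study's data.

```python
# Sketch: natural-log transform of a skewed variable, then a Shapiro-Wilk test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
c = rng.lognormal(mean=6.0, sigma=0.35, size=32)   # hypothetical skewed costs
ln_c = np.log(c)                                   # natural-log transform

stat, p = stats.shapiro(ln_c)   # W statistic and p-value
print(f"Shapiro-Wilk = {stat:.3f}, p = {p:.3f}")   # large p: normality not rejected
```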
Results and discussion
Regression Analysis
a. Use residual analysis and R2 to check your model.
Table 3: Model Summary

Model   R       R Square   Adjusted   Std. Error of   R Square   F Change   df1   df2   Sig. F
                           R Square   the Estimate    Change                            Change
1       .482a   .232       .179       .34240          .232       4.385      2     29    .022
2       .483b   .234       .151       .34814          .001       .052       1     28    .822

a. Predictors: (Constant), N, S
b. Predictors: (Constant), N, S, CT
The R-squared of 0.232 indicates that the model explains about 23.2% of the variation in ln(C), leaving 76.8% of the variation to variables not included in the model. The residual analysis also showed large residuals. The low R-squared and large residuals together indicate that the model does not fit the data well (Brown, 2009).
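The model check above can be reproduced in outline with an ordinary least-squares fit; the simulated data and coefficients below are illustrative assumptions, not the study's dataset.

```python
# Sketch: OLS fit, residuals, and R^2 = 1 - SS_res / SS_tot on hypothetical data.
import numpy as np

rng = np.random.default_rng(1)
n_obs = 32
S = rng.uniform(500, 1300, n_obs)                 # hypothetical plant sizes
N = rng.integers(0, 10, n_obs).astype(float)      # hypothetical plant counts
ln_C = 5.3 + 0.001 * S + 0.012 * N + rng.normal(0, 0.34, n_obs)  # noisy response

X = np.column_stack([np.ones(n_obs), S, N])       # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, ln_C, rcond=None)   # OLS coefficient estimates
resid = ln_C - X @ beta                           # residuals for diagnostics
r2 = 1.0 - resid.var() / ln_C.var()               # coefficient of determination
print(f"R^2 = {r2:.3f}")
```

A low R² combined with visibly large or patterned residuals is the signal, as in Table 3, that the model fits poorly.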
b. State which variables are important in predicting the cost of constructing an LWR plant?
Table 4: Regression Coefficients(a)

Model               B       Std. Error   Beta    t        Sig.   Tolerance   VIF
1   (Constant)      5.300   .277                 19.161   .000
    S               .001    .000         .406    2.447    .021   .963        1.039
    N               .012    .010         .193    1.164    .254   .963        1.039
2   (Constant)      5.294   .283                 18.718   .000
    S               .001    .000         .403    2.385    .024   .958        1.044
    N               .011    .010         .189    1.110    .276   .950        1.053
    CT              .028    .125         .038    .227     .822   .978        1.022

a. Dependent Variable: ln_C
The regression analysis displayed in Table 4 above indicates that S is a significant contributing factor in predicting ln(C) at the 0.05 level of significance (p = 0.021). However, the predictor variable N does not significantly predict ln(C) (p = 0.254). There is also no significant difference in ln(C) between the two levels of cooling tower (p = 0.822), indicating that the dummy variable CT does not have a significant effect in predicting ln(C). Therefore, the researcher retained the predictor S in the model and removed N and CT.
c. State a prediction equation that can be used to predict ln(C).
After dropping N and CT from the model, since they do not have a significant effect in predicting ln(C), the prediction equation (approximated here from the Model 1 coefficients in Table 4, since the refitted S-only coefficients were not reported) is:

ln(C) = 5.300 + 0.001(S)
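As a minimal sketch, the retained equation can be evaluated directly; the slope and intercept below are taken from Table 4 (Model 1) as an approximation, since the refitted S-only coefficients were not reported.

```python
# Sketch: predicting ln(C) from S alone, using Table 4 Model 1 estimates
# (an approximation; the S-only refit was not reported in the study).
b0, b1 = 5.300, 0.001

def predict_ln_cost(s):
    """Predicted natural log of construction cost for plant size s."""
    return b0 + b1 * s

print(predict_ln_cost(800))  # ~6.1
```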
d. Does adding CT improve R2? If so, by what amount?
Based on the analysis displayed in Table 3 above, adding CT does not significantly improve R-squared (Sig. F Change = 0.822). Adding CT to the model changes R-squared by only 0.001, from 0.232 to 0.234, a change that is not significantly different from zero.
Correlational Analysis
a. Evaluate the correlation between the two scores and state if there seems to be any association between the two.
Table 5: Pooled Within-Groups Matrices

                       Test1    Test2
Correlation   Test1    1.000    .187
              Test2    .187     1.000
The correlation analysis shown in Table 5 above indicates a weak positive correlation between the two tests (r = 0.187), suggesting that there is little association between the two test scores.
b. Find the probability of upgrading for each division of the sample by the Bayes’ theorem.
Given that: P(T1) = 43/86; P(T2) = 43/86
P(T1 | Up) = 23/46; P(T2 | Up) = 23/46

P(Up | T1) = P(T1 | Up) P(Up) / P(T1)
           = (23/46 × 46/86) / (43/86)
           = 23/43

P(Up | T2) = P(T2 | Up) P(Up) / P(T2)
           = (23/46 × 46/86) / (43/86)
           = 23/43
c. Find the probability of upgrading for each division of the sample by the naïve version of the Bayes’ theorem.
P(Up | T1) = P(T1 | Up) P(Up) / P(T1)
           = (23/46 × 46/86) / (43/86)
           = 23/43

P(Up | T2) = P(T2 | Up) P(Up) / P(T2)
           = (23/46 × 46/86) / (43/86)
           = 23/43
d. Compare your results in parts b and c and explain the difference or indifference based on observed probabilities
Since there is only one predictor in each sample division, the naïve version and the full Bayes' theorem yield identical probabilities, so the observed probabilities show no difference. The naïve version applies Bayes' theorem under an assumption of independence between the predictor features (Webb, 2010), and with a single predictor that assumption has no effect on the result.
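The calculation in parts (b) and (c) can be verified with exact rational arithmetic, using the same probabilities stated above:

```python
# Sketch: the Bayes' theorem computation for P(Up | T1), in exact fractions.
from fractions import Fraction

p_t1 = Fraction(43, 86)            # P(T1)
p_up = Fraction(46, 86)            # P(Up)
p_t1_given_up = Fraction(23, 46)   # P(T1 | Up)

# Bayes' theorem: P(Up | T1) = P(T1 | Up) * P(Up) / P(T1)
p_up_given_t1 = p_t1_given_up * p_up / p_t1
print(p_up_given_t1)               # 23/43
```

By symmetry, the same computation with P(T2) and P(T2 | Up) gives the identical result for the second test, matching the hand calculation.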
Conclusion and Recommendations
The analysis revealed that the model with the three predictors does not fit the data well when predicting the cost of constructing an LWR plant, suggesting that most of the variation in the outcome variable is explained by variables not included in the model. Further analysis indicated that the predictor S had a significant effect in predicting the cost of constructing an LWR plant (C), whereas N and CT did not. Therefore, the researcher should drop the N and CT predictors from the model and use only S in predicting the cost of constructing the LWR plant. The analysis also indicated that the two tests were not strongly associated with each other, but the first test had greater discriminating power than the second, suggesting that the first test was the better choice for predicting whether employees would be unsuccessful or successful in the position.
Nevertheless, the study did not control for alternative explanations that could affect the validity of the findings. A further study is needed that includes variables providing a better fit to the data to help predict the cost of constructing an LWR plant. In addition, a study that controls for all possible confounders is required to improve prediction of the outcome variable and to identify the best test for predicting employee performance.
References
Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics, 2(3), 370-374.
Brown, J. D. (2009). The coefficient of determination.
Cacoullos, T. (Ed.). (2014). Discriminant analysis and applications. Academic Press.
Daoud, J. I. (2017, December). Multicollinearity and regression analysis. In Journal of Physics: Conference Series (Vol. 949, No. 1, p. 012009). IOP Publishing.
Keith, T. Z. (2019). Multiple regression and beyond: An introduction to multiple regression and structural equation modeling. Routledge.
Puth, M. T., Neuhäuser, M., & Ruxton, G. D. (2014). Effective use of Pearson’s product–moment correlation coefficient. Animal behaviour, 93, 183-189.
Webb, G. I. (2010). Naïve Bayes. Encyclopedia of machine learning, 15, 713-714.