8210 WK9 DISCUSSION – Explicitpapers

Discussion: Multiple Regression

This Discussion helps solidify your understanding of statistical testing by engaging you in some data analysis. This week you will work with a real, secondary dataset to construct a research question, estimate a multiple regression model, and interpret the results.

Whether in a scholarly or practitioner setting, good research and data analysis should have the benefit of peer feedback. For this Discussion, you will post your response to the hypothesis test, along with the results. Remember that the goal is to obtain constructive feedback to improve the research and its interpretation, so please view this as an opportunity to learn from one another.

To prepare for this Discussion:

Review this week’s Learning Resources and media program related to multiple regression.
Create a research question, using the Afrobarometer Dataset or the HS Long Survey Dataset, that can be answered by multiple regression.

By Day 3

Use SPSS to answer the research question. Post your response to the following:

If you are using the Afrobarometer Dataset, report the mean of Q1 (Age). If you are using the HS Long Survey Dataset, report the mean of X1SES.
What is your research question?
What is the null hypothesis for your question?
What research design would align with this question?
What dependent variable was used and how is it measured?
What independent variables are used and how are they measured? What is the justification for including these predictor variables?
If you found significance, what is the strength of the effect?
Explain your results for a lay audience, and explain what the answer to your research question is.

Be sure to support your Main Post and Response Post with reference to the week’s Learning Resources and other scholarly evidence in APA Style.

Week Nine: Multiple Regressions

Posted on: Friday, July 22, 2022 9:31:03 AM EDT

As social scientists, we frequently have questions that require the use of multiple predictor variables. Moreover, we often want to include control variables (e.g., workforce experience, knowledge, education) in our model. Multiple regression allows the researcher to build on bivariate regression by including all of the important predictor and control variables in the same model. This, in turn, assists in reducing error and provides a better explanation of the complex social world.

Example: A local school system is trying to mitigate poor attendance. The researchers may look at several possible interventions. In the end, a study may find that a combination of interventions works better than any single one. This finding is a typical product of multiple regression. In addition, because multiple sources of data may need to be combined, a researcher can infer. “Infer” is a power word in the social sciences, as it empowers a researcher to synthesize and speculate based upon responsible use of data.

In the end, having concluded your analysis of a regression, what has been learned? In two sentences or fewer, what can you share with others?

Frankfort-Nachmias, C., Leon-Guerrero, A., & Davis, G. (2020). Social statistics for a diverse society (9th ed.). Thousand Oaks, CA: Sage Publications.

Chapter 12, “Regression and Correlation” (pp. 401-457) (previously read in Week 8)

Wagner, III, W. E. (2020). Using IBM® SPSS® statistics for research methods and social science statistics (7th ed.). Thousand Oaks, CA: Sage Publications.

Chapter 8, “Correlation and Regression Analysis”
Chapter 11, “Editing Output” (previously read in Weeks 2, 3, 4, 5, 6, 7, and 8)

Walden University, LLC. (Producer). (2016g). Multiple regression [Video file]. Baltimore, MD: Author.

Note: The approximate length of this media piece is 7 minutes.

In this media program, Dr. Matt Jones demonstrates multiple regression using the SPSS software.

Multiple Regression Models

Learning Objective:
Interpret regression results when the regression model has more than one predictor.

The Purpose of Control Variables

A control variable in a statistical model is a variable that we are attempting to “hold constant” while we examine the association among other variables in our model. In essence, we want to know if our independent variable of interest (e.g., grit) is associated with our dependent variable after factoring in other variables (e.g., personality factors) that could also be related to the dependent variable.

In addition to our independent variables of interest, in regression models, we also include control variables as predictor variables because we suspect the control variable is related to our outcome variable and could explain the association between our independent variable and the outcome.

For example, suppose we want to understand factors that might predict an individual’s income. Education level seems like an obvious predictor variable that we would want to examine as it is probably a predictor of income. Might there be other variables, however, that would predict income besides education? And, if we find that education level is associated with income, could it partially be because those with more education are also likely to be older and more accomplished/established and therefore earn more money? For this reason, we probably want to include age as a control variable in our regression model predicting income with education.

Take a look at the output below from SPSS, which shows the results of a regression model based on data from the 2004 General Social Survey (http://sda.berkeley.edu/archive.htm). Use an alpha value of .05 to interpret the results.

Model Summary
A model summary showing the results of a regression model based on data from the 2004 General Social Survey. Highest year of school completed, age of respondent.

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.256^a	.066	.064	2.276

Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED, AGE OF RESPONDENT.

Interpreting the Regression Coefficient

So, how would we interpret the regression coefficient in this model for education level, if we are controlling for age? Researchers would say that, holding age constant, education level has a weak, positive association with income, β = .21, p < .05. Recall that a positive association indicates that as education level increases, income also increases. Another way to say the same thing is that education level predicts income above and beyond an individual’s age.

Recall, too, that we need to look at the p-value for each predictor in the model in order to discern whether the predictor shows a statistically significant association with the outcome variable, and that we can use the standardized regression coefficients to gauge the effect size for each predictor. In our results below, we can see that each predictor (age and education level) is statistically significant, as each p-value is less than the alpha value of .05.

Coefficients^a
Table of coefficients showing both unstandardized coefficients and standardized coefficients for age of respondent, highest year of school completed.

Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	6.535	.534	blank	12.245	.000
	AGE OF RESPONDENT	.025	.007	.120	3.703	.000
	HIGHEST YEAR OF SCHOOL COMPLETED	.202	.031	.209	6.470	.000

Legend for Coefficients^a
	Standardized regression coefficient for age; the closer this value is to 1, the stronger the effect size.
	p-value for age
	Standardized regression coefficient for education level; the closer this value is to 1, the stronger the effect size.
	p-value for educational level

If you look at the standardized regression coefficients, you can see that each predictor shows a weak relationship with income, as each predictor has a standardized regression coefficient of about .1 or .2; stronger effects would be indicated if the coefficients had absolute values closer to 1. Of the two predictors, education level has the larger standardized regression coefficient, indicating that it is a stronger predictor of income than age.
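
As a rough sketch of what “holding age constant” means, the unstandardized coefficients (B) from the Coefficients table can be written out as a prediction equation. The function name `predict_income` and the example ages and years of schooling are illustrative assumptions, and the predicted values are in whatever units the survey’s income variable uses, which the output does not state.

```python
# Unstandardized coefficients (B) copied from the Coefficients table.
CONSTANT = 6.535
B_AGE = 0.025
B_EDUCATION = 0.202


def predict_income(age, years_of_school):
    """Predicted income score from the fitted regression equation."""
    return CONSTANT + B_AGE * age + B_EDUCATION * years_of_school


# Two hypothetical respondents of the same age (age is "held constant")
# who differ by one year of schooling.
same_age_gap = predict_income(40, 13) - predict_income(40, 12)

# Two hypothetical respondents with the same schooling
# who differ by one year of age.
same_school_gap = predict_income(41, 12) - predict_income(40, 12)

print(round(same_age_gap, 3))     # difference attributable to education
print(round(same_school_gap, 3))  # difference attributable to age
```

Each gap simply recovers the B value for the predictor that changed, which is why a coefficient is read as the predicted change in the outcome per one-unit change in that predictor, with the other predictor fixed.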

R-squared

Aside from looking at the individual regression coefficients and the p-values, another thing to note when you are discussing your multiple regression results is the R-squared value. R-squared is an important statistic that tells you the proportion of variability in the dependent variable that is accounted for by your model. In other words, it tells you how good of a job your predictors are doing at predicting your outcome variable. The R-squared value ranges from 0 to 1 and can be expressed as a percent. In the output shown below, you can see that the R-squared value is .066.

Model Summary

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.256^a	.066	.064	2.276

Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED, AGE OF RESPONDENT.
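
The arithmetic behind the Model Summary can be checked in a few lines. This is only a sketch using the rounded values printed in the table; the variable names are illustrative.

```python
# Values copied from the Model Summary table.
R = 0.256
R_SQUARE_REPORTED = 0.066

# R Square is the square of the multiple correlation R.
r_square_from_r = R ** 2

# Expressing R Square as a percent: multiply by 100
# (the same as moving the decimal point two places to the right).
percent_explained = R_SQUARE_REPORTED * 100

print(round(r_square_from_r, 3))
print(f"{percent_explained:.1f}%")
```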

Consider the following scenario when answering the question below.

Using the SPSS output above, and assuming an alpha level of .05, suppose we wanted to control for education level instead of age this time around.

Hint: Look at the p-value in the “Sig.” column of the output for age. Is that value less than the alpha value of .05?

How would we interpret the results if we were interested in predicting income with age while controlling for education level?

Holding education level constant, an increase in age predicts an increase in income.

Education level is a stronger predictor of income than age, indicating that age is not related to income after controlling for education level.

Age does not predict income after controlling for education level.

How Predictors Are Related to the Dependent Variable

The above question emphasizes the fact that regardless of whether the researcher is thinking of age or education level as the control variable, the mathematical interpretation of how the predictors are related to the dependent variable does not change. When we interpret the coefficient for one predictor in the model, it is always in the context of holding the other variables “constant,” regardless of which variable, conceptually, we are thinking of as a control variable.

Sometimes, researchers include multiple predictors in a model and are not thinking of any of them, conceptually, as control variables. They are simply interested in how the predictors, together, are related to the outcome variable, or they may be interested in seeing which predictor variables show the strongest relationships with the outcome.

Consider the following scenario when answering the question below.

Suppose we wanted to predict the number of slices of pepperoni pizza people ate at a party based on how many slices their friends ate. Suppose we also gathered data on three (3) additional variables: individual’s mood, how much they like pepperoni, and how hungry they reported being when they arrived at the party. Take a look at the correlation results below from SPSS, which is based on fictitious data.

Correlations

Blank		number of slices	positive mood	friends’ number of slices	like pepperoni	hunger
number of slices	Pearson Correlation	1	.036	.791**	.806**	-.080
	Sig. (2-tailed)	blank	.864	.000	.000	.702
	N	25	25	25	25	25
positive mood	Pearson Correlation	.036	1	-.106	.174	.238
	Sig. (2-tailed)	.864	blank	.613	.406	.253
	N	25	25	25	25	25
friends’ number of slices	Pearson Correlation	.791**	-.106	1	.638**	-.139
	Sig. (2-tailed)	.000	.613	blank	.001	.507
	N	25	25	25	25	25
like pepperoni	Pearson Correlation	.806**	-.174	.638**	1	-.198
	Sig. (2-tailed)	.000	.406	.001	blank	.343
	N	25	25	25	25	25
hunger	Pearson Correlation	-.080	.238	-.139	-.198	1
	Sig. (2-tailed)	.702	.253	.507	.343	blank
	N	25	25	25	25	25

**. Correlation is significant at the 0.01 level (2-tailed).
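
The “Sig.” values in a correlation table come from a t test on each correlation. The sketch below applies the standard formula t = r·√(n − 2) / √(1 − r²) to the first row of the table; the critical value 2.069 (two-tailed, alpha = .05, df = 23) is taken from a standard t table rather than from the SPSS output, and the names used are illustrative.

```python
import math

N = 25               # sample size from the table
T_CRITICAL = 2.069   # assumed: two-tailed critical t for alpha = .05, df = N - 2 = 23

# Correlation of each candidate variable with "number of slices"
# (first row of the Correlations table).
correlations = {
    "positive mood": 0.036,
    "like pepperoni": 0.806,
    "hunger": -0.080,
}


def t_for_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


significant = {
    name: abs(t_for_r(r, N)) > T_CRITICAL
    for name, r in correlations.items()
}

for name, flag in significant.items():
    print(name, "significant" if flag else "not significant")
```

With only 25 cases, a correlation has to be fairly large before it clears the critical value, which is why the small correlations in the table have Sig. values well above .05.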

Hint: Take a look at whether each variable is associated with the outcome variable of number of slices. Is the association between the variable and number of slices statistically significant?

Which of these three (3) variables would be most logical to control for in the regression model?

Hunger

How much individuals reported liking pepperoni

Positive mood

In the output shown below, which is based on predicting income with age and education level, the R-squared is .066.

Model Summary

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.256^a	.066	.064	2.276

Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED, AGE OF RESPONDENT.

Hint: Remember that to convert a decimal to a percent, you will need to move the decimal point two places to the right.

Which of the following is the appropriate interpretation of the R-squared value?

Age and education level account for 6.6% of the variability in income.

Income accounts for 6.6% of the variability in age and education level.

Age and education level account for 66% of the variability in income.

Income accounts for 66% of the variability in age and education level completed.

How to Create Dummy-Coded Variables

by Robin Kouvaras

Learning Objective:
Interpret regression models with dummy-coded variables.

How to Create Dummy-Coded Variables

Dummy-coded variables are created using only the values 0 and 1. The general rule for dummy coding is that you need one (1) fewer dummy-coded variable than you have groups (# total groups – 1). So, for our variable of marital status, we would need two (2) dummy-coded variables because we have chosen to focus on three (3) marital status groups (3 – 1 = 2). The group for which we do not create a dummy-coded variable is typically called the reference category. Often the reference category will be the one that researchers want to compare to other groups. For our research, we might choose “married” as our reference category if we want to compare non-married individuals to married individuals.

Before we conduct our regression analyses in SPSS, then, we will need to create two (2) dummy-coded variables for marital status:

one variable for the divorced group
one variable for the never-married group

We will use a 1 to indicate membership in that category (e.g., to indicate that someone is divorced for the “divorced” dummy-coded variable) and a 0 to indicate non-membership.

The table below shows how we would dummy-code our marital status variables.

Notice the Following

If the original value for an individual’s marital status is a 1 (indicating married), that individual would have a 0 for the “divorced” variable and a 0 for the “never married” variable. This is because they are not a “member” of either of these groups: they are not divorced, and they are not in the never-married category. This same logic holds for the remaining two (2) values of marital status. If an individual is divorced, for example, they get a 1 for the divorced variable and a 0 for the never-married variable.

Also, note that each individual in the data set will have a value (either a 0 or a 1) for each dummy-coded variable that the researcher creates.
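
The coding rule described above can be sketched in a few lines. This is a minimal illustration, assuming marital status is stored as a text label; the source only states that the original numeric value 1 means married, so the labels, the function name `dummy_code`, and the constants are hypothetical.

```python
REFERENCE = "married"                          # reference category: no dummy variable
DUMMY_GROUPS = ["divorced", "never married"]   # 3 groups - 1 = 2 dummy variables


def dummy_code(marital_status):
    """Return the 0/1 value of each dummy-coded variable for one respondent."""
    if marital_status != REFERENCE and marital_status not in DUMMY_GROUPS:
        raise ValueError(f"unexpected marital status: {marital_status!r}")
    # 1 marks membership in the group, 0 marks non-membership.
    return {group: int(marital_status == group) for group in DUMMY_GROUPS}


for status in ["married", "divorced", "never married"]:
    print(status, dummy_code(status))
```

Every respondent receives a value on both dummy variables, and the reference group is identified by having a 0 on each of them.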

Interpreting the Coefficients for Dummy-Coded Variables

by Robin Kouvaras

Learning Objective:
Interpret regression models with dummy-coded variables.

How to Interpret Regression Results

Now that you are familiar with how to create dummy-coded variables, we will discuss how to interpret your regression results. Below is the SPSS output using the marital status groups to predict the frequency of religious attendance using multiple regression. Below the regression output, there is also the SPSS output that shows the mean for religious attendance for each of the marital status groups.

SPSS output using the marital status groups to predict the frequency of religious attendance using multiple regression.

Coefficients^a

Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	4.328	.095	blank	45.627	.000
	Divorced	-1.239	.206	-.166	-6.009	.000
	Never Married	-1.190	.174	-.189	-6.825	.000

Legend for Coefficients^a
	p-value for the Never Married predictor variable.
	p-value for the Divorced predictor variable.

SPSS output that shows the mean for religious attendance for each of the marital status groups.

Descriptives
HOW OFTEN R ATTENDS RELIGIOUS SERVICES

Blank	N	Mean	Std. Deviation	Std. Error	95% Confidence Interval for the Mean		Minimum	Maximum
Blank	N	Mean	Std. Deviation	Std. Error	Lower Bound	Upper Bound	Minimum	Maximum
MARRIED	789	4.33	2.731	.097	4.14	4.52	0	8
DIVORCED	212	3.09	2.687	.185	2.73	3.45	0	8
NEVER MARRIED	332	3.14	2.484	.136	2.87	3.41	0	8
Total	1333	3.83	2.728	.075	3.69	3.98	0	8

Let’s focus on the unstandardized regression coefficients in the output. Each coefficient will indicate how that particular group compares to the reference category (e.g., married) on the dependent variable. The coefficient reflects the comparison between the mean value of the dependent variable for the reference category and the mean value for the group represented by that particular coefficient. For example, first, take a look at the unstandardized regression coefficient for “divorced” (-1.239). This value reflects how the divorced group compares to the married group on religious attendance and indicates that the mean religious attendance for the divorced group is 1.239 units lower than that for the married group.

A few more things about the output:

If you subtract the mean for divorced (3.09) from the mean for married (4.33), you can see that you get the absolute value of the coefficient for the divorced variable: 4.33 – 3.09 = 1.24. (If you round 1.239, you get 1.24.)

If the value had been positive (1.239 instead of -1.239), it would indicate that the divorced group had a higher mean than the married group on the dependent variable.

Similar to when you are interpreting the coefficients for continuous predictor variables in a regression model, the difference between the reference category and the indicated group is only considered to be statistically significant if the p-value is less than alpha. In our results above, if we assume an alpha of .05 (or even .01), each predictor would be statistically significant, indicating that each group (divorced, never married) differs from the reference category of married on the dependent variable.

Also similar to when you are interpreting the coefficients for continuous predictor variables in a regression model, you can use the absolute value of the standardized regression coefficients to gauge the effect size for each variable; values closer to 0 indicate weaker effects, and values closer to 1 indicate stronger effects.
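
The link between the group means and the dummy-variable coefficients can be verified with simple arithmetic. The sketch below uses the rounded means from the Descriptives table, so it reproduces the SPSS coefficients only to two decimal places; the variable names are illustrative.

```python
# Group means copied from the Descriptives table.
group_means = {"married": 4.33, "divorced": 3.09, "never married": 3.14}
reference = "married"

# The constant corresponds to the mean of the reference category.
constant = group_means[reference]

# Each dummy coefficient is that group's mean minus the reference mean.
coefficients = {
    group: round(mean - group_means[reference], 2)
    for group, mean in group_means.items()
    if group != reference
}

print(constant)
print(coefficients)
```

Compare these with the regression output: the constant is 4.328, and the coefficients for Divorced and Never Married are -1.239 and -1.190.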
