Why n-1 dummy variables




















The remaining material assumes familiarity with topics covered in previous lessons. The first task in our analysis is to assign values to the coefficients in our regression equation. Excel does the hard work behind the scenes and displays the result in a regression coefficients table. For now, the key outputs of interest are the least-squares estimates of the regression coefficients, which allow us to fully specify our regression equation.

This is the only linear equation that satisfies a least-squares criterion. That means this equation fits the data from which it was created better than any other linear equation.
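As a minimal sketch of what the least-squares criterion means, the Python snippet below fits a two-predictor equation with NumPy and checks that nudging any coefficient only increases the sum of squared errors. The data values and the names `iq`, `gender`, and `score` are made up for illustration; they are not the data set analyzed in this lesson.

```python
import numpy as np

# Hypothetical data (not the lesson's data set):
# IQ, gender (1 = male, 0 = female), and test score for eight students.
iq = np.array([110.0, 95.0, 120.0, 100.0, 130.0, 105.0, 115.0, 90.0])
gender = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
score = np.array([78.0, 65.0, 88.0, 72.0, 95.0, 74.0, 85.0, 60.0])

# Design matrix: a column of ones for the intercept, then the two predictors.
X = np.column_stack([np.ones(len(iq)), iq, gender])

# Least-squares estimates b0, b1, b2 for score = b0 + b1*IQ + b2*Gender.
coef, *_ = np.linalg.lstsq(X, score, rcond=None)

def sse(b):
    """Sum of squared errors for the coefficient vector b."""
    residuals = score - X @ b
    return float(residuals @ residuals)

best = sse(coef)

# Nudge each coefficient in turn; every nudge makes the fit worse.
nudges = [np.array([1.0, 0.0, 0.0]),
          np.array([0.0, 0.05, 0.0]),
          np.array([0.0, 0.0, -2.0])]
perturbed = [sse(coef + delta) for delta in nudges]

print("coefficients:", coef)
print("SSE at the least-squares fit:", best)
print("SSE after nudging each coefficient:", perturbed)
```

The comparison only samples three alternative equations, so it illustrates the criterion rather than proving it.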

The fact that our equation fits the data better than any other linear equation does not guarantee that it fits the data well. We still need to ask: How well does our equation fit the data? To answer this question, researchers look at the coefficient of multiple determination (R²).
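To make the coefficient of multiple determination concrete, here is a small sketch that computes it as one minus the ratio of unexplained to total variation. The function name `r_squared` and the numbers are illustrative assumptions, not Excel output from this lesson.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    # Variation left unexplained by the predictions.
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total variation of the observations around their mean.
    sst = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - sse / sst

# Hypothetical observed scores and two sets of predictions.
observed = [78.0, 65.0, 88.0, 72.0, 95.0]
good_fit = [77.0, 66.0, 87.0, 73.0, 94.0]      # each prediction is off by 1 point
no_better_than_mean = [79.6] * 5               # always predicts the mean score

print("close predictions:", r_squared(observed, good_fit))
print("mean-only predictions:", r_squared(observed, no_better_than_mean))
```

Predictions that track the observations give a value near 1, while predicting the mean for everyone gives a value of 0.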

When the regression equation fits the data well, R² will be large (that is, close to 1). Luckily, the coefficient of multiple determination is a standard output of Excel and most other analysis packages. For our equation, Excel reports an R² large enough to say that the equation fits the data pretty well.

At this point, we'd like to assess the relative importance of our independent variables. We do this by testing the statistical significance of regression coefficients.

Before we conduct those tests, however, we need to assess multicollinearity between independent variables. If multicollinearity is high, significance tests on regression coefficients can be misleading. But if multicollinearity is low, the same tests can be informative. To measure multicollinearity for this problem, we can try to predict IQ based on Gender. That is, we regress IQ against Gender. The resulting coefficient of multiple determination (R²_k) is an indicator of multicollinearity.

When R²_k is large, multicollinearity is a problem. For this problem, R²_k was very small. Given this result, we can proceed with statistical analysis of our independent variables.
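The multicollinearity check can be sketched as follows: regress one independent variable on the others and report the resulting coefficient of determination. The helper name `r2_k` and the data are hypothetical, and no cutoff value is assumed here.

```python
import numpy as np

def r2_k(predictors, k):
    """R^2 from regressing column k of `predictors` on the remaining columns."""
    target = predictors[:, k]
    others = np.delete(predictors, k, axis=1)
    X = np.column_stack([np.ones(len(target)), others])  # add an intercept
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    sst = np.sum((target - target.mean()) ** 2)
    return 1.0 - float(residuals @ residuals) / float(sst)

# Hypothetical predictors: column 0 is IQ, column 1 is gender (1/0).
predictors = np.array([
    [110.0, 1.0], [95.0, 0.0], [120.0, 1.0], [100.0, 0.0],
    [130.0, 0.0], [105.0, 1.0], [115.0, 0.0], [90.0, 1.0],
])

print("R^2_k for IQ regressed on gender:", r2_k(predictors, 0))
```

With only two independent variables, this value equals the squared correlation between them; with more variables, the same function regresses variable k on all of the others at once.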

With multiple regression, there is more than one independent variable; so it is natural to ask whether a particular independent variable contributes significantly to the regression after effects of other variables are taken into account. The answer to this question can be found in the regression coefficients table.

The regression coefficients table shows the following information for each coefficient: its value, its standard error, a t-statistic, and the significance of the t-statistic. In this example, the t-statistics for IQ and gender are both statistically significant. This means that IQ predicts test score beyond chance levels, even after the effect of gender is taken into account.
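For readers who want to see where the first three columns of such a table come from, the sketch below computes coefficients, standard errors, and t-statistics with NumPy. The function name `coefficient_table` and the data are assumptions made for illustration; the numbers will not match the Excel table from this lesson.

```python
import numpy as np

def coefficient_table(X, y):
    """Return least-squares coefficients, standard errors, and t-statistics.

    X must already include a leading column of ones for the intercept.
    """
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    sigma2 = float(residuals @ residuals) / (n - p)  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Hypothetical data (not the lesson's data set).
iq = np.array([110.0, 95.0, 120.0, 100.0, 130.0, 105.0, 115.0, 90.0])
gender = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
score = np.array([78.0, 65.0, 88.0, 72.0, 95.0, 74.0, 85.0, 60.0])
X = np.column_stack([np.ones(len(iq)), iq, gender])

coef, se, t = coefficient_table(X, score)
for name, b, s, tv in zip(["intercept", "IQ", "gender"], coef, se, t):
    print(f"{name:>9}: coefficient={b:.3f}  std. error={s:.3f}  t={tv:.2f}")
```

The significance column is obtained by comparing each t-statistic with a t distribution that has n - p degrees of freedom, where p counts the intercept and the independent variables; that step is left out of this sketch.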

You could also create dummy variables for all levels in the original variable, and simply drop one from each analysis. To create these variables, we take 3 of the levels of "year of school" and create a variable corresponding to each level, which will have the value of yes or no (that is, 1 or 0). In this instance, we can create variables called "sophomore," "junior," and "senior."
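The coding described above can be sketched in a few lines of Python. The helper `dummy_code` and the sample values are hypothetical. The second half of the snippet shows why one level is left out: when the model has an intercept, a dummy for every level adds a column but no new information.

```python
import numpy as np

def dummy_code(values, levels, reference):
    """Return one 0/1 column per level except `reference` (the not-coded level)."""
    coded = [lv for lv in levels if lv != reference]
    return {lv: [1 if v == lv else 0 for v in values] for lv in coded}

# Hypothetical sample of the "year of school" variable.
year = ["freshman", "sophomore", "junior", "senior",
        "sophomore", "freshman", "senior", "junior"]
levels = ["freshman", "sophomore", "junior", "senior"]

dummies = dummy_code(year, levels, reference="freshman")
for name, column in dummies.items():
    print(name, column)

# Why n - 1 and not n: the n dummy columns always sum to the column of ones,
# so adding the last dummy next to an intercept makes the columns redundant.
ones = np.ones(len(year))
n_minus_1 = np.column_stack([ones] + [dummies[lv] for lv in dummies])
freshman = [1 if v == "freshman" else 0 for v in year]
all_n = np.column_stack([n_minus_1, freshman])

print("n - 1 dummies: columns =", n_minus_1.shape[1],
      "rank =", np.linalg.matrix_rank(n_minus_1))
print("n dummies:     columns =", all_n.shape[1],
      "rank =", np.linalg.matrix_rank(all_n))
```

When the rank is smaller than the number of columns, the least-squares coefficients are no longer uniquely determined, which is the practical reason for coding only n - 1 of the n levels.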

Interpreting results

The decision as to which level is not coded is often arbitrary. The level which is not coded is the category to which all other categories will be compared. As such, the biggest group is often the not-coded category. For example, "Caucasian" will often be the not-coded group if that is the race of the majority of participants in the sample. In that case, if you have a variable called "Asian", the coefficient on the "Asian" variable in your regression will show the effect that being Asian rather than Caucasian has on your dependent variable.
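A short sketch can illustrate the comparison to the not-coded category. It assumes a model in which the dummy variables are the only predictors; in that case the intercept equals the mean of the not-coded group, and each dummy coefficient equals the difference between that group's mean and the not-coded group's mean. The depression scores below are invented for illustration.

```python
import numpy as np

# Hypothetical depression scores by year of school (illustrative values only).
year = np.array(["freshman", "freshman", "sophomore", "sophomore",
                 "junior", "junior", "senior", "senior"])
depression = np.array([10.0, 12.0, 14.0, 16.0, 9.0, 11.0, 13.0, 17.0])

# "freshman" is the not-coded level, so it gets no column of its own.
coded_levels = ["sophomore", "junior", "senior"]
X = np.column_stack([np.ones(len(year))] +
                    [(year == lv).astype(float) for lv in coded_levels])
coef, *_ = np.linalg.lstsq(X, depression, rcond=None)

freshman_mean = depression[year == "freshman"].mean()
print("intercept:", coef[0], "| freshman mean:", freshman_mean)
for lv, b in zip(coded_levels, coef[1:]):
    diff = depression[year == lv].mean() - freshman_mean
    print(f"{lv}: coefficient = {b:.2f} | mean difference from freshman = {diff:.2f}")
```

Once other predictors are added to the model, the dummy coefficients become differences from the not-coded group after adjusting for those predictors, so they no longer equal the raw mean differences.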

In our example, "freshman" was not coded so that we could determine if being a sophomore, junior, or senior predicts a different depressive level than being a freshman. Whenever you have a regression model with dummy variables, you can always see how the variables are being used to represent multiple subgroup equations by following the two steps described above.
