Data Analysis Using Epi Info™ 7: Hypothesis Testing and Regression

In Week Two, you became familiar with the YRBS dataset by completing some basic data analysis on three research questions. That assignment helped you get to know your data and begin to identify preliminary results that the data may support. In Week Three, we will take the analysis one step further and do some hypothesis testing using logistic regression. Upon completion of this assignment, you will be able to apply these skills to the topic you choose for your Final Paper.

Assignment Instructions:
Use the Epi Info™ 7 Quick Start Guide, v.0.2.2, as a resource and complete the tasks below.

First, you will need to open your saved canvas file:

    1. Launch the Epi Info™ 7 program that you saved to your computer at the beginning of the Week Two assignment and, from the Menu screen, select Visual Dashboard.
    2. Retrieve the Epi Info™ 7 canvas file that you saved for the Week Two assignment. The dataset should already be saved within this file.

Next, you will need to recode your dependent variable:

    1. When doing advanced statistics, you sometimes have to recode your data so that your statistical software can recognize which values make up the reference group. Epi Info™ 7 requires that, for a logistic regression, your outcome/dependent variable be coded as 1 or 0 or as yes/no. For the purposes of this assignment, we are going to recode our dependent variable, qn48 (marijuana use in the past 30 days), as 1 and 0.
    2. To do this, locate the “Defined Variables” feature on the left-hand side of the Visual Dashboard canvas and place your mouse over it; a box should emerge. Click on New Variable, then click on With Assigned Expression. A new box titled “Add Variable with Expression” should appear. In this box, under “Assigned Field”, enter 2-qn48. Under “Data type”, choose Numeric. Click OK and the box will disappear. This creates a new variable titled qn48_recoded.

      You may want to check to see what you’ve done. To do this, access the “Analysis” gadget and select the Frequency of qn48 and of qn48_recoded. Compared to qn48, the 1’s (yeses) are unchanged, but the 2’s (noes) in qn48 have been recoded to 0’s in qn48_recoded. You should use qn48_recoded as your outcome variable (dependent variable) in all of the subsequent logistic regressions in this assignment.

      Epi Info is quite flexible, so you could also recode qn48 using the Defined Variables > New Variable > With Recoded Value option. You may want to explore this option if algebra is not your forte. If you recall from the Week Two assignment, we used the expression “2-qn48” when creating a new variable. Whenever a qn48 value is 1, 2-qn48 also equals 1, so the 1 value remains unchanged. However, when a qn48 value is 2, 2-qn48 equals 0, so the 2 values become 0 in the new variable.
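
The arithmetic behind the expression can also be checked outside Epi Info. The following is a minimal Python sketch, not an Epi Info feature: the function name recode_qn48 and the sample responses are hypothetical, and it assumes qn48 is coded 1 (yes), 2 (no), or missing.

```python
from collections import Counter

def recode_qn48(value):
    """Apply the assignment's expression 2 - qn48; keep missing values missing."""
    if value is None:
        return None
    return 2 - value

# Hypothetical responses for illustration only (not YRBS data).
sample_qn48 = [1, 2, 2, 1, None, 2]
sample_recoded = [recode_qn48(v) for v in sample_qn48]

# Frequency check, analogous to comparing the Frequency output
# for qn48 and qn48_recoded in the Visual Dashboard.
print(Counter(sample_qn48))
print(Counter(sample_recoded))
```

The count of 1's should match between the two tallies, and every 2 in the first should appear as a 0 in the second.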

Now, you will use advanced statistics to obtain the logistic regression output for your outcome variable:

    1. Right-click on the blank center canvas screen, hover over “Add Analysis Gadget,” hover over “Advanced Statistics,” and select Logistic Regression from the menu.
    2. Using your recoded outcome variable of “marijuana use” (qn48_recoded), determine which demographic variables are significant predictors of marijuana use. Use the following variables as your demographic variables:
      • Age (q1)
      • Gender (q2)
      • Grade (q3)
      • Race (RACEETH)
    3. Once you have a logistic regression table, you can export the output to Excel by right-clicking on a blank (white) portion of the canvas screen and choosing the option “Send output to > Microsoft Excel.” From Excel, copy and paste each table output that you have created into one tab within the same Excel file. The tab should be labeled Wk3_LogisticRegression. Save the entire Excel file as Firstname_Lastname_Week 3_Assignment.
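
To help interpret a logistic regression table, the sketch below fits a one-predictor logistic model in plain Python and shows that the exponentiated coefficient equals the familiar cross-product odds ratio. It is only an illustration under stated assumptions: Epi Info does the fitting for you, the function fit_logistic is hypothetical, the counts are made up, and the model here has a single 0/1 predictor rather than the four demographic variables above.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit y ~ b0 + b1*x by Newton-Raphson and return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up counts for illustration only (not YRBS results):
# predictor = 1: 30 with the outcome, 20 without
# predictor = 0: 10 with the outcome, 40 without
xs = [1] * 50 + [0] * 50
ys = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40

b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)                 # odds ratio implied by the coefficient
cross_product = (30 * 40) / (20 * 10)     # odds ratio from the 2x2 table
print(round(odds_ratio, 4), cross_product)
```

With several predictors in the model, each exponentiated coefficient is instead an odds ratio adjusted for the other variables, which is why it can differ from a simple two-by-two calculation.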

Additionally, you will analyze risk factors in conjunction with your outcome variable:

    1. Using the controls of gender (q2), grade (q3), and race (RACEETH), determine if cigarette smoking (qn31) is a risk factor for marijuana use. That is, run a logistic regression with gender, grade, RACEETH, and cigarette smoking as your independent variables.
    2. Use the following additional variables for other risky behaviors: sexual behavior (qn60), alcohol use (qn43), and ecstasy (qn55). Determine whether they are associated with marijuana use (qn48). Use the same controls as listed above (gender, grade, and race).
    3. To start, run your models separately, one each for sexual behavior (qn60), alcohol use (qn43), and ecstasy (qn55), respectively, while controlling for the demographic variables (gender, grade, and race).
    4. Then, run one model with all three variables (sexual behavior, alcohol use, and ecstasy) and the demographic variables (gender, grade, and race).
    5. Once you have your data tables, you can export the output to Excel by right-clicking on a blank (white) portion of the canvas screen and choosing the option “Send output to > Microsoft Excel.” From Excel, copy and paste each table output that you have created into one tab within the same Excel file. The tab should be labeled Wk3_RiskFactors. Save the entire Excel file as Firstname_Lastname_Week 3_Assignment.
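
The risk-factor steps above call for five models in total. As a checklist, the following Python sketch lists the independent variables for each one. The variable names come from this assignment; the function build_model_plan and its labels are hypothetical and serve only as a planning aid, not as something Epi Info requires.

```python
OUTCOME = "qn48_recoded"                    # recoded marijuana use
CONTROLS = ["q2", "q3", "RACEETH"]          # gender, grade, race
RISK_BEHAVIORS = ["qn60", "qn43", "qn55"]   # sexual behavior, alcohol use, ecstasy

def build_model_plan():
    """Return (label, independent variables) for each model in this section."""
    plan = [("cigarette smoking", CONTROLS + ["qn31"])]
    for var in RISK_BEHAVIORS:
        plan.append((f"separate: {var}", CONTROLS + [var]))
    plan.append(("combined", CONTROLS + RISK_BEHAVIORS))
    return plan

for label, predictors in build_model_plan():
    print(f"{label}: {OUTCOME} ~ {' + '.join(predictors)}")
```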

Finally, address the following as you summarize your results:

      • What were your null and alternative hypotheses for each research question? The research questions were:
        • “What demographic variables are predictors of marijuana use?” (Step #6)
        • “Is cigarette smoking a risk factor for marijuana use?” (Step #8)
        • “Are other risky behaviors associated with marijuana use?” (Steps #10-11)
      • Summarize the results of your logistic models.
      • Based on these results, should you reject or fail to reject your null hypotheses?
      • From Steps #10-11, compare your three separate models to the model that included all of the variables. If any differences were found between the models, summarize the differences you saw in the results.
      • Is it best to use separate models or one model with all the variables included? Justify your answer.
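
When deciding whether to reject a null hypothesis, it can help to see how a p-value follows from a coefficient and its standard error. The Python sketch below is an illustration under assumptions, not part of the assignment: it uses the normal-approximation (Wald) test, the significance level of 0.05 is an assumed convention rather than a threshold set here, and the coefficient and standard error are hypothetical values, not results from the YRBS data.

```python
import math

def wald_p_value(coefficient, standard_error):
    """Two-sided p-value for H0: coefficient = 0, using the normal approximation."""
    z = coefficient / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

def decision(p_value, alpha=0.05):
    """alpha = 0.05 is an assumed threshold, not one set by the assignment."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical coefficient and standard error, for illustration only.
p = wald_p_value(0.9, 0.3)
print(round(p, 4), decision(p))
```

Applying the same rule to the same variable in the separate and combined models is one way to see whether adjusting for the other risky behaviors changes your conclusion.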

       

      Your assignment must be two to three pages long (excluding title, reference, and analysis output pages) and formatted according to APA style.