Disclaimer: This is an example of a student-written essay.


Relationship Between Behavioral, Sociodemographic Factors and Body Mass Index in Adults

Paper Type: Free Essay Subject: Health
Wordcount: 10032 words Published: 8th Feb 2020


Using Regression Analysis to Establish the Relationship Between Behavioral, Sociodemographic Factors and Body Mass Index in Adults.


Accelerated rates of urbanization, changing population conditions and a general evolution in behavior patterns are altering the epidemiological and demographic outlook throughout the world (Cournot et al., 2004). Body mass index (BMI) is commonly used as an indicator of weight status. Some factors that drive the increase in adult BMI levels are components of unhealthy lifestyles. Lifelong behaviors are formed during the transition between dependent and independent living, which often occurs in the college years, so this is an important time to focus on building healthy habits and eliminating unhealthy ones (Beerman, 1991). For example, advances in technology have led to a marked increase in screen time and sedentary behavior among adolescents, often coupled with lower energy expenditure. Other risk behaviors that tend to emerge in adolescence, such as alcohol consumption and cigarette smoking, have also been linked to increased percent body fat, overweight and obesity (Pasch et al., 2012). Research has shown that individuals with a higher BMI are more likely to experience obesity-related health problems.


Concern about obesity among older adults is growing, and research examining behaviors associated with the risk of weight gain in this population is needed. Because of the growth of the aging population and a rise in the prevalence of overweight and obesity, modifying the risk factors for, and consequences of, excess weight in older adults is critical (Kruger et al., 2008). We examined variables associated with body mass index. We hypothesized that certain behavioral and sociodemographic factors are associated with higher BMI among adults in the United States. We performed model selection for a linear regression model to identify the behavioral and sociodemographic variables associated with our health outcome of interest, and we interpreted the resulting model. The aim of this project is to use regression analysis to establish the relationship between behavioral and sociodemographic variables and body mass index in the adult population aged 18 years or older in the United States.

Data Characteristics

This project used Behavioral Risk Factor Surveillance System (BRFSS) data obtained from the Centers for Disease Control and Prevention (CDC) website (www.cdc.gov/brfss). The BRFSS is a cross-sectional telephone survey conducted by state health departments in all 50 states of the United States, the District of Columbia, Puerto Rico and the U.S. Virgin Islands (Laflamme and Vanderslice, 2004). The BRFSS questionnaires are developed by the CDC and collect prevalence data among U.S. residents on risk behaviors and preventive health practices that can affect their health status. The BRFSS primarily collects data on chronic diseases, injuries, infectious illnesses and the behavioral factors underlying these conditions (Figgs et al., 2000). Respondent data are forwarded to the CDC to be aggregated for each state, returned with standard tabulations, and published at year's end by each state. The BRFSS data are available for public use.

For this linear regression model, we utilized the BRFSS data for the United States collected in the year 2016. Our target population is adults aged 18 years or older in the United States. Specifically, our study seeks to answer the following two major questions:

a) What health behavior and socio-demographic variables are predictive of increased body mass index among adults in the United States?

b) How strongly do health behavior and socio-demographic variables predict body mass index among adults in the United States?

Using the BRFSS codebook, we selected one dependent variable and twenty (20) independent variables.

Dependent Variable: Body mass index (BMI), calculated by dividing a person's weight in kilograms by the square of their height in metres. BMI is measured in kg/m².

Independent variables: (1) Age, (2) Weight in pounds, (3) Education Level, (4) Income Level, (5) Sleep Pattern/Time, (6) Smoking Status, (7) Use of E-cigarettes, (8) Race/Ethnicity, (9) Physical Activity, (10) Asthma Status, (11) Health Care Coverage, (12) Heavy Alcohol Consumption, (13) Health Literacy, (14) Access to Medical Advice, (15) Routine Medical Checkup, (16) Coronary Heart Disease/Myocardial Infarction, (17) Diabetes Status, (18) Kidney Disease, (19) Depressive Disorder, (20) Chronic Obstructive Pulmonary Disease.
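To make the definition of the dependent variable concrete, the sketch below computes BMI from metric units and, because weight is also available in pounds, from pounds and inches using the conventional conversion factor of 703. The function names are hypothetical and not part of the BRFSS files; the example also shows why weight is so closely tied to BMI.

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """BMI in kg/m^2: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """The same index from pounds and inches; 703 converts lb/in^2 to kg/m^2."""
    if height_in <= 0:
        raise ValueError("height must be positive")
    return 703 * weight_lb / height_in ** 2


if __name__ == "__main__":
    # A 70 kg adult who is 1.75 m tall (roughly 154 lb and 69 in).
    print(round(bmi_metric(70, 1.75), 1))   # 22.9
    print(round(bmi_imperial(154, 69), 1))  # 22.7
```

The two results differ slightly only because 154 lb and 69 in are rounded conversions of 70 kg and 1.75 m.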

Treatment of Data

The original BRFSS data contained missing values, which we identified using the codebook. None of the twenty independent variables had missing values exceeding 10% of its observations. We recoded the numerical codes for missing values listed in the codebook to missing-value codes recognized by Stata. We conducted univariate analysis of the dependent and independent variables to describe the data set and find patterns in it. We renamed the independent variables for ease of interpretation and constructed a histogram for each independent variable to examine its frequency distribution and check for skewness; all variables showed approximately normal distributions. Because BMI is recorded with two implied decimal places, we created a new BMI variable by dividing the recorded values by 100. We assigned value labels to the levels (categories) of the independent variables and generated summary statistics using Stata (version 15.1; StataCorp, College Station, TX) to examine the means and standard deviations of all variables.
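The recoding steps described above can be sketched in plain Python. This is an illustration under assumptions, not the Stata code used in the study: many BRFSS yes/no items use 7 for "Don't know/Not sure" and 9 for "Refused", but the actual codes differ by variable and must be taken from the codebook, and all names below are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Set

# Hypothetical example codes: 7 = "Don't know/Not sure", 9 = "Refused".
# Real missing-value codes must be read from the codebook for each variable.
SMOKER_MISSING_CODES: Set[int] = {7, 9}


def recode_missing(values: Iterable[int], missing_codes: Set[int]) -> List[Optional[int]]:
    """Replace codebook missing-value codes with None (Python's missing marker)."""
    return [None if v in missing_codes else v for v in values]


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Share of observations that are missing (used for the 10% screen)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def bmi_from_implied_decimals(values: Iterable[Optional[int]]) -> List[Optional[float]]:
    """Convert BMI stored with two implied decimals (2734 -> 27.34)."""
    return [None if v is None else v / 100 for v in values]


if __name__ == "__main__":
    smoker = recode_missing([1, 2, 7, 2, 9], SMOKER_MISSING_CODES)
    print(smoker)                    # [1, 2, None, 2, None]
    print(missing_fraction(smoker))  # 0.4
    print(bmi_from_implied_decimals([2734, 1850, None]))  # [27.34, 18.5, None]
```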


Regression Model Building Process

Bivariate Analysis

We examined the relationship between the dependent variable and each of the independent variables. For the continuous independent variable weight, we constructed a scatter plot of BMI against weight with a lowess curve to check for departures from the linearity assumption. The lowess curve was approximately a straight line, indicating that the linearity assumption between weight and BMI is satisfied. Sleep time was also treated as a continuous variable, and we constructed a scatter plot with a lowess curve in the same way. The lowess curve, which is built from locally weighted fits to neighboring observations, was approximately linear, so a linear fit was judged satisfactory. However, the two-way scatter plot of BMI and sleep time showed that the homoscedasticity assumption was not met.
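The lowess curves referred to above can be sketched as follows. This is a simplified LOWESS (locally weighted scatterplot smoothing) in plain Python, assuming tricube weights, a local straight-line fit and a neighborhood covering 80% of the data; it omits the robustness iterations of the full procedure and is not Stata's implementation. If the smoothed values track a straight line, the linearity assumption looks reasonable.

```python
from typing import List, Sequence


def lowess(x: Sequence[float], y: Sequence[float], frac: float = 0.8) -> List[float]:
    """Minimal LOWESS smoother: a tricube-weighted straight-line fit around each
    x value, without the robustness iterations of the full algorithm."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    k = min(n, max(3, round(frac * n)))  # size of each local neighborhood
    fitted: List[float] = []
    for x0 in x:
        # The k observations closest to x0 form the local neighborhood.
        nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
        h = max(abs(x[i] - x0) for i in nearest)
        weights = []
        for i in nearest:
            u = abs(x[i] - x0) / h if h > 0 else 0.0
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube weight
        sw = sum(weights)  # always >= 1 because x0 itself gets weight 1
        xbar = sum(w * x[i] for w, i in zip(weights, nearest)) / sw
        ybar = sum(w * y[i] for w, i in zip(weights, nearest)) / sw
        sxx = sum(w * (x[i] - xbar) ** 2 for w, i in zip(weights, nearest))
        sxy = sum(w * (x[i] - xbar) * (y[i] - ybar) for w, i in zip(weights, nearest))
        # Weighted least-squares line evaluated at x0 (flat line if x does not vary).
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

Because each local fit is a weighted least-squares line, perfectly linear data are returned unchanged, which is a convenient sanity check.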

Box plots constructed for age, income, race/ethnicity, education, heavy alcohol consumption, smoking status, health care coverage, medical advice and health literacy showed approximately equal variances across the categories of each variable. Box plots of e-cigarette usage, physical activity, current asthma status, myocardial infarction, kidney disease, diabetes status and chronic obstructive pulmonary disease showed unequal variances across categories; hence, these variables did not meet the homoscedasticity assumption. We then fitted a simple linear regression of BMI on each independent variable and recorded the number of observations (N), the regression coefficient, R-square, F value and p value (Table 2).
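Each bivariate model summarized in Table 2 is a simple least-squares regression. The sketch below, a plain-Python illustration with a hypothetical function name, computes the quantities recorded there except the p value: the slope, R-square and the F statistic, which for a single predictor equals R²(N − 2)/(1 − R²). The p value would then come from an F distribution with 1 and N − 2 degrees of freedom, for example via scipy.stats.f.sf if SciPy is available.

```python
from typing import Dict, Sequence


def simple_regression_summary(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Least-squares fit of y on a single predictor x, returning N, intercept,
    slope, R-square and the F statistic (df = 1, N - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # squared correlation between x and y
    f = float("inf") if r2 == 1 else r2 * (n - 2) / (1 - r2)
    return {"n": n, "intercept": my - slope * mx, "slope": slope, "r2": r2, "f": f}


if __name__ == "__main__":
    print(simple_regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```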


Table 2. Summary of bivariate analysis showing, for each variable, the number of observations (N), regression coefficient, coefficient of determination (R-square), F value and p value.

Variable                   N      Coefficient  R-square  F value    p value
Age                        4647   -0.15        0.0264    25.19      <0.0001
Weight in pounds           4644   -0.12        0.7643    13657.20   <0.0001
Education Level            4639   -0.43        0.0054    8.46       <0.0001
Income Categories          3759   -0.47        0.0129    12.22      <0.0001
Sleep Pattern/Time         3222   -0.23        0.0042    13.51      0.0002
Smoking Status             4546   -1.09        0.0042    19.01      <0.0001
E-cigarette Usage          4560   -0.01        0.000     0.00       0.98
Race/Ethnicity             4602   -0.99        0.0381    45.48      <0.0001
Physical Activity          4635   1.82         0.0166    78.26      <0.0001
Current Asthma Status      4616   2.05         0.0085    39.70      <0.0001
Health Care Coverage       2883   -0.45        0.0005    1.37       0.24
Heavy Alcohol Consumption  4446   -1.54        0.0025    11.17      0.0008
Health Literacy            4150   0.27         0.0018    2.50       0.0581
Medical Advice             4388   -0.006       0.0003    0.35       0.8412
Routine Checkup            4541   -0.32        0.0016    2.48       0.0594
Coronary Heart Disease     4583   -0.52        0.0007    3.19       0.0740
Diabetes Status            4639   -1.86        0.0513    250.18     <0.0001
Kidney Disease             4625   -1.13        0.0012    5.61       0.0179
Depressive Disorder        4625   -1.35        0.0076    35.24      <0.0001
COPD                       4630   -1.023       0.0026    12.01      0.0005


Based on the bivariate analysis, we discarded any variable with a univariate p value greater than 0.25. The purpose of this deliberately liberal cutoff is to avoid discarding variables that could still confound the association between a predictor of primary interest and the outcome. We observed that the R-square and F value for the weight variable are extremely large compared with those of the other independent variables, which suggests that weight is a strong predictor of BMI. We nevertheless eliminated the weight variable because, as BMI is computed from weight and height, weight is essentially an alternative measure of body mass index.
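Applied literally to the p values in Table 2, the screening rule is a simple filter. The sketch below transcribes those p values (treating "<0.0001" as 0.0001) and splits the variables at the 0.25 cutoff; under this reading, health care coverage (p = 0.24) falls just inside the cutoff. Weight, which was removed for a separate reason, is included only for completeness, and the function name screen is hypothetical.

```python
from typing import Dict, List, Tuple

# Univariate p values transcribed from Table 2; "<0.0001" is entered as 0.0001.
TABLE2_P: Dict[str, float] = {
    "Age": 0.0001,
    "Weight in pounds": 0.0001,
    "Education Level": 0.0001,
    "Income Categories": 0.0001,
    "Sleep Pattern/Time": 0.0002,
    "Smoking Status": 0.0001,
    "E-cigarette Usage": 0.98,
    "Race/Ethnicity": 0.0001,
    "Physical Activity": 0.0001,
    "Current Asthma Status": 0.0001,
    "Health Care Coverage": 0.24,
    "Heavy Alcohol Consumption": 0.0008,
    "Health Literacy": 0.0581,
    "Medical Advice": 0.8412,
    "Routine Checkup": 0.0594,
    "Coronary Heart Disease": 0.0740,
    "Diabetes Status": 0.0001,
    "Kidney Disease": 0.0179,
    "Depressive Disorder": 0.0001,
    "COPD": 0.0005,
}


def screen(pvalues: Dict[str, float], threshold: float = 0.25) -> Tuple[List[str], List[str]]:
    """Split variables into (kept, dropped) using the univariate p-value cutoff."""
    kept = [name for name, p in pvalues.items() if p <= threshold]
    dropped = [name for name, p in pvalues.items() if p > threshold]
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = screen(TABLE2_P)
    print(dropped)  # ['E-cigarette Usage', 'Medical Advice']
```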

Variable Selection

For our model, we used forward stepwise selection with a p value for entry of 0.15 and a p value for removal of 0.20 (Stata options pe(.15) and pr(.20)). Our aim was to explain the data as simply as possible by eliminating redundant predictors, which would otherwise complicate the estimation of the quantities we are interested in. The results of the forward stepwise selection are summarized in Table 3.

Table 3. Summary of forward stepwise variable selection showing parameter estimates, standard error (SE) estimates, p values and 95% confidence intervals (CI).

Variable                                Code/Scale  Coefficient  SE      p value    95% CI
Age
  35-44 yrs                             3           2.35         0.46    <0.0001    1.45, 3.24
  45-54 yrs                                         0.68         0.41    <0.0001    -0.12, 1.48
  >65 yrs                                           -1.35        0.34    <0.0001    -2.02, -0.68
Education Level
  Did not graduate high school          3           0.67         0.34    0.051      -0.002, 1.34
  Graduated high school                             -0.35        0.38    0.343*     -1.09, 0.38
Income Level
  $15-25k                               2           -0.57        0.37    0.128      -1.30, 0.16
  >$50k                                 5           -0.81        0.35    0.021      -1.50, -0.13
Sleep Pattern/Time                                  -0.20        0.08    0.007      -0.35, -0.06
Smoking Status                                      -2.09        0.38    <0.0001    -2.82, -1.35
Race/Ethnicity
  Black only, Non-Hispanic              2           2.23         0.33    <0.0001    1.59, 2.87
  Other race only, Non-Hispanic         3           -1.77        1.03    0.086      -3.78, 0.25
Physical Activity                                   1.39         0.31    <0.0001    0.79, 2.01
Heavy Alcohol Consumption                           -1.71        0.65    0.008      -2.97, -0.44
Diabetes                                            -1.67        0.18    <0.0001    -2.01, -1.32
Depressive Disorder                                 -0.73        0.34    0.031      -1.39, -0.07
Chronic Obstructive Pulmonary Disease               -0.82        0.44    0.067      -1.69, 0.06
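The forward stepwise procedure with an entry threshold of 0.15 and a removal threshold of 0.20 can be sketched as follows. Model fitting is abstracted behind a hypothetical callback, fit_pvalues, that refits the regression for a given set of variables and returns their p values; the loop shows only the entry and removal logic and is not a reproduction of Stata's stepwise command, whose details (for example, how factor variables are entered as a group) may differ.

```python
from typing import Callable, Dict, List, Optional, Sequence

# fit_pvalues(variables) should refit the model with exactly these variables and
# return each one's p value; it is a hypothetical callback, not a library API.
PValueFn = Callable[[Sequence[str]], Dict[str, float]]


def forward_stepwise(candidates: Sequence[str], fit_pvalues: PValueFn,
                     p_enter: float = 0.15, p_remove: float = 0.20,
                     max_steps: int = 100) -> List[str]:
    """Forward stepwise selection with separate entry and removal thresholds."""
    # Requiring p_enter < p_remove reduces the risk of a variable being added and
    # removed repeatedly; max_steps is a further guard against cycling.
    if p_enter >= p_remove:
        raise ValueError("p_enter must be smaller than p_remove")
    selected: List[str] = []
    for _ in range(max_steps):
        changed = False
        # Entry: add the candidate with the smallest p value below p_enter.
        best: Optional[str] = None
        best_p = p_enter
        for var in candidates:
            if var in selected:
                continue
            p = fit_pvalues(selected + [var])[var]
            if p < best_p:
                best, best_p = var, p
        if best is not None:
            selected.append(best)
            changed = True
        # Removal: drop the weakest selected variable if its p value reaches p_remove.
        if selected:
            pvals = fit_pvalues(selected)
            worst = max(selected, key=lambda v: pvals[v])
            if pvals[worst] >= p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected
```

The removal step matters when a variable that entered early loses significance once a correlated variable joins the model.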

Because the p value for high school graduates within the education-level variable was large (p = 0.343), we performed a partial F-test of the education-level terms jointly, with the other covariates held in the model. The F-test produced a p value of 0.4513, so we deleted education level from the model.
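The partial F-test compares the residual sum of squares (RSS) of the model with and without the education-level terms: F = [(RSS_reduced − RSS_full)/q] / [RSS_full/df_full], where q is the number of terms dropped and df_full is the residual degrees of freedom of the full model. The sketch below computes only the statistic; the RSS values in the example are made up for illustration, and the p value would come from an F distribution with q and df_full degrees of freedom (for example scipy.stats.f.sf, if SciPy is available).

```python
def partial_f(rss_reduced: float, rss_full: float, q: int, df_full: int) -> float:
    """Partial F statistic for dropping q terms from a linear model.

    rss_reduced: residual sum of squares without the q terms
    rss_full:    residual sum of squares with them
    df_full:     residual degrees of freedom of the full model (n - p - 1)
    """
    if q <= 0 or df_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    return ((rss_reduced - rss_full) / q) / (rss_full / df_full)


if __name__ == "__main__":
    # Made-up values: dropping 2 terms raises the RSS from 100 to 110.
    print(partial_f(110.0, 100.0, 2, 100))  # 5.0
```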

Covariates for the final model: age, income level, sleep pattern/time, smoking status, race/ethnicity, physical activity, diabetes status, heavy alcohol consumption, depressive disorder and chronic obstructive pulmonary disease.


Model Checking

We created a histogram and a kernel density plot of the model residuals, with a normal density overlaid, to check whether the residuals are normally distributed. The distribution of the residuals was bell-shaped and approximately normal. In the Q-Q plot, the residuals deviated slightly from the inverse normal quantile line in both tails. However, since the departure from normality was not severe and the sample size was large, we did not apply a transformation.

We computed the variance inflation factor (VIF) to check for multicollinearity among the independent variables. The lowest VIF was 1.02, the highest was 5.65 and the mean VIF was 1.99. Since none of the VIFs exceeded the common rule-of-thumb threshold of 10, we concluded that multicollinearity was not a problem in the model.

We constructed a component-plus-residual (CPR) plot to check for a linear relationship between BMI and sleep time (the continuous predictor) after controlling for the other variables. The lowess curve in the CPR plot diverged slightly from the linear fit, so we concluded that the linearity assumption was not met for sleep time. We addressed this non-linearity by centering sleep time on its sample mean and generating a quadratic term. The linear and quadratic terms in centered sleep time were both significant (p < 0.001), and a new CPR plot showed that the quadratic fit was an improvement on the model with only a linear term in sleep time.

We also checked for homoscedasticity using a scatter plot and for outliers using a residual-versus-predictor plot. The model met the homoscedasticity assumption and we identified no outliers. Finally, we checked for interactions between age and diabetes status, smoking and chronic obstructive pulmonary disease, age and physical activity, and depressive disorder and sleep time.
Significant interactions were observed between age and diabetes (p < 0.001) and between age and physical activity (p = 0.004). Finally, we fitted the linear regression model using the variables in our final model. Table 4 shows the unadjusted (bivariate) and adjusted (multivariable) associations of body mass index with the covariates; covariates with p < 0.05 are considered statistically significant.
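The derived terms used in model checking can be sketched in a few lines: centering a continuous predictor such as sleep time before squaring it, forming interaction terms as products of numerically coded predictors, and the VIF formula. This is an illustrative sketch assuming numeric (0/1 or ordinal) codes; in Stata such terms would typically be created with factor-variable notation rather than by hand, and the helper names are hypothetical. For reference, a VIF of 5.65 corresponds to an R_j² of about 0.82, and the threshold of 10 to an R_j² of 0.9.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def center(values: Sequence[float]) -> List[float]:
    """Subtract the sample mean so the linear and quadratic terms are less collinear."""
    m = fmean(values)
    return [v - m for v in values]


def linear_and_quadratic(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Centered term and its square, e.g. for sleep time."""
    c = center(values)
    return c, [ci * ci for ci in c]


def interaction(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise product of two numerically coded predictors."""
    if len(a) != len(b):
        raise ValueError("predictors must have the same length")
    return [ai * bi for ai, bi in zip(a, b)]


def vif_from_r2(r2_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
    on all the other predictors."""
    if not 0 <= r2_j < 1:
        raise ValueError("R-square must be in [0, 1)")
    return 1 / (1 - r2_j)


if __name__ == "__main__":
    print(linear_and_quadratic([6, 7, 8]))  # ([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
    print(interaction([1, 0, 1], [2, 3, 4]))  # [2, 0, 4]
```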


Table 4. Unadjusted and adjusted analysis of body mass index with behavioral and sociodemographic variables

Covariate                               Analysis     Coef.    p value    95% C.I.
Intercept                               Adjusted     38.67    <0.0001    35.71, 41.65
Age                                     Unadjusted   -0.15    <0.0001    -0.27, -0.029
  18-24                                 Adjusted     Reference
  25-34                                 Adjusted     1.11     0.125      -0.31, 2.53
  35-44                                 Adjusted     3.27     <0.0001    1.88, 4.66
  45-54                                 Adjusted     1.42     0.037      0.09, 2.76
  55-64                                 Adjusted     0.96     0.151      -0.35, 2.26
  65 or older                           Adjusted     -0.57    0.379      -1.84, 0.70
Race/Ethnicity                          Unadjusted   -0.99    <0.0001    0.71, 1.26
  White only, non-Hispanic              Adjusted     Reference
  Black only, non-Hispanic              Adjusted     2.13     <0.0001    1.52, 2.74
  Other race only, non-Hispanic         Adjusted     -1.89    0.044      -3.72, -0.05
  Multiracial, non-Hispanic             Adjusted     0.46     0.652      -1.55, 2.47
  Hispanic                              Adjusted     0.62     0.661      -2.14, 3.38
Income                                  Unadjusted   -0.47    <0.0001    -0.61, -0.33
  Less than $15,000                     Adjusted     Reference
  $15,000 to less than $25,000          Adjusted     -0.54    0.198      1.52, 2.74
  $25,000 to less than $35,000          Adjusted     -0.22    0.661      -3.72, -0.05
  $35,000 to less than $50,000          Adjusted     -0.0004  0.999      -1.55, 2.47
  $50,000 or more                       Adjusted     -1.02    0.015      -2.14, 3.38
Heavy Alcohol Consumption               Unadjusted   -1.54    0.001      -2.44, -0.64
  Yes                                   Adjusted     Reference
  No                                    Adjusted     -0.95    0.104      -2.09, 0.19
Smoking Status                          Unadjusted   -1.09    <0.0001    -1.57, -0.59
  Smoker                                Adjusted     Reference
  Non-smoker                            Adjusted     -2.10    <0.0001    -2.78, -1.42
Sleep time                              Adjusted     -0.17    0.013      -0.31, -0.04
Physical Activity                       Unadjusted   1.82     <0.0001    1.42, 2.22
  Yes                                   Adjusted     Reference
  No                                    Adjusted     1.34     <0.0001    0.78, 1.91
Diabetes Status                         Unadjusted   -1.86    <0.0001    -2.09, -1.63
  Have diabetes                         Adjusted     Reference
  Do not have diabetes                  Adjusted     -1.72    <0.0001    -2.04, -1.39
Depressive Disorder                     Unadjusted   -1.35    <0.0001    -1.80, -0.91
  Yes                                   Adjusted     Reference
  No                                    Adjusted     -0.82    0.009      -1.44, -0.20
Chronic Obstructive Pulmonary Disease   Unadjusted   -1.02    0.001      -1.60, -0.44
  Yes                                   Adjusted     Reference
  No                                    Adjusted     -0.72    0.081      -1.52, 0.09
Age × Diabetes                          Adjusted     0.55     <0.0001    0.34, 0.76
Age × Physical Activity                 Adjusted     0.12     <0.0001    0.06, 0.18


Final Results Interpretation

Final Model:

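$$
\mathrm{BMI} = \beta_0 + \beta_1\,\text{i.age} + \beta_2\,\text{i.income} + \beta_3\,\text{sleeptime} + \beta_4\,\text{smoking status} + \beta_5\,\text{i.race} + \beta_6\,\text{physical activity} + \beta_7\,\text{diabetes status} + \beta_8\,\text{heavy alcohol consumption} + \beta_9\,\text{depressive disorder} + \beta_{10}\,\text{COPD} + \beta_{11}\,(\text{age} \times \text{diabetes}) + \beta_{12}\,(\text{age} \times \text{physical activity}) + \varepsilon
$$

where the i. prefix (Stata factor-variable notation) indicates that age, income and race/ethnicity enter the model as sets of category indicators, so β1, β2 and β5 each stand for several coefficients, and ε is the error term.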

Results Interpretation:

Unlike the bivariate analysis, our final model takes into account the associations of all the variables simultaneously. The intercept corresponds to the reference profile: adults aged 18-24 years who are non-Hispanic white, have diabetes, have an income below $15,000, report heavy alcohol consumption and smoke, with the remaining covariates at their reference or baseline values. For this profile, the model predicts an expected BMI of 38.67 kg/m² (95% confidence interval: 36.44-41.35 kg/m²). We also find that those who reported no physical activity in the last 30 days have a higher BMI (β = 1.34, p < 0.0001) than those who did, holding other variables constant. The bivariate analysis also gives some insight into which sociodemographic and behavioral variables may be important. For example, weight appears to be an important variable, in that it explains about 76.4% of the variation in BMI (R-square = 0.7643). For race/ethnicity, the coefficient of 2.13 for non-Hispanic Black adults indicates that their BMI is on average 2.13 kg/m² higher than that of non-Hispanic white adults, while multiracial non-Hispanic and Hispanic adults show no statistically significant difference. For income, those who earn $50,000 or more have a significantly lower BMI (p = 0.015), whereas the differences for the lower income categories were not statistically significant. Non-smokers also show a significantly lower BMI than smokers (p < 0.0001), after controlling for other variables. Each additional hour of sleep is associated with a 0.17 kg/m² lower BMI (p = 0.013). The absence of heavy alcohol consumption was not significantly associated with BMI (p = 0.104) relative to the reference group of heavy drinkers. Not surprisingly, those who do not have diabetes have a lower BMI than those who have diabetes (the reference group). The absence of depressive disorder is also associated with a significantly lower BMI (p = 0.009) compared with the reference group.
The absence of chronic obstructive pulmonary disease showed no statistically significant association with BMI (p = 0.081). Our final model, including age, income level, sleep pattern/time, smoking status, race/ethnicity, physical activity, diabetes status, heavy alcohol consumption, depressive disorder and chronic obstructive pulmonary disease, explains 14.2% of the variability in BMI among adults aged 18 years or older in the United States, F(20, 2411) = 19.96, p < 0.001.
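As a consistency check on the reported statistics, the overall F statistic can be recovered from R-square using F = (R²/k) / ((1 − R²)/df_resid), with k = 20 model terms and 2411 residual degrees of freedom, which together imply 2411 + 20 + 1 = 2432 observations in the final model. With R² = 0.142 this gives about 19.95, in line with the reported 19.96 given that R² is rounded. The function name is hypothetical.

```python
def overall_f(r2: float, k: int, df_resid: int) -> float:
    """Overall F statistic of a linear model with k slope coefficients,
    computed from its R-square and residual degrees of freedom."""
    if not 0 <= r2 < 1 or k <= 0 or df_resid <= 0:
        raise ValueError("invalid inputs")
    return (r2 / k) / ((1 - r2) / df_resid)


if __name__ == "__main__":
    print(round(overall_f(0.142, 20, 2411), 2))  # 19.95
```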


Summary and Concluding Remarks

In this study, we hypothesized that certain behavioral and sociodemographic factors are predictive of body mass index in adults in the United States. With respect to age, both the 35-44 and 45-54 year age groups showed higher BMI than younger adults aged 18-24. This may be because younger adults tend to be more physically active, and their higher energy expenditure results in lower BMI. With respect to education level, high school graduation was not a significant predictor of BMI in our data, and education level was removed from the final model after a partial F-test (p = 0.4513). When considering race/ethnicity, studies have shown that Black adults have a higher BMI than other racial and ethnic groups. Genetic factors may contribute to the greater susceptibility of African American adults to conditions such as high blood pressure, diabetes, gout and high cholesterol. In our model, the other racial and ethnic groups all had a lower estimated BMI than non-Hispanic Black adults, and the differences between Hispanic or multiracial non-Hispanic adults and non-Hispanic white adults were not statistically significant. Smoking, as a negative health behavior, was also a strong indicator of higher BMI, which is consistent with previous research findings. The data available for this study were extensive. Selecting continuous and categorical variables from such a large data set required a structured cleaning process to address missing information, inconsistencies, noise and other errors, and applying these data cleaning methods improved the consistency and quality of the variables in our data set. Some limitations of this study may be attributed to the data collection process and to the variables selected for analysis.


To summarize the implications of this study's findings, the final model in Table 4 estimates that an individual aged 35-44 years who is non-Hispanic white, has an annual income above $50,000, does not smoke or drink heavily, and is physically active would have a predicted BMI of 37.87 kg/m². Compared with this individual, a person of the same age, race and income who smokes, drinks heavily and is not physically active would have a predicted BMI of 42.26 kg/m², a difference of 4.39 kg/m² (roughly 12%) associated with these three behavioral differences. This provides some perspective on the potential impact of sociodemographic and behavioral factors on body mass index, and thus on the quality of life and health status of adults in the United States. Hence, our model indicates that behavioral and sociodemographic variables are useful in predicting body mass index in adults, although together they explain only 14.2% of its variability.
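The two predicted values above can be reproduced from the adjusted coefficients in Table 4. This is a minimal sketch assuming, as the calculation in the text implicitly does, that all other covariates are at their reference categories and that the sleep time and interaction terms contribute nothing; the dictionary keys and the function name are hypothetical labels.

```python
from typing import Dict, Iterable

# Adjusted coefficients transcribed from Table 4 (kg/m^2).
INTERCEPT = 38.67
COEF: Dict[str, float] = {
    "age_35_44": 3.27,
    "income_50k_plus": -1.02,
    "non_smoker": -2.10,
    "no_heavy_alcohol": -0.95,
    "not_physically_active": 1.34,
}


def predict(active_terms: Iterable[str]) -> float:
    """Intercept plus the coefficients of the indicator terms that equal 1."""
    return INTERCEPT + sum(COEF[t] for t in active_terms)


if __name__ == "__main__":
    healthy = predict(["age_35_44", "income_50k_plus", "non_smoker", "no_heavy_alcohol"])
    risky = predict(["age_35_44", "income_50k_plus", "not_physically_active"])
    pct = 100 * (risky - healthy) / healthy
    print(f"{healthy:.2f} {risky:.2f} {pct:.1f}%")  # 37.87 42.26 11.6%
```

The 4.39 kg/m² gap is simply the sum of the three behavioral coefficients: 2.10 + 0.95 + 1.34.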




  • Beerman KA. Variation in nutrient intake of college students: a comparison by students' residence. J Am Diet Assoc. 1991;91(3):343–344.
  • Cournot M, Ruidavets JB, et al. Environmental factors associated with body mass index in a population of Southern France. Eur J Cardiovasc Prev Rehabil. 2004;11(4):291–297.
  • Figgs LW, Bloom Y, Dugbatey K, Stanwyck CA, Nelson DE, Brownson RC. Uses of behavioral risk factor surveillance system data, 1993–1997. Am J Public Health. 2000;90:774–776.
  • Kruger J, Ham SA, Prohaska TR. Behavioral risk factors associated with overweight and obesity among older adults: the 2005 National Health Interview Survey. Prev Chronic Dis. 2008;6(1):A14.
  • Pasch KE, Velazquez CE, Cance JD, Moe SG, Lytle LA. Youth substance use and body composition: does risk in one area predict risk in the other? J Youth Adolesc. 2012;41(1):14–26. doi:10.1007/s10964-011-9706-y.






