Objective To explore the implementation of generalized estimating equations (GEE) and mixed linear models (MLM) for longitudinal data analysis in Python, and to expand the use of Python in statistical analysis.
Methods Using an example from environmental epidemiology, GEE and MLM were built in Python to explore the effect of PM2.5 on lung function (forced expiratory volume in 1 second, FEV1), and the results were compared with those obtained in R.
Results For each 1 μg/m³ increase in PM2.5, the subjects' FEV1 decreased by 8 mL after 2 days. Python can fit MLM and GEE with the statsmodels library. The code is concise and its logic is similar to that of R; the parameter estimates and confidence intervals from the two programs were almost identical, indicating that the Python results are reliable.
Conclusion Python can flexibly construct MLM and GEE, and offers a useful reference for practical research.