Research on real-world knowledge mining and knowledge graph completion (III): structured information extraction from real world data of bladder cancer based on regular expression

Published on Mar. 29, 2024

Authors: MA Wenhao (1, 2); SHI Hanyu (3); HUANG Qiao (1); HUANG Xing (4); WANG Yongbo (1); WANG Shichun (1); REN Xiangying (1); SHI Yue (5); JIN Yinghui (1); YAN Siyu (1)

Affiliations:
1. Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
2. The Second Clinical College of Wuhan University, Wuhan 430071, China
3. HongYi Honor College of Wuhan University, Wuhan 430072, China
4. Department of Urology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
5. Information Center, Zhongnan Hospital of Wuhan University, Wuhan 430071, China

Keywords: Real-world data; Information extraction; Regular expression; Natural language processing; Electronic medical record data; Bladder cancer

DOI: 10.12173/j.issn.1004-5511.202308006

Reference: Ma WH, Shi HY, Huang Q, Huang X, Wang YB, Wang SC, Ren XY, Shi Y, Jin YH, Yan SY. Research on real-world knowledge mining and knowledge graph completion (III): structured information extraction from real world data of bladder cancer based on regular expression[J]. Yixue Xinzhi Zazhi, 2024, 34(3): 312-321. DOI: 10.12173/j.issn.1004-5511.202308006. [Article in Chinese]

Abstract

With the growth of medical big data, real-world studies (RWS) have received increasing attention in recent years and show promising prospects. However, implementing RWS still poses challenges that have prompted extensive discussion among scholars, the most urgent being the unstructured nature of real-world data (RWD). This study used a rule-based information extraction method built on regular expressions to extract structured information from the admission records, pathology reports, surgical records, and imaging records of bladder cancer patients treated at Zhongnan Hospital of Wuhan University in recent years. Extraction performance was evaluated using accuracy and recall, with the aim of providing a reference for subsequent research.
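
As a rough illustration of what rule-based extraction with regular expressions can look like, the following Python sketch pulls a few fields out of free-text report sentences and scores the result. Everything in it is an assumption made for illustration: the field names, the patterns, the English sample sentences, and the definitions of accuracy and recall are hypothetical and are not the rules or metrics used in the study, whose source records are in Chinese.

```python
import re

# Hypothetical patterns for illustration only; not the rules used in the study.
PATTERNS = {
    "tumor_size_cm": re.compile(r"(\d+(?:\.\d+)?)\s*cm"),
    "t_stage": re.compile(r"\bp?(T(?:is|a|[0-4][ab]?))\b"),
    "grade": re.compile(r"\b(high|low)[- ]grade\b", re.IGNORECASE),
}


def extract(text):
    """Return one value per field, or None when the pattern does not match."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None
    return record


def evaluate(predicted, gold):
    """Score extracted records against manually annotated ones.

    Assumed definitions (the study may define them differently):
    accuracy = fields where prediction equals annotation / all fields
    recall   = annotated values correctly extracted / all annotated values
    """
    total = correct = gold_present = gold_hit = 0
    for pred_rec, gold_rec in zip(predicted, gold):
        for field, gold_value in gold_rec.items():
            total += 1
            hit = pred_rec.get(field) == gold_value
            correct += hit
            if gold_value is not None:
                gold_present += 1
                gold_hit += hit
    accuracy = correct / total if total else 0.0
    recall = gold_hit / gold_present if gold_present else 0.0
    return accuracy, recall


# Made-up report sentences and annotations.
reports = [
    "Bladder tumor, 2.5 cm, high-grade urothelial carcinoma, pT2a.",
    "Papillary lesion about 10 mm, low grade, Ta.",
    "No tumor seen.",
]
gold = [
    {"tumor_size_cm": "2.5", "t_stage": "T2a", "grade": "high"},
    {"tumor_size_cm": "1.0", "t_stage": "Ta", "grade": "low"},
    {"tumor_size_cm": None, "t_stage": None, "grade": None},
]

predicted = [extract(text) for text in reports]
accuracy, recall = evaluate(predicted, gold)
print(predicted)
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

The second sample sentence gives the size in millimetres, so the centimetre-only pattern misses it. Misses of this kind are what a recall measure is meant to expose, and they are typically handled by adding or widening rules.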
