Research on real-world knowledge mining and knowledge graph completion (II): Methods and progress of information extraction from unstructured electronic medical records

Published on Mar. 13, 2023

Authors: Si-Yu YAN 1, Xu-Hui LI 1, Mu-Kun CHEN 2, Hai-Feng ZHU 2, Jie-Jun TAN 2, Kuang GAO 2, Yong-Bo WANG 1, Qiao HUANG 1, Xiang-Ying REN 1, Ying-Hui JIN 1, Xing-Huan WANG 1

Affiliations: 1. Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan 430071, China; 2. School of Computer Science, Wuhan University, Wuhan 430072, China

Keywords: Unstructured data; Electronic medical record; Information extraction; Text mining; Natural language processing; Ontology; Real-world data

DOI: 10.12173/j.issn.1004-5511.202301016

Reference: Yan SY, Li XH, Chen MK, Zhu HF, Tan JJ, Gao K, Wang YB, Huang Q, Ren XY, Jin YH, Wang XH. Research on real-world knowledge mining and knowledge graph completion (II): Methods and progress of information extraction from unstructured electronic medical records[J]. Yixue Xinzhi Zazhi, 2023, 33(5): 358-365. DOI: 10.12173/j.issn.1004-5511.202301016. [Article in Chinese]

Abstract

As information technology has spread, healthcare big data is growing exponentially, and clinical real-world research based on these data is receiving increasing attention. Hospital electronic medical records (EMRs) document the entire course of a patient's diagnosis and treatment in the real world and are among the most valuable data sources for supporting clinical decision-making. However, EMRs contain a large amount of unstructured text, which makes the data harder to process and constrains research based on them. Advanced information technology and artificial intelligence methods need to be applied to unstructured EMR data to accelerate the realization of the data's value. This paper summarizes the common current methods for processing unstructured medical data: methods based on dictionaries and rules, methods based on traditional machine learning and deep learning, and methods based on cognitive models, represented by ontologies. It also discusses standardization and transparent reporting in the processing of unstructured EMR data and considers future developments in the field.
