Postgraduate Experience

From the fourth year of undergraduate studies through the second year of doctoral studies, the research focus shifted to medical big data and knowledge engineering, with 13 papers published or in progress. The research objective is to harness massive medical data for disease prediction and knowledge discovery, addressing science and engineering problems related to people's livelihoods. The work spans from underlying data processing and algorithmic modeling to high-level system design, supporting the deployment of medical intelligence applications.

After the second year of doctoral studies, the research direction shifted further, toward the underlying reasoning capabilities of large language models, with 3 papers currently in progress. Extending large model training from medical scenarios to public security, multilingual, and other domains showed that both model reliability and generalization are significantly constrained by underlying reasoning capabilities. This led to systematic research on reasoning abilities from three perspectives: evaluation methods, data construction, and algorithmic mechanisms.

Phase 2 Research: Theory-Oriented

Since 2024, the primary focus has been on the underlying technologies of large language models, under the supervision of Prof. Jie Tang.

LLM research aimed at surpassing top human expert levels: Towards Reliable and Generalizable Reasoning in Foundation Models

Existing Challenges

OpenAI's view: Moving from the intelligence achieved so far to superintelligence will take five steps: AI learns language ✅ → AI can solve problems ✅ → AI can use tools ✅ → AI can self-learn ⌛️ → AI can self-organize.

Zhipu's view: Moving from the intelligence achieved so far to superintelligence will take five steps: Language capability ✅ → Reasoning capability ✅ → Learning capability ⌛️ → Cognitive capability → Conscious intelligence.

Research Path

As large language models (LLMs) achieve breakthroughs in critical domains such as healthcare, public security, multilingual processing, mathematics, and code intelligence, their underlying reasoning capabilities have become the core bottleneck for model reliability, generalization, and application boundaries. Current models often "appear to solve problems without knowing why they are correct" and show "unstable capabilities when switching scenarios," which limits their trustworthiness in high-risk scenarios.

Building on practical experience from cross-domain model training and data system construction, this research systematically studies, from both theoretical and engineering perspectives, how to precisely measure, continuously improve, and stably transfer the logical reasoning capabilities of large models across scenarios, laying theoretical and methodological foundations for trustworthy AI and interpretable intelligence.

Research Outputs

Main Research: Large Model Reasoning

Mathematical reasoning and multimodal reasoning in large models

Paper | Venue | Author Position | Category
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models | arXiv | Non-first author | Preprint
A Survey of Post-Training Scaling in Large Language Models | ACL | Non-first author | CCF-A

Project | Specific Output | Period
Logic | Paper under blind review | 2025.03-present
Math | Enhancing Mathematical Reasoning in Multimodal Large Language Models | 2024.03-2024.10
General | Self-Learning: Evaluation & Data & New Scaling Law | 2024.12-2025.01

Main Research: Large Model Training and Evaluation

Project | Specific Output | Period
Internationalization | Developed MalayGLM, a "Belt and Road Sovereign Large Model," in partnership with Malaysia's leading family conglomerate, to help Malaysia build a national-level sovereign large model and strengthen the Malaysian large model industry ecosystem. This is a breakthrough: the first meaningful deployment of a Chinese sovereign large model in a friendly country. (Achieved SOTA results in evaluation.) | 2024.09-2025.03
Internationalization | Resolved ChatGLM mixed Chinese-English response issues | 2024.03-2024.06
Evaluation | Meta-evaluation | 2024.12-2025.02
Evaluation | Leveraging Models as Teachers: A Comprehensive Evaluation of Reasoning Abilities in Large Language Models (Arts, Education, Multilingual, etc.) | 2024.03-2024.09
Evaluation | Generalization evaluation and research of o1-like models | 2024.09-2025.03

Main Research: Large Model Applications

Applications of large models in public security, healthcare, and sports

  • Building a large model for crime prediction
  • Constructing a multi-agent system for sports

Phase 1 Research: Engineering-Oriented

System architecture for 2021–2023: Research System

Natural language processing technology research for intelligent healthcare with health big data: Digital Twin Patient & AI Digital Doctor

Existing Challenges

Demand Challenges — Who builds it, who uses it

  • Market demand: Profit model design, cost control, resource acquisition
  • National strategy: Data assets, people's livelihood and welfare (Healthy China, etc.)
  • Physician needs: Similar patients for handling current cases, patient cohorts for research
  • Patient needs: Fastest, most convenient, cheapest, most effective, most comprehensive solutions

Data Challenges — Massive, multi-source, heterogeneous

  • Health and medical data come from diverse sources with varied types, including electronic health records, basic public health, health exams, clinical diagnosis and treatment, disease detection, health insurance, and more.
  • This multi-source data exhibits characteristics such as large volume, broad sources, diverse structures, dispersed storage, and uneven quality.
  • These factors significantly reduce the usability of health and medical data, making direct analysis and mining difficult.

Algorithm Challenges — Accurate, efficient, interpretable, self-optimizing

  • Whether for disease prediction, knowledge recommendation, or similar case recommendation, maximum accuracy is required
  • Algorithms must be efficient to handle time-sensitive queries
  • Algorithms must be strongly interpretable to support medical scenarios
  • Algorithms should be self-optimizing, as medical data is dynamically updated and expanded in real time

Engineering Challenges — Technical and product deployment

  • Frontend: UI, interaction
  • Backend: Algorithm module encapsulation and coupling
  • Database: Graph database, big data engines
  • Operations, iteration, and optimization

Research Path

Processing object: Medical big data

Processing technology: Natural language processing (machine learning + deep learning + large-scale pre-trained models + knowledge graphs)

Processing principle: Start with a broad research vision — Implement an initial demo using mainstream technologies — Identify issues during the process — Refine research — Realize the final system

Specific path:

  1. Knowledge acquisition: Data mining — Extract as much information as possible from massive electronic medical record data
  2. Knowledge management: Graph/LLM — Construct patient graphs and knowledge graphs using patient information and related knowledge
  3. Knowledge reasoning: Prediction/Generation/Recommendation — Conduct prediction, generation, and recommendation on graphs and large models
  4. Knowledge engine: Search/Dialogue — User service and related retrieval and generation based on knowledge

Research Outputs

Main Research: Large Models + Big Data + Natural Language Processing

Research on natural language processing technology for medical big data analysis and utilization (Doctoral thesis)

Responsible for topic selection, literature review, theoretical innovation, experiments, and paper writing

Named Entity Recognition in Electronic Medical Records Sub-topic Note
Paper Research on Deep Learning Models for Chinese Electronic Medical Record Named Entity Recognition Beijing Outstanding Thesis
Paper Research and Progress in Chinese Electronic Medical Record Named Entity Recognition Peking University Core Chinese Journal of Electronics
Paper KrNER: A Novel Named Entity Recognition Method Based on Knowledge Enhancement and Remote Supervision CCF-C CSE2023
Medical Knowledge Graph Supporting Health Data Elements Sub-topic Note
Paper KLDP: A Data Profiling Technique Based on Knowledge Graph and Large Language Modeling CCF-C CSE2023
Intelligent Healthcare Large Model Research Driven by Data and Knowledge Sub-topic Code
Report ChatGPT Created the AI Wave Microsoft Invitation
Paper AiMed: Artificial Intelligent Large Language Model for Medicine in China IEEE AiMed
Paper ChatFUV: Chat Chain for Follow-Up Visit
System AiMed Medical Knowledge Large Model Application Service System Software Copyright
Paper MedRad: A Framework for Reliable Assisted Decision Making in a Medical Large Language Model MedRad
Paper Toward a Large Language Model-Driven Medical Knowledge Retrieval and QA System: Framework Design and Evaluation Engineering
Paper Towards Artificial Intelligence for Science: A Case Study of Using ChatGPT for Disease Causality Discovery from Biomedical Literature SCI Q1
Paper Large Language Models Driven Reliable Clinical Decision-Making: Framework and Application SCI Q3
Paper NewMed: Large Language Modeling Technology Enables Full Process Digital Intelligence in Medical Care
Paper Doctor: The Most Reliable Digital Intelligence Healthcare Large Language Model System
Paper MedLib: Research on the construction of a knowledge library for medical large language modeling

Participating Research: Medical Engineering + Network

Large models and blockchain (sub-topic)

Responsible for cross-disciplinary research, theoretical innovation, application deployment, and paper writing

Key Technologies for Trusted On-chain and Off-chain Data Interaction in Blockchain Vertical - Ministry of Science and Technology
Website Open Data Entry Public Service
Paper Med-Eval: Blockchain Assessment Platform for Medical Large Language Model
Paper OpenMonet: Open Model Orchestration Network

Intelligent healthcare of the new generation emerging from digital intelligence (sub-topic)

Responsible for cross-disciplinary research, scenario innovation, application deployment, and paper writing

Clinical Comprehensive Assessment Management and Early Warning System for Elderly Renal Function Decline Vertical - Ministry of Science and Technology
Paper Research of Client Selection Algorithm in Cross-device Federated Learning Peking University Core
Medical data annotation system, Chinese word segmentation system, disease prediction system (1000+ diseases), intelligent self-diagnosis system Related Services

Coursework

All project code is open-sourced on GitHub (continuously updated)

Type Specific Area Main Skills Related Outputs
Machine Learning Advanced Machine Learning (A) ML MedRad: A Reliable Assisted Decision Making Framework for Medical LLMs
RAG-NEWS: Using RAG to Help LLMs Access Latest News [Code]
Digital Intelligence Security and Standardization (A) Tech Law AI Algorithm Transparency Implementation and Evaluation—Taking Recommendation Systems as Example
Computational Linguistics (A) NLP-Base Fine-tuning Chinese Pre-trained Models for Text Classification
Frontiers of Information Retrieval (A) IR Intelligent Medical Search Engine for Multi-source Heterogeneous Health Big Data
Big Data Analysis and Processing (A) Big Data Analysis Multimodal Dialogue Scenarios and Topic Switching Understanding
Data Mining: Principles and Algorithms Data Mining Online News Popularity Prediction
Clustering Analysis of Diabetic Patient Admission Data [Code]
Knowledge Engineering NLP-KG Event Extraction Based on MAVEN Dataset
Cross-lingual Knowledge Graph
Principles of Artificial Intelligence AI Machine Translation (Chinese-English) Based on Machine Learning
Reading Comprehension Based on Prior Knowledge [Code]
Parallel Computing, Algorithm and Complexity Theory, Combinatorics High-Performance Computing Parallelization of π Computation
Applications of Combinatorics in AI
Other Related Skills Comprehensive Chinese Marxism and Contemporary (A), Dialectics of Nature (A), Big Data Analysis, Big Data and Biostatistics, Big Data Practice, Doctoral English (Exempted), Professional Capability Extension Training

Published Academic Works

[11] Zeng A, Lv X, Zheng Q, et al. GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models[J]. arXiv preprint arXiv:2508.06471, 2025. (arXiv)

[10] Du J, Li X, Liu Y, et al. Large language models driven reliable clinical decision-making: Framework and application[J]. Informatics and Health, 2025. (SCI Q3, First Author)

[9] X. Li, J. Du, Y. Liu, H. Yin and H. Liu, "Towards Artificial Intelligence for Science: A Case Study of Using ChatGPT for Disease Causality Discovery from Biomedical Literature," in Big Data Mining and Analytics, vol. 9, no. 2, pp. 554-562, April 2026, doi: 10.26599/BDMA.2025.9020086. (SCI Q1 Big Data Mining and Analytics, Co-first Author)

[8] Liu Y, Li X, Luo Y, et al. Toward a Large Language Model-Driven Medical Knowledge Retrieval and QA System: Framework Design and Evaluation[J]. Engineering, 2025. (SCI Q1 Engineering, Fourth Author)

[7] Hanyu Lai, Xiao Liu, Junjie Gao, Jiale Cheng, Zehan Qi, Yifan Xu, Shuntian Yao, Dan Zhang, Jinhua Du, Zhenyu Hou, Xin Lv, Minlie Huang, Yuxiao Dong, and Jie Tang. 2025. A Survey of Post-Training Scaling in Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2771–2791, Vienna, Austria. Association for Computational Linguistics. (CCF-A: ACL)

[6] J. Du, X. Li, Z. Jiang, Y. Liu, H. Yin and H. Liu, "AiMed: Artificial Intelligent Large Language Model for Medicine in China," 2024 IEEE International Conference on Medical Artificial Intelligence (MedAI), Chongqing, China, 2024, pp. 360-365, doi: 10.1109/MedAI62885.2024.00054. (IEEE, First Author)

[5] Lü Tingyu, Li Xiaoying, Zhang Ying, Liu Yuyang, Du Jinhua, et al. Research on Construction of Chinese Medical Knowledge Large Model Q&A Corpus Dataset[J]. Journal of Medical Informatics, 2024, 45(5):20-25. DOI:10.3969/j.issn.1673-6036.2024.05.004. (Chinese Science and Technology Core Journal, Fifth Author)

[4] Zhang Ruilin, Du Jinhua, Yin Hao. Research of Client Selection Algorithm in Cross-device Federated Learning[J]. Journal of Software. (CCF-A Recommended Chinese Science Journal, THU-B Journal, Peking University Core, Second Author)

[3] Jinhua Du and Hao Yin. KLDP: A Data Profiling Technique Based on Knowledge Graph and Large Language Modeling. 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2023.11 (DOI 10.1109/TrustCom60117.2023.00329) (CCF-C International Conference, First Author)

[2] Du J, Yin H. KrNER: A Novel Named Entity Recognition Method Based on Knowledge Enhancement and Remote Supervision[C]//2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, 2023: 2323-2332. (CCF-C International Conference, First Author)

[1] Du Jinhua, Yin Hao, Feng Song. Research and Progress in Chinese Electronic Medical Record Named Entity Recognition[J]. Chinese Journal of Electronics, 2022, 50(12): 3030-3053. (CCF-A Recommended Chinese Science Journal, THU-B Journal, Peking University Core, First Author)

Unpublished Academic Works

[7-9] Under anonymous review

[6] Jinhua Du. ChatFUV: Chat Chain for Follow-Up Visit

[5] Jinhua Du. NewMed: Large Language Modeling Technology Enables Full Process Digital Intelligence in Medical Care

[4] Jinhua Du. Doctor: The Most Reliable Digital Intelligence Healthcare Large Language Model System

[3] Jinhua Du. OpenMonet: Open Model Orchestration Network

[2] Jinhua Du. Med-Eval: Benchmarks for the Medical Large Language Model

[1] Jinhua Du. MedLib: Research on the construction of a knowledge library for medical large language modeling

Published Software Copyrights

[1] AiMed Medical Knowledge Large Model Application Service System, Software Copyright (2024.02.29). The service is available at the Institute of Medical Information, Chinese Academy of Medical Sciences. The AiMed Medical Knowledge Large Model was jointly developed by the Institute of Medical Information, Chinese Academy of Medical Sciences and the Tsinghua University OpenDE team, and provides medical knowledge Q&A and intelligent literature services for medical research and innovation.

Media Coverage

[3] 2025 "Large Model Wild Goose Migration Plan" Initiator | Year-end Review and Outlook: Journeying Together Toward a New Era

[2] 2024 Outstanding Practice Individual | Computer Science Du Jinhua: Iron Will, Pursuing Dreams in Public Security

[1] 3rd China Medical Informatics Discipline Development Conference (2023.11.25): Released the AiMed large model as first contributor. The GitHub code and Hugging Face parameters have been open-sourced. The corresponding paper, AiMed: Artificial Intelligent Large Language Model for Medicine in China, was published at MedAI.

Honors & Awards

Category | Content | Unit | Date
Recognition | Outstanding Communist Youth League Member | Tsinghua University Communist Youth League Committee | 2025.10
Recognition | Led class to Excellence Class & League (1st place), Paired Class & League | Tsinghua University | 2025.09
Recognition | Outstanding Trainee in Social Work Seminar | Tsinghua University | 2025.04
Recognition | Led class to Dedication Class & League | Tsinghua University Graduate Communist Youth League Committee | 2025.03
Recognition | Outstanding Graduate Student Cadre | Tsinghua University Computer Science Department | 2024.06
Recognition | Outstanding Department Member, Computer Science Graduate League | Tsinghua University Computer Science Department | 2023.05
Recognition | Outstanding Department Member, Tanzhen Technology Department | Tsinghua University | 2022.12
Recognition | Outstanding Trainee, Graduate New Student Cadre Training & League School | Tsinghua University Graduate Committee | 2022.08
Awards | 2024 Journal of Software Highly Cited Researcher | Journal of Software Editorial Board | 2026.01
Awards | Tsinghua Friends - Hefei Elite Scholarship | Tsinghua University | 2025.12
Awards | Tsinghua University Ma Yuehan Cup Bodybuilding Competition 5th Place | Tsinghua University Student Bodybuilding Association | 2025.12
Awards | "Computing Future" Doctoral Forum Excellence Award (1st) | Tsinghua University Computer Science Department | 2025.10
Awards | Social Work Second Prize Scholarship | Tsinghua University | 2024.12
Awards | University Huiyan Elite Scholarship (Second Class) | Tsinghua University | 2024.12
Awards | Outstanding Social Practice Award | Tsinghua University | 2024.12
Awards | Social Practice Gold Award Team (2nd university-wide) | Tsinghua University Party Graduate Work Department | 2024.11
Awards | Beijing Challenge Cup Gold, National Challenge Cup Third | Beijing Communist Youth League Committee | 2024.09
Thank-you Letters | Zhoushan City, Zhejiang | Zhoushan Municipal Talent Work Leading Group | 2024.09
Thank-you Letters | Zhoushan Dinghai District Public Security Bureau | Public Security Bureau Cyber Security Team | 2024.09
Thank-you Letters | Hangzhou High-tech Zone, Zhejiang | Human Resources Bureau | 2023.07

Work Experience

Category | Location | Position | Period
Lab | Tsinghua University Knowledge Engineering Group (KEG) | Doctoral Student | 2024.02-present
Lab | Beijing National Research Center for Information Science and Technology, Big Data-driven Knowledge Management and Decision Team | Research Assistant | 2021.09-2024.02
Internship | Beijing Zhipu Huazhang Technology, AI Institute | Intern | 2024.03-present
Internship | Zhoushan Dinghai District Public Security Bureau Cyber Security Team, Zhejiang | External Expert | 2024.07-2024.08
Internship | Hunan Wangshu Technology | AI Algorithm Engineer | 2022.08
Social Work | Tsinghua University Communist Youth League Practice Department, Platform Group | Team Leader | 2024.09-2025.03
Social Work | Tsinghua University Computer Science Department, Class 53 | Class Assistant | 2024.09-present
Social Work | Tsinghua University Computer Science Department, Class 53 | Party Branch Secretary | 2024.09-present
Social Work | Tsinghua University Computer Science Department, Class 52 | League Branch Secretary | 2022.08-2023.09
Social Work | Tsinghua University Computer Science Department Communist Youth League Practice Department | Secretary | 2023.05-2024.06
Social Work | Tsinghua University Computer Science Department Communist Youth League Practice Department | Department Member | 2022.09-2023.05
Social Work | Tsinghua University Tanzhen Technology Review Society Editorial Department | Staff: AI Community Lead | 2022.08-2023.08
Social Work | Tsinghua University Graduate Student Union Sports Department | Staff | 2022.08-2023.08
  • Served as teaching assistant for the AML & ML course, a bilingual Chinese-English graduate course taught by Prof. Jie Tang. Tasks included creating course materials, designing the course and assignments, working on the course website, giving lectures, contributing to book writing, teaching, serving as session chair at two paper conferences, and staying in contact with every student and guest.
  • Participated twice in Tsinghua think tank seminars and spoke as a representative.
  • Summer 2024: As team captain, led a six-week summer social practice in Zhoushan, Zhejiang. Successfully completed the "Crime Prediction Large Model Establishment" project at Dinghai District Public Security Bureau, applying public security big data to real-world needs. Organized cross-regional, cross-unit activities, demonstrating the Tsinghua spirit of learning, practice, and responsibility. Received the university Gold Award and high praise from both the Public Security Bureau and team members.
  • Served as a core member of the Computer Science Department Communist Youth League Practice Department. Responsible for reviewing summer graduate social practice award materials, writing press releases, and planning, coordinating, and organizing university-enterprise visits. Published one article as first correspondent on the Tsinghua University News website and assisted with multiple practice announcements.
  • Served as a core member of the University Student Union Sports Department. Responsible for collecting on-site promotional materials. Took part in and earned medals for the 2022 Campus Marathon 10 km and the 2023 Campus Marathon half-marathon.
  • Served as a core member of the University Tanzhen Technology Review Society Editorial Department. Responsible for researching and commenting on technology hot topics, interviewing academicians, and writing the resulting press releases. Handled the weekly tech consultation work 6 times, resulting in 6 published articles.
  • Served as Computer Science Department Graduate League Practice Vice Secretary. Participated in all "Welcoming the 20th Congress, Industrial New Forces" seven-department joint industry visits. 2023.03.16 ByteDance OpenDay captain, liaison and publicity lead; 2023.04.13 Tencent OpenDay captain, liaison and publicity lead; 2023.04.24 Meituan OpenDay captain, liaison and publicity lead; 2023.04.27 Ninecube Investment OpenDay deputy captain; 2023.06.17 NetEase Youdao OpenDay captain, liaison and publicity lead; 2023.07.10-12 Hangzhou regional visit: 13 enterprises, captain, liaison and publicity lead. Total output: 1 report (50,000 words), 23 main articles.
  • Completed 2 social work courses and trainings: 2022.09 Tsinghua University Doctoral Lecturer Corps "Liyan Plan" 6th Session (Fall); 2022.08 Tsinghua University 16th Graduate New Student Cadre Training & 37th League School (Graduate Class).