Med-Eval Leaderboard

Model Med-Eval CEval-MMLU-CMMLU MedQA-USMLE MedQA-MCMLE PubMedQA MedMCQA USMLE
AiMed 59.33 40.38 61.62 42.86
Human(pass) 60.00 - 50.00
Human(expert) 78.00 90.00 87.00
GPT-4 77.16 80.28 74.58 72.51
GPT-3.5 Turbo 61.17 53.81 52.92 63.90 56.25 57.00
LLaMA-13B-PEFT 35.14 28.83 23.38 65.40 39.52 38.73
LLaMA2-13B 47.42 35.04 29.74 42.12
Vicuna-13B 40.99 34.80 27.67 40.66
Chinese-Alpaca-Plus-13B 46.31 27.49 32.66 35.87
XVERSE-13B 58.08 32.99 58.76 41.34
Med-Flamingo
BioMedLM 50.30
BioGPT
PMC-LLaMA (7B)-PEFT 68.20 34.33 30.64