Med-Eval Leaderboard
| Model | Med-Eval | CEval-MMLU-CMMLU | MedQA-USMLE | MedQA-MCMLE | PubMedQA | MedMCQA | USMLE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AiMed | - | 59.33 | 40.38 | 61.62 | - | 42.86 | - |
| Human (pass) | - | - | - | - | 60.00 | - | 50.00 |
| Human (expert) | - | - | - | - | 78.00 | 90.00 | 87.00 |
| GPT-4 | - | 77.16 | 80.28 | 74.58 | - | 72.51 | - |
| GPT-3.5 Turbo | - | 61.17 | 53.81 | 52.92 | 63.90 | 56.25 | 57.00 |
| LLaMA-13B-PEFT | - | 35.14 | 28.83 | 23.38 | 65.40 | 39.52 | 38.73 |
| LLaMA2-13B | - | 47.42 | 35.04 | 29.74 | - | 42.12 | - |
| Vicuna-13B | - | 40.99 | 34.80 | 27.67 | - | 40.66 | - |
| Chinese-Alpaca-Plus-13B | - | 46.31 | 27.49 | 32.66 | - | 35.87 | - |
| XVERSE-13B | - | 58.08 | 32.99 | 58.76 | - | 41.34 | - |
| Med-Flamingo | - | - | - | - | - | - | - |
| BioMedLM | - | - | - | - | - | - | 50.30 |
| BioGPT | - | - | - | - | - | - | - |
| PMC-LLaMA (7B)-PEFT | - | - | - | - | 68.20 | 34.33 | 30.64 |
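All scores above are accuracy percentages on multiple-choice QA benchmarks (MedQA and MedMCQA use lettered options; PubMedQA uses yes/no/maybe labels). As a rough sketch of how such a number is produced, here is a minimal accuracy computation; the `MCQItem` structure and the always-"A" dummy predictor are illustrative assumptions, not the actual Med-Eval harness:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class MCQItem:
    question: str
    options: Dict[str, str]  # option key -> option text, e.g. {"A": "...", "B": "..."}
    answer: str              # gold option key, e.g. "B"

def accuracy(items: List[MCQItem], predict: Callable[[MCQItem], str]) -> float:
    """Percentage of items where the predicted option key equals the gold key."""
    correct = sum(predict(item) == item.answer for item in items)
    return 100.0 * correct / len(items)

# Toy data and a dummy predictor that always answers "A"
# (in a real harness, predict() would call the model under evaluation).
items = [
    MCQItem("Which vitamin deficiency causes scurvy?",
            {"A": "Vitamin C", "B": "Vitamin D"}, "A"),
    MCQItem("Which organ secretes insulin?",
            {"A": "Liver", "B": "Pancreas"}, "B"),
]
print(f"accuracy = {accuracy(items, lambda item: 'A'):.2f}%")  # accuracy = 50.00%
```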