18th IIAI International Congress on Advanced Applied Informatics, pp. 806-809, July 14, 2025
3rd International Conference on Computational and Data Sciences in Economics and Finance (CDEF 2025) in 18th IIAI International Congress on Advanced Applied Informatics (IIAI AAI 2025)
With the advancement of large language models (LLMs), there has been growing demand for performance evaluation across various domains. This study proposes pfmt-bench-fin-ja, a preferred multi-turn benchmark for finance in Japanese. pfmt-bench-fin-ja is a specialized multi-turn Japanese generation benchmark with 12 categories and 360 questions, analogous to MT-bench in LLM evaluation. Evaluation was conducted using GPT-4o-mini as an LLM-as-a-judge, with scores measured on a 10-point scale. Experimental results demonstrate that pfmt-bench-fin-ja enables consistent performance evaluation of LLMs across multiple models. The benchmark is publicly available on GitHub.
Large Language Model; LLM; benchmark; finance; Japanese
10.1109/IIAI-AAI67470.2025.00147
@inproceedings{Hirano2025-pfmt,
  title={{pfmt-bench-fin-ja: Preferred Multi-turn Benchmark for Finance in Japanese}},
  author={Masanori Hirano and Kentaro Imajo},
  booktitle={18th IIAI International Congress on Advanced Applied Informatics},
  isbn={979-8-3315-9937-9},
  pages={806-809},
  publisher={IEEE},
  doi={10.1109/IIAI-AAI67470.2025.00147},
  year={2025}
}