
pfmt-bench-fin-ja: Preferred Multi-turn Benchmark for Finance in Japanese

Masanori Hirano, Kentaro Imajo

18th IIAI International Congress on Advanced Applied Informatics, pp. 806-809, July 14, 2025


Conference

3rd International Conference on Computational and Data Sciences in Economics and Finance (CDEF 2025) in 18th IIAI International Congress on Advanced Applied Informatics (IIAI AAI 2025)

Abstract

With the advancement of large language models (LLMs), there has been growing demand for evaluating their performance across various domains. This study proposes pfmt-bench-fin-ja, a preferred multi-turn benchmark for finance in Japanese. pfmt-bench-fin-ja is a specialized multi-turn Japanese generation benchmark with 12 categories and 360 questions, analogous to MT-bench in LLM evaluation. Evaluation was conducted using GPT-4o-mini as an LLM-as-a-judge, with scores measured on a 10-point scale. Experimental results demonstrate that pfmt-bench-fin-ja enables consistent performance evaluation across multiple LLMs. The benchmark is publicly available on GitHub.

Keywords

Large Language Model; LLM; benchmark; finance; Japanese

DOI

10.1109/IIAI-AAI67470.2025.00147


BibTeX

@inproceedings{Hirano2025-pfmt,
  title={{pfmt-bench-fin-ja: Preferred Multi-turn Benchmark for Finance in Japanese}},
  author={Masanori Hirano and Kentaro Imajo},
  booktitle={18th IIAI International Congress on Advanced Applied Informatics},
  isbn={979-8-3315-9937-9},
  pages={806-809},
  publisher={IEEE},
  doi={10.1109/IIAI-AAI67470.2025.00147},
  year={2025}
}