
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

Subjects: Artificial Intelligence; Computation and Language; Databases
Version: v1 (2023-09-01)

Abstract

As the field of Large Language Models (LLMs) evolves at an accelerated pace, there is a critical need to assess and monitor their performance. We introduce a benchmarking framework focused on knowledge graph engineering (KGE), accompanied by three challenges addressing syntax and error correction, fact extraction, and dataset generation. We show that, although they are useful tools, LLMs are not yet fit to assist in knowledge graph generation with zero-shot prompting. To help monitor future progress, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses, as well as statistical data and visualization tools that support the tracking of prompt engineering and model performance.
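The abstract does not say how responses are scored automatically. As a minimal sketch only, assume the fact-extraction challenge compares the triples an LLM extracts against a hand-made reference set, with both sides already normalized to (subject, predicate, object) strings. Exact-match precision, recall and F1 could then be computed as below. This is an assumption for illustration, not necessarily the metric LLM-KG-Bench uses; the function name `score_fact_extraction` and the `ex:` example triples are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: one (subject, predicate, object) triple,
# assumed to be normalized before comparison (prefixes, literals, whitespace).
Triple = Tuple[str, str, str]


def score_fact_extraction(predicted: Iterable[Triple],
                          reference: Iterable[Triple]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 of extracted triples (illustrative sketch)."""
    pred = set(predicted)
    ref = set(reference)
    hits = len(pred & ref)  # triples the model got exactly right
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical reference facts and a hypothetical LLM answer.
    reference = {("ex:Alice", "ex:knows", "ex:Bob"),
                 ("ex:Alice", "ex:worksFor", "ex:AcmeCorp")}
    predicted = [("ex:Alice", "ex:knows", "ex:Bob"),
                 ("ex:Alice", "ex:livesIn", "ex:Leipzig")]
    print(score_fact_extraction(predicted, reference))
    # → {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Exact matching keeps the scorer deterministic, which suits repeated benchmark runs; a real evaluation would likely need to normalize or align RDF terms more carefully (for example, equivalent literals written differently), since naive string comparison penalizes such harmless variation.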


Cite

@article{arxiv.2308.16622,
  title  = {Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering},
  author = {Lars-Peter Meyer and Johannes Frey and Kurt Junghanns and Felix Brei and Kirill Bulert and Sabine Gründer-Fahrer and Michael Martin},
  journal= {arXiv preprint arXiv:2308.16622},
  year   = {2023}
}

Comments

To be published in the SEMANTICS 2023 poster track proceedings. SEMANTICS 2023 EU: 19th International Conference on Semantic Systems, September 20-22, 2023, Leipzig, Germany.