
[CS.AI] FactoryLLM: A Safe Open-Source Playground for LLM Evaluation

Published at: 2026-06-15 22:00 | Last updated: 2026-06-16 12:14
#AI #Machine Learning #Open Source

Fault diagnostics and recovery in smart factories are challenging because critical information is dispersed across the manuals of multiple machines interconnected through the manufacturing process. Large Language Models (LLMs) present a promising approach. This paper introduces FactoryLLM, a safe, open-source AI playground for evaluating LLM-based retrieval-augmented generation (RAG) models on documents drawn from multiple machines.

FactoryLLM lets users configure the LLM and assess performance through a dual evaluation setup that uses both RAGAS and NVIDIA's LLM-as-a-Judge metrics. It is considered safe because users can run local or open-source LLMs without sharing sensitive industrial data, which provides a controlled environment for experimentation.

The efficacy of FactoryLLM is demonstrated through a case study involving an Autonomous Intelligent Vehicle and its Mobile Planner software, evaluating three LLMs across 30 maintenance queries derived from approximately 600 pages of cross-machine documentation. The results indicate that FactoryLLM is effective in cross-machine document reasoning, with every model achieving a groundedness score above 0.88. Full code and documentation are publicly available for the community to test FactoryLLM in their manufacturing-specific scenarios.
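To make the "every model above 0.88" claim concrete, here is a minimal Python sketch of how per-query groundedness scores might be rolled up per model and compared against that figure. It assumes the reported score is a per-model mean over the queries, which this summary does not state. The function names, the record layout, and the sample values are all hypothetical placeholders; none of them come from the paper or from FactoryLLM's code.

```python
from collections import defaultdict
from statistics import mean

# Figure quoted in the summary above; treating it as a cutoff is an assumption.
GROUNDEDNESS_THRESHOLD = 0.88


def mean_score_per_model(records):
    """Average scores per model.

    `records` is an iterable of (model_name, query_id, score) tuples.
    This layout is hypothetical, not FactoryLLM's actual output format.
    """
    by_model = defaultdict(list)
    for model, _query_id, score in records:
        by_model[model].append(score)
    return {model: mean(scores) for model, scores in by_model.items()}


def models_above(records, threshold=GROUNDEDNESS_THRESHOLD):
    """Map each model to True if its mean score exceeds the threshold."""
    return {
        model: avg > threshold
        for model, avg in mean_score_per_model(records).items()
    }


# Placeholder values for illustration only; these are NOT results from the paper.
records = [
    ("model-a", 1, 0.90), ("model-a", 2, 1.00),
    ("model-b", 1, 0.80), ("model-b", 2, 0.90),
]

if __name__ == "__main__":
    for model, passed in models_above(records).items():
        print(model, "above threshold" if passed else "at or below threshold")
```

In a real run, the records would hold one entry per model and query (three models and 30 queries in the case study), with scores taken from whichever evaluator produced the groundedness metric.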

Blogger's Review: FactoryLLM offers an innovative tool for the smart manufacturing sector, enabling the assessment and optimization of LLM performance while safeguarding data privacy. This openness and flexibility empower enterprises to adapt to evolving technological demands, driving advancements in smart factories.

Original Source: https://arxiv.org/abs/2606.14119
