Retrieval-Grounded HDFS Log Anomaly Detection and Deterministic Failure Narrative Generation
DOI: https://doi.org/10.64229/j6d7fr94
Keywords: HDFS and system logs, Anomaly detection, Retrieval-augmented generation, Selective refusal, Failure narratives, Reproducible evaluation
Abstract
Objectives/Scope: This paper evaluates Hadoop Distributed File System (HDFS) log anomaly detection on the public HDFS_100k structured-log subset derived from LogHub, with emphasis on detection, evidence-grounded explanation, and selective refusal under label ambiguity.
Methods, Procedures, and Process: After block-level sessionization, the benchmark contains 7,940 traces, of which 313 (3.94%) are anomalous sessions. A reproducible hybrid detector was implemented that combines linear discriminative scoring, pattern-memory posteriors, trace statistics, and a calibrated stacking stage. A layer inspired by retrieval-augmented generation then assembles evidence bundles and renders deterministic failure narratives; no external large language model is used.
Results, Observations, and Conclusions: On a fixed 60/20/20 split, the proposed model obtains an F1-score (the harmonic mean of precision and recall) of 0.6452, a precision-recall area under the curve (PR-AUC) of 0.5440, and a receiver operating characteristic area under the curve (ROC-AUC) of 0.7562. Although the logistic regression and linear support vector machine baselines reach a slightly higher fixed-threshold F1-score of 0.6596, the proposed model provides the best cross-validation ranking quality, with mean PR-AUC = 0.5318 and mean ROC-AUC = 0.7558. Exact-pattern ambiguity explains 32 of the 33 fixed-test errors. Selective refusal improves the F1-score to 0.6742 at 89.8% coverage and to 0.9375 at 23.9% coverage.
Novel/Additive Information: The study contributes an ambiguity-aware operational pipeline that integrates scoring, explanation, and abstention, providing a transferable evaluation pattern for data-intensive infrastructure, including storage and monitoring systems used in petroleum-industry digital operations.
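The trade-off between F1-score and coverage under selective refusal can be illustrated with a minimal Python sketch. The abstract does not state the paper's actual refusal rule, so the rule below is an assumption: the detector abstains whenever a session's score falls within a band around the decision threshold. The function name `selective_f1`, the `band` parameter, and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def selective_f1(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,
    band: float = 0.0,
) -> Tuple[float, float]:
    """Return (F1 on retained sessions, coverage).

    Assumed refusal rule (not the paper's): abstain when the score lies
    strictly within `band` of the decision threshold.
    """
    tp = fp = fn = kept = 0
    for score, label in zip(scores, labels):
        if abs(score - threshold) < band:
            continue  # abstain: the session is refused and not scored
        kept += 1
        predicted_anomaly = score >= threshold
        if predicted_anomaly and label == 1:
            tp += 1
        elif predicted_anomaly and label == 0:
            fp += 1
        elif not predicted_anomaly and label == 1:
            fn += 1
    coverage = kept / len(scores) if scores else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return f1, coverage


# Toy data for illustration only; these are not results from the paper.
toy_scores = [0.9, 0.8, 0.55, 0.45, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]

full_f1, full_coverage = selective_f1(toy_scores, toy_labels)
refused_f1, refused_coverage = selective_f1(toy_scores, toy_labels, band=0.1)
```

On this toy input, widening the band removes the two borderline sessions, so the F1-score on the retained sessions rises while coverage falls. This is the same qualitative pattern as the reported 0.6742 at 89.8% coverage versus 0.9375 at 23.9% coverage.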
License
Copyright (c) 2026 Xinzhuo Sun, Ziliang Samuel Zhong, Qiyou Wu (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.