Retrieval-Grounded HDFS Log Anomaly Detection and Deterministic Failure Narrative Generation

Xinzhuo Sun; Ziliang Samuel Zhong; Qiyou Wu

doi:10.64229/j6d7fr94

Authors

Xinzhuo Sun, Computer Engineering, Cornell Tech, Cornell University, 2 West Loop Road, New York, NY 10044, USA
Ziliang Samuel Zhong, New York University, 70 Washington Square South, New York, NY 10012, USA
Qiyou Wu, Artificial Intelligence, Northeastern University, 440 Huntington Avenue, Boston, MA 02115, USA

DOI:

https://doi.org/10.64229/j6d7fr94

Keywords:

HDFS and system logs, Anomaly detection, Retrieval-augmented generation, Selective refusal, Failure narratives, Reproducible evaluation

Abstract

Objectives/Scope: This paper evaluates Hadoop Distributed File System (HDFS) log anomaly detection on the public HDFS_100k structured-log subset derived from LogHub, with emphasis on detection, evidence-grounded explanation, and selective refusal under label ambiguity.

Methods, Procedures, and Process: After block-level sessionization, the benchmark contains 7,940 traces, of which 313 (3.94%) are anomalous sessions. The study implements a reproducible hybrid detector that combines linear discriminative scoring, pattern-memory posteriors, and trace statistics with a calibrated stacking stage. A layer inspired by retrieval-augmented generation (RAG) then assembles evidence bundles and renders deterministic failure narratives; no external large language model is used.

Results, Observations, and Conclusions: On a fixed 60/20/20 split, the proposed model obtains an F1-score (the harmonic mean of precision and recall) of 0.6452, a precision-recall area under the curve (PR-AUC) of 0.5440, and a receiver operating characteristic area under the curve (ROC-AUC) of 0.7562. Logistic regression and linear support vector machine baselines both reach a slightly higher fixed-threshold F1-score of 0.6596, but the proposed model provides the best ranking quality under cross-validation, with mean PR-AUC = 0.5318 and mean ROC-AUC = 0.7558. Exact-pattern ambiguity accounts for 32 of the 33 errors on the fixed test split. Selective refusal raises the F1-score to 0.6742 at 89.8% coverage and to 0.9375 at 23.9% coverage.

Novel/Additive Information: The study contributes an ambiguity-aware operational pipeline that integrates scoring, explanation, and abstention, and it offers a transferable evaluation pattern for data-intensive infrastructure, including the storage and monitoring systems used in petroleum-industry digital operations.
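The abstract compresses three mechanisms into a sentence each: block-level sessionization, exact-pattern ambiguity, and selective refusal measured by coverage and F1-score. The following is a minimal sketch of those ideas under stated assumptions, not the authors' implementation. It assumes each parsed log line arrives as an (event template ID, message) pair, that blocks are identified by blk_ tokens in the message, and that refusal is a simple abstention band of width margin around a 0.5 score threshold; that refusal rule, the toy data, and the function names sessionize, ambiguous_patterns, f1_score, and selective_f1 are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: HDFS block identifiers appear in messages as tokens like "blk_-101" or "blk_202".
BLOCK_RE = re.compile(r"blk_-?\d+")


def sessionize(lines):
    """Group parsed log lines into per-block sessions (block-level sessionization).

    `lines` is an iterable of (event_template_id, message) pairs in log order.
    A line that mentions several blocks is appended to each block's session once.
    """
    sessions = defaultdict(list)
    for event_id, message in lines:
        # dict.fromkeys de-duplicates block IDs while keeping first-seen order.
        for block_id in dict.fromkeys(BLOCK_RE.findall(message)):
            sessions[block_id].append(event_id)
    return dict(sessions)


def ambiguous_patterns(sessions, labels):
    """Return exact event sequences observed with both normal (0) and anomalous (1) labels.

    A detector whose features depend only on the event sequence must score every
    session sharing such a pattern identically, so at least one of them is misclassified.
    """
    observed = defaultdict(set)
    for block_id, events in sessions.items():
        observed[tuple(events)].add(labels[block_id])
    return {pattern for pattern, seen in observed.items() if len(seen) > 1}


def f1_score(y_true, y_pred):
    """Binary F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def selective_f1(y_true, scores, threshold=0.5, margin=0.1):
    """Hypothetical refusal rule: abstain when a score lies within `margin` of `threshold`.

    Returns (coverage, F1 on the sessions that were not refused).
    """
    kept = [(t, s >= threshold) for t, s in zip(y_true, scores)
            if abs(s - threshold) >= margin]
    if not kept:
        return 0.0, 0.0
    ys, ps = zip(*kept)
    return len(kept) / len(y_true), f1_score(ys, ps)


if __name__ == "__main__":
    # Toy lines and labels for illustration only; they are not taken from HDFS_100k.
    toy_lines = [("E5", "Receiving block blk_-101"), ("E22", "Allocate blk_202"),
                 ("E5", "Receiving block blk_202"), ("E11", "Verified blk_-101")]
    print(sessionize(toy_lines))
    y = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.55, 0.45, 0.1, 0.45, 0.6]
    for margin in (0.0, 0.15):
        print(margin, selective_f1(y, scores, margin=margin))
```

On the toy scores, a margin of 0.0 keeps every session (coverage 1.0, F1 about 0.667), while a margin of 0.15 refuses the four borderline sessions (coverage about 0.333, F1 1.0). This is the same coverage-for-quality trade-off the abstract reports at 89.8% and 23.9% coverage. Ambiguous patterns are a different limit: no threshold choice can classify both labels of a shared sequence correctly, because a sequence-only detector must give them the same score.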
