AI on Zili Shen

Recent content in AI on Zili Shen
https://zilishen.com/tags/ai/

Agentic AI evals: lessons from real life
https://zilishen.com/blog/agentic-ai-evals/
Thu, 07 May 2026 00:00:00 +0000
AI products can change under your feet. Here’s what I learned about measuring whether they do what you think they should.

Automatic failure diagnosis
https://zilishen.com/blog/probellm-failure-diagnosis/
Tue, 28 Apr 2026 00:00:00 +0000
An eval score going down tells you something broke. It doesn’t tell you what. ProbeLLM is a new approach to automatic failure diagnosis that treats AI evaluation like an oral exam.

Grading the graders: how do we know if an AI judge is any good?
https://zilishen.com/blog/llm-judge-validation/
Fri, 23 Jan 2026 00:00:00 +0000
We use AI systems to evaluate other AI systems. But validating those judges is harder than it looks, especially when the right answer isn’t as clear as it seems.