<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Blog on Zili Shen</title>
    <link>https://zilishen.com/blog/</link>
    <description>Recent content in Blog on Zili Shen</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 07 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://zilishen.com/blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Agentic AI evals: lessons from real life</title>
      <link>https://zilishen.com/blog/agentic-ai-evals/</link>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/agentic-ai-evals/</guid>
      <description>AI products can change under your feet. Here&amp;rsquo;s what I learned about measuring whether they do what you think they should.</description>
    </item>
    <item>
      <title>Automatic failure diagnosis</title>
      <link>https://zilishen.com/blog/probellm-failure-diagnosis/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/probellm-failure-diagnosis/</guid>
      <description>An eval score going down tells you something broke. It doesn&amp;rsquo;t tell you what. ProbeLLM is a new approach to automatic failure diagnosis that treats AI evaluation like an oral exam.</description>
    </item>
    <item>
      <title>Grading the graders: how do we know if an AI judge is any good?</title>
      <link>https://zilishen.com/blog/llm-judge-validation/</link>
      <pubDate>Fri, 23 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/llm-judge-validation/</guid>
      <description>We use AI systems to evaluate other AI systems. But validating those judges is harder than it looks — especially when the right answer isn&amp;rsquo;t as clear as it seems.</description>
    </item>
    <item>
      <title>Cloud Computing for (Observational) Astronomy</title>
      <link>https://zilishen.com/blog/cloud-computing-astro/</link>
      <pubDate>Thu, 26 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/cloud-computing-astro/</guid>
      <description>You&amp;rsquo;ve used the cloud, but have you thought about using it for astronomy? A roundup from a panel at #AAS241.</description>
    </item>
    <item>
      <title>How Not to Bury Ourselves Under Space Trash</title>
      <link>https://zilishen.com/blog/space-sustainability/</link>
      <pubDate>Thu, 24 Feb 2022 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/space-sustainability/</guid>
      <description>Our planet is already blanketed by space debris. As small commercial satellites rapidly multiply, will we block ourselves from space?</description>
    </item>
    <item>
      <title>From Star Parties to Observatories: An Astronomer&#39;s Journey</title>
      <link>https://zilishen.com/blog/star-parties-to-observatories/</link>
      <pubDate>Fri, 12 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/star-parties-to-observatories/</guid>
      <description>Reflecting on observatory trips and what I love about observing the night sky.</description>
    </item>
  </channel>
</rss>
