<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Zili Shen</title>
    <link>https://zilishen.com/</link>
    <description>Recent content on Zili Shen</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 07 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://zilishen.com/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Agentic AI evals: lessons from real life</title>
      <link>https://zilishen.com/blog/agentic-ai-evals/</link>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/agentic-ai-evals/</guid>
      <description>AI products can change under your feet. Here&amp;rsquo;s what I learned about measuring whether they do what you think they should.</description>
    </item>
    <item>
      <title>Automatic failure diagnosis</title>
      <link>https://zilishen.com/blog/probellm-failure-diagnosis/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/probellm-failure-diagnosis/</guid>
      <description>An eval score going down tells you something broke. It doesn&amp;rsquo;t tell you what. ProbeLLM is a new approach to automatic failure diagnosis that treats AI evaluation like an oral exam.</description>
    </item>
    <item>
      <title>Grading the graders: how do we know if an AI judge is any good?</title>
      <link>https://zilishen.com/blog/llm-judge-validation/</link>
      <pubDate>Fri, 23 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/llm-judge-validation/</guid>
      <description>We use AI systems to evaluate other AI systems. But validating those judges is harder than it looks — especially when the right answer isn&amp;rsquo;t as clear as it seems.</description>
    </item>
    <item>
      <title>Cloud Computing for (Observational) Astronomy</title>
      <link>https://zilishen.com/blog/cloud-computing-astro/</link>
      <pubDate>Thu, 26 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/cloud-computing-astro/</guid>
      <description>You&amp;rsquo;ve used the cloud, but have you thought about using it for astronomy? A roundup from a panel at #AAS241.</description>
    </item>
    <item>
      <title>How Not to Bury Ourselves Under Space Trash</title>
      <link>https://zilishen.com/blog/space-sustainability/</link>
      <pubDate>Thu, 24 Feb 2022 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/space-sustainability/</guid>
      <description>Our planet is already blanketed by space debris. As small commercial satellites rapidly multiply, will we block ourselves from space?</description>
    </item>
    <item>
      <title>From Star Parties to Observatories: An Astronomer&#39;s Journey</title>
      <link>https://zilishen.com/blog/star-parties-to-observatories/</link>
      <pubDate>Fri, 12 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://zilishen.com/blog/star-parties-to-observatories/</guid>
      <description>Reflecting on observatory trips and what I love about observing the night sky.</description>
    </item>
    <item>
      <title>About</title>
      <link>https://zilishen.com/about/</link>
      <guid>https://zilishen.com/about/</guid>
      <description>&lt;p&gt;Zili is a Member of Technical Staff at &lt;a href=&#34;https://p-1.ai&#34;&gt;P-1 AI&lt;/a&gt;, where she works as an AI eval research engineer specializing in LLM-based agents.&lt;/p&gt;
&lt;p&gt;She graduated from Yale in 2025 with a Ph.D. in astrophysics. For her thesis, she led the science analysis of the &lt;a href=&#34;https://arxiv.org/abs/2407.05200&#34;&gt;Dragonfly Ultrawide Survey&lt;/a&gt;, mapping 10,000 square degrees of the northern sky using a custom data pipeline on AWS.&lt;/p&gt;
&lt;p&gt;She writes about science. She contributed 18 articles to &lt;a href=&#34;https://astrobites.org/author/zshen/&#34;&gt;Astrobites&lt;/a&gt; and worked at the Yale Poorvu Center as a &lt;a href=&#34;https://poorvucenter.yale.edu/people/zili-shen&#34;&gt;Graduate Writing Fellow&lt;/a&gt;, offering one-on-one writing consultations and leading workshops.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Research</title>
      <link>https://zilishen.com/research/</link>
      <guid>https://zilishen.com/research/</guid>
      <description>&lt;h2 id=&#34;ai-safety--evaluations&#34;&gt;AI Safety &amp;amp; Evaluations&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Current role:&lt;/strong&gt; Member of Technical Staff at &lt;a href=&#34;https://p-1.ai&#34;&gt;P-1 AI&lt;/a&gt;, working as an AI eval research engineer specializing in LLM-based agents.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Previous:&lt;/strong&gt; &lt;a href=&#34;https://algoverseairesearch.org/ai-safety-fellowship&#34;&gt;Algoverse AI Safety Fellowship&lt;/a&gt; — Evaluating agents on long-horizon tasks.&lt;/p&gt;
&lt;p&gt;LLM agents are increasingly deployed to carry out complex, multi-step tasks on behalf of users. During this process, are agents able to retain their alignment training, remember the original goal, and adapt to unexpected changes in the environment?&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
