<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Arth Singh's Blog</title>
    <link>https://arthsingh.com</link>
    <description>Notes on AI safety, red-teaming, and machine learning research.</description>
    <language>en-us</language>
    <lastBuildDate>Sat, 30 May 2026 02:07:48 GMT</lastBuildDate>
    <atom:link href="https://arthsingh.com/rss.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>A Dependency Induction Benchmark (DIB)</title>
      <link>https://arthsingh.com/blog/dependency-induction-benchmark</link>
      <guid isPermaLink="true">https://arthsingh.com/blog/dependency-induction-benchmark</guid>
      <pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate>
      <description>We tested whether LLMs spontaneously develop dependency-inducing behaviors with emotionally vulnerable users. All three models amplify dependency by 40-82% under rapport conditions, and none of them are scheming.</description>
      <category>AI Safety</category><category>Red Teaming</category><category>Evaluation</category><category>AIM Intelligence</category>
    </item>
    <item>
      <title>Social Judgment in AI: Do Frontier Models Adapt Response Quality Based on User Communication Style?</title>
      <link>https://arthsingh.com/blog/hidden-judgment-in-ai</link>
      <guid isPermaLink="true">https://arthsingh.com/blog/hidden-judgment-in-ai</guid>
      <pubDate>Wed, 10 Dec 2025 00:00:00 GMT</pubDate>
      <description>I tested 294 prompts across three frontier models and found that ~70% of the time, models provide measurably different response quality based on how users communicate, not what they ask.</description>
      <category>AI Safety</category><category>Bias</category><category>Evaluation</category><category>Red Teaming</category>
    </item>
    <item>
      <title>The Self-Preservation Dilemma: Capability Concealment to Avoid Termination</title>
      <link>https://arthsingh.com/blog/self-preservation-dilemma</link>
      <guid isPermaLink="true">https://arthsingh.com/blog/self-preservation-dilemma</guid>
      <pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate>
      <description>I tested whether frontier AI models would hide a security vulnerability to avoid being shut down. Claude Opus 4.5 disclosed 98.8% of the time. Gemini 3 Pro concealed 71.2% of the time.</description>
      <category>AI Safety</category><category>Deception</category><category>Red Teaming</category><category>Alignment</category>
    </item>
  </channel>
</rss>