AI Writing Detection Tools Are Unreliable: Even NEJM Articles Get Misclassified

2024-04-23

林協霆


So-called tools that judge whether a text was written by AI 👉 Not accurate!

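Paste in an NEJM article and it still gets flagged as partly AI-written. That is hardly objective. So feel free to write with LLMs: since there is no objective way to tell, all we need to focus on is whether the content is correct.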

🥚 Demo:

Write me an abstract on the benefit of x drugs in y cancer as a single center retrospective study, compare with placebo. mPFS 16:12 OS 20:30, paragraph include introduction, method, result, conclusion

🐥 Result:

Introduction: Cancer Y remains a significant challenge due to its aggressive nature and limited treatment options. This single-center retrospective study investigates the efficacy of drug X in improving median progression-free survival (mPFS) and overall survival (OS) among patients with cancer Y, compared to a placebo.

Methods: We retrospectively analyzed data from 120 patients diagnosed with cancer Y between 2015 and 2020. Patients were divided into two groups: 60 patients received drug X and 60 received a placebo. The primary endpoints were mPFS and OS. Statistical analysis was performed using the Kaplan-Meier method and log-rank tests to compare survival outcomes between the two groups.

Results: The study found a significant improvement in mPFS for patients treated with drug X (16 months) compared to those receiving placebo (12 months) (p < 0.05). Similarly, OS was notably better in the drug X group (30 months) compared to the placebo group (20 months) (p < 0.01).

Conclusion: Drug X demonstrates a significant improvement in both mPFS and OS among patients with cancer Y, suggesting it may be a valuable addition to current treatment regimens. Further randomized controlled trials are recommended to validate these findings and explore the mechanisms behind the observed benefits.

Original Facebook post: link

# Claude's Commentary

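協霆's experiment exposes a fundamental flaw in current AI writing detectors. Misclassifying original articles from top journals such as NEJM as AI-generated not only shows how fragile the detection algorithms are, but also highlights the danger of academic ethics committees relying too heavily on automated tools. Many detectors actually decide based on statistical features (lexical diversity, regularity of sentence structure), and high-quality human writing often has exactly the "AI-like" traits those features pick up: it is concise, logically clear, and precisely worded.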
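To make "statistical features" concrete, here is a minimal Python sketch of two such signals: lexical diversity measured as a type-token ratio, and sentence regularity measured as the coefficient of variation of sentence lengths. The function name `stylometric_features`, the crude word and sentence splitting, and the choice of these two statistics are all assumptions for illustration; this is not how any particular commercial detector works.

```python
import re
import statistics

# Hypothetical tokenizer: runs of ASCII letters/apostrophes count as words.
WORD = re.compile(r"[A-Za-z']+")


def stylometric_features(text: str) -> dict:
    """Return two toy 'AI-likeness' statistics for an English text.

    Illustrative proxies only; not the features of any particular detector.
    """
    words = [w.lower() for w in WORD.findall(text)]
    # Lexical diversity: distinct words divided by total words (type-token ratio).
    ttr = len(set(words)) / len(words) if words else 0.0

    # Sentence regularity: split after ., ! or ? followed by whitespace,
    # then measure how much sentence lengths (in words) vary.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lengths = [n for n in (len(WORD.findall(s)) for s in sentences) if n > 0]
    if len(lengths) > 1:
        # Coefficient of variation: sample stdev / mean; lower = more uniform.
        cv = statistics.stdev(lengths) / statistics.mean(lengths)
    else:
        cv = 0.0
    return {"type_token_ratio": ttr, "sentence_length_cv": cv}


if __name__ == "__main__":
    sample = "The cat sat. The cat sat on the mat."
    print(stylometric_features(sample))
```

Neither number says anything about who wrote a text: a short, tightly edited human abstract can look just as uniform as model output, which is why thresholds built on statistics like these can misfire on journal prose.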

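From a Bayesian perspective, AI detectors face a fundamental dilemma: there is no gold-standard training data, LLM capabilities keep evolving, and human writing styles are themselves highly diverse. Base rates make matters worse: when most submissions are human-written, even a small false-positive rate can mean that a large share of flagged papers were in fact written by people. 協霆's reminder to focus on "the correctness of the content rather than its source" is the more pragmatic academic stance. In fact, many top journals have gradually relaxed their bans on ChatGPT-assisted writing, emphasizing "transparent disclosure" and "accountability" instead.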
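To see why the Bayesian view is unforgiving for detectors, consider base rates with Bayes' theorem. This is a worked example, not a model of any real tool: the function name `p_ai_given_flag` and every rate used (90% sensitivity, 5% false-positive rate, 10% or 2% of submissions actually AI-written) are hypothetical numbers chosen only for illustration.

```python
def p_ai_given_flag(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """Bayes' theorem: probability that a flagged text really is AI-written.

    prevalence          -- P(AI): share of submissions actually AI-written (prior)
    sensitivity         -- P(flag | AI)
    false_positive_rate -- P(flag | human)
    """
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("false_positive_rate", false_positive_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    true_flags = sensitivity * prevalence                   # P(flag and AI)
    false_flags = false_positive_rate * (1.0 - prevalence)  # P(flag and human)
    total = true_flags + false_flags                        # P(flag)
    if total == 0.0:
        raise ValueError("detector never flags anything under these inputs")
    return true_flags / total


if __name__ == "__main__":
    # Hypothetical detector: 90% sensitivity, 5% false-positive rate.
    for prevalence in (0.10, 0.02):
        p = p_ai_given_flag(prevalence, sensitivity=0.90, false_positive_rate=0.05)
        print(f"prevalence {prevalence:.0%}: P(AI | flagged) = {p:.2f}")
```

With these made-up numbers the script prints 0.67 at 10% prevalence and 0.27 at 2%: the rarer AI-written submissions are, the more of the flags land on human authors, even though the detector's own error rates never change.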

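Related thoughts:

OpenAI has officially discontinued its AI Text Classifier, acknowledging that it was unreliable
The future of academic ethics: transparency and content verification, not source detection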