• User feedback
    OpenAI Releases GPT-5.3 Instant: Hallucination Rates Down by Up to 26.8%, a Broad Upgrade to Everyday ChatGPT Conversations

    OpenAI today released GPT-5.3 Instant, a version deeply optimized for everyday ChatGPT conversations. The new model significantly improves answer accuracy and contextual understanding, reduces unnecessary refusals and lengthy disclaimers, and integrates web information more effectively. Whether you are looking up information, asking for an explanation, or simply chatting, responses are smoother and more useful. For anyone who regularly uses AI to boost productivity, this update is worth watching and trying out.

    On March 3, 2026, with no launch event and no marketing push, OpenAI quietly released GPT-5.3 Instant, a targeted optimization of the conversation model that ChatGPT users rely on most. Unlike previous updates that leaned on benchmark breakthroughs, this iteration starts from an unusually pragmatic premise: directly addressing the real pain points users raise again and again in daily use.

    What problems is OpenAI solving with this update?

    If you are a heavy ChatGPT user, these scenarios will sound familiar: you ask a perfectly harmless question and first receive a statement that the model "can't help with that"; or, before you get an answer, you have to wade through a long disclaimer and moral preamble until your patience runs out; or, with web search turned on, the model hands you a loose list of links instead of a genuinely synthesized analysis.

    GPT-5.3 Instant targets exactly these three problems: sharply reducing unnecessary refusals, cutting overly defensive and preachy preambles, and improving how web search results are synthesized and connected to context. The new model is also better at reading the subtext of a question, judging the user's real intent more accurately and leading with the most important information rather than opening with a note about safety boundaries.

    The numbers: hallucination rates drop noticeably, most of all in high-risk domains

    OpenAI provided two internal quantitative evaluations for this update.

    The first focuses on three high-risk professional domains: healthcare, law, and finance. With web search enabled, GPT-5.3 Instant's hallucination rate fell 26.8% compared with GPT-5.2 Instant; relying only on internal knowledge (offline), the reduction was 19.7%.

    The second evaluation is based on de-identified ChatGPT conversations that real users had flagged as containing factual errors; it shows hallucinations down 22.5% with web search and down 9.6% without it.

    What makes these figures meaningful is that they come from real usage rather than artificially constructed test sets, so they are more directly relevant to professionals who rely on AI to support decisions at work, especially in HR, legal, and finance roles.

    Not the flagship, but it solves problems the flagship does not

    One thing should be clear: GPT-5.3 Instant is not OpenAI's flagship model. In the product line it sits in the everyday conversational efficiency tier, aimed at mid-range, high-frequency use rather than complex reasoning or long-context work. Precisely for that reason, the value of this update is not "smarter" but "easier to use". The two are not the same, yet for most enterprise users the latter is usually the higher priority.

    OpenAI has said explicitly that the improvements in GPT-5.3 Instant come directly from user feedback, not from pressure to climb external leaderboards. That statement itself signals a structural shift in how leading AI vendors iterate: from a capability race to refinement of the experience, from "we can do it" to "it feels good to use".

    Head-to-head: competing with Claude Sonnet 4.6, each with its own strengths

    GPT-5.3 Instant's real competitor is Anthropic's same-tier Claude Sonnet 4.6, not the flagship Claude Opus 4.6. Based on the external evaluation data currently available, the two models each lead on different dimensions, with a clear division of capabilities.

    On coding and agentic tasks, Claude Sonnet 4.6 scores 79.6% on SWE-bench Verified, only 1.2 percentage points below Opus 4.6, while being priced 40% lower than Opus; several evaluations rank it as the best-value frontier coding model. GPT-5.3 Instant is not positioned as a coding model; OpenAI's main offering there is GPT-5.3 Codex.

    On computer use tasks, Claude Sonnet 4.6 performs at nearly twice the level of GPT-5.2, and several enterprise field reports note strong self-correction in automated workflows.

    On writing and content generation, OpenAI CEO Sam Altman has publicly acknowledged that GPT-5.2 regressed in writing quality, with prose that leaned stiff and overly formal. GPT-5.3 Instant improves on this, though solid independent third-party evaluation data is still lacking. The Claude family has long been considered stronger in fluency and natural tone.

    On overall intelligence rankings, the latest Artificial Analysis Intelligence Index puts the top five at Gemini 3.1 Pro Preview (57), GPT-5.3 Codex (54), Claude Opus 4.6 (53), Claude Sonnet 4.6 (52), and GPT-5.2 (51). GPT-5.3 Codex and Claude Sonnet 4.6 are only 2 points apart, in the same competitive tier.

    On context windows, Claude Sonnet 4.6 supports 1 million tokens of context versus 400K for GPT-5.3 Codex, giving it a clear structural advantage on long documents, large codebases, and multi-file tasks.

    The next competitive dimension for AI assistants: not being annoying to use

    The release of GPT-5.3 Instant reflects a sober product judgment: for users who genuinely embed AI in their daily workflows, whether a response is direct, accurate, and free of padding usually matters more than a few extra points on some benchmark.

    Competition among AI assistants is moving from benchmark games in the lab back to the real friction at the desk. OpenAI's direction here is the right one.

    Meanwhile, Anthropic's Claude Sonnet 4.6 keeps its same-tier lead on coding, long-context processing, and computer use. The two products serve different core use cases, and enterprise users choosing tools should focus on the actual needs of their own workflows rather than a single leaderboard ranking.

    This race has no finish line, but the standards for judging it are becoming steadily more pragmatic.

    Data sources: OpenAI's official release page, the Artificial Analysis Intelligence Index, and publicly available third-party evaluation reports.
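    To put the reported reduction figures in perspective, here is a minimal arithmetic sketch. It assumes the percentages above are relative reductions (the underlying baseline rates are not published in this summary), and the 10% baseline used below is purely hypothetical.

        # Hypothetical illustration of how a "26.8% lower hallucination rate"
        # reads when interpreted as a relative reduction. The baseline is made
        # up; the only reported number here is the 26.8% drop itself.

        baseline_rate = 0.10        # assumed GPT-5.2 Instant hallucination rate
        relative_reduction = 0.268  # reported drop with web search enabled

        new_rate = baseline_rate * (1 - relative_reduction)
        absolute_drop = baseline_rate - new_rate

        print(f"new rate: {new_rate:.2%}")                                    # 7.32%
        print(f"absolute drop: {absolute_drop * 100:.2f} percentage points")  # 2.68

    The same arithmetic applies to the 19.7%, 22.5%, and 9.6% figures: without a baseline, a relative reduction says nothing about the absolute error rate to expect.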
    March 3, 2026
  • User feedback
    Workday: It’s Time to Close the AI Trust Gap

    Workday, a leading provider of enterprise cloud applications for finance and human resources, recently published a global study highlighting the importance of addressing the AI trust gap. The company sees trust as a critical factor in implementing artificial intelligence (AI) systems, especially in areas such as workforce management and human resources.

    Key research findings:

    - At the leadership level, only 62% welcome AI, and only 62% are confident their organization will ensure AI is implemented in a responsible and trustworthy way. At the employee level, these figures drop to 52% and 55%, respectively.
    - 70% of leaders say AI should be developed in a way that easily allows for human review and intervention. Yet 42% of employees believe their company does not have a clear understanding of which systems should be fully automated and which require human intervention.
    - 1 in 4 employees (23%) are not confident that their organization will put employee interests above its own when implementing AI (compared to 21% of leaders).
    - 1 in 4 employees (23%) are not confident that their organization will prioritize innovating with care for people over innovating with speed (compared to 17% of leaders).
    - 1 in 4 employees (23%) are not confident that their organization will ensure AI is implemented in a responsible and trustworthy way (compared to 17% of leaders).

    “We know how these technologies can benefit economic opportunities for people—that’s our business. But people won’t use technologies they don’t trust. Skills are the way forward, and not only skills, but skills backed by a thoughtful, ethical, responsible implementation of AI that has regulatory safeguards that help facilitate trust,” said Chandler C. Morse, VP, Public Policy, Workday.

    Workday’s study focuses on several key areas:

    Section 1: Perspectives align on AI’s potential and responsible use. “At the outset of our research, we hypothesized that there would be a general alignment between business leaders and employees regarding their overall enthusiasm for AI. Encouragingly, this has proven true: leaders and employees are aligned in several areas, including AI’s potential for business transformation, as well as efforts to reduce risk and ensure trustworthy AI.”

    - Both leaders and employees believe in and hope for a transformation scenario* with AI.
    - Both groups agree AI implementation should prioritize human control.
    - Both groups cite regulation and frameworks as most important for trustworthy AI.

    Section 2: When it comes to the development of AI, the trust gap between leaders and employees widens further. “While most leaders and employees agree on the value of AI and the need for its careful implementation, the existing trust gap becomes even more pronounced when it comes to developing AI in a way that facilitates human review and intervention.”

    - Employees aren’t confident their company takes a people-first approach.
    - At all levels, there is concern that human welfare is not a leadership priority.

    Section 3: Data on AI governance and use is not readily visible to employees. “While employees are calling for regulation and ethical frameworks to ensure that AI is trustworthy, there is a lack of awareness across all levels of the workforce when it comes to collaborating on AI regulation and sharing responsible AI guidelines.”

    Closing remarks: How Workday can close the AI trust gap.

    Transparency: Workday can prioritize transparency in its AI systems. Providing clear explanations of how AI algorithms make decisions can help build trust among users. By revealing the factors, data, and processes that contribute to AI-driven outcomes, Workday can make its AI applications easier to scrutinize.

    Explainability: Workday can work toward making its AI systems more explainable, enabling users to understand the reasoning behind AI-generated recommendations or decisions. Techniques such as interpretable machine learning can help users comprehend the logic and factors influencing AI-driven outcomes.

    Ethical considerations: Developing ethical frameworks and guidelines for AI use can play a crucial role in closing the trust gap. Workday can ensure that its AI systems align with ethical principles such as fairness, accountability, and bias avoidance. This might involve rigorous testing, auditing, and ongoing monitoring of AI models to detect and mitigate potential biases or unintended consequences (a minimal illustration of one such check follows after this summary).

    User feedback and collaboration: Engaging with users and seeking their feedback is key to building trust. Workday can involve customers and end users in the AI development process, gathering insights and acting on user concerns. Collaboration and open communication will help Workday improve its AI systems based on real-world feedback and user needs.

    Data privacy and security: Robust data privacy and security measures are vital for instilling trust in AI systems. Workday can prioritize data protection and encryption, complying with industry standards and regulations. By demonstrating strong data privacy practices, it can ease concerns associated with AI-driven data processing.

    SOURCE Workday
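    As a loose illustration of the “ethical considerations” point above, and not part of Workday’s study or products, the sketch below shows one elementary bias check sometimes applied to automated screening steps in HR: comparing selection rates across groups against the four-fifths rule. The data and the helper names (selection_rates, four_fifths_check) are made up for the example.

        # Hypothetical sketch: a simple disparate-impact check on the outcomes
        # of an AI-assisted screening step (toy data; not Workday's methodology).

        from collections import defaultdict

        def selection_rates(records):
            """Return the per-group selection rate from (group, selected) pairs."""
            totals, selected = defaultdict(int), defaultdict(int)
            for group, was_selected in records:
                totals[group] += 1
                selected[group] += int(was_selected)
            return {g: selected[g] / totals[g] for g in totals}

        def four_fifths_check(rates):
            """Flag groups whose selection rate is below 80% of the highest rate."""
            top = max(rates.values())
            return {g: (rate, rate / top >= 0.8) for g, rate in rates.items()}

        # Toy outcomes of an automated resume screen: (group label, passed screen?)
        outcomes = [("A", True), ("A", True), ("A", False), ("A", True),
                    ("B", True), ("B", False), ("B", False), ("B", False)]

        for group, (rate, ok) in four_fifths_check(selection_rates(outcomes)).items():
            print(f"group {group}: selection rate {rate:.0%}, within 4/5 rule: {ok}")

    In practice a check like this would run continuously against production decisions and feed into the auditing and monitoring described above, alongside qualitative review.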
    January 11, 2024