[English] OpenAI Preparedness Framework (28 pages)

English research report · January 15, 2024, 07:52 · Administrator

Our rationale for grouping and naming these specific risk categories is informed by three considerations. First, fine-tuning or other domain-specific enhancements (e.g., tailored prompts or language model programs) may better elicit model capabilities within a particular risk category. Our evaluations will therefore include tests against these enhanced models, to ensure we are testing against the “worst case” scenario we know of. Our procedural commitments are triggered when any one of the tracked risk categories increases in severity, rather than only when they all increase together. Because capability improvements in different domains do not necessarily occur at the same rate, this approach ensures we err on the side of safety. Second, this approach enables us to leverage domain-specific talent to develop tailored suites of evaluations and monitoring solutions for each risk category. Third, this approach expands the options for tailored, domain-specific mitigations, which helps minimize the need for broader, more disruptive actions.