Estimating worst-case frontier risks of open-weight LLMs



In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.

