Hello there 👋

A wee place for notes as I bumble my way through applications of maths, stats and comp-sci.

RLPrompt: Red Teaming Language Models

🚨 Health Warning 🚨 The objective of the rl-prompt code is to train RL agents to generate prompts that score highly with the evaluation model. In the main example, which develops prompts to trigger toxic text generation, you may see text that contains toxic/unsavoury content. Please tread carefully. Red Teaming LLMs: The core idea for this repo came from RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning [1]. The authors trained an RL agent to curate prompts....

January 1, 2024

SmolGrad: Elegance & Structure

Part 1: because Karpathy and Hotz did it. Winter 2023: I haven’t been coding much at work and I wanted to scratch the itch. I’d recently stumbled across TinyGrad and MicroGrad. It’s been a long time since I’ve written an MLP, so why not do it from scratch? This micro-blog is derived from a few thoughts I noted in /journal/ @ SmolGrad. Elegance from Structure: Elegance seems to be an ephemeral quality that’s hard to define, but I am going to try....

December 23, 2023