Event details
Feb 18
DiScho Discovery Hours: LLM Steering
DiScho Discovery Hour is a weekly session where PUL's Digital Scholarship Specialists explore creative approaches to scholarship using technology. During this hour, anyone in the Princeton community is welcome to engage with our team as a Specialist hosts a session in the Commons Library Curiosity Studio.
In this session, we will explore recent work on LLM steering. This technique adds a vector representing an abstract concept to a model's hidden state in order to alter its output. For example, we can add an Eiffel Tower-related vector, and the model will speak as a "large metal structure" rather than a "helpful assistant." Come by to learn more and tinker with this new method.
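For a rough sense of the idea before the session, here is a minimal toy sketch in plain Python. It does not use a real model: the three-dimensional hidden state, the tiny vocabulary, the "Eiffel" direction, and the strength `alpha` are all made-up values chosen for illustration, and the helper names `steer` and `top_token` are hypothetical.

```python
# Toy illustration of steering: add a scaled concept vector to a hidden state.
# All numbers and names here are hypothetical; no real LLM is involved.

def steer(hidden, concept, alpha):
    """Return hidden + alpha * concept, computed element-wise."""
    return [h + alpha * c for h, c in zip(hidden, concept)]

def top_token(hidden, unembed):
    """Return the token whose output vector has the largest dot product with hidden."""
    scores = {
        token: sum(h * w for h, w in zip(hidden, vector))
        for token, vector in unembed.items()
    }
    return max(scores, key=scores.get)

# Assumed output vectors for a three-word vocabulary.
UNEMBED = {
    "assistant": [1.0, 0.0, 0.0],
    "tower": [0.0, 1.0, 0.0],
    "metal": [0.0, 0.5, 0.5],
}

hidden = [2.0, 0.5, 0.0]         # assumed unsteered hidden state
eiffel_vector = [0.0, 1.0, 0.2]  # assumed "Eiffel Tower" concept direction

baseline = top_token(hidden, UNEMBED)
steered = top_token(steer(hidden, eiffel_vector, alpha=4.0), UNEMBED)

print("without steering:", baseline)
print("with steering:   ", steered)
```

In this toy setup the preferred token shifts from "assistant" to "tower" once the concept vector is added. In a real model the same kind of addition would be applied to the activations of one or more layers during generation, and the concept vector would be derived from the model itself rather than written by hand.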
Signs of introspection in large language models
Steering LLM Behavior Without Fine-Tuning
The Eiffel Tower Llama
Image: Disco Ball by iconfield
University programs and activities are open to all eligible participants without regard to identity or other protected characteristics. Sponsorship of an event does not constitute institutional endorsement of external speakers or views presented.
View physical accessibility information for campus buildings and find accessible routes using the Princeton Campus Map app.