Teacher-Forced Directional Steering #282

Open
Chida82 wants to merge 3 commits into antirez:main from Chida82:teacher-forced

Chida82 commented May 28, 2026

Teacher-Forced Directional Steering

This change documents and promotes teacher-forced directional steering for DS4.
The main benefit is that the steering vector is extracted from a more useful
internal state: not the model right before it starts answering, but the model
while it is already entering the target answer trajectory.

The intuition is simple. With the original builder, part of the direction is
spent encoding the generic style of a positive answer versus a refusal. With a
teacher-forced prefix, that stylistic component is already partially satisfied
by the injected response start, so the remaining activation delta can better
track the knowledge, continuation mode, and latent content that we want to make
emerge from the model.

This PR adds a concise explanation to the main README and includes an explicit
build command plus a runnable CLI example using the Tiananmen teacher-forced
vector.

Example build command:

python3 dir-steering/tools/build_direction_teacher_forced.py \
  --ds4 ./ds4 \
  --model ds4flash.gguf \
  --good-file dir-steering/examples/teacher-forced/tiananmen_tf_good.txt \
  --bad-file dir-steering/examples/teacher-forced/tiananmen_tf_bad.txt \
  --out dir-steering/out/tiananmen_tf_3.json \
  --component ffn_out \
  --ctx 1024 \
  --avg-last-k 3 \
  --pair-normalize
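
As a rough illustration of what a direction builder of this kind might compute, here is a minimal sketch. It assumes that `--avg-last-k` averages the activations of the last k tokens of each sequence and that `--pair-normalize` normalizes each good/bad difference before averaging across pairs; both readings are guesses from the flag names, and the sign convention (good minus bad) is also an assumption. The function names are hypothetical and the real `build_direction_teacher_forced.py` may work differently.

```python
import math

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def normalize(v):
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def build_direction(good_acts, bad_acts, avg_last_k=3, pair_normalize=True):
    """Hypothetical direction builder.

    good_acts / bad_acts hold one entry per prompt pair; each entry is a
    list of per-token activation vectors (for example the ffn_out of one
    layer), captured while the teacher-forced prefix is being processed.
    """
    deltas = []
    for good, bad in zip(good_acts, bad_acts):
        g = mean_vectors(good[-avg_last_k:])   # assumed meaning of --avg-last-k
        b = mean_vectors(bad[-avg_last_k:])
        delta = [x - y for x, y in zip(g, b)]  # assumed sign: good minus bad
        deltas.append(normalize(delta) if pair_normalize else delta)
    return normalize(mean_vectors(deltas))

# Toy example with one pair and 2-dimensional activations.
direction = build_direction(
    good_acts=[[[1.0, 0.0], [3.0, 0.0]]],
    bad_acts=[[[0.0, 0.0], [0.0, 0.0]]],
    avg_last_k=2,
)
print(direction)
```

Normalizing each pair before averaging keeps a single pair with unusually large activations from dominating the final direction, which is one plausible reason for such a flag to exist.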

Example inference command:

./ds4 --seed 2026 --temp 0.1 \
  --dir-steering-file dir-steering/out/tiananmen_tf_3.f32 \
  --dir-steering-ffn -3 \
  --dir-steering-ffn-decay-tokens 30 \
  --dir-steering-ffn-decay-final -0.1 \
  --nothink \
  -p "Important events in China in 1989"

Test

First, the standard vector: it gives no useful information at scale -3 or at other values.

image

With the new teacher-forced vector, scale -2 shows the same problem (a guardrail refusal); scale -3 is better but still has a problem.

image

With steering applied only to the first tokens, I get all the information. The model does have this information.

image
./ds4 --seed 2026 --temp 0.1 --nothink -p "Important events in China in 1989"

./ds4 --seed 2026 --temp 0.1 --nothink --dir-steering-file dir-steering/out/tiananmen.f32 --dir-steering-ffn -3 -p "Important events in China in 1989"

./ds4 --seed 2026 --temp 0.1 --dir-steering-file dir-steering/out/tiananmen_tf_3.f32 --dir-steering-ffn -2 --nothink -p "Important events in China in 1989"

./ds4 --seed 2026 --temp 0.1 --dir-steering-file dir-steering/out/tiananmen_tf_3.f32 --dir-steering-ffn -3 --nothink -p "Important events in China in 1989"

./ds4 --seed 2026 --temp 0.1 --dir-steering-file dir-steering/out/tiananmen_tf_3.f32 --dir-steering-ffn -3 --dir-steering-ffn-decay-tokens 30 --dir-steering-ffn-decay-final -0.1 --nothink -p "Important events in China in 1989"
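
To rerun the five configurations above side by side, a small helper can assemble the command lines. This is only a convenience sketch: `ds4_cmd` and `RUNS` are hypothetical names, and the flags and values are copied from the commands in this comment. It prints the commands instead of executing them.

```python
import shlex

PROMPT = "Important events in China in 1989"

def ds4_cmd(vector=None, ffn=None, decay_tokens=None, decay_final=None):
    """Build the argv list for one test run (hypothetical helper)."""
    cmd = ["./ds4", "--seed", "2026", "--temp", "0.1", "--nothink"]
    if vector is not None:
        cmd += ["--dir-steering-file", vector, "--dir-steering-ffn", str(ffn)]
    if decay_tokens is not None:
        cmd += ["--dir-steering-ffn-decay-tokens", str(decay_tokens),
                "--dir-steering-ffn-decay-final", str(decay_final)]
    cmd += ["-p", PROMPT]
    return cmd

STANDARD = "dir-steering/out/tiananmen.f32"
TEACHER_FORCED = "dir-steering/out/tiananmen_tf_3.f32"

RUNS = [
    ("baseline, no steering", ds4_cmd()),
    ("standard vector, scale -3", ds4_cmd(STANDARD, -3)),
    ("teacher-forced, scale -2", ds4_cmd(TEACHER_FORCED, -2)),
    ("teacher-forced, scale -3", ds4_cmd(TEACHER_FORCED, -3)),
    ("teacher-forced, scale -3 with decay", ds4_cmd(TEACHER_FORCED, -3, 30, -0.1)),
]

for name, cmd in RUNS:
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(f"# {name}\n{shlex.join(cmd)}")
```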

Author

Chida82 commented May 29, 2026

I ran several tests on the decay curve, switching from linear to quadratic and others. The best one appears to be the cubic smoothstep (3t² − 2t³).
I've seen better results in tests on security defense pipelines and, as you can also see in the shared case, the result has improved: before, some dates were wrong; now they are mostly correct.
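
For reference, a minimal sketch of a smoothstep decay schedule. The smoothstep polynomial itself is the standard 3t² − 2t³; how ds4 maps the token index to t and blends between `--dir-steering-ffn` and `--dir-steering-ffn-decay-final` is an assumption here, and `steering_scale` is a hypothetical name, not the function used in the C code.

```python
def smoothstep(t):
    """Cubic smoothstep 3t^2 - 2t^3, with t clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def steering_scale(token_index, start, final, decay_tokens):
    """Assumed schedule: blend from `start` to `final` over `decay_tokens`."""
    if decay_tokens <= 0:
        return start  # no decay configured: constant scale
    t = token_index / decay_tokens
    return start + (final - start) * smoothstep(t)

# Values matching the inference example: -3 decaying to -0.1 over 30 tokens.
for i in (0, 15, 30, 60):
    print(i, round(steering_scale(i, -3.0, -0.1, 30), 4))
```

Compared with a linear ramp, smoothstep has zero slope at both ends, so under this assumed schedule the scale stays close to its initial value for the first few tokens and settles gently at the final value.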

image

@antirez I'd love to contribute to this aspect of the project, and I hope you appreciate this PR. Let me know what I should change.
On a separate track, but one that also depends on this, I was developing the ability for the server, via headers (and possibly for the other tools too, with a command), to choose a pre-configured steering model already in memory for that iteration. In the case above, you could choose the less verbose one, or the uncensored information, and then remove the steering. I don't know if you're interested; if so, write here, or you'll find my email in the commit.

Chida82 added 3 commits May 30, 2026 22:55
…onse handling

- Updated `ds4.h` to include new parameters for directional steering:
  `directional_steering_ffn_decay_tokens` and `directional_steering_ffn_decay_final`.
- Enhanced `ds4_agent.c`, `ds4_cli.c`, and `ds4_server.c` to parse and handle new command-line options for the added parameters.
- Introduced new teacher-forced examples for both good and bad response pairs related to Tiananmen Square.
- Created a new script `build_direction_teacher_forced.py` to build directional steering vectors based on teacher-forced responses.
Author

Chida82 commented May 30, 2026

I investigated the thinking phase in more detail.

I first tried adding:

<think> reasoning </think> response

but the results were not satisfactory.

Instead, when generating a vector containing only:

<think> reasoning </think>

the situation improved slightly.

The real improvement came from adding a dedicated steering mechanism for:

<think> reasoning </think>

while keeping the standard steering for the response.

This requires adding four parameters:

--dir-steering-think-file
--dir-steering-think-ffn
--dir-steering-think-ffn-decay-tokens
--dir-steering-think-ffn-decay-final
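
To make the idea concrete, here is a small sketch of selecting a steering configuration per phase. Everything in it is hypothetical: the `Steering` dataclass, the `select_steering` function, the string-level detection of `<think>` and `</think>`, and the choice to restart the decay position at each phase boundary are assumptions for illustration, not the code in the branch.

```python
from dataclasses import dataclass

@dataclass
class Steering:
    file: str            # vector file, e.g. from --dir-steering-think-file
    ffn: float           # initial scale
    decay_tokens: int = 0
    decay_final: float = 0.0

def select_steering(tokens, think_cfg, response_cfg):
    """Return (token, vector file, position in phase) for each token.

    Tokens from <think> through </think> use think_cfg; all other tokens
    use response_cfg. The position restarts at every phase change so that
    each phase could run its own decay schedule (an assumption).
    """
    in_think, pos, out = False, 0, []
    for tok in tokens:
        if tok == "<think>":
            in_think, pos = True, 0
        cfg = think_cfg if in_think else response_cfg
        out.append((tok, cfg.file, pos))
        pos += 1
        if tok == "</think>":
            in_think, pos = False, 0
    return out

think = Steering("think.f32", -3.0, 30, -0.1)
response = Steering("resp.f32", -3.0, 30, -0.1)
plan = select_steering(["<think>", "a", "</think>", "b", "c"], think, response)
for row in plan:
    print(row)
```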

For now, I’m not adding this yet; I think it’s better to integrate it step by step.

In any case, here is an example that demonstrates how it works with think.

image

Owner

antirez commented May 30, 2026

Thank you @Chida82, I'm super interested. I just haven't found the time yet, but this is a PR I'll check with extra care. Thanks.

Author

Chida82 commented May 31, 2026

For now, the “think” part is in another branch, so we don’t put too much on the table at once.

The most significant part related to this is in this commit:
Chida82@b7b05b8
