Cleaning up the init pipeline#1299
Conversation
Benchmark Report
Computer InformationBenchmark Results |
penelopeysm
left a comment
There was a problem hiding this comment.
Thanks!
In general I don't expect that there should be any performance difference between the conditionals vs dispatch -- for conditionals like if v isa V the Julia compiler will be able to statically infer which branch will be taken. Looking at the CI benchmarks, there's no noticeable difference (they're noisy and ±20% is normal), but it would be good to confirm these with proper benchmarks.
If my expectations are borne out, then this boils down to a stylistic choice. Personally, I prefer the conditionals (and wrote it that way) because it keeps the relevant logic all in one place, rather than having to jump between functions in different parts of the file. The extra indirection won't cause any issues in terms of Julia performance, but it means that a reader has to expand their context stack by two layers (init_transform + set_internal_transform).
| # In this case, we can't trust the transform stored in x because the _value_ | ||
| # in x may have been changed via unflatten!! without the transform being | ||
| # updated. Therefore, we always recompute the transform here. | ||
| VectorValue(x.val, from_vec_transform(dist)) |
There was a problem hiding this comment.
Could we retain this comment somewhere, since it is warning against potentially incorrect behaviour?
There was a problem hiding this comment.
Of course. Since I implemented a new outer constructor for VectorValue, I thought this might be redundant. After a second thought, I was clearly wrong lol.
| end | ||
| function LinkedVectorValue(val::AbstractVector, dist::Distribution) | ||
| return LinkedVectorValue(val, from_linked_vec_transform(dist)) | ||
| end |
There was a problem hiding this comment.
Regardless of the implementation of the DynamicPPL.init methods, which I'm open to discussing, I would like to request that we not add extra methods for this. I recognise this is a common pattern in Julia, but this is one of my most strongly held opinions and I have consistently argued against this sort of convenience methods (e.g. #780 (comment)). The reason is because these extra methods pretty much exist just to save characters, and I don't think that the saved characters outweigh the negatives:
-
An extra level of indirection makes it harder to keep everything in one's brain. Maybe in the future when coding is all done with LLMs this won't matter, but right now I still read the code.
-
Defining more methods for
LinkedVectorValuemakes it harder for tooling, e.g. LSP, to determine where the definition of a method is. LanguageServer.jl right now at least cannot differentiate between methods based on types, so when you use goto definition it just gives you a list of methods, and you have to figure out yourself which one is being called. It's basically a glorified grep, and it's easier to grep for something that exists once rather than twice.
There was a problem hiding this comment.
1. An extra level of indirection makes it harder to keep everything in one's brain.
While this is valid, I think there's some value in identifying patterns within the AbstractTransformedValue stack. It forces the user to best understand the transformations performed. For example, and I may be woefully unaware here, but I don't see a time where we would call VectorValue(val, from_linked_vec_transform(dist)). More than just Julian style programming this deters the (albeit advanced) user from doing something they perhaps shouldn't.
2. Defining more methods for `LinkedVectorValue` makes it harder for tooling, e.g. LSP, to determine where the definition of a method is
That makes sense, I figured there was likely a reason for this.
There was a problem hiding this comment.
Sure, I do see the point. At first I was going to say that I'd be happier if we made it an inner constructor rather than an outer constructor (the latter is merely a suggestion, the former actually statically prevents a user from making a wrong thing); then I realised that this bit of horrendous code precludes that.
Lines 329 to 341 in 0c1a629
Anyway, happy to go with the outer constructor. I'm mildly hopeful that in the future the transform field will just be completely removed too, which would eliminate the possibility of writing something incorrect (#1249 (comment)).
| # Need to return an unlinked value. We _could_ generate a VectorValue here, with the | ||
| # vectorisation transform. However, sometimes that's not needed (e.g. when | ||
| # evaluating with an OnlyAccsVarInfo). So we just return an UntransformedValue. If a | ||
| # downstream function requires a VectorValue, it's on them to generate it. | ||
| (raw_value, UntransformedValue(raw_value), zero(LogProbType)) | ||
| else | ||
| error("unknown target transform $target") | ||
| end | ||
| raw_value = get_raw_value(tv, dist) | ||
| return apply_transform(raw_value, target, tv, dist) | ||
| end | ||
|
|
||
| function apply_transform_strategy( |
There was a problem hiding this comment.
Same with these comments -- I'd like to retain them there because I'll have forgotten why I wrote these things 3 months down the road.
There was a problem hiding this comment.
One option instead of the comments could be to put a docstring on the internal apply_transform function.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1299 +/- ##
==========================================
+ Coverage 79.03% 79.13% +0.10%
==========================================
Files 48 48
Lines 3596 3585 -11
==========================================
- Hits 2842 2837 -5
+ Misses 754 748 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@penelopeysm I have no problem throwing this one away in favor of conditionals. I just think some of the transformations encourage some potentially undesirable behavior like I mentioned here. This is actually a mistake I made while writing my own I would be more than happy to close if this is too much of an issue. |
guess who forgot to run the formatter again... Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
DynamicPPL.jl documentation for PR #1299 is available at: |
| # This block here is why we store transform_strategy inside the InitFromVector, as it | ||
| # allows for type stability. |
There was a problem hiding this comment.
Removing the conditional block means that this comment is dangling.
|
Sorry to be obstinate, but I still prefer conditionals! Happy to get a third opinion though-- @sunxd3 do you have a preference between the old function f(v)
return if v isa V1
result1
elseif v isa V2
result2
end
endvs get_result(v::V1) = result1
get_result(v::V2) = result2
[...]
function f(v)
return get_result(v)
endin this PR? |
|
I like conditionals better (i.e. the former) |
the tribe has spoken |
Following some comments I had in #1298, I decided to clean up the internal interface to make creating new strategies quite a bit easier.
Besides running unit tests, I haven't benchmarked anything. Although, since I rely on dispatch as opposed to conditional evaluation, this should theoretically be preferred.
Feel free to drop comments and lmk what you think.