ZJIT: Use shape id as cache key for object layout#17187
Conversation
64f650e to
169638a
Compare
Since ruby#17158, we can use the shape id as our cache key for determining object layout. This patch changes ZJIT to use shape id instead of testing all bits. Given this program: ```ruby class Foo def initialize; @bar = 123; end def read; @bar; end def add; @baz = 123; end end foo = Foo.new 3.times { foo = Foo.new foo.read foo.add foo.read } ``` We end up with a polymorphic read in the `read` method. On master, the HIR looks like this: ``` Optimized HIR: fn read@../test.rb:9: bb1(): EntryPoint interpreter v1:HeapBasicObject = LoadSelf Jump bb3(v1) bb2(): EntryPoint JIT(0) v4:HeapBasicObject = LoadArg :self@0 Jump bb3(v4) bb3(v6:HeapBasicObject): PatchPoint SingleRactorMode v12:CUInt64 = LoadField v6, :RBASIC_FLAGS@0x0 v14:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f) v15:CPtr[CPtr(0x8000800000001)] = Const CPtr(0x8000800000001) v16 = RefineType v15, CUInt64 v17:CInt64 = IntAnd v12, v14 v18:CBool = IsBitEqual v17, v16 CondBranch v18, bb5(), bb6() bb5(): v20:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v20) bb6(): v22:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f) v23:CPtr[CPtr(0x8000900000001)] = Const CPtr(0x8000900000001) v24 = RefineType v23, CUInt64 v25:CInt64 = IntAnd v12, v22 v26:CBool = IsBitEqual v25, v24 CondBranch v26, bb7(), bb8() bb7(): v28:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v28) bb8(): v30:BasicObject = GetIvar v6, :@bar Jump bb4(v30) bb4(v13:BasicObject): CheckInterrupts Return v13 ``` On this branch, the HIR is like this: ``` Optimized HIR: fn read@../test.rb:9: bb1(): EntryPoint interpreter v1:HeapBasicObject = LoadSelf Jump bb3(v1) bb2(): EntryPoint JIT(0) v4:HeapBasicObject = LoadArg :self@0 Jump bb3(v4) bb3(v6:HeapBasicObject): PatchPoint SingleRactorMode v13:CShape = LoadField v6, :shape_id@0x4 v14:CShape[0x80008] = Const CShape(0x80008) v15:CBool = IsBitEqual v14, v13 CondBranch v15, bb5(), bb6() bb5(): v17:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v17) bb6(): v19:CShape = LoadField v6, :shape_id@0x4 v20:CShape[0x80009] = Const CShape(0x80009) v21:CBool = IsBitEqual v20, v19 CondBranch v21, bb7(), bb8() bb7(): v23:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v23) bb8(): v25:BasicObject = GetIvar v6, :@bar Jump bb4(v25) bb4(v12:BasicObject): CheckInterrupts Return v12 ``` We're able to avoid loading all of the flags, applying a mask, and the testing. The machine code for bb3 looks like this on master (I've removed the nop buffers for patch points): ``` # Insn: v12 LoadField v6, :RBASIC_FLAGS@0x0 # Load field id=RBASIC_FLAGS offset=0 0x122c50124: ldur x1, [x0] # Insn: v14 Const CUInt64(0xffffffff0000001f) # Insn: v15 Const CPtr(0x8000800000001) # Insn: v16 RefineType v15, CUInt64 # Insn: v17 IntAnd v12, v14 0x122c50128: and x2, x1, #0xffffffff0000001f # Insn: v18 IsBitEqual v17, v16 0x122c5012c: mov x3, #1 0x122c50130: movk x3, #0, lsl ruby#16 0x122c50134: movk x3, ruby#8, lsl ruby#32 0x122c50138: movk x3, ruby#8, lsl ruby#48 0x122c5013c: cmp x2, x3 0x122c50140: mov x2, #1 0x122c50144: mov x3, #0 0x122c50148: csel x2, x2, x3, eq 0x122c5014c: tst x2, x2 0x122c50150: b.ne #0x122c501e8 ``` On this branch it looks like this: ``` # Insn: v13 LoadField v6, :shape_id@0x4 # Load field id=shape_id offset=4 0x124dd0124: ldur w1, [x0, #4] # Insn: v14 Const CShape(0x80008) # Insn: v15 IsBitEqual v14, v13 0x124dd0128: mov x2, ruby#8 0x124dd012c: movk x2, ruby#8, lsl ruby#16 0x124dd0130: cmp x2, x1 0x124dd0134: mov x1, #1 0x124dd0138: mov x2, #0 0x124dd013c: csel x1, x1, x2, eq 0x124dd0140: tst x1, x1 0x124dd0144: b.ne #0x124dd01d4 ``` We've eliminated the `and` instruction and only need to do a 32 bit load for the shape.
Head branch was pushed to by a user without write access
nirvdrum
left a comment
There was a problem hiding this comment.
Nice cleanup. This will make looking at the HIR a little easier. I had a couple of general comments. Please feel free to resolve or ignore as you see fit.
I need to better familiarize myself with the shape code to be sure the PR output is semantically equivalent. But, I'm also fairly confident the test suite would yell at you if it weren't.
|
|
||
| match layout { | ||
| ShapeLayout::RClass | ShapeLayout::RData => { | ||
| let offset = if layout == ShapeLayout::RClass { |
There was a problem hiding this comment.
It's not a major issue, but it looks a little odd to me to have a multivariant match and the immediately discriminate between the two of them. I'm wondering if it'd make sense to capture the field loading in a helper function and then update the match layout to match each variant separately.
There was a problem hiding this comment.
Ya I agree. I couldn't think of a better way to do it. TBH I would like to get it so that TDATA offset and RCLASS offset are the same and then we just eliminate this branch.
I'm wondering if it'd make sense to capture the field loading in a helper function and then update the
match layoutto match each variant separately.
We could. I'm not sure if it'd be any cleaner though 🤷
The RObject layout code below is almost identical to this branch. If we could get RClass to have the same offset as TDATA, then fix the flags function, it feels like we could collapse most of this. It's the plan I have anyway (though it's kind of a low priority plan).
| let expected_shape = profiled_type.shape(); | ||
| let (expected_rbasic_flags, rbasic_flags_mask) = profiled_type.rbasic_flags_and_mask(); | ||
| assert!(expected_shape.is_valid()); | ||
| let profiled_shape = profiled_type.shape(); |
There was a problem hiding this comment.
I realize you just replaced existing code, but I could not figure out how these two lines related to the comment about falling through to DefinedIvar. Then I realize it was just an unfortunate grouping. An extra newline here would help code clarity, IMHO.
| let rbasic_flags_mask = fun.push_insn(block, Insn::Const { val: Const::CUInt64(rbasic_flags_mask) }); | ||
| if profiled_shape.is_complex() { continue; } | ||
| if seen_shape.contains(&profiled_shape) { continue; } | ||
| seen_shape.push(profiled_shape); |
There was a problem hiding this comment.
Same deal here as my above comment. Physically separating the early-exit and happy path with a newline would help, IMHO. Although, by this point I was primed for it. And, also, not really anything to do with your PR. I'm just flagging it because it made reviewing a little harder.
| let expected_rbasic_flags = fun.push_insn(block, Insn::RefineType { val: expected_rbasic_flags, new_type: types::CUInt64 }); | ||
| let masked = fun.push_insn(block, Insn::IntAnd { left: rbasic_flags, right: rbasic_flags_mask}); | ||
| let has_shape_and_type = fun.push_insn(block, Insn::IsBitEqual { left: masked, right: expected_rbasic_flags }); | ||
| if profiled_shape.is_complex() { continue; } |
There was a problem hiding this comment.
General note: it'd be nice if we were able to deduplicate between definedivar and getinstancevariable. I realize the code was already duplicated, but it was a bit jarring re-reviewing essentially the same code.
There was a problem hiding this comment.
Ya makes sense. I'll try to follow up with a refactoring PR. I want to refactor polymorphic IV loads anyway, so it might be a good time to do it.
Since #17158, we can use the shape id as our cache key for determining object layout. This patch changes ZJIT to use shape id instead of testing all bits.
Given this program:
We end up with a polymorphic read in the
readmethod. On master, the HIR looks like this:On this branch, the HIR is like this:
We're able to avoid loading all of the flags, applying a mask, and the testing.
The machine code for bb3 looks like this on master (I've removed the nop buffers for patch points):
On this branch it looks like this:
We've eliminated the
andinstruction and only need to do a 32 bit load for the shape.