ZJIT: Use shape id as cache key for object layout by tenderlove · Pull Request #17187 · ruby/ruby

tenderlove · 2026-06-02T23:35:24Z

Since #17158, we can use the shape id as our cache key for determining object layout. This patch changes ZJIT to use shape id instead of testing all bits.

Given this program:

class Foo
  def initialize; @bar = 123; end
  def read; @bar; end
  def add; @baz = 123; end
end

foo = Foo.new
3.times {
  foo = Foo.new
  foo.read
  foo.add
  foo.read
}

We end up with a polymorphic read in the read method. On master, the HIR looks like this:

Optimized HIR:
fn read@../test.rb:9:
bb1():
  EntryPoint interpreter
  v1:HeapBasicObject = LoadSelf
  Jump bb3(v1)
bb2():
  EntryPoint JIT(0)
  v4:HeapBasicObject = LoadArg :self@0
  Jump bb3(v4)
bb3(v6:HeapBasicObject):
  PatchPoint SingleRactorMode
  v12:CUInt64 = LoadField v6, :RBASIC_FLAGS@0x0
  v14:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f)
  v15:CPtr[CPtr(0x8000800000001)] = Const CPtr(0x8000800000001)
  v16 = RefineType v15, CUInt64
  v17:CInt64 = IntAnd v12, v14
  v18:CBool = IsBitEqual v17, v16
  CondBranch v18, bb5(), bb6()
bb5():
  v20:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v20)
bb6():
  v22:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f)
  v23:CPtr[CPtr(0x8000900000001)] = Const CPtr(0x8000900000001)
  v24 = RefineType v23, CUInt64
  v25:CInt64 = IntAnd v12, v22
  v26:CBool = IsBitEqual v25, v24
  CondBranch v26, bb7(), bb8()
bb7():
  v28:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v28)
bb8():
  v30:BasicObject = GetIvar v6, :@bar
  Jump bb4(v30)
bb4(v13:BasicObject):
  CheckInterrupts
  Return v13

On this branch, the HIR is like this:

Optimized HIR:
fn read@../test.rb:9:
bb1():
  EntryPoint interpreter
  v1:HeapBasicObject = LoadSelf
  Jump bb3(v1)
bb2():
  EntryPoint JIT(0)
  v4:HeapBasicObject = LoadArg :self@0
  Jump bb3(v4)
bb3(v6:HeapBasicObject):
  PatchPoint SingleRactorMode
  v13:CShape = LoadField v6, :shape_id@0x4
  v14:CShape[0x80008] = Const CShape(0x80008)
  v15:CBool = IsBitEqual v14, v13
  CondBranch v15, bb5(), bb6()
bb5():
  v17:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v17)
bb6():
  v19:CShape = LoadField v6, :shape_id@0x4
  v20:CShape[0x80009] = Const CShape(0x80009)
  v21:CBool = IsBitEqual v20, v19
  CondBranch v21, bb7(), bb8()
bb7():
  v23:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v23)
bb8():
  v25:BasicObject = GetIvar v6, :@bar
  Jump bb4(v25)
bb4(v12:BasicObject):
  CheckInterrupts
  Return v12

We're able to avoid loading all of the flags, applying a mask, and the testing.

The machine code for bb3 looks like this on master (I've removed the nop buffers for patch points):

  # Insn: v12 LoadField v6, :RBASIC_FLAGS@0x0
  # Load field id=RBASIC_FLAGS offset=0
  0x122c50124: ldur x1, [x0]
  # Insn: v14 Const CUInt64(0xffffffff0000001f)
  # Insn: v15 Const CPtr(0x8000800000001)
  # Insn: v16 RefineType v15, CUInt64
  # Insn: v17 IntAnd v12, v14
  0x122c50128: and x2, x1, #0xffffffff0000001f
  # Insn: v18 IsBitEqual v17, v16
  0x122c5012c: mov x3, #1
  0x122c50130: movk x3, #0, lsl #16
  0x122c50134: movk x3, #8, lsl #32
  0x122c50138: movk x3, #8, lsl #48
  0x122c5013c: cmp x2, x3
  0x122c50140: mov x2, #1
  0x122c50144: mov x3, #0
  0x122c50148: csel x2, x2, x3, eq
  0x122c5014c: tst x2, x2
  0x122c50150: b.ne #0x122c501e8

On this branch it looks like this:

  # Insn: v13 LoadField v6, :shape_id@0x4
  # Load field id=shape_id offset=4
  0x124dd0124: ldur w1, [x0, #4]
  # Insn: v14 Const CShape(0x80008)
  # Insn: v15 IsBitEqual v14, v13
  0x124dd0128: mov x2, #8
  0x124dd012c: movk x2, #8, lsl #16
  0x124dd0130: cmp x2, x1
  0x124dd0134: mov x1, #1
  0x124dd0138: mov x2, #0
  0x124dd013c: csel x1, x1, x2, eq
  0x124dd0140: tst x1, x1
  0x124dd0144: b.ne #0x124dd01d4

We've eliminated the and instruction and only need to do a 32 bit load for the shape.

tekknolagi

lovely

@bar

Since ruby#17158, we can use the shape id as our cache key for determining object layout. This patch changes ZJIT to use shape id instead of testing all bits. Given this program: ```ruby class Foo def initialize; @bar = 123; end def read; @bar; end def add; @baz = 123; end end foo = Foo.new 3.times { foo = Foo.new foo.read foo.add foo.read } ``` We end up with a polymorphic read in the `read` method. On master, the HIR looks like this: ``` Optimized HIR: fn read@../test.rb:9: bb1(): EntryPoint interpreter v1:HeapBasicObject = LoadSelf Jump bb3(v1) bb2(): EntryPoint JIT(0) v4:HeapBasicObject = LoadArg :self@0 Jump bb3(v4) bb3(v6:HeapBasicObject): PatchPoint SingleRactorMode v12:CUInt64 = LoadField v6, :RBASIC_FLAGS@0x0 v14:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f) v15:CPtr[CPtr(0x8000800000001)] = Const CPtr(0x8000800000001) v16 = RefineType v15, CUInt64 v17:CInt64 = IntAnd v12, v14 v18:CBool = IsBitEqual v17, v16 CondBranch v18, bb5(), bb6() bb5(): v20:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v20) bb6(): v22:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f) v23:CPtr[CPtr(0x8000900000001)] = Const CPtr(0x8000900000001) v24 = RefineType v23, CUInt64 v25:CInt64 = IntAnd v12, v22 v26:CBool = IsBitEqual v25, v24 CondBranch v26, bb7(), bb8() bb7(): v28:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v28) bb8(): v30:BasicObject = GetIvar v6, :@bar Jump bb4(v30) bb4(v13:BasicObject): CheckInterrupts Return v13 ``` On this branch, the HIR is like this: ``` Optimized HIR: fn read@../test.rb:9: bb1(): EntryPoint interpreter v1:HeapBasicObject = LoadSelf Jump bb3(v1) bb2(): EntryPoint JIT(0) v4:HeapBasicObject = LoadArg :self@0 Jump bb3(v4) bb3(v6:HeapBasicObject): PatchPoint SingleRactorMode v13:CShape = LoadField v6, :shape_id@0x4 v14:CShape[0x80008] = Const CShape(0x80008) v15:CBool = IsBitEqual v14, v13 CondBranch v15, bb5(), bb6() bb5(): v17:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v17) bb6(): v19:CShape = LoadField v6, :shape_id@0x4 v20:CShape[0x80009] = Const CShape(0x80009) v21:CBool = IsBitEqual v20, v19 CondBranch v21, bb7(), bb8() bb7(): v23:BasicObject = LoadField v6, :@bar@0x10 Jump bb4(v23) bb8(): v25:BasicObject = GetIvar v6, :@bar Jump bb4(v25) bb4(v12:BasicObject): CheckInterrupts Return v12 ``` We're able to avoid loading all of the flags, applying a mask, and the testing. The machine code for bb3 looks like this on master (I've removed the nop buffers for patch points): ``` # Insn: v12 LoadField v6, :RBASIC_FLAGS@0x0 # Load field id=RBASIC_FLAGS offset=0 0x122c50124: ldur x1, [x0] # Insn: v14 Const CUInt64(0xffffffff0000001f) # Insn: v15 Const CPtr(0x8000800000001) # Insn: v16 RefineType v15, CUInt64 # Insn: v17 IntAnd v12, v14 0x122c50128: and x2, x1, #0xffffffff0000001f # Insn: v18 IsBitEqual v17, v16 0x122c5012c: mov x3, #1 0x122c50130: movk x3, #0, lsl ruby#16 0x122c50134: movk x3, ruby#8, lsl ruby#32 0x122c50138: movk x3, ruby#8, lsl ruby#48 0x122c5013c: cmp x2, x3 0x122c50140: mov x2, #1 0x122c50144: mov x3, #0 0x122c50148: csel x2, x2, x3, eq 0x122c5014c: tst x2, x2 0x122c50150: b.ne #0x122c501e8 ``` On this branch it looks like this: ``` # Insn: v13 LoadField v6, :shape_id@0x4 # Load field id=shape_id offset=4 0x124dd0124: ldur w1, [x0, #4] # Insn: v14 Const CShape(0x80008) # Insn: v15 IsBitEqual v14, v13 0x124dd0128: mov x2, ruby#8 0x124dd012c: movk x2, ruby#8, lsl ruby#16 0x124dd0130: cmp x2, x1 0x124dd0134: mov x1, #1 0x124dd0138: mov x2, #0 0x124dd013c: csel x1, x1, x2, eq 0x124dd0140: tst x1, x1 0x124dd0144: b.ne #0x124dd01d4 ``` We've eliminated the `and` instruction and only need to do a 32 bit load for the shape.

nirvdrum

Nice cleanup. This will make looking at the HIR a little easier. I had a couple of general comments. Please feel free to resolve or ignore as you see fit.

I need to better familiarize myself with the shape code to be sure the PR output is semantically equivalent. But, I'm also fairly confident the test suite would yell at you if it weren't.

nirvdrum · 2026-06-09T21:13:26Z

+
+        match layout {
+            ShapeLayout::RClass | ShapeLayout::RData => {
+                let offset = if layout == ShapeLayout::RClass {


It's not a major issue, but it looks a little odd to me to have a multivariant match and the immediately discriminate between the two of them. I'm wondering if it'd make sense to capture the field loading in a helper function and then update the match layout to match each variant separately.

Ya I agree. I couldn't think of a better way to do it. TBH I would like to get it so that TDATA offset and RCLASS offset are the same and then we just eliminate this branch.

I'm wondering if it'd make sense to capture the field loading in a helper function and then update the match layout to match each variant separately.

We could. I'm not sure if it'd be any cleaner though 🤷

The RObject layout code below is almost identical to this branch. If we could get RClass to have the same offset as TDATA, then fix the flags function, it feels like we could collapse most of this. It's the plan I have anyway (though it's kind of a low priority plan).

nirvdrum · 2026-06-09T21:17:27Z

-                            let expected_shape = profiled_type.shape();
-                            let (expected_rbasic_flags, rbasic_flags_mask) = profiled_type.rbasic_flags_and_mask();
-                            assert!(expected_shape.is_valid());
+                            let profiled_shape = profiled_type.shape();


I realize you just replaced existing code, but I could not figure out how these two lines related to the comment about falling through to DefinedIvar. Then I realize it was just an unfortunate grouping. An extra newline here would help code clarity, IMHO.

nirvdrum · 2026-06-09T21:20:03Z

-                            let rbasic_flags_mask = fun.push_insn(block, Insn::Const { val: Const::CUInt64(rbasic_flags_mask) });
+                            if profiled_shape.is_complex() { continue; }
+                            if seen_shape.contains(&profiled_shape) { continue; }
+                            seen_shape.push(profiled_shape);


Same deal here as my above comment. Physically separating the early-exit and happy path with a newline would help, IMHO. Although, by this point I was primed for it. And, also, not really anything to do with your PR. I'm just flagging it because it made reviewing a little harder.

nirvdrum · 2026-06-09T21:27:35Z

-                            let expected_rbasic_flags = fun.push_insn(block, Insn::RefineType { val: expected_rbasic_flags, new_type: types::CUInt64 });
-                            let masked = fun.push_insn(block, Insn::IntAnd { left: rbasic_flags, right: rbasic_flags_mask});
-                            let has_shape_and_type = fun.push_insn(block, Insn::IsBitEqual { left: masked, right: expected_rbasic_flags });
+                            if profiled_shape.is_complex() { continue; }


General note: it'd be nice if we were able to deduplicate between definedivar and getinstancevariable. I realize the code was already duplicated, but it was a bit jarring re-reviewing essentially the same code.

Ya makes sense. I'll try to follow up with a refactoring PR. I want to refactor polymorphic IV loads anyway, so it might be a good time to do it.

tenderworks force-pushed the ivar-shape branch 4 times, most recently from 64f650e to 169638a Compare June 8, 2026 19:35

tenderlove marked this pull request as ready for review June 8, 2026 23:23

matzbot requested a review from a team June 8, 2026 23:23

tekknolagi reviewed Jun 9, 2026

View reviewed changes

Comment thread zjit/src/hir.rs Outdated

tekknolagi reviewed Jun 9, 2026

View reviewed changes

Comment thread zjit/src/hir.rs Outdated

tekknolagi reviewed Jun 9, 2026

View reviewed changes

Comment thread zjit/src/hir.rs Outdated

tekknolagi approved these changes Jun 9, 2026

View reviewed changes

tenderlove added 5 commits June 9, 2026 11:22

fix operand order

211f547

Remove comments and whitespace

b047357

remove stuttering

2999c93

fix variable name to be more clear

cb66bc4

tenderworks force-pushed the ivar-shape branch from 77bac57 to cb66bc4 Compare June 9, 2026 18:22

tekknolagi reviewed Jun 9, 2026

View reviewed changes

Comment thread zjit/src/hir.rs

Keep assert

c364fec

tenderlove enabled auto-merge (squash) June 9, 2026 18:56

update snapshots

0249a9d

auto-merge was automatically disabled June 9, 2026 19:32
Head branch was pushed to by a user without write access

nirvdrum approved these changes Jun 9, 2026

View reviewed changes

tenderlove merged commit 1c004aa into ruby:master Jun 9, 2026
120 of 122 checks passed

tenderlove deleted the ivar-shape branch June 9, 2026 22:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ZJIT: Use shape id as cache key for object layout#17187

ZJIT: Use shape id as cache key for object layout#17187
tenderlove merged 7 commits into
ruby:masterfrom
tenderlove:ivar-shape

tenderlove commented Jun 2, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tekknolagi left a comment

Uh oh!

Uh oh!

nirvdrum left a comment

Uh oh!

nirvdrum Jun 9, 2026

Uh oh!

tenderlove Jun 9, 2026

Uh oh!

nirvdrum Jun 9, 2026

Uh oh!

nirvdrum Jun 9, 2026

Uh oh!

nirvdrum Jun 9, 2026

Uh oh!

tenderlove Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

tenderlove commented Jun 2, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tekknolagi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nirvdrum left a comment

Choose a reason for hiding this comment

Uh oh!

nirvdrum Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

tenderlove Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

nirvdrum Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

nirvdrum Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

nirvdrum Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

tenderlove Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants