Skip to content

ZJIT: Use shape id as cache key for object layout#17187

Merged
tenderlove merged 7 commits into
ruby:masterfrom
tenderlove:ivar-shape
Jun 9, 2026
Merged

ZJIT: Use shape id as cache key for object layout#17187
tenderlove merged 7 commits into
ruby:masterfrom
tenderlove:ivar-shape

Conversation

@tenderlove

Copy link
Copy Markdown
Member

Since #17158, we can use the shape id as our cache key for determining object layout. This patch changes ZJIT to use shape id instead of testing all bits.

Given this program:

class Foo
  def initialize; @bar = 123; end
  def read; @bar; end
  def add; @baz = 123; end
end

foo = Foo.new
3.times {
  foo = Foo.new
  foo.read
  foo.add
  foo.read
}

We end up with a polymorphic read in the read method. On master, the HIR looks like this:

Optimized HIR:
fn read@../test.rb:9:
bb1():
  EntryPoint interpreter
  v1:HeapBasicObject = LoadSelf
  Jump bb3(v1)
bb2():
  EntryPoint JIT(0)
  v4:HeapBasicObject = LoadArg :self@0
  Jump bb3(v4)
bb3(v6:HeapBasicObject):
  PatchPoint SingleRactorMode
  v12:CUInt64 = LoadField v6, :RBASIC_FLAGS@0x0
  v14:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f)
  v15:CPtr[CPtr(0x8000800000001)] = Const CPtr(0x8000800000001)
  v16 = RefineType v15, CUInt64
  v17:CInt64 = IntAnd v12, v14
  v18:CBool = IsBitEqual v17, v16
  CondBranch v18, bb5(), bb6()
bb5():
  v20:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v20)
bb6():
  v22:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f)
  v23:CPtr[CPtr(0x8000900000001)] = Const CPtr(0x8000900000001)
  v24 = RefineType v23, CUInt64
  v25:CInt64 = IntAnd v12, v22
  v26:CBool = IsBitEqual v25, v24
  CondBranch v26, bb7(), bb8()
bb7():
  v28:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v28)
bb8():
  v30:BasicObject = GetIvar v6, :@bar
  Jump bb4(v30)
bb4(v13:BasicObject):
  CheckInterrupts
  Return v13

On this branch, the HIR is like this:

Optimized HIR:
fn read@../test.rb:9:
bb1():
  EntryPoint interpreter
  v1:HeapBasicObject = LoadSelf
  Jump bb3(v1)
bb2():
  EntryPoint JIT(0)
  v4:HeapBasicObject = LoadArg :self@0
  Jump bb3(v4)
bb3(v6:HeapBasicObject):
  PatchPoint SingleRactorMode
  v13:CShape = LoadField v6, :shape_id@0x4
  v14:CShape[0x80008] = Const CShape(0x80008)
  v15:CBool = IsBitEqual v14, v13
  CondBranch v15, bb5(), bb6()
bb5():
  v17:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v17)
bb6():
  v19:CShape = LoadField v6, :shape_id@0x4
  v20:CShape[0x80009] = Const CShape(0x80009)
  v21:CBool = IsBitEqual v20, v19
  CondBranch v21, bb7(), bb8()
bb7():
  v23:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v23)
bb8():
  v25:BasicObject = GetIvar v6, :@bar
  Jump bb4(v25)
bb4(v12:BasicObject):
  CheckInterrupts
  Return v12

We're able to avoid loading all of the flags, applying a mask, and the testing.

The machine code for bb3 looks like this on master (I've removed the nop buffers for patch points):

  # Insn: v12 LoadField v6, :RBASIC_FLAGS@0x0
  # Load field id=RBASIC_FLAGS offset=0
  0x122c50124: ldur x1, [x0]
  # Insn: v14 Const CUInt64(0xffffffff0000001f)
  # Insn: v15 Const CPtr(0x8000800000001)
  # Insn: v16 RefineType v15, CUInt64
  # Insn: v17 IntAnd v12, v14
  0x122c50128: and x2, x1, #0xffffffff0000001f
  # Insn: v18 IsBitEqual v17, v16
  0x122c5012c: mov x3, #1
  0x122c50130: movk x3, #0, lsl #16
  0x122c50134: movk x3, #8, lsl #32
  0x122c50138: movk x3, #8, lsl #48
  0x122c5013c: cmp x2, x3
  0x122c50140: mov x2, #1
  0x122c50144: mov x3, #0
  0x122c50148: csel x2, x2, x3, eq
  0x122c5014c: tst x2, x2
  0x122c50150: b.ne #0x122c501e8

On this branch it looks like this:

  # Insn: v13 LoadField v6, :shape_id@0x4
  # Load field id=shape_id offset=4
  0x124dd0124: ldur w1, [x0, #4]
  # Insn: v14 Const CShape(0x80008)
  # Insn: v15 IsBitEqual v14, v13
  0x124dd0128: mov x2, #8
  0x124dd012c: movk x2, #8, lsl #16
  0x124dd0130: cmp x2, x1
  0x124dd0134: mov x1, #1
  0x124dd0138: mov x2, #0
  0x124dd013c: csel x1, x1, x2, eq
  0x124dd0140: tst x1, x1
  0x124dd0144: b.ne #0x124dd01d4

We've eliminated the and instruction and only need to do a 32 bit load for the shape.

@tenderworks tenderworks force-pushed the ivar-shape branch 4 times, most recently from 64f650e to 169638a Compare June 8, 2026 19:35
@tenderlove tenderlove marked this pull request as ready for review June 8, 2026 23:23
@matzbot matzbot requested a review from a team June 8, 2026 23:23
Comment thread zjit/src/hir.rs Outdated
Comment thread zjit/src/hir.rs Outdated
Comment thread zjit/src/hir.rs Outdated

@tekknolagi tekknolagi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lovely

Since ruby#17158, we can use the shape id as our cache key for determining
object layout.  This patch changes ZJIT to use shape id instead of
testing all bits.

Given this program:

```ruby
class Foo
  def initialize; @bar = 123; end
  def read; @bar; end
  def add; @baz = 123; end
end

foo = Foo.new
3.times {
  foo = Foo.new
  foo.read
  foo.add
  foo.read
}
```

We end up with a polymorphic read in the `read` method.  On master, the
HIR looks like this:

```
Optimized HIR:
fn read@../test.rb:9:
bb1():
  EntryPoint interpreter
  v1:HeapBasicObject = LoadSelf
  Jump bb3(v1)
bb2():
  EntryPoint JIT(0)
  v4:HeapBasicObject = LoadArg :self@0
  Jump bb3(v4)
bb3(v6:HeapBasicObject):
  PatchPoint SingleRactorMode
  v12:CUInt64 = LoadField v6, :RBASIC_FLAGS@0x0
  v14:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f)
  v15:CPtr[CPtr(0x8000800000001)] = Const CPtr(0x8000800000001)
  v16 = RefineType v15, CUInt64
  v17:CInt64 = IntAnd v12, v14
  v18:CBool = IsBitEqual v17, v16
  CondBranch v18, bb5(), bb6()
bb5():
  v20:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v20)
bb6():
  v22:CUInt64[0xffffffff0000001f] = Const CUInt64(0xffffffff0000001f)
  v23:CPtr[CPtr(0x8000900000001)] = Const CPtr(0x8000900000001)
  v24 = RefineType v23, CUInt64
  v25:CInt64 = IntAnd v12, v22
  v26:CBool = IsBitEqual v25, v24
  CondBranch v26, bb7(), bb8()
bb7():
  v28:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v28)
bb8():
  v30:BasicObject = GetIvar v6, :@bar
  Jump bb4(v30)
bb4(v13:BasicObject):
  CheckInterrupts
  Return v13
```

On this branch, the HIR is like this:

```
Optimized HIR:
fn read@../test.rb:9:
bb1():
  EntryPoint interpreter
  v1:HeapBasicObject = LoadSelf
  Jump bb3(v1)
bb2():
  EntryPoint JIT(0)
  v4:HeapBasicObject = LoadArg :self@0
  Jump bb3(v4)
bb3(v6:HeapBasicObject):
  PatchPoint SingleRactorMode
  v13:CShape = LoadField v6, :shape_id@0x4
  v14:CShape[0x80008] = Const CShape(0x80008)
  v15:CBool = IsBitEqual v14, v13
  CondBranch v15, bb5(), bb6()
bb5():
  v17:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v17)
bb6():
  v19:CShape = LoadField v6, :shape_id@0x4
  v20:CShape[0x80009] = Const CShape(0x80009)
  v21:CBool = IsBitEqual v20, v19
  CondBranch v21, bb7(), bb8()
bb7():
  v23:BasicObject = LoadField v6, :@bar@0x10
  Jump bb4(v23)
bb8():
  v25:BasicObject = GetIvar v6, :@bar
  Jump bb4(v25)
bb4(v12:BasicObject):
  CheckInterrupts
  Return v12
```

We're able to avoid loading all of the flags, applying a mask, and the
testing.

The machine code for bb3 looks like this on master (I've removed the nop
buffers for patch points):

```
  # Insn: v12 LoadField v6, :RBASIC_FLAGS@0x0
  # Load field id=RBASIC_FLAGS offset=0
  0x122c50124: ldur x1, [x0]
  # Insn: v14 Const CUInt64(0xffffffff0000001f)
  # Insn: v15 Const CPtr(0x8000800000001)
  # Insn: v16 RefineType v15, CUInt64
  # Insn: v17 IntAnd v12, v14
  0x122c50128: and x2, x1, #0xffffffff0000001f
  # Insn: v18 IsBitEqual v17, v16
  0x122c5012c: mov x3, #1
  0x122c50130: movk x3, #0, lsl ruby#16
  0x122c50134: movk x3, ruby#8, lsl ruby#32
  0x122c50138: movk x3, ruby#8, lsl ruby#48
  0x122c5013c: cmp x2, x3
  0x122c50140: mov x2, #1
  0x122c50144: mov x3, #0
  0x122c50148: csel x2, x2, x3, eq
  0x122c5014c: tst x2, x2
  0x122c50150: b.ne #0x122c501e8
```

On this branch it looks like this:

```
  # Insn: v13 LoadField v6, :shape_id@0x4
  # Load field id=shape_id offset=4
  0x124dd0124: ldur w1, [x0, #4]
  # Insn: v14 Const CShape(0x80008)
  # Insn: v15 IsBitEqual v14, v13
  0x124dd0128: mov x2, ruby#8
  0x124dd012c: movk x2, ruby#8, lsl ruby#16
  0x124dd0130: cmp x2, x1
  0x124dd0134: mov x1, #1
  0x124dd0138: mov x2, #0
  0x124dd013c: csel x1, x1, x2, eq
  0x124dd0140: tst x1, x1
  0x124dd0144: b.ne #0x124dd01d4
```

We've eliminated the `and` instruction and only need to do a 32 bit load
for the shape.
Comment thread zjit/src/hir.rs
@tenderlove tenderlove enabled auto-merge (squash) June 9, 2026 18:56
auto-merge was automatically disabled June 9, 2026 19:32

Head branch was pushed to by a user without write access

@nirvdrum nirvdrum left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup. This will make looking at the HIR a little easier. I had a couple of general comments. Please feel free to resolve or ignore as you see fit.

I need to better familiarize myself with the shape code to be sure the PR output is semantically equivalent. But, I'm also fairly confident the test suite would yell at you if it weren't.

Comment thread zjit/src/hir.rs

match layout {
ShapeLayout::RClass | ShapeLayout::RData => {
let offset = if layout == ShapeLayout::RClass {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a major issue, but it looks a little odd to me to have a multivariant match and the immediately discriminate between the two of them. I'm wondering if it'd make sense to capture the field loading in a helper function and then update the match layout to match each variant separately.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I agree. I couldn't think of a better way to do it. TBH I would like to get it so that TDATA offset and RCLASS offset are the same and then we just eliminate this branch.

I'm wondering if it'd make sense to capture the field loading in a helper function and then update the match layout to match each variant separately.

We could. I'm not sure if it'd be any cleaner though 🤷

The RObject layout code below is almost identical to this branch. If we could get RClass to have the same offset as TDATA, then fix the flags function, it feels like we could collapse most of this. It's the plan I have anyway (though it's kind of a low priority plan).

Comment thread zjit/src/hir.rs
let expected_shape = profiled_type.shape();
let (expected_rbasic_flags, rbasic_flags_mask) = profiled_type.rbasic_flags_and_mask();
assert!(expected_shape.is_valid());
let profiled_shape = profiled_type.shape();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you just replaced existing code, but I could not figure out how these two lines related to the comment about falling through to DefinedIvar. Then I realize it was just an unfortunate grouping. An extra newline here would help code clarity, IMHO.

Comment thread zjit/src/hir.rs
let rbasic_flags_mask = fun.push_insn(block, Insn::Const { val: Const::CUInt64(rbasic_flags_mask) });
if profiled_shape.is_complex() { continue; }
if seen_shape.contains(&profiled_shape) { continue; }
seen_shape.push(profiled_shape);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal here as my above comment. Physically separating the early-exit and happy path with a newline would help, IMHO. Although, by this point I was primed for it. And, also, not really anything to do with your PR. I'm just flagging it because it made reviewing a little harder.

Comment thread zjit/src/hir.rs
let expected_rbasic_flags = fun.push_insn(block, Insn::RefineType { val: expected_rbasic_flags, new_type: types::CUInt64 });
let masked = fun.push_insn(block, Insn::IntAnd { left: rbasic_flags, right: rbasic_flags_mask});
let has_shape_and_type = fun.push_insn(block, Insn::IsBitEqual { left: masked, right: expected_rbasic_flags });
if profiled_shape.is_complex() { continue; }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General note: it'd be nice if we were able to deduplicate between definedivar and getinstancevariable. I realize the code was already duplicated, but it was a bit jarring re-reviewing essentially the same code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya makes sense. I'll try to follow up with a refactoring PR. I want to refactor polymorphic IV loads anyway, so it might be a good time to do it.

@tenderlove tenderlove merged commit 1c004aa into ruby:master Jun 9, 2026
120 of 122 checks passed
@tenderlove tenderlove deleted the ivar-shape branch June 9, 2026 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants