Quantize aten.concat like aten.cat for Ethos-U #20600
Summary:

The ARM PT2E quantization annotator gives a `SharedQuantizationSpec` to `aten.cat.default`, `aten.concatenate.default`, and `aten.stack.default`, so every concat input and the output share one scale/zero-point. That shared annotation is what lets a quantized concat lower to a single TOSA `CONCAT` and stay inside one Ethos-U delegate.

`aten.concat.default` was missing from that tuple. `torch.concat` is captured as `aten.concat.default` under dynamic-shape tracing (the path the EMG cascade detector PTQ uses for its `emg_features`/`imu_features` fusion at `networks.py:251`), so that concat received independent observers for each input and the output. The three distinct scales cannot be expressed as one TOSA `CONCAT` (the ARM backend has no rescale-before-concat), so the partitioner strands the op on CPU as dequantize/cat/quantize and the detector splits into two Ethos-U delegates instead of one. Under static-shape tracing the same source op is captured as `aten.cat.default`, which is why this only reproduces on the production (dynamic-shape) lowering path.

Add `aten.concat.default` next to the existing aliases so it gets the same shared spec. Applied to both the `fbcode` and `xplat` mirrors.

Differential Revision: D110085289
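For reviewers less familiar with why the shared spec matters, here is a minimal pure-Python sketch of the arithmetic. It is not the annotator or backend code. The `quantize`/`dequantize` helpers and the example scales are assumptions chosen for illustration. It shows that when inputs and output share one scale/zero-point, concatenating the int8 payloads is already the correct result, whereas with independent scales a plain copy gives wrong values unless a rescale is inserted first.

```python
def quantize(values, scale, zero_point):
    """Affine-quantize floats to int8: q = clamp(round(x / scale) + zp)."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map int8 values back to floats: x = (q - zp) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


# Two hypothetical feature vectors standing in for the concat inputs.
a = [0.5, -1.0, 1.5]
b = [8.0, -16.0, 4.0]

# Case 1: shared spec. Inputs and output use the same scale/zero-point,
# so concatenating the quantized payloads is a plain copy.
shared_scale, shared_zp = 0.125, 0
qa = quantize(a, shared_scale, shared_zp)
qb = quantize(b, shared_scale, shared_zp)
shared_out = dequantize(qa + qb, shared_scale, shared_zp)

# Case 2: independent observers. Input `a` gets its own (illustrative) scale,
# while input `b` and the output use another. Copying the raw int8 values of
# `a` into the output without a rescale misinterprets them.
scale_a, scale_out = 0.0625, 0.125
qa_independent = quantize(a, scale_a, 0)
naive_out = dequantize(qa_independent + qb, scale_out, 0)

print(shared_out)  # matches a + b
print(naive_out)   # first three entries are off by a factor of scale_out / scale_a
```

In the second case the only correct options are to requantize `a` to the output scale before the copy, or to fall back to dequantize/cat/quantize, which is the CPU fallback described in the summary.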
Dr. CI: No failures as of commit 7185bad with merge base 035b45a. Artifacts and rendered test results: hud.pytorch.org/pr/pytorch/executorch/20600
@3l1 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D110085289.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani