Improving avx/avx2 swizzles by DiamonDinoia · Pull Request #1141 · xtensor-stack/xsimd

DiamonDinoia · 2025-07-08T19:48:12Z

Hi,

This PR optimizes swizzles where possible, fixes a bug in the float AVX2 swizzle that did not allow duplicates, and adds more tests.
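As a rough illustration of what optimizing swizzles "where possible" can mean, the sketch below classifies an 8-lane, 32-bit swizzle mask at compile time so that a cheaper instruction could be chosen. It assumes every mask entry is in [0, 8). The names mask8, swizzle_kind, is_identity, is_in_lane, has_duplicates and classify are hypothetical and are not xsimd's implementation. The one hardware fact it relies on is that AVX's VPERMILPS (_mm256_permutevar_ps) only moves elements within each 128-bit half, while AVX2's VPERMPS (_mm256_permutevar8x32_ps) can cross halves.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not xsimd code): inspect a swizzle mask at compile time.
// Assumes every entry is a source lane index in [0, 8).
using mask8 = std::array<std::uint32_t, 8>;

// Identity mask: the swizzle is a no-op and can be skipped entirely.
constexpr bool is_identity(const mask8& m) {
    for (std::size_t i = 0; i < 8; ++i)
        if (m[i] != i) return false;
    return true;
}

// In-lane mask: destination lanes 0-3 read only from source lanes 0-3 and
// lanes 4-7 only from lanes 4-7, so an in-lane permute (VPERMILPS) suffices.
constexpr bool is_in_lane(const mask8& m) {
    for (std::size_t i = 0; i < 8; ++i)
        if ((m[i] < 4) != (i < 4)) return false;
    return true;
}

// Duplicates: some source lane feeds more than one destination lane.
constexpr bool has_duplicates(const mask8& m) {
    for (std::size_t i = 0; i < 8; ++i)
        for (std::size_t j = i + 1; j < 8; ++j)
            if (m[i] == m[j]) return true;
    return false;
}

enum class swizzle_kind { identity, in_lane, cross_lane };

// Pick the cheapest category that can express the mask.
constexpr swizzle_kind classify(const mask8& m) {
    return is_identity(m) ? swizzle_kind::identity
         : is_in_lane(m)  ? swizzle_kind::in_lane
                          : swizzle_kind::cross_lane;
}

// Usage: these checks are evaluated entirely at compile time (C++14).
static_assert(classify(mask8{0, 1, 2, 3, 4, 5, 6, 7}) == swizzle_kind::identity, "identity");
static_assert(classify(mask8{1, 0, 3, 2, 5, 4, 7, 6}) == swizzle_kind::in_lane, "pairwise swap");
static_assert(classify(mask8{7, 6, 5, 4, 3, 2, 1, 0}) == swizzle_kind::cross_lane, "reversal");
static_assert(has_duplicates(mask8{3, 3, 3, 0, 7, 7, 1, 1}), "repeated lanes");
```

has_duplicates is deliberately kept out of classify: per the Intel note quoted later in this thread, repeated indices are legal for VPERMPS, so a wrapper only needs that check if one of its own code paths assumes a true permutation.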

serge-sans-paille

If you could reduce code duplication by factoring out common patterns as suggested, that would be great!

DiamonDinoia · 2025-07-11T19:04:53Z

If this passes the tests it can be merged. I hope I did not break any build this time.

Should I add myself to the copyright?

serge-sans-paille · 2025-07-11T21:42:45Z

You can add yourself, yes. And rebase on the master branch ;-)

DiamonDinoia · 2025-07-13T22:30:55Z

I rebased. I did not squash, as I have some commits that I would like to keep in the branch for reference. Feel free to squash and merge into master.

serge-sans-paille · 2025-07-16T21:24:26Z

Hey @DiamonDinoia : I'll have a look at this next week, don't worry if you don't have any feedback since then 🙇

DiamonDinoia · 2025-08-04T12:35:26Z

I am not sure what the problem is with PowerPC. It should not touch any code I wrote, and from the printed output the results seem correct.

serge-sans-paille · 2025-08-06T20:16:48Z

How strange: the failure shows batches that... look the same /o\

DiamonDinoia · 2025-08-06T20:29:41Z

I could not find an explanation for it.

serge-sans-paille · 2025-08-07T06:41:47Z

If you don't mind, I'll split your commit history in pieces and validate them step-by-step in order to understand what's happening - keeping your authorship information, of course.

DiamonDinoia · 2025-08-07T12:36:56Z

Sure, no problem, go ahead!

serge-sans-paille · 2025-08-07T20:45:19Z

perfect, can you just squash your commits into a nice history? Thanks o/


serge-sans-paille · 2025-08-08T05:05:08Z

Thanks for being patient and for the great patch set \o/

serge-sans-paille · 2025-08-08T05:48:08Z

See #1155 for the SSE2 part.

DiamonDinoia · 2025-08-08T15:23:16Z

"Thanks for being patient and for the great patch set \o/"

My pleasure!

pitrou · 2025-10-16T15:17:52Z

+                // The intrinsic does NOT allow to copy the same element of the source vector to more than one element of the destination vector.
+                // one-shot 8-lane permute


Was this tested concretely? According to the Intel ISA reference for VPERMPS,

this instruction permits a doubleword in the source operand to be copied to more than one location in the destination operand

In any case, it would be surprising if it behaved differently from _mm256_permutevar8x32_epi32.
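One way to test the question concretely is to compare a scalar model of VPERMPS (dst[i] = src[idx[i] & 7], since only the low 3 bits of each index are used) against _mm256_permutevar8x32_ps on an index vector that repeats lanes. The minimal C++ sketch below assumes a GCC or Clang toolchain on x86, for __attribute__((target("avx2"))) and __builtin_cpu_supports. The names f32x8, idx8, permute8_ref, permute8_avx2, avx2_matches_reference, demo_duplicates and the HAVE_AVX2_PATH macro are hypothetical and not part of xsimd. On other compilers, or on CPUs without AVX2, only the scalar model is exercised.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX2_PATH 1  // hypothetical macro: the intrinsic path can be compiled here
#endif

using f32x8 = std::array<float, 8>;
using idx8 = std::array<std::uint32_t, 8>;

// Scalar model of VPERMPS: destination lane i takes source lane (idx[i] & 7).
// Nothing stops two destination lanes from naming the same source lane.
inline f32x8 permute8_ref(const f32x8& src, const idx8& idx) {
    f32x8 dst{};
    for (int i = 0; i < 8; ++i) dst[i] = src[idx[i] & 7u];
    return dst;
}

#ifdef HAVE_AVX2_PATH
// The same operation through the real intrinsic, compiled for AVX2 via a
// target attribute so the rest of the program does not need -mavx2.
__attribute__((target("avx2")))
inline f32x8 permute8_avx2(const f32x8& src, const idx8& idx) {
    const __m256 v = _mm256_loadu_ps(src.data());
    const __m256i i = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx.data()));
    f32x8 dst;
    _mm256_storeu_ps(dst.data(), _mm256_permutevar8x32_ps(v, i));
    return dst;
}
#endif

// True if the hardware agrees with the model, or if there is no AVX2 to ask.
inline bool avx2_matches_reference(const f32x8& src, const idx8& idx) {
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        return permute8_avx2(src, idx) == permute8_ref(src, idx);
    }
#endif
    return true;
}

// Usage: lane 3 is copied to three destinations, lanes 7 and 1 to two each.
inline void demo_duplicates() {
    const f32x8 src{0.f, 1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f};
    const idx8 idx{3, 3, 3, 0, 7, 7, 1, 1};
    const f32x8 expect{3.f, 3.f, 3.f, 0.f, 7.f, 7.f, 1.f, 1.f};
    assert(permute8_ref(src, idx) == expect);
    assert(avx2_matches_reference(src, idx));
}
```

The sketch only covers the float variant; the same index vector could be passed to _mm256_permutevar8x32_epi32 for 32-bit integers.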

That statement comes from: https://www.intel.com/content/www/us/en/content-details/671569/19-0-c-compiler-developer-guide-and-reference.html?wapkw=_mm256_permutevar8x32_ps

There used to be a web page for it, but Intel changed their website and I cannot find it anymore.

It is the same for _mm256_permutevar8x32_epi32. If I used that with duplicates, I introduced a bug.

There was originally a problem with duplicates; that's why I investigated. But I think it was showing up with SSE.
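For the SSE side, the immediate-mask shuffle also selects each destination lane independently, so repeated indices can be expressed there as well. The sketch below builds a 4-lane float swizzle from _mm_shuffle_ps, assuming an x86 target where xmmintrin.h is available and falling back to scalar code elsewhere. swizzle4, f32x4 and HAVE_SSE_PATH are hypothetical names, and this is not xsimd's SSE code.

```cpp
#include <array>
#include <cassert>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_PATH 1  // hypothetical macro: SSE intrinsics are usable here
#endif

using f32x4 = std::array<float, 4>;

// Compile-time 4-lane swizzle: dst[k] = src[Ik]; indices may repeat.
template <int I0, int I1, int I2, int I3>
f32x4 swizzle4(const f32x4& src) {
    static_assert(I0 >= 0 && I0 < 4 && I1 >= 0 && I1 < 4 &&
                  I2 >= 0 && I2 < 4 && I3 >= 0 && I3 < 4,
                  "indices must be in [0, 4)");
#ifdef HAVE_SSE_PATH
    const __m128 v = _mm_loadu_ps(src.data());
    // _MM_SHUFFLE lists the highest destination lane first, hence I3..I0.
    // Passing v twice lets the upper two lanes select from v as well.
    const __m128 r = _mm_shuffle_ps(v, v, _MM_SHUFFLE(I3, I2, I1, I0));
    f32x4 dst;
    _mm_storeu_ps(dst.data(), r);
    return dst;
#else
    return f32x4{src[I0], src[I1], src[I2], src[I3]};
#endif
}
```

For example, swizzle4<2, 2, 0, 3> applied to {10, 11, 12, 13} yields {12, 12, 10, 13}. The reversed argument order of _MM_SHUFFLE is an easy place to get a swizzle wrong.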

(@AntoinePrv you could be interested in this discussion)

Happy to revert the changes, since the ISA reference says it is allowed. Would you like to open a PR or an issue?

I could try to, but I'm not an xsimd developer (I just encountered this by chance when grepping for xsimd's support for shuffling).

Otherwise, I can do it once I have time.

DiamonDinoia mentioned this pull request Jul 9, 2025

AVX swizzle seems a bit slow #1138

Closed

serge-sans-paille reviewed Jul 10, 2025


Comment thread on include/xsimd/arch/xsimd_avx.hpp (outdated)

serge-sans-paille reviewed Jul 10, 2025


DiamonDinoia force-pushed the improving-swizzle branch from 8239b5e to cfb5838 on July 13, 2025 22:17

DiamonDinoia requested a review from serge-sans-paille July 16, 2025 18:33

DiamonDinoia force-pushed the improving-swizzle branch 2 times, most recently from ec97ae8 to e99877d on July 25, 2025 19:18

DiamonDinoia force-pushed the improving-swizzle branch from e99877d to c6b58b6 on August 4, 2025 09:45

DiamonDinoia force-pushed the improving-swizzle branch from c6b58b6 to b285dbd on August 7, 2025 19:05

improved avx/avx2 swizzles (commit 8b1c9b9)

Fixes: some swizzles did not allow duplicates in the output

DiamonDinoia force-pushed the improving-swizzle branch from 0a906e5 to 8b1c9b9 on August 7, 2025 23:00

serge-sans-paille merged commit 825d298 into xtensor-stack:master Aug 8, 2025
65 checks passed

pitrou reviewed Oct 16, 2025


AntoinePrv mentioned this pull request Nov 12, 2025

Add Avx2 constant mask swizzle 8/16 and improve 32/64 #1201

Merged
