Skip to content

Improving avx/avx2 swizzles#1141

Merged
serge-sans-paille merged 1 commit into
xtensor-stack:masterfrom
DiamonDinoia:improving-swizzle
Aug 8, 2025
Merged

Improving avx/avx2 swizzles#1141
serge-sans-paille merged 1 commit into
xtensor-stack:masterfrom
DiamonDinoia:improving-swizzle

Conversation

@DiamonDinoia

Copy link
Copy Markdown
Contributor

Hi,

This PR optimzes swizzles where possible
Fixed a bug in the float avx2 swizzle which did not allow duplicates
Adds more tests

Comment thread include/xsimd/arch/xsimd_avx.hpp Outdated

@serge-sans-paille serge-sans-paille left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could reduce code duplication by syndicating patterns as suggested, that would be great!

@DiamonDinoia

DiamonDinoia commented Jul 11, 2025

Copy link
Copy Markdown
Contributor Author

If this passes the tests it can be merged. I hope I did not break any build this time.

Should I add myself to the copyright?

@serge-sans-paille

Copy link
Copy Markdown
Contributor

You can add yourself, yes. And rebase on master branch ;-)

@DiamonDinoia

Copy link
Copy Markdown
Contributor Author

I rebased. I did not squash as I have some commits that I would like to keep for reference in the branch. Feel free to squash and merge in master.

@serge-sans-paille

Copy link
Copy Markdown
Contributor

Hey @DiamonDinoia : I'll have a look at this next week, don't worry if you don't have any feedback since then 🙇

@DiamonDinoia DiamonDinoia force-pushed the improving-swizzle branch 2 times, most recently from ec97ae8 to e99877d Compare July 25, 2025 19:18
@DiamonDinoia

Copy link
Copy Markdown
Contributor Author

I am not sure what the problem is with PowerPC. It should not touch any code I wrote and from the print the results seems correct.

@serge-sans-paille

Copy link
Copy Markdown
Contributor

How strange: the failure shows batches that... look the same /o\

@DiamonDinoia

Copy link
Copy Markdown
Contributor Author

I could not find an explanation for it.

@serge-sans-paille

Copy link
Copy Markdown
Contributor

If you don't mind, I'll split your commit history in pieces and validate them step-by-step in order to understand what's happening - keeping your authorship information, of course.

@DiamonDinoia

Copy link
Copy Markdown
Contributor Author

Sure, no problem, go ahead!

@serge-sans-paille

Copy link
Copy Markdown
Contributor

perfect, can you just squash your commits into a nice history? Thanks o/

Fixes: some swizzled did not allow duplicates in the output
@serge-sans-paille serge-sans-paille merged commit 825d298 into xtensor-stack:master Aug 8, 2025
65 checks passed
@serge-sans-paille

Copy link
Copy Markdown
Contributor

Thanks for being patient and for the great patch set \o/

@serge-sans-paille

Copy link
Copy Markdown
Contributor

See ##1155 for the sse2 part

@DiamonDinoia

Copy link
Copy Markdown
Contributor Author

Thanks for being patient and for the great patch set \o/

My pleasure!

Comment on lines +945 to +946
// The intrinsic does NOT allow to copy the same element of the source vector to more than one element of the destination vector.
// one-shot 8-lane permute

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this tested concretely? According to the intel ISA reference for VPERMPS,

this instruction permits a doubleword in the source operand to be copied to more than one location in the destination operand

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be surprising if it was different from _mm256_permutevar8x32_epi32, in any case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image from: https://www.intel.com/content/www/us/en/content-details/671569/19-0-c-compiler-developer-guide-and-reference.html?wapkw=_mm256_permutevar8x32_ps

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a webpage once but intel changed their website and I cannot find it anymore

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same for _mm256_permutevar8x32_epi32 If I use that with duplicates I introduced a bug.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a problem with duplicates originally, that's why I investigated. But I think it was showing up with sse

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@AntoinePrv you could be interested in this discussion)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to revert the changes since ISA says that it is allowed. Would you like to open a PR or issue?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could try to, but I'm not a xsimd developer (I just encountered this by chance when grepping for xsimd's support for shuffling).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do it once I have time otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants