https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=584b346a4c7a6e6e77da6dc80968401a3c08161d

From 584b346a4c7a6e6e77da6dc80968401a3c08161d Mon Sep 17 00:00:00 2001
From: Jakub Jelinek
Date: Tue, 25 Mar 2025 16:55:24 +0100
Subject: [PATCH] i386: Fix up combination of -2 r<<= (x & 7) into btr [PR119428]

The following testcase is miscompiled from r15-8478, but latently already
since my r11-5756 and r11-6631 changes.
The r11-5756 change was
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561164.html
which changed the splitters to immediately throw away the masking.  And
the r11-6631 change was an optimization to recognize
(set (zero_extract:HI (...) (const_int 1) (...)) (const_int 1))
as btr.
The problem is their interaction.  x86 is not a SHIFT_COUNT_TRUNCATED
target, so the masking needs to be explicit in the IL.  And
combine.cc (make_field_assignment) has had optimizations since 1992 which
try to optimize x &= (-2 r<< y) into zero_extract (x) = 0.
Now, such an optimization is fine if y has not been masked or if the
chosen zero_extract has the same mode as the rotate (or it recognizes
something with a left shift too).
IMHO such an optimization is invalid for SHIFT_COUNT_TRUNCATED targets,
because we explicitly say that the masking of the shift/rotate counts is
redundant there and doesn't need to be part of the IL (I have a patch for
that, but because it is just latent, I'm not sure it needs to be posted
for GCC 15, and I'm also not sure whether it should punt or add operand
masking just in case).
x86 is not SHIFT_COUNT_TRUNCATED though, so even fixing combine not to do
that for SHIFT_COUNT_TRUNCATED targets doesn't help, and we don't have a
QImode insv, so it is optimized into HImode insertions.
Now, if the y in x &= (-2 r<< y) wasn't masked in any way, turning it
into an HImode btr is just fine, but if it was x &= (-2 r<< (y & 7)) and
we just decided to throw away the masking, using btr changes the behavior
and causes e2fsprogs and sqlite miscompilations.
So IMHO on !SHIFT_COUNT_TRUNCATED targets we need to keep the maskings
explicit in the IL, either at least for the duration of the combine pass,
as the following patch does (combine is the only pass known to perform
such a transformation), or even until the final pass in case some later
optimizations also need to know whether there was explicit masking and
with what mask.  The latter change would be much larger.
The following patch just reverts the r11-5756 change and adds a testcase.

2025-03-25  Jakub Jelinek

	PR target/96226
	PR target/119428
	* config/i386/i386.md (splitter after *<insn><mode>3_mask,
	splitter after *<insn><mode>3_mask_1): Revert 2020-12-05 changes.

	* gcc.c-torture/execute/pr119428.c: New test.
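
To make the failure mode concrete, here is a minimal C sketch (illustrative
only, not part of the patch; the helper names are made up) contrasting the
intended QImode semantics with an HImode bit-clear that uses the count with
the "& 7" thrown away.  With x == 8 the correct code clears bit 0 of the
addressed byte, while the unmasked HImode bit-clear flips bit 0 of the
following byte instead, which is the kind of difference behind the
e2fsprogs and sqlite miscompilations mentioned above.

/* Minimal sketch, not part of the patch: models the difference between
   the intended QImode semantics and an HImode bit-clear that uses the
   count with the "& 7" thrown away.  Function names are illustrative.  */
#include <stdio.h>
#include <string.h>

/* Intended semantics: clear bit (x & 7) of the byte at y[x >> 3].  */
static void
clear_bit_qi (unsigned int x, unsigned char *y)
{
  y[x >> 3] &= (unsigned char) ~(1 << (x & 7));
}

/* Modeled miscompilation: a 16-bit bit-clear on the two bytes starting
   at y[x >> 3], using the unmasked count (btr on a 16-bit register
   operand truncates the count mod 16).  */
static void
clear_bit_hi_unmasked (unsigned int x, unsigned char *y)
{
  unsigned int word = y[x >> 3] | (y[(x >> 3) + 1] << 8);
  word &= ~(1u << (x & 15));
  y[x >> 3] = word & 0xff;
  y[(x >> 3) + 1] = (word >> 8) & 0xff;
}

int
main (void)
{
  unsigned char a[8], b[8];
  memset (a, 0xff, 8);
  memset (b, 0xff, 8);
  clear_bit_qi (8, a);		/* clears bit 0 of a[1]: a[1] == 0xfe  */
  clear_bit_hi_unmasked (8, b);	/* clears bit 0 of b[2]: b[1] stays 0xff  */
  printf ("a[1]=%#x b[1]=%#x b[2]=%#x\n", a[1], b[1], b[2]);
  return 0;
}

The testcase added below distills the same situation: foo (8, buf) must
clear bit 0 of buf[1], not a bit of buf[2].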
---
 gcc/config/i386/i386.md                        |  6 ++++--
 gcc/testsuite/gcc.c-torture/execute/pr119428.c | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr119428.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2b3cffc1f350..9d1b34e55959 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18168,7 +18168,8 @@
   [(set (match_dup 4) (match_dup 1))
    (set (match_dup 0)
 	(any_rotate:SWI (match_dup 4)
-			(subreg:QI (match_dup 2) 0)))]
+			(subreg:QI
+			  (and:SI (match_dup 2) (match_dup 3)) 0)))]
   "operands[4] = gen_reg_rtx (<MODE>mode);")
 
 (define_insn_and_split "*<insn><mode>3_mask_1"
@@ -18202,7+18203,8 @@
 	  == GET_MODE_BITSIZE (<MODE>mode) - 1"
   [(set (match_dup 4) (match_dup 1))
    (set (match_dup 0)
-	(any_rotate:SWI (match_dup 4) (match_dup 2)))]
+	(any_rotate:SWI (match_dup 4)
+			(and:QI (match_dup 2) (match_dup 3))))]
   "operands[4] = gen_reg_rtx (<MODE>mode);")
 
 (define_insn_and_split "*<insn><mode>3_add"
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr119428.c b/gcc/testsuite/gcc.c-torture/execute/pr119428.c
new file mode 100644
index 000000000000..33a93f46b3bd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr119428.c
@@ -0,0 +1,18 @@
+/* PR target/119428 */
+
+__attribute__((noipa)) void
+foo (unsigned int x, unsigned char *y)
+{
+  y += x >> 3;
+  *y &= (unsigned char) ~(1 << (x & 0x07));
+}
+
+int
+main ()
+{
+  unsigned char buf[8];
+  __builtin_memset (buf, 0xff, 8);
+  foo (8, buf);
+  if (buf[1] != 0xfe)
+    __builtin_abort ();
+}
-- 
2.43.5
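
For reference, a hedged sketch (again not part of the patch) of why the
source line in the testcase corresponds to the -2 r<<= (x & 7) form from
the subject: for 8-bit values, ~(1 << s) equals the constant 0xfe (i.e. -2)
rotated left by s, so the AND mask can be expressed as a QImode rotate of
-2 by the masked count, which is where the *<insn><mode>3_mask splitters
enter the picture.  The program below just verifies that identity for all
eight counts.

/* Hedged illustration, not part of the patch: checks that
   (unsigned char) ~(1 << s) is 0xfe rotated left by s for s = 0..7.  */
#include <stdio.h>

static unsigned char
rotl8 (unsigned char v, unsigned int s)
{
  return (unsigned char) ((v << (s & 7)) | (v >> ((8 - s) & 7)));
}

int
main (void)
{
  for (unsigned int s = 0; s < 8; s++)
    {
      unsigned char mask = (unsigned char) ~(1 << s);
      if (mask != rotl8 (0xfe, s))
	{
	  printf ("mismatch at s=%u\n", s);
	  return 1;
	}
    }
  printf ("~(1 << s) == rotl8 (0xfe, s) for s = 0..7\n");
  return 0;
}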