NMSIS-Core
Version 1.4.0
NMSIS-Core support for Nuclei processor-based devices
SIMD 8-bit Multiply Instructions.
Functions
__STATIC_FORCEINLINE unsigned long __RV_KHM8 (unsigned long a, unsigned long b)
KHM8 (SIMD Signed Saturating Q7 Multiply)
__STATIC_FORCEINLINE unsigned long __RV_KHMX8 (unsigned long a, unsigned long b)
KHMX8 (SIMD Signed Saturating Crossed Q7 Multiply)
__STATIC_FORCEINLINE unsigned long long __RV_SMUL8 (unsigned int a, unsigned int b)
SMUL8 (SIMD Signed 8-bit Multiply)
__STATIC_FORCEINLINE unsigned long long __RV_SMULX8 (unsigned int a, unsigned int b)
SMULX8 (SIMD Signed Crossed 8-bit Multiply)
__STATIC_FORCEINLINE unsigned long long __RV_UMUL8 (unsigned int a, unsigned int b)
UMUL8 (SIMD Unsigned 8-bit Multiply)
__STATIC_FORCEINLINE unsigned long long __RV_UMULX8 (unsigned int a, unsigned int b)
UMULX8 (SIMD Unsigned Crossed 8-bit Multiply)
SIMD 8-bit Multiply Instructions.
There are six SIMD 8-bit multiply instructions.
__STATIC_FORCEINLINE unsigned long __RV_KHM8 (unsigned long a, unsigned long b)
KHM8 (SIMD Signed Saturating Q7 Multiply)
Type: SIMD
Syntax:
Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.
Description:
For the KHM8 instruction, multiply the top 8-bit Q7 content of each 16-bit chunk in Rs1 with the top 8-bit Q7 content of the corresponding 16-bit chunk in Rs2. At the same time, multiply the bottom 8-bit Q7 content of each 16-bit chunk in Rs1 with the bottom 8-bit Q7 content of the corresponding 16-bit chunk in Rs2.
For the KHMX8 instruction, multiply the top 8-bit Q7 content of each 16-bit chunk in Rs1 with the bottom 8-bit Q7 content of the corresponding 16-bit chunk in Rs2. At the same time, multiply the bottom 8-bit Q7 content of each 16-bit chunk in Rs1 with the top 8-bit Q7 content of the corresponding 16-bit chunk in Rs2.
The Q14 results are then right-shifted by 7 bits and saturated into Q7 values, which are written into Rd. When both Q7 inputs of a multiplication are 0x80, saturation happens: the result is saturated to 0x7F and the overflow flag OV is set.
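The KHM8 behavior described above can be sketched as a plain C reference model. This is a minimal sketch, not the NMSIS implementation: the names q7_mul_sat and khm8_ref are hypothetical, the register width is assumed to be 32 bits (RV32), and the OV flag is modeled as an int written through a pointer.

```c
#include <stdint.h>

/* Hypothetical helper: Q7 x Q7 -> Q7 with saturation.
 * Sets *ov to 1 when both inputs are 0x80 (the saturating case). */
static int8_t q7_mul_sat(int8_t x, int8_t y, int *ov)
{
    if (x == INT8_MIN && y == INT8_MIN) {
        *ov = 1;
        return INT8_MAX;                /* saturate to 0x7F */
    }
    /* Q14 product shifted back to Q7. Assumes >> on a negative int is an
     * arithmetic shift, which holds on mainstream compilers. */
    return (int8_t)((x * y) >> 7);
}

/* Hypothetical reference model of KHM8 for a 32-bit register:
 * byte i of a is multiplied with byte i of b. */
uint32_t khm8_ref(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        rd |= (uint32_t)(uint8_t)q7_mul_sat(x, y, ov) << (8 * i);
    }
    return rd;
}
```

With ov initialized to 0, khm8_ref(0x80404020, 0x80407F20, &ov) gives 0x7F203F08 and sets ov to 1, because the top byte pair is 0x80 x 0x80.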
Operations:
[in] | a | first source operand (Rs1), of type unsigned long |
[in] | b | second source operand (Rs2), of type unsigned long |
Definition at line 2294 of file core_feature_dsp.h.
References __ASM.
__STATIC_FORCEINLINE unsigned long __RV_KHMX8 (unsigned long a, unsigned long b)
KHMX8 (SIMD Signed Saturating Crossed Q7 Multiply)
Type: SIMD
Syntax:
Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.
Description:
For the KHM8 instruction, multiply the top 8-bit Q7 content of each 16-bit chunk in Rs1 with the top 8-bit Q7 content of the corresponding 16-bit chunk in Rs2. At the same time, multiply the bottom 8-bit Q7 content of each 16-bit chunk in Rs1 with the bottom 8-bit Q7 content of the corresponding 16-bit chunk in Rs2.
For the KHMX8 instruction, multiply the top 8-bit Q7 content of each 16-bit chunk in Rs1 with the bottom 8-bit Q7 content of the corresponding 16-bit chunk in Rs2. At the same time, multiply the bottom 8-bit Q7 content of each 16-bit chunk in Rs1 with the top 8-bit Q7 content of the corresponding 16-bit chunk in Rs2.
The Q14 results are then right-shifted by 7 bits and saturated into Q7 values, which are written into Rd. When both Q7 inputs of a multiplication are 0x80, saturation happens: the result is saturated to 0x7F and the overflow flag OV is set.
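The crossed pairing of KHMX8 can be illustrated with a small C reference model. This is a sketch under stated assumptions, not the NMSIS implementation: khmx8_ref and q7_mul_sat are hypothetical names, the register is assumed to be 32 bits wide (RV32), and OV is modeled as an int written through a pointer. The only difference from a straight Q7 multiply is that byte i of a is paired with byte i ^ 1 of b, i.e. the other byte of the same 16-bit chunk.

```c
#include <stdint.h>

/* Hypothetical helper: Q7 x Q7 -> Q7 with saturation.
 * Sets *ov to 1 when both inputs are 0x80 (the saturating case). */
static int8_t q7_mul_sat(int8_t x, int8_t y, int *ov)
{
    if (x == INT8_MIN && y == INT8_MIN) {
        *ov = 1;
        return INT8_MAX;                /* saturate to 0x7F */
    }
    /* Assumes >> on a negative int is an arithmetic shift. */
    return (int8_t)((x * y) >> 7);
}

/* Hypothetical reference model of KHMX8 for a 32-bit register:
 * top byte of each 16-bit chunk of a times bottom byte of b, and vice versa. */
uint32_t khmx8_ref(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * (i ^ 1)));  /* crossed byte */
        rd |= (uint32_t)(uint8_t)q7_mul_sat(x, y, ov) << (8 * i);
    }
    return rd;
}
```

With ov initialized to 0, khmx8_ref(0x80404020, 0x40802040, &ov) gives 0x7F202008 and sets ov to 1: the top byte of a (0x80) meets byte 2 of b (0x80).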
Operations:
[in] | a | first source operand (Rs1), of type unsigned long |
[in] | b | second source operand (Rs2), of type unsigned long |
Definition at line 2356 of file core_feature_dsp.h.
References __ASM.
__STATIC_FORCEINLINE unsigned long long __RV_SMUL8 (unsigned int a, unsigned int b)
SMUL8 (SIMD Signed 8-bit Multiply)
Type: SIMD
Syntax:
Purpose:
Do signed 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, selects the register pair, which consists of registers 2d and 2d+1. The odd register (2d+1) contains the two 16-bit results calculated from the top part of Rs1, and the even register (2d) contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1, and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
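The SMUL8 result layout can be sketched in portable C. This is a minimal sketch, not the NMSIS implementation: smul8_ref is a hypothetical name, and the model assumes that the low 32 bits of the 64-bit return value correspond to the even register (RV32) or Rd.W[0] (RV64), so that 16-bit result i comes from byte i of both operands.

```c
#include <stdint.h>

/* Hypothetical reference model of SMUL8: four signed 8x8 -> 16-bit products.
 * Halfword i of the result holds byte i of a times byte i of b. */
uint64_t smul8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        /* Reinterpret each byte as a signed 8-bit value. */
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        /* The product always fits in 16 bits, so no saturation is needed. */
        rd |= (uint64_t)(uint16_t)(x * y) << (16 * i);
    }
    return rd;
}
```

For example, smul8_ref(0x80FF027F, 0x8002FF7F) gives 0x4000FFFEFFFE3F01: the top halfword is (-128) x (-128) = 0x4000 and the bottom halfword is 127 x 127 = 0x3F01.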
Operations:
[in] | a | first source operand (Rs1), of type unsigned int |
[in] | b | second source operand (Rs2), of type unsigned int |
Definition at line 9316 of file core_feature_dsp.h.
References __ASM.
__STATIC_FORCEINLINE unsigned long long __RV_SMULX8 (unsigned int a, unsigned int b)
SMULX8 (SIMD Signed Crossed 8-bit Multiply)
Type: SIMD
Syntax:
Purpose:
Do signed 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, selects the register pair, which consists of registers 2d and 2d+1. The odd register (2d+1) contains the two 16-bit results calculated from the top part of Rs1, and the even register (2d) contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1, and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
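The crossed element pairing of SMULX8 can be sketched in portable C. This is a minimal sketch, not the NMSIS implementation: smulx8_ref is a hypothetical name, and the model assumes that the low 32 bits of the 64-bit return value correspond to the even register (RV32) or Rd.W[0] (RV64). Byte i of a is multiplied with byte i ^ 1 of b, which swaps the two bytes within each 16-bit half.

```c
#include <stdint.h>

/* Hypothetical reference model of SMULX8: four signed 8x8 -> 16-bit products.
 * Halfword i of the result holds byte i of a times byte (i ^ 1) of b. */
uint64_t smulx8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        /* Reinterpret each byte as a signed 8-bit value. */
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * (i ^ 1)));  /* crossed byte */
        rd |= (uint64_t)(uint16_t)(x * y) << (16 * i);
    }
    return rd;
}
```

For example, smulx8_ref(0x80FF027F, 0x02807FFF) gives 0x4000FFFEFFFE3F01: byte 3 of a (0x80) is paired with byte 2 of b (0x80), and byte 0 of a (0x7F) with byte 1 of b (0x7F).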
Operations:
[in] | a | first source operand (Rs1), of type unsigned int |
[in] | b | second source operand (Rs2), of type unsigned int |
Definition at line 9399 of file core_feature_dsp.h.
References __ASM.
__STATIC_FORCEINLINE unsigned long long __RV_UMUL8 (unsigned int a, unsigned int b)
UMUL8 (SIMD Unsigned 8-bit Multiply)
Type: SIMD
Syntax:
Purpose:
Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, selects the register pair, which consists of registers 2d and 2d+1. The odd register (2d+1) contains the two 16-bit results calculated from the top part of Rs1, and the even register (2d) contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1, and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
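The UMUL8 result layout can be sketched in portable C. This is a minimal sketch, not the NMSIS implementation: umul8_ref is a hypothetical name, and the model assumes that the low 32 bits of the 64-bit return value correspond to the even register (RV32) or Rd.W[0] (RV64).

```c
#include <stdint.h>

/* Hypothetical reference model of UMUL8: four unsigned 8x8 -> 16-bit products.
 * Halfword i of the result holds byte i of a times byte i of b. */
uint64_t umul8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        /* Largest product is 255 * 255 = 65025, which fits in 16 bits. */
        rd |= (uint64_t)(uint16_t)(x * y) << (16 * i);
    }
    return rd;
}
```

For example, umul8_ref(0xFF020310, 0xFF030410) gives 0xFE010006000C0100: the top halfword is 255 x 255 = 0xFE01 and the bottom halfword is 16 x 16 = 0x0100.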
Operations:
[in] | a | first source operand (Rs1), of type unsigned int |
[in] | b | second source operand (Rs2), of type unsigned int |
Definition at line 12593 of file core_feature_dsp.h.
References __ASM.
__STATIC_FORCEINLINE unsigned long long __RV_UMULX8 (unsigned int a, unsigned int b)
UMULX8 (SIMD Unsigned Crossed 8-bit Multiply)
Type: SIMD
Syntax:
Purpose:
Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, selects the register pair, which consists of registers 2d and 2d+1. The odd register (2d+1) contains the two 16-bit results calculated from the top part of Rs1, and the even register (2d) contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1, and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
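The crossed element pairing of UMULX8 can be sketched in portable C. This is a minimal sketch, not the NMSIS implementation: umulx8_ref is a hypothetical name, and the model assumes that the low 32 bits of the 64-bit return value correspond to the even register (RV32) or Rd.W[0] (RV64). Byte i of a is multiplied with byte i ^ 1 of b, which swaps the two bytes within each 16-bit half.

```c
#include <stdint.h>

/* Hypothetical reference model of UMULX8: four unsigned 8x8 -> 16-bit products.
 * Halfword i of the result holds byte i of a times byte (i ^ 1) of b. */
uint64_t umulx8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * (i ^ 1)));  /* crossed byte */
        /* Largest product is 255 * 255 = 65025, which fits in 16 bits. */
        rd |= (uint64_t)(uint16_t)(x * y) << (16 * i);
    }
    return rd;
}
```

For example, umulx8_ref(0xFF020310, 0x03FF1004) gives 0xFE010006000C0100: byte 3 of a (0xFF) is paired with byte 2 of b (0xFF), and byte 0 of a (0x10) with byte 1 of b (0x10).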
Operations:
[in] | a | first source operand (Rs1), of type unsigned int |
[in] | b | second source operand (Rs2), of type unsigned int |
Definition at line 12677 of file core_feature_dsp.h.
References __ASM.