SIMD 8-bit Multiply Instructions
- __STATIC_FORCEINLINE unsigned long __RV_KHM8 (unsigned long a, unsigned long b)
- __STATIC_FORCEINLINE unsigned long __RV_KHMX8 (unsigned long a, unsigned long b)
- __STATIC_FORCEINLINE unsigned long long __RV_SMUL8 (unsigned int a, unsigned int b)
- __STATIC_FORCEINLINE unsigned long long __RV_SMULX8 (unsigned int a, unsigned int b)
- __STATIC_FORCEINLINE unsigned long long __RV_UMUL8 (unsigned int a, unsigned int b)
- __STATIC_FORCEINLINE unsigned long long __RV_UMULX8 (unsigned int a, unsigned int b)
- group NMSIS_Core_DSP_Intrinsic_SIMD_8B_MULTIPLY
SIMD 8-bit Multiply Instructions.
There are 6 SIMD 8-bit Multiply instructions.
Functions
- __STATIC_FORCEINLINE unsigned long __RV_KHM8 (unsigned long a, unsigned long b)
KHM8 (SIMD Signed Saturating Q7 Multiply)
Type: SIMD
Syntax:
KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2
Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.
Description:
For the KHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2.
For the KHMX8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2.
The Q14 results are then right-shifted by 7 bits and saturated into Q7 values. The Q7 results are then written into Rd. Saturation happens only when both Q7 inputs of a multiplication are 0x80; in that case the result is saturated to 0x7F and the overflow flag OV is set.
Operations:
if (is `KHM8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
  op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x]; // Rs1 top
  op1b = Rs1.B[x]; op2b = Rs2.B[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x80 != aop | 0x80 != bop) {
    res = (aop s* bop) >> 7;
  } else {
    res = 0x7F;
    OV = 1;
  }
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
- Parameters
a – [in] unsigned long type of value stored in a
b – [in] unsigned long type of value stored in b
- Returns
value stored in unsigned long type
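The per-lane arithmetic described above can be checked on a host machine with a small software model. The sketch below is not the intrinsic itself: `q7_mul_sat`, `khm8_ref` and `khm8_ref_example` are hypothetical names, the model covers only a 32-bit register value (the RV32 case, x = 0,2), and the OV flag is represented by a plain `int` passed by pointer.

```c
#include <assert.h>
#include <stdint.h>

/* One Q7 x Q7 -> Q7 lane. Sets *ov and returns 0x7F when both inputs are 0x80.
 * Assumes >> on a negative int is an arithmetic shift (true for GCC and Clang). */
static int8_t q7_mul_sat(int8_t a, int8_t b, int *ov)
{
    if (a == INT8_MIN && b == INT8_MIN) {
        *ov = 1;
        return INT8_MAX;
    }
    return (int8_t)((a * b) >> 7);
}

/* Hypothetical host-side model of KHM8 on one 32-bit register value:
 * byte lane i of a is multiplied with byte lane i of b. */
uint32_t khm8_ref(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        rd |= (uint32_t)(uint8_t)q7_mul_sat(x, y, ov) << (8 * i);
    }
    return rd;
}

/* Usage: lanes 0..2 are ordinary products, lane 3 is 0x80 * 0x80 and saturates. */
void khm8_ref_example(void)
{
    int ov = 0;
    assert(khm8_ref(0x80402010u, 0x80404040u, &ov) == 0x7F201008u);
    assert(ov == 1);
}
```

On a target with the DSP extension, the same inputs would be passed to `__RV_KHM8(a, b)`; the model is only meant for comparing expected lane values.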
- __STATIC_FORCEINLINE unsigned long __RV_KHMX8 (unsigned long a, unsigned long b)
KHMX8 (SIMD Signed Saturating Crossed Q7 Multiply)
Type: SIMD
Syntax:
KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2
Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.
Description:
For the KHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2.
For the KHMX8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2.
The Q14 results are then right-shifted by 7 bits and saturated into Q7 values. The Q7 results are then written into Rd. Saturation happens only when both Q7 inputs of a multiplication are 0x80; in that case the result is saturated to 0x7F and the overflow flag OV is set.
Operations:
if (is `KHM8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
  op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x]; // Rs1 top
  op1b = Rs1.B[x]; op2b = Rs2.B[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x80 != aop | 0x80 != bop) {
    res = (aop s* bop) >> 7;
  } else {
    res = 0x7F;
    OV = 1;
  }
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
- Parameters
a – [in] unsigned long type of value stored in a
b – [in] unsigned long type of value stored in b
- Returns
value stored in unsigned long type
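The crossed pairing can be sketched in portable C as follows. This is a host-side model under stated assumptions, not the intrinsic: `q7_mul_sat_x`, `khmx8_ref` and `khmx8_ref_example` are hypothetical names, only a 32-bit register value is modeled, and OV is an `int` flag. The crossing is expressed as `i ^ 1`, which selects the other byte of the same 16-bit chunk.

```c
#include <assert.h>
#include <stdint.h>

/* One Q7 x Q7 -> Q7 lane with the 0x80 * 0x80 saturation case.
 * Assumes >> on a negative int is an arithmetic shift (true for GCC and Clang). */
static int8_t q7_mul_sat_x(int8_t a, int8_t b, int *ov)
{
    if (a == INT8_MIN && b == INT8_MIN) {
        *ov = 1;
        return INT8_MAX;
    }
    return (int8_t)((a * b) >> 7);
}

/* Hypothetical host-side model of KHMX8 on one 32-bit register value:
 * byte lane i of a is multiplied with byte lane (i ^ 1) of b, and the
 * result is stored in byte lane i (the lane of the Rs1 operand). */
uint32_t khmx8_ref(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * (i ^ 1)));
        rd |= (uint32_t)(uint8_t)q7_mul_sat_x(x, y, ov) << (8 * i);
    }
    return rd;
}

/* Usage: a.B[3] = 0x80 meets b.B[2] = 0x80, so lane 3 saturates. */
void khmx8_ref_example(void)
{
    int ov = 0;
    assert(khmx8_ref(0x80004020u, 0x00801040u, &ov) == 0x7F002004u);
    assert(ov == 1);
}
```

With the uncrossed KHM8 pairing the same inputs would not saturate, because the two 0x80 bytes sit in different lanes.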
- __STATIC_FORCEINLINE unsigned long long __RV_SMUL8 (unsigned int a, unsigned int b)
SMUL8 (SIMD Signed 8-bit Multiply)
Type: SIMD
Syntax:
SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2
Purpose:
Do signed 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
  if (is `SMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `SMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] s* op2t[x/2];
  resb[x/2] = op1b[x/2] s* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
  R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
  x = 0 and 2
* RV64:
  if (is `SMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `SMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] s* op2t[x/2];
  resb[x/2] = op1b[x/2] s* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
  Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
  x = 0 and 2
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
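As a rough illustration of the result layout, the following host-side model packs the four signed products into a 64-bit value the same way the operations above describe (the upper 32 bits correspond to the odd register or Rd.W[1], the lower 32 bits to the even register or Rd.W[0]). `smul8_ref` and `smul8_ref_example` are hypothetical names; this is a sketch for checking expected values, not the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of SMUL8: signed byte lane i of a times
 * signed byte lane i of b, stored as 16-bit lane i of the 64-bit result. */
uint64_t smul8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        int16_t p = (int16_t)(x * y);   /* magnitude at most 16384, fits in 16 bits */
        rd |= (uint64_t)(uint16_t)p << (16 * i);
    }
    return rd;
}

/* Usage: lanes hold 2*3, 127*2, (-1)*2 and (-128)*(-128). */
void smul8_ref_example(void)
{
    assert(smul8_ref(0x80FF7F02u, 0x80020203u) == 0x4000FFFE00FE0006ULL);
}
```

Unlike KHM8, no shift or saturation is applied: every product of two signed 8-bit values fits in a signed 16-bit lane.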
- __STATIC_FORCEINLINE unsigned long long __RV_SMULX8 (unsigned int a, unsigned int b)
SMULX8 (SIMD Signed Crossed 8-bit Multiply)
Type: SIMD
Syntax:
SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2
Purpose:
Do signed 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
  if (is `SMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `SMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] s* op2t[x/2];
  resb[x/2] = op1b[x/2] s* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
  R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
  x = 0 and 2
* RV64:
  if (is `SMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `SMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] s* op2t[x/2];
  resb[x/2] = op1b[x/2] s* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
  Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
  x = 0 and 2
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
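A minimal host-side sketch of the crossed signed multiply is given below, together with a helper that reads one 16-bit lane back as a signed value. The names `smulx8_ref`, `lane_s16` and `smulx8_ref_example` are hypothetical and exist only for this illustration; the model follows the operations listed above, where 16-bit lane i holds the product of Rs1 byte i and the other byte of the same 16-bit chunk of Rs2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of SMULX8: signed byte lane i of a times
 * signed byte lane (i ^ 1) of b, stored as 16-bit lane i of the result. */
uint64_t smulx8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * (i ^ 1)));
        int16_t p = (int16_t)(x * y);
        rd |= (uint64_t)(uint16_t)p << (16 * i);
    }
    return rd;
}

/* Read 16-bit lane i (0..3) of a packed result as a signed value. */
int16_t lane_s16(uint64_t r, int i)
{
    return (int16_t)(uint16_t)(r >> (16 * i));
}

/* Usage: a bytes are 1,2,3,4 (top to bottom), b bytes are 10,20,-1,5. */
void smulx8_ref_example(void)
{
    uint64_t r = smulx8_ref(0x01020304u, 0x0A14FF05u);
    assert(r == 0x00140014000FFFFCULL);
    assert(lane_s16(r, 0) == -4);   /* 4 * (-1) */
    assert(lane_s16(r, 1) == 15);   /* 3 * 5    */
}
```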
- __STATIC_FORCEINLINE unsigned long long __RV_UMUL8 (unsigned int a, unsigned int b)
UMUL8 (SIMD Unsigned 8-bit Multiply)
Type: SIMD
Syntax:
UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2
Purpose:
Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
  if (is `UMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `UMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] u* op2t[x/2];
  resb[x/2] = op1b[x/2] u* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
  R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
  x = 0 and 2
* RV64:
  if (is `UMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `UMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] u* op2t[x/2];
  resb[x/2] = op1b[x/2] u* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
  Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
  x = 0 and 2
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
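The unsigned variant differs from SMUL8 only in how each byte is interpreted, which the following host-side sketch makes visible. `umul8_ref` and `umul8_ref_example` are hypothetical names chosen for this illustration, and the code models the documented lane layout rather than calling the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of UMUL8: unsigned byte lane i of a times
 * unsigned byte lane i of b, stored as 16-bit lane i of the 64-bit result. */
uint64_t umul8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint16_t p = (uint16_t)(x * y);   /* at most 255 * 255 = 65025 */
        rd |= (uint64_t)p << (16 * i);
    }
    return rd;
}

/* Usage: lanes hold 128*2, 2*3, 16*16 and 255*255. */
void umul8_ref_example(void)
{
    assert(umul8_ref(0xFF100280u, 0xFF100302u) == 0xFE01010000060100ULL);
}
```

The byte pair 0xFF, 0xFF gives 0xFE01 here, whereas a signed multiply of the same bit patterns would give 0x0001.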
- __STATIC_FORCEINLINE unsigned long long __RV_UMULX8 (unsigned int a, unsigned int b)
UMULX8 (SIMD Unsigned Crossed 8-bit Multiply)
Type: SIMD
Syntax:
UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2
Purpose:
Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.
RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2.
The four 16-bit results are then written into Rd. Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
  if (is `UMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `UMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] u* op2t[x/2];
  resb[x/2] = op1b[x/2] u* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
  R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
  x = 0 and 2
* RV64:
  if (is `UMUL8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
  } else if (is `UMULX8`) {
    op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
    op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
  }
  rest[x/2] = op1t[x/2] u* op2t[x/2];
  resb[x/2] = op1b[x/2] u* op2b[x/2];
  t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
  Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
  Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
  x = 0 and 2
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
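One way to sanity-check the crossed unsigned pairing on a host is sketched below, including a helper that unpacks the 64-bit result into four 16-bit values. `umulx8_ref`, `unpack_u16x4` and `umulx8_ref_example` are hypothetical names used only here; the model assumes the lane layout given in the operations above and does not call the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of UMULX8: unsigned byte lane i of a times
 * unsigned byte lane (i ^ 1) of b, stored as 16-bit lane i of the result. */
uint64_t umulx8_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * (i ^ 1)));
        uint16_t p = (uint16_t)(x * y);   /* at most 65025, fits in 16 bits */
        rd |= (uint64_t)p << (16 * i);
    }
    return rd;
}

/* Split a packed result into its four 16-bit lanes, lane 0 first. */
void unpack_u16x4(uint64_t r, uint16_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint16_t)(r >> (16 * i));
    }
}

/* Usage: a bytes are 1,2,3,4 (top to bottom), b bytes are 10,20,255,5. */
void umulx8_ref_example(void)
{
    uint16_t lanes[4];
    unpack_u16x4(umulx8_ref(0x01020304u, 0x0A14FF05u), lanes);
    assert(lanes[0] == 1020);   /* 4 * 255 */
    assert(lanes[1] == 15);     /* 3 * 5   */
    assert(lanes[2] == 20);     /* 2 * 10  */
    assert(lanes[3] == 20);     /* 1 * 20  */
}
```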