SIMD 8bit Multiply Instructions

__STATIC_FORCEINLINE unsigned long __RV_KHM8(unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_KHMX8(unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long long __RV_SMUL8(unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_SMULX8(unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_UMUL8(unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_UMULX8(unsigned int a, unsigned int b)

group
NMSIS_Core_DSP_Intrinsic_SIMD_8B_MULTIPLY
SIMD 8bit Multiply Instructions.
There are 6 SIMD 8bit Multiply instructions.
Functions

__STATIC_FORCEINLINE unsigned long __RV_KHM8(unsigned long a, unsigned long b)
KHM8 (SIMD Signed Saturating Q7 Multiply)
Type: SIMD
Syntax:
KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2
Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.
Description:
For the KHM8 instruction, multiply the top 8bit Q7 content of 16bit chunks in Rs1 with the top 8bit Q7 content of 16bit chunks in Rs2. At the same time, multiply the bottom 8bit Q7 content of 16bit chunks in Rs1 with the bottom 8bit Q7 content of 16bit chunks in Rs2.
For the KHMX8 instruction, multiply the top 8bit Q7 content of 16bit chunks in Rs1 with the bottom 8bit Q7 content of 16bit chunks in Rs2. At the same time, multiply the bottom 8bit Q7 content of 16bit chunks in Rs1 with the top 8bit Q7 content of 16bit chunks in Rs2.
The Q14 results are then right-shifted by 7 bits and saturated into Q7 values. The Q7 results are then written into Rd. Saturation happens only when both Q7 inputs of a multiplication are 0x80. In that case the result is saturated to 0x7F and the overflow flag OV is set.
Operations:
if (is `KHM8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
  op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x]; // Rs1 top
  op1b = Rs1.B[x]; op2b = Rs2.B[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x80 != aop | 0x80 != bop) {
    res = (aop s* bop) >> 7;
  } else {
    res = 0x7F;
    OV = 1;
  }
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
Return
value stored in unsigned long type
Parameters
[in] a
: unsigned long type of value stored in a
[in] b
: unsigned long type of value stored in b
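
The KHM8 operation described above can be checked on a host machine with a small reference model. The following is a minimal sketch, not the NMSIS implementation: `khm8_lane`, `khm8_model`, and `khm8_example` are hypothetical names, the model covers only the RV32 case (four Q7 lanes in a 32bit word), and OV is reported through an `int *` rather than a status register. It also assumes that `>>` on a negative `int` is an arithmetic shift and that narrowing to `int8_t` wraps, as on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* One Q7 x Q7 lane: (a * b) >> 7, saturating only for 0x80 * 0x80. */
static int8_t khm8_lane(int8_t a, int8_t b, int *ov)
{
    if (a == INT8_MIN && b == INT8_MIN) {
        *ov = 1;          /* stands in for the OV flag (assumption) */
        return INT8_MAX;  /* 0x7F */
    }
    /* Assumes arithmetic right shift of negative values. */
    return (int8_t)(((int)a * (int)b) >> 7);
}

/* Hypothetical RV32 model of KHM8: byte lane i of a times byte lane i of b. */
uint32_t khm8_model(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        rd |= (uint32_t)(uint8_t)khm8_lane(x, y, ov) << (8 * i);
    }
    return rd;
}

void khm8_example(void)
{
    int ov = 0;
    /* Lanes, top to bottom: 0x80*0x80, 0x40*0x40, 0x7F*0x7F, 0x20*0x40. */
    uint32_t r = khm8_model(0x80407F20u, 0x80407F40u, &ov);
    assert(r == 0x7F207E10u);
    assert(ov == 1);
}
```

On a target with the DSP extension, the same inputs passed to `__RV_KHM8` would be expected to give the same packed result; the model is only meant for cross-checking on a host.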

__STATIC_FORCEINLINE unsigned long __RV_KHMX8(unsigned long a, unsigned long b)
KHMX8 (SIMD Signed Saturating Crossed Q7 Multiply)
Type: SIMD
Syntax:
KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2
Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.
Description:
For the KHM8 instruction, multiply the top 8bit Q7 content of 16bit chunks in Rs1 with the top 8bit Q7 content of 16bit chunks in Rs2. At the same time, multiply the bottom 8bit Q7 content of 16bit chunks in Rs1 with the bottom 8bit Q7 content of 16bit chunks in Rs2.
For the KHMX8 instruction, multiply the top 8bit Q7 content of 16bit chunks in Rs1 with the bottom 8bit Q7 content of 16bit chunks in Rs2. At the same time, multiply the bottom 8bit Q7 content of 16bit chunks in Rs1 with the top 8bit Q7 content of 16bit chunks in Rs2.
The Q14 results are then right-shifted by 7 bits and saturated into Q7 values. The Q7 results are then written into Rd. Saturation happens only when both Q7 inputs of a multiplication are 0x80. In that case the result is saturated to 0x7F and the overflow flag OV is set.
Operations:
if (is `KHM8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
  op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
  op1t = Rs1.B[x+1]; op2t = Rs2.B[x]; // Rs1 top
  op1b = Rs1.B[x]; op2b = Rs2.B[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x80 != aop | 0x80 != bop) {
    res = (aop s* bop) >> 7;
  } else {
    res = 0x7F;
    OV = 1;
  }
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
Return
value stored in unsigned long type
Parameters
[in] a
: unsigned long type of value stored in a
[in] b
: unsigned long type of value stored in b
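
The crossing in KHMX8 means that, within each 16bit chunk, result byte i pairs byte i of Rs1 with byte i ^ 1 of Rs2. The sketch below illustrates this on a host for the RV32 case only. It is an assumption-laden model, not the NMSIS code: `khmx8_lane`, `khmx8_model`, and `khmx8_example` are hypothetical names, OV is modelled as an `int`, and the code relies on arithmetic right shift of negative values and wrapping conversion to `int8_t` (true for GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* One Q7 x Q7 lane: (a * b) >> 7, saturating only for 0x80 * 0x80. */
static int8_t khmx8_lane(int8_t a, int8_t b, int *ov)
{
    if (a == INT8_MIN && b == INT8_MIN) {
        *ov = 1;          /* stands in for the OV flag (assumption) */
        return INT8_MAX;  /* 0x7F */
    }
    /* Assumes arithmetic right shift of negative values. */
    return (int8_t)(((int)a * (int)b) >> 7);
}

/* Hypothetical RV32 model of KHMX8: byte i of a times byte (i ^ 1) of b. */
uint32_t khmx8_model(uint32_t a, uint32_t b, int *ov)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * (i ^ 1)));  /* crossed lane */
        rd |= (uint32_t)(uint8_t)khmx8_lane(x, y, ov) << (8 * i);
    }
    return rd;
}

void khmx8_example(void)
{
    int ov = 0;
    /* b holds the same bytes as in a straight multiply, swapped per chunk. */
    uint32_t r = khmx8_model(0x80407F20u, 0x4080407Fu, &ov);
    assert(r == 0x7F207E10u);
    assert(ov == 1);
}
```

Swapping the two bytes of every 16bit chunk in the second operand turns a crossed multiply into a straight one, which is a convenient way to cross-check the two models against each other.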

__STATIC_FORCEINLINE unsigned long long __RV_SMUL8(unsigned int a, unsigned int b)
SMUL8 (SIMD Signed 8bit Multiply)
Type: SIMD
Syntax:
SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2
Purpose:
Do signed 8bit multiplications and generate four 16bit results simultaneously.
RV32 Description:
For the SMUL8 instruction, multiply the 8bit data elements of Rs1 with the corresponding 8bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8bit data elements of Rs1 with the second and first 8bit data elements of Rs2. At the same time, multiply the third and fourth 8bit data elements of Rs1 with the fourth and third 8bit data elements of Rs2.
The four 16bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16bit results calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL8 instruction, multiply the 8bit data elements of Rs1 with the corresponding 8bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8bit data elements of Rs1 with the second and first 8bit data elements of Rs2. At the same time, multiply the third and fourth 8bit data elements of Rs1 with the fourth and third 8bit data elements of Rs2.
The four 16bit results are then written into Rd. Rd.W[1] contains the two 16bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
Return
value stored in unsigned long long type
Parameters
[in] a
: unsigned int type of value stored in a
[in] b
: unsigned int type of value stored in b
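
The layout of the 64bit return value is easiest to see with concrete numbers. The sketch below is a host-side model of SMUL8 written from the operations listed above; it is not taken from NMSIS, and `smul8_model` and `smul8_example` are hypothetical names. It assumes that the low 32 bits of the returned value correspond to Rd.W[0] (the even register on RV32) and the high 32 bits to Rd.W[1], and that conversion to `int8_t` wraps as on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of SMUL8: 16bit lane i = a.B[i] s* b.B[i]. */
uint64_t smul8_model(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        int16_t p = (int16_t)((int)x * (int)y);  /* |p| <= 16384, always fits */
        rd |= (uint64_t)(uint16_t)p << (16 * i);
    }
    return rd;
}

void smul8_example(void)
{
    /* Byte lanes, top to bottom: -128*-128, -1*2, 127*2, 2*3. */
    uint64_t r = smul8_model(0x80FF7F02u, 0x80020203u);
    assert(r == 0x4000FFFE00FE0006ull);
    /* Low word models Rd.W[0] (bottom half of Rs1), high word Rd.W[1]. */
    assert((uint32_t)r == 0x00FE0006u);
    assert((uint32_t)(r >> 32) == 0x4000FFFEu);
}
```

Because each product is widened to 16 bits, no saturation is needed: even -128 * -128 = 16384 fits in a signed 16bit lane.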

__STATIC_FORCEINLINE unsigned long long __RV_SMULX8(unsigned int a, unsigned int b)
SMULX8 (SIMD Signed Crossed 8bit Multiply)
Type: SIMD
Syntax:
SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2
Purpose:
Do signed 8bit multiplications and generate four 16bit results simultaneously.
RV32 Description:
For the SMUL8 instruction, multiply the 8bit data elements of Rs1 with the corresponding 8bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8bit data elements of Rs1 with the second and first 8bit data elements of Rs2. At the same time, multiply the third and fourth 8bit data elements of Rs1 with the fourth and third 8bit data elements of Rs2.
The four 16bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16bit results calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL8 instruction, multiply the 8bit data elements of Rs1 with the corresponding 8bit data elements of Rs2.
For the SMULX8 instruction, multiply the first and second 8bit data elements of Rs1 with the second and first 8bit data elements of Rs2. At the same time, multiply the third and fourth 8bit data elements of Rs1 with the fourth and third 8bit data elements of Rs2.
The four 16bit results are then written into Rd. Rd.W[1] contains the two 16bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `SMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
Return
value stored in unsigned long long type
Parameters
[in] a
: unsigned int type of value stored in a
[in] b
: unsigned int type of value stored in b
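
For SMULX8, the "first with second" wording above amounts to pairing byte i of Rs1 with byte i ^ 1 of Rs2. The following host-side sketch works through one set of inputs under that reading. `smulx8_model` and `smulx8_example` are hypothetical names introduced for illustration, the 64bit packing (low word for Rd.W[0], high word for Rd.W[1]) is an assumption based on the operations above, and the conversion to `int8_t` is assumed to wrap as on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of SMULX8: 16bit lane i = a.B[i] s* b.B[i ^ 1]. */
uint64_t smulx8_model(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * (i ^ 1)));  /* crossed lane */
        int16_t p = (int16_t)((int)x * (int)y);  /* |p| <= 16384, always fits */
        rd |= (uint64_t)(uint16_t)p << (16 * i);
    }
    return rd;
}

void smulx8_example(void)
{
    /* a bytes (top to bottom): -128, -1, 127, 2
       b bytes (top to bottom):    2, -128, 3, 2
       products: -128*-128, -1*2, 127*2, 2*3 */
    uint64_t r = smulx8_model(0x80FF7F02u, 0x02800302u);
    assert(r == 0x4000FFFE00FE0006ull);
}
```

The crossed form is useful when the two operands store paired values in opposite byte order, since it avoids a separate byte swap before multiplying.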

__STATIC_FORCEINLINE unsigned long long __RV_UMUL8(unsigned int a, unsigned int b)
UMUL8 (SIMD Unsigned 8bit Multiply)
Type: SIMD
Syntax:
UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2
Purpose:
Do unsigned 8bit multiplications and generate four 16bit results simultaneously.
RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8bit data elements of Rs1 with the corresponding unsigned 8bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8bit data elements of Rs1 with the second and first unsigned 8bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8bit data elements of Rs1 with the fourth and third unsigned 8bit data elements of Rs2.
The four 16bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16bit results calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8bit data elements of Rs1 with the corresponding unsigned 8bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8bit data elements of Rs1 with the second and first unsigned 8bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8bit data elements of Rs1 with the fourth and third unsigned 8bit data elements of Rs2.
The four 16bit results are then written into Rd. Rd.W[1] contains the two 16bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
Return
value stored in unsigned long long type
Parameters
[in] a
: unsigned int type of value stored in a
[in] b
: unsigned int type of value stored in b
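
A short host-side model makes the unsigned widening concrete. The sketch below follows the UMUL8 operations listed above but is not the NMSIS implementation: `umul8_model`, `umul8_lane16`, and `umul8_example` are hypothetical names, and the 64bit packing (low word for Rd.W[0], high word for Rd.W[1]) is an assumption derived from those operations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of UMUL8: 16bit lane i = a.B[i] u* b.B[i]. */
uint64_t umul8_model(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        unsigned x = (uint8_t)(a >> (8 * i));
        unsigned y = (uint8_t)(b >> (8 * i));
        rd |= (uint64_t)(uint16_t)(x * y) << (16 * i);  /* max 255*255 = 65025 */
    }
    return rd;
}

/* Hypothetical helper: extract 16bit lane i (0..3) from the packed result. */
uint16_t umul8_lane16(uint64_t r, int i)
{
    return (uint16_t)(r >> (16 * i));
}

void umul8_example(void)
{
    /* Byte lanes, top to bottom: 255*255, 128*2, 16*3, 10*4. */
    uint64_t r = umul8_model(0xFF80100Au, 0xFF020304u);
    assert(r == 0xFE01010000300028ull);
    assert(umul8_lane16(r, 3) == 65025u);
    assert(umul8_lane16(r, 0) == 40u);
}
```

The largest possible product, 255 * 255 = 65025, still fits in an unsigned 16bit lane, so the instruction never overflows.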

__STATIC_FORCEINLINE unsigned long long __RV_UMULX8(unsigned int a, unsigned int b)
UMULX8 (SIMD Unsigned Crossed 8bit Multiply)
Type: SIMD
Syntax:
UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2
Purpose:
Do unsigned 8bit multiplications and generate four 16bit results simultaneously.
RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8bit data elements of Rs1 with the corresponding unsigned 8bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8bit data elements of Rs1 with the second and first unsigned 8bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8bit data elements of Rs1 with the fourth and third unsigned 8bit data elements of Rs2.
The four 16bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16bit results calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8bit data elements of Rs1 with the corresponding unsigned 8bit data elements of Rs2.
For the UMULX8 instruction, multiply the first and second unsigned 8bit data elements of Rs1 with the second and first unsigned 8bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8bit data elements of Rs1 with the fourth and third unsigned 8bit data elements of Rs2.
The four 16bit results are then written into Rd. Rd.W[1] contains the two 16bit results calculated from the top part of Rs1 and Rd.W[0] contains the two 16bit results calculated from the bottom part of Rs1.
Operations:
* RV32:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `UMUL8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
  op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
  op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
Return
value stored in unsigned long long type
Parameters
[in] a
: unsigned int type of value stored in a
[in] b
: unsigned int type of value stored in b
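
As with the signed crossed form, UMULX8 can be read as pairing byte i of Rs1 with byte i ^ 1 of Rs2, now with unsigned operands. The sketch below is a host-side illustration under that reading, not code from NMSIS. `umulx8_model` and `umulx8_example` are hypothetical names, and the packing of the four 16bit lanes into the 64bit value (low word for Rd.W[0], high word for Rd.W[1]) is an assumption based on the operations above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of UMULX8: 16bit lane i = a.B[i] u* b.B[i ^ 1]. */
uint64_t umulx8_model(uint32_t a, uint32_t b)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        unsigned x = (uint8_t)(a >> (8 * i));
        unsigned y = (uint8_t)(b >> (8 * (i ^ 1)));  /* crossed lane */
        rd |= (uint64_t)(uint16_t)(x * y) << (16 * i);  /* max 255*255 = 65025 */
    }
    return rd;
}

void umulx8_example(void)
{
    /* a bytes (top to bottom): 255, 128, 16, 10
       b bytes (top to bottom):   2, 255,  4,  3
       products: 255*255, 128*2, 16*3, 10*4 */
    uint64_t r = umulx8_model(0xFF80100Au, 0x02FF0403u);
    assert(r == 0xFE01010000300028ull);
}
```

Calling `umulx8_example()` from a test harness on the host checks the lane pairing before the same operands are tried with `__RV_UMULX8` on a target.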
