SIMD 16bit Multiply Instructions

__STATIC_FORCEINLINE unsigned long __RV_KHM16(unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long __RV_KHMX16(unsigned long a, unsigned long b)

__STATIC_FORCEINLINE unsigned long long __RV_SMUL16(unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_SMULX16(unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_UMUL16(unsigned int a, unsigned int b)

__STATIC_FORCEINLINE unsigned long long __RV_UMULX16(unsigned int a, unsigned int b)

group
NMSIS_Core_DSP_Intrinsic_SIMD_16B_MULTIPLY
SIMD 16bit Multiply Instructions.
There are 6 SIMD 16bit Multiply instructions.
Functions

__STATIC_FORCEINLINE unsigned long __RV_KHM16(unsigned long a, unsigned long b)
KHM16 (SIMD Signed Saturating Q15 Multiply)
Type: SIMD
Syntax:
KHM16 Rd, Rs1, Rs2
KHMX16 Rd, Rs1, Rs2
Purpose:
Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.
Description:
For the KHM16 instruction, multiply the top 16bit Q15 content of 32bit chunks in Rs1 with the top 16bit Q15 content of 32bit chunks in Rs2. At the same time, multiply the bottom 16bit Q15 content of 32bit chunks in Rs1 with the bottom 16bit Q15 content of 32bit chunks in Rs2.
For the KHMX16 instruction, multiply the top 16bit Q15 content of 32bit chunks in Rs1 with the bottom 16bit Q15 content of 32bit chunks in Rs2. At the same time, multiply the bottom 16bit Q15 content of 32bit chunks in Rs1 with the top 16bit Q15 content of 32bit chunks in Rs2.
The Q30 results are then right-shifted by 15 bits and saturated into Q15 values, which are written into Rd. Saturation happens only when both Q15 inputs of a multiplication are 0x8000. In that case the result is saturated to 0x7FFF and the overflow flag OV is set.
Operations:
if (is `KHM16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
  op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
  op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop || 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res = 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
 Return
value stored in unsigned long type
 Parameters
[in] a: unsigned long type of value stored in a
[in] b: unsigned long type of value stored in b
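The Operations text for KHM16 can be mirrored in portable C to check expected values on a host machine. The following is only a sketch: `q15_mul_sat`, `khm16_ref`, and `khm16_ref_example` are hypothetical names, not NMSIS functions, the overflow flag is modeled as an `int` out-parameter, and only the RV32 case (one 32bit chunk, x = 0) is covered. The real `__RV_KHM16` intrinsic needs a RISC-V target with the DSP extension and the NMSIS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an NMSIS function): one saturating Q15 x Q15
 * multiply. Sets *ov to 1 only when both inputs are 0x8000 (-1.0 in Q15). */
static int16_t q15_mul_sat(int16_t a, int16_t b, int *ov)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        *ov = 1;
        return INT16_MAX; /* 0x7FFF */
    }
    /* Q30 product reduced to Q15. Assumes >> on a negative int32_t is an
     * arithmetic shift, which holds on mainstream compilers. */
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Portable model of KHM16 on one 32bit chunk (the RV32 case, x = 0). */
uint32_t khm16_ref(uint32_t a, uint32_t b, int *ov)
{
    int16_t a_top = (int16_t)(uint16_t)(a >> 16), a_bot = (int16_t)(uint16_t)a;
    int16_t b_top = (int16_t)(uint16_t)(b >> 16), b_bot = (int16_t)(uint16_t)b;
    uint16_t rest = (uint16_t)q15_mul_sat(a_top, b_top, ov); /* top x top */
    uint16_t resb = (uint16_t)q15_mul_sat(a_bot, b_bot, ov); /* bottom x bottom */
    return ((uint32_t)rest << 16) | resb;
}

/* Usage: 0.5 * 0.5 = 0.25 (0x2000) in the top half,
 * 0.25 * 0.5 = 0.125 (0x1000) in the bottom half. */
void khm16_ref_example(void)
{
    int ov = 0;
    assert(khm16_ref(0x40002000u, 0x40004000u, &ov) == 0x20001000u);
    assert(ov == 0);
}
```

Only the 0x8000 x 0x8000 case needs special handling, because every other Q15 product fits in Q15 after the 15 bit shift.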

__STATIC_FORCEINLINE unsigned long __RV_KHMX16(unsigned long a, unsigned long b)
KHMX16 (SIMD Signed Saturating Crossed Q15 Multiply)
Type: SIMD
Syntax:
KHM16 Rd, Rs1, Rs2
KHMX16 Rd, Rs1, Rs2
Purpose:
Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.
Description:
For the KHM16 instruction, multiply the top 16bit Q15 content of 32bit chunks in Rs1 with the top 16bit Q15 content of 32bit chunks in Rs2. At the same time, multiply the bottom 16bit Q15 content of 32bit chunks in Rs1 with the bottom 16bit Q15 content of 32bit chunks in Rs2.
For the KHMX16 instruction, multiply the top 16bit Q15 content of 32bit chunks in Rs1 with the bottom 16bit Q15 content of 32bit chunks in Rs2. At the same time, multiply the bottom 16bit Q15 content of 32bit chunks in Rs1 with the top 16bit Q15 content of 32bit chunks in Rs2.
The Q30 results are then right-shifted by 15 bits and saturated into Q15 values, which are written into Rd. Saturation happens only when both Q15 inputs of a multiplication are 0x8000. In that case the result is saturated to 0x7FFF and the overflow flag OV is set.
Operations:
if (is `KHM16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
  op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
  op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop || 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res = 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
 Return
value stored in unsigned long type
 Parameters
[in] a: unsigned long type of value stored in a
[in] b: unsigned long type of value stored in b
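To illustrate the crossed pairing and the RV64 iteration over x = 0 and x = 2, here is a rough host-side model of KHMX16 on a 64bit register value. The names `q15_mul_sat_u`, `khmx16_ref64`, and `khmx16_ref64_example` are hypothetical, the OV flag is modeled as an `int` out-parameter, and the code assumes the usual two's complement conversions and arithmetic right shift. It is a sketch of the Operations text, not the NMSIS implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturating Q15 x Q15 multiply on raw halfwords. */
static uint16_t q15_mul_sat_u(uint16_t a, uint16_t b, int *ov)
{
    if (a == 0x8000u && b == 0x8000u) {
        *ov = 1;
        return 0x7FFFu;
    }
    int32_t prod = (int32_t)(int16_t)a * (int32_t)(int16_t)b; /* Q30 */
    return (uint16_t)(prod >> 15); /* assumes arithmetic right shift */
}

/* Portable model of KHMX16 for the RV64 case: the loop visits both 32bit
 * chunks, i.e. halfword index x = 0 and x = 2. */
uint64_t khmx16_ref64(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int x = 0; x < 4; x += 2) {
        uint16_t a_bot = (uint16_t)(a >> (16 * x));
        uint16_t a_top = (uint16_t)(a >> (16 * (x + 1)));
        uint16_t b_bot = (uint16_t)(b >> (16 * x));
        uint16_t b_top = (uint16_t)(b >> (16 * (x + 1)));
        uint16_t rest = q15_mul_sat_u(a_top, b_bot, ov); /* Rs1 top x Rs2 bottom */
        uint16_t resb = q15_mul_sat_u(a_bot, b_top, ov); /* Rs1 bottom x Rs2 top */
        uint32_t word = ((uint32_t)rest << 16) | resb;
        rd |= (uint64_t)word << (32 * (x / 2));          /* Rd.W[x/2] */
    }
    return rd;
}

/* Usage: the upper chunk pairs 0x8000 with 0x8000 and saturates;
 * the lower chunk computes 0.5 * 0.75 and 0.25 * 0.5. */
void khmx16_ref64_example(void)
{
    int ov = 0;
    uint64_t rd = khmx16_ref64(0x8000000040002000ULL, 0x0000800040006000ULL, &ov);
    assert(rd == 0x7FFF000030001000ULL);
    assert(ov == 1);
}
```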

__STATIC_FORCEINLINE unsigned long long __RV_SMUL16(unsigned int a, unsigned int b)
SMUL16 (SIMD Signed 16bit Multiply)
Type: SIMD
Syntax:
SMUL16 Rd, Rs1, Rs2
SMULX16 Rd, Rs1, Rs2
Purpose:
Do signed 16bit multiplications and generate two 32bit results simultaneously.
RV32 Description:
For the SMUL16 instruction, multiply the top 16bit Q15 content of Rs1 with the top 16bit Q15 content of Rs2. At the same time, multiply the bottom 16bit Q15 content of Rs1 with the bottom 16bit Q15 content of Rs2.
For the SMULX16 instruction, multiply the top 16bit Q15 content of Rs1 with the bottom 16bit Q15 content of Rs2. At the same time, multiply the bottom 16bit Q15 content of Rs1 with the top 16bit Q15 content of Rs2.
The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32bit result calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL16 instruction, multiply the top 16bit Q15 content of the lower 32bit word in Rs1 with the top 16bit Q15 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit Q15 content of the lower 32bit word in Rs1 with the bottom 16bit Q15 content of the lower 32bit word in Rs2.
For the SMULX16 instruction, multiply the top 16bit Q15 content of the lower 32bit word in Rs1 with the bottom 16bit Q15 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit Q15 content of the lower 32bit word in Rs1 with the top 16bit Q15 content of the lower 32bit word in Rs2.
The two 32bit Q30 results are then written into Rd. The result calculated from the top 16bit of the lower 32bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16bit of the lower 32bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
 Return
value stored in unsigned long long type
 Parameters
[in] a: unsigned int type of value stored in a
[in] b: unsigned int type of value stored in b
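As a host-side illustration of how the two 32bit products are packed into the 64bit return value, the sketch below models SMUL16 in portable C. The names `smul16_ref`, `mul16_result_top`, `mul16_result_bottom`, and `smul16_ref_example` are hypothetical and not part of NMSIS. The packing (top product in the upper word, bottom product in the lower word) follows the Rd.W[1]/Rd.W[0] description above.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of SMUL16: two signed 16x16 -> 32 products, packed as
 * Rd.W[1] = top x top and Rd.W[0] = bottom x bottom. */
uint64_t smul16_ref(uint32_t a, uint32_t b)
{
    int32_t rest = (int32_t)(int16_t)(uint16_t)(a >> 16) * (int16_t)(uint16_t)(b >> 16);
    int32_t resb = (int32_t)(int16_t)(uint16_t)a * (int16_t)(uint16_t)b;
    return ((uint64_t)(uint32_t)rest << 32) | (uint32_t)resb;
}

/* Hypothetical accessors for the two packed 32bit results. */
int32_t mul16_result_top(uint64_t rd)    { return (int32_t)(uint32_t)(rd >> 32); }
int32_t mul16_result_bottom(uint64_t rd) { return (int32_t)(uint32_t)rd; }

/* Usage: halfwords read as plain integers, top = -2 * 5, bottom = 3 * 4. */
void smul16_ref_example(void)
{
    uint64_t rd = smul16_ref(0xFFFE0003u, 0x00050004u);
    assert(mul16_result_top(rd) == -10);
    assert(mul16_result_bottom(rd) == 12);
}
```

Unlike KHM16, no saturation is involved here: even 0x8000 x 0x8000 yields 0x40000000, which fits in a signed 32bit result.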

__STATIC_FORCEINLINE unsigned long long __RV_SMULX16(unsigned int a, unsigned int b)
SMULX16 (SIMD Signed Crossed 16bit Multiply)
Type: SIMD
Syntax:
SMUL16 Rd, Rs1, Rs2
SMULX16 Rd, Rs1, Rs2
Purpose:
Do signed 16bit multiplications and generate two 32bit results simultaneously.
RV32 Description:
For the SMUL16 instruction, multiply the top 16bit Q15 content of Rs1 with the top 16bit Q15 content of Rs2. At the same time, multiply the bottom 16bit Q15 content of Rs1 with the bottom 16bit Q15 content of Rs2.
For the SMULX16 instruction, multiply the top 16bit Q15 content of Rs1 with the bottom 16bit Q15 content of Rs2. At the same time, multiply the bottom 16bit Q15 content of Rs1 with the top 16bit Q15 content of Rs2.
The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32bit result calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL16 instruction, multiply the top 16bit Q15 content of the lower 32bit word in Rs1 with the top 16bit Q15 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit Q15 content of the lower 32bit word in Rs1 with the bottom 16bit Q15 content of the lower 32bit word in Rs2.
For the SMULX16 instruction, multiply the top 16bit Q15 content of the lower 32bit word in Rs1 with the bottom 16bit Q15 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit Q15 content of the lower 32bit word in Rs1 with the top 16bit Q15 content of the lower 32bit word in Rs2.
The two 32bit Q30 results are then written into Rd. The result calculated from the top 16bit of the lower 32bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16bit of the lower 32bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
 Return
value stored in unsigned long long type
 Parameters
[in] a: unsigned int type of value stored in a
[in] b: unsigned int type of value stored in b
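One place where crossed products appear naturally is the imaginary part of a complex multiplication, which is the sum of the two cross terms. The sketch below models SMULX16 in portable C and uses it that way. Everything here is an assumption for illustration: `smulx16_ref`, `complex_imag_ref`, and `smulx16_ref_example` are hypothetical names, and the layout (real part in the bottom halfword, imaginary part in the top halfword) is chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of SMULX16: Rd.W[1] = Rs1 top x Rs2 bottom,
 * Rd.W[0] = Rs1 bottom x Rs2 top, both signed. */
uint64_t smulx16_ref(uint32_t a, uint32_t b)
{
    int32_t rest = (int32_t)(int16_t)(uint16_t)(a >> 16) * (int16_t)(uint16_t)b;
    int32_t resb = (int32_t)(int16_t)(uint16_t)a * (int16_t)(uint16_t)(b >> 16);
    return ((uint64_t)(uint32_t)rest << 32) | (uint32_t)resb;
}

/* Imaginary part of (a.re + j*a.im) * (b.re + j*b.im), with the assumed
 * layout: real part in the bottom halfword, imaginary part in the top. */
int64_t complex_imag_ref(uint32_t a, uint32_t b)
{
    uint64_t rd = smulx16_ref(a, b);
    int32_t im_re = (int32_t)(uint32_t)(rd >> 32); /* a.im * b.re */
    int32_t re_im = (int32_t)(uint32_t)rd;         /* a.re * b.im */
    return (int64_t)im_re + re_im;                 /* 64bit sum cannot overflow */
}

/* Usage: (3 - 2j) * (4 + 5j) = 22 + 7j, so the imaginary part is 7. */
void smulx16_ref_example(void)
{
    assert(complex_imag_ref(0xFFFE0003u, 0x00050004u) == 7);
}
```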

__STATIC_FORCEINLINE unsigned long long __RV_UMUL16(unsigned int a, unsigned int b)
UMUL16 (SIMD Unsigned 16bit Multiply)
Type: SIMD
Syntax:
UMUL16 Rd, Rs1, Rs2
UMULX16 Rd, Rs1, Rs2
Purpose:
Do unsigned 16bit multiplications and generate two 32bit results simultaneously.
RV32 Description:
For the UMUL16 instruction, multiply the top 16bit U16 content of Rs1 with the top 16bit U16 content of Rs2. At the same time, multiply the bottom 16bit U16 content of Rs1 with the bottom 16bit U16 content of Rs2.
For the UMULX16 instruction, multiply the top 16bit U16 content of Rs1 with the bottom 16bit U16 content of Rs2. At the same time, multiply the bottom 16bit U16 content of Rs1 with the top 16bit U16 content of Rs2.
The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32bit result calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL16 instruction, multiply the top 16bit U16 content of the lower 32bit word in Rs1 with the top 16bit U16 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit U16 content of the lower 32bit word in Rs1 with the bottom 16bit U16 content of the lower 32bit word in Rs2.
For the UMULX16 instruction, multiply the top 16bit U16 content of the lower 32bit word in Rs1 with the bottom 16bit U16 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit U16 content of the lower 32bit word in Rs1 with the top 16bit U16 content of the lower 32bit word in Rs2.
The two 32bit U32 results are then written into Rd. The result calculated from the top 16bit of the lower 32bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16bit of the lower 32bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
 Return
value stored in unsigned long long type
 Parameters
[in] a: unsigned int type of value stored in a
[in] b: unsigned int type of value stored in b
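The unsigned variant can be modeled on a host in the same spirit. In this sketch `umul16_ref` and `umul16_ref_example` are hypothetical names rather than NMSIS functions, and the packing follows the Rd.W[1]/Rd.W[0] description above. The widening casts matter: in C, two `uint16_t` operands would be promoted to `int`, and 0xFFFF * 0xFFFF does not fit in a 32bit `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of UMUL16: two unsigned 16x16 -> 32 products, packed as
 * Rd.W[1] = top x top and Rd.W[0] = bottom x bottom. */
uint64_t umul16_ref(uint32_t a, uint32_t b)
{
    /* Operands stay uint32_t so the multiplication is done in unsigned
     * 32bit arithmetic and cannot overflow (max 0xFFFF * 0xFFFF). */
    uint32_t rest = (uint32_t)(a >> 16) * (uint32_t)(b >> 16);
    uint32_t resb = (uint32_t)(a & 0xFFFFu) * (uint32_t)(b & 0xFFFFu);
    return ((uint64_t)rest << 32) | resb;
}

/* Usage: 0xFFFF is 65535 here, so the top product is 0xFFFE0001. A signed
 * multiply would read the same halfword as -1 instead. */
void umul16_ref_example(void)
{
    assert(umul16_ref(0xFFFF0002u, 0xFFFF0003u) == 0xFFFE000100000006ULL);
}
```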

__STATIC_FORCEINLINE unsigned long long __RV_UMULX16(unsigned int a, unsigned int b)
UMULX16 (SIMD Unsigned Crossed 16bit Multiply)
Type: SIMD
Syntax:
UMUL16 Rd, Rs1, Rs2
UMULX16 Rd, Rs1, Rs2
Purpose:
Do unsigned 16bit multiplications and generate two 32bit results simultaneously.
RV32 Description:
For the UMUL16 instruction, multiply the top 16bit U16 content of Rs1 with the top 16bit U16 content of Rs2. At the same time, multiply the bottom 16bit U16 content of Rs1 with the bottom 16bit U16 content of Rs2.
For the UMULX16 instruction, multiply the top 16bit U16 content of Rs1 with the bottom 16bit U16 content of Rs2. At the same time, multiply the bottom 16bit U16 content of Rs1 with the top 16bit U16 content of Rs2.
The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32bit result calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL16 instruction, multiply the top 16bit U16 content of the lower 32bit word in Rs1 with the top 16bit U16 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit U16 content of the lower 32bit word in Rs1 with the bottom 16bit U16 content of the lower 32bit word in Rs2.
For the UMULX16 instruction, multiply the top 16bit U16 content of the lower 32bit word in Rs1 with the bottom 16bit U16 content of the lower 32bit word in Rs2. At the same time, multiply the bottom 16bit U16 content of the lower 32bit word in Rs1 with the top 16bit U16 content of the lower 32bit word in Rs2.
The two 32bit U32 results are then written into Rd. The result calculated from the top 16bit of the lower 32bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16bit of the lower 32bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
 Return
value stored in unsigned long long type
 Parameters
[in] a: unsigned int type of value stored in a
[in] b: unsigned int type of value stored in b
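The crossed unsigned products are the middle partial products of a wider multiplication, which gives a convenient way to sanity-check a model of UMULX16. The sketch below is illustrative only: `umul16_model`, `umulx16_ref`, `mul32x32_ref`, and `umulx16_ref_example` are hypothetical names, not NMSIS functions, and the 32x32 composition is just one possible use, not something the instruction description prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the straight variant (UMUL16), needed for the outer partial products. */
static uint64_t umul16_model(uint32_t a, uint32_t b)
{
    uint32_t rest = (a >> 16) * (b >> 16);         /* top x top */
    uint32_t resb = (a & 0xFFFFu) * (b & 0xFFFFu); /* bottom x bottom */
    return ((uint64_t)rest << 32) | resb;
}

/* Portable model of UMULX16: Rd.W[1] = Rs1 top x Rs2 bottom,
 * Rd.W[0] = Rs1 bottom x Rs2 top, both unsigned. */
uint64_t umulx16_ref(uint32_t a, uint32_t b)
{
    uint32_t rest = (a >> 16) * (b & 0xFFFFu);
    uint32_t resb = (a & 0xFFFFu) * (b >> 16);
    return ((uint64_t)rest << 32) | resb;
}

/* 32x32 -> 64 unsigned product assembled from the four 16x16 products:
 * a*b = hh*2^32 + (hl + lh)*2^16 + ll. */
uint64_t mul32x32_ref(uint32_t a, uint32_t b)
{
    uint64_t straight = umul16_model(a, b);
    uint64_t crossed = umulx16_ref(a, b);
    uint64_t hh = straight >> 32, ll = straight & 0xFFFFFFFFu;
    uint64_t hl = crossed >> 32, lh = crossed & 0xFFFFFFFFu;
    return (hh << 32) + ((hl + lh) << 16) + ll;
}

/* Usage: the assembled product matches a direct 64bit multiplication. */
void umulx16_ref_example(void)
{
    uint32_t a = 0x12345678u, b = 0x9ABCDEF0u;
    assert(mul32x32_ref(a, b) == (uint64_t)a * b);
}
```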
