SIMD 16-bit Multiply Instructions
- __STATIC_FORCEINLINE unsigned long __RV_KHM16 (unsigned long a, unsigned long b)
- __STATIC_FORCEINLINE unsigned long __RV_KHMX16 (unsigned long a, unsigned long b)
- __STATIC_FORCEINLINE unsigned long long __RV_SMUL16 (unsigned int a, unsigned int b)
- __STATIC_FORCEINLINE unsigned long long __RV_SMULX16 (unsigned int a, unsigned int b)
- __STATIC_FORCEINLINE unsigned long long __RV_UMUL16 (unsigned int a, unsigned int b)
- __STATIC_FORCEINLINE unsigned long long __RV_UMULX16 (unsigned int a, unsigned int b)
- group NMSIS_Core_DSP_Intrinsic_SIMD_16B_MULTIPLY
SIMD 16-bit Multiply Instructions.
There are 6 SIMD 16-bit multiply instructions.
Functions
- __STATIC_FORCEINLINE unsigned long __RV_KHM16 (unsigned long a, unsigned long b)
KHM16 (SIMD Signed Saturating Q15 Multiply)
Type: SIMD
Syntax:
KHM16 Rd, Rs1, Rs2
KHMX16 Rd, Rs1, Rs2
Purpose:
Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.
Description:
For the KHM16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2.
For the KHMX16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2.
The Q30 results are then right-shifted by 15 bits and saturated into Q15 values, and the Q15 results are written into Rd. Saturation happens only when both Q15 inputs of a multiplication are 0x8000; in that case the result is saturated to 0x7FFF and the overflow flag OV is set.
Operations:
if (is `KHM16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
  op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
  op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop | 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res = 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
- Parameters
a – [in] unsigned long type of value stored in a
b – [in] unsigned long type of value stored in b
- Returns
value stored in unsigned long type
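The KHM16 operations can be mirrored in portable C to check expected results on a host machine. The following is a minimal sketch, not the NMSIS implementation: `q15_mul_sat`, `khm16_model`, and `khm16_model_example` are hypothetical names, the model covers a single 32-bit chunk (the RV32 case, x=0), and the OV flag is modeled as an `int` out-parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one Q15 x Q15 -> Q15 multiply with the saturation
 * rule from the Operations pseudocode. Assumes >> on a negative int32_t is
 * an arithmetic shift and that casts to int16_t wrap, as on common compilers. */
static int16_t q15_mul_sat(int16_t a, int16_t b, int *ov)
{
    if (a == INT16_MIN && b == INT16_MIN) {   /* both inputs are 0x8000 */
        *ov = 1;
        return INT16_MAX;                     /* saturate to 0x7FFF */
    }
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Hypothetical model of KHM16 on one 32-bit chunk:
 * top x top goes to the top half, bottom x bottom to the bottom half. */
uint32_t khm16_model(uint32_t a, uint32_t b, int *ov)
{
    int16_t rest = q15_mul_sat((int16_t)(a >> 16), (int16_t)(b >> 16), ov);
    int16_t resb = q15_mul_sat((int16_t)(a & 0xFFFFu), (int16_t)(b & 0xFFFFu), ov);
    return ((uint32_t)(uint16_t)rest << 16) | (uint16_t)resb;
}

/* Usage: 0.5 * 0.5 = 0.25 in both halves, then a saturating top half. */
void khm16_model_example(void)
{
    int ov = 0;
    assert(khm16_model(0x40004000u, 0x40004000u, &ov) == 0x20002000u);
    assert(ov == 0);
    assert(khm16_model(0x80008000u, 0x80004000u, &ov) == 0x7FFFC000u);
    assert(ov == 1);
}
```

In the second call only the top pair is 0x8000 x 0x8000, so only that half saturates; the bottom half (-1.0 x 0.5) is an ordinary product.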
- __STATIC_FORCEINLINE unsigned long __RV_KHMX16 (unsigned long a, unsigned long b)
KHMX16 (SIMD Signed Saturating Crossed Q15 Multiply)
Type: SIMD
Syntax:
KHM16 Rd, Rs1, Rs2
KHMX16 Rd, Rs1, Rs2
Purpose:
Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.
Description:
For the KHM16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2.
For the KHMX16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2.
The Q30 results are then right-shifted by 15 bits and saturated into Q15 values, and the Q15 results are written into Rd. Saturation happens only when both Q15 inputs of a multiplication are 0x8000; in that case the result is saturated to 0x7FFF and the overflow flag OV is set.
Operations:
if (is `KHM16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
  op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
  op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop | 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res = 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
- Parameters
a – [in] unsigned long type of value stored in a
b – [in] unsigned long type of value stored in b
- Returns
value stored in unsigned long type
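To illustrate the crossed pairing and the RV64 case (x=0,2), the sketch below models KHMX16 on a 64-bit value by processing its two 32-bit chunks independently. It is an assumption-laden host-side model, not the NMSIS implementation: `q15_mul_sat`, `khmx16_model_rv64`, and `khmx16_model_rv64_example` are hypothetical names, and OV is modeled as an `int` out-parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturating Q15 x Q15 -> Q15 multiply. Assumes >> on a
 * negative int32_t is an arithmetic shift and that casts to int16_t wrap. */
static int16_t q15_mul_sat(int16_t a, int16_t b, int *ov)
{
    if (a == INT16_MIN && b == INT16_MIN) {   /* both inputs are 0x8000 */
        *ov = 1;
        return INT16_MAX;
    }
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Hypothetical model of KHMX16 for RV64: in each 32-bit chunk,
 * Rs1 top x Rs2 bottom -> top half, Rs1 bottom x Rs2 top -> bottom half. */
uint64_t khmx16_model_rv64(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int chunk = 0; chunk < 2; chunk++) {
        uint32_t wa = (uint32_t)(a >> (32 * chunk));
        uint32_t wb = (uint32_t)(b >> (32 * chunk));
        int16_t rest = q15_mul_sat((int16_t)(wa >> 16), (int16_t)(wb & 0xFFFFu), ov);
        int16_t resb = q15_mul_sat((int16_t)(wa & 0xFFFFu), (int16_t)(wb >> 16), ov);
        uint32_t w = ((uint32_t)(uint16_t)rest << 16) | (uint16_t)resb;
        rd |= (uint64_t)w << (32 * chunk);
    }
    return rd;
}

/* Usage: upper chunk holds ordinary products, lower chunk saturates once. */
void khmx16_model_rv64_example(void)
{
    int ov = 0;
    uint64_t rd = khmx16_model_rv64(0x4000200080000000ull,
                                    0x2000400000008000ull, &ov);
    assert(rd == 0x200008007FFF0000ull);
    assert(ov == 1);
}
```

In the upper chunk, 0.5 (Rs1 top) meets 0.5 (Rs2 bottom) and 0.25 (Rs1 bottom) meets 0.25 (Rs2 top), giving 0x2000 and 0x0800; the uncrossed KHM16 pairing of the same chunk would give 0x1000 in both halves.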
- __STATIC_FORCEINLINE unsigned long long __RV_SMUL16 (unsigned int a, unsigned int b)
SMUL16 (SIMD Signed 16-bit Multiply)
Type: SIMD
Syntax:
SMUL16 Rd, Rs1, Rs2
SMULX16 Rd, Rs1, Rs2
Purpose:
Do signed 16-bit multiplications and generate two 32-bit results simultaneously.
RV32 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2.
For the SMULX16 instruction, multiply the top 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2.
The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers: the pair consists of registers 2d and 2d+1. The odd register 2d+1 contains the 32-bit result calculated from the top part of Rs1, and the even register 2d contains the 32-bit result calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2.
For the SMULX16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2.
The two 32-bit Q30 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
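As a rough host-side illustration of the SMUL16 operations, the sketch below packs the two signed 32-bit products into a 64-bit value, with the top product in the upper word (Rd.W[1], or the odd register on RV32) and the bottom product in the lower word. `smul16_model` and `smul16_model_example` are hypothetical names; this is not the NMSIS implementation, and it assumes that casts to `int16_t` and `int32_t` wrap as on common two's-complement compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SMUL16: signed top x top and bottom x bottom,
 * each widened to 32 bits. No saturation is needed because the largest
 * product, 0x8000 x 0x8000 = 0x40000000, fits in an int32_t. */
uint64_t smul16_model(uint32_t a, uint32_t b)
{
    int32_t rest = (int32_t)(int16_t)(a >> 16) * (int32_t)(int16_t)(b >> 16);
    int32_t resb = (int32_t)(int16_t)(a & 0xFFFFu) * (int32_t)(int16_t)(b & 0xFFFFu);
    return ((uint64_t)(uint32_t)rest << 32) | (uint32_t)resb;
}

/* Usage: a = {top: -1, bottom: 3}, b = {top: 2, bottom: 4}. */
void smul16_model_example(void)
{
    uint64_t r = smul16_model(0xFFFF0003u, 0x00020004u);
    assert(r == 0xFFFFFFFE0000000Cull);
    assert((int32_t)(uint32_t)(r >> 32) == -2);   /* top product    */
    assert((int32_t)(uint32_t)r == 12);           /* bottom product */
}
```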
- __STATIC_FORCEINLINE unsigned long long __RV_SMULX16 (unsigned int a, unsigned int b)
SMULX16 (SIMD Signed Crossed 16-bit Multiply)
Type: SIMD
Syntax:
SMUL16 Rd, Rs1, Rs2
SMULX16 Rd, Rs1, Rs2
Purpose:
Do signed 16-bit multiplications and generate two 32-bit results simultaneously.
RV32 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2.
For the SMULX16 instruction, multiply the top 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2.
The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers: the pair consists of registers 2d and 2d+1. The odd register 2d+1 contains the 32-bit result calculated from the top part of Rs1, and the even register 2d contains the 32-bit result calculated from the bottom part of Rs1.
RV64 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2.
For the SMULX16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2.
The two 32-bit Q30 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
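The crossed variant differs from SMUL16 only in which halves are paired. A minimal host-side sketch, under the same assumptions as any portable model of this kind (hypothetical names `smulx16_model` and `smulx16_model_example`, wrapping casts to `int16_t`, not the NMSIS implementation), might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SMULX16: Rs1 top x Rs2 bottom goes to the upper
 * 32-bit word, Rs1 bottom x Rs2 top goes to the lower 32-bit word. */
uint64_t smulx16_model(uint32_t a, uint32_t b)
{
    int32_t rest = (int32_t)(int16_t)(a >> 16) * (int32_t)(int16_t)(b & 0xFFFFu);
    int32_t resb = (int32_t)(int16_t)(a & 0xFFFFu) * (int32_t)(int16_t)(b >> 16);
    return ((uint64_t)(uint32_t)rest << 32) | (uint32_t)resb;
}

/* Usage: a = {top: -1, bottom: 3}, b = {top: 2, bottom: 4}.
 * Crossed products are (-1 * 4) and (3 * 2). */
void smulx16_model_example(void)
{
    uint64_t r = smulx16_model(0xFFFF0003u, 0x00020004u);
    assert(r == 0xFFFFFFFC00000006ull);
    assert((int32_t)(uint32_t)(r >> 32) == -4);
    assert((int32_t)(uint32_t)r == 6);
}
```

With the same inputs, the uncrossed SMUL16 pairing would produce -2 and 12 instead.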
- __STATIC_FORCEINLINE unsigned long long __RV_UMUL16 (unsigned int a, unsigned int b)
UMUL16 (SIMD Unsigned 16-bit Multiply)
Type: SIMD
Syntax:
UMUL16 Rd, Rs1, Rs2
UMULX16 Rd, Rs1, Rs2
Purpose:
Do unsigned 16-bit multiplications and generate two 32-bit results simultaneously.
RV32 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2.
For the UMULX16 instruction, multiply the top 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2.
The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers: the pair consists of registers 2d and 2d+1. The odd register 2d+1 contains the 32-bit result calculated from the top part of Rs1, and the even register 2d contains the 32-bit result calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2.
For the UMULX16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2.
The two 32-bit U32 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
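The unsigned behaviour can be checked with a small portable model. This is a sketch only, with hypothetical names `umul16_model` and `umul16_model_example`; it is not the NMSIS implementation. It packs the top product into the upper 32-bit word and the bottom product into the lower 32-bit word of the 64-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of UMUL16: unsigned top x top and bottom x bottom.
 * The largest product, 0xFFFF x 0xFFFF = 0xFFFE0001, fits in a uint32_t. */
uint64_t umul16_model(uint32_t a, uint32_t b)
{
    uint32_t rest = (a >> 16) * (b >> 16);
    uint32_t resb = (a & 0xFFFFu) * (b & 0xFFFFu);
    return ((uint64_t)rest << 32) | resb;
}

/* Usage: 0xFFFF is treated as 65535 here, not as -1. */
void umul16_model_example(void)
{
    uint64_t r = umul16_model(0xFFFF0003u, 0xFFFF0004u);
    assert(r == 0xFFFE00010000000Cull);
}
```

A signed multiply of the same top halves would give 1, since 0xFFFF read as a signed 16-bit value is -1; the difference in the upper word is the practical distinction between the signed and unsigned instructions.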
- __STATIC_FORCEINLINE unsigned long long __RV_UMULX16 (unsigned int a, unsigned int b)
UMULX16 (SIMD Unsigned Crossed 16-bit Multiply)
Type: SIMD
Syntax:
UMUL16 Rd, Rs1, Rs2
UMULX16 Rd, Rs1, Rs2
Purpose:
Do unsigned 16-bit multiplications and generate two 32-bit results simultaneously.
RV32 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2.
For the UMULX16 instruction, multiply the top 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2.
The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers: the pair consists of registers 2d and 2d+1. The odd register 2d+1 contains the 32-bit result calculated from the top part of Rs1, and the even register 2d contains the 32-bit result calculated from the bottom part of Rs1.
RV64 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2.
For the UMULX16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2.
The two 32-bit U32 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].
Operations:
* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0);
t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
- Parameters
a – [in] unsigned int type of value stored in a
b – [in] unsigned int type of value stored in b
- Returns
value stored in unsigned long long type
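The RV32 register-pair write (t_L and t_H derived from Rd(4,1)) can be sketched with a toy register file. Everything here is an illustrative assumption: `umulx16_model_rv32` and its example are hypothetical names, the register file is a plain array, source operands are passed by value rather than by register index, and the hardwired-zero behaviour of x0 is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of UMULX16 on RV32 with a toy register file R[32].
 * rd is the 5-bit destination index; bit 0 is dropped to form the pair. */
void umulx16_model_rv32(uint32_t R[32], unsigned rd, uint32_t rs1, uint32_t rs2)
{
    uint32_t rest = (rs1 >> 16) * (rs2 & 0xFFFFu);   /* Rs1 top x Rs2 bottom */
    uint32_t resb = (rs1 & 0xFFFFu) * (rs2 >> 16);   /* Rs1 bottom x Rs2 top */
    unsigned d = (rd >> 1) & 0xFu;                   /* Rd(4,1)              */
    R[2 * d + 1] = rest;                             /* t_H: odd register    */
    R[2 * d]     = resb;                             /* t_L: even register   */
}

/* Usage: rd = 11 selects the pair (10, 11), the same pair as rd = 10. */
void umulx16_model_rv32_example(void)
{
    uint32_t R[32] = {0};
    umulx16_model_rv32(R, 11u, 0xFFFF0003u, 0x00020004u);
    assert(R[11] == 0x0003FFFCu);   /* 65535 * 4 */
    assert(R[10] == 6u);            /* 3 * 2     */
}
```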