Nuclei N3 SIMD DSP Additional Instructions
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMAC (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMAC_U (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMSB (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMSB_U (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMADA (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMAXDA (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMADS (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMADRS (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMAXDS (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMSDA (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKMSXDA (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DSMAQA (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DSMAQA_SU (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DUMAQA (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMDA32 (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMXDA32 (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMADA32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMAXDA32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMADS32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMADRS32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMAXDS32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMSDA32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMSXDA32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMDS32 (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMDRS32 (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMXDS32 (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALDA (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALXDA (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALDS (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALDRS (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALXDS (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMSLDA (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMSLXDA (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DDSMAQA (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DDSMAQA_SU (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DDUMAQA (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long __RV_DSMA32_U (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long __RV_DSMXS32_U (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long __RV_DSMXA32_U (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long __RV_DSMS32_U (unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long __RV_DSMADA16 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long __RV_DSMAXDA16 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE unsigned long long __RV_DKSMS32_U (unsigned long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long __RV_DMADA32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALBB (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALBT (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DSMALTT (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMABB32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMABT32 (long long t, unsigned long long a, unsigned long long b)
- __STATIC_FORCEINLINE long long __RV_DKMATT32 (long long t, unsigned long long a, unsigned long long b)
- group Nuclei N3 SIMD DSP Additional Instructions
(RV32 only) Nuclei Customized N3 DSP Instructions
These are Nuclei customized N3 DSP instructions, available only on RV32.
Functions
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMAC (unsigned long long t, unsigned long long a, unsigned long long b)
DKMMAC (64-bit MSW 32x32 Signed Multiply and Saturating Add)
Type: SIMD
Syntax:
DKMMAC Rd, Rs1, Rs2 # Rd, Rs1, Rs2 are all even/odd pair of registers
Purpose
:
Do MSW 32x32 element signed multiplications and saturating addition simultaneously. The results are written into Rd.
Description
:
This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  res = sat.q31(dop + (aop s* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
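The Operations pseudocode above can be checked on a host machine with a small portable model. This is only a sketch under stated assumptions: the names `sat_q31`, `lane32`, `kmmac_lane`, and `dkmmac_model` are hypothetical and are not part of NMSIS, the OV flag is not modelled, and the code assumes a two's-complement compiler where `>>` on a negative `int64_t` is an arithmetic shift.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the Q31 range [-2^31, 2^31-1]. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Extract 32-bit lane W[idx] (idx 0 = low word, 1 = high word) as signed.
 * Assumption: unsigned-to-signed conversion wraps (two's complement). */
static int32_t lane32(uint64_t v, int idx)
{
    return (int32_t)(uint32_t)(v >> (32 * idx));
}

/* One lane: d + upper 32 bits of the signed 64-bit product, saturated.
 * Assumption: >> on a negative int64_t is an arithmetic shift. */
static int32_t kmmac_lane(int32_t d, int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    return sat_q31((int64_t)d + (prod >> 32));
}

/* Hypothetical host-side model of DKMMAC: Rd = concat(rest, resb). */
static uint64_t dkmmac_model(uint64_t t, uint64_t a, uint64_t b)
{
    int32_t rest = kmmac_lane(lane32(t, 1), lane32(a, 1), lane32(b, 1));
    int32_t resb = kmmac_lane(lane32(t, 0), lane32(a, 0), lane32(b, 0));
    return ((uint64_t)(uint32_t)rest << 32) | (uint32_t)resb;
}
```

A model like this is useful for comparing against the value returned by `__RV_DKMMAC` on target hardware for the same inputs.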
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMAC_U (unsigned long long t, unsigned long long a, unsigned long long b)
DKMMAC.u (64-bit MSW 32x32 Unsigned Multiply and Saturating Add)
Type: SIMD
Syntax:
DKMMAC.u Rd, Rs1, Rs2 # Rd, Rs1, Rs2 are all even/odd pair of registers
Purpose
:
Do MSW 32x32 element unsigned multiplications and saturating addition simultaneously. The results are written into Rd.
Description
:
This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and adds the most significant 32-bit multiplication results with the signed 32-bit elements of Rd. If the addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  res = sat.q31(dop + RUND(aop u* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMSB (unsigned long long t, unsigned long long a, unsigned long long b)
DKMMSB (64-bit MSW 32x32 Signed Multiply and Saturating Sub)
Type: SIMD
Syntax:
DKMMSB Rd, Rs1, Rs2 # Rd, Rs1, Rs2 are all even/odd pair of registers
Purpose
:
Do MSW 32x32 element signed multiplications and saturating subtraction simultaneously. The results are written into Rd.
Description
:
This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and subtracts the most significant 32-bit multiplication results from the signed 32-bit elements of Rd. If the subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  res = sat.q31(dop - (aop s* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DKMMSB_U (unsigned long long t, unsigned long long a, unsigned long long b)
DKMMSB.u (64-bit MSW 32x32 Unsigned Multiply and Saturating Sub)
Type: SIMD
Syntax:
DKMMSB.u Rd, Rs1, Rs2 # Rd, Rs1, Rs2 are all even/odd pair of registers
Purpose
:
Do MSW 32x32 element unsigned multiplications and saturating subtraction simultaneously. The results are written into Rd.
Description
:
This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and subtracts the most significant 32-bit multiplication results from the signed 32-bit elements of Rd. If the subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds up the most significant 32-bit of the 64-bit multiplication results by adding a 1 to bit 31 of the results.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  res = sat.q31(dop - (aop u* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DKMADA (unsigned long long t, unsigned long long a, unsigned long long b)
DKMADA (Saturating Signed Multiply Two Halfs and Two Adds)
Type: DSP
Syntax:
DKMADA Rd, Rs1, Rs2
Purpose
:
Do two 16x16 with 32-bit signed double addition simultaneously. The results are written into Rd.
Description
:
It multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then adds the result to the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop + mul1 + mul2);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
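As a rough illustration of the DKMADA Operations above, the following host-side sketch multiplies matching 16-bit halves in each 32-bit lane, adds both products to the accumulator lane, and saturates to Q31. The helper names (`sat_q31`, `half16`, `kmada_lane`, `dkmada_model`) are hypothetical and not NMSIS APIs. The sketch assumes two's-complement wrapping on unsigned-to-signed conversion and does not model the OV flag.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the Q31 range. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Signed 16-bit half H[idx] of a 32-bit word (idx 0 = bits 15..0). */
static int16_t half16(uint32_t w, int idx)
{
    return (int16_t)(uint16_t)(w >> (16 * idx));
}

/* One lane: sat.q31(d + a.H[1]*b.H[1] + a.H[0]*b.H[0]). */
static uint32_t kmada_lane(uint32_t d, uint32_t a, uint32_t b)
{
    int64_t mul1 = (int64_t)half16(a, 1) * half16(b, 1);
    int64_t mul2 = (int64_t)half16(a, 0) * half16(b, 0);
    return (uint32_t)sat_q31((int64_t)(int32_t)d + mul1 + mul2);
}

/* Hypothetical model of DKMADA over the top and bottom 32-bit lanes. */
static uint64_t dkmada_model(uint64_t t, uint64_t a, uint64_t b)
{
    uint32_t rest = kmada_lane((uint32_t)(t >> 32), (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    uint32_t resb = kmada_lane((uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)rest << 32) | resb;
}
```

With all four halves of the top lane set to 0x8000, the two products sum to 2^31, which is one past the Q31 maximum, so that lane saturates to 0x7FFFFFFF.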
- __STATIC_FORCEINLINE unsigned long long __RV_DKMAXDA (unsigned long long t, unsigned long long a, unsigned long long b)
DKMAXDA (Two Cross 16x16 with 32-bit Signed Double Add)
Type: DSP
Syntax:
DKMAXDA Rd, Rs1, Rs2
Purpose
:
Do two cross 16x16 with 32-bit signed double addition simultaneously. The results are written into Rd.
Description
:
It multiplies the top 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[0];
  mul2 = aop.H[0] s* bop.H[1];
  res = sat.q31(dop + mul1 + mul2);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DKMADS (unsigned long long t, unsigned long long a, unsigned long long b)
DKMADS (Two 16x16 with 32-bit Signed Add and Sub)
Type: DSP
Syntax:
DKMADS Rd, Rs1, Rs2
Purpose
:
Do two 16x16 with 32-bit signed addition and subtraction simultaneously. The results are written into Rd.
Description
:
It multiplies the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop + mul1 - mul2);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DKMADRS (unsigned long long t, unsigned long long a, unsigned long long b)
DKMADRS (Two 16x16 with 32-bit Signed Add and Reversed Sub)
Type: DSP
Syntax:
DKMADRS Rd, Rs1, Rs2
Purpose
:
Do two 16x16 with 32-bit signed addition and reversed subtraction simultaneously. The results are written into Rd.
Description
:
It multiplies the top 16-bit content of 32-bit elements in Rs1 with the top 16-bit content of 32-bit elements in Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of 32-bit elements in Rs1 with the bottom 16-bit content of 32-bit elements in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop - mul1 + mul2);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
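DKMADS and DKMADRS differ only in which product is subtracted. The sketch below models both on the host so the difference is easy to see. All names (`sat_q31`, `half16`, `kmads_lane`, `dkmads_model`, `dkmadrs_model`) are hypothetical helpers, not NMSIS functions. The code assumes two's-complement conversions and ignores the OV flag.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the Q31 range. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Signed 16-bit half H[idx] of a 32-bit word. */
static int16_t half16(uint32_t w, int idx)
{
    return (int16_t)(uint16_t)(w >> (16 * idx));
}

/* One lane. reversed == 0: d + top - bottom (DKMADS).
 *           reversed != 0: d - top + bottom (DKMADRS). */
static uint32_t kmads_lane(uint32_t d, uint32_t a, uint32_t b, int reversed)
{
    int64_t top = (int64_t)half16(a, 1) * half16(b, 1);
    int64_t bot = (int64_t)half16(a, 0) * half16(b, 0);
    int64_t diff = reversed ? (bot - top) : (top - bot);
    return (uint32_t)sat_q31((int64_t)(int32_t)d + diff);
}

/* Apply the lane operation to the high and low 32-bit words. */
static uint64_t both_lanes(uint64_t t, uint64_t a, uint64_t b, int reversed)
{
    uint32_t rest = kmads_lane((uint32_t)(t >> 32), (uint32_t)(a >> 32), (uint32_t)(b >> 32), reversed);
    uint32_t resb = kmads_lane((uint32_t)t, (uint32_t)a, (uint32_t)b, reversed);
    return ((uint64_t)rest << 32) | resb;
}

static uint64_t dkmads_model(uint64_t t, uint64_t a, uint64_t b)  { return both_lanes(t, a, b, 0); }
static uint64_t dkmadrs_model(uint64_t t, uint64_t a, uint64_t b) { return both_lanes(t, a, b, 1); }
```

For a lane with halves (2, 3) and (4, 5) and an accumulator of 100, the DKMADS model gives 100 + 8 - 15 = 93, while the DKMADRS model gives 100 - 8 + 15 = 107.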
- __STATIC_FORCEINLINE unsigned long long __RV_DKMAXDS (unsigned long long t, unsigned long long a, unsigned long long b)
DKMAXDS (Saturating Signed Crossed Multiply Two Halfs & Subtract & Add)
Type: DSP
Syntax:
DKMAXDS Rd, Rs1, Rs2
Purpose
:
Do two cross 16x16 with 32-bit signed addition and subtraction simultaneously. The results are written into Rd.
Description
:
Do two signed 16-bit multiplications from 32-bit elements in two registers; and then perform a subtraction operation between the two 32-bit results. Then add the subtraction result to the corresponding 32-bit elements in a third register. The addition result may be saturated.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[0];
  mul2 = aop.H[0] s* bop.H[1];
  res = sat.q31(dop + mul1 - mul2);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
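The crossed pairing in DKMAXDS (top half of Rs1 with bottom half of Rs2, and vice versa) is easy to get wrong, so a small host-side sketch may help. The names `sat_q31`, `half16`, `kmaxds_lane`, and `dkmaxds_model` are hypothetical. The sketch follows the Operations text, assumes two's-complement conversions, and omits the OV flag.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the Q31 range. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Signed 16-bit half H[idx] of a 32-bit word. */
static int16_t half16(uint32_t w, int idx)
{
    return (int16_t)(uint16_t)(w >> (16 * idx));
}

/* One lane: sat.q31(d + a.H[1]*b.H[0] - a.H[0]*b.H[1]). */
static uint32_t kmaxds_lane(uint32_t d, uint32_t a, uint32_t b)
{
    int64_t mul1 = (int64_t)half16(a, 1) * half16(b, 0); /* top x bottom */
    int64_t mul2 = (int64_t)half16(a, 0) * half16(b, 1); /* bottom x top */
    return (uint32_t)sat_q31((int64_t)(int32_t)d + mul1 - mul2);
}

/* Hypothetical model of DKMAXDS over both 32-bit lanes. */
static uint64_t dkmaxds_model(uint64_t t, uint64_t a, uint64_t b)
{
    uint32_t rest = kmaxds_lane((uint32_t)(t >> 32), (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    uint32_t resb = kmaxds_lane((uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)rest << 32) | resb;
}
```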
- __STATIC_FORCEINLINE unsigned long long __RV_DKMSDA (unsigned long long t, unsigned long long a, unsigned long long b)
DKMSDA (Two 16x16 with 32-bit Signed Double Sub)
Type: DSP
Syntax:
DKMSDA Rd, Rs1, Rs2
Purpose
:
Do two 16x16 with 32-bit signed double subtraction simultaneously. The results are written into Rd.
Description
:
It multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. Both products are then subtracted from the corresponding 32-bit elements of Rd, and the results are saturated to the Q31 range.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop - mul1 - mul2);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DKMSXDA (unsigned long long t, unsigned long long a, unsigned long long b)
DKMSXDA (Two Cross 16x16 with 32-bit Signed Double Sub)
Type: DSP
Syntax:
DKMSXDA Rd, Rs1, Rs2
Purpose
:
Do two cross 16x16 with 32-bit signed double subtraction simultaneously. The results are written into Rd.
Description
:
It multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. Both products are then subtracted from the corresponding 32-bit elements of Rd, and the results are saturated to the Q31 range.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[0];
  mul2 = aop.H[0] s* bop.H[1];
  res = sat.q31(dop - mul1 - mul2);
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DSMAQA (unsigned long long t, unsigned long long a, unsigned long long b)
DSMAQA (Four Signed 8x8 with 32-bit Signed Add)
Type: DSP
Syntax:
DSMAQA Rd, Rs1, Rs2
Purpose
:
Do four signed 8x8 with 32-bit signed addition simultaneously. The results are written into Rd.
Description
:
This instruction multiplies the four signed 8-bit elements of 32-bit chunks of Rs1 with the four signed 8-bit elements of 32-bit chunks of Rs2 and then adds the four results together with the signed content of the corresponding 32-bit chunks of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  m0 = aop.B[0] s* bop.B[0];
  m1 = aop.B[1] s* bop.B[1];
  m2 = aop.B[2] s* bop.B[2];
  m3 = aop.B[3] s* bop.B[3];
  res = dop + m0 + m1 + m2 + m3;
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
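The four-way byte dot product in DSMAQA can be sketched on the host as follows. Note that the Operations text shows no saturation, so this model lets the 32-bit accumulation wrap. The names `sbyte`, `smaqa_lane`, and `dsmaqa_model` are hypothetical and not part of NMSIS. The sketch assumes two's-complement conversion when reinterpreting a byte as signed.

```c
#include <stdint.h>

/* Signed byte B[i] of a 32-bit word (i = 0 is bits 7..0). */
static int8_t sbyte(uint32_t w, int i)
{
    return (int8_t)(uint8_t)(w >> (8 * i));
}

/* One lane: d + sum of four signed 8x8 products, wrapping at 32 bits. */
static uint32_t smaqa_lane(uint32_t d, uint32_t a, uint32_t b)
{
    int32_t sum = 0; /* |sum| <= 4 * 128 * 128, so no overflow here */
    for (int i = 0; i < 4; i++) {
        sum += (int32_t)sbyte(a, i) * (int32_t)sbyte(b, i);
    }
    return d + (uint32_t)sum; /* unsigned add gives modulo-2^32 wraparound */
}

/* Hypothetical model of DSMAQA over both 32-bit lanes. */
static uint64_t dsmaqa_model(uint64_t t, uint64_t a, uint64_t b)
{
    uint32_t rest = smaqa_lane((uint32_t)(t >> 32), (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    uint32_t resb = smaqa_lane((uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)rest << 32) | resb;
}
```

This pattern is typical of int8 dot-product kernels, where each call accumulates four byte products per lane.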
- __STATIC_FORCEINLINE unsigned long long __RV_DSMAQA_SU (unsigned long long t, unsigned long long a, unsigned long long b)
DSMAQA.SU (Four Signed 8 x Unsigned 8 with 32-bit Signed Add)
Type: DSP
Syntax:
DSMAQA.SU Rd, Rs1, Rs2
Purpose
:
Do four signed 8 x unsigned 8 with 32-bit signed addition simultaneously. The results are written into Rd.
Description
:
This instruction multiplies the four signed 8-bit elements of 32-bit chunks of Rs1 with the four unsigned 8-bit elements of 32-bit chunks of Rs2 and then adds the four results together with the signed content of the corresponding 32-bit chunks of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  m0 = aop.B[0] su* bop.B[0];
  m1 = aop.B[1] su* bop.B[1];
  m2 = aop.B[2] su* bop.B[2];
  m3 = aop.B[3] su* bop.B[3];
  res = dop + m0 + m1 + m2 + m3;
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
- __STATIC_FORCEINLINE unsigned long long __RV_DUMAQA (unsigned long long t, unsigned long long a, unsigned long long b)
DUMAQA (Four Unsigned 8x8 with 32-bit Unsigned Add)
Type: DSP
Syntax:
DUMAQA Rd, Rs1, Rs2
Purpose
:
Do four unsigned 8x8 with 32-bit unsigned addition simultaneously. The results are written into Rd.
Description
:
This instruction multiplies the four unsigned 8-bit elements of 32-bit chunks of Rs1 with the four unsigned 8-bit elements of 32-bit chunks of Rs2 and then adds the four results together with the unsigned content of the corresponding 32-bit chunks of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  m0 = aop.B[0] u* bop.B[0];
  m1 = aop.B[1] u* bop.B[1];
  m2 = aop.B[2] u* bop.B[2];
  m3 = aop.B[3] u* bop.B[3];
  res = dop + m0 + m1 + m2 + m3;
}
Rd = concat(rest, resb);
x=0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
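For the unsigned variant, a host-side sketch following the Description (unsigned 8-bit elements in both sources, unsigned 32-bit accumulation) might look like the code below. The names `ubyte`, `umaqa_lane`, and `dumaqa_model` are hypothetical. No saturation is applied because none is described, so the accumulator wraps modulo 2^32.

```c
#include <stdint.h>

/* Unsigned byte B[i] of a 32-bit word (i = 0 is bits 7..0). */
static uint32_t ubyte(uint32_t w, int i)
{
    return (w >> (8 * i)) & 0xFFu;
}

/* One lane: d + sum of four unsigned 8x8 products, wrapping at 32 bits. */
static uint32_t umaqa_lane(uint32_t d, uint32_t a, uint32_t b)
{
    uint32_t sum = d;
    for (int i = 0; i < 4; i++) {
        sum += ubyte(a, i) * ubyte(b, i); /* each product is at most 255 * 255 */
    }
    return sum;
}

/* Hypothetical model of DUMAQA over both 32-bit lanes. */
static uint64_t dumaqa_model(uint64_t t, uint64_t a, uint64_t b)
{
    uint32_t rest = umaqa_lane((uint32_t)(t >> 32), (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    uint32_t resb = umaqa_lane((uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)rest << 32) | resb;
}
```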
- __STATIC_FORCEINLINE long long __RV_DKMDA32 (unsigned long long a, unsigned long long b)
DKMDA32 (Two Signed 32x32 with 64-bit Saturation Add)
Type: DSP
Syntax:
DKMDA32 Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications and add the two products with Q63 saturation. The result is written into Rd.
Description
:
It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and then adds the result to the result of multiplying the top 32-bit element of Rs1 with the top 32-bit element of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(t0 + t1);
x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
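Following the Purpose and Description text for DKMDA32 (two signed 32x32 products added with Q63 saturation), a host-side sketch could look like this. The names `sat_add64`, `lane32`, and `dkmda32_model` are hypothetical, not NMSIS APIs. The only case that can saturate is when all four 32-bit elements are -2^31, since each product is then 2^62 and the sum is 2^63.

```c
#include <stdint.h>

/* Saturating signed 64-bit add (Q63 range), without wider integer types. */
static int64_t sat_add64(int64_t x, int64_t y)
{
    if (y > 0 && x > INT64_MAX - y) return INT64_MAX;
    if (y < 0 && x < INT64_MIN - y) return INT64_MIN;
    return x + y;
}

/* Signed 32-bit lane W[idx] of a 64-bit value (idx 0 = low word).
 * Assumption: unsigned-to-signed conversion wraps (two's complement). */
static int32_t lane32(uint64_t v, int idx)
{
    return (int32_t)(uint32_t)(v >> (32 * idx));
}

/* Hypothetical model of DKMDA32: sat.q63(bottom*bottom + top*top). */
static int64_t dkmda32_model(uint64_t a, uint64_t b)
{
    int64_t t0 = (int64_t)lane32(a, 0) * lane32(b, 0); /* bottom x bottom */
    int64_t t1 = (int64_t)lane32(a, 1) * lane32(b, 1); /* top x top */
    return sat_add64(t0, t1);
}
```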
- __STATIC_FORCEINLINE long long __RV_DKMXDA32 (unsigned long long a, unsigned long long b)
DKMXDA32 (Two Cross Signed 32x32 with 64-bit Saturation Add)
Type: DSP
Syntax:
DKMXDA32 Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications and add the two products with Q63 saturation. The result is written into Rd.
Description
:
It multiplies the bottom 32-bit element of Rs1 with the top 32-bit element of Rs2 and then adds the result to the result of multiplying the top 32-bit element of Rs1 with the bottom 32-bit element of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = sat.q63(t01 + t10);
x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMADA32 (long long t, unsigned long long a, unsigned long long b)
DKMADA32 (Two Signed 32x32 with 64-bit Saturation Add)
Type: DSP
Syntax:
DKMADA32 Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications and add both products to a third register with Q63 saturation. The result is written into Rd.
Description
:
It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and then adds the result to the result of multiplying the top 32-bit element of Rs1 with the top 32-bit element of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd + t0 + t1);
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMAXDA32 (long long t, unsigned long long a, unsigned long long b)
DKMAXDA32 (Two Cross Signed 32x32 with 64-bit Saturation Add)
Type: DSP
Syntax:
DKMAXDA32 Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications and add both products to a third register with Q63 saturation. The result is written into Rd.
Description
:
It multiplies the top 32-bit element in Rs1 with the bottom 32-bit element in Rs2 and then adds the result to the result of multiplying the bottom 32-bit element in Rs1 with the top 32-bit element in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = sat.q63(Rd + t01 + t10);
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMADS32 (long long t, unsigned long long a, unsigned long long b)
DKMADS32 (Two Signed 32x32 with 64-bit Saturation Add and Sub)
Type: DSP
Syntax:
DKMADS32 Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications, add the top product, subtract the bottom product, and accumulate into a third register with Q63 saturation. The result is written into Rd.
Description
:
It multiplies the bottom 32-bit element in Rs1 with the bottom 32-bit element in Rs2 and then subtracts the result from the result of multiplying the top 32-bit element in Rs1 with the top 32-bit element in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd - t0 + t1);
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMADRS32 (long long t, unsigned long long a, unsigned long long b)
DKMADRS32 (Two Signed 32x32 with 64-bit Saturation Reversed Add and Sub)
Type: DSP
Syntax:
DKMADRS32 Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications, subtract the top product, add the bottom product, and accumulate into a third register with Q63 saturation. The result is written into Rd.
Description
:
It multiplies the top 32-bit element in Rs1 with the top 32-bit element in Rs2 and then subtracts the result from the result of multiplying the bottom 32-bit element in Rs1 with the bottom 32-bit element in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd + t0 - t1);
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
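The accumulate form `sat.q63(Rd + t0 - t1)` can be modelled on the host without 128-bit arithmetic. Each signed 32x32 product lies in [-2^62 + 2^31, 2^62], so the difference `t0 - t1` always fits in an `int64_t`, and a single saturating add then yields the same result as saturating the exact three-term sum. The names `sat_add64`, `lane32`, and `dkmadrs32_model` are hypothetical, and the sketch assumes two's-complement conversions.

```c
#include <stdint.h>

/* Saturating signed 64-bit add (Q63 range). */
static int64_t sat_add64(int64_t x, int64_t y)
{
    if (y > 0 && x > INT64_MAX - y) return INT64_MAX;
    if (y < 0 && x < INT64_MIN - y) return INT64_MIN;
    return x + y;
}

/* Signed 32-bit lane W[idx] of a 64-bit value (idx 0 = low word). */
static int32_t lane32(uint64_t v, int idx)
{
    return (int32_t)(uint32_t)(v >> (32 * idx));
}

/* Hypothetical model of DKMADRS32: sat.q63(t + bottom*bottom - top*top). */
static int64_t dkmadrs32_model(int64_t t, uint64_t a, uint64_t b)
{
    int64_t t0 = (int64_t)lane32(a, 0) * lane32(b, 0); /* bottom x bottom */
    int64_t t1 = (int64_t)lane32(a, 1) * lane32(b, 1); /* top x top */
    int64_t diff = t0 - t1; /* cannot overflow, see range argument above */
    return sat_add64(t, diff);
}
```

The same shortcut does not apply to the pure-add variants such as DKMADA32, where `t0 + t1` alone can reach 2^63.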
- __STATIC_FORCEINLINE long long __RV_DKMAXDS32 (long long t, unsigned long long a, unsigned long long b)
DKMAXDS32 (Two Cross Signed 32x32 with 64-bit Saturation Add and Sub)
Type: DSP
Syntax:
DKMAXDS32 Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications, add the product of the top element of Rs1 and the bottom element of Rs2, subtract the product of the bottom element of Rs1 and the top element of Rs2, and accumulate into a third register with Q63 saturation. The result is written into Rd.
Description
:
It multiplies the bottom 32-bit element in Rs1 with the top 32-bit element in Rs2 and then subtracts the result from the result of multiplying the top 32-bit element in Rs1 with the bottom 32-bit element in Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = sat.q63(Rd - t01 + t10);
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMSDA32 (long long t, unsigned long long a, unsigned long long b)
DKMSDA32 (Two Signed 32x32 with 64-bit Saturation Sub)
Type: DSP
Syntax:
DKMSDA32 Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications and subtract both products from a third register with Q63 saturation. The result is written into Rd.
Description
:
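It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and multiplies the top 32-bit element of Rs1 with the top 32-bit element of Rs2. Both 64-bit products are then subtracted from the 64-bit value in Rd, and the result is saturated to the Q63 range.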
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd - t0 - t1);
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMSXDA32 (long long t, unsigned long long a, unsigned long long b)
DKMSXDA32 (Two Cross Signed 32x32 with 64-bit Saturation Sub)
Type: DSP
Syntax:
DKMSXDA32 Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications and subtract both products from a third register with Q63 saturation. The result is written into Rd.
Description
:
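It multiplies the bottom 32-bit element of Rs1 with the top 32-bit element of Rs2 and multiplies the top 32-bit element of Rs1 with the bottom 32-bit element of Rs2. Both 64-bit products are then subtracted from the 64-bit value in Rd, and the result is saturated to the Q63 range.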
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom t0 = op1b s* op2t; t1 = op1t s* op2b; Rd = sat.q63(Rd - t0 - t1); x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMDS32 (unsigned long long a, unsigned long long b)
DSMDS32 (Two Signed 32x32 with 64-bit Sub)
Type: DSP
Syntax:
DSMDS32 Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications and subtract the bottom product from the top product. The result is written into Rd.
Description
:
It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and then subtracts the result from the result of multiplying the top 32-bit element of Rs1 with the top 32-bit element of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = t1 - t0;
x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMDRS32 (unsigned long long a, unsigned long long b)
DSMDRS32 (Two Signed 32x32 with 64-bit Reversed Sub)
Type: DSP
Syntax:
DSMDRS32 Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications and subtract the top product from the bottom product. The result is written into Rd.
Description
:
It multiplies the top 32-bit element of Rs1 with the top 32-bit element of Rs2 and then subtracts the result from the result of multiplying the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = t0 - t1;
x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMXDS32 (unsigned long long a, unsigned long long b)
DSMXDS32 (Two Cross Signed 32x32 with 64-bit Sub)
Type: DSP
Syntax:
DSMXDS32 Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications and subtract one cross product from the other. The result is written into Rd.
Description
:
It multiplies the bottom 32-bit element of Rs1 with the top 32-bit element of Rs2 and then subtracts the result from the result of multiplying the top 32-bit element of Rs1 with the bottom 32-bit element of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = t10 - t01;
x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
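The three non-accumulating subtract variants (DSMDS32, DSMDRS32 and DSMXDS32) differ only in which products are formed and in the order of subtraction. The sketch below follows the Description text of each instruction. The `ref_*` and `w_*` names are hypothetical host-side helpers, not NMSIS functions.

```c
#include <stdint.h>

/* Hypothetical helpers: signed bottom/top 32-bit element of a 64-bit operand. */
static inline int32_t w_bot(uint64_t v) { return (int32_t)(uint32_t)v; }
static inline int32_t w_top(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

/* DSMDS32: top*top - bottom*bottom */
static inline int64_t ref_dsmds32(uint64_t a, uint64_t b)
{
    return (int64_t)w_top(a) * w_top(b) - (int64_t)w_bot(a) * w_bot(b);
}

/* DSMDRS32: bottom*bottom - top*top */
static inline int64_t ref_dsmdrs32(uint64_t a, uint64_t b)
{
    return (int64_t)w_bot(a) * w_bot(b) - (int64_t)w_top(a) * w_top(b);
}

/* DSMXDS32: top(Rs1)*bottom(Rs2) - bottom(Rs1)*top(Rs2) */
static inline int64_t ref_dsmxds32(uint64_t a, uint64_t b)
{
    return (int64_t)w_top(a) * w_bot(b) - (int64_t)w_bot(a) * w_top(b);
}
```

No saturation is needed here: the difference of two signed 32x32 products always fits in 64 bits.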
- __STATIC_FORCEINLINE long long __RV_DSMALDA (long long t, unsigned long long a, unsigned long long b)
DSMALDA (Four Signed 16x16 with 64-bit Add)
Type: DSP
Syntax:
DSMALDA Rd, Rs1, Rs2
Purpose
:
Do four signed 16x16 multiplications and add the products to a third register. The result is written into Rd.
Description
:
For each 32-bit element, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then adds the result to the result of multiplying the top 16-bit content of Rs1 with the top 16-bit content of Rs2 with unlimited precision. The four products are added to the 64-bit value in Rd.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.H[0] s* op2b.H[0];
m1 = op1b.H[1] s* op2b.H[1];
m2 = op1t.H[0] s* op2t.H[0];
m3 = op1t.H[1] s* op2t.H[1];
Rd = Rd + m0 + m1 + m2 + m3;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
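As a rough host-side illustration of the four-way multiply-accumulate, the following sketch pairs halfword i of `a` with halfword i of `b`. `ref_dsmalda` is a hypothetical name, and the sketch assumes the 64-bit accumulation wraps on overflow, since the instruction name carries no saturation prefix.

```c
#include <stdint.h>

/* Hypothetical host-side model of DSMALDA; not part of the NMSIS API. */
static inline int64_t ref_dsmalda(int64_t t, uint64_t a, uint64_t b)
{
    int64_t sum = 0;
    for (int i = 0; i < 4; i++) {
        /* Halfword i of each operand, reinterpreted as signed 16-bit. */
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        sum += (int32_t)x * y;
    }
    /* Assumption: the accumulator wraps, so add in unsigned arithmetic. */
    return (int64_t)((uint64_t)t + (uint64_t)sum);
}
```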
- __STATIC_FORCEINLINE long long __RV_DSMALXDA (long long t, unsigned long long a, unsigned long long b)
DSMALXDA (Four Cross Signed 16x16 with 64-bit Add)
Type: DSP
Syntax:
DSMALXDA Rd, Rs1, Rs2
Purpose
:
Do four cross signed 16x16 multiplications and add the products to a third register. The result is written into Rd.
Description
:
It multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 with unlimited precision.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.H[0] s* op2b.H[1];
m1 = op1b.H[1] s* op2b.H[0];
m2 = op1t.H[0] s* op2t.H[1];
m3 = op1t.H[1] s* op2t.H[0];
Rd = Rd + m0 + m1 + m2 + m3;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMALDS (long long t, unsigned long long a, unsigned long long b)
DSMALDS (Four Signed 16x16 with 64-bit Add and Sub)
Type: DSP
Syntax:
DSMALDS Rd, Rs1, Rs2
Purpose
:
Do four signed 16x16 multiplications, then add the top products to, and subtract the bottom products from, a third register. The result is written into Rd.
Description
:
It multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the top 16-bit content of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.H[1] s* op2b.H[1];
m1 = op1b.H[0] s* op2b.H[0];
m2 = op1t.H[1] s* op2t.H[1];
m3 = op1t.H[0] s* op2t.H[0];
Rd = Rd + m0 - m1 + m2 - m3;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMALDRS (long long t, unsigned long long a, unsigned long long b)
DSMALDRS (Four Signed 16x16 with 64-bit Add and Reversed Sub)
Type: DSP
Syntax:
DSMALDRS Rd, Rs1, Rs2
Purpose
:
Do four signed 16x16 multiplications, then add the bottom products to, and subtract the top products from, a third register. The result is written into Rd.
Description
:
It multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.H[0] s* op2b.H[0];
m1 = op1b.H[1] s* op2b.H[1];
m2 = op1t.H[0] s* op2t.H[0];
m3 = op1t.H[1] s* op2t.H[1];
Rd = Rd + m0 - m1 + m2 - m3;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMALXDS (long long t, unsigned long long a, unsigned long long b)
DSMALXDS (Four Cross Signed 16x16 with 64-bit Add and Sub)
Type: DSP
Syntax:
DSMALXDS Rd, Rs1, Rs2
Purpose
:
Do four cross signed 16x16 multiplications, then add the top-by-bottom products to, and subtract the bottom-by-top products from, a third register. The result is written into Rd.
Description
:
It multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.H[1] s* op2b.H[0];
m1 = op1b.H[0] s* op2b.H[1];
m2 = op1t.H[1] s* op2t.H[0];
m3 = op1t.H[0] s* op2t.H[1];
Rd = Rd + m0 - m1 + m2 - m3;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
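The cross add/subtract pattern of DSMALXDS is easy to get backwards, so a small host-side model may help when writing tests. `ref_dsmalxds` and `half16` are hypothetical names, and wrap-around accumulation is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical helper: signed halfword i (0..3) of a 64-bit operand. */
static inline int16_t half16(uint64_t v, int i)
{
    return (int16_t)(uint16_t)(v >> (16 * i));
}

/* Hypothetical host-side model of DSMALXDS; not part of the NMSIS API. */
static inline int64_t ref_dsmalxds(int64_t t, uint64_t a, uint64_t b)
{
    int64_t acc = 0;
    for (int w = 0; w < 2; w++) {          /* w = 32-bit element index */
        int lo = 2 * w, hi = 2 * w + 1;
        acc += (int32_t)half16(a, hi) * half16(b, lo);  /* top(Rs1) * bottom(Rs2) */
        acc -= (int32_t)half16(a, lo) * half16(b, hi);  /* bottom(Rs1) * top(Rs2) */
    }
    /* Assumption: the 64-bit accumulator wraps on overflow. */
    return (int64_t)((uint64_t)t + (uint64_t)acc);
}
```

Swapping `a` and `b` negates the accumulated term, which makes a convenient sanity check.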
- __STATIC_FORCEINLINE long long __RV_DSMSLDA (long long t, unsigned long long a, unsigned long long b)
DSMSLDA (Four Signed 16x16 with 64-bit Sub)
Type: DSP
Syntax:
DSMSLDA Rd, Rs1, Rs2
Purpose
:
Do four signed 16x16 multiplications and subtract the products from a third register. The result is written into Rd.
Description
:
For each 32-bit element, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2. The four products are subtracted from the 64-bit value in Rd.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.H[0] s* op2b.H[0];
m1 = op1b.H[1] s* op2b.H[1];
m2 = op1t.H[0] s* op2t.H[0];
m3 = op1t.H[1] s* op2t.H[1];
Rd = Rd - m0 - m1 - m2 - m3;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMSLXDA (long long t, unsigned long long a, unsigned long long b)
DSMSLXDA (Four Cross Signed 16x16 with 64-bit Sub)
Type: DSP
Syntax:
DSMSLXDA Rd, Rs1, Rs2
Purpose
:
Do four cross signed 16x16 multiplications and subtract the products from a third register. The result is written into Rd.
Description
:
For each 32-bit element, it multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2. The four products are subtracted from the 64-bit value in Rd.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.H[0] s* op2b.H[1];
m1 = op1b.H[1] s* op2b.H[0];
m2 = op1t.H[0] s* op2t.H[1];
m3 = op1t.H[1] s* op2t.H[0];
Rd = Rd - m0 - m1 - m2 - m3;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
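DSMSLDA and DSMSLXDA differ only in how the halfwords are paired: the cross form pairs halfword i of Rs1 with halfword i XOR 1 of Rs2. A minimal sketch of both, using hypothetical `ref_*` names and assuming wrap-around 64-bit subtraction:

```c
#include <stdint.h>

/* Hypothetical shared model: cross = 0 for DSMSLDA, 1 for DSMSLXDA. */
static inline int64_t ref_dsmsl(int64_t t, uint64_t a, uint64_t b, int cross)
{
    int64_t sum = 0;
    for (int i = 0; i < 4; i++) {
        int j = cross ? (i ^ 1) : i;       /* swap top/bottom inside each word */
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * j));
        sum += (int32_t)x * y;
    }
    /* Assumption: Rd - m0 - m1 - m2 - m3 wraps on overflow. */
    return (int64_t)((uint64_t)t - (uint64_t)sum);
}

static inline int64_t ref_dsmslda(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dsmsl(t, a, b, 0);
}

static inline int64_t ref_dsmslxda(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dsmsl(t, a, b, 1);
}
```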
- __STATIC_FORCEINLINE long long __RV_DDSMAQA (long long t, unsigned long long a, unsigned long long b)
DDSMAQA (Eight Signed 8x8 with 64-bit Add)
Type: DSP
Syntax:
DDSMAQA Rd, Rs1, Rs2
Purpose
:
Do eight signed 8x8 multiplications and add the products to a third register. The result is written into Rd.
Description
:
It performs eight signed 8-bit multiplications on the corresponding 8-bit chunks of two registers, and then adds the eight 16-bit results to the 64-bit content of a third register.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.B[0] s* op2b.B[0];
m1 = op1b.B[1] s* op2b.B[1];
m2 = op1b.B[2] s* op2b.B[2];
m3 = op1b.B[3] s* op2b.B[3];
m4 = op1t.B[0] s* op2t.B[0];
m5 = op1t.B[1] s* op2t.B[1];
m6 = op1t.B[2] s* op2t.B[2];
m7 = op1t.B[3] s* op2t.B[3];
s0 = m0 + m1 + m2 + m3;
s1 = m4 + m5 + m6 + m7;
Rd = Rd + s0 + s1;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DDSMAQA_SU (long long t, unsigned long long a, unsigned long long b)
DDSMAQA.SU (Eight Signed 8 x Unsigned 8 with 64-bit Add)
Type: DSP
Syntax:
DDSMAQA.SU Rd, Rs1, Rs2
Purpose
:
Do eight signed 8 x unsigned 8 multiplications and add the products to a third register. The result is written into Rd.
Description
:
It performs eight multiplications of the signed 8-bit chunks of Rs1 with the corresponding unsigned 8-bit chunks of Rs2, and then adds the eight 16-bit results to the 64-bit content of a third register.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.B[0] su* op2b.B[0];
m1 = op1b.B[1] su* op2b.B[1];
m2 = op1b.B[2] su* op2b.B[2];
m3 = op1b.B[3] su* op2b.B[3];
m4 = op1t.B[0] su* op2t.B[0];
m5 = op1t.B[1] su* op2t.B[1];
m6 = op1t.B[2] su* op2t.B[2];
m7 = op1t.B[3] su* op2t.B[3];
s0 = m0 + m1 + m2 + m3;
s1 = m4 + m5 + m6 + m7;
Rd = Rd + s0 + s1;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DDUMAQA (long long t, unsigned long long a, unsigned long long b)
DDUMAQA (Eight Unsigned 8x8 with 64-bit Unsigned Add)
Type: DSP
Syntax:
DDUMAQA Rd, Rs1, Rs2
Purpose
:
Do eight unsigned 8x8 multiplications and add the products to a third register. The result is written into Rd.
Description
:
It performs eight unsigned 8-bit multiplications on the corresponding 8-bit chunks of two registers, and then adds the eight 16-bit results to the 64-bit content of a third register.
Operations:
op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
m0 = op1b.B[0] u* op2b.B[0];
m1 = op1b.B[1] u* op2b.B[1];
m2 = op1b.B[2] u* op2b.B[2];
m3 = op1b.B[3] u* op2b.B[3];
m4 = op1t.B[0] u* op2t.B[0];
m5 = op1t.B[1] u* op2t.B[1];
m6 = op1t.B[2] u* op2t.B[2];
m7 = op1t.B[3] u* op2t.B[3];
s0 = m0 + m1 + m2 + m3;
s1 = m4 + m5 + m6 + m7;
Rd = Rd + s0 + s1;
x=0
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
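The three 8x8 accumulate instructions above (DDSMAQA, DDSMAQA.SU, DDUMAQA) differ only in how each byte is interpreted. The sketch below makes that explicit through the element types. The `ref_*` names are hypothetical; the sketch assumes that for `su*` the Rs1 byte is signed and the Rs2 byte unsigned, and that the accumulator wraps.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit add that wraps instead of overflowing. */
static inline int64_t wrap_add64(int64_t t, int64_t s)
{
    return (int64_t)((uint64_t)t + (uint64_t)s);
}

/* DDSMAQA model: signed x signed bytes. */
static inline int64_t ref_ddsmaqa(int64_t t, uint64_t a, uint64_t b)
{
    int64_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (int8_t)(uint8_t)(a >> (8 * i)) * (int8_t)(uint8_t)(b >> (8 * i));
    return wrap_add64(t, sum);
}

/* DDSMAQA.SU model: signed byte of a x unsigned byte of b (assumed order). */
static inline int64_t ref_ddsmaqa_su(int64_t t, uint64_t a, uint64_t b)
{
    int64_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (int8_t)(uint8_t)(a >> (8 * i)) * (uint8_t)(b >> (8 * i));
    return wrap_add64(t, sum);
}

/* DDUMAQA model: unsigned x unsigned bytes. */
static inline int64_t ref_ddumaqa(int64_t t, uint64_t a, uint64_t b)
{
    int64_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (uint8_t)(a >> (8 * i)) * (uint8_t)(b >> (8 * i));
    return wrap_add64(t, sum);
}
```

With both low bytes set to 0xFF and all other bytes zero, the three models return 1, -255 and 65025 respectively, which shows the effect of the sign interpretation.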
- __STATIC_FORCEINLINE long __RV_DSMA32_U (unsigned long long a, unsigned long long b)
DSMA32.u (64-bit SIMD 32-bit Signed Multiply Addition With Rounding and Clip)
Type: DSP
Syntax:
DSMA32.u Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications and add the products with rounding, then shift right by 32 bits and clip q63 to q31. The result is written to Rd.
Description
:
For the DSMA32.u instruction, multiply the top 32-bit Q31 content of 64-bit chunks in Rs1 with the top 32-bit Q31 content of 64-bit chunks in Rs2. At the same time, multiply the bottom 32-bit Q31 content of 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of 64-bit chunks in Rs2. Then add the two products, apply the additional rounding operation, shift the data right by 32 bits, and clip the 64-bit data to 32 bits. The result is written to Rd.
Operations:
Rd = (q31_t)((Rs1.W[x] s* Rs2.W[x] + Rs1.W[x + 1] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32); x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long type
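The rounding step adds 0x80000000 (half of 2^32) before the upper 32 bits are taken. A host-side sketch, under the assumptions that the 64-bit sum wraps on overflow and that the final `(q31_t)` cast simply keeps bits [63:32]; `ref_dsma32_u` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical host-side model of DSMA32.u; not part of the NMSIS API. */
static inline int32_t ref_dsma32_u(uint64_t a, uint64_t b)
{
    int32_t a_bot = (int32_t)(uint32_t)a, a_top = (int32_t)(uint32_t)(a >> 32);
    int32_t b_bot = (int32_t)(uint32_t)b, b_top = (int32_t)(uint32_t)(b >> 32);

    /* Accumulate in uint64_t so an overflow wraps instead of being undefined. */
    uint64_t acc = (uint64_t)((int64_t)a_bot * b_bot);
    acc += (uint64_t)((int64_t)a_top * b_top);
    acc += 0x80000000ULL;                  /* rounding constant: half of 2^32 */

    /* Bits [63:32] of the sum, reinterpreted as a signed 32-bit value. */
    return (int32_t)(uint32_t)(acc >> 32);
}
```

For example, a bottom product of 3 * 2^30 (0.75 in units of 2^32) rounds up to 1, while 1 * 2^30 (0.25) rounds down to 0.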
- __STATIC_FORCEINLINE long __RV_DSMXS32_U (unsigned long long a, unsigned long long b)
DSMXS32.u (64-bit SIMD 32-bit Signed Multiply Cross Subtraction With Rounding and Clip)
Type: DSP
Syntax:
DSMXS32.u Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications and subtract the products with rounding, then shift right by 32 bits and clip q63 to q31. The result is written to Rd.
Description
:
For the DSMXS32.u instruction, multiply the top 32-bit Q31 content of 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of 64-bit chunks in Rs2. At the same time, multiply the bottom 32-bit Q31 content of 64-bit chunks in Rs1 with the top 32-bit Q31 content of 64-bit chunks in Rs2. Then subtract the second product from the first, apply the additional rounding operation, shift the data right by 32 bits, and clip the 64-bit data to 32 bits. The result is written to Rd.
Operations:
Rd = (q31_t)((Rs1.W[x + 1] s* Rs2.W[x] - Rs1.W[x] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32); x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long type
- __STATIC_FORCEINLINE long __RV_DSMXA32_U (unsigned long long a, unsigned long long b)
DSMXA32.u (64-bit SIMD 32-bit Signed Cross Multiply Addition with Rounding and Clip)
Type: DSP
Syntax:
DSMXA32.u Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications and add the products with rounding, then shift right by 32 bits and clip q63 to q31. The result is written to Rd.
Description
:
For the DSMXA32.u instruction, multiply the top 32-bit Q31 content of 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of 64-bit chunks in Rs2. At the same time, multiply the bottom 32-bit Q31 content of 64-bit chunks in Rs1 with the top 32-bit Q31 content of 64-bit chunks in Rs2. Then add the two products, apply the additional rounding operation, shift the data right by 32 bits, and clip the 64-bit data to 32 bits. The result is written to Rd.
Operations:
Rd = (q31_t)((Rs1.W[x + 1] s* Rs2.W[x] + Rs1.W[x] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32); x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long type
- __STATIC_FORCEINLINE long __RV_DSMS32_U (unsigned long long a, unsigned long long b)
DSMS32.u (64-bit SIMD 32-bit Signed Multiply Subtraction with Rounding and Clip)
Type: DSP
Syntax:
DSMS32.u Rd, Rs1, Rs2
Purpose
:
Do two signed 32x32 multiplications and subtract the products with rounding, then shift right by 32 bits and clip q63 to q31. The result is written to Rd.
Description
:
For the DSMS32.u instruction, multiply the bottom 32-bit Q31 content of 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of 64-bit chunks in Rs2. At the same time, multiply the top 32-bit Q31 content of 64-bit chunks in Rs1 with the top 32-bit Q31 content of 64-bit chunks in Rs2. Then subtract the top product from the bottom product, apply the additional rounding operation, shift the data right by 32 bits, and clip the 64-bit data to 32 bits. The result is written to Rd.
Operations:
Rd = (q31_t)((Rs1.W[x] s* Rs2.W[x] - Rs1.W[x + 1] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32); x=0
- Parameters:
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long type
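The two subtracting forms, DSMS32.u and DSMXS32.u, share the same round-and-shift tail and differ only in operand pairing. A hedged sketch with hypothetical `ref_*` names, assuming wrap-around 64-bit arithmetic and a truncating final cast:

```c
#include <stdint.h>

/* Hypothetical helper: signed 32-bit element i (0 = bottom, 1 = top). */
static inline int32_t word32(uint64_t v, int i)
{
    return (int32_t)(uint32_t)(v >> (32 * i));
}

/* Hypothetical helper: add 0x80000000, then keep bits [63:32] as signed. */
static inline int32_t round_hi32(uint64_t acc)
{
    return (int32_t)(uint32_t)((acc + 0x80000000ULL) >> 32);
}

/* DSMS32.u model: bottom*bottom - top*top */
static inline int32_t ref_dsms32_u(uint64_t a, uint64_t b)
{
    uint64_t bb = (uint64_t)((int64_t)word32(a, 0) * word32(b, 0));
    uint64_t tt = (uint64_t)((int64_t)word32(a, 1) * word32(b, 1));
    return round_hi32(bb - tt);
}

/* DSMXS32.u model: top(Rs1)*bottom(Rs2) - bottom(Rs1)*top(Rs2) */
static inline int32_t ref_dsmxs32_u(uint64_t a, uint64_t b)
{
    uint64_t tb = (uint64_t)((int64_t)word32(a, 1) * word32(b, 0));
    uint64_t bt = (uint64_t)((int64_t)word32(a, 0) * word32(b, 1));
    return round_hi32(tb - bt);
}
```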
- __STATIC_FORCEINLINE long __RV_DSMADA16 (long long t, unsigned long long a, unsigned long long b)
DSMADA16 (Signed Multiply Two Halfs and Two Adds 32-bit)
Type: SIMD
Syntax:
DSMADA16 Rd, Rs1, Rs2
Purpose
:
Do two signed 16-bit multiplications for each 32-bit element of two registers, and then add the 32-bit results and the 32-bit value of an even/odd pair of registers together.
DSMADA16: rt pair + top*top + bottom*bottom
Description
:
This instruction multiplies the per 16-bit content of the 32-bit elements of Rs1 with the corresponding 16-bit content of the 32-bit elements of Rs2. The result is added to the 32-bit value of an even/odd pair of registers specified by Rd(4,1). The 32-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 32-bit value of the register-pair are treated as signed integers.
Operations:
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
Rd.W = Rd.W + SE32(Mres0[0][31:0]) + SE32(Mres1[0][31:0]) + SE32(Mres0[1][31:0]) + SE32(Mres1[1][31:0]);
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long type
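A minimal host-side sketch of the accumulation, assuming that `Rd.W` denotes the low 32-bit word of `t` and that the 32-bit addition wraps. `ref_dsmada16` is a hypothetical name, not an NMSIS function.

```c
#include <stdint.h>

/* Hypothetical host-side model of DSMADA16; not part of the NMSIS API. */
static inline int32_t ref_dsmada16(int64_t t, uint64_t a, uint64_t b)
{
    /* Assumption: only the low 32 bits of t take part in the accumulation. */
    uint32_t acc = (uint32_t)(uint64_t)t;
    for (int i = 0; i < 4; i++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        acc += (uint32_t)((int32_t)x * y);  /* unsigned add wraps modulo 2^32 */
    }
    return (int32_t)acc;
}
```

Accumulating in 64 bits and truncating afterwards would give the same low 32 bits, so the sketch does not depend on which of the two readings is intended.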
- __STATIC_FORCEINLINE long __RV_DSMAXDA16 (long long t, unsigned long long a, unsigned long long b)
DSMAXDA16 (Signed Crossed Multiply Two Halfs and Two Adds 32-bit)
Type: SIMD
Syntax:
DSMAXDA16 Rd, Rs1, Rs2
Purpose
:
Do two signed 16-bit cross multiplications for each 32-bit element of two registers, and then add the 32-bit results and the 32-bit value of an even/odd pair of registers together.
DSMAXDA16: rt pair + top*bottom + bottom*top (all 32-bit elements)
Description
:
This instruction multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 with unlimited precision. The result is added to the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is clipped to a 32-bit result.
Operations:
Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]);
Rd.W = Rd.W + SE32(Mres0[0][31:0]) + SE32(Mres1[0][31:0]) + SE32(Mres0[1][31:0]) + SE32(Mres1[1][31:0]);
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long type
- __STATIC_FORCEINLINE unsigned long long __RV_DKSMS32_U (unsigned long long t, unsigned long long a, unsigned long long b)
DKSMS32.u (Two Signed Multiply Shift-clip and Saturation with Rounding)
Type: SIMD
Syntax:
DKSMS32.u Rd, Rs1, Rs2
Purpose
:
Computes saturated multiplication of two pairs of q31 type with shifted rounding.
Description
:
Multiply the corresponding q31_t elements of Rs1 and Rs2 and extract bits [47:16] of each resulting 64-bit product with rounding (bits [47:15] plus 1, then dropping the lowest bit). The rounded 32-bit number is added to the corresponding element of Rd, and the result is saturated.
Operations:
Mres[x][63:0] = Rs1.W[x] s* Rs2.W[x];
Round[x][32:0] = Mres[x][47:15] + 1;
Rd.W[x] = sat.31(Rd.W[x] + Round[x][32:1]);
x=1...0
- Parameters:
t – [in] unsigned long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in unsigned long long type
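The bit-slice notation in the Operations can be hard to follow, so here is one possible reading as plain C. It assumes that `Round[x][32:1]` is interpreted as a signed 32-bit value and that `sat.31` means saturation to the Q31 range. `ref_dksms32_u` and `sat_q31` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a 64-bit value to the Q31 range. */
static inline int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical host-side model of DKSMS32.u; not part of the NMSIS API. */
static inline uint64_t ref_dksms32_u(uint64_t t, uint64_t a, uint64_t b)
{
    uint64_t out = 0;
    for (int x = 0; x < 2; x++) {
        int32_t ax = (int32_t)(uint32_t)(a >> (32 * x));
        int32_t bx = (int32_t)(uint32_t)(b >> (32 * x));
        int32_t tx = (int32_t)(uint32_t)(t >> (32 * x));
        int64_t mres = (int64_t)ax * bx;

        /* Round[32:0] = Mres[47:15] + 1, kept to 33 bits. */
        uint64_t round33 = (((uint64_t)mres >> 15) + 1) & 0x1FFFFFFFFULL;
        /* Round[32:1]: drop bit 0, read the rest as signed (assumption). */
        int32_t rnd = (int32_t)(uint32_t)(round33 >> 1);

        int32_t res = sat_q31((int64_t)tx + rnd);
        out |= (uint64_t)(uint32_t)res << (32 * x);
    }
    return out;
}
```

In this reading, a product of 3 * 2^15 (1.5 in units of 2^16) rounds to 2, and -3 * 2^15 (-1.5) rounds to -1.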
- __STATIC_FORCEINLINE long __RV_DMADA32 (long long t, unsigned long long a, unsigned long long b)
DMADA32 (Two Cross Signed 32x32 with 64-bit Add and Clip to 32-bit)
Type: SIMD
Syntax:
DMADA32 Rd, Rs1, Rs2
Purpose
:
Do two cross signed 32x32 multiplications and add the products to q63, then clip the q63 result to q31. The final result is written into Rd.
Description
:
For the DMADA32 instruction, it multiplies the top 32-bit element in Rs1 with the bottom 32-bit element in Rs2 and then adds the result to the result of multiplying the bottom 32-bit element in Rs1 with the top 32-bit element in Rs2, and then clips the q63 result to q31.
Operations:
res = (q31_t)((((q63_t) Rd.w[0] << 32) + (q63_t)Rs1.w[0] s* Rs2.w[1] + (q63_t)Rs1.w[1] s* Rs2.w[0]) s>> 32);
rd = res;
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long type
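Read literally, the Operations place the low word of Rd in the upper half of a 64-bit accumulator, add both cross products, and keep the upper 32 bits. The sketch below models that reading with a hypothetical `ref_dmada32`, assuming wrap-around 64-bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical host-side model of DMADA32; not part of the NMSIS API. */
static inline int32_t ref_dmada32(int64_t t, uint64_t a, uint64_t b)
{
    int32_t a_bot = (int32_t)(uint32_t)a, a_top = (int32_t)(uint32_t)(a >> 32);
    int32_t b_bot = (int32_t)(uint32_t)b, b_top = (int32_t)(uint32_t)(b >> 32);

    /* (q63_t)Rd.w[0] << 32: the low word of t becomes bits [63:32]. */
    uint64_t acc = (uint64_t)(uint32_t)(uint64_t)t << 32;
    acc += (uint64_t)((int64_t)a_bot * b_top);   /* Rs1.w[0] * Rs2.w[1] */
    acc += (uint64_t)((int64_t)a_top * b_bot);   /* Rs1.w[1] * Rs2.w[0] */

    /* Keep bits [63:32] and reinterpret them as a signed 32-bit value. */
    return (int32_t)(uint32_t)(acc >> 32);
}
```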
- __STATIC_FORCEINLINE long long __RV_DSMALBB (long long t, unsigned long long a, unsigned long long b)
DSMALBB (Signed Multiply Bottom Halfs & Add 64-bit)
Type: SIMD
Syntax:
DSMALBB Rd, Rs1, Rs2
Purpose
:
Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results with a 64-bit value of an even/odd pair of registers. The addition result is written back to the register-pair.
DSMALBB: rt pair + bottom*bottom (all 32-bit elements)
Description
:
For the DSMALBB instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The multiplication results are added with the 64-bit value of Rd. The 64-bit addition result is written back to Rd.
Operations:
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[0];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[0];
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMALBT (long long t, unsigned long long a, unsigned long long b)
DSMALBT (Signed Multiply Bottom Half & Top Half & Add 64-bit)
Type: SIMD
Syntax:
DSMALBT Rd, Rs1, Rs2
Purpose
:
Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results with a 64-bit value of an even/odd pair of registers. The addition result is written back to the register-pair.
DSMALBT: rt pair + bottom*top (all 32-bit elements)
Description
:
For the DSMALBT instruction, it multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are added with the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.
Operations:
Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[1];
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DSMALTT (long long t, unsigned long long a, unsigned long long b)
DSMALTT (Signed Multiply Top Half & Add 64-bit)
Type: SIMD
Syntax:
DSMALTT Rd, Rs1, Rs2
Purpose
:
Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results with a 64-bit value of an even/odd pair of registers. The addition result is written back to the register-pair.
DSMALTT: rt pair + top*top (all 32-bit elements)
Description
:
For the DSMALTT instruction, it multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are added with the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.
Operations:
Mres[0][31:0] = Rs1.W[0].H[1] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[1] * Rs2.W[1].H[1];
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
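DSMALBB, DSMALBT and DSMALTT differ only in which halfword of each 32-bit element is selected. One way to model all three on a host is a single helper parameterised by the halfword index. The names below are hypothetical, and wrap-around 64-bit accumulation is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: signed halfword i (0..3) of a 64-bit operand. */
static inline int16_t half16(uint64_t v, int i)
{
    return (int16_t)(uint16_t)(v >> (16 * i));
}

/* a_half / b_half pick the halfword inside each 32-bit element:
 * 0 = bottom, 1 = top. */
static inline int64_t ref_dsmal(int64_t t, uint64_t a, uint64_t b,
                                int a_half, int b_half)
{
    int64_t sum = 0;
    for (int w = 0; w < 2; w++)
        sum += (int32_t)half16(a, 2 * w + a_half) * half16(b, 2 * w + b_half);
    /* Assumption: the 64-bit accumulator wraps on overflow. */
    return (int64_t)((uint64_t)t + (uint64_t)sum);
}

static inline int64_t ref_dsmalbb(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dsmal(t, a, b, 0, 0);
}

static inline int64_t ref_dsmalbt(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dsmal(t, a, b, 0, 1);
}

static inline int64_t ref_dsmaltt(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dsmal(t, a, b, 1, 1);
}
```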
- __STATIC_FORCEINLINE long long __RV_DKMABB32 (long long t, unsigned long long a, unsigned long long b)
DKMABB32 (Saturating Signed Multiply Bottom Words & Add)
Type: SIMD
Syntax:
DKMABB32 Rd, Rs1, Rs2
Purpose
:
Multiply the signed 32-bit element in a register with the 32-bit element in another register and add the result to the content of 64-bit data in the third register. The addition result may be saturated and is written to the third register.
DKMABB32: rd + bottom*bottom
Description
:
For the DKMABB32 instruction, it multiplies the bottom 32-bit element in Rs1 with the bottom 32-bit element in Rs2. The multiplication result is added to the content of 64-bit data in Rd. If the addition result is beyond the Q63 number range (-2^63 <= Q63 <= 2^63-1), it is saturated to the range and the OV bit is set to 1. The result after saturation is written to Rd. The 32-bit contents of Rs1 and Rs2 are treated as signed integers.
Operations:
res = Rd + (Rs1.W[0] * Rs2.W[0]);
if (res > (2^63)-1) {
  res = (2^63)-1; OV = 1;
} else if (res < -2^63) {
  res = -2^63; OV = 1;
}
Rd = res;
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMABT32 (long long t, unsigned long long a, unsigned long long b)
DKMABT32 (Saturating Signed Multiply Bottom & Top Words & Add)
Type: SIMD
Syntax:
DKMABT32 Rd, Rs1, Rs2
Purpose
:
Multiply the signed 32-bit element in a register with the 32-bit element in another register and add the result to the content of 64-bit data in the third register. The addition result may be saturated and is written to the third register.
DKMABT32: rd + bottom*top
Description
:
For the DKMABT32 instruction, it multiplies the bottom 32-bit element in Rs1 with the top 32-bit element in Rs2. The multiplication result is added to the content of 64-bit data in Rd. If the addition result is beyond the Q63 number range (-2^63 <= Q63 <= 2^63-1), it is saturated to the range and the OV bit is set to 1. The result after saturation is written to Rd. The 32-bit contents of Rs1 and Rs2 are treated as signed integers.
Operations:
res = Rd + (Rs1.W[0] * Rs2.W[1]);
if (res > (2^63)-1) {
  res = (2^63)-1; OV = 1;
} else if (res < -2^63) {
  res = -2^63; OV = 1;
}
Rd = res;
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
- __STATIC_FORCEINLINE long long __RV_DKMATT32 (long long t, unsigned long long a, unsigned long long b)
DKMATT32 (Saturating Signed Multiply Top Words & Add)
Type: SIMD
Syntax:
DKMATT32 Rd, Rs1, Rs2
Purpose
:
Multiply the signed 32-bit element in a register with the 32-bit element in another register and add the result to the content of 64-bit data in the third register. The addition result may be saturated and is written to the third register.
DKMATT32: rd + top*top
Description
:
For the DKMATT32 instruction, it multiplies the top 32-bit element in Rs1 with the top 32-bit element in Rs2. The multiplication result is added to the content of 64-bit data in Rd. If the addition result is beyond the Q63 number range (-2^63 <= Q63 <= 2^63-1), it is saturated to the range and the OV bit is set to 1. The result after saturation is written to Rd. The 32-bit contents of Rs1 and Rs2 are treated as signed integers.
Operations:
res = Rd + (Rs1.W[1] * Rs2.W[1]);
if (res > (2^63)-1) {
  res = (2^63)-1; OV = 1;
} else if (res < -2^63) {
  res = -2^63; OV = 1;
}
Rd = res;
- Parameters:
t – [in] long long type of value stored in t
a – [in] unsigned long long type of value stored in a
b – [in] unsigned long long type of value stored in b
- Returns:
value stored in long long type
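The saturate-and-flag behaviour shared by DKMABB32, DKMABT32 and DKMATT32 can be sketched on a host as follows. The `kma_result_t` struct and the `ref_*` functions are hypothetical; in particular, returning the OV bit as a struct field is only an illustration device, since the intrinsics return just the 64-bit value.

```c
#include <stdint.h>

/* Hypothetical result type: saturated value plus a model of the OV bit. */
typedef struct { int64_t value; int ov; } kma_result_t;

/* Saturating add of t and the signed product x*y, without overflowing. */
static inline kma_result_t ref_dkma32(int64_t t, int32_t x, int32_t y)
{
    int64_t prod = (int64_t)x * y;
    kma_result_t r = { 0, 0 };
    if (prod > 0 && t > INT64_MAX - prod) {
        r.value = INT64_MAX; r.ov = 1;     /* above 2^63-1 */
    } else if (prod < 0 && t < INT64_MIN - prod) {
        r.value = INT64_MIN; r.ov = 1;     /* below -2^63 */
    } else {
        r.value = t + prod;
    }
    return r;
}

static inline int32_t kma_bot(uint64_t v) { return (int32_t)(uint32_t)v; }
static inline int32_t kma_top(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

static inline kma_result_t ref_dkmabb32(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dkma32(t, kma_bot(a), kma_bot(b));
}

static inline kma_result_t ref_dkmabt32(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dkma32(t, kma_bot(a), kma_top(b));
}

static inline kma_result_t ref_dkmatt32(int64_t t, uint64_t a, uint64_t b)
{
    return ref_dkma32(t, kma_top(a), kma_top(b));
}
```

The range checks are done before the addition so that the sketch itself never relies on signed overflow.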