Nuclei N3 SIMD DSP Additional Instructions

__STATIC_FORCEINLINE unsigned long long __RV_DKMMAC (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMMAC_U (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMMSB (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMMSB_U (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMADA (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMAXDA (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMADS (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMADRS (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMAXDS (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMSDA (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKMSXDA (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DSMAQA (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DSMAQA_SU (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DUMAQA (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMDA32 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMXDA32 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMADA32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMAXDA32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMADS32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMADRS32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMAXDS32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMSDA32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMSXDA32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMDS32 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMDRS32 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMXDS32 (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALDA (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALXDA (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALDS (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALDRS (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALXDS (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMSLDA (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMSLXDA (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DDSMAQA (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DDSMAQA_SU (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DDUMAQA (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long __RV_DSMA32_U (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long __RV_DSMXS32_U (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long __RV_DSMXA32_U (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long __RV_DSMS32_U (unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long __RV_DSMADA16 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long __RV_DSMAXDA16 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKSMS32_U (unsigned long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long __RV_DMADA32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALBB (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALBT (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DSMALTT (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMABB32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMABT32 (long long t, unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE long long __RV_DKMATT32 (long long t, unsigned long long a, unsigned long long b)
group Nuclei N3 SIMD DSP Additional Instructions

(RV32 only) Nuclei Customized N3 DSP Instructions

These are Nuclei customized N3 DSP instructions, available only on RV32.

Functions

__STATIC_FORCEINLINE unsigned long long __RV_DKMMAC (unsigned long long t, unsigned long long a, unsigned long long b)

DKMMAC (64-bit MSW 32x32 Signed Multiply and Saturating Add)

Type: SIMD

Syntax:

DKMMAC Rd, Rs1, Rs2
# Rd, Rs1, and Rs2 are all even/odd register pairs

Purpose:

Do MSW 32x32 element signed multiplications and saturating additions simultaneously. The results are written into Rd.

Description:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and adds the most significant 32 bits of each 64-bit product to the corresponding signed 32-bit element of Rd. If an addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds the most significant 32 bits of each 64-bit product by adding a 1 to bit 31 of the product.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
   res = sat.q31(dop + (aop s* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type
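The operation above can be sketched as a portable C reference model, which may help when checking results on a host machine. This is only a sketch under stated assumptions: `sat_q31`, `kmmac_lane`, and `ref_dkmmac` are hypothetical helper names (not part of NMSIS), the OV flag is not modeled, and `>>` on a negative value is assumed to be an arithmetic shift, as it is on GCC and Clang.

```c
#include <stdint.h>

/* Clamp a wide intermediate to the Q31 range [-2^31, 2^31 - 1]. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* One 32-bit lane: acc + bits [63:32] of the signed 64-bit product. */
static int32_t kmmac_lane(int32_t acc, int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    int64_t msw = prod >> 32; /* assumes arithmetic right shift */
    return sat_q31((int64_t)acc + msw);
}

/* Hypothetical host-side model of __RV_DKMMAC (OV flag not modeled). */
static uint64_t ref_dkmmac(uint64_t t, uint64_t a, uint64_t b)
{
    int32_t top = kmmac_lane((int32_t)(uint32_t)(t >> 32),
                             (int32_t)(uint32_t)(a >> 32),
                             (int32_t)(uint32_t)(b >> 32));
    int32_t bot = kmmac_lane((int32_t)(uint32_t)t,
                             (int32_t)(uint32_t)a,
                             (int32_t)(uint32_t)b);
    return ((uint64_t)(uint32_t)top << 32) | (uint32_t)bot;
}
```

Each lane is widened to 64 bits before the addition, so the saturation check sees the exact sum rather than a wrapped one.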

__STATIC_FORCEINLINE unsigned long long __RV_DKMMAC_U (unsigned long long t, unsigned long long a, unsigned long long b)

DKMMAC.u (64-bit MSW 32x32 Signed Multiply with Rounding and Saturating Add)

Type: SIMD

Syntax:

DKMMAC.u Rd, Rs1, Rs2
# Rd, Rs1, and Rs2 are all even/odd register pairs

Purpose:

Do MSW 32x32 element signed multiplications with rounding and saturating additions simultaneously. The results are written into Rd.

Description:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and adds the rounded most significant 32 bits of each 64-bit product to the corresponding signed 32-bit element of Rd. Rounding is done by adding a 1 to bit 31 of the 64-bit product before the most significant 32 bits are taken. If an addition result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  res = sat.q31(dop + RUND(aop s* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type
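The rounding step of the .u form can be illustrated with a small C sketch. It follows the description (signed operands, a 1 added at bit 31 of the 64-bit product before bits [63:32] are kept). The names `msw_round` and `ref_dkmmac_u` are hypothetical, the OV flag is not modeled, and an arithmetic right shift of negative values is assumed.

```c
#include <stdint.h>

/* Clamp a wide intermediate to the Q31 range. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Rounded MSW: add 1 at bit 31 of the product, then keep bits [63:32]. */
static int32_t msw_round(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    return (int32_t)((prod + ((int64_t)1 << 31)) >> 32);
}

/* Hypothetical host-side model of __RV_DKMMAC_U (signed operands assumed). */
static uint64_t ref_dkmmac_u(uint64_t t, uint64_t a, uint64_t b)
{
    int32_t top = sat_q31((int64_t)(int32_t)(uint32_t)(t >> 32) +
                          msw_round((int32_t)(uint32_t)(a >> 32),
                                    (int32_t)(uint32_t)(b >> 32)));
    int32_t bot = sat_q31((int64_t)(int32_t)(uint32_t)t +
                          msw_round((int32_t)(uint32_t)a,
                                    (int32_t)(uint32_t)b));
    return ((uint64_t)(uint32_t)top << 32) | (uint32_t)bot;
}
```

For example, 3 x 2^30 has a truncated MSW of 0 but a rounded MSW of 1, while 1 x 2^30 rounds to 0.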

__STATIC_FORCEINLINE unsigned long long __RV_DKMMSB (unsigned long long t, unsigned long long a, unsigned long long b)

DKMMSB (64-bit MSW 32x32 Signed Multiply and Saturating Sub)

Type: SIMD

Syntax:

DKMMSB Rd, Rs1, Rs2
# Rd, Rs1, and Rs2 are all even/odd register pairs

Purpose:

Do MSW 32x32 element signed multiplications and saturating subtractions simultaneously. The results are written into Rd.

Description:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and subtracts the most significant 32 bits of each 64-bit product from the corresponding signed 32-bit element of Rd. If a subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd. The .u form of the instruction additionally rounds the most significant 32 bits of each 64-bit product by adding a 1 to bit 31 of the product.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
   res = sat.q31(dop - (aop s* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE unsigned long long __RV_DKMMSB_U (unsigned long long t, unsigned long long a, unsigned long long b)

DKMMSB.u (64-bit MSW 32x32 Signed Multiply with Rounding and Saturating Sub)

Type: SIMD

Syntax:

DKMMSB.u Rd, Rs1, Rs2
# Rd, Rs1, and Rs2 are all even/odd register pairs

Purpose:

Do MSW 32x32 element signed multiplications with rounding and saturating subtractions simultaneously. The results are written into Rd.

Description:

This instruction multiplies the signed 32-bit elements of Rs1 with the signed 32-bit elements of Rs2 and subtracts the rounded most significant 32 bits of each 64-bit product from the corresponding signed 32-bit element of Rd. Rounding is done by adding a 1 to bit 31 of the 64-bit product before the most significant 32 bits are taken. If a subtraction result is beyond the Q31 number range (-2^31 <= Q31 <= 2^31-1), it is saturated to the range and the OV bit is set to 1. The results after saturation are written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom
for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
   res = sat.q31(dop - RUND(aop s* bop)[63:32]);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE unsigned long long __RV_DKMADA (unsigned long long t, unsigned long long a, unsigned long long b)

DKMADA (Saturating Signed Multiply Two Halfs and Two Adds)

Type: DSP

Syntax:

DKMADA Rd, Rs1, Rs2

Purpose:

Do two 16x16 multiplications with 32-bit signed double addition simultaneously. The results are written into Rd.

Description:

It multiplies the bottom 16-bit content of each 32-bit element in Rs1 with the bottom 16-bit content of the corresponding 32-bit element in Rs2 and then adds the result to the result of multiplying the top 16-bit content of the 32-bit element in Rs1 with the top 16-bit content of the 32-bit element in Rs2. The sum of the two products is added to the corresponding 32-bit element of Rd, and the result is saturated to the Q31 range before it is written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop + mul1 + mul2);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type
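A minimal C sketch of the per-lane arithmetic, following the Operations listing above, is shown here for host-side checking. The names `kmada_lane` and `ref_dkmada` are hypothetical, and the model ignores any status flag the hardware may set on saturation.

```c
#include <stdint.h>

/* Clamp a wide intermediate to the Q31 range. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* One 32-bit lane: acc + H[1]*H[1] + H[0]*H[0], saturated to Q31. */
static int32_t kmada_lane(int32_t acc, uint32_t a, uint32_t b)
{
    int32_t mul1 = (int32_t)(int16_t)(uint16_t)(a >> 16) *
                   (int16_t)(uint16_t)(b >> 16);          /* top x top */
    int32_t mul2 = (int32_t)(int16_t)(uint16_t)a *
                   (int16_t)(uint16_t)b;                  /* bottom x bottom */
    return sat_q31((int64_t)acc + mul1 + mul2);
}

/* Hypothetical host-side model of __RV_DKMADA. */
static uint64_t ref_dkmada(uint64_t t, uint64_t a, uint64_t b)
{
    int32_t top = kmada_lane((int32_t)(uint32_t)(t >> 32),
                             (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    int32_t bot = kmada_lane((int32_t)(uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)(uint32_t)top << 32) | (uint32_t)bot;
}
```

The only input that saturates with a zero accumulator is 0x8000 in all four halfwords of a lane, since 2^30 + 2^30 exceeds the Q31 maximum by one.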

__STATIC_FORCEINLINE unsigned long long __RV_DKMAXDA (unsigned long long t, unsigned long long a, unsigned long long b)

DKMAXDA (Two Cross 16x16 with 32-bit Signed Double Add)

Type: DSP

Syntax:

DKMAXDA Rd, Rs1, Rs2

Purpose:

Do two cross 16x16 multiplications with 32-bit signed double addition simultaneously. The results are written into Rd.

Description:

It multiplies the top 16-bit content of each 32-bit element in Rs1 with the bottom 16-bit content of the corresponding 32-bit element in Rs2 and then adds the result to the result of multiplying the bottom 16-bit content of the 32-bit element in Rs1 with the top 16-bit content of the 32-bit element in Rs2. The sum of the two products is added to the corresponding 32-bit element of Rd, and the result is saturated to the Q31 range before it is written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[0];
  mul2 = aop.H[0] s* bop.H[1];
  res = sat.q31(dop + mul1 + mul2);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE unsigned long long __RV_DKMADS (unsigned long long t, unsigned long long a, unsigned long long b)

DKMADS (Two 16x16 with 32-bit Signed Add and Sub)

Type: DSP

Syntax:

DKMADS Rd, Rs1, Rs2

Purpose:

Do two 16x16 multiplications with 32-bit signed addition and subtraction simultaneously. The results are written into Rd.

Description:

It multiplies the bottom 16-bit content of each 32-bit element in Rs1 with the bottom 16-bit content of the corresponding 32-bit element in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit element in Rs1 with the top 16-bit content of the 32-bit element in Rs2. The difference is added to the corresponding 32-bit element of Rd, and the result is saturated to the Q31 range before it is written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop + mul1 - mul2);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE unsigned long long __RV_DKMADRS (unsigned long long t, unsigned long long a, unsigned long long b)

DKMADRS (Two 16x16 with 32-bit Signed Add and Reversed Sub)

Type: DSP

Syntax:

DKMADRS Rd, Rs1, Rs2

Purpose:

Do two 16x16 multiplications with 32-bit signed addition and reversed subtraction simultaneously. The results are written into Rd.

Description:

It multiplies the top 16-bit content of each 32-bit element in Rs1 with the top 16-bit content of the corresponding 32-bit element in Rs2 and then subtracts the result from the result of multiplying the bottom 16-bit content of the 32-bit element in Rs1 with the bottom 16-bit content of the 32-bit element in Rs2. The difference is added to the corresponding 32-bit element of Rd, and the result is saturated to the Q31 range before it is written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop - mul1 + mul2);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE unsigned long long __RV_DKMAXDS (unsigned long long t, unsigned long long a, unsigned long long b)

DKMAXDS (Saturating Signed Crossed Multiply Two Halfs & Subtract & Add)

Type: DSP

Syntax:

DKMAXDS Rd, Rs1, Rs2

Purpose:

Do two cross 16x16 multiplications with 32-bit signed addition and subtraction simultaneously. The results are written into Rd.

Description:

It multiplies the bottom 16-bit content of each 32-bit element in Rs1 with the top 16-bit content of the corresponding 32-bit element in Rs2 and then subtracts the result from the result of multiplying the top 16-bit content of the 32-bit element in Rs1 with the bottom 16-bit content of the 32-bit element in Rs2. The difference is added to the corresponding 32-bit element of Rd. The addition result may be saturated to the Q31 range.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[0];
  mul2 = aop.H[0] s* bop.H[1];
  res = sat.q31(dop + mul1 - mul2);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type
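The crossed multiply, subtract, and accumulate sequence from the Operations listing can be sketched in C as below. The helper names `kmaxds_lane` and `ref_dkmaxds` are hypothetical, and status flags are not modeled.

```c
#include <stdint.h>

/* Clamp a wide intermediate to the Q31 range. */
static int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* One 32-bit lane: acc + a.H[1]*b.H[0] - a.H[0]*b.H[1], saturated to Q31. */
static int32_t kmaxds_lane(int32_t acc, uint32_t a, uint32_t b)
{
    int32_t mul1 = (int32_t)(int16_t)(uint16_t)(a >> 16) *
                   (int16_t)(uint16_t)b;                  /* top x bottom */
    int32_t mul2 = (int32_t)(int16_t)(uint16_t)a *
                   (int16_t)(uint16_t)(b >> 16);          /* bottom x top */
    return sat_q31((int64_t)acc + mul1 - mul2);
}

/* Hypothetical host-side model of __RV_DKMAXDS. */
static uint64_t ref_dkmaxds(uint64_t t, uint64_t a, uint64_t b)
{
    int32_t top = kmaxds_lane((int32_t)(uint32_t)(t >> 32),
                              (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    int32_t bot = kmaxds_lane((int32_t)(uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)(uint32_t)top << 32) | (uint32_t)bot;
}
```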

__STATIC_FORCEINLINE unsigned long long __RV_DKMSDA (unsigned long long t, unsigned long long a, unsigned long long b)

DKMSDA (Two 16x16 with 32-bit Signed Double Sub)

Type: DSP

Syntax:

DKMSDA Rd, Rs1, Rs2

Purpose:

Do two 16x16 multiplications with 32-bit signed double subtraction simultaneously. The results are written into Rd.

Description:

It multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. Both products are subtracted from the corresponding 32-bit element of Rd, and the result is saturated to the Q31 range before it is written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[1];
  mul2 = aop.H[0] s* bop.H[0];
  res = sat.q31(dop - mul1 - mul2);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE unsigned long long __RV_DKMSXDA (unsigned long long t, unsigned long long a, unsigned long long b)

DKMSXDA (Two Cross 16x16 with 32-bit Signed Double Sub)

Type: DSP

Syntax:

DKMSXDA Rd, Rs1, Rs2

Purpose:

Do two cross 16x16 multiplications with 32-bit signed double subtraction simultaneously. The results are written into Rd.

Description:

It multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2 and multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. Both products are subtracted from the corresponding 32-bit element of Rd, and the result is saturated to the Q31 range before it is written to Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  mul1 = aop.H[1] s* bop.H[0];
  mul2 = aop.H[0] s* bop.H[1];
  res = sat.q31(dop - mul1 - mul2);
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE unsigned long long __RV_DSMAQA (unsigned long long t, unsigned long long a, unsigned long long b)

DSMAQA (Four Signed 8x8 with 32-bit Signed Add)

Type: DSP

Syntax:

DSMAQA Rd, Rs1, Rs2

Purpose:

Do four signed 8x8 multiplications with 32-bit signed addition simultaneously. The results are written into Rd.

Description:

This instruction multiplies the four signed 8-bit elements of each 32-bit chunk of Rs1 with the four signed 8-bit elements of the corresponding 32-bit chunk of Rs2 and then adds the four results together with the signed content of the corresponding 32-bit chunk of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  m0 = aop.B[0] s* bop.B[0];
  m1 = aop.B[1] s* bop.B[1];
  m2 = aop.B[2] s* bop.B[2];
  m3 = aop.B[3] s* bop.B[3];
  res = dop + m0 + m1 + m2 + m3;
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type
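The four-way byte dot product per 32-bit chunk can be sketched in C as follows. The names `smaqa_lane` and `ref_dsmaqa` are hypothetical. The Operations listing shows no saturation, so this sketch assumes the accumulation wraps modulo 2^32.

```c
#include <stdint.h>

/* One 32-bit chunk: acc + sum of four signed 8x8 products (wraps mod 2^32). */
static uint32_t smaqa_lane(uint32_t acc, uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        sum += (int32_t)x * y;
    }
    /* Unsigned addition gives two's complement wraparound without UB. */
    return acc + (uint32_t)sum;
}

/* Hypothetical host-side model of __RV_DSMAQA. */
static uint64_t ref_dsmaqa(uint64_t t, uint64_t a, uint64_t b)
{
    uint32_t top = smaqa_lane((uint32_t)(t >> 32),
                              (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    uint32_t bot = smaqa_lane((uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)top << 32) | bot;
}
```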

__STATIC_FORCEINLINE unsigned long long __RV_DSMAQA_SU (unsigned long long t, unsigned long long a, unsigned long long b)

DSMAQA.SU (Four Signed 8 x Unsigned 8 with 32-bit Signed Add)

Type: DSP

Syntax:

DSMAQA.SU Rd, Rs1, Rs2

Purpose:

Do four signed 8 x unsigned 8 multiplications with 32-bit signed addition simultaneously. The results are written into Rd.

Description:

This instruction multiplies the four signed 8-bit elements of each 32-bit chunk of Rs1 with the four unsigned 8-bit elements of the corresponding 32-bit chunk of Rs2 and then adds the four results together with the signed content of the corresponding 32-bit chunk of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  m0 = aop.B[0] su* bop.B[0];
  m1 = aop.B[1] su* bop.B[1];
  m2 = aop.B[2] su* bop.B[2];
  m3 = aop.B[3] su* bop.B[3];
  res = dop + m0 + m1 + m2 + m3;
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type
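The mixed-sign variant differs from the all-signed one only in how the bytes are interpreted. The sketch below assumes, as the `su*` operator and the instruction title suggest, that the Rs1 bytes are signed and the Rs2 bytes are unsigned. The names `smaqa_su_lane` and `ref_dsmaqa_su` are hypothetical, and wraparound accumulation is assumed.

```c
#include <stdint.h>

/* One 32-bit chunk: acc + sum of four (signed a byte) x (unsigned b byte). */
static uint32_t smaqa_su_lane(uint32_t acc, uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (int i = 0; i < 4; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));   /* signed operand */
        uint8_t y = (uint8_t)(b >> (8 * i));          /* unsigned operand */
        sum += (int32_t)x * (int32_t)y;
    }
    return acc + (uint32_t)sum; /* wraps modulo 2^32 */
}

/* Hypothetical host-side model of __RV_DSMAQA_SU. */
static uint64_t ref_dsmaqa_su(uint64_t t, uint64_t a, uint64_t b)
{
    uint32_t top = smaqa_su_lane((uint32_t)(t >> 32),
                                 (uint32_t)(a >> 32), (uint32_t)(b >> 32));
    uint32_t bot = smaqa_su_lane((uint32_t)t, (uint32_t)a, (uint32_t)b);
    return ((uint64_t)top << 32) | bot;
}
```

The byte 0xFF shows the difference clearly: it reads as -1 on the Rs1 side and as 255 on the Rs2 side, so their product is -255.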

__STATIC_FORCEINLINE unsigned long long __RV_DUMAQA (unsigned long long t, unsigned long long a, unsigned long long b)

DUMAQA (Four Unsigned 8x8 with 32-bit Unsigned Add)

Type: DSP

Syntax:

DUMAQA Rd, Rs1, Rs2

Purpose:

Do four unsigned 8x8 multiplications with 32-bit unsigned addition simultaneously. The results are written into Rd.

Description:

This instruction multiplies the four unsigned 8-bit elements of each 32-bit chunk of Rs1 with the four unsigned 8-bit elements of the corresponding 32-bit chunk of Rs2 and then adds the four results together with the unsigned content of the corresponding 32-bit chunk of Rd. The final results are written back to the corresponding 32-bit chunks in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; op3t = Rd.W[x+1] // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; op3b = Rd.W[x] // bottom

for ((aop,bop,dop,res) in [(op1t,op2t,op3t,rest), (op1b,op2b,op3b,resb)]) {
  m0 = aop.B[0] u* bop.B[0];
  m1 = aop.B[1] u* bop.B[1];
  m2 = aop.B[2] u* bop.B[2];
  m3 = aop.B[3] u* bop.B[3];
  res = dop + m0 + m1 + m2 + m3;
}
Rd = concat(rest, resb);
x=0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type

__STATIC_FORCEINLINE long long __RV_DKMDA32 (unsigned long long a, unsigned long long b)

DKMDA32 (Two Signed 32x32 with 64-bit Saturation Add)

Type: DSP

Syntax:

DKMDA32 Rd, Rs1, Rs2

Purpose:

Do two signed 32x32 multiplications and add the signed multiplication results with Q63 saturation. The result is written into Rd.

Description:

It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and then adds the result to the result of multiplying the top 32-bit element of Rs1 with the top 32-bit element of Rs2. The sum is saturated to the Q63 range.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(t0 + t1);
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
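A host-side C sketch of this operation is given below, assuming the result is the Q63-saturated sum of the two 64-bit products, as the purpose states. The names `sat_add_q63` and `ref_dkmda32` are hypothetical.

```c
#include <stdint.h>

/* Saturating 64-bit addition, checked before adding to avoid signed overflow. */
static int64_t sat_add_q63(int64_t x, int64_t y)
{
    if (y > 0 && x > INT64_MAX - y) return INT64_MAX;
    if (y < 0 && x < INT64_MIN - y) return INT64_MIN;
    return x + y;
}

/* Hypothetical host-side model of __RV_DKMDA32. */
static int64_t ref_dkmda32(uint64_t a, uint64_t b)
{
    /* bottom x bottom and top x top, each an exact 64-bit signed product */
    int64_t t0 = (int64_t)(int32_t)(uint32_t)a * (int32_t)(uint32_t)b;
    int64_t t1 = (int64_t)(int32_t)(uint32_t)(a >> 32) *
                 (int32_t)(uint32_t)(b >> 32);
    return sat_add_q63(t0, t1);
}
```

Saturation can only trigger when all four 32-bit inputs are 0x80000000, because each product is then 2^62 and their sum is 2^63.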

__STATIC_FORCEINLINE long long __RV_DKMXDA32 (unsigned long long a, unsigned long long b)

DKMXDA32 (Two Cross Signed 32x32 with 64-bit Saturation Add)

Type: DSP

Syntax:

DKMXDA32 Rd, Rs1, Rs2

Purpose:

Do two cross signed 32x32 multiplications and add the signed multiplication results with Q63 saturation. The result is written into Rd.

Description:

It multiplies the bottom 32-bit element of Rs1 with the top 32-bit element of Rs2 and then adds the result to the result of multiplying the top 32-bit element of Rs1 with the bottom 32-bit element of Rs2. The sum is saturated to the Q63 range.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = sat.q63(t01 + t10);
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DKMADA32 (long long t, unsigned long long a, unsigned long long b)

DKMADA32 (Two Signed 32x32 with 64-bit Saturation Add)

Type: DSP

Syntax:

DKMADA32 Rd, Rs1, Rs2

Purpose:

Do two signed 32x32 multiplications and add the signed multiplication results and a third register with Q63 saturation. The result is written into Rd.

Description:

It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and then adds the result to the result of multiplying the top 32-bit element of Rs1 with the top 32-bit element of Rs2. The sum of the two products is added to the 64-bit content of Rd with Q63 saturation.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd + t0 + t1);
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DKMAXDA32 (long long t, unsigned long long a, unsigned long long b)

DKMAXDA32 (Two Cross Signed 32x32 with 64-bit Saturation Add)

Type: DSP

Syntax:

DKMAXDA32 Rd, Rs1, Rs2

Purpose:

Do two cross signed 32x32 multiplications and add the signed multiplication results and a third register with Q63 saturation. The result is written into Rd.

Description:

It multiplies the top 32-bit element in Rs1 with the bottom 32-bit element in Rs2 and then adds the result to the result of multiplying the bottom 32-bit element in Rs1 with the top 32-bit element in Rs2. The sum of the two products is added to the 64-bit content of Rd with Q63 saturation.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = sat.q63(Rd + t01 + t10);
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
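The accumulate forms add three 64-bit terms before a single saturation, which cannot be done safely with plain `int64_t` additions. The sketch below keeps the exact sum as a 128-bit (hi, lo) pair and clamps once at the end. It assumes the hardware saturates the exact three-term sum, which the listing `sat.q63(Rd + t01 + t10)` suggests but does not spell out. The names `sat_add3_q63` and `ref_dkmaxda32` are hypothetical.

```c
#include <stdint.h>

/* Exact acc + p0 + p1 as a 128-bit (hi, lo) pair, clamped once to int64_t. */
static int64_t sat_add3_q63(int64_t acc, int64_t p0, int64_t p1)
{
    const int64_t terms[3] = { acc, p0, p1 };
    uint64_t lo = 0;
    int64_t hi = 0;
    for (int i = 0; i < 3; i++) {
        uint64_t prev = lo;
        lo += (uint64_t)terms[i];
        /* sign extension of the term plus the carry out of the low word */
        hi += (terms[i] < 0 ? -1 : 0) + (lo < prev ? 1 : 0);
    }
    /* The sum fits when hi is the sign extension of bit 63 of lo.
       The uint64_t to int64_t cast relies on two's complement wraparound. */
    if (hi == 0 && (lo >> 63) == 0) return (int64_t)lo;
    if (hi == -1 && (lo >> 63) == 1) return (int64_t)lo;
    return hi < 0 ? INT64_MIN : INT64_MAX;
}

/* Hypothetical host-side model of __RV_DKMAXDA32. */
static int64_t ref_dkmaxda32(int64_t t, uint64_t a, uint64_t b)
{
    int32_t a_top = (int32_t)(uint32_t)(a >> 32), a_bot = (int32_t)(uint32_t)a;
    int32_t b_top = (int32_t)(uint32_t)(b >> 32), b_bot = (int32_t)(uint32_t)b;
    int64_t t01 = (int64_t)a_bot * b_top;  /* bottom x top */
    int64_t t10 = (int64_t)a_top * b_bot;  /* top x bottom */
    return sat_add3_q63(t, t01, t10);
}
```

Clamping once matters: with an accumulator of INT64_MAX and terms +5 and -10, the exact sum is INT64_MAX - 5, whereas saturating after each addition would give INT64_MAX - 10.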

__STATIC_FORCEINLINE long long __RV_DKMADS32 (long long t, unsigned long long a, unsigned long long b)

DKMADS32 (Two Signed 32x32 with 64-bit Saturation Add and Sub)

Type: DSP

Syntax:

DKMADS32 Rd, Rs1, Rs2

Purpose:

Do two signed 32x32 multiplications, subtract the bottom signed multiplication result from the top signed multiplication result, and add the difference to a third register with Q63 saturation. The result is written into Rd.

Description:

It multiplies the bottom 32-bit element in Rs1 with the bottom 32-bit element in Rs2 and then subtracts the result from the result of multiplying the top 32-bit element in Rs1 with the top 32-bit element in Rs2. The difference is added to the 64-bit content of Rd with Q63 saturation.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd - t0 + t1);
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DKMADRS32 (long long t, unsigned long long a, unsigned long long b)

DKMADRS32 (Two Signed 32x32 with 64-bit Saturation Reversed Add and Sub)

Type: DSP

Syntax:

DKMADRS32 Rd, Rs1, Rs2

Purpose:

Do two signed 32x32 multiplications, subtract the top signed multiplication result from the bottom signed multiplication result, and add the difference to a third register with Q63 saturation. The result is written into Rd.

Description:

It multiplies the top 32-bit element in Rs1 with the top 32-bit element in Rs2 and then subtracts the result from the result of multiplying the bottom 32-bit element in Rs1 with the bottom 32-bit element in Rs2. The difference is added to the 64-bit content of Rd with Q63 saturation.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom
t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd + t0 - t1);
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DKMAXDS32 (long long t, unsigned long long a, unsigned long long b)

DKMAXDS32 (Two Cross Signed 32x32 with 64-bit Saturation Add and Sub)

Type: DSP

Syntax:

DKMAXDS32 Rd, Rs1, Rs2

Purpose:

Do two cross signed 32x32 multiplications, subtract one signed multiplication result from the other, and add the difference to a third register with Q63 saturation. The result is written into Rd.

Description:

It multiplies the bottom 32-bit element in Rs1 with the top 32-bit element in Rs2 and then subtracts the result from the result of multiplying the top 32-bit element in Rs1 with the bottom 32-bit element in Rs2. The difference is added to the 64-bit content of Rd with Q63 saturation.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = sat.q63(Rd - t01 + t10);
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DKMSDA32 (long long t, unsigned long long a, unsigned long long b)

DKMSDA32 (Two Signed 32x32 with 64-bit Saturation Sub)

Type: DSP

Syntax:

DKMSDA32 Rd, Rs1, Rs2

Purpose:

Do two signed 32x32 multiplications and subtract both signed multiplication results from a third register with Q63 saturation. The result is written into Rd.

Description:

It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and multiplies the top 32-bit element of Rs1 with the top 32-bit element of Rs2. Both products are subtracted from the 64-bit content of Rd with Q63 saturation.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = sat.q63(Rd - t0 - t1);
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DKMSXDA32 (long long t, unsigned long long a, unsigned long long b)

DKMSXDA32 (Two Cross Signed 32x32 with 64-bit Saturation Sub)

Type: DSP

Syntax:

DKMSXDA32 Rd, Rs1, Rs2

Purpose:

Do two cross signed 32x32 multiplications and subtract both signed multiplication results from a third register with Q63 saturation. The result is written into Rd.

Description:

It multiplies the bottom 32-bit element of Rs1 with the top 32-bit element of Rs2 and multiplies the top 32-bit element of Rs1 with the bottom 32-bit element of Rs2. Both products are subtracted from the 64-bit content of Rd with Q63 saturation.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

t0 = op1b s* op2t;
t1 = op1t s* op2b;
Rd = sat.q63(Rd - t0 - t1);
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMDS32 (unsigned long long a, unsigned long long b)

DSMDS32 (Two Signed 32x32 with 64-bit Sub)

Type: DSP

Syntax:

DSMDS32 Rd, Rs1, Rs2

Purpose:

Perform two signed 32x32 multiplications and subtract the bottom product from the top product. The result is written to Rd.

Description:

It multiplies the bottom 32-bit element of Rs1 with the bottom 32-bit element of Rs2 and subtracts that product from the product of the top 32-bit element of Rs1 and the top 32-bit element of Rs2.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = t1 - t0;
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMDRS32 (unsigned long long a, unsigned long long b)

DSMDRS32 (Two Signed 32x32 with 64-bit Reversed Sub)

Type: DSP

Syntax:

DSMDRS32 Rd, Rs1, Rs2

Purpose:

Perform two signed 32x32 multiplications and subtract the top product from the bottom product. The result is written to Rd.

Description:

It multiplies the top 32-bit element of Rs1 with the top 32-bit element of Rs2 and subtracts that product from the product of the bottom 32-bit element of Rs1 and the bottom 32-bit element of Rs2.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

t0 = op1b s* op2b;
t1 = op1t s* op2t;
Rd = t0 - t1;
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMXDS32 (unsigned long long a, unsigned long long b)

DSMXDS32 (Two Cross Signed 32x32 with 64-bit Sub)

Type: DSP

Syntax:

DSMXDS32 Rd, Rs1, Rs2

Purpose:

Perform two cross signed 32x32 multiplications and subtract the bottom-by-top product from the top-by-bottom product. The result is written to Rd.

Description:

It multiplies the bottom 32-bit element of Rs1 with the top 32-bit element of Rs2 and subtracts that product from the product of the top 32-bit element of Rs1 and the bottom 32-bit element of Rs2.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

t01 = op1b s* op2t;
t10 = op1t s* op2b;
Rd = t10 - t01;
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
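
The three non-saturating subtract variants above (DSMDS32, DSMDRS32, DSMXDS32) differ only in which words are paired and which product is subtracted. The sketch below follows the Description text of each instruction; the `model_*` names and the word helpers are hypothetical and not part of the NMSIS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: extract W[0] and W[1] as signed 32-bit values
 * (the casts assume two's complement conversion). */
static int32_t word_lo(uint64_t v) { return (int32_t)(uint32_t)v; }
static int32_t word_hi(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

/* DSMDS32: top*top - bottom*bottom */
int64_t model_dsmds32(uint64_t a, uint64_t b)
{
    return (int64_t)word_hi(a) * word_hi(b) - (int64_t)word_lo(a) * word_lo(b);
}

/* DSMDRS32: bottom*bottom - top*top */
int64_t model_dsmdrs32(uint64_t a, uint64_t b)
{
    return (int64_t)word_lo(a) * word_lo(b) - (int64_t)word_hi(a) * word_hi(b);
}

/* DSMXDS32: top(Rs1)*bottom(Rs2) - bottom(Rs1)*top(Rs2) */
int64_t model_dsmxds32(uint64_t a, uint64_t b)
{
    return (int64_t)word_hi(a) * word_lo(b) - (int64_t)word_lo(a) * word_hi(b);
}

void model_sub32_demo(void)
{
    uint64_t a = 0x0000000300000002ULL; /* top 3, bottom 2 */
    uint64_t b = 0x0000000500000004ULL; /* top 5, bottom 4 */
    assert(model_dsmds32(a, b) == 7);   /* 3*5 - 2*4 */
    assert(model_dsmdrs32(a, b) == -7); /* 2*4 - 3*5 */
    assert(model_dsmxds32(a, b) == 2);  /* 3*4 - 2*5 */
}
```

The difference of two 32x32 products always fits in 64 bits, which is consistent with these instructions needing no saturation step.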

__STATIC_FORCEINLINE long long __RV_DSMALDA (long long t, unsigned long long a, unsigned long long b)

DSMALDA (Four Signed 16x16 with 64-bit Add)

Type: DSP

Syntax:

DSMALDA Rd, Rs1, Rs2

Purpose:

Perform four signed 16x16 multiplications and add the products to a third register. The result is written to Rd.

Description:

In each 32-bit element, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and adds the result to the product of the top 16-bit content of Rs1 and the top 16-bit content of Rs2, with unlimited precision. The four products are added to the 64-bit value in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.H[0] s* op2b.H[0];
m1 = op1b.H[1] s* op2b.H[1];
m2 = op1t.H[0] s* op2t.H[0];
m3 = op1t.H[1] s* op2t.H[1];

Rd = Rd + m0 + m1 + m2 + m3;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
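
As an illustration of the accumulation above, here is a small C reference model. The names `model_dsmalda` and `half` are hypothetical, and wrap-around on 64-bit accumulator overflow is an assumption, since the Operations text specifies no saturation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit half i (0..3) of a 64-bit value, signed.
 * Halves 0 and 1 belong to W[0], halves 2 and 3 to W[1]. */
static int16_t half(uint64_t v, unsigned i)
{
    return (int16_t)(uint16_t)(v >> (16 * i));
}

/* Models Rd = Rd + m0 + m1 + m2 + m3 with same-position halves paired. */
int64_t model_dsmalda(int64_t t, uint64_t a, uint64_t b)
{
    uint64_t acc = (uint64_t)t; /* unsigned so wrap-around is well defined */
    for (unsigned i = 0; i < 4; ++i) {
        int32_t m = (int32_t)half(a, i) * half(b, i); /* fits in 32 bits */
        acc += (uint64_t)(int64_t)m;
    }
    return (int64_t)acc;
}

void model_dsmalda_demo(void)
{
    /* halves of a: 1,2,3,4; halves of b: 5,6,7,8 */
    assert(model_dsmalda(10, 0x0004000300020001ULL, 0x0008000700060005ULL) == 80);
    /* 0xFFFF is -1 as a signed half: -1 * 3 = -3 */
    assert(model_dsmalda(0, 0xFFFFULL, 0x0003ULL) == -3);
}
```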

__STATIC_FORCEINLINE long long __RV_DSMALXDA (long long t, unsigned long long a, unsigned long long b)

DSMALXDA (Four Cross Signed 16x16 with 64-bit Add)

Type: DSP

Syntax:

DSMALXDA Rd, Rs1, Rs2

Purpose:

Perform four cross signed 16x16 multiplications and add the products to a third register. The result is written to Rd.

Description:

In each 32-bit element, it multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and adds the result to the product of the bottom 16-bit content of Rs1 and the top 16-bit content of Rs2, with unlimited precision. The four products are added to the 64-bit value in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.H[0] s* op2b.H[1];
m1 = op1b.H[1] s* op2b.H[0];
m2 = op1t.H[0] s* op2t.H[1];
m3 = op1t.H[1] s* op2t.H[0];

Rd = Rd + m0 + m1 + m2 + m3;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMALDS (long long t, unsigned long long a, unsigned long long b)

DSMALDS (Four Signed 16x16 with 64-bit Add and Sub)

Type: DSP

Syntax:

DSMALDS Rd, Rs1, Rs2

Purpose:

Perform four signed 16x16 multiplications, subtract each bottom product from the corresponding top product, and add the differences to a third register. The result is written to Rd.

Description:

In each 32-bit element, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and subtracts that product from the product of the top 16-bit content of Rs1 and the top 16-bit content of Rs2. The differences are added to the 64-bit value in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.H[1] s* op2b.H[1];
m1 = op1b.H[0] s* op2b.H[0];
m2 = op1t.H[1] s* op2t.H[1];
m3 = op1t.H[0] s* op2t.H[0];

Rd = Rd + m0 - m1 + m2 - m3;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMALDRS (long long t, unsigned long long a, unsigned long long b)

DSMALDRS (Four Signed 16x16 with 64-bit Add and Reversed Sub)

Type: DSP

Syntax:

DSMALDRS Rd, Rs1, Rs2

Purpose:

Perform four signed 16x16 multiplications, subtract each top product from the corresponding bottom product, and add the differences to a third register. The result is written to Rd.

Description:

In each 32-bit element, it multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2 and subtracts that product from the product of the bottom 16-bit content of Rs1 and the bottom 16-bit content of Rs2. The differences are added to the 64-bit value in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.H[0] s* op2b.H[0];
m1 = op1b.H[1] s* op2b.H[1];
m2 = op1t.H[0] s* op2t.H[0];
m3 = op1t.H[1] s* op2t.H[1];

Rd = Rd + m0 - m1 + m2 - m3;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
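
DSMALDS and DSMALDRS accumulate the same per-element difference with opposite signs. The following sketch makes that relationship concrete; the `model_*` names and the `half` helper are hypothetical, and wrap-around on accumulator overflow is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed 16-bit half i (0..3) of a 64-bit value. */
static int16_t half(uint64_t v, unsigned i)
{
    return (int16_t)(uint16_t)(v >> (16 * i));
}

/* Sum over both 32-bit elements of (top*top - bottom*bottom). */
static int64_t top_minus_bottom(uint64_t a, uint64_t b)
{
    int64_t d = 0;
    for (unsigned w = 0; w < 2; ++w) {
        int32_t top = (int32_t)half(a, 2 * w + 1) * half(b, 2 * w + 1);
        int32_t bot = (int32_t)half(a, 2 * w) * half(b, 2 * w);
        d += (int64_t)top - bot;
    }
    return d;
}

/* DSMALDS: Rd + (top - bottom) per element. */
int64_t model_dsmalds(int64_t t, uint64_t a, uint64_t b)
{
    return (int64_t)((uint64_t)t + (uint64_t)top_minus_bottom(a, b));
}

/* DSMALDRS: Rd + (bottom - top) per element. */
int64_t model_dsmaldrs(int64_t t, uint64_t a, uint64_t b)
{
    return (int64_t)((uint64_t)t - (uint64_t)top_minus_bottom(a, b));
}

void model_dsmalds_demo(void)
{
    uint64_t a = 0x0004000300020001ULL; /* halves 1,2,3,4 */
    uint64_t b = 0x0008000700060005ULL; /* halves 5,6,7,8 */
    assert(model_dsmalds(10, a, b) == 28);  /* 10 + (12-5) + (32-21) */
    assert(model_dsmaldrs(10, a, b) == -8); /* 10 + (5-12) + (21-32) */
}
```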

__STATIC_FORCEINLINE long long __RV_DSMALXDS (long long t, unsigned long long a, unsigned long long b)

DSMALXDS (Four Cross Signed 16x16 with 64-bit Add and Sub)

Type: DSP

Syntax:

DSMALXDS Rd, Rs1, Rs2

Purpose:

Perform four cross signed 16x16 multiplications, subtract each bottom-by-top product from the corresponding top-by-bottom product, and add the differences to a third register. The result is written to Rd.

Description:

In each 32-bit element, it multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2 and subtracts that product from the product of the top 16-bit content of Rs1 and the bottom 16-bit content of Rs2. The differences are added to the 64-bit value in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.H[1] s* op2b.H[0];
m1 = op1b.H[0] s* op2b.H[1];
m2 = op1t.H[1] s* op2t.H[0];
m3 = op1t.H[0] s* op2t.H[1];

Rd = Rd + m0 - m1 + m2 - m3;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMSLDA (long long t, unsigned long long a, unsigned long long b)

DSMSLDA (Four Signed 16x16 with 64-bit Sub)

Type: DSP

Syntax:

DSMSLDA Rd, Rs1, Rs2

Purpose:

Perform four signed 16x16 multiplications and subtract the products from a third register. The result is written to Rd.

Description:

In each 32-bit element, it multiplies the bottom 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the top 16-bit content of Rs1 with the top 16-bit content of Rs2. The four products are subtracted from the 64-bit value in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.H[0] s* op2b.H[0];
m1 = op1b.H[1] s* op2b.H[1];
m2 = op1t.H[0] s* op2t.H[0];
m3 = op1t.H[1] s* op2t.H[1];

Rd = Rd - m0 - m1 - m2 - m3;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMSLXDA (long long t, unsigned long long a, unsigned long long b)

DSMSLXDA (Four Cross Signed 16x16 with 64-bit Sub)

Type: DSP

Syntax:

DSMSLXDA Rd, Rs1, Rs2

Purpose:

Perform four cross signed 16x16 multiplications and subtract the products from a third register. The result is written to Rd.

Description:

In each 32-bit element, it multiplies the top 16-bit content of Rs1 with the bottom 16-bit content of Rs2 and multiplies the bottom 16-bit content of Rs1 with the top 16-bit content of Rs2. The four products are subtracted from the 64-bit value in Rd.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.H[0] s* op2b.H[1];
m1 = op1b.H[1] s* op2b.H[0];
m2 = op1t.H[0] s* op2t.H[1];
m3 = op1t.H[1] s* op2t.H[0];

Rd = Rd - m0 - m1 - m2 - m3;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DDSMAQA (long long t, unsigned long long a, unsigned long long b)

DDSMAQA (Eight Signed 8x8 with 64-bit Add)

Type: DSP

Syntax:

DDSMAQA Rd, Rs1, Rs2

Purpose:

Perform eight signed 8x8 multiplications and add the products to a third register. The result is written to Rd.

Description:

It performs eight signed 8-bit multiplications on the corresponding 8-bit chunks of two registers, then adds the eight 16-bit products to the 64-bit content of a third register.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.B[0] s* op2b.B[0];
m1 = op1b.B[1] s* op2b.B[1];
m2 = op1b.B[2] s* op2b.B[2];
m3 = op1b.B[3] s* op2b.B[3];
m4 = op1t.B[0] s* op2t.B[0];
m5 = op1t.B[1] s* op2t.B[1];
m6 = op1t.B[2] s* op2t.B[2];
m7 = op1t.B[3] s* op2t.B[3];

s0 = m0 + m1 + m2 + m3;
s1 = m4 + m5 + m6 + m7;
Rd = Rd + s0 + s1;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
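
The eight-lane multiply-accumulate above amounts to a signed byte dot product added to a 64-bit accumulator. A minimal C sketch follows; `model_ddsmaqa` is a hypothetical name, and wrap-around on accumulator overflow is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Models Rd = Rd + sum of eight signed 8x8 products.
 * The uint8_t-to-int8_t casts assume two's complement conversion. */
int64_t model_ddsmaqa(int64_t t, uint64_t a, uint64_t b)
{
    uint64_t acc = (uint64_t)t; /* unsigned so wrap-around is well defined */
    for (unsigned i = 0; i < 8; ++i) {
        int32_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int32_t y = (int8_t)(uint8_t)(b >> (8 * i));
        acc += (uint64_t)(int64_t)(x * y);
    }
    return (int64_t)acc;
}

void model_ddsmaqa_demo(void)
{
    /* every byte of a is 1, bytes of b are 1..8: 1+2+...+8 = 36 */
    assert(model_ddsmaqa(0, 0x0101010101010101ULL, 0x0807060504030201ULL) == 36);
    /* 0xFF is -1 as a signed byte: (-1) * (-1) = 1 */
    assert(model_ddsmaqa(0, 0xFFULL, 0xFFULL) == 1);
}
```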

__STATIC_FORCEINLINE long long __RV_DDSMAQA_SU (long long t, unsigned long long a, unsigned long long b)

DDSMAQA.SU (Eight Signed 8 x Unsigned 8 with 64-bit Add)

Type: DSP

Syntax:

DDSMAQA.SU Rd, Rs1, Rs2

Purpose:

Perform eight signed 8 x unsigned 8 multiplications and add the products to a third register. The result is written to Rd.

Description:

It performs eight signed-by-unsigned 8-bit multiplications on the corresponding 8-bit chunks of two registers, then adds the eight 16-bit products to the 64-bit content of a third register.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.B[0] su* op2b.B[0];
m1 = op1b.B[1] su* op2b.B[1];
m2 = op1b.B[2] su* op2b.B[2];
m3 = op1b.B[3] su* op2b.B[3];
m4 = op1t.B[0] su* op2t.B[0];
m5 = op1t.B[1] su* op2t.B[1];
m6 = op1t.B[2] su* op2t.B[2];
m7 = op1t.B[3] su* op2t.B[3];

s0 = m0 + m1 + m2 + m3;
s1 = m4 + m5 + m6 + m7;
Rd = Rd + s0 + s1;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DDUMAQA (long long t, unsigned long long a, unsigned long long b)

DDUMAQA (Eight Unsigned 8x8 with 64-bit Unsigned Add)

Type: DSP

Syntax:

DDUMAQA Rd, Rs1, Rs2

Purpose:

Perform eight unsigned 8x8 multiplications and add the products to a third register. The result is written to Rd.

Description:

It performs eight unsigned 8-bit multiplications on the corresponding 8-bit chunks of two registers, then adds the eight 16-bit products to the 64-bit content of a third register.

Operations:

op1t = Rs1.W[x+1]; op2t = Rs2.W[x+1]; // top
op1b = Rs1.W[x]; op2b = Rs2.W[x]; // bottom

m0 = op1b.B[0] u* op2b.B[0];
m1 = op1b.B[1] u* op2b.B[1];
m2 = op1b.B[2] u* op2b.B[2];
m3 = op1b.B[3] u* op2b.B[3];
m4 = op1t.B[0] u* op2t.B[0];
m5 = op1t.B[1] u* op2t.B[1];
m6 = op1t.B[2] u* op2t.B[2];
m7 = op1t.B[3] u* op2t.B[3];

s0 = m0 + m1 + m2 + m3;
s1 = m4 + m5 + m6 + m7;
Rd = Rd + s0 + s1;
x=0

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
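
DDSMAQA.SU and DDUMAQA differ from DDSMAQA only in how each byte is interpreted. The sketch below contrasts the two; the `model_*` names are hypothetical, and it assumes that `su*` treats the Rs1 byte as signed and the Rs2 byte as unsigned, and that the accumulator wraps on overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed su*: byte of a (Rs1) is signed, byte of b (Rs2) is unsigned. */
int64_t model_ddsmaqa_su(int64_t t, uint64_t a, uint64_t b)
{
    uint64_t acc = (uint64_t)t;
    for (unsigned i = 0; i < 8; ++i) {
        int32_t x = (int8_t)(uint8_t)(a >> (8 * i)); /* two's complement cast */
        int32_t y = (uint8_t)(b >> (8 * i));
        acc += (uint64_t)(int64_t)(x * y);
    }
    return (int64_t)acc;
}

/* u*: both bytes are unsigned. */
int64_t model_ddumaqa(int64_t t, uint64_t a, uint64_t b)
{
    uint64_t acc = (uint64_t)t;
    for (unsigned i = 0; i < 8; ++i) {
        uint32_t x = (uint8_t)(a >> (8 * i));
        uint32_t y = (uint8_t)(b >> (8 * i));
        acc += (uint64_t)(x * y);
    }
    return (int64_t)acc;
}

void model_maqa_variants_demo(void)
{
    /* The same bit pattern 0xFF gives different products per variant. */
    assert(model_ddsmaqa_su(0, 0xFFULL, 0xFFULL) == -255); /* -1 * 255 */
    assert(model_ddumaqa(0, 0xFFULL, 0xFFULL) == 65025);   /* 255 * 255 */
}
```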

__STATIC_FORCEINLINE long __RV_DSMA32_U (unsigned long long a, unsigned long long b)

DSMA32.u (64-bit SIMD 32-bit Signed Multiply Addition With Rounding and Clip)

Type: DSP

Syntax:

DSMA32.u Rd, Rs1, Rs2

Purpose:

Perform two signed 32x32 multiplications, add the products with rounding, shift the sum right by 32 bits, and clip the q63 value to q31. The result is written to Rd.

Description:

For the DSMA32.u instruction, multiply the top 32-bit Q31 content of the 64-bit chunks in Rs1 with the top 32-bit Q31 content of the 64-bit chunks in Rs2. At the same time, multiply the bottom 32-bit Q31 content of the 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of the 64-bit chunks in Rs2. Then add the two products, apply the additional rounding operation, shift the sum right by 32 bits, and clip the 64-bit value to 32 bits. The result is written to Rd.

Operations:

Rd = (q31_t)((Rs1.W[x] s* Rs2.W[x] + Rs1.W[x + 1] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32);
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long type
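
To illustrate the rounding step, the following C sketch models the Operations formula for DSMA32.u. The names are hypothetical, and the behavior when the 64-bit sum overflows is not stated in the text above; this model simply wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: extract W[0] and W[1] as signed 32-bit values
 * (the casts assume two's complement conversion). */
static int32_t word_lo(uint64_t v) { return (int32_t)(uint32_t)v; }
static int32_t word_hi(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

/* Models Rd = (q31_t)((W[0]*W[0] + W[1]*W[1] + 0x80000000) s>> 32). */
int32_t model_dsma32_u(uint64_t a, uint64_t b)
{
    int64_t p0 = (int64_t)word_lo(a) * word_lo(b);
    int64_t p1 = (int64_t)word_hi(a) * word_hi(b);
    /* Sum in unsigned arithmetic so overflow wraps (assumption). */
    uint64_t sum = (uint64_t)p0 + (uint64_t)p1 + 0x80000000ULL;
    /* Bits [63:32] of the rounded sum, reinterpreted as signed 32-bit. */
    return (int32_t)(uint32_t)(sum >> 32);
}

void model_dsma32_u_demo(void)
{
    /* 2^30 * 2^30 twice: (2^61 + 2^31) >> 32 = 2^29 */
    assert(model_dsma32_u(0x4000000040000000ULL, 0x4000000040000000ULL) == 0x20000000);
    /* 2^16 * 2^15 = 2^31 is exactly half an output LSB and rounds up to 1. */
    assert(model_dsma32_u(0x00010000ULL, 0x00008000ULL) == 1);
}
```

Adding 0x80000000 before the shift is what turns truncation into round-half-up at the 32-bit boundary.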

__STATIC_FORCEINLINE long __RV_DSMXS32_U (unsigned long long a, unsigned long long b)

DSMXS32.u (64-bit SIMD 32-bit Signed Multiply Cross Subtraction With Rounding and Clip)

Type: DSP

Syntax:

DSMXS32.u Rd, Rs1, Rs2

Purpose:

Perform two cross signed 32x32 multiplications, subtract the products with rounding, shift the result right by 32 bits, and clip the q63 value to q31. The result is written to Rd.

Description:

For the DSMXS32.u instruction, multiply the top 32-bit Q31 content of the 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of the 64-bit chunks in Rs2. At the same time, multiply the bottom 32-bit Q31 content of the 64-bit chunks in Rs1 with the top 32-bit Q31 content of the 64-bit chunks in Rs2. Then subtract the second product from the first, apply the additional rounding operation, shift the result right by 32 bits, and clip the 64-bit value to 32 bits. The result is written to Rd.

Operations:

Rd = (q31_t)((Rs1.W[x + 1] s* Rs2.W[x] - Rs1.W[x] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32);
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long type

__STATIC_FORCEINLINE long __RV_DSMXA32_U (unsigned long long a, unsigned long long b)

DSMXA32.u (64-bit SIMD 32-bit Signed Cross Multiply Addition with Rounding and Clip)

Type: DSP

Syntax:

DSMXA32.u Rd, Rs1, Rs2

Purpose:

Perform two cross signed 32x32 multiplications, add the products with rounding, shift the sum right by 32 bits, and clip the q63 value to q31. The result is written to Rd.

Description:

For the DSMXA32.u instruction, multiply the top 32-bit Q31 content of the 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of the 64-bit chunks in Rs2. At the same time, multiply the bottom 32-bit Q31 content of the 64-bit chunks in Rs1 with the top 32-bit Q31 content of the 64-bit chunks in Rs2. Then add the two products, apply the additional rounding operation, shift the sum right by 32 bits, and clip the 64-bit value to 32 bits. The result is written to Rd.

Operations:

Rd = (q31_t)((Rs1.W[x + 1] s* Rs2.W[x] + Rs1.W[x] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32);
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long type

__STATIC_FORCEINLINE long __RV_DSMS32_U (unsigned long long a, unsigned long long b)

DSMS32.u (64-bit SIMD 32-bit Signed Multiply Subtraction with Rounding and Clip)

Type: DSP

Syntax:

DSMS32.u Rd, Rs1, Rs2

Purpose:

Perform two signed 32x32 multiplications, subtract the products with rounding, shift the result right by 32 bits, and clip the q63 value to q31. The result is written to Rd.

Description:

For the DSMS32.u instruction, multiply the bottom 32-bit Q31 content of the 64-bit chunks in Rs1 with the bottom 32-bit Q31 content of the 64-bit chunks in Rs2. At the same time, multiply the top 32-bit Q31 content of the 64-bit chunks in Rs1 with the top 32-bit Q31 content of the 64-bit chunks in Rs2. Then subtract the second product from the first, apply the additional rounding operation, shift the result right by 32 bits, and clip the 64-bit value to 32 bits. The result is written to Rd.

Operations:

Rd = (q31_t)((Rs1.W[x] s* Rs2.W[x] - Rs1.W[x + 1] s* Rs2.W[x + 1] + 0x80000000LL) s>> 32);
x=0

Parameters:
  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long type

__STATIC_FORCEINLINE long __RV_DSMADA16 (long long t, unsigned long long a, unsigned long long b)

DSMADA16 (Signed Multiply Two Halfs and Two Adds 32-bit)

Type: SIMD

Syntax:

DSMADA16 Rd, Rs1, Rs2

Purpose:

Perform two signed 16-bit multiplications in each 32-bit element of two registers, then add the 32-bit products and the 32-bit value of an even/odd pair of registers together.

  • DSMADA16: rt pair + top*top + bottom*bottom

Description:

This instruction multiplies each 16-bit content of the 32-bit elements of Rs1 with the corresponding 16-bit content of the 32-bit elements of Rs2. The results are added to the 32-bit value of an even/odd pair of registers specified by Rd(4,1). The 32-bit addition result is written back to the register-pair. The 16-bit values of Rs1 and Rs2, and the 32-bit value of the register-pair are treated as signed integers.

Operations:

Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[0]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[1]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[0]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[1]);
Rd.W = Rd.W + SE32(Mres0[0][31:0]) + SE32(Mres1[0][31:0]) + SE32(Mres0[1][31:0]) + SE32(Mres1[1][31:0]);

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long type

__STATIC_FORCEINLINE long __RV_DSMAXDA16 (long long t, unsigned long long a, unsigned long long b)

DSMAXDA16 (Signed Crossed Multiply Two Halfs and Two Adds 32-bit)

Type: SIMD

Syntax:

DSMAXDA16 Rd, Rs1, Rs2

Purpose:

Perform two crossed signed 16-bit multiplications in each 32-bit element of two registers, then add the 32-bit products and the 32-bit value of an even/odd pair of registers together.

  • DSMAXDA16: rt pair + top*bottom + bottom*top (all 32-bit elements)

Description:

This instruction cross-multiplies the top 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2 and adds the result to the product of the bottom 16-bit content of the 32-bit elements of Rs1 and the top 16-bit content of the 32-bit elements of Rs2, with unlimited precision. The result is added to the 64-bit value of an even/odd pair of registers specified by Rd(4,1). The 64-bit addition result is clipped to a 32-bit result.

Operations:

Mres0[0][31:0] = (Rs1.W[0].H[0] * Rs2.W[0].H[1]);
Mres1[0][31:0] = (Rs1.W[0].H[1] * Rs2.W[0].H[0]);
Mres0[1][31:0] = (Rs1.W[1].H[0] * Rs2.W[1].H[1]);
Mres1[1][31:0] = (Rs1.W[1].H[1] * Rs2.W[1].H[0]);
Rd.W = Rd.W + SE32(Mres0[0][31:0]) + SE32(Mres1[0][31:0]) + SE32(Mres0[1][31:0]) + SE32(Mres1[1][31:0]);

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long type

__STATIC_FORCEINLINE unsigned long long __RV_DKSMS32_U (unsigned long long t, unsigned long long a, unsigned long long b)

DKSMS32.u (Two Signed Multiply Shift-clip and Saturation with Rounding)

Type: SIMD

Syntax:

DKSMS32.u Rd, Rs1, Rs2

Purpose:

Compute the saturating multiply-accumulate of two pairs of q31 values with shift and rounding.

Description:

Multiply the corresponding 32-bit elements of Rs1 and Rs2 as q31_t values. From each 64-bit product, take bits [47:15], add 1 for rounding, and keep bits [32:1] of the rounded value, which yields a 32-bit number aligned with bits [47:16] of the product. This rounded value is added to the corresponding 32-bit element of Rd, and the sum is saturated.

Operations:

Mres[x][63:0] = Rs1.W[x] s* Rs2.W[x];
Round[x][32:0] = Mres[x][47:15] + 1;
Rd.W[x] = sat.31(Rd.W[x] + Round[x][32:1]);
x=1...0

Parameters:
  • t[in] unsigned long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in unsigned long long type
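
The bit-field rounding in the Operations above can be hard to follow, so here is a hedged C sketch of one possible reading. All names are hypothetical, and two points are assumptions rather than documented facts: `sat.31` is taken to mean signed saturation to the 32-bit Q31 range, and `Round[32:1]` is reinterpreted as a signed 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: extract W[0] and W[1] as signed 32-bit values. */
static int32_t word_lo(uint64_t v) { return (int32_t)(uint32_t)v; }
static int32_t word_hi(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

/* Assumed meaning of sat.31: clamp to [INT32_MIN, INT32_MAX]. */
static int32_t sat_add_q31(int32_t x, int32_t y)
{
    int64_t s = (int64_t)x + y;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Mres[47:15] + 1, then bits [32:1] of that 33-bit value. */
static int32_t mul_round_47_16(int32_t x, int32_t y)
{
    uint64_t prod = (uint64_t)((int64_t)x * y);
    uint64_t field = (prod >> 15) & 0x1FFFFFFFFULL;  /* Mres[47:15] */
    uint64_t rounded = (field + 1) & 0x1FFFFFFFFULL; /* Round[32:0] */
    return (int32_t)(uint32_t)(rounded >> 1);        /* Round[32:1] */
}

uint64_t model_dksms32_u(uint64_t t, uint64_t a, uint64_t b)
{
    int32_t lo = sat_add_q31(word_lo(t), mul_round_47_16(word_lo(a), word_lo(b)));
    int32_t hi = sat_add_q31(word_hi(t), mul_round_47_16(word_hi(a), word_hi(b)));
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

void model_dksms32_u_demo(void)
{
    /* low lane: 3 * 0x8000 = 1.5 * 2^16, rounds to 2;
     * high lane: 0x10000 * 0x10000 = 2^32, gives 0x10000 */
    assert(model_dksms32_u(0, 0x0001000000000003ULL, 0x0001000000008000ULL)
           == 0x0001000000000002ULL);
    /* low lane accumulator already at INT32_MAX: the add saturates */
    assert(model_dksms32_u(0x000000007FFFFFFFULL, 0x0001000000000003ULL,
                           0x0001000000008000ULL) == 0x000100007FFFFFFFULL);
}
```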

__STATIC_FORCEINLINE long __RV_DMADA32 (long long t, unsigned long long a, unsigned long long b)

DMADA32 (Two Cross Signed 32x32 with 64-bit Add and Clip to 32-bit)

Type: SIMD

Syntax:

DMADA32 Rd, Rs1, Rs2

Purpose:

Perform two cross signed 32x32 multiplications, add the signed products to a q63 value, then clip the q63 result to q31. The final result is written to Rd.

Description:

The DMADA32 instruction multiplies the top 32-bit element in Rs1 with the bottom 32-bit element in Rs2 and adds the result to the product of the bottom 32-bit element in Rs1 and the top 32-bit element in Rs2. The sum is accumulated with the 32-bit value of Rd shifted left by 32 bits, and the q63 result is then clipped to q31.

Operations:

res = (q31_t)((((q63_t)Rd.W[0] << 32) + (q63_t)Rs1.W[0] s* Rs2.W[1] + (q63_t)Rs1.W[1] s* Rs2.W[0]) s>> 32);
Rd = res;

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long type
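
The accumulate-then-clip formula above can be sketched in portable C as follows. The names are hypothetical, only the low 32 bits of `t` are used (matching `Rd.w[0]` in the Operations), and wrap-around of the 64-bit intermediate is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: extract W[0] and W[1] as signed 32-bit values
 * (the casts assume two's complement conversion). */
static int32_t word_lo(uint64_t v) { return (int32_t)(uint32_t)v; }
static int32_t word_hi(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

/* Models res = (q31_t)(((Rd.w[0] << 32) + cross products) s>> 32). */
int32_t model_dmada32(int64_t t, uint64_t a, uint64_t b)
{
    uint64_t acc = (uint64_t)(uint32_t)t << 32;          /* Rd.w[0] << 32 */
    acc += (uint64_t)((int64_t)word_lo(a) * word_hi(b)); /* bottom * top */
    acc += (uint64_t)((int64_t)word_hi(a) * word_lo(b)); /* top * bottom */
    return (int32_t)(uint32_t)(acc >> 32);               /* bits [63:32] */
}

void model_dmada32_demo(void)
{
    /* two cross products of 2^60 each: 5 + (2^61 >> 32) = 5 + 2^29 */
    assert(model_dmada32(5, 0x4000000040000000ULL, 0x4000000040000000ULL) == 0x20000005);
    /* with zero operands the accumulator word passes through unchanged */
    assert(model_dmada32(-1, 0, 0) == -1);
}
```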

__STATIC_FORCEINLINE long long __RV_DSMALBB (long long t, unsigned long long a, unsigned long long b)

DSMALBB (Signed Multiply Bottom Halfs & Add 64-bit)

Type: SIMD

Syntax:

DSMALBB Rd, Rs1, Rs2

Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results to the 64-bit value of an even/odd pair of registers. The addition result is written back to the register-pair.

  • DSMALBB: rt pair + bottom*bottom (all 32-bit elements)

Description:

The DSMALBB instruction multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the bottom 16-bit content of the 32-bit elements of Rs2. The multiplication results are added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd.

Operations:

Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[0];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[0];
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMALBT (long long t, unsigned long long a, unsigned long long b)

DSMALBT (Signed Multiply Bottom Half & Top Half & Add 64-bit)

Type: SIMD

Syntax:

DSMALBT Rd, Rs1, Rs2

Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results to the 64-bit value of an even/odd pair of registers. The addition result is written back to the register-pair.

  • DSMALBT: rt pair + bottom*top (all 32-bit elements)

Description:

The DSMALBT instruction multiplies the bottom 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

Operations:

Mres[0][31:0] = Rs1.W[0].H[0] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[0] * Rs2.W[1].H[1];
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DSMALTT (long long t, unsigned long long a, unsigned long long b)

DSMALTT (Signed Multiply Top Half & Add 64-bit)

Type: SIMD

Syntax:

DSMALTT Rd, Rs1, Rs2

Purpose:

Multiply the signed 16-bit content of the 32-bit elements of a register with the 16-bit content of the corresponding 32-bit elements of another register and add the results to the 64-bit value of an even/odd pair of registers. The addition result is written back to the register-pair.

  • DSMALTT: rt pair + top*top (all 32-bit elements)

Description:

The DSMALTT instruction multiplies the top 16-bit content of the 32-bit elements of Rs1 with the top 16-bit content of the 32-bit elements of Rs2. The multiplication results are added to the 64-bit value of Rd. The 64-bit addition result is written back to Rd. The 16-bit values of Rs1 and Rs2, and the 64-bit value of Rd are treated as signed integers.

Operations:

Mres[0][31:0] = Rs1.W[0].H[1] * Rs2.W[0].H[1];
Mres[1][31:0] = Rs1.W[1].H[1] * Rs2.W[1].H[1];
Rd = Rd + SE64(Mres[0][31:0]) + SE64(Mres[1][31:0]);

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
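
DSMALBB, DSMALBT and DSMALTT share one structure and differ only in which half of each 32-bit element is selected from Rs1 and Rs2. The sketch below expresses that with two selector arguments; all names are hypothetical, and wrap-around on accumulator overflow is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed 16-bit half i (0..3) of a 64-bit value. */
static int16_t half(uint64_t v, unsigned i)
{
    return (int16_t)(uint16_t)(v >> (16 * i));
}

/* ha/hb select the half within each 32-bit element: 0 = bottom, 1 = top. */
static int64_t smal_select(int64_t t, uint64_t a, uint64_t b,
                           unsigned ha, unsigned hb)
{
    uint64_t acc = (uint64_t)t; /* unsigned so wrap-around is well defined */
    for (unsigned w = 0; w < 2; ++w) {
        int32_t m = (int32_t)half(a, 2 * w + ha) * half(b, 2 * w + hb);
        acc += (uint64_t)(int64_t)m; /* SE64(Mres[w]) */
    }
    return (int64_t)acc;
}

int64_t model_dsmalbb(int64_t t, uint64_t a, uint64_t b) { return smal_select(t, a, b, 0, 0); }
int64_t model_dsmalbt(int64_t t, uint64_t a, uint64_t b) { return smal_select(t, a, b, 0, 1); }
int64_t model_dsmaltt(int64_t t, uint64_t a, uint64_t b) { return smal_select(t, a, b, 1, 1); }

void model_dsmal_demo(void)
{
    uint64_t a = 0x0004000300020001ULL; /* halves 1,2,3,4 */
    uint64_t b = 0x0008000700060005ULL; /* halves 5,6,7,8 */
    assert(model_dsmalbb(0, a, b) == 26); /* 1*5 + 3*7 */
    assert(model_dsmalbt(0, a, b) == 30); /* 1*6 + 3*8 */
    assert(model_dsmaltt(0, a, b) == 44); /* 2*6 + 4*8 */
}
```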

__STATIC_FORCEINLINE long long __RV_DKMABB32 (long long t, unsigned long long a, unsigned long long b)

DKMABB32 (Saturating Signed Multiply Bottom Words & Add)

Type: SIMD

Syntax:

DKMABB32 Rd, Rs1, Rs2

Purpose:

Multiply the signed 32-bit element in a register with the 32-bit element in another register and add the result to the content of 64-bit data in the third register. The addition result may be saturated and is written to the third register.

  • DKMABB32: rd + bottom*bottom

Description:

The DKMABB32 instruction multiplies the bottom 32-bit element in Rs1 with the bottom 32-bit element in Rs2. The multiplication result is added to the content of 64-bit data in Rd. If the addition result is beyond the Q63 number range (-2^63 <= Q63 <= 2^63-1), it is saturated to the range and the OV bit is set to 1. The result after saturation is written to Rd. The 32-bit contents of Rs1 and Rs2 are treated as signed integers.

Operations:

res = Rd + (Rs1.W[0] * Rs2.W[0]);
if (res > (2^63)-1) {
  res = (2^63)-1;
  OV = 1;
} else if (res < -2^63) {
  res = -2^63;
  OV = 1;
}
Rd = res;

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type
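
The saturation logic in the Operations above can be written in portable C without a wider integer type by testing for overflow before adding. In this sketch `model_dkmabb32` and the `ov` out-parameter are hypothetical stand-ins for the instruction and the OV bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract W[0] as a signed 32-bit value
 * (the cast assumes two's complement conversion). */
static int32_t word_lo(uint64_t v) { return (int32_t)(uint32_t)v; }

/* Models res = Rd + W[0]*W[0] with Q63 saturation.
 * *ov is set to 1 on saturation and left unchanged otherwise. */
int64_t model_dkmabb32(int64_t t, uint64_t a, uint64_t b, int *ov)
{
    int64_t prod = (int64_t)word_lo(a) * word_lo(b); /* always fits in 64 bits */
    if (prod > 0 && t > INT64_MAX - prod) { *ov = 1; return INT64_MAX; }
    if (prod < 0 && t < INT64_MIN - prod) { *ov = 1; return INT64_MIN; }
    return t + prod;
}

void model_dkmabb32_demo(void)
{
    int ov = 0;
    assert(model_dkmabb32(10, 3, 4, &ov) == 22 && ov == 0);
    /* INT64_MAX + 1 saturates and raises the flag. */
    assert(model_dkmabb32(INT64_MAX, 1, 1, &ov) == INT64_MAX && ov == 1);
}
```

Checking against `INT64_MAX - prod` and `INT64_MIN - prod` keeps every intermediate inside the 64-bit range, so the model itself never overflows.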

__STATIC_FORCEINLINE long long __RV_DKMABT32 (long long t, unsigned long long a, unsigned long long b)

DKMABT32 (Saturating Signed Multiply Bottom & Top Words & Add)

Type: SIMD

Syntax:

DKMABT32 Rd, Rs1, Rs2

Purpose:

Multiply the signed 32-bit element in a register with the 32-bit element in another register and add the result to the content of 64-bit data in the third register. The addition result may be saturated and is written to the third register.

  • DKMABT32: rd + bottom*top

Description:

The DKMABT32 instruction multiplies the bottom 32-bit element in Rs1 with the top 32-bit element in Rs2. The multiplication result is added to the content of 64-bit data in Rd. If the addition result is beyond the Q63 number range (-2^63 <= Q63 <= 2^63-1), it is saturated to the range and the OV bit is set to 1. The result after saturation is written to Rd. The 32-bit contents of Rs1 and Rs2 are treated as signed integers.

Operations:

res = Rd + (Rs1.W[0] * Rs2.W[1]);
if (res > (2^63)-1) {
  res = (2^63)-1;
  OV = 1;
} else if (res < -2^63) {
  res = -2^63;
  OV = 1;
}
Rd = res;

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type

__STATIC_FORCEINLINE long long __RV_DKMATT32 (long long t, unsigned long long a, unsigned long long b)

DKMATT32 (Saturating Signed Multiply Top Words & Add)

Type: SIMD

Syntax:

DKMATT32 Rd, Rs1, Rs2

Purpose:

Multiply the signed 32-bit element in a register with the 32-bit element in another register and add the result to the content of 64-bit data in the third register. The addition result may be saturated and is written to the third register.

  • DKMATT32: rd + top*top

Description:

The DKMATT32 instruction multiplies the top 32-bit element in Rs1 with the top 32-bit element in Rs2. The multiplication result is added to the content of 64-bit data in Rd. If the addition result is beyond the Q63 number range (-2^63 <= Q63 <= 2^63-1), it is saturated to the range and the OV bit is set to 1. The result after saturation is written to Rd. The 32-bit contents of Rs1 and Rs2 are treated as signed integers.

Operations:

res = Rd + (Rs1.W[1] * Rs2.W[1]);
if (res > (2^63)-1) {
  res = (2^63)-1;
  OV = 1;
} else if (res < -2^63) {
  res = -2^63;
  OV = 1;
}
Rd = res;

Parameters:
  • t[in] long long type of value stored in t

  • a[in] unsigned long long type of value stored in a

  • b[in] unsigned long long type of value stored in b

Returns:

value stored in long long type