Nuclei Customized DSP Instructions

__STATIC_FORCEINLINE unsigned long long __RV_DKHM8(unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKHM16(unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKABS8(unsigned long long a)
__STATIC_FORCEINLINE unsigned long long __RV_DKABS16(unsigned long long a)
__STATIC_FORCEINLINE unsigned long long __RV_DKSLRA8(unsigned long long a, int b)
__STATIC_FORCEINLINE unsigned long long __RV_DKSLRA16(unsigned long long a, int b)
__STATIC_FORCEINLINE unsigned long long __RV_DKADD8(unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKADD16(unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKSUB8(unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long long __RV_DKSUB16(unsigned long long a, unsigned long long b)
__STATIC_FORCEINLINE unsigned long __RV_EXPD80(unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_EXPD81(unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_EXPD82(unsigned long a)
__STATIC_FORCEINLINE unsigned long __RV_EXPD83(unsigned long a)
group NMSIS_Core_DSP_Intrinsic_NUCLEI_CUSTOM

(RV32 only) Nuclei Customized DSP Instructions

These are Nuclei customized DSP instructions, available only on RV32.

Functions

__STATIC_FORCEINLINE unsigned long long __RV_DKHM8(unsigned long long a, unsigned long long b)

DKHM8 (64-bit SIMD Signed Saturating Q7 Multiply)

Type: SIMD

Syntax:

DKHM8 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers

Purpose:

Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.

Description:

For the DKHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2.

The Q14 results are then right-shifted by 7 bits and saturated into Q7 values, which are written into Rd. Saturation happens only when both Q7 inputs of a multiplication are 0x80: the result is then saturated to 0x7F and the overflow flag OV is set.

Operations:

op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x80 != aop | 0x80 != bop) {
    res = (aop s* bop) >> 7;
  } else {
    res = 0x7F;
    OV = 1;
  }
}
Rd.H[x/2] = concat(rest, resb);
for RV32: x=0, 2, 4, 6

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: unsigned long long type of value stored in b
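
The DKHM8 operation can be checked on a host machine with a small software model. The sketch below is an illustration only: khm8_lane and dkhm8_model are hypothetical names that are not part of NMSIS, and the OV flag is modelled as an int passed by pointer instead of a CSR bit.

```c
#include <stdint.h>

/* Hypothetical host-side model of one Q7 x Q7 lane; not an NMSIS API. */
static int8_t khm8_lane(int8_t a, int8_t b, int *ov)
{
    if (a == INT8_MIN && b == INT8_MIN) {
        *ov = 1;          /* (-1.0) * (-1.0) = +1.0 is not representable in Q7 */
        return INT8_MAX;  /* saturate to 0x7F */
    }
    /* Q14 product reduced to Q7; assumes >> on a negative int is arithmetic. */
    return (int8_t)((a * b) >> 7);
}

/* Applies the lane model to all eight bytes of the 64-bit operands. */
uint64_t dkhm8_model(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 8; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        rd |= (uint64_t)(uint8_t)khm8_lane(x, y, ov) << (8 * i);
    }
    return rd;
}
```

For example, 0x40 (0.5 in Q7) times 0x40 gives 0x20 (0.25), and a lane holding 0x80 in both operands gives 0x7F with the flag set.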

__STATIC_FORCEINLINE unsigned long long __RV_DKHM16(unsigned long long a, unsigned long long b)

DKHM16 (64-bit SIMD Signed Saturating Q15 Multiply)

Type: SIMD

Syntax:

DKHM16 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers

Purpose:

Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.

Description:

For the DKHM16 instruction, multiply the top 16-bit Q15 content of 32-bit chunks in Rs1 with the top 16-bit Q15 content of 32-bit chunks in Rs2. At the same time, multiply the bottom 16-bit Q15 content of 32-bit chunks in Rs1 with the bottom 16-bit Q15 content of 32-bit chunks in Rs2.

The Q30 results are then right-shifted by 15 bits and saturated into Q15 values, which are written into Rd. Saturation happens only when both Q15 inputs of a multiplication are 0x8000: the result is then saturated to 0x7FFF and the overflow flag OV is set.

Operations:

op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop | 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res = 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0, 2

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: unsigned long long type of value stored in b
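
A host-side reference for the Q15 variant follows the same pattern with four 16-bit lanes. This is a sketch under assumptions: khm16_lane and dkhm16_model are hypothetical helper names, and OV is represented by an int written through a pointer.

```c
#include <stdint.h>

/* Hypothetical host-side model of one Q15 x Q15 lane; not an NMSIS API. */
static int16_t khm16_lane(int16_t a, int16_t b, int *ov)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        *ov = 1;           /* the only input pair whose product overflows Q15 */
        return INT16_MAX;  /* saturate to 0x7FFF */
    }
    /* Q30 product reduced to Q15; assumes >> on a negative value is arithmetic. */
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Applies the lane model to all four halfwords of the 64-bit operands. */
uint64_t dkhm16_model(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        rd |= (uint64_t)(uint16_t)khm16_lane(x, y, ov) << (16 * i);
    }
    return rd;
}
```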

__STATIC_FORCEINLINE unsigned long long __RV_DKABS8(unsigned long long a)

DKABS8 (64-bit SIMD 8-bit Saturating Absolute)

Type: SIMD

Syntax:

DKABS8 Rd, Rs1
# Rd, Rs1 are all even/odd pair of registers

Purpose:

Get the absolute value of 8-bit signed integer elements simultaneously.

Description:

This instruction calculates the absolute value of 8-bit signed integer elements stored in Rs1 and writes the element results to Rd. If the input number is 0x80, this instruction generates 0x7f as the output and sets the OV bit to 1.

Operations:

src = Rs1.B[x];
if (src == 0x80) {
  src = 0x7f;
  OV = 1;
} else if (src[7] == 1) {
  src = -src;
}
Rd.B[x] = src;
for RV32: x=7...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a
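
The per-byte behaviour described above, including the 0x80 corner case, can be reproduced with a short portable model. The function name dkabs8_model is hypothetical and the OV bit is modelled as an int output. It is intended only for cross-checking results off-target.

```c
#include <stdint.h>

/* Hypothetical host-side model of DKABS8; not an NMSIS API. */
uint64_t dkabs8_model(uint64_t a, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 8; i++) {
        int v = (int8_t)(uint8_t)(a >> (8 * i));  /* sign-extended lane */
        if (v == -128) {
            v = 127;   /* |-128| does not fit in 8 bits: saturate */
            *ov = 1;
        } else if (v < 0) {
            v = -v;
        }
        rd |= (uint64_t)(uint8_t)v << (8 * i);
    }
    return rd;
}
```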

__STATIC_FORCEINLINE unsigned long long __RV_DKABS16(unsigned long long a)

DKABS16 (64-bit SIMD 16-bit Saturating Absolute)

Type: SIMD

Syntax:

DKABS16 Rd, Rs1
# Rd, Rs1 are all even/odd pair of registers

Purpose:

Get the absolute value of 16-bit signed integer elements simultaneously.

Description:

This instruction calculates the absolute value of 16-bit signed integer elements stored in Rs1 and writes the element results to Rd. If the input number is 0x8000, this instruction generates 0x7fff as the output and sets the OV bit to 1.

Operations:

src = Rs1.H[x];
if (src == 0x8000) {
  src = 0x7fff;
  OV = 1;
} else if (src[15] == 1) {
  src = -src;
}
Rd.H[x] = src;
for RV32: x=3...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a
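
As a cross-check for the 16-bit variant, the sketch below applies the same rule to four halfword lanes. dkabs16_model is a hypothetical name, and OV is again an int written through a pointer rather than the real status bit.

```c
#include <stdint.h>

/* Hypothetical host-side model of DKABS16; not an NMSIS API. */
uint64_t dkabs16_model(uint64_t a, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int32_t v = (int16_t)(uint16_t)(a >> (16 * i));  /* sign-extended lane */
        if (v == -32768) {
            v = 32767;  /* |-32768| does not fit in 16 bits: saturate */
            *ov = 1;
        } else if (v < 0) {
            v = -v;
        }
        rd |= (uint64_t)(uint16_t)v << (16 * i);
    }
    return rd;
}
```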

__STATIC_FORCEINLINE unsigned long long __RV_DKSLRA8(unsigned long long a, int b)

DKSLRA8 (64-bit SIMD 8-bit Shift Left Logical with Saturation or Shift Right Arithmetic)

Type: SIMD

Syntax:

DKSLRA8 Rd, Rs1, Rs2
# Rd, Rs1 are all even/odd pair of registers

Purpose:

Do 8-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q7 saturation for the left shift.

Description:

The 8-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[3:0]. Rs2[3:0] is in the signed range of [-2^3, 2^3-1]. A positive Rs2[3:0] means logical left shift and a negative Rs2[3:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[3:0]. However, the behavior of Rs2[3:0]==-2^3 (0x8) is defined to be equivalent to the behavior of Rs2[3:0]==-(2^3-1) (0x9). The left-shifted results are saturated to the 8-bit signed integer range of [-2^7, 2^7-1]. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:4] will not affect this instruction.

Operations:

if (Rs2[3:0] < 0) {
  sa = -Rs2[3:0];
  sa = (sa == 8)? 7 : sa;
  Rd.B[x] = SE8(Rs1.B[x][7:sa]);
} else {
  sa = Rs2[2:0];
  res[(7+sa):0] = Rs1.B[x] <<(logic) sa;
  if (res > (2^7)-1) {
    res[7:0] = 0x7f; OV = 1;
  } else if (res < -2^7) {
    res[7:0] = 0x80; OV = 1;
  }
  Rd.B[x] = res[7:0];
}
for RV32: x=7...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: int type of value stored in b
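
The way the signed 4-bit shift field selects between a saturating left shift and an arithmetic right shift can be illustrated with a portable model. This is a sketch only: dkslra8_model is a hypothetical name, OV is modelled as an int output, and the right shift of a negative int is assumed to be arithmetic, as it is on common compilers.

```c
#include <stdint.h>

/* Hypothetical host-side model of DKSLRA8; not an NMSIS API. */
uint64_t dkslra8_model(uint64_t a, int b, int *ov)
{
    int sh = (int)((unsigned)b & 0xFu);  /* Rs2[3:0]; upper bits are ignored */
    if (sh & 0x8) {
        sh -= 16;                        /* sign-extend to the range -8..7 */
    }
    uint64_t rd = 0;
    for (int i = 0; i < 8; i++) {
        int v = (int8_t)(uint8_t)(a >> (8 * i));
        int r;
        if (sh < 0) {
            int sa = (sh == -8) ? 7 : -sh;  /* -8 behaves like -7 */
            r = v >> sa;                    /* arithmetic right shift */
        } else {
            r = v * (1 << sh);              /* left shift without UB on negatives */
            if (r > 127) {
                r = 127;
                *ov = 1;
            } else if (r < -128) {
                r = -128;
                *ov = 1;
            }
        }
        rd |= (uint64_t)(uint8_t)r << (8 * i);
    }
    return rd;
}
```

With this model, shifting 0x40 left by 1 saturates to 0x7F and sets the flag, while shifting 0xF0 by -2 yields 0xFC with no saturation.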

__STATIC_FORCEINLINE unsigned long long __RV_DKSLRA16(unsigned long long a, int b)

DKSLRA16 (64-bit SIMD 16-bit Shift Left Logical with Saturation or Shift Right Arithmetic)

Type: SIMD

Syntax:

DKSLRA16 Rd, Rs1, Rs2
# Rd, Rs1 are all even/odd pair of registers

Purpose:

Do 16-bit elements logical left (positive) or arithmetic right (negative) shift operation with Q15 saturation for the left shift.

Description:

The 16-bit data elements of Rs1 are left-shifted logically or right-shifted arithmetically based on the value of Rs2[4:0]. Rs2[4:0] is in the signed range of [-2^4, 2^4-1]. A positive Rs2[4:0] means logical left shift and a negative Rs2[4:0] means arithmetic right shift. The shift amount is the absolute value of Rs2[4:0]. However, the behavior of Rs2[4:0]==-2^4 (0x10) is defined to be equivalent to the behavior of Rs2[4:0]==-(2^4-1) (0x11). The left-shifted results are saturated to the 16-bit signed integer range of [-2^15, 2^15-1]. After the shift, saturation, or rounding, the final results are written to Rd. If any saturation happens, this instruction sets the OV flag. The value of Rs2[31:5] will not affect this instruction.

Operations:

if (Rs2[4:0] < 0) {
  sa = -Rs2[4:0];
  sa = (sa == 16)? 15 : sa;
  Rd.H[x] = SE16(Rs1.H[x][15:sa]);
} else {
  sa = Rs2[3:0];
  res[(15+sa):0] = Rs1.H[x] <<(logic) sa;
  if (res > (2^15)-1) {
    res[15:0] = 0x7fff; OV = 1;
  } else if (res < -2^15) {
    res[15:0] = 0x8000; OV = 1;
  }
  Rd.H[x] = res[15:0];
}
for RV32: x=3...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: int type of value stored in b
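
A corresponding model for the halfword form uses a signed 5-bit shift field. As before, this is an assumption-laden sketch: dkslra16_model is a hypothetical name, OV is an int output, and right shifts of negative values are assumed to be arithmetic.

```c
#include <stdint.h>

/* Hypothetical host-side model of DKSLRA16; not an NMSIS API. */
uint64_t dkslra16_model(uint64_t a, int b, int *ov)
{
    int sh = (int)((unsigned)b & 0x1Fu);  /* Rs2[4:0]; upper bits are ignored */
    if (sh & 0x10) {
        sh -= 32;                         /* sign-extend to the range -16..15 */
    }
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int32_t v = (int16_t)(uint16_t)(a >> (16 * i));
        int32_t r;
        if (sh < 0) {
            int sa = (sh == -16) ? 15 : -sh;  /* -16 behaves like -15 */
            r = v >> sa;                      /* arithmetic right shift */
        } else {
            r = v * ((int32_t)1 << sh);       /* magnitude is at most 2^30 */
            if (r > 32767) {
                r = 32767;
                *ov = 1;
            } else if (r < -32768) {
                r = -32768;
                *ov = 1;
            }
        }
        rd |= (uint64_t)(uint16_t)r << (16 * i);
    }
    return rd;
}
```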

__STATIC_FORCEINLINE unsigned long long __RV_DKADD8(unsigned long long a, unsigned long long b)

DKADD8 (64-bit SIMD 8-bit Signed Saturating Addition)

Type: SIMD

Syntax:

DKADD8 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers

Purpose:

Do 8-bit signed integer element saturating additions simultaneously.

Description:

This instruction adds the 8-bit signed integer elements in Rs1 with the 8-bit signed integer elements in Rs2. If any of the results are beyond the Q7 number range (-2^7 <= Q7 <= 2^7-1), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

Operations:

res[x] = Rs1.B[x] + Rs2.B[x];
if (res[x] > 127) {
  res[x] = 127;
  OV = 1;
} else if (res[x] < -128) {
  res[x] = -128;
  OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=7...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: unsigned long long type of value stored in b
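
The saturating byte addition can be modelled in portable C by widening each lane to int before clamping. The name dkadd8_model is hypothetical and the OV bit is represented by an int output. The sketch is meant for comparing expected values on a host, not as the intrinsic's implementation.

```c
#include <stdint.h>

/* Hypothetical host-side model of DKADD8; not an NMSIS API. */
uint64_t dkadd8_model(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int8_t)(uint8_t)(a >> (8 * i));
        int y = (int8_t)(uint8_t)(b >> (8 * i));
        int r = x + y;       /* exact sum, cannot overflow int */
        if (r > 127) {
            r = 127;
            *ov = 1;
        } else if (r < -128) {
            r = -128;
            *ov = 1;
        }
        rd |= (uint64_t)(uint8_t)r << (8 * i);
    }
    return rd;
}
```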

__STATIC_FORCEINLINE unsigned long long __RV_DKADD16(unsigned long long a, unsigned long long b)

DKADD16 (64-bit SIMD 16-bit Signed Saturating Addition)

Type: SIMD

Syntax:

DKADD16 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers

Purpose:

Do 16-bit signed integer element saturating additions simultaneously.

Description:

This instruction adds the 16-bit signed integer elements in Rs1 with the 16-bit signed integer elements in Rs2. If any of the results are beyond the Q15 number range (-2^15 <= Q15 <= 2^15-1), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

Operations:

res[x] = Rs1.H[x] + Rs2.H[x];
if (res[x] > 32767) {
  res[x] = 32767;
  OV = 1;
} else if (res[x] < -32768) {
  res[x] = -32768;
  OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=3...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: unsigned long long type of value stored in b
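
For the halfword addition, a host-side sketch looks like this. dkadd16_model and the clamp helper sat16 are hypothetical names, and OV is an int written through a pointer.

```c
#include <stdint.h>

/* Clamp a widened result to the Q15 range and record saturation. */
static int32_t sat16(int32_t r, int *ov)
{
    if (r > 32767) {
        *ov = 1;
        return 32767;
    }
    if (r < -32768) {
        *ov = 1;
        return -32768;
    }
    return r;
}

/* Hypothetical host-side model of DKADD16; not an NMSIS API. */
uint64_t dkadd16_model(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int32_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int32_t y = (int16_t)(uint16_t)(b >> (16 * i));
        rd |= (uint64_t)(uint16_t)sat16(x + y, ov) << (16 * i);
    }
    return rd;
}
```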

__STATIC_FORCEINLINE unsigned long long __RV_DKSUB8(unsigned long long a, unsigned long long b)

DKSUB8 (64-bit SIMD 8-bit Signed Saturating Subtraction)

Type: SIMD

Syntax:

DKSUB8 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers

Purpose:

Do 8-bit signed elements saturating subtractions simultaneously.

Description:

This instruction subtracts the 8-bit signed integer elements in Rs2 from the 8-bit signed integer elements in Rs1. If any of the results are beyond the Q7 number range (-2^7 <= Q7 <= 2^7-1), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

Operations:

res[x] = Rs1.B[x] - Rs2.B[x];
if (res[x] > (2^7)-1) {
  res[x] = (2^7)-1;
  OV = 1;
} else if (res[x] < -2^7) {
  res[x] = -2^7;
  OV = 1;
}
Rd.B[x] = res[x];
for RV32: x=7...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: unsigned long long type of value stored in b
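
The operand order matters for subtraction: each lane computes Rs1 minus Rs2. The sketch below models that on a host, using the hypothetical name dksub8_model and an int output in place of the OV bit.

```c
#include <stdint.h>

/* Hypothetical host-side model of DKSUB8; not an NMSIS API. */
uint64_t dksub8_model(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int8_t)(uint8_t)(a >> (8 * i));  /* lane of Rs1 */
        int y = (int8_t)(uint8_t)(b >> (8 * i));  /* lane of Rs2 */
        int r = x - y;
        if (r > 127) {
            r = 127;
            *ov = 1;
        } else if (r < -128) {
            r = -128;
            *ov = 1;
        }
        rd |= (uint64_t)(uint8_t)r << (8 * i);
    }
    return rd;
}
```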

__STATIC_FORCEINLINE unsigned long long __RV_DKSUB16(unsigned long long a, unsigned long long b)

DKSUB16 (64-bit SIMD 16-bit Signed Saturating Subtraction)

Type: SIMD

Syntax:

DKSUB16 Rd, Rs1, Rs2
# Rd, Rs1, Rs2 are all even/odd pair of registers

Purpose:

Do 16-bit signed integer elements saturating subtractions simultaneously.

Description:

This instruction subtracts the 16-bit signed integer elements in Rs2 from the 16-bit signed integer elements in Rs1. If any of the results are beyond the Q15 number range (-2^15 <= Q15 <= 2^15-1), they are saturated to the range and the OV bit is set to 1. The saturated results are written to Rd.

Operations:

res[x] = Rs1.H[x] - Rs2.H[x];
if (res[x] > (2^15)-1) {
  res[x] = (2^15)-1;
  OV = 1;
} else if (res[x] < -2^15) {
  res[x] = -2^15;
  OV = 1;
}
Rd.H[x] = res[x];
for RV32: x=3...0

Return

value stored in unsigned long long type

Parameters
  • [in] a: unsigned long long type of value stored in a

  • [in] b: unsigned long long type of value stored in b
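
A matching host-side sketch for the halfword subtraction is shown below. dksub16_model is a hypothetical name, and OV is modelled as an int written through a pointer.

```c
#include <stdint.h>

/* Hypothetical host-side model of DKSUB16; not an NMSIS API. */
uint64_t dksub16_model(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int i = 0; i < 4; i++) {
        int32_t x = (int16_t)(uint16_t)(a >> (16 * i));  /* lane of Rs1 */
        int32_t y = (int16_t)(uint16_t)(b >> (16 * i));  /* lane of Rs2 */
        int32_t r = x - y;
        if (r > 32767) {
            r = 32767;
            *ov = 1;
        } else if (r < -32768) {
            r = -32768;
            *ov = 1;
        }
        rd |= (uint64_t)(uint16_t)r << (16 * i);
    }
    return rd;
}
```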

__STATIC_FORCEINLINE unsigned long __RV_EXPD80(unsigned long a)

EXPD80 (Expand and Copy Byte 0 to 32bit)

Type: DSP

Syntax:

EXPD80 Rd, Rs1

Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

Description:

Copies Rs1.B[0][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

Operations:

Rd.W[x][31:0] = CONCAT(Rs1.B[0][7:0], Rs1.B[0][7:0], Rs1.B[0][7:0], Rs1.B[0][7:0]);
for RV32: x=0

Return

value stored in unsigned long type

Parameters
  • [in] a: unsigned long type of value stored in a
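
The byte replication can be expressed in portable C as a multiplication by 0x01010101. The sketch below uses a hypothetical helper, expd8_model, whose second argument selects the source byte: 0 corresponds to EXPD80, and 1 to 3 correspond to the EXPD81, EXPD82 and EXPD83 variants. It assumes the 32-bit register width of RV32 and is not an NMSIS API.

```c
#include <stdint.h>

/* Hypothetical host-side model of EXPD80..EXPD83; n is the source byte index. */
uint32_t expd8_model(uint32_t a, unsigned n)
{
    uint32_t byte = (a >> (8u * (n & 3u))) & 0xFFu;  /* extract Rs1.B[n] */
    return byte * 0x01010101u;                       /* copy it into all 4 bytes */
}
```

For the input 0x12345678, index 0 yields 0x78787878 and index 3 yields 0x12121212.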

__STATIC_FORCEINLINE unsigned long __RV_EXPD81(unsigned long a)

EXPD81 (Expand and Copy Byte 1 to 32bit)

Type: DSP

Syntax:

EXPD81 Rd, Rs1

Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

Description:

Copies Rs1.B[1][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

Operations:

Rd.W[x][31:0] = CONCAT(Rs1.B[1][7:0], Rs1.B[1][7:0], Rs1.B[1][7:0], Rs1.B[1][7:0]);
for RV32: x=0

Return

value stored in unsigned long type

Parameters
  • [in] a: unsigned long type of value stored in a

__STATIC_FORCEINLINE unsigned long __RV_EXPD82(unsigned long a)

EXPD82 (Expand and Copy Byte 2 to 32bit)

Type: DSP

Syntax:

EXPD82 Rd, Rs1

Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

Description:

Copies Rs1.B[2][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

Operations:

Rd.W[x][31:0] = CONCAT(Rs1.B[2][7:0], Rs1.B[2][7:0], Rs1.B[2][7:0], Rs1.B[2][7:0]);
for RV32: x=0

Return

value stored in unsigned long type

Parameters
  • [in] a: unsigned long type of value stored in a

__STATIC_FORCEINLINE unsigned long __RV_EXPD83(unsigned long a)

EXPD83 (Expand and Copy Byte 3 to 32bit)

Type: DSP

Syntax:

EXPD83 Rd, Rs1

Purpose:

Copy 8-bit data from 32-bit chunks into 4 bytes in a register.

Description:

Copies Rs1.B[3][7:0] to Rd.B[0][7:0], Rd.B[1][7:0], Rd.B[2][7:0], and Rd.B[3][7:0].

Operations:

Rd.W[x][31:0] = CONCAT(Rs1.B[3][7:0], Rs1.B[3][7:0], Rs1.B[3][7:0], Rs1.B[3][7:0]);
for RV32: x=0

Return

value stored in unsigned long type

Parameters
  • [in] a: unsigned long type of value stored in a