SIMD 16-bit Multiply Instructions.

Functions
__STATIC_FORCEINLINE unsigned long	__RV_KHM16 (unsigned long a, unsigned long b)
	KHM16 (SIMD Signed Saturating Q15 Multiply)

__STATIC_FORCEINLINE unsigned long	__RV_KHMX16 (unsigned long a, unsigned long b)
	KHMX16 (SIMD Signed Saturating Crossed Q15 Multiply)

__STATIC_FORCEINLINE unsigned long long	__RV_SMUL16 (unsigned int a, unsigned int b)
	SMUL16 (SIMD Signed 16-bit Multiply)

__STATIC_FORCEINLINE unsigned long long	__RV_SMULX16 (unsigned int a, unsigned int b)
	SMULX16 (SIMD Signed Crossed 16-bit Multiply)

__STATIC_FORCEINLINE unsigned long long	__RV_UMUL16 (unsigned int a, unsigned int b)
	UMUL16 (SIMD Unsigned 16-bit Multiply)

__STATIC_FORCEINLINE unsigned long long	__RV_UMULX16 (unsigned int a, unsigned int b)
	UMULX16 (SIMD Unsigned Crossed 16-bit Multiply)

Detailed Description

SIMD 16-bit Multiply Instructions.

There are six SIMD 16-bit multiply instructions: KHM16, KHMX16, SMUL16, SMULX16, UMUL16 and UMULX16.

Function Documentation

◆ __RV_KHM16()

__STATIC_FORCEINLINE unsigned long __RV_KHM16	(	unsigned long	a,
		unsigned long	b
	)

KHM16 (SIMD Signed Saturating Q15 Multiply)

Type: SIMD

Syntax:

KHM16 Rd, Rs1, Rs2

KHMX16 Rd, Rs1, Rs2

Purpose:
Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.

Description:
For the KHM16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. For the KHMX16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. The Q30 results are then right-shifted by 15 bits and saturated into Q15 values, which are written into Rd. Saturation happens only when both Q15 inputs of a multiplication are 0x8000 (-1.0): the result is then saturated to 0x7FFF and the overflow flag OV is set.

Operations:

if (is `KHM16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
  op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
  op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop | 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res = 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
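
The operation above can be modelled in portable C for a single 32-bit chunk (the RV32 case, x=0). This is a minimal sketch for checking expected values on a host machine, not the Nuclei implementation: `q15_mul_sat`, `khm16_model` and `khm16_model_example` are hypothetical names, the OV flag is represented by an `int` out-parameter, and the shift assumes that `>>` on a negative value is an arithmetic shift (as GCC and Clang implement it).

```c
#include <assert.h>
#include <stdint.h>

/* Q15 x Q15 -> Q15. Only -1.0 * -1.0 (0x8000 * 0x8000) saturates. */
static int16_t q15_mul_sat(int16_t x, int16_t y, int *ov)
{
    if (x == INT16_MIN && y == INT16_MIN) {
        *ov = 1;            /* stands in for setting the OV flag */
        return INT16_MAX;   /* 0x7FFF */
    }
    /* Q30 product reduced to Q15; assumes arithmetic right shift. */
    return (int16_t)(((int32_t)x * (int32_t)y) >> 15);
}

/* Hypothetical host-side model of KHM16 on one 32-bit chunk. */
static uint32_t khm16_model(uint32_t a, uint32_t b, int *ov)
{
    int16_t rest = q15_mul_sat((int16_t)(a >> 16), (int16_t)(b >> 16), ov);
    int16_t resb = q15_mul_sat((int16_t)(a & 0xFFFFu), (int16_t)(b & 0xFFFFu), ov);
    return ((uint32_t)(uint16_t)rest << 16) | (uint16_t)resb;
}

static void khm16_model_example(void)
{
    int ov = 0;
    /* 0.5 * 0.5 = 0.25 in both halves; no saturation. */
    assert(khm16_model(0x40004000u, 0x40004000u, &ov) == 0x20002000u);
    assert(ov == 0);
    /* Top: -1.0 * -1.0 saturates to 0x7FFF; bottom: -1.0 * 0.5 = -0.5. */
    assert(khm16_model(0x80008000u, 0x80004000u, &ov) == 0x7FFFC000u);
    assert(ov == 1);
}
```

On a target with the DSP extension, the same inputs passed to `__RV_KHM16` would be expected to produce the same packed results, with the hardware OV flag taking the place of the `ov` variable.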

Parameters

[in]	a	first source operand (Rs1), of type unsigned long
[in]	b	second source operand (Rs2), of type unsigned long

Returns: the value written to Rd, of type unsigned long

Definition at line 2419 of file core_feature_dsp.h.

 {
     unsigned long result;
     __ASM volatile("khm16 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
     return result;
 }

References __ASM.

◆ __RV_KHMX16()

__STATIC_FORCEINLINE unsigned long __RV_KHMX16	(	unsigned long	a,
		unsigned long	b
	)

KHMX16 (SIMD Signed Saturating Crossed Q15 Multiply)

Type: SIMD

Syntax:

KHM16 Rd, Rs1, Rs2

KHMX16 Rd, Rs1, Rs2

Purpose:
Do Q15xQ15 element multiplications simultaneously. The Q30 results are then reduced to Q15 numbers again.

Description:
For the KHM16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. For the KHMX16 instruction, multiply the top 16-bit Q15 content of each 32-bit chunk in Rs1 with the bottom 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. At the same time, multiply the bottom 16-bit Q15 content of each 32-bit chunk in Rs1 with the top 16-bit Q15 content of the corresponding 32-bit chunk in Rs2. The Q30 results are then right-shifted by 15 bits and saturated into Q15 values, which are written into Rd. Saturation happens only when both Q15 inputs of a multiplication are 0x8000 (-1.0): the result is then saturated to 0x7FFF and the overflow flag OV is set.

Operations:

if (is `KHM16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x+1]; // top
  op1b = Rs1.H[x]; op2b = Rs2.H[x]; // bottom
} else if (is `KHMX16`) {
  op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
  op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  if (0x8000 != aop | 0x8000 != bop) {
    res = (aop s* bop) >> 15;
  } else {
    res = 0x7FFF;
    OV = 1;
  }
}
Rd.W[x/2] = concat(rest, resb);
for RV32: x=0
for RV64: x=0,2
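
To illustrate the crossed pairing and the RV64 iteration over x=0,2, the following host-side sketch applies the KHMX16 operation to both 32-bit chunks of a 64-bit value. It is an assumption-laden model rather than the library's code: `q15_mul_sat_bits`, `khmx16_model_rv64` and `khmx16_model_example` are hypothetical names, OV is modelled as an `int` out-parameter, and `>>` on a negative value is assumed to be an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 multiply on raw 16-bit patterns, saturating only for 0x8000 * 0x8000. */
static uint16_t q15_mul_sat_bits(uint16_t x, uint16_t y, int *ov)
{
    if (x == 0x8000u && y == 0x8000u) {
        *ov = 1;
        return 0x7FFFu;
    }
    int32_t p = (int32_t)(int16_t)x * (int32_t)(int16_t)y;
    return (uint16_t)(p >> 15);   /* assumes arithmetic right shift */
}

/* Hypothetical model of KHMX16 on RV64: x takes the values 0 and 2. */
static uint64_t khmx16_model_rv64(uint64_t a, uint64_t b, int *ov)
{
    uint64_t rd = 0;
    for (int x = 0; x < 4; x += 2) {
        uint16_t a_bot = (uint16_t)(a >> (16 * x));        /* Rs1.H[x]   */
        uint16_t a_top = (uint16_t)(a >> (16 * (x + 1)));  /* Rs1.H[x+1] */
        uint16_t b_bot = (uint16_t)(b >> (16 * x));        /* Rs2.H[x]   */
        uint16_t b_top = (uint16_t)(b >> (16 * (x + 1)));  /* Rs2.H[x+1] */
        uint16_t rest = q15_mul_sat_bits(a_top, b_bot, ov); /* crossed */
        uint16_t resb = q15_mul_sat_bits(a_bot, b_top, ov); /* crossed */
        rd |= ((uint64_t)rest << (16 * (x + 1))) | ((uint64_t)resb << (16 * x));
    }
    return rd;
}

static void khmx16_model_example(void)
{
    int ov = 0;
    uint64_t r = khmx16_model_rv64(0x4000200080004000ULL,
                                   0x1000400080008000ULL, &ov);
    /* Lower chunk: 0x8000 * 0x8000 saturates, 0x4000 * 0x8000 gives 0xC000.
       Upper chunk: 0x4000 * 0x4000 gives 0x2000, 0x2000 * 0x1000 gives 0x0400. */
    assert(r == 0x200004007FFFC000ULL);
    assert(ov == 1);
}
```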

Parameters

[in]	a	first source operand (Rs1), of type unsigned long
[in]	b	second source operand (Rs2), of type unsigned long

Returns: the value written to Rd, of type unsigned long

Definition at line 2482 of file core_feature_dsp.h.

 {
     unsigned long result;
     __ASM volatile("khmx16 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
     return result;
 }

References __ASM.

◆ __RV_SMUL16()

__STATIC_FORCEINLINE unsigned long long __RV_SMUL16	(	unsigned int	a,
		unsigned int	b
	)

SMUL16 (SIMD Signed 16-bit Multiply)

Type: SIMD

Syntax:

SMUL16 Rd, Rs1, Rs2

SMULX16 Rd, Rs1, Rs2

Purpose:
Do signed 16-bit multiplications and generate two 32-bit results simultaneously.

RV32 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

RV64 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. The two 32-bit Q30 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].

Operations:

* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
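
As a way to sanity-check expected values on a host, the SMUL16 operation can be sketched in portable C. The packing follows the RV64 operation listed above (`Rd.W[1] = rest`, `Rd.W[0] = resb`). The names `smul16_model`, `product_top`, `product_bottom` and `smul16_model_example` are hypothetical helpers for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of SMUL16: two signed 16x16 -> 32 products. */
static uint64_t smul16_model(uint32_t a, uint32_t b)
{
    int32_t rest = (int32_t)(int16_t)(a >> 16) * (int16_t)(b >> 16);         /* top */
    int32_t resb = (int32_t)(int16_t)(a & 0xFFFFu) * (int16_t)(b & 0xFFFFu); /* bottom */
    /* Packed like Rd.W[1] = rest, Rd.W[0] = resb. */
    return ((uint64_t)(uint32_t)rest << 32) | (uint32_t)resb;
}

/* Unpack the two signed 32-bit products from the 64-bit result. */
static int32_t product_top(uint64_t r)    { return (int32_t)(uint32_t)(r >> 32); }
static int32_t product_bottom(uint64_t r) { return (int32_t)(uint32_t)r; }

static void smul16_model_example(void)
{
    /* a = { top: -2, bottom: 3 }, b = { top: 5, bottom: -4 } */
    uint64_t r = smul16_model(0xFFFE0003u, 0x0005FFFCu);
    assert(r == 0xFFFFFFF6FFFFFFF4ULL);
    assert(product_top(r) == -10);     /* -2 * 5  */
    assert(product_bottom(r) == -12);  /*  3 * -4 */
}
```

Unlike KHM16, no shift or saturation is applied: each product keeps its full 32-bit width.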

Parameters

[in]	a	first source operand (Rs1), of type unsigned int
[in]	b	second source operand (Rs2), of type unsigned int

Returns: the two 32-bit products, packed into a value of type unsigned long long

Definition at line 9484 of file core_feature_dsp.h.

 {
     unsigned long long result;
     __ASM volatile("smul16 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
     return result;
 }

References __ASM.

◆ __RV_SMULX16()

__STATIC_FORCEINLINE unsigned long long __RV_SMULX16	(	unsigned int	a,
		unsigned int	b
	)

SMULX16 (SIMD Signed Crossed 16-bit Multiply)

Type: SIMD

Syntax:

SMUL16 Rd, Rs1, Rs2

SMULX16 Rd, Rs1, Rs2

Purpose:
Do signed 16-bit multiplications and generate two 32-bit results simultaneously.

RV32 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of Rs1 with the bottom 16-bit Q15 content of Rs2. At the same time, multiply the bottom 16-bit Q15 content of Rs1 with the top 16-bit Q15 content of Rs2. The two Q30 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

RV64 Description:
For the SMUL16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. For the SMULX16 instruction, multiply the top 16-bit Q15 content of the lower 32-bit word in Rs1 with the bottom 16-bit Q15 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit Q15 content of the lower 32-bit word in Rs1 with the top 16-bit Q15 content of the lower 32-bit word in Rs2. The two 32-bit Q30 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].

Operations:

* RV32:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `SMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `SMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop s* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
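
One situation where the crossed pairing is convenient is the imaginary part of a complex product, im(a*b) = a.im*b.re + a.re*b.im. The sketch below models SMULX16 on a host and sums the two products. It assumes, purely for illustration, that each operand packs the real part in the bottom halfword and the imaginary part in the top halfword; `smulx16_model`, `complex_imag` and `smulx16_model_example` are hypothetical names, not part of the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of SMULX16 (crossed signed products). */
static uint64_t smulx16_model(uint32_t a, uint32_t b)
{
    int16_t a_top = (int16_t)(a >> 16), a_bot = (int16_t)(a & 0xFFFFu);
    int16_t b_top = (int16_t)(b >> 16), b_bot = (int16_t)(b & 0xFFFFu);
    int32_t rest = (int32_t)a_top * b_bot;  /* Rs1 top    x Rs2 bottom */
    int32_t resb = (int32_t)a_bot * b_top;  /* Rs1 bottom x Rs2 top    */
    return ((uint64_t)(uint32_t)rest << 32) | (uint32_t)resb;
}

/* Illustrative use: imaginary part of a complex product, assuming
   re is stored in the bottom halfword and im in the top halfword. */
static int64_t complex_imag(uint32_t a, uint32_t b)
{
    uint64_t r = smulx16_model(a, b);
    int32_t im_times_re = (int32_t)(uint32_t)(r >> 32);
    int32_t re_times_im = (int32_t)(uint32_t)r;
    /* Sum in 64 bits: two 32-bit products can exceed the int32_t range. */
    return (int64_t)im_times_re + (int64_t)re_times_im;
}

static void smulx16_model_example(void)
{
    /* a = 3 - 2j, b = -4 + 5j, so a*b = -2 + 23j. */
    assert(smulx16_model(0xFFFE0003u, 0x0005FFFCu) == 0x000000080000000FULL);
    assert(complex_imag(0xFFFE0003u, 0x0005FFFCu) == 23);
}
```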

Parameters

[in]	a	first source operand (Rs1), of type unsigned int
[in]	b	second source operand (Rs2), of type unsigned int

Returns: the two 32-bit products, packed into a value of type unsigned long long

Definition at line 9569 of file core_feature_dsp.h.

 {
     unsigned long long result;
     __ASM volatile("smulx16 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
     return result;
 }

References __ASM.

◆ __RV_UMUL16()

__STATIC_FORCEINLINE unsigned long long __RV_UMUL16	(	unsigned int	a,
		unsigned int	b
	)

UMUL16 (SIMD Unsigned 16-bit Multiply)

Type: SIMD

Syntax:

UMUL16 Rd, Rs1, Rs2

UMULX16 Rd, Rs1, Rs2

Purpose:
Do unsigned 16-bit multiplications and generate two 32-bit results simultaneously.

RV32 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

RV64 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. The two 32-bit U32 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].

Operations:

* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
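
A host-side sketch of the UMUL16 operation is given below, mainly to show the unsigned widening. It is an illustration under stated assumptions rather than the header's implementation: `umul16_model` and `umul16_model_example` are hypothetical names, and the packing follows the RV64 operation above (`Rd.W[1] = rest`, `Rd.W[0] = resb`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of UMUL16: two unsigned 16x16 -> 32 products. */
static uint64_t umul16_model(uint32_t a, uint32_t b)
{
    /* The halves stay in uint32_t on purpose: multiplying two uint16_t
       values would promote them to int, and 0xFFFF * 0xFFFF overflows
       a 32-bit int. */
    uint32_t rest = (a >> 16) * (b >> 16);          /* top x top       */
    uint32_t resb = (a & 0xFFFFu) * (b & 0xFFFFu);  /* bottom x bottom */
    return ((uint64_t)rest << 32) | resb;
}

static void umul16_model_example(void)
{
    /* Top: 0xFFFF * 0xFFFF = 0xFFFE0001; bottom: 2 * 3 = 6. */
    assert(umul16_model(0xFFFF0002u, 0xFFFF0003u) == 0xFFFE000100000006ULL);
}
```

The same bit pattern fed to the signed SMUL16 would give 1 for the top product, since 0xFFFF is read as -1 there.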

Parameters

[in]	a	first source operand (Rs1), of type unsigned int
[in]	b	second source operand (Rs2), of type unsigned int

Returns: the two 32-bit products, packed into a value of type unsigned long long

Definition at line 12762 of file core_feature_dsp.h.

 {
     unsigned long long result;
     __ASM volatile("umul16 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
     return result;
 }

References __ASM.

◆ __RV_UMULX16()

__STATIC_FORCEINLINE unsigned long long __RV_UMULX16	(	unsigned int	a,
		unsigned int	b
	)

UMULX16 (SIMD Unsigned Crossed 16-bit Multiply)

Type: SIMD

Syntax:

UMUL16 Rd, Rs1, Rs2

UMULX16 Rd, Rs1, Rs2

Purpose:
Do unsigned 16-bit multiplications and generate two 32-bit results simultaneously.

RV32 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of Rs1 with the bottom 16-bit U16 content of Rs2. At the same time, multiply the bottom 16-bit U16 content of Rs1 with the top 16-bit U16 content of Rs2. The two U32 results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the 32-bit result calculated from the top part of Rs1 and the even 2d register of the pair contains the 32-bit result calculated from the bottom part of Rs1.

RV64 Description:
For the UMUL16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. For the UMULX16 instruction, multiply the top 16-bit U16 content of the lower 32-bit word in Rs1 with the bottom 16-bit U16 content of the lower 32-bit word in Rs2. At the same time, multiply the bottom 16-bit U16 content of the lower 32-bit word in Rs1 with the top 16-bit U16 content of the lower 32-bit word in Rs2. The two 32-bit U32 results are then written into Rd. The result calculated from the top 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[1], and the result calculated from the bottom 16 bits of the lower 32-bit word in Rs1 is written to Rd.W[0].

Operations:

* RV32:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H] = rest;
R[t_L] = resb;
* RV64:
if (is `UMUL16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[1]; // top
  op1b = Rs1.H[0]; op2b = Rs2.H[0]; // bottom
} else if (is `UMULX16`) {
  op1t = Rs1.H[1]; op2t = Rs2.H[0]; // Rs1 top
  op1b = Rs1.H[0]; op2b = Rs2.H[1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
  res = aop u* bop;
}
Rd.W[1] = rest;
Rd.W[0] = resb;
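
The crossed form is equivalent to the straight form with the two halfwords of the second operand swapped, which follows directly from the operations above. The sketch below checks that relationship on a host; `swap_halves`, `umul16_ref`, `umulx16_model` and `umulx16_model_example` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange the top and bottom halfwords of a 32-bit value. */
static uint32_t swap_halves(uint32_t v) { return (v << 16) | (v >> 16); }

/* Straight form: top x top, bottom x bottom. */
static uint64_t umul16_ref(uint32_t a, uint32_t b)
{
    uint32_t rest = (a >> 16) * (b >> 16);
    uint32_t resb = (a & 0xFFFFu) * (b & 0xFFFFu);
    return ((uint64_t)rest << 32) | resb;
}

/* Hypothetical host-side model of UMULX16 (crossed unsigned products). */
static uint64_t umulx16_model(uint32_t a, uint32_t b)
{
    uint32_t rest = (a >> 16) * (b & 0xFFFFu);  /* Rs1 top    x Rs2 bottom */
    uint32_t resb = (a & 0xFFFFu) * (b >> 16);  /* Rs1 bottom x Rs2 top    */
    return ((uint64_t)rest << 32) | resb;
}

static void umulx16_model_example(void)
{
    uint32_t a = 0xFFFF0002u, b = 0x0003FFFFu;
    /* Top: 0xFFFF * 0xFFFF = 0xFFFE0001; bottom: 2 * 3 = 6. */
    assert(umulx16_model(a, b) == 0xFFFE000100000006ULL);
    /* Crossed multiply equals straight multiply with b's halves swapped. */
    assert(umulx16_model(a, b) == umul16_ref(a, swap_halves(b)));
}
```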

Parameters

[in]	a	first source operand (Rs1), of type unsigned int
[in]	b	second source operand (Rs2), of type unsigned int

Returns: the two 32-bit products, packed into a value of type unsigned long long

Definition at line 12847 of file core_feature_dsp.h.

 {
     unsigned long long result;
     __ASM volatile("umulx16 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
     return result;
 }

References __ASM.
