NMSIS-Core  Version 1.2.0
NMSIS-Core support for Nuclei processor-based devices

SIMD 8-bit Multiply Instructions. More...

Functions

__STATIC_FORCEINLINE unsigned long __RV_KHM8 (unsigned long a, unsigned long b)
 KHM8 (SIMD Signed Saturating Q7 Multiply) More...
 
__STATIC_FORCEINLINE unsigned long __RV_KHMX8 (unsigned long a, unsigned long b)
 KHMX8 (SIMD Signed Saturating Crossed Q7 Multiply) More...
 
__STATIC_FORCEINLINE unsigned long long __RV_SMUL8 (unsigned int a, unsigned int b)
 SMUL8 (SIMD Signed 8-bit Multiply) More...
 
__STATIC_FORCEINLINE unsigned long long __RV_SMULX8 (unsigned int a, unsigned int b)
 SMULX8 (SIMD Signed Crossed 8-bit Multiply) More...
 
__STATIC_FORCEINLINE unsigned long long __RV_UMUL8 (unsigned int a, unsigned int b)
 UMUL8 (SIMD Unsigned 8-bit Multiply) More...
 
__STATIC_FORCEINLINE unsigned long long __RV_UMULX8 (unsigned int a, unsigned int b)
 UMULX8 (SIMD Unsigned Crossed 8-bit Multiply) More...
 

Detailed Description

SIMD 8-bit Multiply Instructions.

there are 6 SIMD 8-bit Multiply instructions.

Function Documentation

◆ __RV_KHM8()

__STATIC_FORCEINLINE unsigned long __RV_KHM8 ( unsigned long  a,
unsigned long  b 
)

KHM8 (SIMD Signed Saturating Q7 Multiply)

Type: SIMD

Syntax:

KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2

Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.

Description:
For the KHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. For the KHMX16 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. The Q14 results are then right-shifted 7-bits and saturated into Q7 values. The Q7 results are then written into Rd. When both the two Q7 inputs of a multiplication are 0x80, saturation will happen. The result will be saturated to 0x7F and the overflow flag OV will be set.

Operations:

if (is `KHM8`) {
op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
if (0x80 != aop | 0x80 != bop) {
res = (aop s* bop) >> 7;
} else {
res= 0x7F;
OV = 1;
}
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
Parameters
[in]aunsigned long type of value stored in a
[in]bunsigned long type of value stored in b
Returns
value stored in unsigned long type

Definition at line 2294 of file core_feature_dsp.h.

2295 {
2296  unsigned long result;
2297  __ASM volatile("khm8 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
2298  return result;
2299 }

References __ASM.

◆ __RV_KHMX8()

__STATIC_FORCEINLINE unsigned long __RV_KHMX8 ( unsigned long  a,
unsigned long  b 
)

KHMX8 (SIMD Signed Saturating Crossed Q7 Multiply)

Type: SIMD

Syntax:

KHM8 Rd, Rs1, Rs2
KHMX8 Rd, Rs1, Rs2

Purpose:
Do Q7xQ7 element multiplications simultaneously. The Q14 results are then reduced to Q7 numbers again.

Description:
For the KHM8 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. For the KHMX16 instruction, multiply the top 8-bit Q7 content of 16-bit chunks in Rs1 with the bottom 8-bit Q7 content of 16-bit chunks in Rs2. At the same time, multiply the bottom 8-bit Q7 content of 16-bit chunks in Rs1 with the top 8-bit Q7 content of 16-bit chunks in Rs2. The Q14 results are then right-shifted 7-bits and saturated into Q7 values. The Q7 results are then written into Rd. When both the two Q7 inputs of a multiplication are 0x80, saturation will happen. The result will be saturated to 0x7F and the overflow flag OV will be set.

Operations:

if (is `KHM8`) {
op1t = Rs1.B[x+1]; op2t = Rs2.B[x+1]; // top
op1b = Rs1.B[x]; op2b = Rs2.B[x]; // bottom
} else if (is `KHMX8`) {
op1t = Rs1.H[x+1]; op2t = Rs2.H[x]; // Rs1 top
op1b = Rs1.H[x]; op2b = Rs2.H[x+1]; // Rs1 bottom
}
for ((aop,bop,res) in [(op1t,op2t,rest), (op1b,op2b,resb)]) {
if (0x80 != aop | 0x80 != bop) {
res = (aop s* bop) >> 7;
} else {
res= 0x7F;
OV = 1;
}
}
Rd.H[x/2] = concat(rest, resb);
for RV32, x=0,2
for RV64, x=0,2,4,6
Parameters
[in]aunsigned long type of value stored in a
[in]bunsigned long type of value stored in b
Returns
value stored in unsigned long type

Definition at line 2356 of file core_feature_dsp.h.

2357 {
2358  unsigned long result;
2359  __ASM volatile("khmx8 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
2360  return result;
2361 }

References __ASM.

◆ __RV_SMUL8()

__STATIC_FORCEINLINE unsigned long long __RV_SMUL8 ( unsigned int  a,
unsigned int  b 
)

SMUL8 (SIMD Signed 8-bit Multiply)

Type: SIMD

Syntax:

SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2

Purpose:
Do signed 8-bit multiplications and generate four 16-bit results simultaneously.

RV32 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

RV64 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

Operations:

* RV32:
if (is `SMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `SMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
Parameters
[in]aunsigned int type of value stored in a
[in]bunsigned int type of value stored in b
Returns
value stored in unsigned long long type

Definition at line 9316 of file core_feature_dsp.h.

9317 {
9318  unsigned long long result;
9319  __ASM volatile("smul8 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
9320  return result;
9321 }

References __ASM.

◆ __RV_SMULX8()

__STATIC_FORCEINLINE unsigned long long __RV_SMULX8 ( unsigned int  a,
unsigned int  b 
)

SMULX8 (SIMD Signed Crossed 8-bit Multiply)

Type: SIMD

Syntax:

SMUL8 Rd, Rs1, Rs2
SMULX8 Rd, Rs1, Rs2

Purpose:
Do signed 8-bit multiplications and generate four 16-bit results simultaneously.

RV32 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

RV64 Description:
For the SMUL8 instruction, multiply the 8-bit data elements of Rs1 with the corresponding 8-bit data elements of Rs2. For the SMULX8 instruction, multiply the first and second 8-bit data elements of Rs1 with the second and first 8-bit data elements of Rs2. At the same time, multiply the third and fourth 8-bit data elements of Rs1 with the fourth and third 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

Operations:

* RV32:
if (is `SMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `SMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `SMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] s* op2t[x/2];
resb[x/2] = op1b[x/2] s* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0];
x = 0 and 2
Parameters
[in]aunsigned int type of value stored in a
[in]bunsigned int type of value stored in b
Returns
value stored in unsigned long long type

Definition at line 9399 of file core_feature_dsp.h.

9400 {
9401  unsigned long long result;
9402  __ASM volatile("smulx8 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
9403  return result;
9404 }

References __ASM.

◆ __RV_UMUL8()

__STATIC_FORCEINLINE unsigned long long __RV_UMUL8 ( unsigned int  a,
unsigned int  b 
)

UMUL8 (SIMD Unsigned 8-bit Multiply)

Type: SIMD

Syntax:

UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2

Purpose:
Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.

RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

Operations:

* RV32:
if (is `UMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `UMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0]; x = 0 and 2
Parameters
[in]aunsigned int type of value stored in a
[in]bunsigned int type of value stored in b
Returns
value stored in unsigned long long type

Definition at line 12593 of file core_feature_dsp.h.

12594 {
12595  unsigned long long result;
12596  __ASM volatile("umul8 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
12597  return result;
12598 }

References __ASM.

◆ __RV_UMULX8()

__STATIC_FORCEINLINE unsigned long long __RV_UMULX8 ( unsigned int  a,
unsigned int  b 
)

UMULX8 (SIMD Unsigned Crossed 8-bit Multiply)

Type: SIMD

Syntax:

UMUL8 Rd, Rs1, Rs2
UMULX8 Rd, Rs1, Rs2

Purpose:
Do unsigned 8-bit multiplications and generate four 16-bit results simultaneously.

RV32 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into an even/odd pair of registers specified by Rd(4,1). Rd(4,1), i.e., d, determines the even/odd pair group of two registers. Specifically, the register pair includes register 2d and 2d+1. The odd 2d+1 register of the pair contains the two 16-bit results calculated from the top part of Rs1 and the even 2d register of the pair contains the two 16-bit results calculated from the bottom part of Rs1.

RV64 Description:
For the UMUL8 instruction, multiply the unsigned 8-bit data elements of Rs1 with the corresponding unsigned 8-bit data elements of Rs2. For the UMULX8 instruction, multiply the first and second unsigned 8-bit data elements of Rs1 with the second and first unsigned 8-bit data elements of Rs2. At the same time, multiply the third and fourth unsigned 8-bit data elements of Rs1 with the fourth and third unsigned 8-bit data elements of Rs2. The four 16-bit results are then written into Rd. The Rd.W[1] contains the two 16-bit results calculated from the top part of Rs1 and the Rd.W[0] contains the two 16-bit results calculated from the bottom part of Rs1.

Operations:

* RV32:
if (is `UMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
R[t_H].H[1] = rest[1]; R[t_H].H[0] = resb[1];
R[t_L].H[1] = rest[0]; R[t_L].H[0] = resb[0];
x = 0 and 2
* RV64:
if (is `UMUL8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x+1]; // top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x]; // bottom
} else if (is `UMULX8`) {
op1t[x/2] = Rs1.B[x+1]; op2t[x/2] = Rs2.B[x]; // Rs1 top
op1b[x/2] = Rs1.B[x]; op2b[x/2] = Rs2.B[x+1]; // Rs1 bottom
}
rest[x/2] = op1t[x/2] u* op2t[x/2];
resb[x/2] = op1b[x/2] u* op2b[x/2];
t_L = CONCAT(Rd(4,1),1'b0); t_H = CONCAT(Rd(4,1),1'b1);
Rd.W[1].H[1] = rest[1]; Rd.W[1].H[0] = resb[1];
Rd.W[0].H[1] = rest[0]; Rd.W[0].H[0] = resb[0]; x = 0 and 2
Parameters
[in]aunsigned int type of value stored in a
[in]bunsigned int type of value stored in b
Returns
value stored in unsigned long long type

Definition at line 12677 of file core_feature_dsp.h.

12678 {
12679  unsigned long long result;
12680  __ASM volatile("umulx8 %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
12681  return result;
12682 }

References __ASM.

__ASM
#define __ASM
Pass information from the compiler to the assembler.
Definition: nmsis_gcc.h:55