3.1.10. Nuclei Custom Xxlvw Extension
3.1.10.1. Introduction
3.1.10.2. 扩展
使用该扩展时,需要打开 V
扩展和 xxlvw
扩展,同时该扩展只支持rv32
示例:-march=rv32imafcv_xxlvw -mabi=ilp32f
Attention
使用该扩展的相关 intrinsic 时,需要添加以下头文件:
#include <riscv_vector.h>
3.1.10.3. 支持的指令
- Complex number format convert
vcpack.vv vd, vs2, vs1, vm
vcunpackr.v vd, vs2, vm
vcunpacki.v vd, vs2, vm
- Fix point dynamic scaling operations
vdsmul.vv/vs vd, vs2, vs1, vm
vdsmacini.v vs2, vm
vdsmacini.s rs1,vm
vdsmacini.i uimm, vm
vdsmac.vv/vs vs2, vs1, vm
vdsmaco.vv/vs vd, vs2, vs1, vm
vlsb.v vd, vs2, vm
- Complex dynamic scaling operations
vconj.v vd, vs2, vm
vdscmul.vv/vs vd, vs2, vs1, vm
vdscmulj.vv/vs vd, vs2, vs1, vm
vdscredsum.v vd, vs2, vm
vdscmac.vv/vs vs2, vs1, vm
vdscmacj.vv/vs vs2, vs1, vm
vdscmaco.vv/vs vd, vs2, vs1, vm
vdscmacjo.vv/vs vd, vs2, vs1, vm
vdscmacor.vv/vs vd, vs2, vs1, vm
vdscmacoi.vv/vs vd, vs2, vs1, vm
vdscmacjor.vv/vs vd, vs2, vs1, vm
vdscmacjoi.vv/vs vd, vs2, vs1, vm
vdscmulr.vv/vs vd, vs2, vs1, vm
vdscmuli.vv/vs vd, vs2, vs1, vm
vdscmuljr.vv/vs vd, vs2, vs1, vm
vdscmulji.vv/vs vd, vs2, vs1, vm
- Dynamic scaling Reduced operation
vdsredsum.v vd, vs2, vm
vdsredsumn.vs vd, vs2, rs1
vdsredsumn.vi vd, vs2, uimm
vredmaxi.vv vd, vs2, vs1, vm
vredmini.vv vd, vs2, vs1, vm
- Inter-element operation instructions
vperm.vi vd, vs2, uimm
vfsl.vv vd, vs2, vs1
vfsr.vv vd, vs2, vs1
- Fast non-linear operations
vlnlp0.v vnlpr0, vs2
vlnlp1.v vnlpr1, vs2
vnle.vv vd, vs2, vs1, vm
vnle.vs vd, vs2, vs1, vm
vnlm.vv vd, vs2, vs1, vm
vnlm.vs vd, vs2, vs1, vm
- Format conversion instructions
vfcvt.b2h.v vd, vs2
vfcvt.b2w.v vd, vs2
vfcvt.h2w.v vd, vs2
vfcvt.p2c.v vd, vs2
vfcvt.h2b.v vd, vs2
vfcvt.w2b.v vd, vs2
vfcvt.w2h.v vd, vs2
vfcvt.c2p.v vd, vs2
3.1.10.4. intrinsic 命名规则
rvv intrinsic 命名规则: https://github.com/riscv-non-isa/rvv-intrinsic-doc/releases/tag/v1.0.0-rc7 v-intrinsic-spec.pdf
-> Chapter 6.
我们的命名规则遵循上述的命名规则,并在此基础上在前缀处添加了 _xl
3.1.10.5. Nuclei 自定义的 intrinsic
Note
每一条指令对应的intrinsic,会给出示例,全部的intrinsic请参考rvv intrinsic 命名规则和示例进行构建。
下文出现的sew指的是指令支持的数据类型的宽度,full_preds 指的是intrinsic函数支持的后缀形式, full 是全部都支持,包含 none(无后缀),_m,_tu,_tum,_tumu,_mu等六种形式。
vcpack.vv
sew = 32 full_preds
vint32m1_t __riscv_xl_vcpack_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m1_t __riscv_xl_vcpack_vv_i32m1_m(vbool32_t vm, vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m1_t __riscv_xl_vcpack_vv_i32m1_tu(vint32m1_t vd, vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m1_t __riscv_xl_vcpack_vv_i32m1_tum(vbool32_t vm,vint32m1_t vd, vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m1_t __riscv_xl_vcpack_vv_i32m1_tumu(vbool32_t vm,vint32m1_t vd, vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m1_t __riscv_xl_vcpack_vv_i32m1_mu(vbool32_t vm,vint32m1_t vd, vint32m1_t vs2, vint32m1_t vs1, size_t vl);
Note
后续指令对应的intrinsic,只会给出none(无后缀)的全部intrinsic,其余后缀的需要使用时,参考上面的构建规则
vcunpackr.v
sew = 32 full_preds
vint32mf2_t __riscv_xl_vcunpackr_v_i32mf2(vint32mf2_t vs2, size_t vl);
vint32m1_t __riscv_xl_vcunpackr_v_i32m1(vint32m1_t vs2, size_t vl);
vint32m2_t __riscv_xl_vcunpackr_v_i32m2(vint32m2_t vs2, size_t vl);
vint32m4_t __riscv_xl_vcunpackr_v_i32m4(vint32m4_t vs2, size_t vl);
vint32m8_t __riscv_xl_vcunpackr_v_i32m8(vint32m8_t vs2, size_t vl);
vcunpacki.v
sew = 32 full_preds
vint32mf2_t __riscv_xl_vcunpacki_v_i32mf2(vint32mf2_t vs2, size_t vl);
vint32m1_t __riscv_xl_vcunpacki_v_i32m1(vint32m1_t vs2, size_t vl);
vint32m2_t __riscv_xl_vcunpacki_v_i32m2(vint32m2_t vs2, size_t vl);
vint32m4_t __riscv_xl_vcunpacki_v_i32m4(vint32m4_t vs2, size_t vl);
vint32m8_t __riscv_xl_vcunpacki_v_i32m8(vint32m8_t vs2, size_t vl);
vdsmul.vv
sew = 8/16/32 full_preds
vint8mf8_t __riscv_xl_vdscmul_vv_i8mf8(vint8mf8_t vs2, vint8mf8_t vs1, size_t vl);
vint8mf4_t __riscv_xl_vdscmul_vv_i8mf4(vint8mf4_t vs2, vint8mf4_t vs1, size_t vl);
vint8mf2_t __riscv_xl_vdscmul_vv_i8mf2(vint8mf2_t vs2, vint8mf2_t vs1, size_t vl);
vint8m1_t __riscv_xl_vdscmul_vv_i8m1(vint8m1_t vs2, vint8m1_t vs1, size_t vl);
vint8m2_t __riscv_xl_vdscmul_vv_i8m2(vint8m2_t vs2, vint8m2_t vs1, size_t vl);
vint8m4_t __riscv_xl_vdscmul_vv_i8m4(vint8m4_t vs2, vint8m4_t vs1, size_t vl);
vint8m8_t __riscv_xl_vdscmul_vv_i8m8(vint8m8_t vs2, vint8m8_t vs1, size_t vl);
vint16mf4_t __riscv_xl_vdscmul_vv_i16mf4(vint16mf4_t vs2, vint16mf4_t vs1, size_t vl);
vint16mf2_t __riscv_xl_vdscmul_vv_i16mf2(vint16mf2_t vs2, vint16mf2_t vs1, size_t vl);
vint16m1_t __riscv_xl_vdscmul_vv_i16m1(vint16m1_t vs2, vint16m1_t vs1, size_t vl);
vint16m2_t __riscv_xl_vdscmul_vv_i16m2(vint16m2_t vs2, vint16m2_t vs1, size_t vl);
vint16m4_t __riscv_xl_vdscmul_vv_i16m4(vint16m4_t vs2, vint16m4_t vs1, size_t vl);
vint16m8_t __riscv_xl_vdscmul_vv_i16m8(vint16m8_t vs2, vint16m8_t vs1, size_t vl);
vint32mf2_t __riscv_xl_vdscmul_vv_i32mf2(vint32mf2_t vs2, vint32mf2_t vs1, size_t vl);
vint32m1_t __riscv_xl_vdscmul_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m2_t __riscv_xl_vdscmul_vv_i32m2(vint32m2_t vs2, vint32m2_t vs1, size_t vl);
vint32m4_t __riscv_xl_vdscmul_vv_i32m4(vint32m4_t vs2, vint32m4_t vs1, size_t vl);
vint32m8_t __riscv_xl_vdscmul_vv_i32m8(vint32m8_t vs2, vint32m8_t vs1, size_t vl);
vdsmul.vs
sew = 8/16/32 full_preds
vint8mf8_t __riscv_xl_vdscmul_vs_i8mf8_i8mf8(vint8mf8_t vs2, vint8mf8_t vs1, size_t vl);
vint8mf4_t __riscv_xl_vdscmul_vs_i8mf4_i8mf4(vint8mf4_t vs2, vint8mf4_t vs1, size_t vl);
vint8mf2_t __riscv_xl_vdscmul_vs_i8mf2_i8mf2(vint8mf2_t vs2, vint8mf2_t vs1, size_t vl);
vint8m1_t __riscv_xl_vdscmul_vs_i8m1_i8m1(vint8m1_t vs2, vint8m1_t vs1, size_t vl);
vint8m2_t __riscv_xl_vdscmul_vs_i8m2_i8m2(vint8m2_t vs2, vint8m2_t vs1, size_t vl);
vint8m4_t __riscv_xl_vdscmul_vs_i8m4_i8m4(vint8m4_t vs2, vint8m4_t vs1, size_t vl);
vint8m8_t __riscv_xl_vdscmul_vs_i8m8_i8m8(vint8m8_t vs2, vint8m8_t vs1, size_t vl);
vint16mf4_t __riscv_xl_vdscmul_vs_i16mf4_i16mf4(vint16mf4_t vs2, vint16mf4_t vs1, size_t vl);
vint16mf2_t __riscv_xl_vdscmul_vs_i16mf2_i16mf2(vint16mf2_t vs2, vint16mf2_t vs1, size_t vl);
vint16m1_t __riscv_xl_vdscmul_vs_i16m1_i16m1(vint16m1_t vs2, vint16m1_t vs1, size_t vl);
vint16m2_t __riscv_xl_vdscmul_vs_i16m2_i16m2(vint16m2_t vs2, vint16m2_t vs1, size_t vl);
vint16m4_t __riscv_xl_vdscmul_vs_i16m4_i16m4(vint16m4_t vs2, vint16m4_t vs1, size_t vl);
vint16m8_t __riscv_xl_vdscmul_vs_i16m8_i16m8(vint16m8_t vs2, vint16m8_t vs1, size_t vl);
vint32mf2_t __riscv_xl_vdscmul_vs_i32mf2_i32mf2(vint32mf2_t vs2, vint32mf2_t vs1, size_t vl);
vint32m1_t __riscv_xl_vdscmul_vs_i32m1_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m2_t __riscv_xl_vdscmul_vs_i32m2_i32m2(vint32m2_t vs2, vint32m2_t vs1, size_t vl);
vint32m4_t __riscv_xl_vdscmul_vs_i32m4_i32m4(vint32m4_t vs2, vint32m4_t vs1, size_t vl);
vint32m8_t __riscv_xl_vdscmul_vs_i32m8_i32m8(vint32m8_t vs2, vint32m8_t vs1, size_t vl);
vdsmacini.v
sew = 8/16/32 none_m_preds[none,_m]
vint8mf8_t __riscv_xl_vdsmacini_v_i8mf8(vint8mf8_t vs2, size_t vl);
vint8mf4_t __riscv_xl_vdsmacini_v_i8mf4(vint8mf4_t vs2, size_t vl);
vint8mf2_t __riscv_xl_vdsmacini_v_i8mf2(vint8mf2_t vs2, size_t vl);
vint8m1_t __riscv_xl_vdsmacini_v_i8m1(vint8m1_t vs2, size_t vl);
vint8m2_t __riscv_xl_vdsmacini_v_i8m2(vint8m2_t vs2, size_t vl);
vint8m4_t __riscv_xl_vdsmacini_v_i8m4(vint8m4_t vs2, size_t vl);
vint8m8_t __riscv_xl_vdsmacini_v_i8m8(vint8m8_t vs2, size_t vl);
vint16mf4_t __riscv_xl_vdsmacini_v_i16mf4(vint16mf4_t vs2, size_t vl);
vint16mf2_t __riscv_xl_vdsmacini_v_i16mf2(vint16mf2_t vs2, size_t vl);
vint16m1_t __riscv_xl_vdsmacini_v_i16m1(vint16m1_t vs2, size_t vl);
vint16m2_t __riscv_xl_vdsmacini_v_i16m2(vint16m2_t vs2, size_t vl);
vint16m4_t __riscv_xl_vdsmacini_v_i16m4(vint16m4_t vs2, size_t vl);
vint16m8_t __riscv_xl_vdsmacini_v_i16m8(vint16m8_t vs2, size_t vl);
vint32mf2_t __riscv_xl_vdsmacini_v_i32mf2(vint32mf2_t vs2, size_t vl);
vint32m1_t __riscv_xl_vdsmacini_v_i32m1(vint32m1_t vs2, size_t vl);
vint32m2_t __riscv_xl_vdsmacini_v_i32m2(vint32m2_t vs2, size_t vl);
vint32m4_t __riscv_xl_vdsmacini_v_i32m4(vint32m4_t vs2, size_t vl);
vint32m8_t __riscv_xl_vdsmacini_v_i32m8(vint32m8_t vs2, size_t vl);
vdsmacini.s
sew = 8/16/32 none_m_preds[none,_m]
vint8mf8_t __riscv_xl_vdsmacini_x_i8mf8(int8_t rs1, size_t vl);
vint8mf4_t __riscv_xl_vdsmacini_x_i8mf4(int8_t rs1, size_t vl);
vint8mf2_t __riscv_xl_vdsmacini_x_i8mf2(int8_t rs1, size_t vl);
vint8m1_t __riscv_xl_vdsmacini_x_i8m1(int8_t rs1, size_t vl);
vint8m2_t __riscv_xl_vdsmacini_x_i8m2(int8_t rs1, size_t vl);
vint8m4_t __riscv_xl_vdsmacini_x_i8m4(int8_t rs1, size_t vl);
vint8m8_t __riscv_xl_vdsmacini_x_i8m8(int8_t rs1, size_t vl);
vint16mf4_t __riscv_xl_vdsmacini_x_i16mf4(int16_t rs1, size_t vl);
vint16mf2_t __riscv_xl_vdsmacini_x_i16mf2(int16_t rs1, size_t vl);
vint16m1_t __riscv_xl_vdsmacini_x_i16m1(int16_t rs1, size_t vl);
vint16m2_t __riscv_xl_vdsmacini_x_i16m2(int16_t rs1, size_t vl);
vint16m4_t __riscv_xl_vdsmacini_x_i16m4(int16_t rs1, size_t vl);
vint16m8_t __riscv_xl_vdsmacini_x_i16m8(int16_t rs1, size_t vl);
vint32mf2_t __riscv_xl_vdsmacini_x_i32mf2(int32_t rs1, size_t vl);
vint32m1_t __riscv_xl_vdsmacini_x_i32m1(int32_t rs1, size_t vl);
vint32m2_t __riscv_xl_vdsmacini_x_i32m2(int32_t rs1, size_t vl);
vint32m4_t __riscv_xl_vdsmacini_x_i32m4(int32_t rs1, size_t vl);
vint32m8_t __riscv_xl_vdsmacini_x_i32m8(int32_t rs1, size_t vl);
vdsmac.vv/vs
sew = 8/16/32 none_m_preds[none,_m]
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdsmac 即可。
vdsmaco.vv/vs
sew = 8/16/32 full_preds
Tip
同上,只需要将 vdsmul 替换为 vdsmaco 即可
vlsb.v
sew = 8/16/32 full_preds
Tip
intrinsic 的名字参考 vdsmacini.v intrinsic 的名字,只需要将 vdsmacini 替换为 vlsb 即可。
vconj.v
sew = 8/16/32 full_preds
Tip
intrinsic 的名字参考 vdsmacini.v intrinsic 的名字,只需要将 vdsmacini 替换为 vconj 即可。
vdscmul.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmul 即可。
vdscmulj.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmulj 即可。
vdscredsum.v
sew = 8/16/32 full_preds
Tip
intrinsic 的名字参考 vdsmacini.v intrinsic 的名字,只需要将 vdsmacini 替换为 vdscredsum 即可。
vdscmac.vv/vs
sew = 8/16/32 none_m_preds[none,_m]
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmac 即可。
vdscmacj.vv/vs
sew = 8/16/32 none_m_preds[none,_m]
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmacj 即可。
vdscmaco.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmaco 即可。
vdscmacjo.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmacjo 即可。
vdscmacor.vv
sew = 32 full_preds
vint32mf2_t __riscv_xl_vdscmacor_vv_i32mf2(vint32mf2_t vs2, vint32mf2_t vs1, size_t vl);
vint32m1_t __riscv_xl_vdscmacor_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m2_t __riscv_xl_vdscmacor_vv_i32m2(vint32m2_t vs2, vint32m2_t vs1, size_t vl);
vint32m4_t __riscv_xl_vdscmacor_vv_i32m4(vint32m4_t vs2, vint32m4_t vs1, size_t vl);
vint32m8_t __riscv_xl_vdscmacor_vv_i32m8(vint32m8_t vs2, vint32m8_t vs1, size_t vl);
vdscmacor.vs
sew = 32 full_preds
vint32mf2_t __riscv_xl_vdscmacor_vs_i32mf2_i32mf2(vint32mf2_t vs2, vint32mf2_t vs1, size_t vl);
vint32m1_t __riscv_xl_vdscmacor_vs_i32m1_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl);
vint32m2_t __riscv_xl_vdscmacor_vs_i32m2_i32m2(vint32m2_t vs2, vint32m2_t vs1, size_t vl);
vint32m4_t __riscv_xl_vdscmacor_vs_i32m4_i32m4(vint32m4_t vs2, vint32m4_t vs1, size_t vl);
vint32m8_t __riscv_xl_vdscmacor_vs_i32m8_i32m8(vint32m8_t vs2, vint32m8_t vs1, size_t vl);
vdscmacoi.vv/vs
sew = 32 full_preds
Tip
intrinsic 的名字 参考 vdscmacor.vv/vs intrinsic 的名字,只需要将 vdscmacor 替换为 vdscmacoi 即可。
vdscmacjor.vv/vs
sew = 32 full_preds
Tip
intrinsic 的名字 参考 vdscmacor.vv/vs intrinsic 的名字,只需要将 vdscmacor 替换为 vdscmacjor 即可。
vdscmacjoi.vv/vs
sew = 32 full_preds
Tip
intrinsic 的名字 参考 vdscmacor.vv/vs intrinsic 的名字,只需要将 vdscmacor 替换为 vdscmacjoi 即可。
vdscmulr.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmulr 即可。
vdscmuli.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmuli 即可。
vdscmuljr.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmuljr 即可。
vdscmulji.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vdscmulji 即可。
vdsredsum.v
sew = 8/16/32 full_preds
Tip
intrinsic 的名字参考 vdsmacini.v intrinsic 的名字,只需要将 vdsmacini 替换为 vdsredsum 即可。
vdsredsumn.vs vd, vs2, rs1
sew = 8/16/32 none_tu_preds
vint8mf8_t __riscv_xl_vdsredsumn_vx_i8mf8(vint8mf8_t vs2, int8_t rs1, size_t vl);
vint8mf4_t __riscv_xl_vdsredsumn_vx_i8mf4(vint8mf4_t vs2, int8_t rs1, size_t vl);
vint8mf2_t __riscv_xl_vdsredsumn_vx_i8mf2(vint8mf2_t vs2, int8_t rs1, size_t vl);
vint8m1_t __riscv_xl_vdsredsumn_vx_i8m1(vint8m1_t vs2, int8_t rs1, size_t vl);
vint8m2_t __riscv_xl_vdsredsumn_vx_i8m2(vint8m2_t vs2, int8_t rs1, size_t vl);
vint8m4_t __riscv_xl_vdsredsumn_vx_i8m4(vint8m4_t vs2, int8_t rs1, size_t vl);
vint8m8_t __riscv_xl_vdsredsumn_vx_i8m8(vint8m8_t vs2, int8_t rs1, size_t vl);
vint16mf4_t __riscv_xl_vdsredsumn_vx_i16mf4(vint16mf4_t vs2, int16_t rs1, size_t vl);
vint16mf2_t __riscv_xl_vdsredsumn_vx_i16mf2(vint16mf2_t vs2, int16_t rs1, size_t vl);
vint16m1_t __riscv_xl_vdsredsumn_vx_i16m1(vint16m1_t vs2, int16_t rs1, size_t vl);
vint16m2_t __riscv_xl_vdsredsumn_vx_i16m2(vint16m2_t vs2, int16_t rs1, size_t vl);
vint16m4_t __riscv_xl_vdsredsumn_vx_i16m4(vint16m4_t vs2, int16_t rs1, size_t vl);
vint16m8_t __riscv_xl_vdsredsumn_vx_i16m8(vint16m8_t vs2, int16_t rs1, size_t vl);
vint32mf2_t __riscv_xl_vdsredsumn_vx_i32mf2(vint32mf2_t vs2, int32_t rs1, size_t vl);
vint32m1_t __riscv_xl_vdsredsumn_vx_i32m1(vint32m1_t vs2, int32_t rs1, size_t vl);
vint32m2_t __riscv_xl_vdsredsumn_vx_i32m2(vint32m2_t vs2, int32_t rs1, size_t vl);
vint32m4_t __riscv_xl_vdsredsumn_vx_i32m4(vint32m4_t vs2, int32_t rs1, size_t vl);
vint32m8_t __riscv_xl_vdsredsumn_vx_i32m8(vint32m8_t vs2, int32_t rs1, size_t vl);
vdsredsumn.vi vd, vs2, uimm
sew = 8/16/32 none_tu_preds
Tip
intrinsic 的名字参考 vdsredsumn.vs intrinsic 的名字,只需要将 _vx 替换为 _vi 即可。 vdsredsumn.vs/vi 指令的rs1和uimm的值必须是整数[1,2,3,4]之内的。
vredmaxi.vv
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv intrinsic 的名字,只需要将 vdsmul 替换为 vredmaxi 即可。
vredmini.vv
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv intrinsic 的名字,只需要将 vdsmul 替换为 vredmini 即可。
vperm.vi
sew = 8/16/32 none_tu_preds
vint8mf8_t __riscv_xl_vperm_vi_i8mf8(vint8mf8_t vs2, int8_t rs1, size_t vl);
vint8mf4_t __riscv_xl_vperm_vi_i8mf4(vint8mf4_t vs2, int8_t rs1, size_t vl);
vint8mf2_t __riscv_xl_vperm_vi_i8mf2(vint8mf2_t vs2, int8_t rs1, size_t vl);
vint8m1_t __riscv_xl_vperm_vi_i8m1(vint8m1_t vs2, int8_t rs1, size_t vl);
vint8m2_t __riscv_xl_vperm_vi_i8m2(vint8m2_t vs2, int8_t rs1, size_t vl);
vint8m4_t __riscv_xl_vperm_vi_i8m4(vint8m4_t vs2, int8_t rs1, size_t vl);
vint8m8_t __riscv_xl_vperm_vi_i8m8(vint8m8_t vs2, int8_t rs1, size_t vl);
vint16mf4_t __riscv_xl_vperm_vi_i16mf4(vint16mf4_t vs2, int16_t rs1, size_t vl);
vint16mf2_t __riscv_xl_vperm_vi_i16mf2(vint16mf2_t vs2, int16_t rs1, size_t vl);
vint16m1_t __riscv_xl_vperm_vi_i16m1(vint16m1_t vs2, int16_t rs1, size_t vl);
vint16m2_t __riscv_xl_vperm_vi_i16m2(vint16m2_t vs2, int16_t rs1, size_t vl);
vint16m4_t __riscv_xl_vperm_vi_i16m4(vint16m4_t vs2, int16_t rs1, size_t vl);
vint16m8_t __riscv_xl_vperm_vi_i16m8(vint16m8_t vs2, int16_t rs1, size_t vl);
vint32mf2_t __riscv_xl_vperm_vi_i32mf2(vint32mf2_t vs2, int32_t rs1, size_t vl);
vint32m1_t __riscv_xl_vperm_vi_i32m1(vint32m1_t vs2, int32_t rs1, size_t vl);
vint32m2_t __riscv_xl_vperm_vi_i32m2(vint32m2_t vs2, int32_t rs1, size_t vl);
vint32m4_t __riscv_xl_vperm_vi_i32m4(vint32m4_t vs2, int32_t rs1, size_t vl);
vint32m8_t __riscv_xl_vperm_vi_i32m8(vint32m8_t vs2, int32_t rs1, size_t vl);
vfsl.vv
sew = 8/16/32 none_tu_preds
Tip
intrinsic 的名字 参考 vdsmul.vv intrinsic 的名字,只需要将 vdsmul 替换为 vfsl 即可。
vfsr.vv
sew = 8/16/32 none_tu_preds
Tip
intrinsic 的名字 参考 vdsmul.vv intrinsic 的名字,只需要将 vdsmul 替换为 vfsr 即可。
vlnlp0.v/vlnlp1.v
sew = 8/16/32 none_preds
该指令的intrinsic的使用需要满足以下关系
当VLEN=128时,LMUL=8
当VLEN=256时,LMUL=4
当VLEN=512时,LMUL=2
当VLEN=1024时,LMUL=1
编译器可通过 -march=*_zvl${vlen}b
来控制vlen的长度,其中 vlen
可取值{128,256,512,1024,…}等,默认不指定的情况下是128
_zvl1024b 以下intrinsic可以使用
vint8m1_t __riscv_xl_vlnlp0_v_i8m1(vint8m1_t vs, size_t vl);
vint16m1_t __riscv_xl_vlnlp0_v_i16m1(vint16m1_t vs, size_t vl);
vint32m1_t __riscv_xl_vlnlp0_v_i32m1(vint32m1_t vs, size_t vl);
vint8m1_t __riscv_xl_vlnlp1_v_i8m1(vint8m1_t vs, size_t vl);
vint16m1_t __riscv_xl_vlnlp1_v_i16m1(vint16m1_t vs, size_t vl);
vint32m1_t __riscv_xl_vlnlp1_v_i32m1(vint32m1_t vs, size_t vl);
_zvl512b 以下intrinsic可以使用
vint8m2_t __riscv_xl_vlnlp0_v_i8m2(vint8m2_t vs, size_t vl);
vint16m2_t __riscv_xl_vlnlp0_v_i16m2(vint16m2_t vs, size_t vl);
vint32m2_t __riscv_xl_vlnlp0_v_i32m2(vint32m2_t vs, size_t vl);
vint8m2_t __riscv_xl_vlnlp1_v_i8m2(vint8m2_t vs, size_t vl);
vint16m2_t __riscv_xl_vlnlp1_v_i16m2(vint16m2_t vs, size_t vl);
vint32m2_t __riscv_xl_vlnlp1_v_i32m2(vint32m2_t vs, size_t vl);
_zvl256b 以下intrinsic可以使用
vint8m4_t __riscv_xl_vlnlp0_v_i8m4(vint8m4_t vs, size_t vl);
vint16m4_t __riscv_xl_vlnlp0_v_i16m4(vint16m4_t vs, size_t vl);
vint32m4_t __riscv_xl_vlnlp0_v_i32m4(vint32m4_t vs, size_t vl);
vint8m4_t __riscv_xl_vlnlp1_v_i8m4(vint8m4_t vs, size_t vl);
vint16m4_t __riscv_xl_vlnlp1_v_i16m4(vint16m4_t vs, size_t vl);
vint32m4_t __riscv_xl_vlnlp1_v_i32m4(vint32m4_t vs, size_t vl);
_zvl128b 以下intrinsic可以使用
vint8m8_t __riscv_xl_vlnlp0_v_i8m8(vint8m8_t vs, size_t vl);
vint16m8_t __riscv_xl_vlnlp0_v_i16m8(vint16m8_t vs, size_t vl);
vint32m8_t __riscv_xl_vlnlp0_v_i32m8(vint32m8_t vs, size_t vl);
vint8m8_t __riscv_xl_vlnlp1_v_i8m8(vint8m8_t vs, size_t vl);
vint16m8_t __riscv_xl_vlnlp1_v_i16m8(vint16m8_t vs, size_t vl);
vint32m8_t __riscv_xl_vlnlp1_v_i32m8(vint32m8_t vs, size_t vl);
Tip
在使用上述intrinsic的时候,如果遇到以下这种 unrecognizable insn
错误:
vlnlp_m1.c: In function 'test_vlnlp0_v_i8m1':
vlnlp_m1.c:8:1: error: unrecognizable insn:
8 | }
| ^
(insn 7 4 11 2 (set (reg:RVVM1QI 134 [ <retval> ])
(if_then_else:RVVM1QI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg/v:SI 136 [ vl ])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(unspec:RVVM1QI [
(reg/v:RVVM1QI 135 [ vs ])
] UNSPEC_VLNLP0)
(unspec:RVVM1QI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "vlnlp_m1.c":7:10 -1
(nil))
during RTL pass: vregs
vlnlp_m1.c:8:1: internal compiler error: in extract_insn, at recog.cc:2812
0x7f8636076082 __libc_start_main
../csu/libc-start.c:308
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
可能就是vlen长度和intrinsic使用时的lmul没有对应导致的
vnle.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vnle 即可。
vnlm.vv/vs
sew = 8/16/32 full_preds
Tip
intrinsic 的名字 参考 vdsmul.vv/vs intrinsic 的名字,只需要将 vdsmul 替换为 vnlm 即可。
vfcvt_b2h.v
none_tu_preds
vint16mf4_t __riscv_xl_vfcvt_b2h_v_i16mf4(vint8mf8_t vs2, size_t vl);
vint16mf2_t __riscv_xl_vfcvt_b2h_v_i16mf2(vint8mf4_t vs2, size_t vl);
vint16m1_t __riscv_xl_vfcvt_b2h_v_i16m1(vint8mf2_t vs2, size_t vl);
vint16m2_t __riscv_xl_vfcvt_b2h_v_i16m2(vint8m1_t vs2, size_t vl);
vint16m4_t __riscv_xl_vfcvt_b2h_v_i16m4(vint8m2_t vs2, size_t vl);
vint16m8_t __riscv_xl_vfcvt_b2h_v_i16m8(vint8m4_t vs2, size_t vl);
vfcvt_b2w.v
none_tu_preds
vint32mf2_t __riscv_xl_vfcvt_b2w_v_i32mf2(vint8mf8_t vs2, size_t vl);
vint32m1_t __riscv_xl_vfcvt_b2w_v_i32m1(vint8mf4_t vs2, size_t vl);
vint32m2_t __riscv_xl_vfcvt_b2w_v_i32m2(vint8mf2_t vs2, size_t vl);
vint32m4_t __riscv_xl_vfcvt_b2w_v_i32m4(vint8m1_t vs2, size_t vl);
vint32m8_t __riscv_xl_vfcvt_b2w_v_i32m8(vint8m2_t vs2, size_t vl);
vfcvt_h2w.v
none_tu_preds
vint32mf2_t __riscv_xl_vfcvt_h2w_v_i32mf2(vint16mf4_t vs2, size_t vl);
vint32m1_t __riscv_xl_vfcvt_h2w_v_i32m1(vint16mf2_t vs2, size_t vl);
vint32m2_t __riscv_xl_vfcvt_h2w_v_i32m2(vint16m1_t vs2, size_t vl);
vint32m4_t __riscv_xl_vfcvt_h2w_v_i32m4(vint16m2_t vs2, size_t vl);
vint32m8_t __riscv_xl_vfcvt_h2w_v_i32m8(vint16m4_t vs2, size_t vl);
vfcvt_p2c.v
none_tu_preds
vint32mf2_t __riscv_xl_vfcvt_p2c_v_i32mf2(vint16mf4_t vs2, size_t vl);
vint32m1_t __riscv_xl_vfcvt_p2c_v_i32m1(vint16mf2_t vs2, size_t vl);
vint32m2_t __riscv_xl_vfcvt_p2c_v_i32m2(vint16m1_t vs2, size_t vl);
vint32m4_t __riscv_xl_vfcvt_p2c_v_i32m4(vint16m2_t vs2, size_t vl);
vint32m8_t __riscv_xl_vfcvt_p2c_v_i32m8(vint16m4_t vs2, size_t vl);
vfcvt.h2b.v
none_tu_preds
vint8mf8_t __riscv_xl_vfcvt_h2b_v_i8mf8(vint16mf4_t vs2, size_t vl);
vint8mf4_t __riscv_xl_vfcvt_h2b_v_i8mf4(vint16mf2_t vs2, size_t vl);
vint8mf2_t __riscv_xl_vfcvt_h2b_v_i8mf2(vint16m1_t vs2, size_t vl);
vint8m1_t __riscv_xl_vfcvt_h2b_v_i8m1(vint16m2_t vs2, size_t vl);
vint8m2_t __riscv_xl_vfcvt_h2b_v_i8m2(vint16m4_t vs2, size_t vl);
vint8m4_t __riscv_xl_vfcvt_h2b_v_i8m4(vint16m8_t vs2, size_t vl);
vfcvt.w2b.v
none_tu_preds
vint8mf8_t __riscv_xl_vfcvt_w2b_v_i8mf8(vint32mf2_t vs2, size_t vl);
vint8mf4_t __riscv_xl_vfcvt_w2b_v_i8mf4(vint32m1_t vs2, size_t vl);
vint8mf2_t __riscv_xl_vfcvt_w2b_v_i8mf2(vint32m2_t vs2, size_t vl);
vint8m1_t __riscv_xl_vfcvt_w2b_v_i8m1(vint32m4_t vs2, size_t vl);
vint8m2_t __riscv_xl_vfcvt_w2b_v_i8m2(vint32m8_t vs2, size_t vl);
vfcvt.w2h.v
none_tu_preds
vint16mf4_t __riscv_xl_vfcvt_w2h_v_i16mf4(vint32mf2_t vs2, size_t vl);
vint16mf2_t __riscv_xl_vfcvt_w2h_v_i16mf2(vint32m1_t vs2, size_t vl);
vint16m1_t __riscv_xl_vfcvt_w2h_v_i16m1(vint32m2_t vs2, size_t vl);
vint16m2_t __riscv_xl_vfcvt_w2h_v_i16m2(vint32m4_t vs2, size_t vl);
vint16m4_t __riscv_xl_vfcvt_w2h_v_i16m4(vint32m8_t vs2, size_t vl);
vfcvt.c2p.v
none_tu_preds
vint16mf4_t __riscv_xl_vfcvt_c2p_v_i16mf4(vint32mf2_t vs2, size_t vl);
vint16mf2_t __riscv_xl_vfcvt_c2p_v_i16mf2(vint32m1_t vs2, size_t vl);
vint16m1_t __riscv_xl_vfcvt_c2p_v_i16m1(vint32m2_t vs2, size_t vl);
vint16m2_t __riscv_xl_vfcvt_c2p_v_i16m2(vint32m4_t vs2, size_t vl);
vint16m4_t __riscv_xl_vfcvt_c2p_v_i16m4(vint32m8_t vs2, size_t vl);