Using NMSIS-NN
Here we will describe how to run the nmsis nn examples in Nuclei QEMU.
Preparation
Nuclei SDK,
master
branch(>= 0.7.0 release)Nuclei RISC-V GNU Toolchain 2024.06
Nuclei QEMU 2024.06
CMake >= 3.14
Python 3 and pip package requirements located in
<nuclei-sdk>/tools/scripts/requirements.txt
<NMSIS>/NMSIS/Scripts/requirements.txt
Tool Setup
Export PATH correctly for
qemu
andriscv64-unknown-elf-gcc
export PATH=/path/to/qemu/bin:/path/to/gcc/bin:$PATH
Build NMSIS NN Library
Download or clone NMSIS source code into NMSIS directory.
cd to NMSIS/NMSIS/ directory, if you want to add or remove more arches to be build, you can modify
Scripts/Build/nmsis_nn.json
fileBuild NMSIS NN library and strip debug information using
make gen_nn_lib
The nn library will be generated into
./Library/NN/GCC
folderThe nn libraries will be look like this:
$ ls -lhgG Library/NN/GCC/
total 64M
-rw-rw-r-- 1 619K Oct 20 11:55 libnmsis_nn_rv32imac.a
-rw-rw-r-- 1 983K Oct 20 11:55 libnmsis_nn_rv32imac_xxldsp.a
-rw-rw-r-- 1 986K Oct 20 11:55 libnmsis_nn_rv32imac_xxldspn1x.a
-rw-rw-r-- 1 989K Oct 20 11:55 libnmsis_nn_rv32imac_xxldspn2x.a
-rw-rw-r-- 1 992K Oct 20 11:55 libnmsis_nn_rv32imac_xxldspn3x.a
-rw-rw-r-- 1 607K Oct 20 11:55 libnmsis_nn_rv32imac_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 961K Oct 20 11:55 libnmsis_nn_rv32imac_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 964K Oct 20 11:55 libnmsis_nn_rv32imac_zba_zbb_zbc_zbs_xxldspn1x.a
-rw-rw-r-- 1 966K Oct 20 11:55 libnmsis_nn_rv32imac_zba_zbb_zbc_zbs_xxldspn2x.a
-rw-rw-r-- 1 969K Oct 20 11:55 libnmsis_nn_rv32imac_zba_zbb_zbc_zbs_xxldspn3x.a
-rw-rw-r-- 1 620K Oct 20 11:55 libnmsis_nn_rv32imafc.a
-rw-rw-r-- 1 984K Oct 20 11:55 libnmsis_nn_rv32imafc_xxldsp.a
-rw-rw-r-- 1 987K Oct 20 11:55 libnmsis_nn_rv32imafc_xxldspn1x.a
-rw-rw-r-- 1 990K Oct 20 11:55 libnmsis_nn_rv32imafc_xxldspn2x.a
-rw-rw-r-- 1 993K Oct 20 11:55 libnmsis_nn_rv32imafc_xxldspn3x.a
-rw-rw-r-- 1 608K Oct 20 11:55 libnmsis_nn_rv32imafc_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 962K Oct 20 11:55 libnmsis_nn_rv32imafc_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 965K Oct 20 11:55 libnmsis_nn_rv32imafc_zba_zbb_zbc_zbs_xxldspn1x.a
-rw-rw-r-- 1 968K Oct 20 11:55 libnmsis_nn_rv32imafc_zba_zbb_zbc_zbs_xxldspn2x.a
-rw-rw-r-- 1 970K Oct 20 11:55 libnmsis_nn_rv32imafc_zba_zbb_zbc_zbs_xxldspn3x.a
-rw-rw-r-- 1 716K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f.a
-rw-rw-r-- 1 848K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_xxldsp.a
-rw-rw-r-- 1 851K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_xxldspn1x.a
-rw-rw-r-- 1 854K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_xxldspn2x.a
-rw-rw-r-- 1 856K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_xxldspn3x.a
-rw-rw-r-- 1 695K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 825K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 828K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_zba_zbb_zbc_zbs_xxldspn1x.a
-rw-rw-r-- 1 831K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_zba_zbb_zbc_zbs_xxldspn2x.a
-rw-rw-r-- 1 834K Oct 20 11:55 libnmsis_nn_rv32imafc_zve32f_zba_zbb_zbc_zbs_xxldspn3x.a
-rw-rw-r-- 1 621K Oct 20 11:55 libnmsis_nn_rv32imafdc.a
-rw-rw-r-- 1 985K Oct 20 11:55 libnmsis_nn_rv32imafdc_xxldsp.a
-rw-rw-r-- 1 988K Oct 20 11:55 libnmsis_nn_rv32imafdc_xxldspn1x.a
-rw-rw-r-- 1 991K Oct 20 11:55 libnmsis_nn_rv32imafdc_xxldspn2x.a
-rw-rw-r-- 1 994K Oct 20 11:55 libnmsis_nn_rv32imafdc_xxldspn3x.a
-rw-rw-r-- 1 609K Oct 20 11:55 libnmsis_nn_rv32imafdc_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 963K Oct 20 11:55 libnmsis_nn_rv32imafdc_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 966K Oct 20 11:55 libnmsis_nn_rv32imafdc_zba_zbb_zbc_zbs_xxldspn1x.a
-rw-rw-r-- 1 969K Oct 20 11:55 libnmsis_nn_rv32imafdc_zba_zbb_zbc_zbs_xxldspn2x.a
-rw-rw-r-- 1 972K Oct 20 11:55 libnmsis_nn_rv32imafdc_zba_zbb_zbc_zbs_xxldspn3x.a
-rw-rw-r-- 1 718K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f.a
-rw-rw-r-- 1 847K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_xxldsp.a
-rw-rw-r-- 1 850K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_xxldspn1x.a
-rw-rw-r-- 1 852K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_xxldspn2x.a
-rw-rw-r-- 1 855K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_xxldspn3x.a
-rw-rw-r-- 1 697K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 824K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 827K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_zba_zbb_zbc_zbs_xxldspn1x.a
-rw-rw-r-- 1 830K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_zba_zbb_zbc_zbs_xxldspn2x.a
-rw-rw-r-- 1 832K Oct 20 11:55 libnmsis_nn_rv32imafdc_zve32f_zba_zbb_zbc_zbs_xxldspn3x.a
-rw-rw-r-- 1 852K Oct 20 11:55 libnmsis_nn_rv64imac.a
-rw-rw-r-- 1 1.4M Oct 20 11:55 libnmsis_nn_rv64imac_xxldsp.a
-rw-rw-r-- 1 826K Oct 20 11:55 libnmsis_nn_rv64imac_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 1.4M Oct 20 11:55 libnmsis_nn_rv64imac_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 854K Oct 20 11:55 libnmsis_nn_rv64imafc.a
-rw-rw-r-- 1 1.4M Oct 20 11:55 libnmsis_nn_rv64imafc_xxldsp.a
-rw-rw-r-- 1 827K Oct 20 11:55 libnmsis_nn_rv64imafc_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 1.4M Oct 20 11:55 libnmsis_nn_rv64imafc_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 965K Oct 20 11:55 libnmsis_nn_rv64imafc_zve64f.a
-rw-rw-r-- 1 1.2M Oct 20 11:55 libnmsis_nn_rv64imafc_zve64f_xxldsp.a
-rw-rw-r-- 1 932K Oct 20 11:55 libnmsis_nn_rv64imafc_zve64f_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 1.2M Oct 20 11:55 libnmsis_nn_rv64imafc_zve64f_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 855K Oct 20 11:55 libnmsis_nn_rv64imafdc.a
-rw-rw-r-- 1 972K Oct 20 11:55 libnmsis_nn_rv64imafdcv.a
-rw-rw-r-- 1 1.2M Oct 20 11:55 libnmsis_nn_rv64imafdcv_xxldsp.a
-rw-rw-r-- 1 939K Oct 20 11:55 libnmsis_nn_rv64imafdcv_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 1.2M Oct 20 11:55 libnmsis_nn_rv64imafdcv_zba_zbb_zbc_zbs_xxldsp.a
-rw-rw-r-- 1 1.4M Oct 20 11:55 libnmsis_nn_rv64imafdc_xxldsp.a
-rw-rw-r-- 1 828K Oct 20 11:55 libnmsis_nn_rv64imafdc_zba_zbb_zbc_zbs.a
-rw-rw-r-- 1 1.4M Oct 20 11:55 libnmsis_nn_rv64imafdc_zba_zbb_zbc_zbs_xxldsp.a
library name with extra
_xxldsp
_xxldspn1x
_xxldspn2x
_xxldspn3x
is built with RISC-V DSP enabledThe examples are as follows:
libnmsis_nn_rv32imac.a
: Build for RISCV_ARCH=rv32imac without DSPlibnmsis_nn_rv32imac_xxldsp.a
: Build for RISCV_ARCH=rv32imac_xxldsp with Nuclei DSP enabledlibnmsis_nn_rv32imac_xxldspn1x.a
: Build for RISCV_ARCH=rv32imac_xxldspn1x with Nuclei N1 DSP extension enabledlibnmsis_nn_rv32imac_xxldspn2x.a
: Build for RISCV_ARCH=rv32imac_xxldspn2x with Nuclei N1/N2 DSP extension enabledlibnmsis_nn_rv32imac_xxldspn3x.a
: Build for RISCV_ARCH=rv32imac_xxldspn3x with Nuclei N1/N2/N3 DSP extension enabled
library name with extra
_zve32f
_zve64f
v
is built with RISC-V Vector enabledThe examples are as follows:
libnmsis_nn_rv32imafc_zve32f.a
: Build for RISCV_ARCH=rv32imafc_zve32f with Vector enabledlibnmsis_nn_rv32imafdc_zve32f.a
: Build for RISCV_ARCH=rv32imafdc_zve32f with Vector enabledlibnmsis_nn_rv64imafc_zve64f.a
: Build for RISCV_ARCH=rv64imafc_zve64f with Vector enabledlibnmsis_nn_rv64imafdcv.a
: Build for RISCV_ARCH=rv64imafdcv with Vector enabled
Note
This NMSIS 1.2.0 is a big change version, will no longer support old gcc 10 verison, and it now only support Nuclei Toolchain 2023.10. The
--march
option has changed a lot, such as:b
extension changed to_zba_zbb_zbc_zbs
extension,p
extension changed to_xxldsp
,_xxldspn1x
,_xxldspn2x
,_xxldspn3x
extensions which means stardard DSP extension, Nuclei N1, N2, N3 DSP extensionsv
extension changed tov
,_zve32f
,_zve64f
extensions
The name of libraries has changed with
-march
, for examples, the library namedlibnmsis_nn_rv32imacb.a
is now namedlibnmsis_nn_rv32imac_zba_zbb_zbc_zbs.a
sinceb
extension changed to_zba_zbb_zbc_zbs
_xxldspn1x
_xxldspn2x
_xxldspn3x
only valid for RISC-V 32bit processor._xxldsp
is valid for RISC-V 32/64 bit processorYou can also directly build both DSP and NN library using
make gen
You can strip the generated DSP and NN library using
make strip
DSP and Vector extension can be combined, such as
_xxldsp
,v
andv_xxldsp
, should notice the extension orderVector extension currently enabled for RISC-V 32/64 bit processor
NN library has no float16 data type, so here have no need to build float16 library
How to run
Set environment variables
NUCLEI_SDK_ROOT
andNUCLEI_SDK_NMSIS
, and set Nuclei SDK SoC to evalsoc, and change ilm/dlm size from 64K to 512K.
export NUCLEI_SDK_ROOT=/path/to/nuclei_sdk
export NUCLEI_SDK_NMSIS=/path/to/NMSIS/NMSIS
# Setup SDK development environment
cd $NUCLEI_SDK_ROOT
source setup.sh
cd -
# !!!!Take Care!!!!
# change this link script will make compiled example can only run on bitstream which has 512K ILM/DLM
# For Nuclei SDK < 0.7.0
sed -i "s/64K/512K/g" $NUCLEI_SDK_ROOT/SoC/evalsoc/Board/nuclei_fpga_eval/Source/GCC/gcc_evalsoc_ilm.ld
# For Nuclei SDK >= 0.7.0
sed -i 's/\([ID]LM_MEMORY_SIZE\).*/\1 = 0x80000;/' $NUCLEI_SDK_ROOT/SoC/evalsoc/Board/nuclei_fpga_eval/Source/GCC/evalsoc.memory
export SOC=evalsoc
Due to many of the examples could not be placed in 64K ILM and 64K DLM, and we are running using qemu, the ILM/DLM size in it are set to be 32MB, so we can change ilm/dlm to 512K/512K in the link script
$NUCLEI_SDK_ROOT/SoC/evalsoc/Board/nuclei_fpga_eval/Source/GCC/gcc_evalsoc_ilm.ld
or$NUCLEI_SDK_ROOT/SoC/evalsoc/Board/nuclei_fpga_eval/Source/GCC/evalsoc.memory
--- a/SoC/evalsoc/Board/nuclei_fpga_eval/Source/GCC/gcc_evalsoc_ilm.ld
+++ b/SoC/evalsoc/Board/nuclei_fpga_eval/Source/GCC/gcc_evalsoc_ilm.ld
@@ -30,8 +30,8 @@ __HEAP_SIZE = 2K;
MEMORY
{
- ilm (rxa!w) : ORIGIN = 0x80000000, LENGTH = 64K
- ram (wxa!r) : ORIGIN = 0x90000000, LENGTH = 64K
+ ilm (rxa!w) : ORIGIN = 0x80000000, LENGTH = 512K
+ ram (wxa!r) : ORIGIN = 0x90000000, LENGTH = 512K
}
Let us take
cifar10
for example:
cd $NUCLEI_SDK_NMSIS/NN/Examples/RISCV/cifar10/
Run with RISCV DSP enabled and Vector enabled NMSIS-NN library for CORE
nx900fd
# Clean project
make ARCH_EXT=v_xxldsp CORE=nx900fd clean
# Build project, enable ``v`` and ``xxldsp`` optimize
make ARCH_EXT=v_xxldsp CORE=nx900fd all
# Run application using qemu
make ARCH_EXT=v_xxldsp CORE=nx900fd run_qemu
Run with RISCV DSP disabled and Vector disabled NMSIS-NN library for CORE
nx900fd
make ARCH_EXT= CORE=nx900fd clean
make ARCH_EXT= CORE=nx900fd all
make ARCH_EXT= CORE=nx900fd run_qemu
Note
You can easily run this example in your hardware, if you have enough memory to run it, just modify the
SOC
to the one your are using in step 1.