NMSIS Bench and Test Helper Functions

group NMSIS_Core_Bench_Helpers

Functions used for benchmarking and for test suites.

NMSIS provides benchmark and test helper functions for measuring cycle cost and for asserting test case pass/fail.

To measure the CPU cycle cost of a piece of code, use the BENCH_xxx macros defined in this group.

In each C source file, include nmsis_bench.h and place BENCH_DECLARE_VAR(); before any other BENCH_xxx macro. Call BENCH_INIT(); exactly once in your source code before benchmarking starts, then place BENCH_START(proc_name); and BENCH_END(proc_name); immediately before and after the code you want to measure. See <nuclei-sdk>/application/baremetal/demo_dsp for a usage example.
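The placement rules above can be sketched as follows. This is a minimal illustration, not code taken from the SDK: dot_product and bench_dot_product_demo are hypothetical names, and the stand-in macros in the fallback branch exist only so the file also compiles on a host without nmsis_bench.h. They do not reproduce what the real macros do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Use the real header when the toolchain can find it. */
#if defined(__has_include)
#  if __has_include("nmsis_bench.h")
#    include "nmsis_bench.h"
#    define HAVE_NMSIS_BENCH 1
#  endif
#endif

#ifndef HAVE_NMSIS_BENCH
/* Hypothetical host-side stand-ins: they keep the call sites compilable
 * without the SDK and do NOT measure anything. */
#define BENCH_DECLARE_VAR() extern int bench_standin_placeholder
#define BENCH_INIT()        ((void)0)
#define BENCH_START(proc)   ((void)0)
#define BENCH_END(proc)     printf("stand-in: no cycle count for %s\n", #proc)
#endif

/* Must appear before any other BENCH_xxx macro in this file. */
BENCH_DECLARE_VAR();

/* Illustrative workload; not part of NMSIS. */
static int32_t dot_product(const int32_t *a, const int32_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Measures one call of dot_product() and returns its result. */
int32_t bench_dot_product_demo(void)
{
    static const int32_t a[4] = {1, 2, 3, 4};
    static const int32_t b[4] = {5, 6, 7, 8};
    int32_t result;

    BENCH_INIT();               /* once per program, before measuring */
    BENCH_START(dot_product);   /* immediately before the measured code */
    result = dot_product(a, b, 4);
    BENCH_END(dot_product);     /* immediately after the measured code */

    assert(result == 70);       /* 1*5 + 2*6 + 3*7 + 4*8 */
    return result;
}
```

Keeping the measured region down to the single call avoids charging setup code to the reported cycle count.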

To disable the benchmark calculation, place #define DISABLE_NMSIS_BENCH before the #include of nmsis_bench.h.

In your C test source code, call NMSIS_TEST_PASS(); or NMSIS_TEST_FAIL(); to mark the test as passed or failed.

Defines

READ_CYCLE __get_rv_cycle

When XLEN=32, reading the full 64-bit CYCLE register incurs additional overhead.

BENCH_XLEN_MODE skips reading the upper 32 bits, which reduces the extra cycle cost and allows more accurate measurement of small cycle counts.

NOTE: BENCH_XLEN_MODE is only applicable when the total cycle count stays below 2^32. By default, READ_CYCLE reads the whole 64-bit value of the MCYCLE register.
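The 2^32 limit follows from unsigned wraparound, which a small host-side sketch can show. The function names below are hypothetical and nothing here reads a CSR: the 64-bit arguments simply play the role of full counter samples.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles computed from two samples of only the low 32 bits of a
 * cycle counter. Unsigned subtraction is modulo 2^32, so a single wrap of
 * the low word between the two samples does not matter. */
uint32_t elapsed_low32(uint32_t start_lo, uint32_t end_lo)
{
    return (uint32_t)(end_lo - start_lo);
}

/* Compares the 32-bit result against full 64-bit samples. Returns 1 when
 * the low-word measurement equals the true elapsed count, 0 otherwise. */
int low32_is_exact(uint64_t start, uint64_t end)
{
    assert(end >= start);
    uint64_t truth = end - start;
    uint32_t measured = elapsed_low32((uint32_t)start, (uint32_t)end);
    return truth == (uint64_t)measured;
}
```

For example, low32_is_exact(0x1FFFFFFF0, 0x200000010) returns 1 even though the low word wraps between the two samples, while an interval of 2^32 + 5 cycles is reported as 5 and the function returns 0.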

BENCH_DECLARE_VAR()

Declare the variables required for benchmarking. Must be placed above all BENCH_xxx macros in each C source file that uses them.

BENCH_INIT()

Initialize the benchmark environment. Must be called before any other BENCH_xxx macro.

BENCH_RESET(proc) _bc_sumcyc = 0; _bc_usecyc = 0; _bc_lpcnt = 0; _bc_ercd = 0;

Reset the benchmark sum cycle, use cycle, loop count, and error code for proc.

BENCH_START(proc)

Start the benchmark for proc: record the start cycle and reset the error code.

BENCH_SAMPLE(proc)

Take a benchmark sample for proc: record the cycles elapsed from start to this sample and add them to the sum cycle.

BENCH_END(proc)

Mark the end of the benchmark for proc, calculate the cycles used, and print the result.

BENCH_STOP(proc) printf("CSV, %s, %lu\n", #proc, (unsigned long)_bc_sumcyc);

Mark the stop of a benchmark that follows the sequence start -> sample -> sample -> stop, and print the sum cycle of proc.
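The start -> sample -> stop bookkeeping can be modelled on a host with made-up counter values. The sketch below is not the header's implementation: the struct, the model_xxx functions, and the sttcyc field are hypothetical, and it assumes that every sample is paired with a preceding start and that the loop count goes up by one per sample.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Host-side model; field names echo the _bc_xxx variables in this section. */
typedef struct {
    uint64_t sttcyc;   /* counter value recorded at start (assumed name) */
    uint64_t usecyc;   /* cycles of the most recent start -> sample span */
    uint64_t sumcyc;   /* running total over all samples */
    uint64_t lpcnt;    /* number of samples (assumed to count samples) */
    int      ercd;     /* error flag */
} bench_model_t;

void model_reset(bench_model_t *b)
{
    b->sumcyc = 0; b->usecyc = 0; b->lpcnt = 0; b->ercd = 0;
}

void model_start(bench_model_t *b, uint64_t now)
{
    b->ercd = 0;
    b->sttcyc = now;
}

void model_sample(bench_model_t *b, uint64_t now)
{
    b->usecyc = now - b->sttcyc;
    b->sumcyc += b->usecyc;
    b->lpcnt += 1;
}

/* Prints the total in the same shape as the documented BENCH_STOP output. */
void model_stop(const bench_model_t *b, const char *proc)
{
    printf("CSV, %s, %lu\n", proc, (unsigned long)b->sumcyc);
}

/* Two measured spans of 30 and 50 cycles; returns the accumulated sum. */
uint64_t model_demo(void)
{
    bench_model_t b = {0};
    model_reset(&b);
    model_start(&b, 100);  model_sample(&b, 130);   /* 30 cycles */
    model_start(&b, 200);  model_sample(&b, 250);   /* 50 cycles */
    model_stop(&b, "demo");
    assert(b.lpcnt == 2 && b.usecyc == 50);
    return b.sumcyc;
}
```

Under these assumptions the use cycle always holds the latest span, while the sum cycle and loop count together give the average cost per sample.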

BENCH_STAT(proc) printf("STAT, %s, %lu, %lu\n", #proc, (unsigned long)_bc_lpcnt, (unsigned long)_bc_sumcyc);

Show statistics of benchmark, format: STAT, proc, loopcnt, sumcyc.
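Because the CSV and STAT lines have a fixed shape, a host-side script or tool can pick them out of a captured log. The following is a minimal sketch of such a parser, based only on the printf formats shown above. bench_record_t and parse_bench_line are hypothetical names, and proc names containing commas are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char          proc[64];
    unsigned long lpcnt;    /* stays 0 for CSV lines, which carry no loop count */
    unsigned long sumcyc;
} bench_record_t;

/* Returns 2 for a STAT line, 1 for a CSV line, 0 for anything else.
 * *out is written only when a line is recognised. */
int parse_bench_line(const char *line, bench_record_t *out)
{
    bench_record_t r;

    assert(line != NULL && out != NULL);

    memset(&r, 0, sizeof r);
    if (sscanf(line, "STAT, %63[^,], %lu, %lu",
               r.proc, &r.lpcnt, &r.sumcyc) == 3) {
        *out = r;
        return 2;
    }

    memset(&r, 0, sizeof r);
    if (sscanf(line, "CSV, %63[^,], %lu", r.proc, &r.sumcyc) == 2) {
        *out = r;
        return 1;
    }
    return 0;
}
```

Lines that start with STATHPM do not match either pattern and are reported as unrecognised, so HPM output would need its own format string.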

BENCH_GET_USECYC() (_bc_usecyc)

Get benchmark use cycle.

BENCH_GET_SUMCYC() (_bc_sumcyc)

Get benchmark sum cycle.

BENCH_GET_LPCNT() (_bc_lpcnt)

Get benchmark loop count.

BENCH_ERROR(proc) _bc_ercd = 1;

Mark the benchmark for proc as errored.

BENCH_STATUS(proc)

Show the status of the benchmark.

EVENT_SEL_INSTRUCTION_COMMIT 0
EVENT_SEL_MEMORY_ACCESS 1
EVENT_SEL_TYPE_0 0
EVENT_SEL_TYPE_1 1
EVENT_SEL_TYPE_2 2
EVENT_SEL_TYPE_3 3
EVENT_INSTRUCTION_COMMIT_CYCLE_COUNT 1
EVENT_INSTRUCTION_COMMIT_RETIRED_COUNT 2
EVENT_INSTRUCTION_COMMIT_INTEGER_LOAD 3
EVENT_INSTRUCTION_COMMIT_INTEGER_STORE 4
EVENT_INSTRUCTION_COMMIT_ATOMIC_MEMORY_OPERATION 5
EVENT_INSTRUCTION_COMMIT_SYSTEM 6
EVENT_INSTRUCTION_COMMIT_INTEGER_COMPUTATIONAL 7
EVENT_INSTRUCTION_COMMIT_CONDITIONAL_BRANCH 8
EVENT_INSTRUCTION_COMMIT_TAKEN_CONDITIONAL_BRANCH 9
EVENT_INSTRUCTION_COMMIT_JAL 10
EVENT_INSTRUCTION_COMMIT_JALR 11
EVENT_INSTRUCTION_COMMIT_RETURN 12
EVENT_INSTRUCTION_COMMIT_CONTROL_TRANSFER 13
EVENT_INSTRUCTION_COMMIT_FENCE_INSTRUCTION 14
EVENT_INSTRUCTION_COMMIT_INTEGER_MULTIPLICATION 15
EVENT_INSTRUCTION_COMMIT_INTEGER_DIVISION_REMAINDER 16
EVENT_INSTRUCTION_COMMIT_FLOATING_POINT_LOAD 17
EVENT_INSTRUCTION_COMMIT_FLOATING_POINT_STORE 18
EVENT_INSTRUCTION_COMMIT_FLOATING_POINT_ADDITION_SUBTRACTION 19
EVENT_INSTRUCTION_COMMIT_FLOATING_POINT_MULTIPLICATION 20
EVENT_INSTRUCTION_COMMIT_FLOATING_POINT_FUSED_MULTIPLY_ADD_SUB 21
EVENT_INSTRUCTION_COMMIT_FLOATING_POINT_DIVISION_OR_SQUARE_ROOT 22
EVENT_INSTRUCTION_COMMIT_OTHER_FLOATING_POINT_INSTRUCTION 23
EVENT_INSTRUCTION_COMMIT_CONDITIONAL_BRANCH_PREDICTION_FAIL 24
EVENT_INSTRUCTION_COMMIT_JALR_PREDICTION_FAIL 25
EVENT_INSTRUCTION_COMMIT_POP_PREDICTION_FAIL 26
EVENT_INSTRUCTION_COMMIT_FENCEI_INSTRUCTION 27
EVENT_INSTRUCTION_COMMIT_SFENCE_INSTRUCTION 28
EVENT_INSTRUCTION_COMMIT_ECALL_INSTRUCTION 29
EVENT_INSTRUCTION_COMMIT_EXCEPTION_INSTRUCTION 30
EVENT_INSTRUCTION_COMMIT_INTERRUPT_INSTRUCTION 31
EVENT_MEMORY_ACCESS_ICACHE_MISS 1
EVENT_MEMORY_ACCESS_DCACHE_MISS 2
EVENT_MEMORY_ACCESS_ITLB_MISS 3
EVENT_MEMORY_ACCESS_DTLB_MISS 4
EVENT_MEMORY_ACCESS_MAIN_DTLB_MISS 5
EVENT_MEMORY_ACCESS_MAIN_TLB_MISS 5
EVENT_MEMORY_ACCESS_L2_CACHE_ACCESS 8
EVENT_MEMORY_ACCESS_L2_CACHE_MISS 9
EVENT_MEMORY_ACCESS_MEMORY_BUS_REQUEST 10
EVENT_MEMORY_ACCESS_IFU_STALL_CYCLE 11
EVENT_MEMORY_ACCESS_EXU_STALL_CYCLE 12
EVENT_MEMORY_ACCESS_TIMER 13
EVENT_TYPE_0_CYCLE_COUNT 1
EVENT_TYPE_0_RETIRED_COUNT 2
EVENT_TYPE_0_INTEGER_LOAD 3
EVENT_TYPE_0_INTEGER_STORE 4
EVENT_TYPE_0_ATOMIC_MEMORY_OPERATION 5
EVENT_TYPE_0_SYSTEM 6
EVENT_TYPE_0_INTEGER_COMPUTATIONAL 7
EVENT_TYPE_0_CONDITIONAL_BRANCH 8
EVENT_TYPE_0_TAKEN_CONDITIONAL_BRANCH 9
EVENT_TYPE_0_JAL 10
EVENT_TYPE_0_JALR 11
EVENT_TYPE_0_RETURN 12
EVENT_TYPE_0_CONTROL_TRANSFER 13
EVENT_TYPE_0_FENCE_INSTRUCTION 14
EVENT_TYPE_0_INTEGER_MULTIPLICATION 15
EVENT_TYPE_0_INTEGER_DIVISION_REMAINDER 16
EVENT_TYPE_0_FLOATING_POINT_LOAD 17
EVENT_TYPE_0_FLOATING_POINT_STORE 18
EVENT_TYPE_0_FLOATING_POINT_ADDITION_SUBTRACTION 19
EVENT_TYPE_0_FLOATING_POINT_MULTIPLICATION 20
EVENT_TYPE_0_FLOATING_POINT_FUSED_MULTIPLY_ADD_SUB 21
EVENT_TYPE_0_FLOATING_POINT_DIVISION_OR_SQUARE_ROOT 22
EVENT_TYPE_0_OTHER_FLOATING_POINT_INSTRUCTION 23
EVENT_TYPE_0_CONDITIONAL_BRANCH_PREDICTION_FAIL 24
EVENT_TYPE_0_JALR_PREDICTION_FAIL 25
EVENT_TYPE_0_POP_PREDICTION_FAIL 26
EVENT_TYPE_0_FENCEI_INSTRUCTION 27
EVENT_TYPE_0_SFENCE_INSTRUCTION 28
EVENT_TYPE_0_ECALL_INSTRUCTION 29
EVENT_TYPE_0_EXCEPTION_INSTRUCTION 30
EVENT_TYPE_0_INTERRUPT_INSTRUCTION 31
EVENT_TYPE_1_ICACHE_READ_MISS 1
EVENT_TYPE_1_DCACHE_RW_MISS 2
EVENT_TYPE_1_ITLB_READ_MISS 3
EVENT_TYPE_1_DTLB_RW_MISS 4
EVENT_TYPE_1_MAIN_TLB_MISS 5
EVENT_TYPE_1_L2_CACHE_ACCESS 8
EVENT_TYPE_1_L2_CACHE_MISS 9
EVENT_TYPE_1_MEMORY_BUS_REQUEST 10
EVENT_TYPE_1_IFU_STALL_CYCLE 11
EVENT_TYPE_1_EXU_STALL_CYCLE 12
EVENT_TYPE_1_TIMER 13
EVENT_TYPE_2_BRANCH_INSTRUCTION_COMMIT 2
EVENT_TYPE_2_BRANCH_PREDICT_FAIL_COMMIT 3
EVENT_TYPE_3_DCACHE_READ 0
EVENT_TYPE_3_DCACHE_READ_MISS 1
EVENT_TYPE_3_DCACHE_WRITE 2
EVENT_TYPE_3_DCACHE_WRITE_MISS 3
EVENT_TYPE_3_DCACHE_PREFETCH 4
EVENT_TYPE_3_DCACHE_PREFETCH_MISS 5
EVENT_TYPE_3_ICACHE_READ 6
EVENT_TYPE_3_ICACHE_PREFETCH 8
EVENT_TYPE_3_ICACHE_PREFETCH_MISS 9
EVENT_TYPE_3_L2_CACHE_READ 10
EVENT_TYPE_3_L2_CACHE_READ_MISS 11
EVENT_TYPE_3_L2_CACHE_WRITE 12
EVENT_TYPE_3_L2_CACHE_WRITE_MISS 13
EVENT_TYPE_3_L2_CACHE_PREFETCH_HIT 14
EVENT_TYPE_3_L2_CACHE_PREFETCH_MISS 15
EVENT_TYPE_3_DTLB_READ 16
EVENT_TYPE_3_DTLB_READ_MISS 17
EVENT_TYPE_3_DTLB_WRITE 18
EVENT_TYPE_3_DTLB_WRITE_MISS 19
EVENT_TYPE_3_ITLB_READ 20
EVENT_TYPE_3_BTB_READ 22
EVENT_TYPE_3_BTB_READ_MISS 23
EVENT_TYPE_3_BTB_WRITE 24
EVENT_TYPE_3_BTB_WRITE_MISS 25
MSU_EVENT_ENABLE 0x0F
MEVENT_EN 0x08
SEVENT_EN 0x02
UEVENT_EN 0x01
READ_HPM_COUNTER __get_hpm_counter
HPM_DECLARE_VAR(idx)

Declare the variables required for benchmarking with high performance monitor counter idx. Must be placed above all HPM_xxx macros in each C source file that uses them.

HPM_SEL_ENABLE(ena) (ena << 28)
HPM_SEL_EVENT(sel, idx) ((sel) | (idx << 4))
HPM_EVENT(sel, idx, ena) (HPM_SEL_ENABLE(ena) | HPM_SEL_EVENT(sel, idx))

Construct an event value to be set (sel -> event_sel, idx -> event_idx, ena -> m/s/u_enable).
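As a worked example of the bit layout, the sketch below packs a D-cache miss event that is enabled for M-mode only. The constant values are copied from the Defines list in this section. The macros are rewritten with parenthesised unsigned arguments, which is an assumption of this sketch so that shifting into bit 31 is well defined on a host compiler. The field masks are inferred from the shift amounts, and make_dcache_miss_event is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the Defines list in this section. */
#define EVENT_SEL_MEMORY_ACCESS          1
#define EVENT_MEMORY_ACCESS_DCACHE_MISS  2
#define MEVENT_EN                        0x08

/* Same layout as the documented macros, with unsigned casts added. */
#define HPM_SEL_ENABLE(ena)       ((uint32_t)(ena) << 28)
#define HPM_SEL_EVENT(sel, idx)   ((uint32_t)(sel) | ((uint32_t)(idx) << 4))
#define HPM_EVENT(sel, idx, ena)  (HPM_SEL_ENABLE(ena) | HPM_SEL_EVENT(sel, idx))

/* Builds the event value and checks each field by unpacking it again. */
uint32_t make_dcache_miss_event(void)
{
    uint32_t ev = HPM_EVENT(EVENT_SEL_MEMORY_ACCESS,
                            EVENT_MEMORY_ACCESS_DCACHE_MISS,
                            MEVENT_EN);

    assert((ev & 0xFu) == EVENT_SEL_MEMORY_ACCESS);                       /* bits 0..3   */
    assert(((ev >> 4) & 0xFFFFFFu) == EVENT_MEMORY_ACCESS_DCACHE_MISS);   /* bits 4..27  */
    assert((ev >> 28) == MEVENT_EN);                                      /* bits 28..31 */

    printf("event = 0x%08lx\n", (unsigned long)ev);
    return ev;
}
```

The resulting value is 0x80000021: enable bits 0x8 in the top nibble, event index 2 starting at bit 4, and event selector 1 in the low nibble.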

HPM_INIT()

Initialize the high performance monitor environment. Must be called before any other HPM_xxx macro.

HPM_RESET(idx, proc, event) __hpm_sumcyc##idx = 0; __hpm_lpcnt##idx = 0;

Reset the high performance benchmark for proc using the counter whose index is idx.

HPM_START(idx, proc, event)

Start the high performance benchmark for proc and record the starting HPM counter value.

HPM_SAMPLE(idx, proc, event)

Take a high performance benchmark sample for proc and add it to the sum counter.

HPM_END(idx, proc, event)

Mark the end of the high performance benchmark for proc and calculate the HPM counter value used.

HPM_STOP(idx, proc, event) printf("HPM%d:0x%x, %s, %lu\n", idx, event, #proc, (unsigned long)__hpm_sumcyc##idx);

Mark the stop of an HPM benchmark that follows the sequence start -> sample -> sample -> stop, and print the sum cycle of proc.

HPM_STAT(idx, proc, event) printf("STATHPM%d:0x%x, %s, %lu, %lu\n", idx, event, #proc, (unsigned long)__hpm_lpcnt##idx, (unsigned long)__hpm_sumcyc##idx);

Show statistics of the HPM benchmark, format: STATHPM<idx>:<event>, proc, loopcnt, sumcyc.

HPM_GET_USECYC(idx) (__hpm_usecyc##idx)

Get hpm benchmark use cycle for counter idx.

HPM_GET_SUMCYC(idx) (__hpm_sumcyc##idx)

Get hpm benchmark sum cycle for counter idx.

HPM_GET_LPCNT(idx) (__hpm_lpcnt##idx)

Get hpm benchmark loop count for counter idx.

NMSIS_TEST_PASS() printf("\nNMSIS_TEST_PASS\n");

Mark test or application passed.

NMSIS_TEST_FAIL() printf("\nNMSIS_TEST_FAIL\n");

Mark test or application failed.
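A test typically compares computed values against a reference and then emits exactly one of the two markers. The sketch below shows one way to do that. check_and_report is a hypothetical helper, and the two macros are reproduced from the definitions above behind #ifndef guards so that the real header takes precedence when it is included.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reproduced from this section; skipped when the real header defined them. */
#ifndef NMSIS_TEST_PASS
#define NMSIS_TEST_PASS() printf("\nNMSIS_TEST_PASS\n");
#endif
#ifndef NMSIS_TEST_FAIL
#define NMSIS_TEST_FAIL() printf("\nNMSIS_TEST_FAIL\n");
#endif

/* Compares n results with the reference, prints one marker, and returns
 * 0 on pass or 1 on fail so the value can be used as an exit status. */
int check_and_report(const int32_t *got, const int32_t *want, int n)
{
    assert(got != NULL && want != NULL && n >= 0);

    for (int i = 0; i < n; i++) {
        if (got[i] != want[i]) {
            NMSIS_TEST_FAIL();
            return 1;
        }
    }
    NMSIS_TEST_PASS();
    return 0;
}
```

Both definitions already end in a semicolon, so writing NMSIS_TEST_PASS(); leaves an extra empty statement. Wrapping the call in braces, as above, keeps it safe inside if/else chains.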

Functions

__STATIC_FORCEINLINE void __prepare_bench_env (void)

Prepare benchmark environment.

Prepare the environment required for benchmarking, for example by turning on necessary units such as the VPU, the cycle and instret counters, and the HPM counters.