Fully-connected Layer Functions

riscv_status riscv_fully_connected_mat_q7_vec_q15(const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_mat_q7_vec_q15_opt(const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q15(const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q15_opt(const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q7(const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_q7_opt(const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
riscv_status riscv_fully_connected_s8(const int8_t *pInput, const int8_t *pWeight, const uint16_t col_dim, const uint16_t row_dim, const uint16_t nb_batches, const int32_t input_offset, const int32_t filter_offset, const int32_t out_mult, const int32_t out_shift, const int32_t output_offset, const int8_t *pBias, int8_t *pOut, const int32_t output_activation_min, const int32_t output_activation_max, q15_t *vec_buffer)
group FC

Perform fully-connected layer.

A fully-connected layer is essentially a matrix-vector multiplication with a bias term. The matrix holds the weights, and the input and output vectors hold the activation values. Supported {weight, activation} precisions are {8-bit, 8-bit}, {16-bit, 16-bit}, and {8-bit, 16-bit}.

Two types of kernel functions are provided. The basic functions use a regular GEMV approach. The opt functions operate on weights stored in an interleaved format.
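
As a rough illustration of the basic GEMV approach for the {8-bit, 8-bit} case, the following plain C sketch computes each output as a saturated dot product plus a shifted bias. The names fc_q7_reference and sat_q7 are hypothetical and not part of the library. The sketch truncates on the output shift, whereas the library kernels may add a rounding term, and it assumes that right-shifting a negative value is an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical helper: saturate a 32-bit accumulator to the q7 range. */
static int8_t sat_q7(int32_t v)
{
    if (v > 127) return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}

/* Hypothetical reference kernel, not a library function.
 * pM is a regular row-major matrix of num_of_rows x dim_vec weights. */
void fc_q7_reference(const int8_t *pV, const int8_t *pM,
                     uint16_t dim_vec, uint16_t num_of_rows,
                     uint16_t bias_shift, uint16_t out_shift,
                     const int8_t *bias, int8_t *pOut)
{
    for (uint16_t i = 0; i < num_of_rows; i++) {
        /* Left-shift the bias (written as a multiply so that
         * negative biases are well defined). */
        int32_t acc = (int32_t)bias[i] * (1 << bias_shift);
        for (uint16_t j = 0; j < dim_vec; j++) {
            acc += (int32_t)pV[j] * pM[(uint32_t)i * dim_vec + j];
        }
        /* Right-shift the result and saturate to 8 bits
         * (assumes an arithmetic shift for negative acc). */
        pOut[i] = sat_q7(acc >> out_shift);
    }
}
```

An opt kernel would produce the same outputs, but it expects pM to be stored in the interleaved order described for each opt function.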

Functions

riscv_status riscv_fully_connected_mat_q7_vec_q15(const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)

Mixed Q15-Q7 fully-connected layer function.

Buffer size:

Return

The function returns RISCV_MATH_SUCCESS

Parameters
  • [in] pV: pointer to input vector

  • [in] pM: pointer to matrix weights

  • [in] dim_vec: length of the vector

  • [in] num_of_rows: number of rows in weight matrix

  • [in] bias_shift: amount of left-shift for bias

  • [in] out_shift: amount of right-shift for output

  • [in] bias: pointer to bias

  • [inout] pOut: pointer to output vector

  • [inout] vec_buffer: pointer to buffer space for input

vec_buffer size: 0

Q7_Q15 version of the fully connected layer

Weights are in q7_t and activations are in q15_t.

riscv_status riscv_fully_connected_mat_q7_vec_q15_opt(const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)

Mixed Q15-Q7 opt fully-connected layer function.

Buffer size:

Return

The function returns RISCV_MATH_SUCCESS

Parameters
  • [in] pV: pointer to input vector

  • [in] pM: pointer to matrix weights

  • [in] dim_vec: length of the vector

  • [in] num_of_rows: number of rows in weight matrix

  • [in] bias_shift: amount of left-shift for bias

  • [in] out_shift: amount of right-shift for output

  • [in] bias: pointer to bias

  • [inout] pOut: pointer to output vector

  • [inout] vec_buffer: pointer to buffer space for input

vec_buffer size: 0

Q7_Q15 version of the fully connected layer

Weights are in q7_t and activations are in q15_t.

Limitation: the x4 version requires the weights to be reordered.

A single pointer is used to read 4 rows of the weight matrix at a time. Suppose the original q7_t matrix looks like this:

| a11 | a12 | a13 | a14 | a15 | a16 | a17 |

| a21 | a22 | a23 | a24 | a25 | a26 | a27 |

| a31 | a32 | a33 | a34 | a35 | a36 | a37 |

| a41 | a42 | a43 | a44 | a45 | a46 | a47 |

| a51 | a52 | a53 | a54 | a55 | a56 | a57 |

| a61 | a62 | a63 | a64 | a65 | a66 | a67 |

The kernel operates on multiples of 4 rows, so the first four rows become:

| a11 | a21 | a12 | a22 | a31 | a41 | a32 | a42 |

| a13 | a23 | a14 | a24 | a33 | a43 | a34 | a44 |

| a15 | a25 | a16 | a26 | a35 | a45 | a36 | a46 |

The left-over column is stored in order: | a17 | a27 | a37 | a47 |

The left-over rows use 1x1 computation, so their data stays in its original order.

So the stored weight matrix looks like this:

| a11 | a21 | a12 | a22 | a31 | a41 |

| a32 | a42 | a13 | a23 | a14 | a24 |

| a33 | a43 | a34 | a44 | a15 | a25 |

| a16 | a26 | a35 | a45 | a36 | a46 |

| a17 | a27 | a37 | a47 | a51 | a52 |

| a53 | a54 | a55 | a56 | a57 | a61 |

| a62 | a63 | a64 | a65 | a66 | a67 |
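
The reordering above can be sketched as an offline weight-preparation step. This is a minimal illustration derived only from the layout shown here. The function name reorder_mat_q7_vec_q15_opt is hypothetical and not a library function.

```c
#include <stdint.h>

/* Hypothetical helper: converts a row-major q7 weight matrix into the
 * interleaved order described above for the mixed q7/q15 opt kernel. */
void reorder_mat_q7_vec_q15_opt(const int8_t *src, int8_t *dst,
                                uint16_t dim_vec, uint16_t num_of_rows)
{
    uint32_t k = 0;
    uint16_t row = 0;

    /* Blocks of 4 rows are interleaved. */
    for (; row + 4 <= num_of_rows; row += 4) {
        const int8_t *r0 = src + (uint32_t)row * dim_vec;
        const int8_t *r1 = r0 + dim_vec;
        const int8_t *r2 = r1 + dim_vec;
        const int8_t *r3 = r2 + dim_vec;
        uint16_t col = 0;

        /* Two columns at a time: rows 0/1 first, then rows 2/3. */
        for (; col + 2 <= dim_vec; col += 2) {
            dst[k++] = r0[col];     dst[k++] = r1[col];
            dst[k++] = r0[col + 1]; dst[k++] = r1[col + 1];
            dst[k++] = r2[col];     dst[k++] = r3[col];
            dst[k++] = r2[col + 1]; dst[k++] = r3[col + 1];
        }
        /* A left-over column is stored in row order. */
        if (col < dim_vec) {
            dst[k++] = r0[col]; dst[k++] = r1[col];
            dst[k++] = r2[col]; dst[k++] = r3[col];
        }
    }

    /* Left-over rows keep their original order. */
    for (uint32_t i = (uint32_t)row * dim_vec;
         i < (uint32_t)num_of_rows * dim_vec; i++) {
        dst[k++] = src[i];
    }
}
```

Applied to the 6 x 7 example, the output starts with a11, a21, a12, a22 and ends with rows 5 and 6 unchanged.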

riscv_status riscv_fully_connected_q15(const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)

Q15 basic fully-connected layer function.

Buffer size:

Return

The function returns RISCV_MATH_SUCCESS

Parameters
  • [in] pV: pointer to input vector

  • [in] pM: pointer to matrix weights

  • [in] dim_vec: length of the vector

  • [in] num_of_rows: number of rows in weight matrix

  • [in] bias_shift: amount of left-shift for bias

  • [in] out_shift: amount of right-shift for output

  • [in] bias: pointer to bias

  • [inout] pOut: pointer to output vector

  • [inout] vec_buffer: pointer to buffer space for input

vec_buffer size: 0

riscv_status riscv_fully_connected_q15_opt(const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)

Q15 opt fully-connected layer function.

Buffer size:

Return

The function returns RISCV_MATH_SUCCESS

Parameters
  • [in] pV: pointer to input vector

  • [in] pM: pointer to matrix weights

  • [in] dim_vec: length of the vector

  • [in] num_of_rows: number of rows in weight matrix

  • [in] bias_shift: amount of left-shift for bias

  • [in] out_shift: amount of right-shift for output

  • [in] bias: pointer to bias

  • [inout] pOut: pointer to output vector

  • [inout] vec_buffer: pointer to buffer space for input

vec_buffer size: 0

A single pointer is used to read 4 rows of the weight matrix at a time. Suppose the original matrix looks like this:

| a11 | a12 | a13 |

| a21 | a22 | a23 |

| a31 | a32 | a33 |

| a41 | a42 | a43 |

| a51 | a52 | a53 |

| a61 | a62 | a63 |

The kernel operates on multiples of 4 rows, so the first four rows become:

| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |

| a13 | a23 | a33 | a43 |

The remaining rows are kept in their original order.

So the stored weight matrix looks like this:

| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |

| a13 | a23 | a33 | a43 | a51 | a52 | a53 | a61 |

| a62 | a63 |
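
The q15 layout differs from the q7 one: within each block of 4 rows, two consecutive weights of the same row stay adjacent. A minimal sketch of the corresponding reordering step, derived only from the example above, could look like this. The function name reorder_q15_opt is hypothetical and not a library function.

```c
#include <stdint.h>

/* Hypothetical helper: converts a row-major q15 weight matrix into the
 * interleaved order described above for the q15 opt kernel. */
void reorder_q15_opt(const int16_t *src, int16_t *dst,
                     uint16_t dim_vec, uint16_t num_of_rows)
{
    uint32_t k = 0;
    uint16_t row = 0;

    for (; row + 4 <= num_of_rows; row += 4) {
        const int16_t *r[4];
        for (int i = 0; i < 4; i++) {
            r[i] = src + (uint32_t)(row + i) * dim_vec;
        }
        uint16_t col = 0;

        /* Two columns at a time, one row after the other. */
        for (; col + 2 <= dim_vec; col += 2) {
            for (int i = 0; i < 4; i++) {
                dst[k++] = r[i][col];
                dst[k++] = r[i][col + 1];
            }
        }
        /* A left-over column is stored in row order. */
        if (col < dim_vec) {
            for (int i = 0; i < 4; i++) {
                dst[k++] = r[i][col];
            }
        }
    }

    /* Left-over rows keep their original order. */
    for (uint32_t i = (uint32_t)row * dim_vec;
         i < (uint32_t)num_of_rows * dim_vec; i++) {
        dst[k++] = src[i];
    }
}
```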

riscv_status riscv_fully_connected_q7(const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)

Q7 basic fully-connected layer function.

Buffer size:

Return

The function returns RISCV_MATH_SUCCESS

Parameters
  • [in] pV: pointer to input vector

  • [in] pM: pointer to matrix weights

  • [in] dim_vec: length of the vector

  • [in] num_of_rows: number of rows in weight matrix

  • [in] bias_shift: amount of left-shift for bias

  • [in] out_shift: amount of right-shift for output

  • [in] bias: pointer to bias

  • [inout] pOut: pointer to output vector

  • [inout] vec_buffer: pointer to buffer space for input

vec_buffer size: dim_vec

This basic function is designed to work with regular weight matrix without interleaving.

riscv_status riscv_fully_connected_q7_opt(const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)

Q7 opt fully-connected layer function.

Buffer size:

Return

The function returns RISCV_MATH_SUCCESS

Parameters
  • [in] pV: pointer to input vector

  • [in] pM: pointer to matrix weights

  • [in] dim_vec: length of the vector

  • [in] num_of_rows: number of rows in weight matrix

  • [in] bias_shift: amount of left-shift for bias

  • [in] out_shift: amount of right-shift for output

  • [in] bias: pointer to bias

  • [inout] pOut: pointer to output vector

  • [inout] vec_buffer: pointer to buffer space for input

vec_buffer size: dim_vec

This opt function is designed to work with an interleaved weight matrix. The vector input is assumed to be in q7_t format. The riscv_q7_to_q15_no_shift_shuffle function is called to expand it into q15_t format with a certain re-ordering; refer to that function's comments for more details. A single pointer is used to read 4 rows of the weight matrix at a time. Suppose the original q7_t matrix looks like this:

| a11 | a12 | a13 | a14 | a15 | a16 | a17 |

| a21 | a22 | a23 | a24 | a25 | a26 | a27 |

| a31 | a32 | a33 | a34 | a35 | a36 | a37 |

| a41 | a42 | a43 | a44 | a45 | a46 | a47 |

| a51 | a52 | a53 | a54 | a55 | a56 | a57 |

| a61 | a62 | a63 | a64 | a65 | a66 | a67 |

The kernel operates on multiples of 4 rows, so the first four rows become:

| a11 | a21 | a13 | a23 | a31 | a41 | a33 | a43 |

| a12 | a22 | a14 | a24 | a32 | a42 | a34 | a44 |

| a15 | a25 | a35 | a45 | a16 | a26 | a36 | a46 |

Within the kernel, the re-ordered vector is first read in as:

| b1 | b3 | and | b2 | b4 |

The four q31_t weights then look like:

| a11 | a13 |, | a21 | a23 |, | a31 | a33 |, | a41 | a43 |

| a12 | a14 |, | a22 | a24 |, | a32 | a34 |, | a42 | a44 |

The left-over column is stored in order:

| a17 | a27 | a37 | a47 |

The left-over rows use 1x1 computation, so their data stays in its original order.

So the stored weight matrix looks like this:

| a11 | a21 | a13 | a23 | a31 | a41 |

| a33 | a43 | a12 | a22 | a14 | a24 |

| a32 | a42 | a34 | a44 | a15 | a25 |

| a35 | a45 | a16 | a26 | a36 | a46 |

| a17 | a27 | a37 | a47 | a51 | a52 |

| a53 | a54 | a55 | a56 | a57 | a61 |

| a62 | a63 | a64 | a65 | a66 | a67 |
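
A minimal sketch of this reordering, derived only from the stored layout shown above: within each block of 4 rows, columns are taken four at a time with the first and third columns paired and then the second and fourth, and any left-over columns are stored one column at a time. The function name reorder_q7_opt is hypothetical and not a library function.

```c
#include <stdint.h>

/* Hypothetical helper: converts a row-major q7 weight matrix into the
 * interleaved order described above for the q7 opt kernel. */
void reorder_q7_opt(const int8_t *src, int8_t *dst,
                    uint16_t dim_vec, uint16_t num_of_rows)
{
    uint32_t k = 0;
    uint16_t row = 0;

    for (; row + 4 <= num_of_rows; row += 4) {
        const int8_t *r0 = src + (uint32_t)row * dim_vec;
        const int8_t *r1 = r0 + dim_vec;
        const int8_t *r2 = r1 + dim_vec;
        const int8_t *r3 = r2 + dim_vec;
        uint16_t col = 0;

        /* Four columns at a time: columns (c, c+2) first, then (c+1, c+3). */
        for (; col + 4 <= dim_vec; col += 4) {
            for (uint16_t c = col; c < col + 2; c++) {
                dst[k++] = r0[c];     dst[k++] = r1[c];
                dst[k++] = r0[c + 2]; dst[k++] = r1[c + 2];
                dst[k++] = r2[c];     dst[k++] = r3[c];
                dst[k++] = r2[c + 2]; dst[k++] = r3[c + 2];
            }
        }
        /* Left-over columns: one column at a time, in row order. */
        for (; col < dim_vec; col++) {
            dst[k++] = r0[col]; dst[k++] = r1[col];
            dst[k++] = r2[col]; dst[k++] = r3[col];
        }
    }

    /* Left-over rows keep their original order. */
    for (uint32_t i = (uint32_t)row * dim_vec;
         i < (uint32_t)num_of_rows * dim_vec; i++) {
        dst[k++] = src[i];
    }
}
```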

riscv_status riscv_fully_connected_s8(const int8_t *pInput, const int8_t *pWeight, const uint16_t col_dim, const uint16_t row_dim, const uint16_t nb_batches, const int32_t input_offset, const int32_t filter_offset, const int32_t out_mult, const int32_t out_shift, const int32_t output_offset, const int8_t *pBias, int8_t *pOut, const int32_t output_activation_min, const int32_t output_activation_max, q15_t *vec_buffer)

S8 basic fully-connected layer function for TF Lite.

Buffer size:

Return

The function returns RISCV_MATH_SUCCESS

Parameters
  • [in] pInput: pointer to input vector

  • [in] pWeight: pointer to matrix weights

  • [in] col_dim: dimension of the input vector

  • [in] row_dim: dimension of the output vector

  • [in] nb_batches: number of batches

  • [in] input_offset:

  • [in] filter_offset:

  • [in] out_mult: requantization parameter

  • [in] out_shift: requantization parameter

  • [in] output_offset:

  • [in] pBias: pointer to bias

  • [out] pOut: pointer to output vector

  • [in] output_activation_min: for clamping

  • [in] output_activation_max: for clamping

  • [inout] vec_buffer: pointer to buffer space for input

vec_buffer size: col_dim elements of q15_t (16-bit words).

This basic function is designed to work with a regular weight matrix without interleaving.
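
To show how the s8 parameters might fit together, here is a hedged plain C sketch and not the library implementation. It assumes that input_offset and filter_offset are added to each input and weight before multiplication, that the bias is added directly to the accumulator, that out_mult is a fixed-point multiplier scaled by 2^31, and that a positive out_shift means a left shift and a negative one a right shift. The names fc_s8_reference and requantize_sketch are hypothetical, and the rounding behaviour may differ from the library's.

```c
#include <stdint.h>

/* Hypothetical requantization: acc * out_mult / 2^31 with rounding,
 * combined with a power-of-two shift. Assumes arithmetic right shift
 * for negative values. */
static int32_t requantize_sketch(int32_t acc, int32_t out_mult, int32_t out_shift)
{
    int64_t v = (int64_t)acc;
    if (out_shift > 0) {
        v *= ((int64_t)1 << out_shift);
    }
    v = (v * out_mult + ((int64_t)1 << 30)) >> 31;
    if (out_shift < 0) {
        int32_t s = -out_shift;
        v = (v + ((int64_t)1 << (s - 1))) >> s;
    }
    return (int32_t)v;
}

/* Hypothetical reference kernel, not a library function. */
void fc_s8_reference(const int8_t *pInput, const int8_t *pWeight,
                     uint16_t col_dim, uint16_t row_dim, uint16_t nb_batches,
                     int32_t input_offset, int32_t filter_offset,
                     int32_t out_mult, int32_t out_shift, int32_t output_offset,
                     const int8_t *pBias, int8_t *pOut,
                     int32_t act_min, int32_t act_max)
{
    for (uint16_t b = 0; b < nb_batches; b++) {
        const int8_t *in = pInput + (uint32_t)b * col_dim;
        for (uint16_t r = 0; r < row_dim; r++) {
            const int8_t *w = pWeight + (uint32_t)r * col_dim;
            int32_t acc = pBias[r];
            for (uint16_t c = 0; c < col_dim; c++) {
                acc += (in[c] + input_offset) * (w[c] + filter_offset);
            }
            /* Requantize, add the output offset, then clamp. */
            acc = requantize_sketch(acc, out_mult, out_shift) + output_offset;
            if (acc < act_min) acc = act_min;
            if (acc > act_max) acc = act_max;
            pOut[(uint32_t)b * row_dim + r] = (int8_t)acc;
        }
    }
}
```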