Convert 16-bit floating point value

void riscv_f16_to_f64(const float16_t *pSrc, float64_t *pDst, uint32_t blockSize)
void riscv_f16_to_float(const float16_t *pSrc, float32_t *pDst, uint32_t blockSize)
void riscv_f16_to_q15(const float16_t *pSrc, q15_t *pDst, uint32_t blockSize)
group f16_to_x

Functions

void riscv_f16_to_f64(const float16_t *pSrc, float64_t *pDst, uint32_t blockSize)

Converts the elements of the f16 vector to f64 vector.

Converts the elements of the 16-bit floating-point vector to a 64-bit floating-point vector.

Parameters
  • pSrc[in] points to the f16 input vector

  • pDst[out] points to the f64 output vector

  • blockSize[in] number of samples in each vector

Returns

none
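
To make the widening step concrete, here is a minimal reference sketch of what an f16 to f64 conversion computes, assuming the source values use the IEEE 754 binary16 layout. It is not the library's implementation: the helper names `half_bits_to_double` and `ref_f16_bits_to_f64` are hypothetical, and raw `uint16_t` bit patterns stand in for `float16_t` so that the sketch compiles on hosts without a native half-precision type. Every binary16 value is exactly representable as a double, so the conversion loses no precision.

```c
#include <stdint.h>
#include <math.h>   /* INFINITY and NAN macros only */

/* Hypothetical helper: decode one IEEE 754 binary16 bit pattern to double. */
static double half_bits_to_double(uint16_t h)
{
    int sign = (h >> 15) & 0x1;       /* 1 sign bit      */
    int exp  = (h >> 10) & 0x1F;      /* 5 exponent bits */
    int frac = h & 0x3FF;             /* 10 fraction bits */
    double value;

    if (exp == 0) {
        /* Zero or subnormal: frac * 2^-24 */
        value = (double)frac / 16777216.0;
    } else if (exp == 0x1F) {
        /* All-ones exponent: infinity or NaN */
        value = frac ? (double)NAN : (double)INFINITY;
    } else {
        /* Normal: (1024 + frac) * 2^(exp - 25) */
        uint64_t mant = (uint64_t)(frac + 1024) << exp;
        value = (double)mant / 33554432.0;   /* divide by 2^25 */
    }
    return sign ? -value : value;
}

/* Hypothetical stand-in for the vector conversion, element by element. */
static void ref_f16_bits_to_f64(const uint16_t *pSrc, double *pDst,
                                uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        pDst[n] = half_bits_to_double(pSrc[n]);
    }
}
```

With the library itself, the same result would normally come from calling riscv_f16_to_f64 on a float16_t buffer; the bit-level decode above only illustrates the arithmetic.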

void riscv_f16_to_float(const float16_t *pSrc, float32_t *pDst, uint32_t blockSize)

Converts the elements of the f16 vector to f32 vector.

Converts the elements of the 16-bit floating-point vector to a 32-bit floating-point vector.

Parameters
  • pSrc[in] points to the f16 input vector

  • pDst[out] points to the f32 output vector

  • blockSize[in] number of samples in each vector

Returns

none

void riscv_f16_to_q15(const float16_t *pSrc, q15_t *pDst, uint32_t blockSize)

Converts the elements of the f16 vector to Q15 vector.

Converts the elements of the 16-bit floating-point vector to a Q15 vector.

Details

The equation used for the conversion process is:

    pDst[n] = (q15_t)(pSrc[n] * 32768);   0 <= n < blockSize.

Scaling and Overflow Behavior

The function uses saturating arithmetic. Results outside the allowable Q15 range [0x8000, 0x7FFF] are saturated.
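
The saturation rule can be illustrated with a small per-sample sketch. It assumes a Q15 scale factor of 32768 and truncation toward zero when rounding is not enabled. The helper name `ref_float_to_q15_sat` is hypothetical, and `float` stands in for `float16_t` so the sketch compiles on any host; this is not the library's code.

```c
#include <stdint.h>

/* Hypothetical reference helper: scale by 2^15 and clamp to the Q15 range.
 * float stands in for float16_t. NaN input is not handled in this sketch. */
static int16_t ref_float_to_q15_sat(float x)
{
    float scaled = x * 32768.0f;

    if (scaled >= 32767.0f) {
        return INT16_MAX;             /* 0x7FFF, positive saturation */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;             /* 0x8000, negative saturation */
    }
    return (int16_t)scaled;           /* in range: truncate toward zero */
}
```

Note that an input of exactly 1.0 scales to 32768, which is one above the largest Q15 value, so it saturates to 0x7FFF, while -1.0 maps exactly to 0x8000.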

Note

To apply rounding in the scalar version, rebuild the library with the ROUNDING macro defined in the preprocessor section of the project options.
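
The effect of rounding on a single sample can be sketched as follows. The text above does not state which rounding mode the ROUNDING build uses, so this sketch assumes round-half-away-from-zero applied after scaling by 32768. The helper name `ref_float_to_q15_round` is hypothetical, and `float` stands in for `float16_t`.

```c
#include <stdint.h>

/* Hypothetical reference helper: scale, round half away from zero
 * (an assumed rounding mode), then clamp to the Q15 range.
 * float stands in for float16_t. NaN input is not handled. */
static int16_t ref_float_to_q15_round(float x)
{
    float scaled = x * 32768.0f;

    /* Bias by half an LSB in the direction of the sign before truncating. */
    scaled += (scaled > 0.0f) ? 0.5f : -0.5f;

    if (scaled >= 32767.0f) {
        return INT16_MAX;             /* 0x7FFF */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;             /* 0x8000 */
    }
    return (int16_t)scaled;           /* truncation now acts as rounding */
}
```

Under this assumption, an input equal to 0.75 of one Q15 step converts to 1, whereas plain truncation would give 0.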

Parameters
  • pSrc[in] points to the f16 input vector

  • pDst[out] points to the Q15 output vector

  • blockSize[in] number of samples in each vector

Returns

none