* - documenting and profiling matrix_set_diag cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag
Signed-off-by: Yurii <yurii@skymind.io>