libnd4j supports multiple op designs, and in this guide we'll explain how to build your very own operation.

## XYZ operations

These operations are split into multiple subtypes, based on element access and result type:

- Transform operations: these take some NDArray in, and change each element independently of the others.
- Reduction operations: these take some NDArray and dimensions, and return a reduced NDArray (or scalar), e.g. a sum along the given dimension(s).
- Scalar operations: these are similar to transforms, but they only do arithmetic, and the second operand is a scalar, e.g. adding the same scalar value to each element of a given NDArray.
- Pairwise operations: these sit between regular transform operations and scalar operations, e.g. element-wise addition of two NDArrays.
- Random operations: most of these relate to random number distributions: uniform, Gaussian, Bernoulli etc.

Despite the differences between these operations, they all use the XZ/XYZ three-operand design, where X and Y are inputs and Z is the output. Data access in these operations is usually trivial and loop-based; e.g. the simplest loop for a scalar transform looks like this:

```c++
for (Nd4jLong i = start; i < end; i++) {
    result[i] = OpType::op(x[i], scalar, extraParams);
}
```

The operation used in this loop is template-driven and compiled statically. There are other loop implementations, depending on the op group or the strides within the NDArrays, but the idea is always the same: each element of the NDArray is accessed within a loop.

Now, let's take a look at a typical XYZ op implementation. Here's what the `Add` operation looks like:

```c++
template <typename T>
class Add {
public:
    op_def static T op(T d1, T d2) {
        return d1 + d2;
    }

    // this signature will be used in Scalar loops
    op_def static T op(T d1, T d2, T *params) {
        return d1 + d2;
    }

    // this signature will be used in reductions
    op_def static T op(T d1) {
        return d1;
    }

    // op for MetaOps
    op_def static T op(T d1, T *params) {
        return d1 + params[0];
    }
};
```

This particular operation is used in different XYZ op groups, but you get the idea: it's an element-wise operation, invoked on each element of a given NDArray.

So, if you want to add a new XYZ operation to libnd4j, you just add the operation implementation to the file `includes/ops/ops.h`, and assign it to a specific ops group in the file `includes/loops/legacy_ops.h`, together with a number unique within that ops group, e.g.: `(21, simdOps::Add)`

After libnd4j is recompiled, this op becomes available to the legacy execution mechanism, NDArray wrappers, and `LegacyOp` wrappers (those are made to map legacy operations to the CustomOps design for Graph).
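To make that concrete, here's a minimal sketch of what a new XYZ op could look like in `includes/ops/ops.h`. The op name `TimesTwo` is hypothetical; it just mirrors the four-signature pattern of `Add` above:

```c++
// hypothetical example op: doubles each element
template <typename T>
class TimesTwo {
public:
    op_def static T op(T d1, T d2) {
        return d1 + d1;
    }

    // this signature will be used in Scalar loops
    op_def static T op(T d1, T d2, T *params) {
        return d1 + d1;
    }

    // this signature will be used in reductions
    op_def static T op(T d1) {
        return d1 + d1;
    }

    // op for MetaOps
    op_def static T op(T d1, T *params) {
        return d1 + d1;
    }
};
```

It would then be registered in `includes/loops/legacy_ops.h` with an unused number for its group, e.g. `(42, simdOps::TimesTwo)` (the number here is just a placeholder; pick one that's actually free in that group).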
## Custom operations

Custom operations are a new concept, added recently, and mostly suit SameDiff/Graph needs. For CustomOps we defined a universal signature, with a variable number of input/output NDArrays, and a variable number of floating-point and integer arguments. However, there are some minor differences between the various CustomOp declarations:

- **DECLARE_OP**(string, int, int, bool): these operations take no fp/int arguments, and the output shape equals the input shape.
- **DECLARE_CONFIGURABLE_OP**(string, int, int, bool, int, int): these operations do take fp/int arguments, and the output shape equals the input shape.
- **DECLARE_REDUCTION_OP**(string, int, int, bool, int, int): these operations do take fp/int arguments, and the output shape is calculated as for a reduction.
- **DECLARE_CUSTOM_OP**(string, int, int, bool, int, int): these operations return an NDArray with a custom shape, which usually depends on the input and the arguments.
- **DECLARE_BOOLEAN_OP**(string, int, bool): these operations take some NDArrays and return a scalar, where 0 is **False**, and any other value is treated as **True**.

Let's take a look at an example CustomOp:

```c++
CUSTOM_OP_IMPL(tear, 1, -1, false, 0, -1) {
    auto input = INPUT_VARIABLE(0);

    REQUIRE_TRUE(!block.getIArguments()->empty(), 0, "At least 1 dimension should be specified for Tear");

    std::vector<int> dims(*block.getIArguments());

    for (auto &v: dims)
        REQUIRE_TRUE(v >= 0 && v < input->rankOf(), 0, "Tear dimensions should be non-negative values, and lower than input rank. Got %i instead", v);

    auto tads = input->allTensorsAlongDimension(dims);
    for (int e = 0; e < tads->size(); e++) {
        auto outE = OUTPUT_VARIABLE(e);
        outE->assign(tads->at(e));

        this->storeResult(block, e, *outE);
    }

    delete tads;

    return ND4J_STATUS_OK;
}

DECLARE_SHAPE_FN(tear) {
    auto inShape = inputShape->at(0);

    std::vector<int> dims(*block.getIArguments());
    if (dims.size() > 1)
        std::sort(dims.begin(), dims.end());

    shape::TAD tad(inShape, dims.data(), (int) dims.size());
    tad.createTadOnlyShapeInfo();
    Nd4jLong numTads = shape::tadLength(inShape, dims.data(), (int) dims.size());

    auto result = SHAPELIST();
    for (int e = 0; e < numTads; e++) {
        int *newShape;
        COPY_SHAPE(tad.tadOnlyShapeInfo, newShape);
        result->push_back(newShape);
    }

    return result;
}
```

In the example above, we declare the `tear` CustomOp implementation, and a shape function for this op. So, at the moment of op execution, we assume that output array(s) will either be provided by the end-user, or generated with the shape function. You can also see a number of macros used; we'll cover those later as well.

Beyond that, the op execution logic is fairly simple and linear: each new op implements the protected member function `DeclarableOp::validateAndExecute(Block& block)`, and this method is eventually called either from GraphExecutioner, or via a direct call, like `DeclarableOp::execute(Block& block)`.

An important part of the op declaration is the input/output description for the op, e.g. as shown above: `CUSTOM_OP_IMPL(tear, 1, -1, false, 0, -1)`. This declaration means:

- Op name: `tear`
- Op expects at least 1 NDArray as input
- Op returns an unknown positive number of NDArrays as output
- Op can't be run in-place, so under any circumstances the original NDArray will stay intact
- Op doesn't expect any T (aka floating-point) arguments
- Op expects an unknown positive number of integer arguments; in the case of this op, these are the dimensions along which to split the input NDArray

Here's another example: `DECLARE_CUSTOM_OP(permute, 1, 1, true, 0, -2);`

This declaration means:

- Op name: `permute`
- Op expects at least 1 NDArray as input
- Op returns 1 NDArray as output
- Op can be run in-place if needed (meaning: input == output, and the input is modified and returned as the output)
- Op doesn't expect any T arguments
- Op expects an unknown number of integer arguments OR no integer arguments at all

## c++11 syntactic sugar

In ops you can easily use c++11 features, including lambdas. In some cases the easiest way to build your custom op (or some part of it) is via `NDArray::applyLambda` or `NDArray::applyPairwiseLambda`:

```c++
auto lambda = LAMBDA_TT(_x, _y) {
    return (_x + _y) * 2;
};

x.applyPairwiseLambda(&y, lambda);
```

In this simple example, each element of NDArray `x` will get its value set to `x[e] = (x[e] + y[e]) * 2`.
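The single-operand counterpart looks almost identical. Here's a minimal sketch using `LAMBDA_T` with `NDArray::applyLambda`, assuming the lambda is applied in-place when no target array is passed:

```c++
// single-operand lambda: squares each element of x
auto lambda = LAMBDA_T(_x) {
    return _x * _x;
};

x.applyLambda(lambda);
```

Here each element of `x` ends up as `x[e] = x[e] * x[e]`.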
## Tests

For tests, libnd4j uses the Google Test suite. All tests are located in the `tests_cpu/layers_tests` folder. Here's a simple way to run them from the command line:

```
cd tests_cpu
cmake -G "Unix Makefiles"
make -j 4
./layers_tests/runtests
```

You can also use your IDE (e.g. JetBrains CLion) to run tests via the GUI.

**PLEASE NOTE:** if you're considering submitting your new op to the libnd4j repository via a pull request, consider adding tests for it. Ops without tests won't be approved.

## Backend-specific operations

GPU/MPI/whatever: to be added soon.

## Utility macros

We have a number of utility macros suitable for custom ops. Here they are:

- **INPUT_VARIABLE**(int): returns the input NDArray at the specified index.
- **OUTPUT_VARIABLE**(int): returns the output NDArray at the specified index.
- **STORE_RESULT**(NDArray): stores the result in the VariableSpace.
- **STORE_2_RESULTS**(NDArray, NDArray): stores both results in the VariableSpace accordingly.
- **INT_ARG**(int): returns the integer argument passed to the given op at the specified index.
- **T_ARG**(int): returns the T argument passed to the given op at the specified index.
- **ALLOCATE**(...): checks if a Workspace is available, and uses either the Workspace or direct memory allocation if a Workspace isn't available.
- **RELEASE**(...): releases memory allocated with the **ALLOCATE()** macro.
- **REQUIRE_TRUE**(...): takes a condition and evaluates it. If the evaluation doesn't come out True, an exception is raised and the specified message is printed out.
- **LAMBDA_T**(X) and **LAMBDA_TT**(X, Y): lambda declarations for `NDArray::applyLambda` and `NDArray::applyPairwiseLambda`.
- **COPY_SHAPE**(SRC, TGT): allocates memory for the TGT pointer and copies the shape from the SRC pointer.
- **ILAMBDA_T**(X) and **ILAMBDA_TT**(X, Y): lambda declarations for indexed lambdas; the index argument is passed in as Nd4jLong (aka **long long**).
- **FORCEINLINE**: platform-specific definition for function inlining.
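To show how **ALLOCATE** and **RELEASE** are typically paired inside an op body, here's a short hedged sketch. The `ALLOCATE(variable, workspace, length, type)` argument order shown here is an assumption based on common usage in the codebase; check the macro definition before relying on it:

```c++
// hypothetical snippet from inside an op body: allocate a temporary
// shape buffer, preferring the Workspace when one is attached to the Block
int *tmpShape;
ALLOCATE(tmpShape, block.getWorkspace(), shape::shapeInfoLength(input->rankOf()), int);

// ... fill and use tmpShape here ...

// memory obtained via ALLOCATE should be returned via RELEASE,
// so that both Workspace and direct allocations are handled correctly
RELEASE(tmpShape, block.getWorkspace());
```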