At the JavaOne 2014 conference, IBM announced the CUDA4J application programming interface (API), which lets developers specify exactly when an application uses the graphics processing unit (GPU) for processing. The API provides many classes for managing operations between the CPU and the GPU.
You can use the CUDA4J API to develop Java applications that can directly invoke arbitrary kernels on the GPU, providing low-level control over processing.
The CUDA4J API enables you to develop applications that can move data between the Java heap and memory buffers on the GPU. On the GPU, C kernels can be invoked to process that data and the results can be moved back into the Java heap under the control of the application.
Not all application processing functions are suitable for offloading to the GPU, even those functions that might lend themselves to parallel processing capabilities. One such function is code page conversion. Although this operation might be suitable for parallel processing, there is a substantial and unavoidable cost in moving the data between the CPU and the GPU, regardless of the size of the data that needs converting.
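To make the transfer-cost argument concrete, here is a rough back-of-the-envelope sketch in plain Java. It is not part of CUDA4J: the class name, method names, and every timing and bandwidth figure are hypothetical placeholders chosen for illustration, and the model (a fixed per-copy latency plus bytes divided by bandwidth) is a simplifying assumption rather than a measurement.

```java
// Hypothetical cost model, not a CUDA4J API. All figures are illustrative assumptions.
final class OffloadEstimate {

    /** Estimated one-way copy time: a fixed latency plus bytes divided by bus bandwidth. */
    static double transferSeconds(long bytes, double latencySeconds, double bytesPerSecond) {
        return latencySeconds + bytes / bytesPerSecond;
    }

    /**
     * Returns true when copying the input to the GPU, running the kernel, and
     * copying the result back is estimated to be faster than staying on the CPU.
     */
    static boolean offloadPaysOff(double cpuSeconds, double gpuKernelSeconds,
                                  long bytesIn, long bytesOut,
                                  double latencySeconds, double bytesPerSecond) {
        double gpuTotal = transferSeconds(bytesIn, latencySeconds, bytesPerSecond)
                + gpuKernelSeconds
                + transferSeconds(bytesOut, latencySeconds, bytesPerSecond);
        return gpuTotal < cpuSeconds;
    }

    public static void main(String[] args) {
        long size = 64L << 20;      // 64 MiB in each direction (assumed)
        double latency = 10e-6;     // 10 microseconds per copy (assumed)
        double bandwidth = 8e9;     // 8 GB/s bus bandwidth (assumed)

        // Cheap per-byte work, such as a code page conversion: the copies dominate.
        System.out.println(offloadPaysOff(0.020, 0.005, size, size, latency, bandwidth));

        // Compute-heavy work on the same amount of data: the copies are amortized.
        System.out.println(offloadPaysOff(0.5, 0.02, size, size, latency, bandwidth));
    }
}
```

With these assumed numbers, the two copies alone take roughly 17 ms, so a task that the CPU finishes in 20 ms gains nothing from a 5 ms kernel, while a task that takes the CPU 500 ms clearly does. Because of the fixed latency term, the estimate never drops to zero even for tiny inputs.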
You can check out the recorded slides here
Screenshots from NVIDIA Dev Blog