[TensorFlow] Added headers from common_runtime/gpu/* #863

okdzhimiev · 2020-04-03T18:29:59Z

As requested by Samuel.

Added the headers I needed.
Error log posted here.

saudet · 2020-04-05T07:09:53Z

Yeah, that is going to require some amount of work to get all this mapped in a meaningful way...

BTW, Google isn't supporting the C++ API anymore, so this is all deprecated. We should, if possible, use the C API. Could you provide more details about which features you need to access?

/cc @karllessard

okdzhimiev · 2020-04-06T17:11:43Z

Basically what is needed is something like this:

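// Shape of the input: 256 elements of type uint8.
TensorShape shape = TensorShape({256});
// Platform (physical) GPU id 0.
PlatformGpuId platform_gpu_id(0);

// Sub-allocator that obtains raw device memory (false = no unified memory).
GPUMemAllocator *sub_allocator =
    new GPUMemAllocator(
        GpuIdUtil::ExecutorForPlatformGpuId(platform_gpu_id).ValueOrDie(),
        platform_gpu_id, false, {}, {});

// BFC allocator sized in bytes. Note: sizeof(uint8), not sizeof(DT_UINT8);
// DT_UINT8 is an enum value, so its sizeof is the size of the enum type.
GPUBFCAllocator *allocator =
    new GPUBFCAllocator(sub_allocator, shape.num_elements() * sizeof(uint8), "GPU_0_bfc");

// Tensor whose buffer comes from the GPU allocator.
auto inputTensor = Tensor(allocator, DT_UINT8, shape);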
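
The size handed to the allocator is a byte count: number of elements times bytes per element. One detail worth checking is that `sizeof` applied to a dtype enumerator such as `DT_UINT8` measures the enum type, not one `uint8` element. Below is a minimal, self-contained sketch of that arithmetic. The `DataTypeTag` enum and the helpers `NumElements` and `BufferBytes` are hypothetical names for illustration only and are not part of TensorFlow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <limits>

// Hypothetical stand-in for a dtype tag; NOT tensorflow::DataType itself.
enum DataTypeTag : int { kUint8Tag = 4 };

// Number of elements in a dense shape: the product of its dimensions.
// An empty dimension list is treated as a scalar (one element).
inline std::size_t NumElements(std::initializer_list<std::size_t> dims) {
  std::size_t n = 1;
  for (std::size_t d : dims) {
    // Guard against overflow of the running product.
    assert(d == 0 || n <= std::numeric_limits<std::size_t>::max() / d);
    n *= d;
  }
  return n;
}

// Bytes needed for a dense buffer whose elements have C++ type T.
template <typename T>
std::size_t BufferBytes(std::initializer_list<std::size_t> dims) {
  return NumElements(dims) * sizeof(T);
}

// sizeof on the tag measures the enum's storage, not one element.
static_assert(sizeof(kUint8Tag) == sizeof(int), "tag is int-sized");
static_assert(sizeof(std::uint8_t) == 1, "a uint8 element is one byte");
```

With these helpers, `BufferBytes<std::uint8_t>({256})` gives the byte count for a 256-element uint8 buffer, whereas multiplying by `sizeof(kUint8Tag)` would request `sizeof(int)` times as much.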

Ultimately I would like to be able to feed the graph directly from GPU memory. Two things are needed for that:

1. Create a tensor in GPU memory. This is missing in JavaCPP. Need to:
- create the tensor in GPU memory (see the code above);
- get a pointer to its memory so it can be run through a CUDA kernel (currently done with JCuda). Normally the pointer can be acquired via:

void* p = const_cast<char*>(t.tensor_data().data());
or, with the C API:
void* p = TF_TensorData(t);

2. Run the graph with specific options. This seems to exist in JavaCPP, though I didn't test it. Also see direct_session_test.cc#L2387; the workflow is:
construct CallableOptions -> run makeCallable -> run runCallable

Notes:

- Didn't know it's all deprecated. Though it's still there in the current master branch - TF 2.1.0
- In the direct_session_test.cc they get gpu_tensor from one runCallable() then feed it to another runCallable(). I haven't explored if I could just do the same trick and 'bypass' GPUMemAllocator. If that worked then no changes are required to JavaCPP.
- Also, I managed to add the required modifications to the built-in TF's JNI (here). First tried to build JavaCPP and contacted you, then went ahead with the TF's JNI.

saudet · 2020-04-07T02:51:11Z

If I follow you, what you would need is a way to allocate tensors in GPU memory directly and be able to specify which device exactly? Can you bring this up on the SIG JVM mailing list at https://groups.google.com/a/tensorflow.org/forum/#!forum/jvm or the Gitter channel at https://gitter.im/tensorflow/sig-jvm? The guys at Google have started using JavaCPP for their Java bindings, so TensorFlow is basically a downstream project of JavaCPP now...

> Didn't know it's all deprecated. Though it's still there in the current master branch - TF 2.1.0

Yes, it looks like they will leave it there for a while, but from what I know it is no longer being updated, so it will most likely either become unusable somewhere down the road or become the target of internal refactoring efforts without prior notice.

> In the direct_session_test.cc they get gpu_tensor from one runCallable() then feed it to another runCallable(). I haven't explored if I could just do the same trick and 'bypass' GPUMemAllocator. If that worked then no changes are required to JavaCPP.

From what I understand of the way TensorFlow works, all input/output tensors are first allocated in host memory, but they can also have allocated GPU memory associated with them once they get used in sessions and whatnot, which TensorFlow manages. I stumbled on a nice thread about that at tensorflow/tensorflow#5902. It's not clear to me how any of this is supposed to help when we actually want to do everything manually though.

> Also, I managed to add the required modifications to the built-in TF's JNI (here). First tried to build JavaCPP and contacted you, then went ahead with the TF's JNI.

That's cool, but like I said that's all deprecated so the SIG JVM will probably not want to use that anyway (unless this becomes part of the official upstream C API, which I would encourage you to contribute to). Let's see what these guys say though.

okdzhimiev · 2020-04-07T17:05:36Z

> If I follow you, what you would need is a way to allocate tensors in GPU memory directly and be able to specify which device exactly?

Yes. Here's the workflow:

1. Open the image stream and get the resolution, which determines the shape of the tensor.
2. Create the tensor in GPU memory and acquire a pointer p to its data beforehand.
3. Read an image from the input stream.
4. Run the image through a CUDA kernel (distortions, aberrations and other linear transforms, so that the NN is hardware agnostic) and place the result at p to avoid unnecessary transfers between device and host.
5. Run the graph.
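
The steps above can be sketched in plain C++ with host memory standing in for GPU memory. This is only an illustration of the "allocate once, write in place, run" pattern, not TensorFlow or CUDA code: `PreallocatedTensor`, `Preprocess`, `RunGraph` and `ProcessFrame` are hypothetical names, the transform is a placeholder, and in the real workflow the storage would live on the device and `Preprocess` would be a CUDA kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a tensor whose storage is allocated once, up front.
// Assumption: in the real workflow this storage would be GPU memory.
class PreallocatedTensor {
 public:
  explicit PreallocatedTensor(std::size_t num_elements)
      : storage_(num_elements) {}
  std::uint8_t* data() { return storage_.data(); }
  const std::uint8_t* data() const { return storage_.data(); }
  std::size_t size() const { return storage_.size(); }

 private:
  std::vector<std::uint8_t> storage_;
};

// Stand-in for the preprocessing kernel: writes its result straight to dst.
inline void Preprocess(const std::vector<std::uint8_t>& frame,
                       std::uint8_t* dst) {
  for (std::size_t i = 0; i < frame.size(); ++i) {
    dst[i] = static_cast<std::uint8_t>(255 - frame[i]);  // placeholder transform
  }
}

// Stand-in for running the graph: consumes the tensor storage in place.
inline unsigned long RunGraph(const PreallocatedTensor& t) {
  unsigned long sum = 0;
  for (std::size_t i = 0; i < t.size(); ++i) sum += t.data()[i];
  return sum;
}

// One pass through the workflow for a single frame.
inline unsigned long ProcessFrame(PreallocatedTensor& tensor, std::uint8_t* p,
                                  const std::vector<std::uint8_t>& frame) {
  assert(frame.size() == tensor.size());
  Preprocess(frame, p);     // result lands at p, no intermediate copy
  return RunGraph(tensor);  // the graph input is the same storage
}
```

The pointer `p` is taken once, right after the tensor is created, and stays valid for every frame because the storage is never reallocated; each frame is written in place and consumed from the same buffer.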

> From what I understand of the way TensorFlow works, all input/output tensors are first allocated in host memory, but they can also have allocated GPU memory associated with them once they get used in sessions and whatnot, which TensorFlow manages. I stumbled on a nice thread about that at tensorflow/tensorflow#5902. It's not clear to me how any of this is supposed to help when we actually want to do everything manually though.

Yeah, I read that. With some help from @fierval, my C++ test program allocates the tensor as in #5902, comment #263944891 (also the GPUBFCAllocator code in my previous comment here), followed by some code from direct_session_test.cc.

Thanks for your suggestions. I'll try to post to the resources you mentioned.

Added headers from common_runtime/gpu/*

268c8f4

okdzhimiev changed the title Added headers from common_runtime/gpu/* [TensorFlow] Added headers from common_runtime/gpu/* Apr 3, 2020

saudet added enhancement help wanted labels Apr 5, 2020

saudet mentioned this pull request May 13, 2020

Improve Indexer to allow an hyper-rectangular selection bytedeco/javacpp#391

Closed

saudet force-pushed the master branch from 7bb3ed0 to a986737 Compare July 16, 2020 15:22

saudet force-pushed the master branch from af48a32 to 8d341bb Compare December 6, 2020 00:47
