Does anyone have any good methods for debugging kernels?

Sep 1, 2013 at 10:23 PM
I would like to step through a kernel I built while using Has anyone had any luck with kernel debugging?

I tried CodeXL, but it does not catch any breakpoints in my file.
Sep 4, 2013 at 2:23 AM
Edited Sep 9, 2013 at 2:48 AM
FYI, I received some help from AMD but have had no luck yet getting kernel breakpoints to work.

Re: Debug kernels (C# / .net)
From: Uri Shomroni Sep 2, 2013 6:00 AM

Hi sunsetquest,

There is a known issue with the way CodeXL performs interception of the OpenCL API with CLR (.NET) applications.

As a workaround:
  1. Navigate to the CodeXL folder (usually, C:\Program Files (x86)\AMD\CodeXL\ )
  2. Open the spies\ subdirectory
    2a. If the application you are debugging is a 64-bit application, open the spies64\ subdirectory instead.
  3. Copy OpenCL.dll and OpenCL.pdb to the folder where your debugged executable (.exe) resides.
    3a. If you also use and want to debug OpenGL code, you need to copy opengl32.dll and opengl32.pdb from the spies directory to the same path.
  4. Run the project as usual in CodeXL
  5. When you are done using CodeXL, remove the files you copied from the application folder, or you won't be able to run the application outside of CodeXL.
Hope this helps,
Uri Shomroni
Advanced Micro Devices
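The copy-and-clean-up workaround above can be scripted so the spy files are not forgotten in the application folder. This is only a minimal C# sketch: it assumes the default install path and the file names given in Uri's post (OpenCL.dll/OpenCL.pdb, optionally opengl32.dll/opengl32.pdb). The class and method names (SpyInstaller, Install, Remove) are hypothetical and are not part of CodeXL. Whether to use spies\ or spies64\ depends on the application being debugged, so the caller has to decide.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper automating the CodeXL "spies" workaround described above:
// copy the spy DLL/PDB files next to the debugged .exe, then remove them afterwards.
public static class SpyInstaller
{
    // Default location quoted in the post; adjust if CodeXL is installed elsewhere.
    public const string DefaultCodeXLFolder = @"C:\Program Files (x86)\AMD\CodeXL";

    // Steps 2/2a: spies\ for 32-bit targets, spies64\ for 64-bit targets.
    public static string GetSpiesFolder(string codeXLFolder, bool targetIs64Bit)
    {
        return Path.Combine(codeXLFolder, targetIs64Bit ? "spies64" : "spies");
    }

    // Steps 3/3a: the files to copy.
    public static IEnumerable<string> GetSpyFileNames(bool includeOpenGL)
    {
        yield return "OpenCL.dll";
        yield return "OpenCL.pdb";
        if (includeOpenGL)
        {
            yield return "opengl32.dll";
            yield return "opengl32.pdb";
        }
    }

    // Copies each spy file into the executable's folder and returns the copied paths.
    public static List<string> Install(string spiesFolder, string exeFolder, bool includeOpenGL)
    {
        var copied = new List<string>();
        foreach (string name in GetSpyFileNames(includeOpenGL))
        {
            string destination = Path.Combine(exeFolder, name);
            File.Copy(Path.Combine(spiesFolder, name), destination, overwrite: true);
            copied.Add(destination);
        }
        return copied;
    }

    // Step 5: delete the copied files so the application runs normally outside CodeXL.
    public static void Remove(IEnumerable<string> copiedFiles)
    {
        foreach (string path in copiedFiles)
        {
            if (File.Exists(path))
            {
                File.Delete(path);
            }
        }
    }
}
```

Keeping the list returned by Install means Remove deletes exactly what was copied and nothing else from the application folder.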

Re: Debug kernels (C# / .net)
From: Sunsetquest Sep 2, 2013 12:09 PM

Hi Uri, I tried your instructions and I think I'm closer. I copied the opencl.dll and opencl.pdb files into the folder where the .exe file is and ran the debugger, but it's still not hitting the breakpoints. I think it's not hitting breakpoints because the kernel file is loaded into a string and then handed off to the OpenCL wrapper.

Here 'source' is a string:

program = Cl.CreateProgramWithSource(_context, 1, new[] { source }, new[] { (IntPtr)source.Length }, out error);

so there is no way for the CodeXL debugger to really know that a file exists (this is just a guess of mine, though).

I also briefly tried to get the workaround to work with the 'Cloo' wrapper, but I could not get it to work there either. I did not spend much time with the Cloo wrapper, however.

Thank you for your help. I am definitely closer now.
Sep 9, 2013 at 2:32 AM
With Uri's help from AMD we got kernel debugging to work for

From: Uri Shomroni Sep 8, 2013 7:59 AM

Hi sunsetquest, When an OpenCL program is created via clCreateProgramWithSource, CodeXL should intercept that function call and compare the source string against available *.cl files (available = part of the project in Visual Studio, or present in the "Kernel sources" folder in standalone CodeXL). However, if the cl source does not match any of those files (or there are no such files), CodeXL still logs the creation of the program - some applications have OpenCL kernels embedded in the source as string constants, or even generate OpenCL code on-the-fly, and CodeXL supports those as well:

When CodeXL thinks the application "generates" OpenCL source code, it will create a temporary .cl source file and use that as the source for the kernel, showing it if you try to debug it.
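Since CodeXL compares the string passed to clCreateProgramWithSource against the *.cl files it knows about, one thing to try is keeping the kernel in its own .cl file and reading it verbatim, rather than building the string in code. This is a minimal C# sketch under that assumption; KernelSourceLoader is a hypothetical helper, not part of OpenCL.Net or CodeXL. Later in this thread the matching still did not work for the original poster, so treat this as an experiment, not a guaranteed fix.

```csharp
using System.IO;

// Hypothetical helper (not part of OpenCL.Net or CodeXL): reads an OpenCL C
// kernel from a .cl file so the string handed to clCreateProgramWithSource
// has the same text as the file CodeXL may try to match it against.
public static class KernelSourceLoader
{
    public static string Load(string clFilePath)
    {
        if (!File.Exists(clFilePath))
        {
            throw new FileNotFoundException("OpenCL kernel source not found.", clFilePath);
        }

        // File.ReadAllText keeps line endings and whitespace exactly as they are
        // in the file; only the text encoding (and any byte-order mark) is decoded.
        return File.ReadAllText(clFilePath);
    }
}

// Example call with OpenCL.Net, using the same call shape as the sample program in this thread:
//   string source = KernelSourceLoader.Load("doubleMe.cl");
//   Program program = Cl.CreateProgramWithSource(context, 1, new[] { source }, null, out err);
```

Passing null for the lengths array, as the sample program does, tells OpenCL to treat the source as a null-terminated string.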

CodeXL also has API-level debugging: it allows you to set breakpoints on OpenCL API functions, such as clCreateProgramWithSource or clEnqueueNDRangeKernel.

To debug an OpenCL kernel, you can also set an API breakpoint on clEnqueueNDRangeKernel (via CodeXL's "New CodeXL breakpoint..." dialog) - this should show the API call in your code if you debug with CodeXL. Once you get there, simply press "Step in" (F11) to go into kernel debugging. This will also take you to the aforementioned temporary kernel source file. The same is true for clEnqueueTask, if you happen to use that API.

I may have misread your original post, and the workaround I described might not be necessary. To check, please try the following:
  1. In CodeXL's breakpoints dialog, set breakpoints on "clGetPlatformIDs", "clCreateContext" and "clCreateContextFromType"
  2. Debug with CodeXL. One of the breakpoints, most likely clGetPlatformIDs, will be hit.
  3. Now try undoing the workaround as I described in my original post.
  4. Debug with CodeXL and the three breakpoints again. If none of the breakpoints are hit, the workaround was necessary, restore it (until you no longer need to use CodeXL). If the same breakpoint is hit again, the workaround was not needed and you can use CodeXL without it.
  5. Once you're done verifying, you can remove those three breakpoints, and proceed to add clEnqueueNDRangeKernel as a breakpoint as explained in this post.
I hope this helps,
Uri Shomroni
Advanced Micro Devices

From: Sunsetquest Sep 8, 2013 7:16 PM

Hi Uri, Thank you for all your help. After some playing around, it finally worked!!! The manual breakpoint and F11 you had me try fixed it. I can now debug C#/.NET OpenCL applications using CodeXL… this rocks!

In a nutshell, after adding a CodeXL breakpoint at clEnqueueNDRangeKernel and starting the debugger, I would get a new tab saying “No Source Available. No Symbols are loaded for any call stack frame.” When I clicked the different options, like Browse to Find Source or Show Disassembly, nothing happened. Then I noticed in the 'CodeXL Function Call history' that it appeared to be sitting at a breakpoint, so I pressed F11 a few times and it jumped to the auto-generated kernel in the temp folder you mentioned. From there, I could add breakpoints and debug as needed.

Some other notes…
• Copying the opencl.dll from the spies folder was not needed in my case.
• No matter where I put a “” file with matching OpenCL source, I could not get CodeXL to use it. I tried it as a project file, in a ‘Kernel sources’ folder, and in many other locations. This would be a nice feature in the future.

Thank you again, Uri, for your persistent help on this. Your support is A+++++. I was about to give up on this and just use printf for debugging my kernels, but this is 1000 times better. Thank you again.

I will post this thread in the OpenCL.Net user forums so it can hopefully help others.

Here is what I did to get debugging to work for C# / Visual Studio 2010. I am not sure if this is the best way, but it is the only way I had any luck with:
1. Start a new C# “Console Application”.
2. Go to the Project Properties and change the Target framework to “.NET Framework 4”.
3. Right-click on References, then choose ‘Add Reference’. Go to the Browse tab and navigate to then click okay.
4. Replace all the code with the code below.
5. Start CodeXL debugging (CodeXL -> Start CodeXL Debugging). It will fail with “Failed to launch debugged process”, but that’s okay; it will create some folders that are needed.
6. Copy the ‘OpenCL.Net.dll’ file to the newly generated “obj\x86\Debug” folder. You should see your .exe file in there.
7. Exit Visual Studio and reopen the project (this seems to help).
8. Go to CodeXL -> Breakpoints -> New CodeXL Breakpoint and then add clEnqueueNDRangeKernel and clSetKernelArg.
9. Restart CodeXL debugging (CodeXL -> Start CodeXL Debugging) and you will get to a “No Source Available” tab. Just press F11 a number of times and you should get to the auto-generated source. From there you will be able to step through your OpenCL kernel as well as set breakpoints.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using OpenCL.Net.Extensions;
using OpenCL.Net;

namespace OpenClDebugTest
{
    class TestProgram
    {
        static void Main()
        {
            const int count = 2048;

            // Let's create a random array of floats
            var random = new Random();
            float[] data = (from i in Enumerable.Range(0, count) select (float)random.NextDouble()).ToArray();

            // Create a compute device, create a context and a command queue
            Event event0; ErrorCode err;
            Platform[] platforms = Cl.GetPlatformIDs(out err);
            Device[] devices = Cl.GetDeviceIDs(platforms[0], DeviceType.Gpu, out err);
            Device device = devices[0]; //cl_device_id device;
            Context context = Cl.CreateContext(null, 1, devices, null, IntPtr.Zero, out err);
            CommandQueue cmdQueue = Cl.CreateCommandQueue(context, device, CommandQueueProperties.None, out err);

            // Create and build a program from our OpenCL-C source code
            string programSource = @"
            __kernel void doubleMe(__global float* input, __global float* output)
            {
                size_t i = get_global_id(0);
                output[i] = input[i] + input[i];
            }";
            Program program = Cl.CreateProgramWithSource(context, 1, new[] { programSource }, null, out err);
            Cl.BuildProgram(program, 0, null, string.Empty, null, IntPtr.Zero);  //"-cl-mad-enable"

            // Check for any compilation errors and print the build log if there are any
            if (Cl.GetProgramBuildInfo(program, device, ProgramBuildInfo.Status, out err).CastTo<BuildStatus>() != BuildStatus.Success)
            {
                if (err != ErrorCode.Success)
                    Console.WriteLine("ERROR: " + "Cl.GetProgramBuildInfo" + " (" + err.ToString() + ")");
                Console.WriteLine("Cl.GetProgramBuildInfo != Success");
                Console.WriteLine(Cl.GetProgramBuildInfo(program, device, ProgramBuildInfo.Log, out err));
            }

            // Create a kernel from our program
            Kernel kernel = Cl.CreateKernel(program, "doubleMe", out err);

            // Allocate the input buffer (filled with data below)
            Mem memInput = (Mem)Cl.CreateBuffer(context, MemFlags.ReadOnly, sizeof(float) * count, out err);

            // Create an output memory buffer for our results
            Mem memoutput = (Mem)Cl.CreateBuffer(context, MemFlags.WriteOnly, sizeof(float) * count, out err);

            // Copy our host buffer of random values to the input device buffer
            Cl.EnqueueWriteBuffer(cmdQueue, (IMem)memInput, Bool.True, IntPtr.Zero, new IntPtr(sizeof(float) * count), data, 0, null, out event0);

            // Get the maximum work-group size supported for this kernel on this device.
            // The value is a size_t, so its size is IntPtr.Size (4 on x86, 8 on x64).
            IntPtr notused;
            InfoBuffer local = new InfoBuffer(new IntPtr(IntPtr.Size));
            Cl.GetKernelWorkGroupInfo(kernel, device, KernelWorkGroupInfo.WorkGroupSize, new IntPtr(IntPtr.Size), local, out notused);

            // Set the arguments to our kernel, and enqueue it for execution.
            // A cl_mem handle is pointer-sized; doubleMe takes only two arguments,
            // so there is no argument index 2 to set.
            Cl.SetKernelArg(kernel, 0, new IntPtr(IntPtr.Size), memInput);
            Cl.SetKernelArg(kernel, 1, new IntPtr(IntPtr.Size), memoutput);
            IntPtr[] workGroupSizePtr = new IntPtr[] { new IntPtr(count) };
            Cl.EnqueueNDRangeKernel(cmdQueue, kernel, 1, null, workGroupSizePtr, null, 0, null, out event0);

            // Force the command queue to get processed, wait until all commands are complete
            Cl.Finish(cmdQueue);

            // Read back the results
            float[] results = new float[count];
            Cl.EnqueueReadBuffer(cmdQueue, (IMem)memoutput, Bool.True, IntPtr.Zero, new IntPtr(count * sizeof(float)), results, 0, null, out event0);

            // Validate our results
            int correct = 0;
            for (int i = 0; i < count; i++)
                correct += (results[i] == data[i] + data[i]) ? 1 : 0;

            // Print a brief summary detailing the results
            Console.WriteLine("Computed {0} of {1} correct values!", correct.ToString(), count.ToString());
        }
    }
    //   - examples from (Derek Gerstmann)
    //   - examples from (Ananth Balasubramaniam)
}
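When stepping through doubleMe in CodeXL, it helps to know which work-item produced a wrong value, so you know which get_global_id(0) to look at. Here is a minimal C# sketch of a CPU-side check, assuming you pass in the `data` and `results` arrays from the sample above; DoubleMeCheck and its methods are hypothetical names, not part of OpenCL.Net.

```csharp
using System;

// Hypothetical CPU reference for the doubleMe kernel in the sample program.
public static class DoubleMeCheck
{
    // Same arithmetic as the kernel body: output[i] = input[i] + input[i];
    public static float[] Reference(float[] input)
    {
        var expected = new float[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            expected[i] = input[i] + input[i];
        }
        return expected;
    }

    // Index of the first element whose device result differs from the reference,
    // or -1 if every element matches. Exact comparison is reasonable here because
    // doubling a float is exact in IEEE arithmetic (barring overflow, or a device
    // that flushes denormals to zero).
    public static int FirstMismatch(float[] input, float[] deviceResults)
    {
        if (input.Length != deviceResults.Length)
        {
            throw new ArgumentException("input and deviceResults must have the same length.");
        }

        float[] expected = Reference(input);
        for (int i = 0; i < expected.Length; i++)
        {
            if (deviceResults[i] != expected[i])
            {
                return i;
            }
        }
        return -1;
    }

    // Number of matching elements, like the "correct" counter in the sample.
    public static int CountCorrect(float[] input, float[] deviceResults)
    {
        if (input.Length != deviceResults.Length)
        {
            throw new ArgumentException("input and deviceResults must have the same length.");
        }

        float[] expected = Reference(input);
        int correct = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            if (deviceResults[i] == expected[i])
            {
                correct++;
            }
        }
        return correct;
    }
}
```

For example, calling `DoubleMeCheck.FirstMismatch(data, results)` after the EnqueueReadBuffer call gives the global ID worth inspecting on the next debugging run.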