Probably the most amazing thing about OpenCL is its heterogeneous nature. An OpenCL kernel can run on just about any compute device in your computer: the CPU, the GPU or even an FPGA, and it can all be orchestrated from the host with ease.
As you may be aware, 3rd generation Intel Core (and later) processors include an integrated graphics component. In the HD 4000 and later chips this compute power is not to be sniffed at and is certainly worth exploiting, but it's not entirely clear how you access it. If, like me, you have a discrete graphics card, you may be wondering, as I did, why the Intel GPU is not accessible.
Here’s what to do.
Boot your computer into the BIOS settings and look for a section entitled something like “System Agent”. Under this menu:
- “Initiate Graphic Adapter” – set this to PCIe/PCI
- “iGPU Multi-Monitor” – set this to Enabled
Save your settings and re-boot.
Now visit the Intel website, download the appropriate graphics driver for your CPU, install it and re-boot once more. When you open your device panel you should now see the integrated Intel graphics device listed alongside your discrete card.
We’re ready to start programming.
Next you are going to need an OpenCL SDK so that you have the headers and libraries you need to build an OpenCL program (the drivers already supply the run-time). It doesn't really matter whose you use; in my case I downloaded the Nvidia tools, which are part of the CUDA SDK. At the time of writing they are available from Nvidia's CUDA download page, though the location may move at a later date.
Once installed, you will need to set up your project to access the SDK. In Visual Studio 2013 (2012 is the same), open the Property Manager tab and expand your build target, in my case “Debug | x64”, then double-click “Microsoft.Cpp.x64.user”. Be aware that this is a per-user property sheet shared by every project you build for that platform, so if you want the settings confined to this project, edit the project's own properties instead. With the property dialog open, select “VC++ Directories” and enter:
- Include Directories – $(CUDA_PATH)\include;$(IncludePath)
- Library Directories – $(CUDA_PATH)\lib\x64;$(LibraryPath)
The CUDA installer has conveniently created an environment variable called CUDA_PATH to make this nice and clean.
Now go to the “Linker” then “General” section and update:
- Additional Library Directories – $(CUDA_LIB_PATH);%(AdditionalLibraryDirectories)
Then go to “Linker”, “Input” and update:
- Additional Dependencies – OpenCL.lib;%(AdditionalDependencies)
Hit OK and we’re ready to go.
This is a little program to look for compute devices on your system and print out their capabilities:
This gives us output like this:
    Number of OpenCL platforms found: 2

    CL_PLATFORM_PROFILE : FULL_PROFILE
    CL_PLATFORM_VERSION : OpenCL 1.1 CUDA 6.0.1
    CL_PLATFORM_NAME : NVIDIA CUDA
    CL_PLATFORM_VENDOR : NVIDIA Corporation
    CL_PLATFORM_EXTENSIONS : cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_n3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_devicv_pragma_unroll
    Number of detected OpenCL devices: 2

    GPU detected
    Device name is GeForce GTX 680
    Device vendor is NVIDIA Corporation
    VENDOR ID: 0x10de
    Device max memory allocation: 512 mega-bytes
    Device global cacheline size: 128 bytes
    Device global mem: 2048 mega-bytes
    Maximum number of parallel compute units: 8
    Maximum dimensions for global/local work-item IDs: 3
    Maximum number of work-items in each dimension: ( 1024 1024 64 )
    Maximum number of work-items in a work-group: 1024

    GPU detected
    Device name is GeForce GTX 680
    Device vendor is NVIDIA Corporation
    VENDOR ID: 0x10de
    Device max memory allocation: 512 mega-bytes
    Device global cacheline size: 128 bytes
    Device global mem: 2048 mega-bytes
    Maximum number of parallel compute units: 8
    Maximum dimensions for global/local work-item IDs: 3
    Maximum number of work-items in each dimension: ( 1024 1024 64 )
    Maximum number of work-items in a work-group: 1024

    CL_PLATFORM_PROFILE : FULL_PROFILE
    CL_PLATFORM_VERSION : OpenCL 1.2
    CL_PLATFORM_NAME : Intel(R) OpenCL
    CL_PLATFORM_VENDOR : Intel(R) Corporation
    CL_PLATFORM_EXTENSIONS : cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_ntel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sharing cl_intl_khr_dx9_media_sharing cl_khr_d3d11_sharing
    Number of detected OpenCL devices: 1

    CPU detected
    Device name is Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
    Device vendor is Intel(R) Corporation
    VENDOR ID: 0x8086
    Device max memory allocation: 8159 mega-bytes
    Device global cacheline size: 64 bytes
    Device global mem: 32639 mega-bytes
    Maximum number of parallel compute units: 8
    Maximum dimensions for global/local work-item IDs: 3
    Maximum number of work-items in each dimension: ( 1024 1024 1024 )
    Maximum number of work-items in a work-group: 1024
Lovely.
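One detail worth reading out of those numbers: on every device the maximum single allocation is a quarter of global memory (512 of 2048 MB on each GTX 680, and 8159 of 32639 MB on the i7, allowing for rounding to whole megabytes). The OpenCL 1.x specification sets the minimum value of `CL_DEVICE_MAX_MEM_ALLOC_SIZE` at max(1/4 of `CL_DEVICE_GLOBAL_MEM_SIZE`, 128 MB) (OpenCL 1.2 exempts custom devices), so these drivers appear to report exactly that minimum, and no single buffer on them can be larger even though total global memory is four times bigger. A small sketch of the rule, with a hypothetical function name:

```c
#include <stdint.h>

/* Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE the OpenCL 1.x spec allows a
   (non-custom) device to report: max(global memory / 4, 128 MiB).
   Argument and result are both in bytes. */
uint64_t min_max_alloc_bytes(uint64_t global_mem_bytes)
{
    const uint64_t floor_bytes = 128ULL * 1024 * 1024;
    const uint64_t quarter = global_mem_bytes / 4;
    return quarter > floor_bytes ? quarter : floor_bytes;
}
```

For the GTX 680, `min_max_alloc_bytes(2048ULL * 1024 * 1024)` gives 512 MiB, matching the 512 mega-bytes reported above; a device with only 256 MiB of global memory would still have to allow 128 MiB allocations.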
The post Getting Started with OpenCL appeared first on Oakdale Software Ltd.