{{noteTA
|G1 = IT
}}
{{distinguish|OpenGL}}
{{Infobox software
| name = OpenCL API
| title = OpenCL API
| logo = OpenCL logo.svg
| logo caption = 
| logo size = 200px
| logo alt = OpenCL logo
| screenshot = <!-- Image name is enough. -->
| caption = 
| screenshot size = 
| screenshot alt = 
| collapsible = 
| author = [[苹果公司|Apple Inc.]]
| developer = [[科纳斯组织|Khronos Group]]
| released = {{Start date and age|2009|08|28}}
| discontinued = 
| latest release version = {{wikidata|property|preferred|references|edit|P348|P548=Q2804309}}
| latest release date = {{wikidata|qualifier|preferred|single|P348|P548=Q2804309|P577}}
| latest preview version = 
| latest preview date = 
| programming language = [[C语言|C]], with [[C++]] bindings
| operating system = [[Android]] (vendor dependent)<ref>{{cite web |title=Android Devices With OpenCL support |url=https://docs.google.com/a/arrayfire.com/spreadsheets/d/1Mpzfl2NmLUVSAjIph77-FOsJeuyD9Xjha89r5iHw1hI/edit?pli=1#gid=0 |website=Google Docs |publisher=ArrayFire |access-date=April 28, 2015 |archive-date=2021-02-25 |archive-url=https://web.archive.org/web/20210225193922/https://docs.google.com/a/arrayfire.com/spreadsheets/d/1Mpzfl2NmLUVSAjIph77-FOsJeuyD9Xjha89r5iHw1hI/edit?pli=1#gid=0 |dead-url=no }}</ref>, [[FreeBSD]]<ref>{{cite web |title=FreeBSD Graphics/OpenCL |url=https://wiki.freebsd.org/Graphics/OpenCL |publisher=FreeBSD |access-date=December 23, 2015 |archive-date=2021-02-08 |archive-url=https://web.archive.org/web/20210208150957/https://wiki.freebsd.org/Graphics/OpenCL |dead-url=no }}</ref>, [[Linux]], [[macOS]] (via PoCL), [[Windows]]
| platform = [[ARMv7]], [[ARMv8]]<ref name="conformant-products">{{cite web |title=Conformant Products |url=https://www.khronos.org/conformance/adopters/conformant-products/opencl |publisher=Khronos Group |access-date=May 9, 2015 |archive-date=2018-06-29 |archive-url=https://web.archive.org/web/20180629130825/https://www.khronos.org/conformance/adopters/conformant-products/opencl |dead-url=no }}</ref>, [[Cell (微處理器)|Cell]], [[IA-32]], [[IBM POWER微处理器|Power]], [[x86-64]]
| size = 
| language = 
| language count = <!-- Number only -->
| language footnote = 
| genre = [[异构计算|Heterogeneous computing]] [[API]]
| license = OpenCL specification license
| alexa = 
| website = {{URL|https://www.khronos.org/opencl/}}
| standard = 
}}
{{Infobox programming language
| name = OpenCL C and C++ for OpenCL
| logo = 
| logo caption = 
| paradigm = [[指令式编程|Imperative]] ([[过程式编程|procedural]]), [[结构化编程|structured]], (C++ only) [[面向对象编程|object-oriented]], [[泛型编程|generic]]
| family = {{le|C家族编程语言列表|List of C-family programming languages|C}}
| designer = 
| developer = 
| released = 
| latest release version = 
| latest release date = 
| latest preview version = 
| latest preview date = 
| typing = [[类型系统|Static]], [[弱类型|weak]], {{le|明示类型|Manifest typing|manifest}}, [[名義型別系統|nominal]]
| scope = 
| programming language = Implementation specific
| platform = 
| operating_system = 
| license = 
| file_ext = .cl .clcpp
| file format = 
| website = 
| implementations = [[AMD]], [[Mesa 3D|Gallium]] Compute, [[IBM]], [[Intel]] NEO, Intel SDK, [[Texas Instruments]], [[Nvidia]], PoCL, [[ARM架構|ARM]]
| dialects = 
| influenced_by = [[C99]], [[CUDA]], [[C++14]], [[C++17]]
| influenced = 
}}
'''OpenCL''' ('''Open Computing Language'''; Chinese: 开放计算语言) is a [[軟體框架|framework]] for writing programs that run on [[异构计算|heterogeneous platforms]], which may consist of [[CPU]]s, [[GPU]]s, [[數位訊號處理器|DSPs]], [[FPGA]]s, and other types of processors and hardware accelerators. OpenCL consists of a language (based on [[C99]]) for writing kernels, which are functions that run on OpenCL devices, and a set of APIs for defining and controlling the platform. OpenCL provides [[并行计算|parallel computing]] based on both task parallelism and data parallelism.

OpenCL is analogous to two other open industry standards, [[OpenGL]] and [[OpenAL]], which address 3D graphics and computer audio respectively. OpenCL extends the use of GPUs beyond graphics rendering. It is managed by the non-profit technology consortium [[Khronos Group]].

== History ==
OpenCL was initially developed by [[苹果公司|Apple Inc.]], which holds the trademark rights, and was refined into an initial proposal in collaboration with technical teams at [[AMD]], [[IBM]], [[Intel]], and [[NVIDIA]]. Apple then submitted this draft to the [[Khronos Group]]. On June 16, 2008, the Khronos Compute Working Group was formed.<ref>{{cite press release |url=http://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/ |title=Khronos Launches Heterogeneous Computing Initiative |accessdate=2008-06-18 |publisher=Khronos Group |date=2008-06-16 |deadurl=yes |archiveurl=https://web.archive.org/web/20080620123431/http://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/ |archivedate=2008-06-20 }}</ref> Five months later, on November 18, 2008, the working group completed the technical details of the OpenCL 1.0 specification.<ref name=macWorld>{{cite web | url=http://www.macworld.com/article/136921/2008/11/opencl.html?lsrc=top_2 | title=OpenCL gets touted in Texas | publisher=MacWorld | date=2008-11-20 | accessdate=2009-06-12 | archive-date=2009-02-18 | archive-url=https://web.archive.org/web/20090218165557/http://www.macworld.com/article/136921/2008/11/opencl.html?lsrc=top_2 | dead-url=no }}</ref> After review by Khronos members, the specification was released to the public on December 8, 2008.<ref name=khronosGroup>{{cite press release | url=http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/ | title=The Khronos Group Releases OpenCL 1.0 Specification | publisher=Khronos Group | date=2008-12-08 | accessdate=2009-06-12 | deadurl=yes | archiveurl=https://web.archive.org/web/20100713014204/http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/ | archivedate=2010-07-13 }}</ref>

OpenCL 1.1 was released on June 14, 2010.<ref>{{cite press release | url=http://www.khronos.org/news/press/releases/khronos-group-releases-opencl-1-1-parallel-computing-standard/ | title=Khronos Drives Momentum of Parallel Computing Standard with Release of OpenCL 1.1 Specification | publisher=Khronos Group | date=2010-06-14 | accessdate=2010-10-13 | deadurl=yes | archiveurl=https://web.archive.org/web/20100923101844/http://www.khronos.org/news/press/releases/khronos-group-releases-opencl-1-1-parallel-computing-standard | archivedate=2010-09-23 }}</ref> OpenCL 1.2 was released on November 15, 2011.<ref>{{cite web |url=https://www.khronos.org/news/press/khronos-releases-opencl-1.2-specification |title=Khronos Releases OpenCL 1.2 Specification |publisher=Khronos Group |date=November 15, 2011 |access-date=June 23, 2015 |archive-date=2020-12-02 |archive-url=https://web.archive.org/web/20201202103240/https://www.khronos.org/news/press/khronos-releases-opencl-1.2-specification |dead-url=no }}</ref> OpenCL 2.0 was released on November 18, 2013.<ref>{{cite web |url=https://www.khronos.org/news/press/khronos-finalizes-opencl-2.0-specification-for-heterogeneous-computing |title=Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing |date=November 18, 2013 |access-date=February 10, 2014 |publisher=Khronos Group |archive-date=2020-11-11 |archive-url=https://web.archive.org/web/20201111201527/https://www.khronos.org/news/press/khronos-finalizes-opencl-2.0-specification-for-heterogeneous-computing |dead-url=no }}</ref> OpenCL 2.1 was released on November 16, 2015.<ref>{{cite web |title=Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming |url=https://www.khronos.org/news/press/khronos-releases-opencl-2.1-and-spir-v-1.0-specifications-for-heterogeneous |publisher=Khronos Group |access-date=November 16, 2015 |date=November 16, 2015 |archive-date=2020-11-08 |archive-url=https://web.archive.org/web/20201108122543/https://www.khronos.org/news/press/khronos-releases-opencl-2.1-and-spir-v-1.0-specifications-for-heterogeneous |dead-url=no }}</ref> OpenCL 2.2 was released on May 16, 2017.<ref>{{cite web |url=https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2 |title=Khronos Releases OpenCL 2.2 With SPIR-V 1.2 |date=May 16, 2017 |publisher=[[Khronos Group]] |access-date=2024-10-13 |archive-date=2021-01-18 |archive-url=https://web.archive.org/web/20210118110308/https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2 |dead-url=no }}</ref>

=== Roadmap ===
When OpenCL 2.2 was released in May 2017, the Khronos Group announced that OpenCL would converge with [[Vulkan]] where possible, so that OpenCL software could be deployed flexibly over both APIs.<ref>{{Cite web|url=https://www.pcper.com/reviews/General-Tech/Breaking-OpenCL-Merging-Roadmap-Vulkan|title=Breaking: OpenCL Merging Roadmap into Vulkan {{!}} PC Perspective|website=www.pcper.com|access-date=May 17, 2017|archive-url=https://web.archive.org/web/20171101062642/https://www.pcper.com/reviews/General-Tech/Breaking-OpenCL-Merging-Roadmap-Vulkan|archive-date=November 1, 2017|url-status=dead}}</ref> This has been demonstrated by Adobe's Premiere Rush, which uses the open-source clspv compiler<ref name=":4">{{Citation|title=Clspv is a compiler for OpenCL C to Vulkan compute shaders|date=2019-08-17|url=https://github.com/google/clspv|access-date=2024-10-14|archive-date=2020-11-28|archive-url=https://web.archive.org/web/20201128054700/https://github.com/google/clspv|dead-url=no}}</ref> to compile a large amount of OpenCL C kernel code so that it runs on a Vulkan runtime deployed on Android.<ref>{{Cite web|url=https://www.khronos.org/assets/uploads/developers/library/2019-siggraph/Vulkan-01-Update-SIGGRAPH-Jul19.pdf|title=Vulkan Update SIGGRAPH 2019|access-date=2024-10-13|archive-date=2019-08-20|archive-url=https://web.archive.org/web/20190820222456/https://www.khronos.org/assets/uploads/developers/library/2019-siggraph/Vulkan-01-Update-SIGGRAPH-Jul19.pdf|dead-url=no}}</ref>

OpenCL also has a forward-looking roadmap that is independent of Vulkan,<ref>{{Cite web|url=https://www.phoronix.com/scan.php?page=article&item=siggraph-2018-khr&num=2|title=SIGGRAPH 2018: OpenCL-Next Taking Shape, Vulkan Continues Evolving – Phoronix|website=www.phoronix.com|access-date=2024-10-13|archive-date=2020-10-28|archive-url=https://web.archive.org/web/20201028152042/https://www.phoronix.com/scan.php?page=article&item=siggraph-2018-khr&num=2|dead-url=no}}</ref> namely "OpenCL Next", which was once intended for release in 2020.<ref>{{Cite web|url=https://www.khronos.org/assets/uploads/developers/library/2019-embedded-vision-summit/1b%20Khronos-and-OpenCL-Overview-EVS-Workshop_May19.pdf|title=Khronos and OpenCL Overview EVS Workshop May19|last=Trevett|first=Neil|date=May 23, 2019|website=Khronos Group|access-date=2024-10-13|archive-date=2020-07-17|archive-url=https://web.archive.org/web/20200717035519/https://www.khronos.org/assets/uploads/developers/library/2019-embedded-vision-summit/1b%20Khronos-and-OpenCL-Overview-EVS-Workshop_May19.pdf|dead-url=no}}</ref> It may integrate extensions such as Vulkan/OpenCL interop, scratch-pad memory management, extended subgroups, SPIR-V 1.4 ingestion, and SPIR-V extended debug info. OpenCL is also considering a Vulkan-like loader and layers, as well as a "flexible profile", for flexible deployment across multiple accelerator types.

=== OpenCL 3.0 ===
The final OpenCL 3.0 specification was released on September 30, 2020.<ref>{{Cite web|url = https://www.khronos.org/blog/opencl-3.0-specification-finalized-and-initial-khronos-open-source-opencl-sdk-released|title = OpenCL 3.0 Specification Finalized and Initial Khronos Open Source OpenCL SDK Released|date = September 30, 2020|access-date = 2024-10-14|archive-date = 2020-09-30|archive-url = https://web.archive.org/web/20200930143119/https://www.khronos.org/blog/opencl-3.0-specification-finalized-and-initial-khronos-open-source-opencl-sdk-released|dead-url = no}}</ref> OpenCL 1.2 functionality became the mandatory baseline, while all OpenCL 2.x and OpenCL 3.0 features became optional.<ref>{{Cite web|url=https://www.phoronix.com/review/opencl-30-spec|title=OpenCL 3.0 Bringing Greater Flexibility, Async DMA Extensions|website=www.phoronix.com|access-date=2024-10-13|archive-date=2024-05-06|archive-url=https://web.archive.org/web/20240506194730/https://www.phoronix.com/review/opencl-30-spec|dead-url=no}}</ref> The specification retains the "OpenCL C" language<ref>{{cite web |last1=Munshi |first1=Aaftab |last2=Howes |first2=Lee |last3=Sochacki |first3=Bartosz |title=The OpenCL C Specification Version: 3.0 Document Revision: V3.0.7 |url=https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_C.pdf |publisher=Khronos OpenCL Working Group |access-date=Apr 28, 2021 |date=Apr 27, 2020 |archive-date=September 20, 2020 |archive-url=https://web.archive.org/web/20200920173143/https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_C.pdf |url-status=dead }}</ref> and deprecates the "OpenCL C++" kernel language introduced in version 2.1,<ref>{{cite web|last1=Sochacki|first1=Bartosz|title=The OpenCL C++ 1.0 Specification|url=https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.pdf|publisher=Khronos OpenCL Working Group|access-date=Jul 19, 2019|date=Jul 19, 2019|archive-date=2021-03-06|archive-url=https://web.archive.org/web/20210306212145/https://www.khronos.org/registry/OpenCL/specs/2.2/pdf/OpenCL_Cxx.pdf|dead-url=no}}</ref> replacing it with the "C++ for OpenCL" language.<ref name=":0">{{Cite web|title=C++ for OpenCL, OpenCL-Guide|url=https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/cpp_for_opencl.md|access-date=2021-04-18|website=GitHub|language=en|archive-date=2025-01-08|archive-url=https://web.archive.org/web/20250108213646/https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/cpp_for_opencl.md|dead-url=no}}</ref> C++ for OpenCL is based on the [[Clang]]/[[LLVM]] compiler and implements a subset of [[C++17]] and {{le|标准可移植中间表示|Standard Portable Intermediate Representation|SPIR-V}} intermediate code. The official documentation for C++ for OpenCL version 1.0 was published in December 2020;<ref>{{Cite web|date=December 2020|title=Release of Documentation of C++ for OpenCL kernel language, version 1.0, revision 1 · KhronosGroup/OpenCL-Docs|url=https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r1|access-date=2021-04-18|website=GitHub|language=en|archive-date=2024-12-16|archive-url=https://web.archive.org/web/20241216064559/https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r1|dead-url=no}}</ref> this version is backward compatible with OpenCL C 2.0.

OpenCL 3.0.7, announced at {{le|IWOCL}} 21, introduced a new version of C++ for OpenCL and several Khronos OpenCL extensions.<ref>{{Cite web |last=Trevett |first=Neil |date=2021 |title=State of the Union: OpenCL Working Group |url=https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf |page=9 |access-date=2024-10-13 |archive-date=2024-07-16 |archive-url=https://web.archive.org/web/20240716180733/https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf |dead-url=no }}</ref> In December 2021, C++ for OpenCL version 2021 was released<ref>{{cite web |title=The C++ for OpenCL 1.0 and 2021 Programming Language Documentation 
|url=https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-docrev2021.12 |publisher=Khronos OpenCL Working Group |access-date=Dec 2, 2022 |date=Dec 20, 2021 |archive-date=2025-01-26 |archive-url=https://web.archive.org/web/20250126162657/https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-docrev2021.12 |dead-url=no }}</ref>; it is fully compatible with the OpenCL 3.0 standard. NVIDIA has worked closely with the Khronos OpenCL working group to improve [[Vulkan]] interoperability through semaphores and memory sharing.<ref>{{Cite web|url=https://developer.nvidia.com/blog/using-semaphore-and-memory-sharing-extensions-for-vulkan-interop-with-opencl/|title=Using Semaphore and Memory Sharing Extensions for Vulkan Interop with NVIDIA OpenCL|date=February 24, 2022|access-date=2024-10-13|archive-date=2022-07-06|archive-url=https://web.archive.org/web/20220706061023/https://developer.nvidia.com/blog/using-semaphore-and-memory-sharing-extensions-for-vulkan-interop-with-opencl/|dead-url=no}}</ref> The minor update 3.0.14 brought bug fixes and a new extension for multiple devices.<ref>{{cite web | url=https://www.phoronix.com/news/OpenCL-3.0.14 | title=OpenCL 3.0.14 Released with New Extension for Command Buffer Multi-Device | access-date=2024-10-13 | archive-date=2024-10-08 | archive-url=https://web.archive.org/web/20241008120142/https://www.phoronix.com/news/OpenCL-3.0.14 | dead-url=no }}</ref>

== Examples ==
=== Fast Fourier transform ===
Host code that sets up and launches a [[快速傅立葉變換|fast Fourier transform]] (FFT) kernel:<ref name=siggraph>{{cite web | url=http://s08.idav.ucdavis.edu/munshi-opencl.pdf | title=OpenCL | accessdate=2008-08-14 | publisher=SIGGRAPH2008 | date=2008-08-14 | archive-url=https://www.webcitation.org/66GmScoh5?url=http://s08.idav.ucdavis.edu/munshi-opencl.pdf | archive-date=2012-03-19 | dead-url=yes }}</ref>
<syntaxhighlight lang="c">
// create a compute context with GPU device
context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

// create a command queue on the first device in the context
// (a valid device is required; devices[] is assumed to be a cl_device_id
// array large enough to hold every device in the context)
clGetContextInfo(context, CL_CONTEXT_DEVICES, sizeof(devices), devices, NULL);
queue = clCreateCommandQueue(context, devices[0], 0, NULL);

// allocate the buffer memory objects
memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                            sizeof(float)*2*num_entries, srcA, NULL);
memobjs[1] = clCreateBuffer(context, CL_MEM_READ_WRITE,
                            sizeof(float)*2*num_entries, NULL, NULL);

// create the compute program
program = clCreateProgramWithSource(context, 1, &fft1D_1024_kernel_src, NULL, NULL);

// build the compute program executable
clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

// create the compute kernel
kernel = clCreateKernel(program, "fft1D_1024", NULL);

// choose the work-item dimensions first, because the local-memory
// argument sizes below depend on local_work_size[0]
global_work_size[0] = num_entries;
local_work_size[0] = 64;

// set the args values
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
clSetKernelArg(kernel, 2, sizeof(float)*(local_work_size[0]+1)*16, NULL);
clSetKernelArg(kernel, 3, sizeof(float)*(local_work_size[0]+1)*16, NULL);

// execute the kernel over the 1-D range
clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                       global_work_size, local_work_size, 0, NULL, NULL);
</syntaxhighlight>

The kernel that performs the actual computation (based on ''Fitting FFT onto the G80 Architecture''):<ref name=VolkovKazianFFTG80>{{cite web | url=http://www.cs.berkeley.edu/~kubitron/courses/cs258-S08/projects/reports/project6_report.pdf | title=Fitting FFT onto G80 Architecture | accessdate=2008-11-14 | publisher=Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report | format=PDF | date=May 2008 | archive-date=2012-03-19 | archive-url=https://www.webcitation.org/66GmTA1HM?url=http://www.cs.berkeley.edu/~kubitron/courses/cs258-S08/projects/reports/project6_report.pdf | dead-url=no }}</ref>
<syntaxhighlight lang="c">
// This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into
// calls to a radix 16 function, another radix 16 function and then a radix 4 function
__kernel void fft1D_1024(__global float2 *in, __global float2 *out,
                         __local float *sMemx, __local float *sMemy)
{
    int tid = get_local_id(0);
    int blockIdx = get_group_id(0) * 1024 + tid;
    float2 data[16];

    // starting index of data to/from global memory
    in = in + blockIdx;
    out = out + blockIdx;

    globalLoads(data, in, 64);            // coalesced global reads
    fftRadix16Pass(data);                 // in-place radix-16 pass
    twiddleFactorMul(data, tid, 1024, 0);

    // local shuffle using local memory
    localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4)));
    fftRadix16Pass(data);                 // in-place radix-16 pass
    twiddleFactorMul(data, tid, 64, 4);   // twiddle factor multiplication

    localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));

    // four radix-4 function calls
    fftRadix4Pass(data);                  // radix-4 function number 1
    fftRadix4Pass(data + 4);              // radix-4 function number 2
    fftRadix4Pass(data + 8);              // radix-4 function number 3
    fftRadix4Pass(data + 12);             // radix-4 function number 4

    // coalesced global writes
    globalStores(data, out, 64);
}
</syntaxhighlight>

An FFT example can also be found on Apple's website.<ref name=AppleOpenCLFFT>{{cite web | url=https://developer.apple.com/mac/library/samplecode/OpenCL_FFT/index.html | title=OpenCL on FFT | accessdate=2009-12-07 | publisher=Apple | date=16 Nov 2009 | archive-date=2009-11-30 | archive-url=https://web.archive.org/web/20091130085543/http://developer.apple.com/mac/library/samplecode/OpenCL_FFT/index.html | dead-url=no }}</ref>

=== Parallel merge sort ===
The following example uses Python 3.x with PyOpenCL and NumPy:
<div style="height:400px; overflow-y: scroll;">
<syntaxhighlight lang="python3">
import io
import random
import numpy as np
import pyopencl as cl

def dump_step(data, chunk_size):
    """Print one step of the sorting process."""
    msg = io.StringIO('')
    div = io.StringIO('')
    for idx, item in enumerate(data):
        if idx % chunk_size == 0:
            # First element of a chunk: separate it from the previous chunk
            if idx > 0:
                msg.write(' ||')
                div.write('   ')
            div.write(' --')
        else:
            msg.write('   ')
            div.write('------')
        msg.write(' {:2d}'.format(item))
    out = msg.getvalue()
    if chunk_size == 1:
        print(' ' + '-' * (len(out) - 1))
    print(out)
    print(div.getvalue())
    msg.close()
    div.close()

def cl_merge_sort_sbs(data_in):
    """Parallel merge sort that prints every merge round."""
    # OpenCL kernel source
    CL_CODE = '''
    kernel void merge(int chunk_size, int size, global long* data, global long* buff) {
        // Index of the chunk handled by this work-item
        const int gid = get_global_id(0);

        // Range of elements this work-item is responsible for
        const int offset = gid * chunk_size;
        const int real_size = min(offset + chunk_size, size) - offset;
        global long* data_part = data + offset;
        global long* buff_part = buff + offset;

        // Initial state before merging
        int r_beg = chunk_size >> 1;
        int b_ptr = 0;
        int l_ptr = 0;
        int r_ptr = r_beg;

        // Merge the two sorted halves
        while (b_ptr < real_size) {
            if (r_ptr >= real_size) {
                // Right half is exhausted: take from the left half
                buff_part[b_ptr] = data_part[l_ptr++];
            } else if (l_ptr == r_beg) {
                // Left half is exhausted: take from the right half
                buff_part[b_ptr] = data_part[r_ptr++];
            } else {
                // Both halves have data: take the smaller element
                if (data_part[l_ptr] < data_part[r_ptr]) {
                    buff_part[b_ptr] = data_part[l_ptr++];
                } else {
                    buff_part[b_ptr] = data_part[r_ptr++];
                }
            }
            b_ptr++;
        }
    }
    '''

    # Set up the compute resources and build the OpenCL program
    ctx = cl.Context(dev_type=cl.device_type.GPU)
    prg = cl.Program(ctx, CL_CODE).build()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # Convert the input to NumPy arrays so it can back OpenCL buffers
    data_np = np.array(data_in, dtype=np.int64)
    buff_np = np.empty_like(data_np)

    # Create the buffers and copy the host values into them
    data = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=data_np)
    buff = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=buff_np)

    # Initial state before merging
    data_len = np.int32(len(data_np))
    chunk_size = np.int32(1)
    dump_step(data_np, chunk_size)

    while chunk_size < data_len:
        # Double the chunk size each round; keep it 32-bit to match the kernel's int
        chunk_size = np.int32(chunk_size << 1)
        # Number of chunks to merge in parallel
        group_size = int((data_len - 1) // chunk_size) + 1
        # Merge every chunk, one work-item per chunk
        prg.merge(queue, (group_size,), (1,), chunk_size, data_len, data, buff)
        # The merged result becomes the input of the next round
        data, buff = buff, data
        # Show the state after this round
        cl.enqueue_copy(queue, data_np, data)
        dump_step(data_np, chunk_size)

    queue.finish()
    data.release()
    buff.release()

def main():
    n = random.randint(5, 16)
    data = [random.randint(1, 99) for _ in range(n)]
    cl_merge_sort_sbs(data)

if __name__ == '__main__':
    main()
</syntaxhighlight>
</div>

Sample output:
<syntaxhighlight lang="text">
 --------------------------------------------------------------------------------------
 85 || 41 || 64 || 40 || 90 || 29 || 38 || 41 || 64 || 17 || 20 || 41 || 16 || 65 || 83
 --    --    --    --    --    --    --    --    --    --    --    --    --    --    --
 41    85 || 40    64 || 29    90 || 38    41 || 17    64 || 20    41 || 16    65 || 83
 --------    --------    --------    --------    --------    --------    --------    --
 40    41    64    85 || 29    38    41    90 || 17    20    41    64 || 16    65    83
 --------------------    --------------------    --------------------    --------------
 29    38    40    41    41    64    85    90 || 16    17    20    41    64    65    83
 --------------------------------------------    --------------------------------------
 16    17    20    29    38    40    41    41    41    64    64    65    83    85    90
 --------------------------------------------------------------------------------------
</syntaxhighlight>

== See also ==
{{Portal|信息技术}}
* [[GPGPU]]
* [[CUDA]]
* [[DirectCompute]]
* [[C++ AMP]]

== References ==
{{Reflist|30em}}

== External links ==
* [https://www.khronos.org/conformance/adopters/conformant-products/opencl Products supporting OpenCL]{{Wayback|url=https://www.khronos.org/conformance/adopters/conformant-products/opencl |date=20180629130825 }}
* [https://web.archive.org/web/20190122085224/http://www.opengpu.org/ Open-source GPU community (开源GPU社区)]{{zh-cn}}

{{-}}
{{Khronos Group standards}}
{{并行计算}}

[[Category:C語言家族]]
[[Category:应用程序接口]]
[[Category:GPGPU]]
[[Category:GPGPU函式庫|OpenCL]]
[[Category:并行计算]]
[[Category:跨平台軟體]]