Talk:OpenCL
Translation
Hi,
I would like to translate this page to Japanese. Could the editor request translation?
Hisashi
amdgpu-pro-opencl instability
Word of warning for other users: it's not kidding about the risk of mixing and matching drivers.
Running things sometimes works, but usually dmesg fills up with the dying gurgles of a broken driver, X stops working, and the computer follows soon after.
Here's one I suffered earlier:
user $ awk '/ERROR/,/end / { print }' /var/log/kernel/current
Mar 13 02:04:13 [kernel] [25091.704610] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:13 [kernel] [25091.937866] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:14 [kernel] [25092.903125] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:14 [kernel] [25092.914144] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:14 [kernel] [25092.961676] ------------[ cut here ]------------
Mar 13 02:04:14 [kernel] [25092.961685] WARNING: CPU: 18 PID: 55924 at drivers/iommu/dma-iommu.c:471 __iommu_dma_unmap+0xe1/0xf0
Mar 13 02:04:14 [kernel] [25092.961686] Modules linked in: ext4 mbcache jbd2 fuse sd_mod bnep bluetooth ecdh_generic ecc crc16 rfkill kvm_amd kvm ahci irqbypass libahci libata uas usb_storage input_leds cdc_acm hid_microsoft led_class scsi_mod
Mar 13 02:04:14 [kernel] [25092.961698] CPU: 18 PID: 55924 Comm: FahCore_22 Tainted: G D 5.5.9-zen-01720-gb286bb50f #22
Mar 13 02:04:14 [kernel] [25092.961699] Hardware name: Gigabyte Technology Co., Ltd. X570 UD/X570 UD, BIOS F11 12/06/2019
Mar 13 02:04:14 [kernel] [25092.961701] RIP: 0010:__iommu_dma_unmap+0xe1/0xf0
Mar 13 02:04:14 [kernel] [25092.961703] Code: c0 74 0b 48 89 e6 4c 89 f7 e8 6b d1 76 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10 00 00 00 00 48 c7 04 24 ff ff ff ff eb a2 <0f> 0b eb 94 e8 66 df bb ff 66 0f 1f 44 00 00 41 57 41 56 49 89 f7
Mar 13 02:04:14 [kernel] [25092.961704] RSP: 0018:ffff963f14a37bd0 EFLAGS: 00010206
Mar 13 02:04:14 [kernel] [25092.961705] RAX: 0000000040000000 RBX: 0000000000000001 RCX: ffff963f14a37b48
Mar 13 02:04:14 [kernel] [25092.961705] RDX: 0000000000000000 RSI: ffffffffc0000000 RDI: 0000000000000015
Mar 13 02:04:14 [kernel] [25092.961706] RBP: ffff940fd2d27000 R08: 0000000000000000 R09: 0000000000000000
Mar 13 02:04:14 [kernel] [25092.961706] R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000002000
Mar 13 02:04:14 [kernel] [25092.961707] R13: ffff9411078ea800 R14: ffff9411087d0e20 R15: ffff9410243c9e38
Mar 13 02:04:14 [kernel] [25092.961708] FS: 00007f885c38bf80(0000) GS:ffff94110ec80000(0000) knlGS:0000000000000000
Mar 13 02:04:14 [kernel] [25092.961709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 13 02:04:14 [kernel] [25092.961709] CR2: 00007f885c9f2270 CR3: 0000000e2c56c000 CR4: 0000000000340ee0
Mar 13 02:04:14 [kernel] [25092.961710] Call Trace:
Mar 13 02:04:14 [kernel] [25092.961715] ttm_unmap_and_unpopulate_pages+0xa7/0x130
Mar 13 02:04:14 [kernel] [25092.961717] ttm_tt_destroy.part.0+0x44/0x50
Mar 13 02:04:14 [kernel] [25092.961718] ttm_bo_cleanup_memtype_use+0x2d/0x80
Mar 13 02:04:14 [kernel] [25092.961720] ttm_bo_put+0x2ac/0x330
Mar 13 02:04:14 [kernel] [25092.961722] amdgpu_bo_unref+0x15/0x20
Mar 13 02:04:14 [kernel] [25092.961724] amdgpu_gem_object_free+0x2b/0x50
Mar 13 02:04:14 [kernel] [25092.961726] drm_gem_object_release_handle+0x6b/0x90
Mar 13 02:04:14 [kernel] [25092.961728] drm_gem_handle_delete+0x53/0x90
Mar 13 02:04:14 [kernel] [25092.961730] ? drm_gem_handle_create+0x40/0x40
Mar 13 02:04:14 [kernel] [25092.961731] drm_ioctl_kernel+0xa6/0xf0
Mar 13 02:04:14 [kernel] [25092.961733] drm_ioctl+0x1fc/0x380
Mar 13 02:04:14 [kernel] [25092.961735] ? drm_gem_handle_create+0x40/0x40
Mar 13 02:04:14 [kernel] [25092.961738] ? tlb_finish_mmu+0x24/0x160
Mar 13 02:04:14 [kernel] [25092.961739] ? unmap_region+0xd1/0x100
Mar 13 02:04:14 [kernel] [25092.961741] amdgpu_drm_ioctl+0x44/0x80
Mar 13 02:04:14 [kernel] [25092.961744] do_vfs_ioctl+0x449/0x6c0
Mar 13 02:04:14 [kernel] [25092.961745] ? __do_munmap+0x27e/0x4a0
Mar 13 02:04:14 [kernel] [25092.961747] ksys_ioctl+0x35/0x70
Mar 13 02:04:14 [kernel] [25092.961748] __x64_sys_ioctl+0x11/0x20
Mar 13 02:04:14 [kernel] [25092.961750] do_syscall_64+0x43/0x100
Mar 13 02:04:14 [kernel] [25092.961752] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 13 02:04:14 [kernel] [25092.961755] RIP: 0033:0x7f885c48e567
Mar 13 02:04:14 [kernel] [25092.961756] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 8d c8 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 e8 0c 00 f7 d8 64 89 01 48
Mar 13 02:04:14 [kernel] [25092.961757] RSP: 002b:00007ffedf620218 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 13 02:04:14 [kernel] [25092.961758] RAX: ffffffffffffffda RBX: 00007ffedf620260 RCX: 00007f885c48e567
Mar 13 02:04:14 [kernel] [25092.961759] RDX: 00007ffedf620260 RSI: 0000000040086409 RDI: 000000000000000f
Mar 13 02:04:14 [kernel] [25092.961760] RBP: 0000000040086409 R08: 0000000002e23012 R09: 0000000000000007
Mar 13 02:04:14 [kernel] [25092.961760] R10: 00007ffedf6204d0 R11: 0000000000000246 R12: 000000000319c970
Mar 13 02:04:14 [kernel] [25092.961761] R13: 000000000000000f R14: 0000000002f95cd0 R15: 0000000000000001
Mar 13 02:04:14 [kernel] [25092.961763] ---[ end trace a079ed0091d543ee ]---
I've encountered other poor folks with similar experiences, and given something as simple as `clinfo` can cause it to detonate I don't think it's a hardware issue.
I'd really like to get OpenCL working, but unfortunately this is the only option that produces any working result (ROCm just won't run period, Clover is uselessly out of date) and it's so risky as to be unusable for the job. - Ant P. (talk) 03:09, 13 March 2020 (UTC)
- Update: apparently it's kernel 5.5 to blame and it's being fixed. Looks like I just picked an unlucky time to try it. - Ant P. (talk) 18:41, 24 March 2020 (UTC)
- -----
- Does this issue still occur 3 years later, or is the fix functional? If the issue is still valid, please open a Gentoo bug. Someone may fix it (or not), but at least the problem will be documented. A small note about it in the main article could be valuable information in case the problem still occurs.
- --Admnd (talk) 13:47, 11 April 2023 (UTC)
Possible issue when installing a ROCm-based OpenCL system without PCIe atomics on some AMD GPUs
I am opening this topic because ROCm itself requires PCIe with atomics for AMD GPUs from gfx803, as written on https://github.com/ROCm/ROCm.github.io/blob/master/hardware.md . From what I am seeing on my own system (clinfo shows 0 devices) and from forum messages, this does seem to affect OpenCL as well.
The good news is that rusticl from Mesa might support such systems.
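For anyone who wants to check whether their own system advertises PCIe atomics, here is a rough sketch. It assumes that pciutils prints `AtomicOpsCap:` and `AtomicOpsCtl:` fields in its verbose output, which may not hold for older versions. The helper name `filter_atomics` is made up for this example, and the output only shows what each device and bridge reports. It does not prove that ROCm will or will not work.

```shell
#!/bin/sh
# Sketch only: filter `lspci -vvv` output (read from stdin) down to the
# devices that report PCIe AtomicOps fields.
# Assumption: pciutils prints "AtomicOpsCap:" / "AtomicOpsCtl:" lines.
# filter_atomics is a hypothetical helper name, not part of any package.
filter_atomics() {
    awk '
        # Device header lines start at column 0 with a hex bus address.
        /^[0-9a-f]/ { dev = $0 }
        # Capability lines are indented; print the header once, then the line.
        /AtomicOps/ {
            if (dev != "") { print dev; dev = "" }
            sub(/^[ \t]+/, "    ")
            print
        }
    '
}

# Usage (as root, otherwise the capability fields are usually hidden):
#   lspci -vvv | filter_atomics
```

Both the GPU and every bridge between it and the CPU are worth looking at, since the hardware page linked above talks about the whole PCIe path and not just the card.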