Talk:OpenCL

From Gentoo Wiki

Hi,
I would like to translate this page into Japanese. Could an editor mark it for translation?
Hisashi

amdgpu-pro-opencl instability

Word of warning for other users: the article isn't kidding about the risk of mixing and matching drivers.

Running things sometimes works, but usually dmesg fills up with the dying gurgles of a broken driver, X stops working, and the computer follows soon after.

Here's one I suffered earlier:

user $ awk '/ERROR/,/end / { print }' /var/log/kernel/current
Mar 13 02:04:13 [kernel] [25091.704610] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:13 [kernel] [25091.937866] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:14 [kernel] [25092.903125] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:14 [kernel] [25092.914144] [drm:amdgpu_ttm_backend_bind] *ERROR* failed to pin userptr
Mar 13 02:04:14 [kernel] [25092.961676] ------------[ cut here ]------------
Mar 13 02:04:14 [kernel] [25092.961685] WARNING: CPU: 18 PID: 55924 at drivers/iommu/dma-iommu.c:471 __iommu_dma_unmap+0xe1/0xf0
Mar 13 02:04:14 [kernel] [25092.961686] Modules linked in: ext4 mbcache jbd2 fuse sd_mod bnep bluetooth ecdh_generic ecc crc16 rfkill kvm_amd kvm ahci irqbypass libahci libata uas usb_storage input_leds cdc_acm hid_microsoft led_class scsi_mod
Mar 13 02:04:14 [kernel] [25092.961698] CPU: 18 PID: 55924 Comm: FahCore_22 Tainted: G      D           5.5.9-zen-01720-gb286bb50f #22
Mar 13 02:04:14 [kernel] [25092.961699] Hardware name: Gigabyte Technology Co., Ltd. X570 UD/X570 UD, BIOS F11 12/06/2019
Mar 13 02:04:14 [kernel] [25092.961701] RIP: 0010:__iommu_dma_unmap+0xe1/0xf0
Mar 13 02:04:14 [kernel] [25092.961703] Code: c0 74 0b 48 89 e6 4c 89 f7 e8 6b d1 76 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10 00 00 00 00 48 c7 04 24 ff ff ff ff eb a2 <0f> 0b eb 94 e8 66 df bb ff 66 0f 1f 44 00 00 41 57 41 56 49 89 f7
Mar 13 02:04:14 [kernel] [25092.961704] RSP: 0018:ffff963f14a37bd0 EFLAGS: 00010206
Mar 13 02:04:14 [kernel] [25092.961705] RAX: 0000000040000000 RBX: 0000000000000001 RCX: ffff963f14a37b48
Mar 13 02:04:14 [kernel] [25092.961705] RDX: 0000000000000000 RSI: ffffffffc0000000 RDI: 0000000000000015
Mar 13 02:04:14 [kernel] [25092.961706] RBP: ffff940fd2d27000 R08: 0000000000000000 R09: 0000000000000000
Mar 13 02:04:14 [kernel] [25092.961706] R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000002000
Mar 13 02:04:14 [kernel] [25092.961707] R13: ffff9411078ea800 R14: ffff9411087d0e20 R15: ffff9410243c9e38
Mar 13 02:04:14 [kernel] [25092.961708] FS:  00007f885c38bf80(0000) GS:ffff94110ec80000(0000) knlGS:0000000000000000
Mar 13 02:04:14 [kernel] [25092.961709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 13 02:04:14 [kernel] [25092.961709] CR2: 00007f885c9f2270 CR3: 0000000e2c56c000 CR4: 0000000000340ee0
Mar 13 02:04:14 [kernel] [25092.961710] Call Trace:
Mar 13 02:04:14 [kernel] [25092.961715]  ttm_unmap_and_unpopulate_pages+0xa7/0x130
Mar 13 02:04:14 [kernel] [25092.961717]  ttm_tt_destroy.part.0+0x44/0x50
Mar 13 02:04:14 [kernel] [25092.961718]  ttm_bo_cleanup_memtype_use+0x2d/0x80
Mar 13 02:04:14 [kernel] [25092.961720]  ttm_bo_put+0x2ac/0x330
Mar 13 02:04:14 [kernel] [25092.961722]  amdgpu_bo_unref+0x15/0x20
Mar 13 02:04:14 [kernel] [25092.961724]  amdgpu_gem_object_free+0x2b/0x50
Mar 13 02:04:14 [kernel] [25092.961726]  drm_gem_object_release_handle+0x6b/0x90
Mar 13 02:04:14 [kernel] [25092.961728]  drm_gem_handle_delete+0x53/0x90
Mar 13 02:04:14 [kernel] [25092.961730]  ? drm_gem_handle_create+0x40/0x40
Mar 13 02:04:14 [kernel] [25092.961731]  drm_ioctl_kernel+0xa6/0xf0
Mar 13 02:04:14 [kernel] [25092.961733]  drm_ioctl+0x1fc/0x380
Mar 13 02:04:14 [kernel] [25092.961735]  ? drm_gem_handle_create+0x40/0x40
Mar 13 02:04:14 [kernel] [25092.961738]  ? tlb_finish_mmu+0x24/0x160
Mar 13 02:04:14 [kernel] [25092.961739]  ? unmap_region+0xd1/0x100
Mar 13 02:04:14 [kernel] [25092.961741]  amdgpu_drm_ioctl+0x44/0x80
Mar 13 02:04:14 [kernel] [25092.961744]  do_vfs_ioctl+0x449/0x6c0
Mar 13 02:04:14 [kernel] [25092.961745]  ? __do_munmap+0x27e/0x4a0
Mar 13 02:04:14 [kernel] [25092.961747]  ksys_ioctl+0x35/0x70
Mar 13 02:04:14 [kernel] [25092.961748]  __x64_sys_ioctl+0x11/0x20
Mar 13 02:04:14 [kernel] [25092.961750]  do_syscall_64+0x43/0x100
Mar 13 02:04:14 [kernel] [25092.961752]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 13 02:04:14 [kernel] [25092.961755] RIP: 0033:0x7f885c48e567
Mar 13 02:04:14 [kernel] [25092.961756] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 8d c8 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 e8 0c 00 f7 d8 64 89 01 48
Mar 13 02:04:14 [kernel] [25092.961757] RSP: 002b:00007ffedf620218 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 13 02:04:14 [kernel] [25092.961758] RAX: ffffffffffffffda RBX: 00007ffedf620260 RCX: 00007f885c48e567
Mar 13 02:04:14 [kernel] [25092.961759] RDX: 00007ffedf620260 RSI: 0000000040086409 RDI: 000000000000000f
Mar 13 02:04:14 [kernel] [25092.961760] RBP: 0000000040086409 R08: 0000000002e23012 R09: 0000000000000007
Mar 13 02:04:14 [kernel] [25092.961760] R10: 00007ffedf6204d0 R11: 0000000000000246 R12: 000000000319c970
Mar 13 02:04:14 [kernel] [25092.961761] R13: 000000000000000f R14: 0000000002f95cd0 R15: 0000000000000001
Mar 13 02:04:14 [kernel] [25092.961763] ---[ end trace a079ed0091d543ee ]---
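
For anyone wanting to check their own logs for the same failure, here is a minimal sketch built on the awk command above. The function names `extract_traces` and `count_pin_failures` are hypothetical helpers made up for illustration, and the log path is whatever your own syslog setup uses (`/var/log/kernel/current` is just the one from this report). The awk range pattern prints from each line matching `ERROR` through the next line matching `end `; if no closing line ever appears, it keeps printing to the end of the file.

```shell
#!/bin/sh
# Hypothetical helper: print every span that starts at a line containing
# "ERROR" and runs through the next line containing "end " (for example the
# "---[ end trace ... ]---" marker that closes a kernel warning).
extract_traces() {
    awk '/ERROR/,/end / { print }' "$1"
}

# Hypothetical helper: count how many times amdgpu reported that it
# failed to pin a userptr buffer. Prints 0 when there are no matches.
count_pin_failures() {
    grep -c 'failed to pin userptr' "$1"
}
```

Assuming the log lives where it does in this report, usage would look like `extract_traces /var/log/kernel/current` followed by `count_pin_failures /var/log/kernel/current`.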

I've encountered other poor folks with similar experiences, and given that something as simple as `clinfo` can cause it to detonate, I don't think it's a hardware issue.

I'd really like to get OpenCL working, but unfortunately this is the only option that produces any working result (ROCm just won't run, period, and Clover is uselessly out of date), and it's so risky as to be unusable for the job. - Ant P. (talk) 03:09, 13 March 2020 (UTC)

Update: apparently kernel 5.5 is to blame, and it's being fixed. Looks like I just picked an unlucky time to try it. - Ant P. (talk) 18:41, 24 March 2020 (UTC)