
Add texture support from CuTextures.jl #209

Merged
merged 15 commits into from
Jun 25, 2020

Conversation

@maleadt (Member) commented Jun 9, 2020

I messed up and #206 got closed.

@maleadt added labels "enhancement" (New feature or request) and "cuda kernels" (Stuff about writing CUDA kernels) on Jun 9, 2020
(review comment on lib/cuda/texture.jl — outdated, resolved)
@maleadt (Member Author) commented Jun 9, 2020

I wonder if it would be valuable to try to expose this through array abstractions like broadcast. Since those typically require matching sizes, I could imagine something like warp(::CuTexture, dims...) creating a 'view' with changed dimensions so that it interpolates. Or maybe warp(f, ::CuTexture) to apply a mapping to indices and, e.g., make the sine-warp example possible.

@ChrisRackauckas, you've expressed interest in this; what's your use case?
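The warp function proposed above is hypothetical — nothing with that name exists in the package at this point. As a CPU-only Julia sketch of the index-mapping idea (the name, signature, and semantics are all assumptions, and nearest-neighbor rounding stands in for the interpolation texture hardware would do):

```julia
# Hypothetical sketch: apply a mapping `f` to destination indices before
# sampling the source. On real textures the fractional source coordinates
# would be interpolated in hardware; here we just round to nearest.
function warp(f, src::AbstractMatrix)
    dst = similar(src)
    h, w = size(src)
    for j in 1:w, i in 1:h
        si, sj = f(i, j)  # possibly fractional source indices
        dst[i, j] = src[clamp(round(Int, si), 1, h), clamp(round(Int, sj), 1, w)]
    end
    return dst
end

a = reshape(collect(1.0:16.0), 4, 4)
warp((i, j) -> (i, j), a)           # identity warp: a plain copy
warp((i, j) -> (i + sin(j), j), a)  # a 'sine warp' displacing the rows
```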

@codecov (codecov bot) commented Jun 9, 2020

Codecov Report

Merging #209 into master will decrease coverage by 0.41%.
The diff coverage is 81.53%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #209      +/-   ##
==========================================
- Coverage   80.53%   80.13%   -0.41%     
==========================================
  Files         152      154       +2     
  Lines        9911    10174     +263     
==========================================
+ Hits         7982     8153     +171     
- Misses       1929     2021      +92     
Impacted Files Coverage Δ
src/CUDA.jl 100.00% <ø> (ø)
src/pointer.jl 78.04% <50.00%> (-9.05%) ⬇️
lib/cudadrv/memory.jl 85.85% <77.41%> (-7.95%) ⬇️
src/texture.jl 84.61% <84.61%> (ø)
test/texture.jl 90.74% <90.74%> (ø)
examples/wmma/high-level.jl 11.11% <0.00%> (-38.89%) ⬇️
examples/wmma/low-level.jl 14.28% <0.00%> (-35.72%) ⬇️
test/device/wmma.jl 0.00% <0.00%> (-7.41%) ⬇️
test/execution.jl 38.68% <0.00%> (-1.47%) ⬇️
test/device/cuda.jl 9.72% <0.00%> (-0.73%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 160c998...5f6e3c1. Read the comment docs.

@maleadt (Member Author) commented Jun 9, 2020

julia> a = CUDA.rand(2,2)
2×2 CuArray{Float32,2,Nothing}:
 0.148875   0.386196
 0.0170659  0.696161

julia> b = similar(a)
2×2 CuArray{Float32,2,Nothing}:
 0.0  0.0
 0.0  0.0

julia> b .= CuTexture(a)
2×2 CuArray{Float32,2,Nothing}:
 0.148875   0.148875
 0.0170659  0.0170659

julia> @device_code_llvm debuginfo=:none b .= CuTexture(a)
; PTX CompilerJob of kernel broadcast(CUDA.CuKernelContext, CuDeviceArray{Float32,2,CUDA.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Extruded{CuDeviceTexture{Float32,2,false},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) for sm_75

...
L152:                                             ; preds = %L146, %L141
  %31 = load i64, i64* %56, align 8
  %32 = call [4 x float] @llvm.nvvm.tex.unified.2d.v4f32.s32(i64 %31, i32 %57, i32 %28)
  %.fca.0.extract = extractvalue [4 x float] %32, 0
  %33 = getelementptr inbounds { [2 x i64], i64 }, { [2 x i64], i64 }* %0, i64 0, i32 1
  %34 = bitcast i64* %33 to float addrspace(1)**
  %35 = load float addrspace(1)*, float addrspace(1)** %34, align 8
  %36 = getelementptr float, float addrspace(1)* %35, i64 %27
  store float %.fca.0.extract, float addrspace(1)* %36, align 4
  br label %L33
...
}

@maleadt force-pushed the master branch 10 times, most recently from d3147a2 to cf97309 on June 12, 2020 15:49
@maleadt force-pushed the tb/textures branch 2 times, most recently from 6cc3a54 to e8ec49b on June 25, 2020 10:50
@maleadt (Member Author) commented Jun 25, 2020

CI is green, so let's merge this. The API is not final yet: it still requires explicit use of CuTexture/CuTextureArray objects (more or less as @cdsousa designed it, but integrated with the rest of the stack). I'd like to try making CuArray compatible with it by dispatching on the inner buffer object.
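To make the explicit API concrete, a minimal usage sketch (requires a CUDA-capable GPU; the CuTextureArray construction shown is an assumption based on the object names in this comment, not a documented final API):

```julia
using CUDA  # requires a CUDA-capable GPU to run

a = CUDA.rand(Float32, 8, 8)

# Wrap an existing CuArray directly, as in the transcripts above:
t = CuTexture(a)
b = similar(a)
b .= t  # broadcast reads through the texture, copying `a` into `b`

# The PR also adds CuTextureArray for dedicated CUDA array memory;
# constructing one from a CuArray and wrapping it in a CuTexture is
# sketched here based on the names in this comment:
ta = CuTextureArray(a)
t2 = CuTexture(ta)
```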

@maleadt maleadt merged commit 15ca724 into master Jun 25, 2020
@maleadt maleadt deleted the tb/textures branch June 25, 2020 14:56
@cdsousa (Contributor) commented Jun 25, 2020

Nice!!!

Has the issue above been fixed? I started looking into it and was able to test it in my original code, where it was OK, but then I had no more time to continue.

@maleadt (Member Author) commented Jun 25, 2020

Has the issue above been fixed? I started looking into it and was able to test it in my original code, where it was OK, but then I had no more time to continue.

Which issue? The tests are mostly the same as they were in your version.

@cdsousa (Contributor) commented Jun 26, 2020

Has the issue above been fixed? I started looking into it and was able to test it in my original code, where it was OK, but then I had no more time to continue.

#209 (comment)

Wasn't broadcast assignment supposed to give the same result as a copy?

@maleadt (Member Author) commented Jun 26, 2020

Ah, I didn't even notice that they mismatched! I was just happy it generated appropriate code. But it seems to work now:

julia> a = CUDA.rand(8,8)
8×8 CuArray{Float32,2,Nothing}:
 0.916239  0.976847   0.755844  0.66619   0.011888  0.213493   0.665336   0.132537
 0.547251  0.867846   0.316787  0.580942  0.953204  0.762209   0.0874605  0.0591269
 0.413512  0.374084   0.369481  0.642127  0.724004  0.0811296  0.657992   0.788099
 0.48726   0.417901   0.889191  0.795766  0.334781  0.387315   0.913375   0.881644
 0.943592  0.584336   0.455078  0.28139   0.591649  0.0605415  0.832624   0.409376
 0.57543   0.0464874  0.612669  0.451585  0.950189  0.647614   0.658178   0.89397
 0.987634  0.586837   0.71754   0.884229  0.501966  0.217039   0.520088   0.542805
 0.730895  0.838332   0.796267  0.295658  0.348608  0.727785   0.867374   0.255682

julia> b = similar(a)

julia> b .= CuTexture(a)
8×8 CuArray{Float32,2,Nothing}:
 0.916239  0.976847   0.755844  0.66619   0.011888  0.213493   0.665336   0.132537
 0.547251  0.867846   0.316787  0.580942  0.953204  0.762209   0.0874605  0.0591269
 0.413512  0.374084   0.369481  0.642127  0.724004  0.0811296  0.657992   0.788099
 0.48726   0.417901   0.889191  0.795766  0.334781  0.387315   0.913375   0.881644
 0.943592  0.584336   0.455078  0.28139   0.591649  0.0605415  0.832624   0.409376
 0.57543   0.0464874  0.612669  0.451585  0.950189  0.647614   0.658178   0.89397
 0.987634  0.586837   0.71754   0.884229  0.501966  0.217039   0.520088   0.542805
 0.730895  0.838332   0.796267  0.295658  0.348608  0.727785   0.867374   0.255682

julia> a == b
true

Although the exact example doesn't work anymore:

julia> CuTexture(CUDA.rand(2,2))
ERROR: CUDA error: invalid argument (code 1, ERROR_INVALID_VALUE)

Anyway, CuTexture isn't really ready to be used with broadcast yet, but it's nice to see the essentials work already.
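For context on what texture sampling offers beyond the plain copy demonstrated above, here is a CPU-only Julia sketch of the bilinear filtering that texture hardware performs when sampling between texels (pure illustration, not CUDA.jl API):

```julia
# CPU sketch of bilinear filtering between texels; `x` and `y` are
# fractional indices into the first and second dimension of `a`.
function bilinear(a::AbstractMatrix, x::Real, y::Real)
    i0, j0 = floor(Int, x), floor(Int, y)
    i1 = min(i0 + 1, size(a, 1))
    j1 = min(j0 + 1, size(a, 2))
    fx, fy = x - i0, y - j0
    # Blend the four surrounding texels by their fractional distances:
    return (a[i0, j0] * (1 - fx) + a[i1, j0] * fx) * (1 - fy) +
           (a[i0, j1] * (1 - fx) + a[i1, j1] * fx) * fy
end

a = Float64[1 3; 2 4]
bilinear(a, 1.5, 1.5)  # center of the 2×2 grid → 2.5
```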
