HW8 finish ! #6

Open
wants to merge 4 commits into
base: main
jiayaozhang

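This is written with the "leftover-handling" approach (one thread per element, with the remainder handled separately); please rewrite it as a "grid-stride loop": 10 points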

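template <class Func>
__global__ void parallel_for(int n, Func func) {
    // Grid-stride loop: start at this thread's global index and
    // advance by the total number of threads in the grid.
    for (int i = blockDim.x * blockIdx.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x) {
        func(i);
    }
}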
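
To check the loop above without a GPU, here is a minimal host-side sketch that replays the same index arithmetic on the CPU. The helper names `grid_stride_visits` and `covers_exactly_once` are hypothetical and not part of the homework; `block` and `thread` stand in for `blockIdx.x` and `threadIdx.x`.

```cpp
#include <cassert>
#include <vector>

// Host-side model (hypothetical helper): counts how often each index in [0, n)
// is visited when grid_dim blocks of block_dim threads run the grid-stride loop.
std::vector<int> grid_stride_visits(int n, int grid_dim, int block_dim) {
    assert(n >= 0 && grid_dim > 0 && block_dim > 0);
    std::vector<int> visits(n, 0);
    const int stride = block_dim * grid_dim;  // total threads in the grid
    for (int block = 0; block < grid_dim; ++block) {
        for (int thread = 0; thread < block_dim; ++thread) {
            // Same loop as in the kernel, with the thread's position made explicit.
            for (int i = block_dim * block + thread; i < n; i += stride) {
                ++visits[i];
            }
        }
    }
    return visits;
}

// True when every index in [0, n) is visited exactly once.
bool covers_exactly_once(int n, int grid_dim, int block_dim) {
    for (int v : grid_stride_visits(n, grid_dim, block_dim)) {
        if (v != 1) return false;
    }
    return true;
}
```

In this model, `covers_exactly_once(1000, 2, 128)` holds even though the grid has only 256 threads, and so does `covers_exactly_once(5, 32, 1024)` where most threads do nothing. The loop is correct for any positive grid and block size.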

Once fill_sin is changed to a "grid-stride loop", how should the parameters inside the triple angle brackets be adjusted? 10 points

    parallel_for<<<32, 1024>>>(n, [arr_data = arr.data()] __device__ (int i) {
        arr_data[i] = __sinf(i);
    });
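
With a grid-stride loop the grid size no longer has to be derived from n, which is why a fixed `<<<32, 1024>>>` works here. One possible way to pick the block count is sketched below, assuming you want no more blocks than there is work and never more than a chosen cap. `pick_grid_dim` and `max_blocks` are hypothetical names, not part of the homework or the CUDA API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of blocks to launch for a grid-stride kernel.
// Uses just enough blocks to give every element a thread, but never more
// than max_blocks and never fewer than one.
int pick_grid_dim(int n, int block_dim, int max_blocks) {
    assert(n >= 0 && block_dim > 0 && max_blocks > 0);
    const int needed = (n + block_dim - 1) / block_dim;  // ceil(n / block_dim)
    return std::max(1, std::min(needed, max_blocks));
}
```

Under these assumptions the launch could be written as `parallel_for<<<pick_grid_dim(n, 1024, 32), 1024>>>(...)`. For large n this gives the same 32 blocks as the fixed configuration, and for small n it launches fewer idle blocks.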

The "leftover-handling" approach here goes wrong when n is not an integer multiple of 1024. Why? Please fix it: 10 points

    filter_positive<<<(n + 1024 - 1) / 1024, 1024>>>(counter.data(), res.data(), arr.data(), n);
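
As for the "why": assuming the original launch computed the block count as `n / 1024` (the original line is not shown in this PR), integer division truncates, so the last `n % 1024` elements get no thread. Rounding up fixes that. The host-side sketch below compares the two; the helper names are hypothetical.

```cpp
#include <cassert>

// Block count with truncating division (the assumed buggy form n / block_dim).
int blocks_floor(int n, int block_dim) { return n / block_dim; }

// Block count rounded up, as in (n + block_dim - 1) / block_dim.
int blocks_ceil(int n, int block_dim) { return (n + block_dim - 1) / block_dim; }

// Number of elements that have no thread assigned for a given block count.
int uncovered(int n, int blocks, int block_dim) {
    assert(n >= 0 && blocks >= 0 && block_dim > 0);
    const int threads = blocks * block_dim;
    return n > threads ? n - threads : 0;
}
```

For n = 1025, truncation yields 1 block and leaves 1 element unprocessed, while rounding up yields 2 blocks and covers everything. Because rounding up can launch more threads than there are elements, the kernel is assumed to skip any thread whose index is not below n.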

What step is missing here before the CPU accesses the data? Please add it: 10 points

    checkCudaErrors(cudaDeviceSynchronize());
