Matrix related calculation & Parallel Programming

Languages: C++, Java, Python, MATLAB

buffer_creation_test.cpp

OpenCL: creating a buffer with CL_MEM_COPY_HOST_PTR results in slower execution. However, this only applies to kernels with many arguments (more than 4).

ILU.cpp & ILU_pointer.cpp

Conventional ILU factorization, parallelized with OpenMP. Data structures: std::vector and pointer arrays.
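As a rough illustration of the conventional algorithm, the sketch below performs ILU(0) in row-wise (IKJ) order on a dense row-major `std::vector<double>`. The function name `ilu0`, the dense storage, and the use of exact zeros as the sparsity pattern are assumptions made for this example, not the layout used in ILU.cpp. Only the independent inner loop over `j` carries an OpenMP pragma, which is ignored when compiling without OpenMP support.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place ILU(0) on a dense row-major n x n matrix (hypothetical helper).
// Entries that are exactly zero in the input define the sparsity pattern
// and are never filled in. On return, the strict lower triangle holds L
// (unit diagonal implied) and the upper triangle holds U.
void ilu0(std::vector<double>& a, int n) {
    assert(n >= 0 && a.size() == static_cast<std::size_t>(n) * n);

    // Record the pattern up front so that fill-in can be rejected later.
    std::vector<char> pattern(a.size());
    for (std::size_t idx = 0; idx < a.size(); ++idx) {
        pattern[idx] = (a[idx] != 0.0) ? 1 : 0;
    }

    for (int i = 1; i < n; ++i) {
        for (int k = 0; k < i; ++k) {
            if (!pattern[i * n + k]) continue;
            assert(a[k * n + k] != 0.0);   // no pivoting in this sketch
            a[i * n + k] /= a[k * n + k];  // multiplier l_ik
            const double lik = a[i * n + k];
            // Updates to different columns j do not depend on each other.
            #pragma omp parallel for
            for (int j = k + 1; j < n; ++j) {
                if (pattern[i * n + j]) a[i * n + j] -= lik * a[k * n + j];
            }
        }
    }
}
```

For a matrix with no zero entries the pattern is full, so the result coincides with the ordinary LU factorization without pivoting.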

fine_grained_incomplete_factorization.m

Decomposes a matrix into lower and upper triangular matrices so that L*U = A. Only works for positive definite matrices.

incomplete_Cholesky.m

Decomposes a matrix into an upper triangular factor U so that U'*U = A. Only works for positive definite matrices.

Based on the research paper "Fine-Grained Parallel Incomplete LU Factorization" by Edmond Chow and Aftab Patel.
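The idea in the paper is to treat every entry of L and U as an unknown and refine all of them with fixed-point sweeps: l_ij = (a_ij - sum_{k<j} l_ik u_kj) / u_jj for i > j, and u_ij = a_ij - sum_{k<i} l_ik u_kj for i <= j. The sketch below is a minimal dense C++ version of that update. The names `Factors` and `fine_grained_ilu`, the dense storage, and the initial guess taken from the triangles of A are assumptions for illustration and are not taken from fine_grained_incomplete_factorization.m.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major factors (hypothetical container for this sketch).
struct Factors {
    std::vector<double> l;  // unit lower triangular
    std::vector<double> u;  // upper triangular
};

// Runs `sweeps` fixed-point sweeps; nonzeros of A define the pattern.
Factors fine_grained_ilu(const std::vector<double>& a, int n, int sweeps) {
    assert(n >= 0 && a.size() == static_cast<std::size_t>(n) * n);
    Factors f{std::vector<double>(a.size(), 0.0),
              std::vector<double>(a.size(), 0.0)};

    // Initial guess: strict lower part of A for L, upper part of A for U.
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i > j) f.l[i * n + j] = a[i * n + j];
            else       f.u[i * n + j] = a[i * n + j];
        }
        f.l[i * n + i] = 1.0;
    }

    for (int s = 0; s < sweeps; ++s) {
        const Factors old = f;  // each sweep reads only the previous one
        // Every (i, j) writes its own entry, so all updates can run at once.
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                if (a[i * n + j] == 0.0) continue;  // outside the pattern
                const int m = (i < j) ? i : j;
                double sum = a[i * n + j];
                for (int k = 0; k < m; ++k) {
                    sum -= old.l[i * n + k] * old.u[k * n + j];
                }
                if (i > j) f.l[i * n + j] = sum / old.u[j * n + j];
                else       f.u[i * n + j] = sum;
            }
        }
    }
    return f;
}
```

Because each sweep reads only the previous iterate, the order in which entries are updated does not matter, which is what makes the method attractive for fine-grained parallel hardware.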

Speed.cpp

Compares the speed of a function that takes its argument by reference (&) with one that does not.

Without optimization (no -O3): the function with the reference argument is slower.

With optimization (-O3): the function with the reference argument is faster.

The difference between compiling with and without -O3 is far larger than either effect, so always compile with -O3.
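A comparison of this kind can be set up along the following lines. The argument type used in Speed.cpp is not stated here, so this sketch assumes a single `double`; the function names and the `time_calls` helper are hypothetical. Timings depend heavily on the compiler and flags, so no expected numbers are given.

```cpp
#include <cassert>
#include <chrono>

// Same arithmetic, differing only in how the argument is passed.
double step_by_value(double x) { return 0.5 * x + 1.0; }
double step_by_reference(const double& x) { return 0.5 * x + 1.0; }

// Calls f repeatedly, feeding each result into the next call so the
// work cannot be skipped, and returns the elapsed time in seconds.
template <typename F>
double time_calls(F f, long repeats, double& result) {
    assert(repeats >= 0);
    double x = 1.0;
    const auto start = std::chrono::steady_clock::now();
    for (long r = 0; r < repeats; ++r) {
        x = f(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    result = x;  // hand the value back so the loop has an observable effect
    return std::chrono::duration<double>(stop - start).count();
}
```

Building the same program once with `-O0` and once with `-O3` and calling `time_calls` on both functions reproduces the kind of comparison described above.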

locality_of_reference.cpp

Changing the order of the loop indices results in much faster execution because it avoids cache-miss penalties. With -O3, the speedup is around 5x for n = 1500.
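The effect can be illustrated with two traversals of the same row-major matrix. The operation timed in locality_of_reference.cpp is not stated here, so a plain sum is assumed, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column index in the inner loop: consecutive iterations read adjacent
// memory, so each cache line is fully used before moving on.
double sum_row_order(const std::vector<double>& m, std::size_t n) {
    assert(m.size() == n * n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            total += m[i * n + j];
        }
    }
    return total;
}

// Row index in the inner loop: consecutive iterations are n doubles
// apart, so for large n most reads land on a different cache line.
double sum_column_order(const std::vector<double>& m, std::size_t n) {
    assert(m.size() == n * n);
    double total = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            total += m[i * n + j];
        }
    }
    return total;
}
```

Both functions visit the same elements; only the memory access pattern differs.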

assignment_operator_test.cu

Performance test comparing the compound assignment operator '+=' with plain assignment '='.

Result: '=' is faster than '+='.

performance_change_order_code.cpp

Execution is three to four times faster when the order of part I and part II in the code is switched. See https://stackoverflow.com/questions/56308339/is-position-of-code-affect-performance-in-c/56308654#56308654

vector_push_test.cpp

Compares the performance of filling a std::vector with push_back against a C-style array when discretizing a 3D block. Related Stack Overflow question: https://stackoverflow.com/questions/20168051/why-push-back-is-slower-than-operator-for-a-previously-allocated-vector/20168172#20168172
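The two variants being compared might look like the sketch below. The `Point` struct, the uniform spacing `h`, and both function names are assumptions for illustration and are not taken from vector_push_test.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
    double x, y, z;
};

// Grows the vector one element at a time; push_back checks capacity on
// every call and may reallocate and move the contents as it grows.
std::vector<Point> grid_push_back(int nx, int ny, int nz, double h) {
    assert(nx >= 0 && ny >= 0 && nz >= 0);
    std::vector<Point> pts;
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            for (int k = 0; k < nz; ++k)
                pts.push_back(Point{i * h, j * h, k * h});
    return pts;
}

// Writes into a caller-provided C-style array of nx*ny*nz points, so
// the loop body is a plain indexed store.
void grid_c_array(Point* out, int nx, int ny, int nz, double h) {
    assert(out != nullptr && nx >= 0 && ny >= 0 && nz >= 0);
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            for (int k = 0; k < nz; ++k)
                out[(static_cast<std::size_t>(i) * ny + j) * nz + k] =
                    Point{i * h, j * h, k * h};
}
```

Calling `reserve` before the `push_back` loop removes the reallocations, although the per-call capacity check remains.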

max_args_test.cu

Tests the maximum total size of kernel arguments (256 bytes) for a CUDA kernel. Passing a struct as the kernel argument works around the limit on the number of arguments, but a regular kernel turned out to work properly with 70 arguments. See https://devtalk.nvidia.com/default/topic/458705/is-there-any-limit-on-of-arguments-in-cuda-kernel-/

createBMatrix.m

Function to create the B matrix from a given matrix in the finite element method.

LUDecomposition.m

Function to compute the LU decomposition.

IncompleteLU.m

Incomplete LU Factorization

LUFactorization.m

Function to compute the LU factorization.

Steepest_descent.m

Calculates the inverse of a matrix using the steepest descent algorithm.
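One way to obtain an inverse with steepest descent is to solve A x = e_c for each unit vector e_c and use the solutions as the columns of the inverse. The sketch below takes that approach and assumes A is symmetric positive definite; whether Steepest_descent.m is formulated the same way is not stated here, and the names `steepest_descent` and `inverse_by_steepest_descent` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solves A x = b by steepest descent, starting from x = 0.
// Assumes A is symmetric positive definite.
std::vector<double> steepest_descent(const Matrix& a, const std::vector<double>& b,
                                     double tol, int max_iter) {
    const std::size_t n = b.size();
    assert(a.size() == n);
    std::vector<double> x(n, 0.0), r(b), ar(n, 0.0);  // r = b - A*0 = b
    for (int it = 0; it < max_iter; ++it) {
        double rr = 0.0;
        for (std::size_t i = 0; i < n; ++i) rr += r[i] * r[i];
        if (std::sqrt(rr) < tol) break;  // residual small enough
        double rar = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            ar[i] = 0.0;
            for (std::size_t j = 0; j < n; ++j) ar[i] += a[i][j] * r[j];
            rar += r[i] * ar[i];
        }
        const double alpha = rr / rar;  // exact line search along r
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * r[i];
            r[i] -= alpha * ar[i];
        }
    }
    return x;
}

// Builds the inverse column by column: column c solves A x = e_c.
Matrix inverse_by_steepest_descent(const Matrix& a, double tol = 1e-12,
                                   int max_iter = 10000) {
    const std::size_t n = a.size();
    Matrix inv(n, std::vector<double>(n, 0.0));
    for (std::size_t c = 0; c < n; ++c) {
        std::vector<double> e(n, 0.0);
        e[c] = 1.0;
        const std::vector<double> x = steepest_descent(a, e, tol, max_iter);
        for (std::size_t i = 0; i < n; ++i) inv[i][c] = x[i];
    }
    return inv;
}
```

Convergence slows down as the condition number of A grows, so this is best read as a demonstration of the method rather than a practical way to invert matrices.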

cho-uc/Matrix-Parallel-Programming
