-
March 18th, 2024, 10:08 PM
#1
OpenMP compute pi
Hello,
I am working my way into OpenMP.
The following code is supposed to compute pi.
It works fine for one thread, but in case of multiple threads the relative error gets larger. Can anyone spot the mistake? It should be trivial...
Code:
#include <iostream>
#include <omp.h>
#include <chrono>
int main(){
long int N = 1000000000;
long double dx = 1./N;
long double result = 0;
auto t1 = std:: chrono::high_resolution_clock::now();
omp_set_num_threads(2); // works only for 1
#pragma omp parallel
{
long int rank = omp_get_thread_num();
long int nr_ranks = omp_get_num_threads();
long int n_per_rank = N/nr_ranks;
long int N_start = rank*n_per_rank;
long int N_end = N_start + n_per_rank;
// In case N not evenly divisible by ranks, last rank does remaining iterations.
if (N % nr_ranks != 0 && rank == nr_ranks-1){
N_end += N % nr_ranks;
}
double x_start = N_start*dx;
// Main loop.
for(long int i=N_start; i<N_end; ++i){
result += 4/(1+x_start*x_start)*dx;
x_start += dx;
}
}
auto t2 = std:: chrono:: high_resolution_clock:: now();
auto ms_int = std:: chrono:: duration_cast < std:: chrono:: seconds > (t2 - t1);
std:: cout << "Execution took " << ms_int.count() << " seconds!\n";
std::cout<<"error is "<<(result-3.14159265358979323846)/3.14159265358979323846<<"\n";
return 0;
}
-
March 19th, 2024, 12:13 AM
#2
Re: OpenMP compute pi
Yeah, you're not calculating PI at all.
You need to flag your result as being shared between multiple threads, otherwise you're losing updates to it.
Code:
#include <iostream>
#include <omp.h>
#include <chrono>
int main(){
const long int N = 1000000000;
const long double dx = 1./N;
long double result = 0;
auto t1 = std:: chrono::high_resolution_clock::now();
omp_set_num_threads(2); // works only for 1
#pragma omp parallel
{
long double result_by_rank = 0;
long int rank = omp_get_thread_num();
long int nr_ranks = omp_get_num_threads();
long int n_per_rank = N/nr_ranks;
long int N_start = rank*n_per_rank;
long int N_end = N_start + n_per_rank;
// In case N not evenly divisible by ranks, last rank does remaining iterations.
if (N % nr_ranks != 0 && rank == nr_ranks-1){
N_end += N % nr_ranks;
}
double x_start = N_start*dx;
// Main loop.
for(long int i=N_start; i<N_end; ++i){
result += 4/(1+x_start*x_start)*dx;
result_by_rank += 4/(1+x_start*x_start)*dx;
x_start += dx;
}
std::cout << "Rank:" << rank
<< " start:" << N_start
<< " end:" << N_end
<< " result:" << result
<< " result_by_rank:" << result_by_rank
<< "\n";
}
auto t2 = std:: chrono:: high_resolution_clock:: now();
auto ms_int = std:: chrono:: duration_cast < std:: chrono:: seconds > (t2 - t1);
std:: cout << "Execution took " << ms_int.count() << " seconds!\n";
std::cout << "Final result=" << result << "\n";
std::cout<<"error is "<<(result-3.14159265358979323846)/3.14159265358979323846<<"\n";
return 0;
}
$ g++ -fopenmp foo.cpp
$ ./a.out
Rank:1 start:500000000 end:1000000000 result:1.03382 result_by_rank:1.287
Rank:0 start:0 end:500000000 result:1.86255 result_by_rank:1.85459
Execution took 11 seconds!
Final result=1.86255
error is -0.407131
The sum of all the result_by_rank is a lot better because each one is private to a thread.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|