Discussion: [petsc-users] Scaling with number of cores
TAY wee-beng
2015-10-31 16:34:44 UTC
Hi,

I understand that, as mentioned in the FAQ, the scaling is not linear due
to memory limitations. So I am trying to write a proposal to use a
supercomputer.

Its specs are:

Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)

8 cores / processor

Interconnect: Tofu (6-dimensional mesh/torus) Interconnect

Each cabinet contains 96 compute nodes.

One of the requirements is to give the performance of my current code
with my current set of data, and there is a formula to calculate the
estimated parallel efficiency when using the new, larger data set.

There are 2 ways to give performance:
1. Strong scaling, which is defined as how the elapsed time varies with
the number of processors for a fixed problem size.
2. Weak scaling, which is defined as how the elapsed time varies with
the number of processors for a fixed problem size per processor.

I ran my cases with 48 and 96 cores on my current cluster, giving 140
and 90 mins respectively. This is classified as strong scaling.

Cluster specs:

CPU: AMD 6234 2.4GHz

8 cores / processor (CPU)

6 CPU / node

So 48 cores / node

Not sure about the memory / node.


The parallel efficiency ‘En’ for a given degree of parallelism ‘n’
indicates how efficiently the program is accelerated by parallel
processing. ‘En’ is given by the following formulae. Although the
derivations differ between strong and weak scaling, the resulting
formulae are the same.

From the estimated time, my parallel efficiency using Amdahl's law on
the current old cluster was 52.7%.

So are my results acceptable?

For the large data set, if using 2205 nodes (2205 x 8 cores), my expected
parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.

Is it possible to get this type of scaling (> 50%) in PETSc when using
17640 (2205 x 8) cores?
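
As a rough cross-check of these numbers: the two timings above imply an
Amdahl serial fraction, which can then be extrapolated. The sketch below
assumes the proposal's efficiency formula has the standard Amdahl form
En = 1 / (1 + s*(n-1)); the actual formula in the proposal may differ.

  /* Sketch: estimate the Amdahl serial fraction s from two strong-scaling
     timings and project the parallel efficiency at a larger core count.
     Assumes E(n) = 1 / (1 + s*(n-1)); the proposal's formula may differ. */
  #include <stdio.h>

  int main(void)
  {
    double n1 = 48.0, t1 = 140.0;   /* measured: 48 cores, 140 min */
    double n2 = 96.0, t2 = 90.0;    /* measured: 96 cores,  90 min */

    /* Amdahl: T(n) = T1*(s + (1-s)/n); take the ratio t1/t2 and solve for s */
    double r = t1 / t2;
    double s = (r / n2 - 1.0 / n1) / ((1.0 - 1.0 / n1) - r * (1.0 - 1.0 / n2));

    double nbig = 2205.0 * 8.0;     /* 17640 cores on the target machine */
    double ebig = 1.0 / (1.0 + s * (nbig - 1.0));

    printf("serial fraction s  ~ %.4f\n", s);
    printf("projected E(17640) ~ %.2f%%\n", 100.0 * ebig);
    return 0;
  }

With the 140/90 min timings this gives a serial fraction of roughly 0.8%
and a projected efficiency well below 1% at 17640 cores, in the same
ballpark as the 0.5% figure above.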

Btw, I do not have access to the system.




Matthew Knepley
2015-10-31 16:47:51 UTC
Post by TAY wee-beng
From the estimated time, my parallel efficiency using Amdahl's law on the
current old cluster was 52.7%.
So are my results acceptable?
For the large data set, if using 2205 nodes (2205 x 8 cores), my expected
parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
The problem with this analysis is that the estimated serial fraction from
Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and
apply it to another without a
model of this dependence.

Weak scaling does model changes with problem size, so I would measure weak
scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not
make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
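
As a rough sketch of what that measurement looks like: with the work per
process held fixed, the weak-scaling efficiency is just the ratio of the
base run time to the larger run time (the timings below are placeholders,
not your data).

  /* Sketch: weak-scaling efficiency from two runs with the same work per
     process, e.g. a half-size grid on 48 cores vs the full grid on 96.
     The timings are placeholders, not measurements from this thread. */
  #include <stdio.h>

  int main(void)
  {
    double t_base = 100.0;  /* minutes: n0 processes, problem size N    */
    double t_big  = 110.0;  /* minutes: 2*n0 processes, problem size 2N */
    printf("weak-scaling efficiency = %.1f%%\n", 100.0 * t_base / t_big);
    return 0;
  }

Measuring this at a few core counts on the current cluster gives the trend
to extrapolate from.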

Thanks,

Matt
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
TAY wee-beng
2015-11-01 01:43:06 UTC
Post by Matthew Knepley
Weak scaling does model changes with problem size, so I would measure
weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does
not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
OK, I checked the results for my weak scaling, and the expected parallel
efficiency is even worse. From the formula used, it's obviously doing
some sort of exponential extrapolation/decrease. So unless I can achieve
close to (or above) 90% speed-up when I double the cores and the problem
size for my current 48/96-core setup, extrapolating from about 96 nodes
to 10,000 nodes will give a much lower expected parallel efficiency for
the new case.

However, the FAQ mentions that, due to memory requirements, it's
impossible to get > 90% speed-up when I double the cores and the problem
size (i.e. a linear increase in performance), which means that I can't
get > 90% speed-up when I double the cores and problem size for my
current 48/96-core setup. Is that so?

So is it fair to say that the main problem does not lie in my
programming skills, but rather in the way the linear equations are solved?

Thanks.
Barry Smith
2015-11-01 02:00:49 UTC
Post by TAY wee-beng
However, the FAQ mentions that, due to memory requirements, it's impossible to get > 90% speed-up when I double the cores and the problem size (i.e. a linear increase in performance), which means that I can't get > 90% speed-up when I double the cores and problem size for my current 48/96-core setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
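
For example, with the executable from this thread and whatever MPI launcher your cluster uses (mpiexec/mpirun), that would look something like

  mpiexec -n 48 ./a.out -ksp_view -log_summary > log48.txt
  mpiexec -n 96 ./a.out -ksp_view -log_summary > log96.txt

-ksp_view reports which Krylov method and preconditioner were actually used, and -log_summary prints the timing table at PetscFinalize().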

Barry
TAY wee-beng
2015-11-01 13:30:50 UTC
Post by Barry Smith
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,

I have attached the output

48 cores: log48
96 cores: log96

There are 2 solvers: the momentum linear eqn uses bcgs, while the
Poisson eqn uses hypre BoomerAMG.

Problem size doubled from 158x266x150 to 158x266x300.
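
For reference, that solver combination corresponds to runtime options along
these lines, assuming the two KSPs use the momentum_/poisson_ options
prefixes that appear later in this thread (the code may instead set the
types directly with KSPSetType/PCSetType):

  -momentum_ksp_type bcgs
  -poisson_pc_type hypre -poisson_pc_hypre_type boomeramg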
Barry Smith
2015-11-01 16:30:52 UTC
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.

Anyway, all the time is being spent in the BoomerAMG algebraic multigrid setup, and it is scaling badly. When you double the problem size and the number of processes, it went from 3.2445e+01 to 4.3599e+02 seconds.

PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11

PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2

Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it at each timestep it is too large.

You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
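
If the Poisson matrix really is the same every timestep, the usual pattern is to create the KSP once, set the operator once, and just call KSPSolve with the new right-hand side each step; the preconditioner is then built on the first solve and reused. A minimal sketch in C with the standard PETSc calls (error checking and assembly omitted; adapt the prefix and names to your code):

  #include <petscksp.h>

  /* Sketch: build the BoomerAMG preconditioner once and reuse it for every
     timestep when only the right-hand side changes. */
  PetscErrorCode solve_poisson_all_steps(Mat A, Vec *rhs, Vec *sol, PetscInt nsteps)
  {
    KSP      ksp;
    PetscInt step;

    PetscFunctionBeginUser;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);                 /* the fixed Poisson matrix, set once */
    KSPSetOptionsPrefix(ksp, "poisson_");
    KSPSetFromOptions(ksp);                     /* e.g. -poisson_pc_type hypre        */
    KSPSetReusePreconditioner(ksp, PETSC_TRUE); /* keep the PC even if A is refilled  */

    for (step = 0; step < nsteps; step++) {
      /* ... fill rhs[step] for this timestep ... */
      KSPSolve(ksp, rhs[step], sol[step]);      /* PCSetUp runs only on the first solve */
    }
    KSPDestroy(&ksp);
    PetscFunctionReturn(0);
  }

Comparing against PETSc's own algebraic multigrid is then just a matter of switching the option to -poisson_pc_type gamg at run time.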

Barry
TAY wee-beng
2015-11-02 01:35:47 UTC
Hi,

Sorry, I forgot and used the old a.out. I have attached the new log for
48 cores (log48), together with the 96-core log (log96).

Why does the number of processes increase so much? Is there something
wrong with my coding?

Only the Poisson eqn's RHS changes; the LHS doesn't. So if I want to
reuse the preconditioner, what must I do? Or what must I not do?

Lastly, I only simulated 2 time steps previously. Now I run for 10
timesteps (log48_10). Is it building the preconditioner at every timestep?

Also, what about the momentum eqn? Is it working well?

I will try the gamg later too.

Thank you

Yours sincerely,

TAY wee-beng
Barry Smith
2015-11-02 01:49:36 UTC
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
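
A toy illustration of why the step count matters: with a one-time AMG setup cost, the average cost per timestep only approaches the pure solve cost once there are many steps (the numbers below are made up, not taken from your logs).

  #include <stdio.h>

  /* Sketch: a one-time setup cost amortized over an increasing number of steps. */
  int main(void)
  {
    double setup = 30.0;            /* one-time PCSetUp cost (s), hypothetical   */
    double step  = 5.0;             /* per-timestep solve cost (s), hypothetical */
    int    nsteps[] = {2, 10, 100, 1000};

    for (int i = 0; i < 4; ++i) {
      int    n   = nsteps[i];
      double avg = (setup + n * step) / n;      /* average wall time per step */
      printf("steps = %4d  avg/step = %6.2f s  setup share = %5.1f%%\n",
             n, avg, 100.0 * setup / (setup + n * step));
    }
    return 0;
  }

With only 2 steps the setup dominates the comparison; with 100 or more it barely matters.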

Barry
TAY wee-beng
2015-11-02 04:02:33 UTC
Hi,

I have attached the new run with 100 time steps for 48 and 96 cores.

Only the Poisson eqn's RHS changes; the LHS doesn't. So if I want to
reuse the preconditioner, what must I do? Or what must I not do?

Why does the number of processes increase so much? Is there something
wrong with my coding? It seems to be so for my new run too.

Thank you

Yours sincerely,

TAY wee-beng
Barry Smith
2015-11-02 04:27:58 UTC
Run without the -momentum_ksp_view -poisson_ksp_view options and send the new results.


You can see from the log summary that PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.

Barry

Something makes no sense with the output: it gives

KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165

90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your I/O.
TAY wee-beng
2015-11-02 06:19:49 UTC
Hi,

I have attached the new results.

Thank you

Yours sincerely,

TAY wee-beng
Barry Smith
2015-11-02 06:55:35 UTC
Run a (158/2)x(266/2)x(150/2) grid on 8 processes and then the full (158)x(266)x(150) grid on 64 processes, and send the two -log_summary results.

Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
OK, I checked the results for my weak scaling; the expected parallel efficiency is even worse. From the formula used, it's obviously doing some sort of exponential extrapolation decrease. So unless I can achieve a near >90% speedup when I double the cores and problem size for my current 48/96-core setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirements it's impossible to get >90% speedup when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speedup when I double the cores and problem size for my current 48/96-core setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved?
Thanks.
Thanks,
Matt
TAY wee-beng
2015-11-02 09:17:06 UTC
Permalink
Hi,

I have attached the 2 files (log8_100 and log64_100).

Thank you

Yours sincerely,

TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your I/O.
Barry Smith
2015-11-02 19:18:37 UTC
Permalink
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way to determine why the scaling is poor.

If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about which routines are scaling well or poorly.

Barry
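For reference, the two comparison runs could look something like the following (the executable name a.out and the -poisson_ prefix follow what is already used in this thread; adjust to the actual binary, prefixes, and grid setup):

  mpiexec -n 8  ./a.out -poisson_pc_type gamg -log_summary     # (158/2)x(266/2)x(150/2) grid
  mpiexec -n 64 ./a.out -poisson_pc_type gamg -log_summary     # 158x266x150 grid

Because GAMG is built from PETSc's own matrix operations, its setup shows up as individual events (e.g. MatPtAP) in -log_summary instead of a single opaque PCSetUp, which is what makes the comparison with hypre's setup time possible.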
TAY wee-beng
2015-11-03 04:37:12 UTC
Permalink
Hi,

I tried:

1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg

2. -poisson_pc_type gamg

Both options give:

1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9

How can I check what's wrong?

Thank you

Yours sincerely,

TAY wee-beng
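One way to see what that -9 means, assuming it is the KSPConvergedReason the code prints after the momentum ("M") solve: in the PETSc versions of that time, -9 corresponds to KSP_DIVERGED_NANORINF, i.e. a NaN or Inf appeared during the solve, which is consistent with the NaN residuals printed above. A hypothetical fragment, not from the original code (error checking omitted), that prints the reason as text:

  #include <petscksp.h>

  KSPConvergedReason reason;
  PetscInt           its;

  KSPSolve(ksp, b, x);                      /* ksp, b, x: the momentum (or Poisson) solver and its vectors */
  KSPGetConvergedReason(ksp, &reason);
  KSPGetIterationNumber(ksp, &its);
  PetscPrintf(PETSC_COMM_WORLD, "solver finished with reason %s after %D iterations\n",
              KSPConvergedReasons[reason], its);

The same information is available at run time from -poisson_ksp_converged_reason and -momentum_ksp_converged_reason, and -poisson_ksp_monitor_true_residual shows whether the true residual is already NaN at the first iteration.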
Barry Smith
2015-11-03 04:45:03 UTC
Permalink
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason
Does your Poisson equation have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't).

There may be something wrong with your Poisson discretization that was also messing up hypre.
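A quick programmatic check for the diagonal question (a hypothetical fragment, not from the original code, with A the assembled Poisson matrix and b its RHS vector; error checking omitted): pull out the diagonal and look at its smallest-magnitude entry.

  Vec       diag;
  PetscReal dmin;
  PetscInt  row;

  VecDuplicate(b, &diag);                  /* vector with the same layout as the matrix rows */
  MatGetDiagonal(A, diag);
  VecAbs(diag);                            /* interested in |a_ii| */
  VecMin(diag, &row, &dmin);
  PetscPrintf(PETSC_COMM_WORLD, "smallest |diagonal entry| = %g at row %D\n", (double)dmin, row);
  VecDestroy(&diag);

A value at or very near zero points at the discretization or an unassembled boundary row rather than at the solver.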
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
TAY wee-beng
2015-11-03 12:49:06 UTC
Permalink
Hi,

I tried and have attached the log.

Yes, my Poisson eqn has Neumann boundary conditions. Do I need to specify a null space, for example with KSPSetNullSpace or MatNullSpaceCreate?

Thank you

Yours sincerely,

TAY wee-beng
Matthew Knepley
2015-11-03 12:52:30 UTC
Permalink
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Yes, my Poisson eqn has Neumann boundary conditions. Do I need to specify a null space, for example with KSPSetNullSpace or MatNullSpaceCreate?
Yes, you need to attach the constant null space to the matrix.

Thanks,

Matt
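A minimal sketch of what "attach the constant null space" can look like in C for the singular Neumann Poisson matrix (PETSc 3.5/3.6-era API; A and poisson_ksp are assumed names for the assembled matrix and its solver, not taken from the poster's code):

  MatNullSpace   nullsp;
  PetscErrorCode ierr;

  ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr); /* "constants are in the null space" */
  ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);                 /* the KSP/PC picks this up during the solve */
  /* ierr = KSPSetNullSpace(poisson_ksp, nullsp);CHKERRQ(ierr); */ /* older alternative, removed in later versions */
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);               /* the matrix keeps its own reference */

Depending on the discretization, the right-hand side may also have to be made consistent (e.g. with MatNullSpaceRemove) so that the singular system actually has a solution.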
TAY wee-beng
2015-11-03 12:58:23 UTC
Permalink
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Yes, my Poisson eqn has Neumann boundary conditions. Do I need to specify a null space, for example with KSPSetNullSpace or MatNullSpaceCreate?
Yes, you need to attach the constant null space to the matrix.
Thanks,
Matt
OK, so can you point me to a suitable example so that I know which one to use specifically?

Thanks.
Matthew Knepley
2015-11-03 13:01:18 UTC
Permalink
Post by Matthew Knepley
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify
some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Yes, you need to attach the constant null space to the matrix.
Thanks,
Matt
Ok so can you point me to a suitable example so that I know which one to
use specifically?
https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761
Matt
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
Mark Adams
2015-11-03 14:00:20 UTC
Permalink
If you clean the RHS of the null space then you can probably (only) use "-mg_coarse_pc_type svd", but you need this or the LU coarse grid solver will have problems. If your solution starts drifting then you need to set the null space, but this is often not needed.
Also, after you get this working you want to check these PCSetUp times. This setup will not scale perfectly, but the behavior here indicates that something is wrong. Hypre's default parameters are tuned for 2D problems; you have a 3D problem, I assume. GAMG should be fine. As a rule of thumb the PCSetUp should not cost much more than a solve. An easy 3D Poisson solve might require relatively more setup and a hard 2D problem might require relatively less.
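As a concrete illustration, my guess at the spelling of the options, assuming the Poisson solve keeps the "poisson_" prefix used earlier in this thread and that the launcher and core count are only placeholders (a.out is the executable name mentioned above); adjust to your own setup:
mpiexec -n 96 ./a.out -poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1 -poisson_mg_coarse_pc_type svd -poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason -log_summary
The PCSetUp and KSPSolve lines of the resulting -log_summary can then be compared directly against the hypre runs quoted earlier.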
TAY wee-beng
2015-11-03 15:04:56 UTC
Permalink
Post by TAY wee-beng
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need
to specify some null space stuff? Like KSPSetNullSpace or
MatNullSpaceCreate?
Yes, you need to attach the constant null space to the matrix.
Thanks,
Matt
Ok so can you point me to a suitable example so that I know which
one to use specifically?
https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761
Matt
Hi,
Actually, I realised that for my Poisson eqn, I have Neumann and Dirichlet BCs. The Dirichlet BC is at the output grids, by specifying pressure = 0. So do I still need the null space?
My Poisson eqn LHS is fixed but the RHS is changing with every timestep.
If I need to use the null space, how do I know whether the null space contains the constant vector, and what the no. of vectors is? I followed the example given and added:
call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,NULL,nullsp,ierr)
call MatSetNullSpace(A,nullsp,ierr)
call MatNullSpaceDestroy(nullsp,ierr)
Is that all?
Before this, I was using the HYPRE geometric solver, and the matrix / vector in the subroutine were written based on HYPRE. It worked pretty well and fast.
However, it's a black box and it's hard to diagnose problems.
I always had the PETSc subroutine to solve my Poisson eqn, but I used KSPBCGS or KSPGMRES with HYPRE's boomeramg as the PC. It worked but was slow.
Matt: Thanks, I will see how it goes using the nullspace and may try "-mg_coarse_pc_type svd" later.
Barry Smith
2015-11-03 17:04:58 UTC
Permalink
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Yes, you need to attach the constant null space to the matrix.
Thanks,
Matt
Ok so can you point me to a suitable example so that I know which one to use specifically?
https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761
Matt
Hi,
Actually, I realised that for my Poisson eqn, I have neumann and dirichlet BC. Dirichlet BC is at the output grids by specifying pressure = 0. So do I still need the null space?
No,
My Poisson eqn LHS is fixed but RHS is changing with every timestep.
call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,NULL,nullsp,ierr)
call MatSetNullSpace(A,nullsp,ierr)
call MatNullSpaceDestroy(nullsp,ierr)
Is that all?
Before this, I was using HYPRE geometric solver and the matrix / vector in the subroutine was written based on HYPRE. It worked pretty well and fast.
However, it's a black box and it's hard to diagnose problems.
I always had the PETSc subroutine to solve my Poisson eqn but I used KSPBCGS or KSPGMRES with HYPRE's boomeramg as the PC. It worked but was slow.
Matt: Thanks, I will see how it goes using the nullspace and may try "-mg_coarse_pc_type svd" later.
Thanks.
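A short note on the reasoning behind the "No" above (a standard argument, not part of the original message): for a pure-Neumann discrete Poisson operator the constant vector c = (1, 1, ..., 1)^T satisfies A c = 0, which is exactly why the constant null space would otherwise have to be attached. Fixing pressure = 0 at the outlet, however it is imposed in the discretization, removes the constant from the null space: A c is no longer zero in the rows touched by the Dirichlet condition, the matrix is nonsingular, and no null space needs to be attached.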
TAY wee-beng
2015-11-03 15:16:23 UTC
Permalink
Post by TAY wee-beng
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need
to specify some null space stuff? Like KSPSetNullSpace or
MatNullSpaceCreate?
Yes, you need to attach the constant null space to the matrix.
Thanks,
Matt
Ok so can you point me to a suitable example so that I know which
one to use specifically?
https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761
Matt
Oh ya,
How do I call:
call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,NULL,nullsp,ierr)
But it says NULL is not defined. How do I define it?
Thanks
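The usual fix, sketched under the assumption of a PETSc release from around the time of this thread: the Fortran interface cannot take a bare NULL, so PETSc provides named placeholder objects. In those releases the placeholder for an omitted object argument is PETSC_NULL_OBJECT; later releases use type-specific names such as PETSC_NULL_VEC for a Vec argument.
! pass a PETSc null object, not NULL, for the "no extra null-space vectors" argument
call MatNullSpaceCreate(MPI_COMM_WORLD, PETSC_TRUE, 0, PETSC_NULL_OBJECT, nullsp, ierr)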
TAY wee-beng
2015-11-03 15:21:20 UTC
Permalink
Post by TAY wee-beng
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need
to specify some null space stuff? Like KSPSetNullSpace or
MatNullSpaceCreate?
Yes, you need to attach the constant null space to the matrix.
Thanks,
Matt
Ok so can you point me to a suitable example so that I know which
one to use specifically?
https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761
Matt
OK, I did a search and found the answer for the MatNullSpaceCreate call:
http://petsc-users.mcs.anl.narkive.com/jtIlVll0/pass-petsc-null-integer-to-dynamic-array-of-vec-in-frotran90
Post by TAY wee-beng
Thanks.
Post by TAY wee-beng
Thank you
Yours sincerely,
TAY wee-beng
On Nov 2, 2015, at 10:37 PM, TAY
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual
-poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us there is no way to determine why the scaling is poor.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
On Nov 2, 2015, at 3:17 AM, TAY
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
On Nov 2, 2015, at 12:19 AM, TAY
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.
On Nov 1, 2015, at 10:02 PM,
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 9:49 AM, Barry
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
On Nov 1, 2015, at 7:35 PM, TAY
Hi,
Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96-core log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about the momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 12:30
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.
Anyways, all the time is being spent in the BoomerAMG algebraic multigrid setup and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set-up time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it is too large.
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Barry Smith
2015-11-03 17:11:36 UTC
Permalink
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.

Barry
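(For reference, the two comparison runs requested above might be launched along these lines; this is only a sketch, and the executable name a.out and the poisson_ option prefix are assumptions carried over from earlier messages in the thread:

    mpiexec -n 8  ./a.out -poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1 -log_summary > log8
    mpiexec -n 64 ./a.out -poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1 -log_summary > log64

with the grid set to (158/2)x(266/2)x(150/2) for the 8-process case and 158x266x150 for the 64-process case, as requested earlier in the thread.)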
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
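(For reference, the two routines TAY asks about just above are used roughly as in the following C sketch; the matrix name A_poisson is an assumption, and note Barry's advice later in the thread that the null space is not needed here:

    MatNullSpace nullsp;
    /* pure Neumann Poisson problem: the constant vector lies in the null space */
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
    ierr = MatSetNullSpace(A_poisson, nullsp);CHKERRQ(ierr);  /* older PETSc versions used KSPSetNullSpace(ksp, nullsp) */
    ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);

With the null space attached, PETSc projects it out during the Krylov solve.)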
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us there is no way to determine why the scaling is poor.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
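(A sketch of the usual pattern when only the right-hand side changes from step to step: create the solver once, set the operator once, and then just call KSPSolve every time step. The variable names are assumptions about how the code might be organised:

    /* setup, done once */
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp_poisson);CHKERRQ(ierr);
    ierr = KSPSetOptionsPrefix(ksp_poisson, "poisson_");CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp_poisson, A_poisson, A_poisson);CHKERRQ(ierr);   /* LHS never changes */
    ierr = KSPSetReusePreconditioner(ksp_poisson, PETSC_TRUE);CHKERRQ(ierr);   /* optional: keep the PC even if the operator is re-set */
    ierr = KSPSetFromOptions(ksp_poisson);CHKERRQ(ierr);

    /* every time step: only b_poisson changes */
    ierr = KSPSolve(ksp_poisson, b_poisson, x_poisson);CHKERRQ(ierr);

As long as the matrix passed to KSPSetOperators is not modified or re-set, the preconditioner built during the first solve is reused in all later time steps.)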
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96-core log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.
Anyways, all the time is being spent in the BoomerAMG algebraic multigrid setup and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set-up time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it is too large.
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 cores / node
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how efficiently the program is accelerated by parallel processing. ‘En’ is given by the following formulae. Although their derivation processes differ between strong and weak scaling, the derived formulae are the same.
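(The proposal's exact formulae are not reproduced in this thread, but the standard definitions they are presumably based on are, for n processes with elapsed time T_n,

    E_n^{\mathrm{strong}} = \frac{T_1}{n\,T_n}, \qquad E_n^{\mathrm{weak}} = \frac{T_1}{T_n}.

Measured relative to the 48-core run rather than a serial run, the strong-scaling efficiency of the 96-core case is

    E_{96/48} = \frac{48\,T_{48}}{96\,T_{96}} = \frac{48 \times 140}{96 \times 90} \approx 0.78,

i.e. about 78% between 48 and 96 cores; the 52.7% figure quoted next comes from the proposal's own Amdahl-based formula, which is not shown here.)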
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So are my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok, I checked the results for my weak scaling; the expected parallel efficiency is even worse. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a nearly >90% speed-up when I double the cores and problem size for my current 48/96-core setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirements, it's impossible to get a >90% speed-up when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get a >90% speed-up when I double the cores and problem size for my current 48/96-core setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
TAY wee-beng
2015-11-05 03:30:39 UTC
Permalink
Hi,

I have attached the 2 logs.

Thank you

Yours sincerely,

TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
Barry Smith
2015-11-05 04:03:54 UTC
Permalink
There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner; it should have rows like

VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637


Are you sure you ran with -pc_type gamg? What about running with -info: does it print anything about gamg? What about -ksp_view: does it indicate it is using the gamg preconditioner?
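(One way to confirm from inside the code which preconditioner was actually used; a sketch, with ksp_poisson an assumed name for the Poisson solver:

    PC     pc;
    PCType pctype;
    ierr = KSPGetPC(ksp_poisson, &pc);CHKERRQ(ierr);
    ierr = PCGetType(pc, &pctype);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Poisson PC type: %s\n", pctype);CHKERRQ(ierr);

On the command line, -poisson_ksp_view after the solve reports the same information. Note also that when an options prefix is used, the bare option -pc_type gamg is ignored; it has to be given as -poisson_pc_type gamg.)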
Post by TAY wee-beng
Hi,
I have attached the 2 logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
<log64_100_2.txt><log8_100_2.txt>
TAY wee-beng
2015-11-05 15:58:15 UTC
Permalink
Sorry, I realised that I didn't use gamg and that's why. If I use gamg, the 8-core case worked, but the 64-core case shows that p diverged.

Why is this so? Btw, I have also added nullspace in my code.

Thank you.

Yours sincerely,

TAY wee-beng
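(To pin down why p diverged on 64 cores, the converged reason can be checked after the solve; a sketch with assumed names. If I read the KSPConvergedReason values of this PETSc era correctly, the reason = -9 reported earlier in the thread is KSP_DIVERGED_NANORINF, i.e. a NaN or Inf appeared during the solve:

    KSPConvergedReason reason;
    ierr = KSPGetConvergedReason(ksp_poisson, &reason);CHKERRQ(ierr);
    if (reason < 0) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "Poisson solve diverged, reason %d\n", (int)reason);CHKERRQ(ierr);
    }

The option -poisson_ksp_converged_reason prints the same diagnosis without any code changes.)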
Post by Barry Smith
There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like
VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637
Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner?
Post by TAY wee-beng
Hi,
I have attached the 2 logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
<log64_100_2.txt><log8_100_2.txt>
Barry Smith
2015-11-05 16:06:30 UTC
Permalink
Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged.
Why is this so? Btw, I have also added nullspace in my code.
You don't need the null space and should not add it.
Thank you.
Yours sincerely,
TAY wee-beng
Post by Barry Smith
There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like
VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637
Are you sure you ran with -pc_type gamg? If you run with -info, does it print anything about gamg? Does -ksp_view indicate it is using the gamg preconditioner?
Post by TAY wee-beng
Hi,
I have attached the 2 logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary conditions. Do I need to specify some null space stuff, like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
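A minimal sketch of what "specifying the null space stuff" would look like in the PETSc C API, for the constant null space of a pure-Neumann Poisson operator (the Mat A below is a placeholder, not the poster's actual code; note that the advice in this exchange was ultimately not to add it):

  #include <petscksp.h>
  /* Sketch: attach the constant null space of a pure-Neumann Poisson matrix so the
     Krylov solver projects it out of the residual and the solution. */
  static PetscErrorCode AttachConstantNullSpace(Mat A)
  {
    MatNullSpace   nsp;
    PetscErrorCode ierr;
    ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_TRUE, 0, NULL, &nsp);CHKERRQ(ierr);
    ierr = MatSetNullSpace(A, nsp);CHKERRQ(ierr);   /* newer replacement for KSPSetNullSpace() */
    ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
    return 0;
  }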
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason
Does your Poisson problem have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't).
There may be something wrong with your Poisson discretization that was also messing up hypre.
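For context, options such as -poisson_pc_type gamg are picked up by whichever KSP was given the matching options prefix. A minimal sketch of that wiring, assuming a separate KSP is used for the pressure solve (all names here are placeholders):

  #include <petscksp.h>
  /* Sketch: a KSP carrying the "poisson_" options prefix, so -poisson_pc_type gamg,
     -poisson_ksp_monitor_true_residual, etc. apply to this solver only. */
  static PetscErrorCode SolvePoisson(Mat A, Vec b, Vec x)
  {
    KSP            ksp;
    PetscErrorCode ierr;
    ierr = KSPCreate(PetscObjectComm((PetscObject)A), &ksp);CHKERRQ(ierr);
    ierr = KSPSetOptionsPrefix(ksp, "poisson_");CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);   /* PETSc 3.5+ two-Mat form */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);       /* reads the -poisson_* options */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    return 0;
  }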
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
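As an aside on "reason = -9": the divergence reason can also be queried in code. A minimal sketch (the ksp variable is a placeholder for the pressure solver); in the PETSc versions of that era, -9 corresponds to KSP_DIVERGED_NANORINF, i.e. a NaN or Inf appeared during the solve, which matches the NaN residuals printed above:

  #include <petscksp.h>
  /* Sketch: report why a solve stopped; negative reasons mean divergence. */
  static PetscErrorCode ReportDivergence(KSP ksp)
  {
    KSPConvergedReason reason;
    PetscErrorCode     ierr;
    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    if (reason == KSP_DIVERGED_NANORINF) {        /* printed as -9 above */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "NaN/Inf in the solve; check matrix and RHS assembly\n");CHKERRQ(ierr);
    } else if (reason < 0) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "Solve diverged, reason = %d\n", (int)reason);CHKERRQ(ierr);
    }
    return 0;
  }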
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way to determine why the scaling is poor.
If you make the same two runs with -pc_type gamg, there will be a lot more information in the log summary about which routines are scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run a (158/2)x(266/2)x(150/2) grid on 8 processes and then a (158)x(266)x(150) grid on 64 processes, and send the two -log_summary results.
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
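On reusing the preconditioner when only the RHS changes: a minimal sketch, assuming the Poisson matrix is assembled once and the time loop only refills the right-hand side (the function and variable names are placeholders). As long as KSPSetOperators() is not called again with a modified matrix, the AMG setup from the first solve is kept; KSPSetReusePreconditioner() (available from PETSc 3.5) makes that reuse explicit.

  #include <petscksp.h>
  /* Sketch: build the pressure solver once, then solve with a new RHS each time step. */
  static PetscErrorCode TimeLoopPressureSolve(Mat A, Vec b, Vec x, PetscInt nsteps)
  {
    KSP            ksp;
    PetscInt       step;
    PetscErrorCode ierr;
    ierr = KSPCreate(PetscObjectComm((PetscObject)A), &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);                 /* LHS assembled once     */
    ierr = KSPSetReusePreconditioner(ksp, PETSC_TRUE);CHKERRQ(ierr); /* keep the AMG hierarchy */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    for (step = 0; step < nsteps; step++) {
      /* ... refill b with this time step's right-hand side ... */
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* PCSetUp happens on the first call only */
    }
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    return 0;
  }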
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your I/O.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn's RHS changes; the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver, then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first time step. So run both 48 and 96 processes with the same large number of time steps.
Barry
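One way to make such a study easier to read is to put the time loop in its own logging stage, so -log_summary reports the first-step AMG setup separately from the later steps. A minimal sketch (where exactly the stage is pushed is an assumption about the code structure):

  #include <petscsys.h>
  PetscLogStage loop_stage;
  PetscLogStageRegister("TimeLoop", &loop_stage);
  /* ... assembly and the first solve (which includes the AMG setup) happen here ... */
  PetscLogStagePush(loop_stage);
  /* ... remaining time steps: solves with an updated RHS only ... */
  PetscLogStagePop();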
Hi,
Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96-core log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn's RHS changes; the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processes, since you can get very different, inconsistent results.
Anyway, all the time is being spent in the BoomerAMG algebraic multigrid setup, and it is scaling badly. When you double the problem size and the number of processes, it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it at each timestep it may be too large.
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
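For reference, the command-line switch between the two preconditioners corresponds roughly to the following in the C API (a sketch; ksp is the solver's KSP object, setting -pc_type on the command line is enough, and the hypre branch assumes PETSc was built with hypre support):

  #include <petscksp.h>
  PC pc;
  KSPGetPC(ksp, &pc);
  /* PETSc's native algebraic multigrid: */
  PCSetType(pc, PCGAMG);
  /* or hypre's BoomerAMG instead:
     PCSetType(pc, PCHYPRE);  PCHYPRESetType(pc, "boomeramg"); */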
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
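As a rough illustration of that point, fitting Amdahl's law T(n) = T(1)(s + (1-s)/n) to the two runs quoted in this thread (140 min on 48 cores, 90 min on 96 cores), i.e. solving 140/90 = (s + (1-s)/48)/(s + (1-s)/96), gives a serial fraction s of roughly 0.8%. The corresponding efficiency E(n) = T(1)/(n T(n)) = 1/(s n + 1 - s) is then about 72% at 48 cores, 56% at 96 cores, and well under 1% at 17,640 cores, broadly in line with the low figure quoted above. That fitted s describes only this problem size, which is exactly why a strong-scaling fit on the small data set cannot be carried over to the large one.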
Ok, I checked the results for my weak scaling; the expected parallel efficiency is even worse. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve nearly >90% speed-up when I double the cores and problem size for my current 48/96-core setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that, due to memory requirements, it's impossible to get >90% speed-up when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speed-up when I double the cores and problem size for my current 48/96-core setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
<log64_100_2.txt><log8_100_2.txt>
Barry Smith
2015-11-05 16:07:46 UTC
Permalink
Sorry, I realised that I didn't use gamg, and that's why. But if I use gamg, the 8-core case worked, while the 64-core case shows that p diverged.
Where is the log file for the 8-core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason.
Barry
Why is this so? Btw, I have also added nullspace in my code.
Thank you.
Yours sincerely,
TAY wee-beng
TAY wee-beng
2015-11-06 02:47:39 UTC
Permalink
Hi,

I have removed the nullspace and attached the new logs.

Thank you

Yours sincerely,

TAY wee-beng
Barry Smith
2015-11-06 04:08:31 UTC
Permalink
Ok, the 64-core case not converging makes no sense.

Run it with -ksp_monitor and -ksp_converged_reason for the pressure solve turned on, and -info.

You need to figure out why it is not converging.

Barry
Post by TAY wee-beng
Hi,
I have removed the nullspace and attached the new logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged.
Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason
Barry
Why is this so? Btw, I have also added nullspace in my code.
Thank you.
Yours sincerely,
TAY wee-beng
Post by Barry Smith
There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like
VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637
Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner?
Post by TAY wee-beng
Hi,
I have attached the 2 logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok, I checked the results for my weak scaling, and the expected parallel efficiency is even worse. From the formula used, it's obviously doing some sort of exponential decrease in the extrapolation. So unless I can achieve nearly >90% speedup when I double the cores and problem size for my current 48/96-core setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, the FAQ mentions that due to memory requirements it's impossible to get >90% speedup when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speedup when I double the cores and problem size for my current 48/96-core setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
<log64_100_2.txt><log8_100_2.txt>
<log8_100_3.txt><log64_100_3.txt>
TAY wee-beng
2015-11-06 05:16:52 UTC
Permalink
Post by Barry Smith
Ok the 64 case not converging makes no sense.
Run it with -ksp_monitor and -ksp_converged_reason turned on for the pressure solve, and with -info.
You need to figure out why it is not converging.
Barry
Hi,
I found out the reason: my partitioning is only in the z direction, and if I use 64 cores to partition 150 cells in the z direction, some partitions become too small, leading to an error.
So how can I test now? The original problem has 158x266x300 with 96
cores. How should I reduce it to test for scaling?
Thanks.
Post by Barry Smith
Post by TAY wee-beng
Hi,
I have removed the nullspace and attached the new logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged.
Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason
Barry
Why is this so? Btw, I have also added nullspace in my code.
Thank you.
Yours sincerely,
TAY wee-beng
Post by Barry Smith
There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like
VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637
Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner?
Post by TAY wee-beng
Hi,
I have attached the 2 logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Yes, my Poisson eqn has Neumann boundary conditions. Do I need to specify some null space, e.g. with KSPSetNullSpace or MatNullSpaceCreate?
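For reference, a minimal sketch of attaching the constant null space that a pure-Neumann Poisson matrix has (A stands for your assembled Poisson matrix; MatSetNullSpace is the newer interface, older PETSc versions also offered KSPSetNullSpace):
    MatNullSpace nullsp;
    /* The null space of the Neumann Poisson operator is the constant vector. */
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
    ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);
    ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);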
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your Poisson problem have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't)?
There may be something wrong with your Poisson discretization that was also messing up hypre.
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
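One way to check programmatically is to query the convergence reason after the solve; a minimal sketch, where ksp stands for the momentum solver's KSP:
    KSPConvergedReason reason;
    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    if (reason < 0) {
      /* Negative values mean divergence; in PETSc releases of this era, -9 is
         KSP_DIVERGED_NANORINF, i.e. a NaN or Inf appeared, matching the NaN printed above. */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "Solve diverged, reason = %d\n", (int)reason);CHKERRQ(ierr);
    }
The command-line options -ksp_converged_reason and -ksp_monitor_true_residual mentioned in this thread print the same information without any code changes.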
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way to determine why the scaling is poor.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
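(Both runs then have roughly the same load per process: (158/2)x(266/2)x(150/2) ≈ 0.79 million cells on 8 processes and 158x266x150 ≈ 6.3 million cells on 64 processes are each about 98,500 cells per process, so this pair of runs is a weak-scaling comparison.)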
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve, but there is no significant amount of time in the other events of the code, which is just not possible. I hope it is due to your I/O.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
<log64_100_2.txt><log8_100_2.txt>
<log8_100_3.txt><log64_100_3.txt>
Barry Smith
2015-11-06 05:26:14 UTC
Permalink
Post by Barry Smith
Ok the 64 case not converging makes no sense.
Run it with ksp_monitor and ksp_converged_reason for the pressure solve turned on and -info
You need to figure out why it is not converging.
Barry
Hi,
I found out the reason. Because my partitioning is only in the z direction, and if using 64cores to partition 150 cells in the z direction, some partitions will be too small, leading to error.
Oh, well this is your fundamental problem and why you don't get scaling! You need to have partitioning in all three directions or you will never get good scaling. This is fundamental; just fix your code to partition in all dimensions.
Barry
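If the grid is managed through a DMDA, one way to get a partition in all three directions is to let PETSc choose the process grid; a sketch for the 158x266x300 case (the boundary types, stencil choice, dof and stencil width here are assumptions and must match your discretization; depending on the PETSc version you may also need DMSetFromOptions/DMSetUp afterwards):
    DM da;
    /* PETSC_DECIDE for the m, n, p arguments lets PETSc split the grid in x, y and z
       instead of slicing only along z. */
    ierr = DMDACreate3d(PETSC_COMM_WORLD,
                        DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                        DMDA_STENCIL_STAR,
                        158, 266, 300,                            /* global grid size */
                        PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, /* processes in x, y, z */
                        1, 1,                                     /* dof, stencil width */
                        NULL, NULL, NULL, &da);CHKERRQ(ierr);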
So how can I test now? The original problem has 158x266x300 with 96 cores. How should I reduce it to test for scaling?
Thanks.
Post by Barry Smith
Post by TAY wee-beng
Hi,
I have removed the nullspace and attached the new logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged.
Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason
Barry
Why is this so? Btw, I have also added nullspace in my code.
Thank you.
Yours sincerely,
TAY wee-beng
Post by Barry Smith
There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like
VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637
Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner?
Post by TAY wee-beng
Hi,
I have attached the 2 logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
<log64_100_2.txt><log8_100_2.txt>
<log8_100_3.txt><log64_100_3.txt>
TAY wee-beng
2015-11-06 05:59:11 UTC
Permalink
Post by Barry Smith
Post by Barry Smith
Ok the 64 case not converging makes no sense.
Run it with ksp_monitor and ksp_converged_reason for the pressure solve turned on and -info
You need to figure out why it is not converging.
Barry
Hi,
I found out the reason. Because my partitioning is only in the z direction, and if using 64cores to partition 150 cells in the z direction, some partitions will be too small, leading to error.
Oh, well this is your fundamental problem and why you don't get scaling! You need to have partitioning in all three directions or you will never get good scaling! This is fundamental, just fix your code to have partitioning in all dimensions
Barry
Hi,
Ok, I'll make the change and compare again.
Thanks
Post by Barry Smith
So how can I test now? The original problem has 158x266x300 with 96 cores. How should I reduce it to test for scaling?
Thanks.
Post by Barry Smith
Post by TAY wee-beng
Hi,
I have removed the nullspace and attached the new logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged.
Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason
Barry
Why is this so? Btw, I have also added nullspace in my code.
Thank you.
Yours sincerely,
TAY wee-beng
Post by Barry Smith
There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like
VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637
Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner?
Post by TAY wee-beng
Hi,
I have attached the 2 logs.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
Barry
Post by TAY wee-beng
Hi,
I tried and have attached the log.
Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Post by TAY wee-beng
Hi,
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't).
There may be something wrong with your poisson discretization that was also messing up hypre
Post by TAY wee-beng
1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
M Diverged but why?, time = 2
reason = -9
How can I check what's wrong?
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling.
If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly.
Barry
Post by TAY wee-beng
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results
Barry
Post by TAY wee-beng
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO.
Post by TAY wee-beng
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
Hi,
Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96).
Why does the number of processes increase so much? Is there something wrong with my coding?
Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about momentum eqn? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
Post by Barry Smith
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results
Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large?
You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
Barry
Post by TAY wee-beng
Post by Barry Smith
Hi,
I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes,
One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data
1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed
problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a
fixed problem size per processor.
I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 Cores / CPU
Not sure abt the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
derivation processes are different depending on strong and weak scaling, derived formulae are the
same.
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
So is my results acceptable?
For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
Problem size doubled from 158x266x150 to 158x266x300.
Post by Barry Smith
So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
<log.txt>
<log64_100_2.txt><log8_100_2.txt>
<log8_100_3.txt><log64_100_3.txt>