Determine the latencies and bandwidths quantitatively

import numpy as np
from collections import OrderedDict as odict
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

This notebook computes the latencies and bandwidths of the three primitive function types. Again, start with a list of files and architectures that we want to investigate.

#(hardware name, number of nodes)
files = odict({})
files['i5'] = ('i5',1)
files['gtx1060'] = ('gtx1060',1)
files['skl_mpi1'] = ('skl',1)
files['skl_mpi2'] = ('skl',2)
files['skl_mpi4'] = ('skl',4)
files['knl_mpi1'] = ('knl',1)
files['knl_mpi2'] = ('knl',2)
files['knl_mpi4'] = ('knl',4)
files['p100nv_mpi1'] = ('p100',1)
files['p100nv_mpi2'] = ('p100',2)
files['p100nv_mpi4'] = ('p100',4)
files['titanxp'] = ('titanXp',1)
files['v100nv_mpi1'] = ('v100',1)
files['v100nv_mpi2'] = ('v100',2)
files['v100nv_mpi4'] = ('v100',4)

pd.set_option('display.precision', 2)

Assumptions

  • there are three basic function types: trivially parallel (axpby), nearest neighbor (dxdy), and global reduction (dot)

  • each can be characterized by its single-node bandwidth, single-node latency, and multi-node latency

But

  • this model does not capture cache effects, e.g. on SKL
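The assumptions above amount to a simple linear cost model, runtime = latency + transfer time. A minimal sketch (function and parameter names are illustrative, with sizes in MB and bandwidths in GB/s to match the units used below):

```python
def model_runtime(size_mb, t_lat_us, bandwidth_gb_s, memops):
    """Sketch of the assumed performance model: T(S) = T_lat + S*memops/B.

    size_mb:        data size per node in MB
    t_lat_us:       latency in microseconds
    bandwidth_gb_s: sustained memory bandwidth in GB/s
    memops:         memory operations per element (e.g. 3 for axpby, 2 for dot)
    Returns the predicted runtime in seconds.
    """
    latency = t_lat_us * 1e-6                             # us -> s
    transfer = size_mb * memops / bandwidth_gb_s * 1e-3   # MB/(GB/s) -> s
    return latency + transfer

# Example: 100 MB axpby at 400 GB/s with 10 us latency
# transfer = 100*3/400 ms = 0.75 ms, total = 0.76 ms
```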

First, we read in the files, extract the runtimes, and compute the bandwidth for every measurement. We also compute the average runtime and bandwidth for every combination of n, Nx and Ny.

Axpby and Dot Bandwidths

The bandwidths are determined by taking the average of the 30 bandwidth measurements corresponding to the 3 largest sizes. The error is the standard deviation of these bandwidths.
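In code this amounts to averaging the tail of the size-sorted bandwidth column; a minimal sketch (function and column names are illustrative, not the notebook's exact ones):

```python
import pandas as pd

def tail_bandwidth(df, col, s=30):
    """Estimate the asymptotic bandwidth from the s largest-size
    measurements: the mean is the estimate, the std is the error."""
    df = df.sort_values(by='size')
    tail = df[col].iloc[-s:]
    return tail.mean(), tail.std()

# Synthetic example: bandwidth saturating towards 400 GB/s with size
df = pd.DataFrame({'size': range(1, 41),
                   'bw': [400 * s / (s + 5) for s in range(1, 41)]})
bw, err = tail_bandwidth(df, 'bw', s=10)
```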

Axpby and Dot Latencies

We try three different methods to compute the latency. First, we take the mean of the average runtimes for the 3 smallest sizes. Second, we take the minimum of the average runtimes among all combinations of n, Nx, and Ny; the error is the standard deviation corresponding to this minimum. Finally, we correct the minimum using the more accurately determined bandwidths.
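The third method subtracts the bandwidth-dominated transfer time S*memops/B from the minimum observed runtime. A sketch of this correction (names are illustrative; the clamp at zero mirrors the clamp used in the cell below):

```python
def corrected_latency(min_runtime_s, size_mb, memops, bandwidth_gb_s):
    """Subtract the transfer part S*memops/B from the minimum runtime,
    leaving an estimate of the pure latency in microseconds.
    Clamped at zero, since the correction can overshoot on noisy data."""
    transfer_s = size_mb * memops / bandwidth_gb_s * 1e-3  # MB/(GB/s) -> s
    lat_us = (min_runtime_s - transfer_s) / 1e-6
    return max(lat_us, 0.0)

# Example: a 20 us minimum runtime for 1 MB of dot (2 memops) at 200 GB/s
# has a 10 us transfer part, so the corrected latency is 10 us
```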

Dx-Dy Bandwidths

Since the efficiency of the matrix-vector multiplications depends on the number of polynomial coefficients, we compute these bandwidths separately for each coefficient n and use only sizes between 10 MB and 1000 MB.
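In the cell below the dx and dy bandwidths are combined via the harmonic mean, because the two kernels move the same data volume back-to-back, so their runtimes add while their bandwidths do not. A sketch of that combination:

```python
def combined_bandwidth(bw_x, bw_y):
    """Harmonic mean of two bandwidths for back-to-back kernels that each
    move the same data volume S: total time = S/bw_x + S/bw_y, so the
    effective bandwidth is 2*S / total time."""
    return 2.0 * bw_x * bw_y / (bw_x + bw_y)

# Example: 300 GB/s and 150 GB/s combine to 200 GB/s,
# not the arithmetic mean of 225 GB/s
```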

Dx-Dy Latencies

Computed with the same three methods as for axpby and dot.

names={'axpby':3,'dot':2,'dx':3, 'dy':3}
#ns=[3,4]
values = []
for f, v in files.items() :
    runtimes=pd.read_csv('benchmark_'+f+'.csv', delimiter=' ')
    #add size and bandwidth columns
    runtimes.insert(0,'size', 8*runtimes['n']*runtimes['n']
                    *runtimes['Nx']*runtimes['Ny']/1e6/v[1]) #inplace transformation
    for name,memops in names.items() :
        runtimes.insert(0,name+'_bw',runtimes['size']/1000*memops/runtimes[name])
    runtimes = runtimes.assign( dxdy=(runtimes['dx']+runtimes['dy'])/2)
    runtimes = runtimes.assign( dxdy_bw=2.0*runtimes['dx_bw']*runtimes['dy_bw']
                               /(runtimes['dx_bw']+runtimes['dy_bw']))
    #compute one version with aggregated grouped sizes and one without
    avgruntimes=runtimes.groupby(['n', 'Nx','Ny','size']).agg(['mean', 'std'])
    avgruntimes=avgruntimes.reset_index(level=['n','Nx','Ny','size'])
    avgruntimes.sort_values(by='size',inplace=True) #sort by size
    runtimes.sort_values(by='size',inplace=True)
    ## first compute axpby and dot latencies and bandwidths
    nmax = 3  # number of smallest sizes for the mean-latency estimate
    s = 30    # number of largest-size measurements for the bandwidth estimate

    line = []
    l=len(runtimes)
    line.append(v[0]) #0
    line.append(v[1]) #1
    for q in ['axpby','dot']:
        bandwidth       = runtimes[l-s:l][q+'_bw'].mean()
        err_bandwidth   = runtimes[l-s:l][q+'_bw'].std()
        mean_latency    = avgruntimes[0:nmax][(q,'mean')].mean()/1e-6
        min_latency     = avgruntimes[(q,'mean')].min()/1e-6
        idx_min_latency = avgruntimes[(q,'mean')].idxmin()
        corr_min_latency= min_latency - avgruntimes['size'].loc[idx_min_latency]*names[q]/bandwidth/1e-3
        err_latency     = avgruntimes[(q,'std')].loc[idx_min_latency]/1e-6
        if corr_min_latency < 0:
            corr_min_latency = 0
        line.append(bandwidth)        #2 bandwidth
        line.append(err_bandwidth)    #3 err_bandwidth
        line.append(mean_latency)     #4 latency mean
        line.append(min_latency)      #5 latency min
        line.append(corr_min_latency) #6 corrected latency
        line.append(err_latency)      #7 latency error
    ## now compute latencies and bandwidths of dx and dy
    for n in [2,3,4,5]:
        #take n
        dxdy=runtimes[runtimes['n']==n]
        
        avgdxdy = avgruntimes[avgruntimes['n']==n]
        dxdy=dxdy.sort_values(by='size')
        avgdxdy=avgdxdy.sort_values(by='size') #sort by size
        bandwidth       = dxdy[(dxdy['size']>10)&(dxdy['size']<1000)]['dxdy_bw'].mean()
        err_bandwidth   = dxdy[(dxdy['size']>10)&(dxdy['size']<1000)]['dxdy_bw'].std()
        mean_latency    = avgdxdy[0:nmax][('dxdy','mean')].mean()/1e-6
        min_latency     = avgdxdy[('dxdy','mean')].min()/1e-6
        idx_min_latency = avgdxdy[('dxdy','mean')].idxmin()
        corr_min_latency= min_latency - avgdxdy['size'].loc[idx_min_latency]*names['dx']/bandwidth/1e-3
        err_latency     = avgdxdy[('dxdy','std')].loc[idx_min_latency]/1e-6
        if corr_min_latency < 0:
            corr_min_latency = 0
        line.append(bandwidth)        #2 bandwidth
        line.append(err_bandwidth)    #3 err_bandwidth
        line.append(mean_latency)     #4 latency mean
        line.append(min_latency)      #5 latency min
        line.append(corr_min_latency) #6 corrected latency
        line.append(err_latency)      #7 latency error
    values.append(line)

In the following cells we just display the results in a more accessible way or extract values to show in publications.

#now construct new more ordered table with values from previous cell      
tuples=[('arch','',''),('nodes','','')]        
for q in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
    tuples.append((q,'bw','avg'))
    tuples.append((q,'bw','std'))
    tuples.append((q,'lat','avg'))
    tuples.append((q,'lat','min'))
    tuples.append((q,'lat','corr'))
    tuples.append((q,'lat','std'))
    
cols=pd.MultiIndex.from_tuples(tuples)
arr = pd.DataFrame(values,index=files.keys(), columns=cols)
arr.sort_values(by='arch',inplace=True)
arr.set_index(['arch','nodes'],inplace=True)
#arr.loc[:,[('dot','bw','avg'),('dot','lat','avg')]]
arr
axpby dot ... dxdy4 dxdy5
bw lat bw lat ... lat bw lat
avg std avg min corr std avg std avg min ... avg min corr std avg std avg min corr std
arch nodes
gtx1060 1 157.05 0.06 23.19 3.51 0.00 0.24 26.50 0.10 199.90 131.63 ... 322.56 72.50 0.00 0.27 69.26 17.04 571.43 125.05 0.00 0.45
i5 1 29.99 0.19 30.58 12.38 0.00 1.18 9.31 0.04 316.67 117.43 ... 1212.12 208.32 0.00 12.42 21.45 1.91 1956.57 340.41 0.00 1.31
knl 1 393.15 22.19 11.81 9.98 5.47 0.30 141.36 6.63 77.05 63.20 ... 215.74 52.06 0.00 1.59 101.34 14.94 451.55 98.29 0.00 37.85
2 420.41 40.35 10.71 10.01 7.89 0.21 128.36 2.71 99.39 91.51 ... 207.80 92.39 59.93 1.70 88.47 7.45 363.12 133.93 71.21 8.96
4 420.70 43.72 10.53 10.21 9.16 0.09 109.74 3.94 126.08 122.29 ... 155.14 82.89 67.54 5.47 89.24 13.82 235.93 108.08 76.99 6.04
p100 1 550.51 1.23 14.36 10.38 7.52 0.02 375.61 1.94 158.73 68.34 ... 190.41 138.27 17.53 2.44 171.81 17.67 275.12 71.80 14.58 5.15
2 553.30 0.97 7.03 5.75 0.00 0.65 369.20 2.37 76.43 59.87 ... 125.22 65.47 49.16 3.43 166.21 14.15 189.63 80.20 50.63 5.60
4 554.54 0.50 8.16 7.31 0.00 0.27 356.21 5.79 57.54 52.41 ... 101.07 69.34 60.86 2.02 160.39 16.98 139.32 89.97 74.65 1.59
skl 1 206.71 5.87 4.56 4.07 0.00 0.11 192.05 18.31 32.29 24.19 ... 340.88 82.83 15.38 1.05 110.87 8.03 442.70 132.54 20.32 2.07
2 216.29 7.02 4.17 3.95 0.00 0.18 182.31 11.77 31.34 25.87 ... 149.05 63.79 30.71 0.86 114.29 7.18 249.57 89.13 34.70 0.79
4 232.62 15.11 4.11 4.02 0.00 0.26 166.86 23.97 43.80 42.40 ... 90.99 47.41 29.99 1.16 110.39 9.24 142.54 61.74 33.57 2.50
titanXp 1 431.24 3.45 6.56 2.58 0.00 0.21 61.37 0.12 87.86 61.46 ... 115.69 28.69 3.20 0.12 197.91 31.44 211.14 48.84 0.00 1.62
v100 1 846.42 0.95 5.01 4.68 0.50 0.34 610.15 5.99 95.96 92.35 ... 45.70 11.74 2.70 0.02 589.97 65.17 78.97 22.98 6.32 0.08
2 846.01 1.07 5.66 5.58 0.00 0.03 578.96 12.17 102.71 99.70 ... 47.17 36.54 31.64 0.17 567.50 46.64 61.97 38.84 30.18 0.27
4 841.47 3.03 5.73 5.60 0.00 0.31 526.26 16.99 103.29 98.08 ... 42.52 38.74 36.29 2.10 556.61 56.29 47.54 39.42 35.01 1.77

15 rows × 36 columns

#arr=arr.reset_index()
#define conversion function for writing nice output tables
def toString(x):
    if pd.isnull(x): return 'n/a'
    #string = '%.1f' % x
    string = '%d' % np.ceil(x)  # round up to integer
    #if np.ceil(x) < 100: string = '0'+string
    if np.ceil(x) < 10: string = '0'+string  # zero-pad for alignment
    return string
addto = []
for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
    arr.loc[:,(n,'bw','string')]= arr[n]['bw']['avg'].apply(toString) +" ± "+arr[n]['bw']['std'].apply(toString)
    arr.loc[:,(n,'lat','string')]= arr[n]['lat']['corr'].apply(toString) +" ± "+arr[n]['lat']['std'].apply(toString)
    addto.append((n,'lat','string'))
    addto.append((n,'bw','string'))

#make a table for display
nicetable=arr[addto]
drop = nicetable.columns.droplevel(2)
nicetable.columns=drop
#nicetable.reset_index(inplace=True)
#nicetable.set_index('arch')
newindex=[('i5',1)]
newindex.append(('gtx1060',1))
newindex.append(('titanXp',1))
for n in ['skl','knl']:
    for m in [1,2,4]:
        newindex.append((n,m))
for n in ['p100','v100']:
    for m in [1,2,4]:
        newindex.append((n,m))
    
nicetable=nicetable.reindex(newindex)

nicetable
axpby dot dxdy2 dxdy3 dxdy4 dxdy5
lat bw lat bw lat bw lat bw lat bw lat bw
arch nodes
i5 1 00 ± 02 30 ± 01 05 ± 01 10 ± 01 00 ± 02 28 ± 03 00 ± 04 30 ± 03 00 ± 13 26 ± 02 00 ± 02 22 ± 02
gtx1060 1 00 ± 01 158 ± 01 93 ± 09 27 ± 01 00 ± 01 131 ± 01 03 ± 01 112 ± 02 00 ± 01 84 ± 14 00 ± 01 70 ± 18
titanXp 1 00 ± 01 432 ± 04 45 ± 06 62 ± 01 03 ± 01 373 ± 05 02 ± 01 309 ± 10 04 ± 01 247 ± 08 00 ± 02 198 ± 32
skl 1 00 ± 01 207 ± 06 18 ± 03 193 ± 19 23 ± 03 182 ± 36 17 ± 01 162 ± 13 16 ± 02 119 ± 19 21 ± 03 111 ± 09
2 00 ± 01 217 ± 08 23 ± 01 183 ± 12 30 ± 03 175 ± 45 31 ± 03 158 ± 17 31 ± 01 121 ± 21 35 ± 01 115 ± 08
4 00 ± 01 233 ± 16 38 ± 05 167 ± 24 29 ± 03 168 ± 44 30 ± 02 160 ± 08 30 ± 02 115 ± 25 34 ± 03 111 ± 10
knl 1 06 ± 01 394 ± 23 55 ± 02 142 ± 07 10 ± 01 240 ± 18 08 ± 02 173 ± 27 00 ± 02 127 ± 19 00 ± 38 102 ± 15
2 08 ± 01 421 ± 41 87 ± 02 129 ± 03 49 ± 01 176 ± 28 56 ± 03 142 ± 23 60 ± 02 110 ± 15 72 ± 09 89 ± 08
4 10 ± 01 421 ± 44 120 ± 06 110 ± 04 53 ± 04 156 ± 25 59 ± 04 128 ± 24 68 ± 06 116 ± 22 77 ± 07 90 ± 14
p100 1 08 ± 01 551 ± 02 51 ± 08 376 ± 02 27 ± 01 294 ± 08 09 ± 02 239 ± 13 18 ± 03 209 ± 08 15 ± 06 172 ± 18
2 00 ± 01 554 ± 01 51 ± 09 370 ± 03 51 ± 06 257 ± 19 51 ± 05 223 ± 17 50 ± 04 193 ± 17 51 ± 06 167 ± 15
4 00 ± 01 555 ± 01 52 ± 01 357 ± 06 55 ± 01 240 ± 12 57 ± 02 200 ± 26 61 ± 03 186 ± 16 75 ± 02 161 ± 17
v100 1 01 ± 01 847 ± 01 89 ± 05 611 ± 06 05 ± 01 795 ± 21 03 ± 01 736 ± 34 03 ± 01 697 ± 16 07 ± 01 590 ± 66
2 00 ± 01 847 ± 02 99 ± 05 579 ± 13 32 ± 01 703 ± 59 34 ± 01 689 ± 48 32 ± 01 642 ± 55 31 ± 01 568 ± 47
4 00 ± 01 842 ± 04 98 ± 01 527 ± 17 38 ± 01 678 ± 37 38 ± 04 627 ± 84 37 ± 03 641 ± 47 36 ± 02 557 ± 57
index = ['i5','gtx1060','skl','knl','p100','titanXp','v100']  
lines = []
for arch in  index: 
    line = []
    line.append(arch)
    #first the bandwidths
    for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
        line.append( arr.loc[(arch,1),(n,'bw','avg')] )
        line.append( arr.loc[(arch,1),(n,'bw','std')])
    for n in ['axpby','dot','dxdy2'] :
        line.append( arr.loc[(arch,1),(n,'lat','corr')] )
        line.append( arr.loc[(arch,1),(n,'lat','std')])
        if arch == 'i5' or arch == 'gtx1060' or arch == 'titanXp':
            line.append(None)
            line.append(None)
        else:
            line.append( arr.loc[(arch,4),(n,'lat','corr')] )
            line.append( arr.loc[(arch,4),(n,'lat','std')] )
    lines.append(line)
    
tuples=['arch']     

for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
    tuples.append(n+'_bw')
    tuples.append(n+'_bw_err')
for n in ['axpby','dot','dxdy']:
    tuples.append(n+'_lat_shared')
    tuples.append(n+'_lat_shared_err')
    tuples.append(n+'_lat_dist')
    tuples.append(n+'_lat_dist_err')
cols=tuples
toDisk = pd.DataFrame(lines, columns=cols)
toDisk.to_csv('performance.csv',sep=' ',index=False)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
test = pd.read_csv('performance.csv',delimiter=' ')
test
arch axpby_bw axpby_bw_err dot_bw dot_bw_err dxdy2_bw dxdy2_bw_err dxdy3_bw dxdy3_bw_err dxdy4_bw ... axpby_lat_dist axpby_lat_dist_err dot_lat_shared dot_lat_shared_err dot_lat_dist dot_lat_dist_err dxdy_lat_shared dxdy_lat_shared_err dxdy_lat_dist dxdy_lat_dist_err
0 i5 29.99 0.19 9.31 0.04 27.79 2.97 29.12 2.84 25.58 ... NaN NaN 4.76 0.23 NaN NaN 0.00 1.44 NaN NaN
1 gtx1060 157.05 0.06 26.50 0.10 130.63 0.40 111.23 1.11 83.82 ... NaN NaN 92.06 8.70 NaN NaN 0.00 0.82 NaN NaN
2 skl 206.71 5.87 192.05 18.31 181.56 35.38 161.75 13.00 118.06 ... 0.00 0.26 17.28 2.32 37.93 4.14 22.70 2.11 28.52 2.10
3 knl 393.15 22.19 141.36 6.63 239.04 17.02 172.69 26.80 126.04 ... 9.16 0.09 54.83 1.79 119.59 5.14 9.93 0.70 52.67 3.72
4 p100 550.51 1.23 375.61 1.94 293.25 7.11 238.99 12.63 208.44 ... 0.00 0.27 50.89 7.06 51.67 0.59 26.23 0.05 54.40 0.35
5 titanXp 431.24 3.45 61.37 0.12 372.85 4.16 308.92 9.47 246.73 ... NaN NaN 44.37 5.15 NaN NaN 2.38 0.57 NaN NaN
6 v100 846.42 0.95 610.15 5.99 794.43 20.52 735.42 33.02 696.49 ... 0.00 0.31 88.49 4.68 97.58 0.79 4.20 0.02 37.19 0.42

7 rows × 25 columns

Observations

  • Note the high latency in the KNL MPI implementation of dxdy. It seems to suffer from the same problem as the GPUs (is this the PCIe speed we see?).

#index = ['i5','gtx1060','titanXp', 'skl','knl','p100','v100']  

lines = []
for arch in  index: 
    line = []        
    for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
        bw  = arr.loc[(arch,1),(n,'bw','avg')]
        bw_err = arr.loc[(arch,1),(n,'bw','std')]
        lat1 = arr.loc[(arch,1),(n,'lat','corr')]
        lat1_err = arr.loc[(arch,1),(n,'lat','std')]
        line.append( toString(bw) + r" $\pm$ " + toString(bw_err))
        line.append( toString(lat1) + r" $\pm$ " + toString(lat1_err) )
        if arch == 'i5' or arch == 'gtx1060' or arch == 'titanXp':
            line.append(toString(None))
        else:
            if (n == 'dot') or (n == 'axpby'):
                lat4 = arr.loc[(arch,4),(n,'lat','corr')]
                lat4_err = arr.loc[(arch,4),(n,'lat','std')]
            else:
                lat4 = arr.loc[(arch,4),('dxdy2','lat','corr')]
                lat4_err = arr.loc[(arch,4),('dxdy2','lat','std')]  
            line.append( toString(lat4) + r" $\pm$ " + toString(lat4_err))

                
    lines.append(line)
    
tuples=[]  


for p in ['axpby','dot','B(P=2)','B(P=3)','B(P=4)','B(P=5)']:
    #for q in ['efficiency [\% bw]','lat s [us]','lat d [us]']:
    for q in ['bandwidth [GB/s]', r'$T_{lat}(1)$ [$\mu$s]', r'$T_{lat}(4)$ [$\mu$s]']:
        tuples.append((p,q))
tuples[0] = ('axpby','bandwidth [GB/s]')
    

cols=pd.MultiIndex.from_tuples(tuples)

toDisk = pd.DataFrame(lines, index=index, columns=cols)
theo = [34,192,256,'>400',732,547,898]
toDisk.insert(0,('peak','bandwidth [GB/s]'),theo)
pd.set_option('display.float_format', lambda x: '%.0f' % x)
filename='axpby-dot.tex'
with open(filename, 'wb') as f:
    f.write(bytes(toDisk.iloc[:,0:7].to_latex(
        escape=False,
        column_format='@{}lp{1.5cm}p{1.5cm}p{1.2cm}p{1.2cm}p{1.5cm}p{1.2cm}p{1.4cm}@{}',
        bold_rows=True),'UTF-8'))
toDisk.iloc[:,0:7]
peak axpby dot
bandwidth [GB/s] bandwidth [GB/s] $T_{lat}(1)$ [$\mu$s] $T_{lat}(4)$ [$\mu$s] bandwidth [GB/s] $T_{lat}(1)$ [$\mu$s] $T_{lat}(4)$ [$\mu$s]
i5 34 30 $\pm$ 01 00 $\pm$ 02 n/a 10 $\pm$ 01 05 $\pm$ 01 n/a
gtx1060 192 158 $\pm$ 01 00 $\pm$ 01 n/a 27 $\pm$ 01 93 $\pm$ 09 n/a
skl 256 207 $\pm$ 06 00 $\pm$ 01 00 $\pm$ 01 193 $\pm$ 19 18 $\pm$ 03 38 $\pm$ 05
knl >400 394 $\pm$ 23 06 $\pm$ 01 10 $\pm$ 01 142 $\pm$ 07 55 $\pm$ 02 120 $\pm$ 06
p100 732 551 $\pm$ 02 08 $\pm$ 01 00 $\pm$ 01 376 $\pm$ 02 51 $\pm$ 08 52 $\pm$ 01
titanXp 547 432 $\pm$ 04 00 $\pm$ 01 n/a 62 $\pm$ 01 45 $\pm$ 06 n/a
v100 898 847 $\pm$ 01 01 $\pm$ 01 00 $\pm$ 01 611 $\pm$ 06 89 $\pm$ 05 98 $\pm$ 01
dxdy = toDisk.loc[:,[('B(P=2)','bandwidth [GB/s]'),
                     ('B(P=3)','bandwidth [GB/s]'),
                     ('B(P=4)','bandwidth [GB/s]'),
                     ('B(P=5)','bandwidth [GB/s]'),
                     ('B(P=2)', r'$T_{lat}(1)$ [$\mu$s]'),
                     ('B(P=2)', r'$T_{lat}(4)$ [$\mu$s]'),
                    ]]
dxdy.columns = ['B(P=2) [GB/s]','B(P=3) [GB/s]','B(P=4) [GB/s]','B(P=5) [GB/s]',
                r'$T_{lat}(1)$ [$\mu$s]', r'$T_{lat}(4)$ [$\mu$s]']

filename='dxdy.tex'
with open(filename, 'wb') as f:
    f.write(bytes(dxdy.to_latex(
        escape=False,
        column_format='lp{1.5cm}p{1.5cm}p{1.5cm}p{1.5cm}p{1.2cm}p{1.2cm}',
        bold_rows=True),'UTF-8'))
dxdy
B(P=2) [GB/s] B(P=3) [GB/s] B(P=4) [GB/s] B(P=5) [GB/s] $T_{lat}(1)$ [$\mu$s] $T_{lat}(4)$ [$\mu$s]
i5 28 $\pm$ 03 30 $\pm$ 03 26 $\pm$ 02 22 $\pm$ 02 00 $\pm$ 02 n/a
gtx1060 131 $\pm$ 01 112 $\pm$ 02 84 $\pm$ 14 70 $\pm$ 18 00 $\pm$ 01 n/a
skl 182 $\pm$ 36 162 $\pm$ 13 119 $\pm$ 19 111 $\pm$ 09 23 $\pm$ 03 29 $\pm$ 03
knl 240 $\pm$ 18 173 $\pm$ 27 127 $\pm$ 19 102 $\pm$ 15 10 $\pm$ 01 53 $\pm$ 04
p100 294 $\pm$ 08 239 $\pm$ 13 209 $\pm$ 08 172 $\pm$ 18 27 $\pm$ 01 55 $\pm$ 01
titanXp 373 $\pm$ 05 309 $\pm$ 10 247 $\pm$ 08 198 $\pm$ 32 03 $\pm$ 01 n/a
v100 795 $\pm$ 21 736 $\pm$ 34 697 $\pm$ 16 590 $\pm$ 66 05 $\pm$ 01 38 $\pm$ 01