Determine the latencies and bandwidths quantitatively
import numpy as np
from collections import OrderedDict as odict
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
This notebook computes the latencies and bandwidths of the three primitive function types. Again, we start with a list of files and architectures that we want to investigate.
#(hardware name, number of nodes)
files = odict({})
files['i5'] = ('i5',1)
files['gtx1060'] = ('gtx1060',1)
files['skl_mpi1'] = ('skl',1)
files['skl_mpi2'] = ('skl',2)
files['skl_mpi4'] = ('skl',4)
files['knl_mpi1'] = ('knl',1)
files['knl_mpi2'] = ('knl',2)
files['knl_mpi4'] = ('knl',4)
files['p100nv_mpi1'] = ('p100',1)
files['p100nv_mpi2'] = ('p100',2)
files['p100nv_mpi4'] = ('p100',4)
files['titanxp'] = ('titanXp',1)
files['v100nv_mpi1'] = ('v100',1)
files['v100nv_mpi2'] = ('v100',2)
files['v100nv_mpi4'] = ('v100',4)
pd.set_option('display.precision', 2)
Assumptions
- There are three basic function types: trivially parallel (axpby), nearest neighbor (dx/dy), and global reduction (dot).
- Each can be represented by a single-node bandwidth, a single-node latency, and a multi-node latency.
But
- The model does not capture cache effects, e.g. on SKL.
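These assumptions amount to a linear latency-bandwidth performance model for each primitive. A minimal sketch of that model (the function name and the numbers are illustrative, not measured values):

```python
def predicted_runtime_s(size_mb, memops, bw_gbs, lat_us):
    """Latency-bandwidth model: T = T_lat + (data moved) / B.

    size_mb : array size per node in MB
    memops  : memory operations per call (3 for axpby, 2 for dot)
    bw_gbs  : sustained bandwidth in GB/s
    lat_us  : latency in microseconds
    """
    # size_mb * memops / 1000 is the data volume in GB; dividing by GB/s gives seconds
    transfer_s = size_mb * memops / 1000.0 / bw_gbs
    return lat_us * 1e-6 + transfer_s

# e.g. a 100 MB axpby at 200 GB/s with 5 us latency
t = predicted_runtime_s(100, 3, 200, 5)
```

For large sizes the transfer term dominates (bandwidth-bound regime); for small sizes the latency term does, which is why the fits below use the largest sizes for bandwidth and the smallest for latency.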
First, we read in the files, extract the runtimes, and compute the bandwidth for every measurement. We also compute the average runtime and bandwidth for every combination of n, Nx and Ny.
Axpby and Dot Bandwidths
The bandwidths are determined as the average of the 30 measured bandwidths corresponding to the 3 largest sizes. The error is the standard deviation of these bandwidths.
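In pandas terms this is a tail slice of the size-sorted frame. A sketch on synthetic data (the column name `axpby_bw` follows the notebook; the numbers are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# synthetic measurements: 10 repetitions at each of 5 sizes,
# with roughly constant bandwidth around 200 GB/s
df = pd.DataFrame({
    'size': np.repeat([1.0, 10.0, 100.0, 500.0, 1000.0], 10),
    'axpby_bw': rng.normal(200.0, 2.0, 50),
})
df = df.sort_values(by='size')
tail = df.iloc[-30:]          # 30 measurements = 3 largest sizes x 10 repetitions
bw = tail['axpby_bw'].mean()  # bandwidth estimate
err = tail['axpby_bw'].std()  # its error
```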
Axpby and Dot Latencies
We try three different methods to compute the latency. First, we take the mean of the average runtimes for the 3 smallest sizes. Second, we take the minimum of the average runtime among all combinations of n, Nx, and Ny; the error is the standard deviation corresponding to this minimum. Finally, we correct the minimum latency using the more accurate bandwidth estimates.
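The third method subtracts the estimated transfer time from the minimum runtime. In the notebook's units (size in MB, bandwidth in GB/s, latency in µs) that reads as follows; a worked sketch with made-up numbers:

```python
def corrected_latency_us(min_lat_us, size_mb, memops, bw_gbs):
    # size[MB] * memops / bandwidth[GB/s] is a time in ms; dividing by 1e-3 converts to us
    corr = min_lat_us - size_mb * memops / bw_gbs / 1e-3
    return max(corr, 0.0)  # clip negative values, as the notebook does

# e.g. a minimum runtime of 5 us at size 0.008 MB, 3 memory operations, 200 GB/s
lat = corrected_latency_us(5.0, 0.008, 3, 200.0)
```

When the subtracted transfer time exceeds the measured minimum, the corrected latency is clipped to zero, which explains the many 0.00 entries in the tables below.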
Dx-Dy Bandwidths
Since the efficiency of the matrix-vector multiplications depends on the polynomial coefficient n, we compute these bandwidths separately for each coefficient, using sizes between 10 MB and 1000 MB.
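A sketch of that per-coefficient selection (column names follow the notebook; the data is synthetic):

```python
import pandas as pd

df = pd.DataFrame({
    'n':       [2,     2,     2,     3,    3,     3],
    'size':    [5.0,   50.0,  500.0, 5.0,  50.0,  2000.0],
    'dxdy_bw': [100.0, 120.0, 125.0, 90.0, 110.0, 115.0],
})
# for each coefficient n, average only the measurements in the 10-1000 MB window
bws = {n: df[(df['n'] == n) & (df['size'] > 10) & (df['size'] < 1000)]['dxdy_bw'].mean()
       for n in [2, 3]}
```

Here the 5 MB and 2000 MB rows are excluded, so only the in-window sizes contribute to each average.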
Dx-Dy Latencies
As for axpby and dot.
names = {'axpby': 3, 'dot': 2, 'dx': 3, 'dy': 3}  # memory operations per call
#ns=[3,4]
values = []
for f, v in files.items():
    runtimes = pd.read_csv('benchmark_'+f+'.csv', delimiter=' ')
    #add size and bandwidth columns (insert works in place)
    runtimes.insert(0, 'size', 8*runtimes['n']*runtimes['n']
                    *runtimes['Nx']*runtimes['Ny']/1e6/v[1])  # size in MB per node
    for name, memops in names.items():
        runtimes.insert(0, name+'_bw', runtimes['size']/1000*memops/runtimes[name])
    runtimes = runtimes.assign(dxdy=(runtimes['dx']+runtimes['dy'])/2)
    # harmonic mean of the dx and dy bandwidths
    runtimes = runtimes.assign(dxdy_bw=2.0*runtimes['dx_bw']*runtimes['dy_bw']
                               /(runtimes['dx_bw']+runtimes['dy_bw']))
    #compute one version with aggregated grouped sizes and one without
    avgruntimes = runtimes.groupby(['n', 'Nx', 'Ny', 'size']).agg(['mean', 'std'])
    avgruntimes = avgruntimes.reset_index(level=['n', 'Nx', 'Ny', 'size'])
    avgruntimes.sort_values(by='size', inplace=True)  #sort by size
    runtimes.sort_values(by='size', inplace=True)
    ##first compute axpby and dot latencies and bandwidths
    nmax = 3  # number of smallest sizes for the mean latency
    s = 30    # number of measurements for the bandwidth
    line = []
    l = len(runtimes)
    line.append(v[0])  # arch
    line.append(v[1])  # nodes
    for q in ['axpby', 'dot']:
        bandwidth = runtimes[l-s:l][q+'_bw'].mean()
        err_bandwidth = runtimes[l-s:l][q+'_bw'].std()
        mean_latency = avgruntimes[0:nmax][(q, 'mean')].mean()/1e-6
        min_latency = avgruntimes[(q, 'mean')].min()/1e-6
        idx_min_latency = avgruntimes[(q, 'mean')].idxmin()
        corr_min_latency = min_latency - avgruntimes['size'].loc[idx_min_latency]*names[q]/bandwidth/1e-3
        err_latency = avgruntimes[(q, 'std')].loc[idx_min_latency]/1e-6
        if corr_min_latency < 0:
            corr_min_latency = 0
        line.append(bandwidth)         # bandwidth
        line.append(err_bandwidth)     # bandwidth error
        line.append(mean_latency)      # latency mean
        line.append(min_latency)       # latency min
        line.append(corr_min_latency)  # corrected latency
        line.append(err_latency)       # latency error
    ##now compute latencies and bandwidths of dx and dy
    for n in [2, 3, 4, 5]:
        #select polynomial coefficient n
        dxdy = runtimes[runtimes['n'] == n]
        avgdxdy = avgruntimes[avgruntimes['n'] == n]
        dxdy = dxdy.sort_values(by='size')
        avgdxdy = avgdxdy.sort_values(by='size')  #sort by size
        bandwidth = dxdy[(dxdy['size'] > 10) & (dxdy['size'] < 1000)]['dxdy_bw'].mean()
        err_bandwidth = dxdy[(dxdy['size'] > 10) & (dxdy['size'] < 1000)]['dxdy_bw'].std()
        mean_latency = avgdxdy[0:nmax][('dxdy', 'mean')].mean()/1e-6
        min_latency = avgdxdy[('dxdy', 'mean')].min()/1e-6
        idx_min_latency = avgdxdy[('dxdy', 'mean')].idxmin()
        corr_min_latency = min_latency - avgdxdy['size'].loc[idx_min_latency]*names['dx']/bandwidth/1e-3
        err_latency = avgdxdy[('dxdy', 'std')].loc[idx_min_latency]/1e-6
        if corr_min_latency < 0:
            corr_min_latency = 0
        line.append(bandwidth)         # bandwidth
        line.append(err_bandwidth)     # bandwidth error
        line.append(mean_latency)      # latency mean
        line.append(min_latency)       # latency min
        line.append(corr_min_latency)  # corrected latency
        line.append(err_latency)       # latency error
    values.append(line)
In the following cells we display the results in a more accessible way and extract values to show in publications.
#now construct a new, more ordered table with the values from the previous cell
tuples = [('arch','',''), ('nodes','','')]
for q in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
    tuples.append((q,'bw','avg'))
    tuples.append((q,'bw','std'))
    tuples.append((q,'lat','avg'))
    tuples.append((q,'lat','min'))
    tuples.append((q,'lat','corr'))
    tuples.append((q,'lat','std'))
cols = pd.MultiIndex.from_tuples(tuples)
arr = pd.DataFrame(values, index=files.keys(), columns=cols)
arr.sort_values(by='arch', inplace=True)
arr.set_index(['arch','nodes'], inplace=True)
#arr.loc[:,[('dot','bw','avg'),('dot','lat','avg')]]
arr
arch | nodes | axpby bw avg | axpby bw std | axpby lat avg | axpby lat min | axpby lat corr | axpby lat std | dot bw avg | dot bw std | dot lat avg | dot lat min | ... | dxdy4 lat avg | dxdy4 lat min | dxdy4 lat corr | dxdy4 lat std | dxdy5 bw avg | dxdy5 bw std | dxdy5 lat avg | dxdy5 lat min | dxdy5 lat corr | dxdy5 lat std
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
gtx1060 | 1 | 157.05 | 0.06 | 23.19 | 3.51 | 0.00 | 0.24 | 26.50 | 0.10 | 199.90 | 131.63 | ... | 322.56 | 72.50 | 0.00 | 0.27 | 69.26 | 17.04 | 571.43 | 125.05 | 0.00 | 0.45 |
i5 | 1 | 29.99 | 0.19 | 30.58 | 12.38 | 0.00 | 1.18 | 9.31 | 0.04 | 316.67 | 117.43 | ... | 1212.12 | 208.32 | 0.00 | 12.42 | 21.45 | 1.91 | 1956.57 | 340.41 | 0.00 | 1.31 |
knl | 1 | 393.15 | 22.19 | 11.81 | 9.98 | 5.47 | 0.30 | 141.36 | 6.63 | 77.05 | 63.20 | ... | 215.74 | 52.06 | 0.00 | 1.59 | 101.34 | 14.94 | 451.55 | 98.29 | 0.00 | 37.85 |
2 | 420.41 | 40.35 | 10.71 | 10.01 | 7.89 | 0.21 | 128.36 | 2.71 | 99.39 | 91.51 | ... | 207.80 | 92.39 | 59.93 | 1.70 | 88.47 | 7.45 | 363.12 | 133.93 | 71.21 | 8.96 | |
4 | 420.70 | 43.72 | 10.53 | 10.21 | 9.16 | 0.09 | 109.74 | 3.94 | 126.08 | 122.29 | ... | 155.14 | 82.89 | 67.54 | 5.47 | 89.24 | 13.82 | 235.93 | 108.08 | 76.99 | 6.04 | |
p100 | 1 | 550.51 | 1.23 | 14.36 | 10.38 | 7.52 | 0.02 | 375.61 | 1.94 | 158.73 | 68.34 | ... | 190.41 | 138.27 | 17.53 | 2.44 | 171.81 | 17.67 | 275.12 | 71.80 | 14.58 | 5.15 |
2 | 553.30 | 0.97 | 7.03 | 5.75 | 0.00 | 0.65 | 369.20 | 2.37 | 76.43 | 59.87 | ... | 125.22 | 65.47 | 49.16 | 3.43 | 166.21 | 14.15 | 189.63 | 80.20 | 50.63 | 5.60 | |
4 | 554.54 | 0.50 | 8.16 | 7.31 | 0.00 | 0.27 | 356.21 | 5.79 | 57.54 | 52.41 | ... | 101.07 | 69.34 | 60.86 | 2.02 | 160.39 | 16.98 | 139.32 | 89.97 | 74.65 | 1.59 | |
skl | 1 | 206.71 | 5.87 | 4.56 | 4.07 | 0.00 | 0.11 | 192.05 | 18.31 | 32.29 | 24.19 | ... | 340.88 | 82.83 | 15.38 | 1.05 | 110.87 | 8.03 | 442.70 | 132.54 | 20.32 | 2.07 |
2 | 216.29 | 7.02 | 4.17 | 3.95 | 0.00 | 0.18 | 182.31 | 11.77 | 31.34 | 25.87 | ... | 149.05 | 63.79 | 30.71 | 0.86 | 114.29 | 7.18 | 249.57 | 89.13 | 34.70 | 0.79 | |
4 | 232.62 | 15.11 | 4.11 | 4.02 | 0.00 | 0.26 | 166.86 | 23.97 | 43.80 | 42.40 | ... | 90.99 | 47.41 | 29.99 | 1.16 | 110.39 | 9.24 | 142.54 | 61.74 | 33.57 | 2.50 | |
titanXp | 1 | 431.24 | 3.45 | 6.56 | 2.58 | 0.00 | 0.21 | 61.37 | 0.12 | 87.86 | 61.46 | ... | 115.69 | 28.69 | 3.20 | 0.12 | 197.91 | 31.44 | 211.14 | 48.84 | 0.00 | 1.62 |
v100 | 1 | 846.42 | 0.95 | 5.01 | 4.68 | 0.50 | 0.34 | 610.15 | 5.99 | 95.96 | 92.35 | ... | 45.70 | 11.74 | 2.70 | 0.02 | 589.97 | 65.17 | 78.97 | 22.98 | 6.32 | 0.08 |
2 | 846.01 | 1.07 | 5.66 | 5.58 | 0.00 | 0.03 | 578.96 | 12.17 | 102.71 | 99.70 | ... | 47.17 | 36.54 | 31.64 | 0.17 | 567.50 | 46.64 | 61.97 | 38.84 | 30.18 | 0.27 | |
4 | 841.47 | 3.03 | 5.73 | 5.60 | 0.00 | 0.31 | 526.26 | 16.99 | 103.29 | 98.08 | ... | 42.52 | 38.74 | 36.29 | 2.10 | 556.61 | 56.29 | 47.54 | 39.42 | 35.01 | 1.77 |
15 rows × 36 columns
#arr=arr.reset_index()
#define a conversion function for writing nice output tables
def toString(x):
    if pd.isnull(x):
        return 'n/a'
    #string = '%.1f' % x
    string = '%d' % np.ceil(x)
    #if np.ceil(x) < 100: string = '0'+string
    if np.ceil(x) < 10:
        string = '0' + string
    return string
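The formatter rounds up and zero-pads single digits. A self-contained demonstration (the function body is copied from the cell above):

```python
import numpy as np
import pandas as pd

def toString(x):
    if pd.isnull(x):
        return 'n/a'
    string = '%d' % np.ceil(x)
    if np.ceil(x) < 10:
        string = '0' + string
    return string

toString(3.2)   # ceil(3.2) = 4, zero-padded to '04'
toString(12.7)  # '13'
toString(None)  # 'n/a'
```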
addto = []
for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
    arr.loc[:, (n,'bw','string')] = arr[n]['bw']['avg'].apply(toString) + " ± " + arr[n]['bw']['std'].apply(toString)
    arr.loc[:, (n,'lat','string')] = arr[n]['lat']['corr'].apply(toString) + " ± " + arr[n]['lat']['std'].apply(toString)
    addto.append((n,'lat','string'))
    addto.append((n,'bw','string'))
#make a table for display
nicetable=arr[addto]
drop = nicetable.columns.droplevel(2)
nicetable.columns=drop
#nicetable.reset_index(inplace=True)
#nicetable.set_index('arch')
newindex=[('i5',1)]
newindex.append(('gtx1060',1))
newindex.append(('titanXp',1))
for n in ['skl','knl']:
    for m in [1,2,4]:
        newindex.append((n,m))
for n in ['p100','v100']:
    for m in [1,2,4]:
        newindex.append((n,m))
nicetable=nicetable.reindex(newindex)
nicetable
arch | nodes | axpby lat | axpby bw | dot lat | dot bw | dxdy2 lat | dxdy2 bw | dxdy3 lat | dxdy3 bw | dxdy4 lat | dxdy4 bw | dxdy5 lat | dxdy5 bw
---|---|---|---|---|---|---|---|---|---|---|---|---|---
i5 | 1 | 00 ± 02 | 30 ± 01 | 05 ± 01 | 10 ± 01 | 00 ± 02 | 28 ± 03 | 00 ± 04 | 30 ± 03 | 00 ± 13 | 26 ± 02 | 00 ± 02 | 22 ± 02 |
gtx1060 | 1 | 00 ± 01 | 158 ± 01 | 93 ± 09 | 27 ± 01 | 00 ± 01 | 131 ± 01 | 03 ± 01 | 112 ± 02 | 00 ± 01 | 84 ± 14 | 00 ± 01 | 70 ± 18 |
titanXp | 1 | 00 ± 01 | 432 ± 04 | 45 ± 06 | 62 ± 01 | 03 ± 01 | 373 ± 05 | 02 ± 01 | 309 ± 10 | 04 ± 01 | 247 ± 08 | 00 ± 02 | 198 ± 32 |
skl | 1 | 00 ± 01 | 207 ± 06 | 18 ± 03 | 193 ± 19 | 23 ± 03 | 182 ± 36 | 17 ± 01 | 162 ± 13 | 16 ± 02 | 119 ± 19 | 21 ± 03 | 111 ± 09 |
2 | 00 ± 01 | 217 ± 08 | 23 ± 01 | 183 ± 12 | 30 ± 03 | 175 ± 45 | 31 ± 03 | 158 ± 17 | 31 ± 01 | 121 ± 21 | 35 ± 01 | 115 ± 08 | |
4 | 00 ± 01 | 233 ± 16 | 38 ± 05 | 167 ± 24 | 29 ± 03 | 168 ± 44 | 30 ± 02 | 160 ± 08 | 30 ± 02 | 115 ± 25 | 34 ± 03 | 111 ± 10 | |
knl | 1 | 06 ± 01 | 394 ± 23 | 55 ± 02 | 142 ± 07 | 10 ± 01 | 240 ± 18 | 08 ± 02 | 173 ± 27 | 00 ± 02 | 127 ± 19 | 00 ± 38 | 102 ± 15 |
2 | 08 ± 01 | 421 ± 41 | 87 ± 02 | 129 ± 03 | 49 ± 01 | 176 ± 28 | 56 ± 03 | 142 ± 23 | 60 ± 02 | 110 ± 15 | 72 ± 09 | 89 ± 08 | |
4 | 10 ± 01 | 421 ± 44 | 120 ± 06 | 110 ± 04 | 53 ± 04 | 156 ± 25 | 59 ± 04 | 128 ± 24 | 68 ± 06 | 116 ± 22 | 77 ± 07 | 90 ± 14 | |
p100 | 1 | 08 ± 01 | 551 ± 02 | 51 ± 08 | 376 ± 02 | 27 ± 01 | 294 ± 08 | 09 ± 02 | 239 ± 13 | 18 ± 03 | 209 ± 08 | 15 ± 06 | 172 ± 18 |
2 | 00 ± 01 | 554 ± 01 | 51 ± 09 | 370 ± 03 | 51 ± 06 | 257 ± 19 | 51 ± 05 | 223 ± 17 | 50 ± 04 | 193 ± 17 | 51 ± 06 | 167 ± 15 | |
4 | 00 ± 01 | 555 ± 01 | 52 ± 01 | 357 ± 06 | 55 ± 01 | 240 ± 12 | 57 ± 02 | 200 ± 26 | 61 ± 03 | 186 ± 16 | 75 ± 02 | 161 ± 17 | |
v100 | 1 | 01 ± 01 | 847 ± 01 | 89 ± 05 | 611 ± 06 | 05 ± 01 | 795 ± 21 | 03 ± 01 | 736 ± 34 | 03 ± 01 | 697 ± 16 | 07 ± 01 | 590 ± 66 |
2 | 00 ± 01 | 847 ± 02 | 99 ± 05 | 579 ± 13 | 32 ± 01 | 703 ± 59 | 34 ± 01 | 689 ± 48 | 32 ± 01 | 642 ± 55 | 31 ± 01 | 568 ± 47 | |
4 | 00 ± 01 | 842 ± 04 | 98 ± 01 | 527 ± 17 | 38 ± 01 | 678 ± 37 | 38 ± 04 | 627 ± 84 | 37 ± 03 | 641 ± 47 | 36 ± 02 | 557 ± 57 |
index = ['i5','gtx1060','skl','knl','p100','titanXp','v100']
lines = []
for arch in index:
    line = []
    line.append(arch)
    #first the bandwidths
    for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
        line.append( arr.loc[(arch,1),(n,'bw','avg')] )
        line.append( arr.loc[(arch,1),(n,'bw','std')] )
    for n in ['axpby','dot','dxdy2']:
        line.append( arr.loc[(arch,1),(n,'lat','corr')] )
        line.append( arr.loc[(arch,1),(n,'lat','std')] )
        if arch in ('i5', 'gtx1060', 'titanXp'):  # single-node architectures
            line.append(None)
            line.append(None)
        else:
            line.append( arr.loc[(arch,4),(n,'lat','corr')] )
            line.append( arr.loc[(arch,4),(n,'lat','std')] )
    lines.append(line)
cols = ['arch']
for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
    cols.append(n+'_bw')
    cols.append(n+'_bw_err')
for n in ['axpby','dot','dxdy']:
    cols.append(n+'_lat_shared')
    cols.append(n+'_lat_shared_err')
    cols.append(n+'_lat_dist')
    cols.append(n+'_lat_dist_err')
toDisk = pd.DataFrame(lines, columns=cols)
toDisk.to_csv('performance.csv', sep=' ', index=False)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
test = pd.read_csv('performance.csv',delimiter=' ')
test
| | arch | axpby_bw | axpby_bw_err | dot_bw | dot_bw_err | dxdy2_bw | dxdy2_bw_err | dxdy3_bw | dxdy3_bw_err | dxdy4_bw | ... | axpby_lat_dist | axpby_lat_dist_err | dot_lat_shared | dot_lat_shared_err | dot_lat_dist | dot_lat_dist_err | dxdy_lat_shared | dxdy_lat_shared_err | dxdy_lat_dist | dxdy_lat_dist_err |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | i5 | 29.99 | 0.19 | 9.31 | 0.04 | 27.79 | 2.97 | 29.12 | 2.84 | 25.58 | ... | NaN | NaN | 4.76 | 0.23 | NaN | NaN | 0.00 | 1.44 | NaN | NaN |
1 | gtx1060 | 157.05 | 0.06 | 26.50 | 0.10 | 130.63 | 0.40 | 111.23 | 1.11 | 83.82 | ... | NaN | NaN | 92.06 | 8.70 | NaN | NaN | 0.00 | 0.82 | NaN | NaN |
2 | skl | 206.71 | 5.87 | 192.05 | 18.31 | 181.56 | 35.38 | 161.75 | 13.00 | 118.06 | ... | 0.00 | 0.26 | 17.28 | 2.32 | 37.93 | 4.14 | 22.70 | 2.11 | 28.52 | 2.10 |
3 | knl | 393.15 | 22.19 | 141.36 | 6.63 | 239.04 | 17.02 | 172.69 | 26.80 | 126.04 | ... | 9.16 | 0.09 | 54.83 | 1.79 | 119.59 | 5.14 | 9.93 | 0.70 | 52.67 | 3.72 |
4 | p100 | 550.51 | 1.23 | 375.61 | 1.94 | 293.25 | 7.11 | 238.99 | 12.63 | 208.44 | ... | 0.00 | 0.27 | 50.89 | 7.06 | 51.67 | 0.59 | 26.23 | 0.05 | 54.40 | 0.35 |
5 | titanXp | 431.24 | 3.45 | 61.37 | 0.12 | 372.85 | 4.16 | 308.92 | 9.47 | 246.73 | ... | NaN | NaN | 44.37 | 5.15 | NaN | NaN | 2.38 | 0.57 | NaN | NaN |
6 | v100 | 846.42 | 0.95 | 610.15 | 5.99 | 794.43 | 20.52 | 735.42 | 33.02 | 696.49 | ... | 0.00 | 0.31 | 88.49 | 4.68 | 97.58 | 0.79 | 4.20 | 0.02 | 37.19 | 0.42 |
7 rows × 25 columns
Observations
Note the high latency of the KNL MPI implementation of dxdy. It seems to suffer from the same problem as the GPUs. (Is this the speed of PCIe we see?)
#index = ['i5','gtx1060','titanXp','skl','knl','p100','v100']
lines = []
for arch in index:
    line = []
    for n in ['axpby','dot','dxdy2','dxdy3','dxdy4','dxdy5']:
        bw = arr.loc[(arch,1),(n,'bw','avg')]
        bw_err = arr.loc[(arch,1),(n,'bw','std')]
        lat1 = arr.loc[(arch,1),(n,'lat','corr')]
        lat1_err = arr.loc[(arch,1),(n,'lat','std')]
        line.append( toString(bw) + r" $\pm$ " + toString(bw_err) )
        line.append( toString(lat1) + r" $\pm$ " + toString(lat1_err) )
        if arch in ('i5', 'gtx1060', 'titanXp'):
            line.append(toString(None))
        else:
            if (n == 'dot') or (n == 'axpby'):
                lat4 = arr.loc[(arch,4),(n,'lat','corr')]
                lat4_err = arr.loc[(arch,4),(n,'lat','std')]
            else:
                lat4 = arr.loc[(arch,4),('dxdy2','lat','corr')]
                lat4_err = arr.loc[(arch,4),('dxdy2','lat','std')]
            line.append( toString(lat4) + r" $\pm$ " + toString(lat4_err) )
    lines.append(line)
tuples = []
for p in ['axpby','dot','B(P=2)','B(P=3)','B(P=4)','B(P=5)']:
    #for q in ['efficiency [% bw]','lat s [us]','lat d [us]']:
    for q in ['bandwidth [GB/s]', r'$T_{lat}(1)$ [$\mu$s]', r'$T_{lat}(4)$ [$\mu$s]']:
        tuples.append((p, q))
tuples[0] = ('axpby','bandwidth [GB/s]')
cols = pd.MultiIndex.from_tuples(tuples)
toDisk = pd.DataFrame(lines, index=index, columns=cols)
theo = [34, 192, 256, '>400', 732, 547, 898]  # peak bandwidths in GB/s
toDisk.insert(0, ('peak','bandwidth [GB/s]'), theo)
pd.set_option('display.float_format', lambda x: '%.0f' % x)
filename = 'axpby-dot.tex'
with open(filename, 'wb') as f:
    f.write(bytes(toDisk.iloc[:,0:7].to_latex(
        escape=False,
        column_format='@{}lp{1.5cm}p{1.5cm}p{1.2cm}p{1.2cm}p{1.5cm}p{1.2cm}p{1.4cm}@{}',
        bold_rows=True), 'UTF-8'))
toDisk.iloc[:,0:7]
toDisk.iloc[:,0:7]
| | peak bandwidth [GB/s] | axpby bandwidth [GB/s] | axpby $T_{lat}(1)$ [$\mu$s] | axpby $T_{lat}(4)$ [$\mu$s] | dot bandwidth [GB/s] | dot $T_{lat}(1)$ [$\mu$s] | dot $T_{lat}(4)$ [$\mu$s] |
---|---|---|---|---|---|---|---
i5 | 34 | 30 $\pm$ 01 | 00 $\pm$ 02 | n/a | 10 $\pm$ 01 | 05 $\pm$ 01 | n/a |
gtx1060 | 192 | 158 $\pm$ 01 | 00 $\pm$ 01 | n/a | 27 $\pm$ 01 | 93 $\pm$ 09 | n/a |
skl | 256 | 207 $\pm$ 06 | 00 $\pm$ 01 | 00 $\pm$ 01 | 193 $\pm$ 19 | 18 $\pm$ 03 | 38 $\pm$ 05 |
knl | >400 | 394 $\pm$ 23 | 06 $\pm$ 01 | 10 $\pm$ 01 | 142 $\pm$ 07 | 55 $\pm$ 02 | 120 $\pm$ 06 |
p100 | 732 | 551 $\pm$ 02 | 08 $\pm$ 01 | 00 $\pm$ 01 | 376 $\pm$ 02 | 51 $\pm$ 08 | 52 $\pm$ 01 |
titanXp | 547 | 432 $\pm$ 04 | 00 $\pm$ 01 | n/a | 62 $\pm$ 01 | 45 $\pm$ 06 | n/a |
v100 | 898 | 847 $\pm$ 01 | 01 $\pm$ 01 | 00 $\pm$ 01 | 611 $\pm$ 06 | 89 $\pm$ 05 | 98 $\pm$ 01 |
dxdy = toDisk.loc[:, [('B(P=2)','bandwidth [GB/s]'),
                      ('B(P=3)','bandwidth [GB/s]'),
                      ('B(P=4)','bandwidth [GB/s]'),
                      ('B(P=5)','bandwidth [GB/s]'),
                      ('B(P=2)', r'$T_{lat}(1)$ [$\mu$s]'),
                      ('B(P=2)', r'$T_{lat}(4)$ [$\mu$s]'),
                      ]]
dxdy.columns = ['B(P=2) [GB/s]','B(P=3) [GB/s]','B(P=4) [GB/s]','B(P=5) [GB/s]',
                r'$T_{lat}(1)$ [$\mu$s]', r'$T_{lat}(4)$ [$\mu$s]']
filename = 'dxdy.tex'
with open(filename, 'wb') as f:
    f.write(bytes(dxdy.to_latex(
        escape=False,
        column_format='lp{1.5cm}p{1.5cm}p{1.5cm}p{1.5cm}p{1.2cm}p{1.2cm}',
        bold_rows=True), 'UTF-8'))
dxdy
dxdy
| | B(P=2) [GB/s] | B(P=3) [GB/s] | B(P=4) [GB/s] | B(P=5) [GB/s] | $T_{lat}(1)$ [$\mu$s] | $T_{lat}(4)$ [$\mu$s] |
---|---|---|---|---|---|---
i5 | 28 $\pm$ 03 | 30 $\pm$ 03 | 26 $\pm$ 02 | 22 $\pm$ 02 | 00 $\pm$ 02 | n/a |
gtx1060 | 131 $\pm$ 01 | 112 $\pm$ 02 | 84 $\pm$ 14 | 70 $\pm$ 18 | 00 $\pm$ 01 | n/a |
skl | 182 $\pm$ 36 | 162 $\pm$ 13 | 119 $\pm$ 19 | 111 $\pm$ 09 | 23 $\pm$ 03 | 29 $\pm$ 03 |
knl | 240 $\pm$ 18 | 173 $\pm$ 27 | 127 $\pm$ 19 | 102 $\pm$ 15 | 10 $\pm$ 01 | 53 $\pm$ 04 |
p100 | 294 $\pm$ 08 | 239 $\pm$ 13 | 209 $\pm$ 08 | 172 $\pm$ 18 | 27 $\pm$ 01 | 55 $\pm$ 01 |
titanXp | 373 $\pm$ 05 | 309 $\pm$ 10 | 247 $\pm$ 08 | 198 $\pm$ 32 | 03 $\pm$ 01 | n/a |
v100 | 795 $\pm$ 21 | 736 $\pm$ 34 | 697 $\pm$ 16 | 590 $\pm$ 66 | 05 $\pm$ 01 | 38 $\pm$ 01 |