These test results were generated by running SpecialNumbers.exe. This program was designed to measure the worst case penalties for operations on NANs, infinities and denormal numbers, along with the costs for high precision divides and square roots, divide by zero, overflow, etc.
For more information see the overview article.
These results demonstrate the performance penalties incurred by various x86 compatible processors when handling 'special' numbers such as NANs, infinities, and denormals. The article that explains this is available here.
These are raw results that can be difficult to understand at first, so here's a quick guide. The test program was run on several machines and the results were pasted in, without any modification except inserting the type of CPU in parentheses in the heading and breaking the results into two sections. The results vary a bit from run to run - perhaps five percent - but not enough to be worrisome.
For each test the chart shows how many times that operation was performed. Fast operations were done additional times, to ensure sufficient accuracy.
The chart also shows how long each test took - most of the tests take about a tenth of a second. The timing is done using the processor's rdtsc instruction, which reports how many clock cycles have elapsed, together with the processor's measured clock speed (which is displayed in the heading and has been verified in each case to be correct).
Then, based on the count and the time, the chart displays how many million operations per second (Mops/sec) could be performed. It also displays approximately how many cycles each operation takes.
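The arithmetic behind those two columns can be sketched as follows. This is a minimal illustration in Python, not the code of SpecialNumbers.exe; the function names are hypothetical, and the count, time and 166.0 MHz clock speed are taken from the first "adding nrm to nrm" row of the results.

```python
def mops_per_sec(count, seconds):
    """Millions of operations per second for `count` operations in `seconds`."""
    return count / seconds / 1e6


def cycles_per_op(count, seconds, clock_hz):
    """Approximate clock cycles spent on each operation at the given clock speed."""
    return seconds * clock_hz / count


# First "adding nrm to nrm" row: 100,000,000 adds in 0.66707 s at 166.0 MHz.
count, seconds, clock_hz = 100_000_000, 0.66707, 166.0e6
print(f"{mops_per_sec(count, seconds):.3f} Mops/sec")            # 149.909 Mops/sec
print(f"{cycles_per_op(count, seconds, clock_hz):.3f} cycles/op")  # 1.107 cycles/op
```

Multiplying the Mops/sec and Cycles/op columns of any row gives back the clock speed in MHz, which is a quick sanity check on a pasted result.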
Finally, the chart displays the 'slowdown'. This slowdown is not relative to another processor; it is relative to the first item in that block, for that same processor. So, when the chart says that "adding nan to nan" has a slowdown of 95.347 for the 166.0 MHz CPU, it means that adding NANs runs more than 95 times slower than "adding nrm to nrm" - adding normal numbers to other normal numbers.
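As a rough sketch of how the slowdown column relates to the others, the Python below recomputes it from the count and time of two rows copied from the results. The helper names `parse_row` and `slowdowns` are hypothetical, and the real program may compute the figure differently (for example from unrounded cycle counts), so small differences in the last digit are expected.

```python
def parse_row(line):
    """Split one result line into (identifier, count, seconds, mops, cycles, slowdown)."""
    name, count, seconds, mops, cycles, slowdown = [f.strip() for f in line.split(",")]
    return name, int(count), float(seconds), float(mops), float(cycles), float(slowdown)


def slowdowns(lines):
    """Recompute each row's slowdown relative to the first row of its block."""
    rows = [parse_row(line) for line in lines]
    base_time_per_op = rows[0][2] / rows[0][1]
    return {name: (seconds / count) / base_time_per_op
            for name, count, seconds, *_ in rows}


block = [
    "adding nrm to nrm , 100000000, 0.66707, 149.909, 1.107, 1.000",
    "adding nan to nan , 1000000, 0.63604, 1.572, 105.582, 95.347",
]
for name, value in slowdowns(block).items():
    print(f"{name}: {value:.1f}")
```

Because the baseline is the first row of each block, slowdown figures from different blocks, or from different processors, cannot be compared directly.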
Note that the program used to generate these results is a benchmarking program, not a real app. It was designed to tease out performance problems and emphasize them as much as possible. Thus, it is unlikely that there is any real-world application where NANs would slow down performance 900 to 1. However, if a real-world application sees just 10% - or even 1% - of the slowdown that was found here, it could be pretty significant - and at least one person has reported seeing this. Some of the tests shown on this page show about a 100:1 slowdown caused by NANs and infinities, compared to a P4 using SSE2 or an Athlon.
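One way to put rough numbers on this is a simple blended-cost model. This is an assumption made for illustration, not something the test program measures: suppose a fraction of a program's floating point operations hit the worst-case penalty while the rest run at full speed. The function name `overall_slowdown` is hypothetical.

```python
def overall_slowdown(fraction_special, penalty):
    """Overall slowdown when `fraction_special` of operations cost `penalty` times more."""
    return (1.0 - fraction_special) + fraction_special * penalty


# Using the roughly 900:1 worst case mentioned above.
for fraction in (0.001, 0.01, 0.10):
    print(f"{fraction:.1%} special operands -> {overall_slowdown(fraction, 900):.1f}x slower")
```

Under this model even one special operand in a thousand nearly doubles the running time, which is why a small share of the measured penalty can still matter.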
All recent Intel x86 processors can do one add per clock cycle, but special numbers (in this case, infinities and NANs) cause them some grief.
Pentium has 60-95 times penalty for adding special numbers.
Pentium III has 115-130 times penalty for adding special numbers.
Pentium IV has 850-930 times penalty for adding special numbers.
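For readers unfamiliar with these operand types, here is a small Python sketch that labels a double the way the result tables do (nrm, nan, inf, den) and shows how each kind can arise from ordinary arithmetic. The `classify` function is hypothetical and is not part of the test program; it only illustrates the categories, not their cost.

```python
import math
import sys


def classify(x):
    """Label a float the way the result tables do: nrm, nan, inf, or den."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    if x != 0.0 and abs(x) < sys.float_info.min:
        return "den"  # denormal: nonzero but below the smallest normal double
    return "nrm"      # zero is lumped in with ordinary numbers in this sketch


inf = float("inf")
samples = {
    "1.5": 1.5,
    "1e308 * 10": 1e308 * 10,                  # overflows to infinity
    "inf - inf": inf - inf,                    # invalid operation yields a NAN
    "min normal / 4": sys.float_info.min / 4,  # underflows to a denormal
}
for label, value in samples.items():
    print(f"{label:>15}: {classify(value)}")
```

Once a NAN or infinity appears it tends to propagate through later arithmetic, so a single bad value early in a calculation can make every subsequent add a slow one on the affected processors.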
Pentium can do an fld/fst pair in 3.4 cycles.
Pentium III can do an fld/fst pair in 2.0 cycles.
Pentium IV can do an fld/fst pair in 3.7 cycles.
Pentium has 20 times penalty for loading nans (Pentium only).
Pentium has 47 times penalty for loading denormals.
Pentium has 3 times penalty for loading across QWORD boundaries.
Pentium has 3 times penalty for loading across cache line boundaries.
Pentium has 3 times penalty for loading across page boundaries.
Pentium has 6 times penalty for saving across QWORD boundaries.
Pentium has 6 times penalty for saving across cache line boundaries.
Pentium has 16 times penalty for saving across page boundaries.
Pentium III has 110 times penalty for loading denormals.
Pentium III has 0.7 times!!! ‘penalty’ for loading across QWORD boundaries.
Pentium III has 5.5 times penalty for loading across cache line boundaries.
Pentium III has 47 times penalty for loading across page boundaries.
Pentium III has 0.7 times!!! ‘penalty’ for saving across QWORD boundaries.
Pentium III has 4.5 times penalty for saving across cache line boundaries.
Pentium III has 64 times penalty for saving across page boundaries.
Pentium IV has 380 times penalty for loading denormals.
Pentium IV has no penalty for loading across QWORD boundaries.
Pentium IV has 5.5 times penalty for loading across cache line boundaries.
Pentium IV has 20 times penalty for loading across page boundaries.
Pentium IV has no penalty for saving across QWORD boundaries.
Pentium IV has 22 times penalty for saving across cache line boundaries.
Pentium IV has 23 times penalty for saving across page boundaries.
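The alignment cases in this list can be told apart with a little address arithmetic. The Python sketch below checks whether an 8-byte load or store spans a QWORD, cache line, or page boundary. The function name, the sample addresses, and the 32-byte cache line and 4096-byte page sizes are assumptions for illustration; cache line size in particular differs between processors.

```python
def crosses(address, size, boundary):
    """True if the `size` bytes starting at `address` span a multiple of `boundary`."""
    return address // boundary != (address + size - 1) // boundary


QWORD, CACHE_LINE, PAGE = 8, 32, 4096  # CACHE_LINE and PAGE sizes are assumed values

for address in (0x1000, 0x1004, 0x101C, 0x1FFC):
    hits = [name
            for name, boundary in (("QWORD", QWORD), ("cache line", CACHE_LINE), ("page", PAGE))
            if crosses(address, 8, boundary)]
    print(f"{address:#06x}: crosses {', '.join(hits) or 'nothing'}")
```

Every access that crosses a page boundary also crosses a cache line and a QWORD boundary, so the page-crossing figures are the worst case of the three.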
Athlon processors do all of the tested floating point operations with regular numbers at the same speed per clock or slightly faster than the Intel chips. The Athlon chip has essentially no penalties for dealing with denormals, infinities, NANs, and unaligned data. The one exception was in the load/store test where there is a significant penalty on denormals, probably for storing the denormal to memory. This penalty, while large, is smaller than the equivalent penalty on most of the Intel chips.
Identifier count time Mops/sec Cycles/op Slowdown
Register to register adds
adding nrm to nrm , 100000000, 0.66707, 149.909, 1.107, 1.000
adding nan to nan , 1000000, 0.63604, 1.572, 105.582, 95.347
adding inf to inf , 1000000, 0.45547, 2.196, 75.607, 68.278
adding inf to nan , 1000000, 0.43608, 2.293, 72.388, 65.371
adding nrm to inf , 1000000, 0.41178, 2.429, 68.355, 61.729
adding inf to nrm , 1000000, 0.45432, 2.201, 75.418, 68.107
adding den to nrm , 100000000, 0.66594, 150.164, 1.105, 0.998
Memory to register adds
adding nrm to nrm , 50000000, 0.90880, 55.017, 3.017, 1.000
adding den to nrm , 1000000, 0.44808, 2.232, 74.381, 24.652
Loads and stores - type tests
Loading nrm , 50000000, 1.03000, 48.543, 3.420, 1.000
Loading nan , 5000000, 2.06494, 2.421, 68.556, 20.048
Loading inf , 50000000, 1.02932, 48.576, 3.417, 0.999
Loading den , 1000000, 0.96876, 1.032, 160.814, 47.027
Loads and stores - alignment tests
quad aligned , 50000000, 1.03167, 48.465, 3.425, 1.000
dword aligned src , 5000000, 0.30293, 16.505, 10.057, 2.936
byte aligned src , 5000000, 0.30275, 16.515, 10.051, 2.935
cache unaligned src , 2000000, 0.12124, 16.497, 10.063, 2.938
page unaligned src , 2000000, 0.12111, 16.514, 10.052, 2.935
dword aligned dst , 5000000, 0.61809, 8.089, 20.521, 5.991
byte aligned dst , 5000000, 0.61789, 8.092, 20.514, 5.989
cache unaligned dst , 2000000, 0.24734, 8.086, 20.529, 5.994
page unaligned dst , 2000000, 0.67550, 2.961, 56.066, 16.369
Identifier count time Mops/sec Cycles/op Slowdown
Register to register adds
adding nrm to nrm , 100000000, 0.14869, 672.561, 1.038, 1.000
adding nan to nan , 1000000, 0.19101, 5.235, 133.325, 128.466
adding inf to inf , 1000000, 0.17423, 5.740, 121.609, 117.177
adding inf to nan , 1000000, 0.17603, 5.681, 122.867, 118.389
adding nrm to inf , 1000000, 0.17755, 5.632, 123.928, 119.411
adding inf to nrm , 1000000, 0.17402, 5.747, 121.464, 117.037
adding den to nrm , 100000000, 0.14937, 669.498, 1.043, 1.005
Memory to register adds
adding nrm to nrm , 50000000, 0.22451, 222.705, 3.134, 1.000
adding den to nrm , 1000000, 0.18501, 5.405, 129.140, 41.204
Loads and stores - type tests
Loading nrm , 50000000, 0.15132, 330.418, 2.112, 1.000
Loading nan , 5000000, 0.01461, 342.331, 2.039, 0.965
Loading inf , 50000000, 0.14986, 333.645, 2.092, 0.990
Loading den , 1000000, 0.35793, 2.794, 249.837, 118.267
Loads and stores - alignment tests
quad aligned , 50000000, 0.14971, 333.984, 2.090, 1.000
dword aligned src , 50000000, 0.14936, 334.769, 2.085, 0.998
byte aligned src , 50000000, 0.15080, 331.563, 2.105, 1.007
cache unaligned src , 2000000, 0.03420, 58.481, 11.936, 5.711
page unaligned src , 2000000, 0.27447, 7.287, 95.790, 45.834
dword aligned dst , 50000000, 0.14881, 335.993, 2.077, 0.994
byte aligned dst , 50000000, 0.14903, 335.511, 2.080, 0.995
cache unaligned dst , 2000000, 0.02654, 75.369, 9.261, 4.431
page unaligned dst , 2000000, 0.36247, 5.518, 126.503, 60.530
Identifier count time Mops/sec Cycles/op Slowdown
Register to register adds
adding nrm to nrm , 100000000, 0.10285, 972.310, 1.025, 1.000
adding nan to nan , 1000000, 0.13444, 7.438, 134.037, 130.718
adding inf to inf , 1000000, 0.12093, 8.269, 120.571, 117.585
adding inf to nan , 1000000, 0.12290, 8.137, 122.531, 119.496
adding nrm to inf , 1000000, 0.12043, 8.303, 120.070, 117.097
adding inf to nrm , 1000000, 0.12074, 8.282, 120.377, 117.396
adding den to nrm , 100000000, 0.10302, 970.656, 1.027, 1.002
Memory to register adds
adding nrm to nrm , 50000000, 0.15501, 322.564, 3.091, 1.000
adding den to nrm , 1000000, 0.12830, 7.795, 127.910, 41.383
Loads and stores - type tests
Loading nrm , 50000000, 0.10488, 476.718, 2.091, 1.000
Loading nan , 5000000, 0.01040, 480.653, 2.074, 0.992
Loading inf , 50000000, 0.10354, 482.900, 2.065, 0.987
Loading den , 1000000, 0.25692, 3.892, 256.153, 122.480
Loads and stores - alignment tests
quad aligned , 50000000, 0.10371, 482.099, 2.068, 1.000
dword aligned src , 50000000, 0.10307, 485.119, 2.055, 0.994
byte aligned src , 50000000, 0.10314, 484.771, 2.057, 0.994
cache unaligned src , 2000000, 0.02356, 84.894, 11.744, 5.679
page unaligned src , 2000000, 0.18921, 10.570, 94.324, 45.610
dword aligned dst , 50000000, 0.10500, 476.205, 2.094, 1.012
byte aligned dst , 50000000, 0.10338, 483.634, 2.061, 0.997
cache unaligned dst , 2000000, 0.01855, 107.794, 9.249, 4.472
page unaligned dst , 2000000, 0.25033, 7.990, 124.788, 60.341
Identifier count time Mops/sec Cycles/op Slowdown
Register to register adds
adding nrm to nrm , 100000000, 0.03609, 2770.968, 1.010, 1.000
adding nan to nan , 1000000, 0.33711, 2.966, 943.917, 934.130
adding inf to inf , 1000000, 0.30770, 3.250, 861.567, 852.634
adding inf to nan , 1000000, 0.33289, 3.004, 932.101, 922.436
adding nrm to inf , 1000000, 0.30551, 3.273, 855.422, 846.553
adding inf to nrm , 1000000, 0.30793, 3.247, 862.216, 853.276
adding den to nrm , 100000000, 0.03611, 2769.486, 1.011, 1.001
Memory to register adds
adding nrm to nrm , 50000000, 0.10056, 497.225, 5.631, 1.000
adding den to nrm , 1000000, 0.37543, 2.664, 1051.196, 186.672
Loads and stores - type tests
Loading nrm , 50000000, 0.06476, 772.029, 3.627, 1.000
Loading nan , 5000000, 0.00643, 777.771, 3.600, 0.993
Loading inf , 50000000, 0.06490, 770.447, 3.634, 1.002
Loading den , 1000000, 0.50423, 1.983, 1411.835, 389.278
Loads and stores - alignment tests
quad aligned , 50000000, 0.03951, 1265.602, 2.212, 1.000
dword aligned src , 50000000, 0.04021, 1243.500, 2.252, 1.018
byte aligned src , 50000000, 0.04508, 1109.027, 2.525, 1.141
cache unaligned src , 2000000, 0.01441, 138.763, 20.178, 9.121
page unaligned src , 2000000, 0.05470, 36.565, 76.577, 34.613
dword aligned dst , 50000000, 0.03964, 1261.332, 2.220, 1.003
byte aligned dst , 50000000, 0.03938, 1269.640, 2.205, 0.997
cache unaligned dst , 2000000, 0.05760, 34.725, 80.634, 36.447
page unaligned dst , 2000000, 0.06204, 32.236, 86.860, 39.261
Identifier count time Mops/sec Cycles/op Slowdown
Register to register adds
adding nrm to nrm , 100000000, 0.06112, 1636.098, 1.059, 1.000
adding nan to nan , 1000000, 0.00058, 1732.889, 1.000, 0.944
adding inf to inf , 1000000, 0.00058, 1732.842, 1.000, 0.944
adding inf to nan , 1000000, 0.00058, 1732.842, 1.000, 0.944
adding nrm to inf , 1000000, 0.00058, 1718.254, 1.009, 0.952
adding inf to nrm , 1000000, 0.00058, 1732.863, 1.000, 0.944
adding den to nrm , 100000000, 0.09620, 1039.453, 1.667, 1.574
Memory to register adds
adding nrm to nrm , 50000000, 0.11755, 425.341, 4.074, 1.000
adding den to nrm , 1000000, 0.00274, 365.608, 4.740, 1.163
Loads and stores - type tests
Loading nrm , 50000000, 0.06750, 740.785, 2.339, 1.000
Loading nan , 5000000, 0.00693, 721.740, 2.401, 1.026
Loading inf , 50000000, 0.06797, 735.599, 2.356, 1.007
Loading den , 1000000, 0.09880, 10.122, 171.213, 73.186
Loads and stores - alignment tests
quad aligned , 50000000, 0.06933, 721.147, 2.403, 1.000
dword aligned src , 5000000, 0.00656, 762.122, 2.274, 0.946
byte aligned src , 5000000, 0.00612, 816.519, 2.122, 0.883
cache unaligned src , 2000000, 0.00277, 722.040, 2.400, 0.999
page unaligned src , 2000000, 0.00463, 432.322, 4.009, 1.668
dword aligned dst , 5000000, 0.00817, 611.628, 2.833, 1.179
byte aligned dst , 5000000, 0.00812, 615.461, 2.816, 1.172
cache unaligned dst , 2000000, 0.00332, 602.360, 2.877, 1.197
page unaligned dst , 2000000, 0.00368, 543.530, 3.188, 1.327
SSE2 adds
adding nrm to nrm , 100000000, 2.57211, 38.879, 43.572, 1.000
adding nan to nan , 10000000, 0.25681, 38.939, 43.504, 0.998
adding inf to inf , 10000000, 0.25628, 39.021, 43.413, 0.996
adding inf to nan , 10000000, 0.25656, 38.977, 43.461, 0.997
adding nrm to inf , 10000000, 0.25836, 38.706, 43.766, 1.004
adding inf to nrm , 10000000, 0.25733, 38.861, 43.591, 1.000
adding den to nrm , 10000000, 7.42158, 1.347, 1257.216, 28.854
More recent Intel processors are faster at square roots
Athlon processors are faster at divide and square root than Intel processors
Operations that produce overflows run more slowly than those that don’t
Dividing by a power of two is faster than dividing by other numbers
Divide and square root generally run faster if the processor is set to lower precision
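The power-of-two observation has a simple arithmetic basis, sketched below in Python: dividing a binary floating point number by a power of two changes only its exponent, while any other divisor forces a new mantissa to be computed. This shows the numerical property only; it is an assumption, not something these results establish, that the faster processors exploit it in this way.

```python
import math

x = 3.141592653589793
mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent
m16, e16 = math.frexp(x / 16)        # power-of-two divisor
m3, e3 = math.frexp(x / 3)           # ordinary divisor

print(m16 == mantissa, exponent - e16)  # True 4: same mantissa, exponent lower by 4
print(m3 == mantissa)                   # False: the mantissa had to be recomputed
```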
Identifier count time Mops/sec Cycles/op Slowdown
Divide tests - float precision
dividing nrm by nrm , 10000000, 1.75820, 5.688, 29.186, 1.000
dividing nrm by 16 , 10000000, 1.75492, 5.698, 29.132, 0.998
dividing nrm by zero , 1000000, 0.58146, 1.720, 96.523, 3.307
Divide tests - double precision
dividing nrm by nrm , 10000000, 2.60363, 3.841, 43.220, 1.481
dividing nrm by 16 , 10000000, 2.78655, 3.589, 46.257, 1.585
dividing nrm by zero , 1000000, 0.57505, 1.739, 95.459, 3.271
Divide tests - extended precision
dividing nrm by nrm , 10000000, 2.96461, 3.373, 49.213, 1.686
dividing nrm by 16 , 10000000, 2.96476, 3.373, 49.215, 1.686
dividing nrm by zero , 1000000, 0.57494, 1.739, 95.440, 3.270
Multiply tests - float precision
multiply nrm by nrm , 10000000, 0.78636, 12.717, 13.054, 1.000
multiply nrm by 16 , 10000000, 0.96808, 10.330, 16.070, 1.231
multiply to overflow , 1000000, 0.62912, 1.590, 104.434, 8.000
multiply to underflow, 1000000, 0.64137, 1.559, 106.467, 8.156
Multiply tests - double precision
multiply nrm by nrm , 10000000, 0.78666, 12.712, 13.059, 1.000
multiply nrm by 16 , 10000000, 0.96816, 10.329, 16.071, 1.231
multiply to overflow , 1000000, 0.62944, 1.589, 104.486, 8.004
multiply to underflow, 1000000, 0.64139, 1.559, 106.470, 8.156
Multiply tests - extended precision
multiply nrm by nrm , 10000000, 0.78643, 12.716, 13.055, 1.000
multiply nrm by 16 , 10000000, 0.96804, 10.330, 16.070, 1.231
multiply to overflow , 1000000, 0.63159, 1.583, 104.845, 8.032
multiply to underflow, 1000000, 0.64139, 1.559, 106.472, 8.157
Square root tests - float precision
sqrt nrm , 1000000, 0.52406, 1.908, 86.994, 1.000
sqrt 4 , 5000000, 2.42029, 2.066, 80.354, 0.924
sqrt 2 , 1000000, 0.48393, 2.066, 80.332, 0.923
sqrt 9 , 1000000, 0.51420, 1.945, 85.358, 0.981
sqrt negative , 1000000, 0.85300, 1.172, 141.597, 1.628
Square root tests - double precision
sqrt nrm , 1000000, 0.48426, 2.065, 80.388, 0.924
sqrt 4 , 5000000, 2.42105, 2.065, 80.379, 0.924
sqrt 2 , 1000000, 0.48422, 2.065, 80.381, 0.924
sqrt 9 , 1000000, 0.48372, 2.067, 80.298, 0.923
sqrt negative , 1000000, 0.86213, 1.160, 143.114, 1.645
Square root tests - extended precision
sqrt nrm , 1000000, 0.52535, 1.904, 87.208, 1.002
sqrt 4 , 5000000, 2.42056, 2.066, 80.362, 0.924
sqrt 2 , 1000000, 0.48390, 2.067, 80.327, 0.923
sqrt 9 , 1000000, 0.48396, 2.066, 80.338, 0.923
sqrt negative , 1000000, 0.85405, 1.171, 141.772, 1.630
Identifier count time Mops/sec Cycles/op Slowdown
Divide tests - float precision
dividing nrm by nrm , 10000000, 0.28900, 34.603, 20.172, 1.000
dividing nrm by 16 , 10000000, 0.13355, 74.880, 9.322, 0.462
dividing nrm by zero , 1000000, 0.21059, 4.749, 146.990, 7.287
Divide tests - double precision
dividing nrm by nrm , 10000000, 0.49940, 20.024, 34.858, 1.728
dividing nrm by 16 , 10000000, 0.13491, 74.122, 9.417, 0.467
dividing nrm by zero , 1000000, 0.20092, 4.977, 140.245, 6.952
Divide tests - extended precision
dividing nrm by nrm , 10000000, 0.58864, 16.988, 41.087, 2.037
dividing nrm by 16 , 10000000, 0.13394, 74.662, 9.349, 0.463
dividing nrm by zero , 1000000, 0.20140, 4.965, 140.575, 6.969
Multiply tests - float precision
multiply nrm by nrm , 10000000, 0.13447, 74.368, 9.386, 1.000
multiply nrm by 16 , 10000000, 0.13665, 73.179, 9.538, 1.016
multiply to overflow , 1000000, 0.19417, 5.150, 135.530, 14.440
multiply to underflow, 1000000, 0.20847, 4.797, 145.510, 15.503
Multiply tests - double precision
multiply nrm by nrm , 10000000, 0.13386, 74.706, 9.343, 0.995
multiply nrm by 16 , 10000000, 0.13445, 74.378, 9.385, 1.000
multiply to overflow , 1000000, 0.19370, 5.163, 135.204, 14.405
multiply to underflow, 1000000, 0.21135, 4.732, 147.519, 15.717
Multiply tests - extended precision
multiply nrm by nrm , 10000000, 0.13364, 74.828, 9.328, 0.994
multiply nrm by 16 , 10000000, 0.13431, 74.455, 9.375, 0.999
multiply to overflow , 1000000, 0.19341, 5.170, 135.002, 14.384
multiply to underflow, 1000000, 0.20898, 4.785, 145.870, 15.542
Square root tests - float precision
sqrt nrm , 1000000, 0.04441, 22.516, 31.001, 1.000
sqrt 4 , 5000000, 0.06707, 74.546, 9.363, 0.302
sqrt 2 , 1000000, 0.04503, 22.210, 31.428, 1.014
sqrt 9 , 1000000, 0.04557, 21.945, 31.807, 1.026
sqrt negative , 1000000, 0.17409, 5.744, 121.512, 3.920
Square root tests - double precision
sqrt nrm , 1000000, 0.08880, 11.262, 61.980, 1.999
sqrt 4 , 5000000, 0.06703, 74.589, 9.358, 0.302
sqrt 2 , 1000000, 0.09143, 10.937, 63.818, 2.059
sqrt 9 , 1000000, 0.08826, 11.331, 61.603, 1.987
sqrt negative , 1000000, 0.17327, 5.771, 120.940, 3.901
Square root tests - extended precision
sqrt nrm , 1000000, 0.10371, 9.642, 72.392, 2.335
sqrt 4 , 5000000, 0.06699, 74.638, 9.352, 0.302
sqrt 2 , 1000000, 0.10674, 9.368, 74.508, 2.403
sqrt 9 , 1000000, 0.10395, 9.620, 72.557, 2.341
sqrt negative , 1000000, 0.17500, 5.714, 122.153, 3.940
Identifier count time Mops/sec Cycles/op Slowdown
Divide tests - float precision
dividing nrm by nrm , 10000000, 0.20179, 49.556, 20.119, 1.000
dividing nrm by 16 , 10000000, 0.09307, 107.451, 9.279, 0.461
dividing nrm by zero , 1000000, 0.14032, 7.126, 139.902, 6.954
Divide tests - double precision
dividing nrm by nrm , 10000000, 0.35426, 28.228, 35.320, 1.756
dividing nrm by 16 , 10000000, 0.09311, 107.404, 9.283, 0.461
dividing nrm by zero , 1000000, 0.13949, 7.169, 139.072, 6.913
Divide tests - extended precision
dividing nrm by nrm , 10000000, 0.41130, 24.313, 41.007, 2.038
dividing nrm by 16 , 10000000, 0.09298, 107.550, 9.270, 0.461
dividing nrm by zero , 1000000, 0.13949, 7.169, 139.069, 6.912
Multiply tests - float precision
multiply nrm by nrm , 10000000, 0.09569, 104.500, 9.541, 1.000
multiply nrm by 16 , 10000000, 0.09349, 106.967, 9.321, 0.977
multiply to overflow , 1000000, 0.13649, 7.327, 136.081, 14.263
multiply to underflow, 1000000, 0.14458, 6.917, 144.147, 15.109
Multiply tests - double precision
multiply nrm by nrm , 10000000, 0.09462, 105.687, 9.434, 0.989
multiply nrm by 16 , 10000000, 0.09300, 107.531, 9.272, 0.972
multiply to overflow , 1000000, 0.13560, 7.375, 135.192, 14.170
multiply to underflow, 1000000, 0.14506, 6.893, 144.629, 15.159
Multiply tests - extended precision
multiply nrm by nrm , 10000000, 0.09281, 107.752, 9.253, 0.970
multiply nrm by 16 , 10000000, 0.09314, 107.370, 9.286, 0.973
multiply to overflow , 1000000, 0.13438, 7.442, 133.976, 14.043
multiply to underflow, 1000000, 0.14421, 6.934, 143.778, 15.070
Square root tests - float precision
sqrt nrm , 1000000, 0.03167, 31.573, 31.577, 1.000
sqrt 4 , 5000000, 0.04628, 108.043, 9.228, 0.292
sqrt 2 , 1000000, 0.03141, 31.840, 31.313, 0.992
sqrt 9 , 1000000, 0.03320, 30.117, 33.104, 1.048
sqrt negative , 1000000, 0.12069, 8.286, 120.326, 3.811
Square root tests - double precision
sqrt nrm , 1000000, 0.06136, 16.297, 61.178, 1.937
sqrt 4 , 5000000, 0.04666, 107.160, 9.304, 0.295
sqrt 2 , 1000000, 0.06083, 16.440, 60.644, 1.920
sqrt 9 , 1000000, 0.06086, 16.432, 60.674, 1.921
sqrt negative , 1000000, 0.11991, 8.339, 119.554, 3.786
Square root tests - extended precision
sqrt nrm , 1000000, 0.07247, 13.799, 72.250, 2.288
sqrt 4 , 5000000, 0.04657, 107.368, 9.286, 0.294
sqrt 2 , 1000000, 0.07202, 13.884, 71.809, 2.274
sqrt 9 , 1000000, 0.07241, 13.810, 72.195, 2.286
sqrt negative , 1000000, 0.11991, 8.340, 119.548, 3.786
Identifier count time Mops/sec Cycles/op Slowdown
Divide tests - float precision
dividing nrm by nrm , 10000000, 0.08834, 113.205, 24.734, 1.000
dividing nrm by 16 , 10000000, 0.08382, 119.309, 23.468, 0.949
dividing nrm by zero , 1000000, 0.34291, 2.916, 960.147, 38.819
Divide tests - double precision
dividing nrm by nrm , 10000000, 0.13623, 73.408, 38.143, 1.542
dividing nrm by 16 , 10000000, 0.13644, 73.291, 38.204, 1.545
dividing nrm by zero , 1000000, 0.35332, 2.830, 989.303, 39.998
Divide tests - extended precision
dividing nrm by nrm , 10000000, 0.15428, 64.816, 43.199, 1.747
dividing nrm by 16 , 10000000, 0.15457, 64.694, 43.280, 1.750
dividing nrm by zero , 1000000, 0.34648, 2.886, 970.149, 39.223
Multiply tests - float precision
multiply nrm by nrm , 10000000, 0.03588, 278.672, 10.048, 1.000
multiply nrm by 16 , 10000000, 0.04919, 203.279, 13.774, 1.371
multiply to overflow , 1000000, 0.29624, 3.376, 829.474, 82.554
multiply to underflow, 1000000, 0.29573, 3.381, 828.041, 82.411
Multiply tests - double precision
multiply nrm by nrm , 10000000, 0.03591, 278.452, 10.056, 1.001
multiply nrm by 16 , 10000000, 0.03652, 273.837, 10.225, 1.018
multiply to overflow , 1000000, 0.29066, 3.441, 813.835, 80.998
multiply to underflow, 1000000, 0.30558, 3.272, 855.638, 85.158
Multiply tests - extended precision
multiply nrm by nrm , 10000000, 0.03582, 279.154, 10.030, 0.998
multiply nrm by 16 , 10000000, 0.03595, 278.150, 10.066, 1.002
multiply to overflow , 1000000, 0.29020, 3.446, 812.546, 80.869
multiply to underflow, 1000000, 0.29553, 3.384, 827.486, 82.356
Square root tests - float precision
sqrt nrm , 1000000, 0.00822, 121.624, 23.022, 1.000
sqrt 4 , 5000000, 0.05234, 95.537, 29.308, 1.273
sqrt 2 , 1000000, 0.00821, 121.739, 23.000, 0.999
sqrt 9 , 1000000, 0.00826, 121.033, 23.134, 1.005
sqrt negative , 1000000, 0.34335, 2.912, 961.388, 41.760
Square root tests - double precision
sqrt nrm , 1000000, 0.01366, 73.196, 38.253, 1.662
sqrt 4 , 5000000, 0.06810, 73.422, 38.136, 1.657
sqrt 2 , 1000000, 0.01365, 73.278, 38.211, 1.660
sqrt 9 , 1000000, 0.01361, 73.494, 38.098, 1.655
sqrt negative , 1000000, 0.33747, 2.963, 944.906, 41.044
Square root tests - extended precision
sqrt nrm , 1000000, 0.01542, 64.831, 43.189, 1.876
sqrt 4 , 5000000, 0.07723, 64.744, 43.247, 1.879
sqrt 2 , 1000000, 0.01539, 64.959, 43.104, 1.872
sqrt 9 , 1000000, 0.01540, 64.924, 43.127, 1.873
sqrt negative , 1000000, 0.34904, 2.865, 977.309, 42.452
Identifier count time Mops/sec Cycles/op Slowdown
Divide tests - float precision
dividing nrm by nrm , 10000000, 0.07605, 131.493, 13.179, 1.000
dividing nrm by 16 , 10000000, 0.04638, 215.617, 8.037, 0.610
dividing nrm by zero , 1000000, 0.00462, 216.623, 8.000, 0.607
Divide tests - double precision
dividing nrm by nrm , 10000000, 0.09926, 100.742, 17.202, 1.305
dividing nrm by 16 , 10000000, 0.04665, 214.363, 8.084, 0.613
dividing nrm by zero , 1000000, 0.00465, 215.169, 8.054, 0.611
Divide tests - extended precision
dividing nrm by nrm , 10000000, 0.12411, 80.575, 21.508, 1.632
dividing nrm by 16 , 10000000, 0.04630, 215.982, 8.024, 0.609
dividing nrm by zero , 1000000, 0.00497, 201.092, 8.618, 0.654
Multiply tests - float precision
multiply nrm by nrm , 10000000, 0.04117, 242.877, 7.135, 1.000
multiply nrm by 16 , 10000000, 0.04053, 246.708, 7.024, 0.984
multiply to overflow , 1000000, 0.00404, 247.567, 7.000, 0.981
multiply to underflow, 1000000, 0.06858, 14.581, 118.851, 16.657
Multiply tests - double precision
multiply nrm by nrm , 10000000, 0.04048, 247.027, 7.015, 0.983
multiply nrm by 16 , 10000000, 0.04111, 243.231, 7.125, 0.999
multiply to overflow , 1000000, 0.00404, 247.567, 7.000, 0.981
multiply to underflow, 1000000, 0.06873, 14.550, 119.108, 16.693
Multiply tests - extended precision
multiply nrm by nrm , 10000000, 0.04270, 234.208, 7.399, 1.037
multiply nrm by 16 , 10000000, 0.04117, 242.908, 7.134, 1.000
multiply to overflow , 1000000, 0.00405, 246.920, 7.018, 0.984
multiply to underflow, 1000000, 0.06965, 14.358, 120.699, 16.916
Square root tests - float precision
sqrt nrm , 1000000, 0.00923, 108.312, 16.000, 1.000
sqrt 4 , 5000000, 0.04634, 107.896, 16.062, 1.004
sqrt 2 , 1000000, 0.00927, 107.863, 16.067, 1.004
sqrt 9 , 1000000, 0.00923, 108.312, 16.000, 1.000
sqrt negative , 1000000, 0.00934, 107.042, 16.190, 1.012
Square root tests - double precision
sqrt nrm , 1000000, 0.01408, 71.036, 24.396, 1.525
sqrt 4 , 5000000, 0.06941, 72.036, 24.057, 1.504
sqrt 2 , 1000000, 0.01389, 72.008, 24.067, 1.504
sqrt 9 , 1000000, 0.01394, 71.747, 24.154, 1.510
sqrt negative , 1000000, 0.01436, 69.646, 24.883, 1.555
Square root tests - extended precision
sqrt nrm , 1000000, 0.01864, 53.651, 32.301, 2.019
sqrt 4 , 5000000, 0.09368, 53.372, 32.470, 2.029
sqrt 2 , 1000000, 0.01873, 53.394, 32.457, 2.029
sqrt 9 , 1000000, 0.02084, 47.981, 36.119, 2.257
sqrt negative , 1000000, 0.01879, 53.206, 32.572, 2.036
Multiply tests - SSE2
multiply nrm by nrm , 10000000, 0.25227, 39.640, 42.734, 4.265
multiply nrm by 16 , 10000000, 0.26231, 38.122, 44.436, 4.435
multiply to overflow , 1000000, 0.02533, 39.479, 42.909, 4.282
multiply to underflow, 1000000, 0.77311, 1.293, 1309.654, 130.699