Empirical Study of Complexity Graphs for Sorting Algorithms
IJCCIT Vol.1, No.1, Dec.2013
Gaurav Kumar
Panipat Institute of Engineering and Technology, Panipat, Haryana, India
[email protected]
Harish Chugh
Panipat Institute of Engineering and Technology, Panipat, Haryana, India
[email protected]

Abstract-This study investigates the characteristics of sorting algorithms with reference to the number of comparisons made for a given number of elements. Sorting algorithms are used by many applications to arrange elements in increasing or decreasing order, or in any other permutation. Sorting algorithms such as Quick Sort, Merge Sort, Heap Sort, Insertion Sort and Bubble Sort have different complexities depending on the number of elements to sort. The purpose of this investigation is to determine the number of comparisons and the number of swap operations, and then to plot line graphs of these counts in order to extract the coefficients of a fitted equation. The least squares method and the matrix inversion method are used to obtain the constants a, b and c of the equation for each sorting algorithm, i.e. Y = AX^2 + BX + C or Y = AX lg X + BX + C. The values of a, b and c are then used to draw the graph of each fitted equation. The study concludes which algorithm to use for a large number of elements. For larger arrays, the best choice is Quick Sort, which sorts the elements recursively and leads to faster results.
Keywords: Bubble Sort, Heap Sort, Insertion Sort, Merge Sort, Quick Sort.
I. INTRODUCTION
Sorting is one of the most important operations that a computer performs on data. Searching is among the most frequently used operations in computing, and computers spend much of their time on it. To achieve high efficiency when sorting, fast, efficient and inexpensive algorithms for sorting and ordering elements are required. For this reason, the development of such algorithms for lists and arrays is a fundamental field of computing. This study investigates sorting algorithms applied to long lists and draws graphs for them, with the aim of identifying efficient algorithms for sorting long lists [1]. The investigation determines which of these algorithms is the fastest at sorting lists of different lengths, and therefore which algorithm should be used for a given list length. Although asymptotic analysis of the algorithms is taken into consideration, the main comparison discussed is an empirical assessment based on running each algorithm against random lists of different sizes.
II. BACKGROUND
A limitation of empirical comparison is that it is system-dependent. A more robust way of comparing algorithms is through an upper bound on their time complexity, which guarantees that an algorithm runs in at most a certain time of order O(f(n)), where n is the number of items in the list to be sorted. This type of comparison is called asymptotic analysis [2,3]. The time complexities of the algorithms studied are shown in Table I.

TABLE I
TIME COMPLEXITY OF SORTING ALGORITHMS

                 Time Complexity
Algorithm        Best Case     Average Case   Worst Case
Bubble Sort      O(n)          O(n^2)         O(n^2)
Insertion Sort   O(n)          O(n^2)         O(n^2)
Quick Sort       O(n lg n)     O(n lg n)      O(n^2)
Merge Sort       O(n lg n)     O(n lg n)      O(n lg n)
Heap Sort        O(n lg n)     O(n lg n)      O(n lg n)
Bubble Sort, Insertion Sort and Quick Sort have a worst-case runtime of O(n^2), whereas Merge Sort and Heap Sort remain O(n lg n) even in the worst case. On average, however, Quick Sort, Merge Sort and Heap Sort all run in O(n lg n). This implies that these three algorithms will, on average, be faster than Bubble Sort and Insertion Sort if the list is sufficiently large [4,5].
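To give a feel for the gap between the two complexity classes, the short Python sketch below computes the ratio n^2 / (n lg n) for two of the list sizes used later in the study. It is illustrative only: the function name `growth_ratio` is an assumption introduced here, and the ratio ignores the constant factors that asymptotic notation hides.

```python
import math


def growth_ratio(n):
    """Return n^2 divided by n*lg(n), which simplifies to n / lg(n)."""
    return (n * n) / (n * math.log2(n))


# Ratio at 1 lakh and 10 lakh elements (1 lakh = 100,000).
for n in (100_000, 1_000_000):
    print(n, round(growth_ratio(n)))
```

At 10 lakh elements the ratio is roughly 50,000, which is why the O(n^2) algorithms fall so far behind on long lists.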
III. METHODOLOGY USED
Each algorithm is tested on lists of length 1, 3, 5, 7, 10 and 15 lakh (1 lakh = 100,000 elements) using C#.net and VC++. The number of comparisons and the number of assignment/swap operations are recorded using a counter. Execution is done on Windows XP Pro SP2, with an Intel Core2 Duo 2.40 GHz processor and 1 GB of RAM. The raw results are recorded by writing them to a file and reading them back. These raw results were tabulated, calculated and graphed using VC++ and MS-Excel. The numbers of comparisons and swap operations for each algorithm and list length are shown below in tabular and graphical form.

TABLE II
NUMBER OF COMPARISONS FOR (N LG N) BASED SORTING ALGORITHMS

              Number of elements
Algorithm     1 lakh      3 lakh      5 lakh      7 lakh      10 lakh     15 lakh
Quick Sort    2083013     7154514     12439847    18266103    25881883    41478603
Merge Sort    1868928     6075712     10475712    15051424    21951424    33902848
Heap Sort     5173930     16938139    29321482    42113995    61645978    95209435
Figure 1. Total number of comparisons for (N lg N) sorting algorithms

TABLE III
NUMBER OF SWAP/ASSIGNMENT OPERATIONS FOR (N LG N) BASED SORTING ALGORITHMS

              Number of elements
Algorithm     1 lakh      3 lakh      5 lakh      7 lakh      10 lakh     15 lakh
Quick Sort    1026729     3609352     6335691     8665471     12252113    22301526
Merge Sort    1668928     5475712     9475712     13651424    19951424    30902848
Heap Sort     1674642     5496045     9523826     13687997    20048658    30986477
Figure 2. Total number of Swap/Assignment for (N lg N) based sorting algorithms

TABLE IV
NUMBER OF COMPARISONS FOR (N^2) BASED SORTING ALGORITHMS

                 Number of elements
Algorithm        1 lakh        3 lakh         5 lakh          7 lakh          10 lakh         15 lakh
Bubble Sort      4999950001    44999850001    124999750001    244999650001    499999500001    1124999250001
Insertion Sort   2503921057    22500033726    62489124089     122464377656    249931402775    562246099741
Figure 3. Total number of comparisons for (N^2) sorting algorithms

TABLE V
TOTAL NUMBER OF SWAP/ASSIGNMENT FOR (N^2) BASED SORTING ALGORITHMS

                 Number of elements
Algorithm        1 lakh        3 lakh         5 lakh         7 lakh          10 lakh         15 lakh
Bubble Sort      2503821057    22499733726    62488624089    122463677656    249930402775    562244599741
Insertion Sort   2503821057    22500033725    62489124088    122464377655    249931402774    562246099740
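The counter-based measurement described in this section can be sketched as follows. This is a minimal Python illustration, not the authors' C#.net/VC++ code: the function names are assumptions, and the exact counting convention used in the study is not stated in the paper. This sketch counts one comparison per element-to-element test and one swap (or shift) per element moved.

```python
def bubble_sort_counts(items):
    """Bubble-sort a copy of items; return (sorted_list, comparisons, swaps)."""
    a = list(items)
    comparisons = swaps = 0
    n = len(a)
    for i in range(n - 1):
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return a, comparisons, swaps


def insertion_sort_counts(items):
    """Insertion-sort a copy of items; return (sorted_list, comparisons, shifts)."""
    a = list(items)
    comparisons = shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]  # shift the larger element one slot right
            shifts += 1
            j -= 1
        a[j + 1] = key
    return a, comparisons, shifts
```

With this convention the bubble sort always performs n(n-1)/2 comparisons, which is one less than each Bubble Sort entry in Table IV, so the study's counter presumably differs from this sketch by a constant.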
IV. EVALUATION
The results are evaluated using least squares fitting; the calculation for each sorting algorithm is shown in tabular as well as graphical form. In the calculations, X denotes the number of elements and Y denotes the number of comparisons.
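For reference, fitting the quadratic model y = a1 + a2 x + a3 x^2 to n data points by least squares leads to the standard normal equations, written below in LaTeX. The symbols a1, a2 and a3 follow the notation of Sections 4.1 and 4.2; the summation index i is introduced here for illustration.

```latex
\begin{aligned}
n\,a_1 + \Big(\sum_i x_i\Big) a_2 + \Big(\sum_i x_i^2\Big) a_3 &= \sum_i y_i \\
\Big(\sum_i x_i\Big) a_1 + \Big(\sum_i x_i^2\Big) a_2 + \Big(\sum_i x_i^3\Big) a_3 &= \sum_i x_i y_i \\
\Big(\sum_i x_i^2\Big) a_1 + \Big(\sum_i x_i^3\Big) a_2 + \Big(\sum_i x_i^4\Big) a_3 &= \sum_i x_i^2 y_i
\end{aligned}
```

With six list sizes, n = 6, and the sums are the column totals of the calculation tables for the N^2 algorithms.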
Figure 4. Total number of Swap/Assignment for (N^2) sorting algorithms

4.1 Bubble Sort
As per the methodology used, calculations done on bubble sort are given in Table VI and then shown in Figure 5.

TABLE VI
BUBBLE SORT CALCULATION USING LEAST SQUARE FITTING METHOD

X (No. of elements)  Y (No. of Comparison)  X^2             X^3              X^4              YX              YX^2
100000               4.9999E+9              1.0E+10         1E+15            1E+20            4.99995E+14     4.99995E+19
300000               4.4998E+10             9E+10           2.7E+16          8.1E+21          1.35E+16        4.04999E+21
500000               1.25E+11               2.5E+11         1.25E+17         6.25E+22         6.24999E+16     3.12499E+22
700000               2.45E+11               4.9E+11         3.43E+17         2.401E+23        1.715E+17       1.2005E+23
1000000              5E+11                  1E+12           1E+18            1E+24            5E+17           5E+23
1500000              1.125E+12              2.25E+12        3.375E+18        5.0625E+24       1.6875E+18      2.53125E+24
∑X=4100000           ∑Y=2.045E+12           ∑X^2=4.09E+12   ∑X^3=4.871E+18   ∑X^4=6.373E+24   ∑YX=2.4355E+18  ∑YX^2=3.18665E+24

The resulting normal equations are:

6 a1 + 4100000 a2 + 4090000000000 a3 = 2044997950006
4100000 a1 + 4090000000000 a2 + 4871000000000000000 a3 = 2435497955004100000
4090000000000 a1 + 4871000000000000000 a2 + 6.3733E+24 a3 = 3.18664756450409E+24

Calculating the values of coefficients a1, a2 and a3 by using Matrix Inversion Method:

a1 = 0.98144531250000, a2 = -0.50000003352761, a3 = 0.49999999999999

y = 0.98144531250000 - 0.50000003352761 x + 0.49999999999999 x^2
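The bubble sort fit can be reproduced with a short Python sketch. It is not the authors' VC++/MS-Excel workflow: the function name `fit_quadratic` is hypothetical, and the sketch solves the normal equations by Gauss-Jordan elimination in exact rational arithmetic instead of floating-point matrix inversion, because the sums reach the order of 10^24. The input data are the Bubble Sort comparison counts from Table IV.

```python
from fractions import Fraction

XS = [100_000, 300_000, 500_000, 700_000, 1_000_000, 1_500_000]
BUBBLE_COMPARISONS = [4_999_950_001, 44_999_850_001, 124_999_750_001,
                      244_999_650_001, 499_999_500_001, 1_124_999_250_001]


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a1 + a2*x + a3*x^2; returns [a1, a2, a3]."""
    # Power sums S[k] = sum(x^k) and moment sums T[k] = sum(y * x^k).
    S = [sum(Fraction(x) ** k for x in xs) for k in range(5)]
    T = [sum(Fraction(y) * Fraction(x) ** k for x, y in zip(xs, ys))
         for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    M = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gauss-Jordan elimination (exact, so any nonzero pivot will do).
    for col in range(3):
        pivot_row = next(r for r in range(col, 3) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(3):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][3] for i in range(3)]


a1, a2, a3 = fit_quadratic(XS, BUBBLE_COMPARISONS)
print(a1, a2, a3)
```

Under these assumptions the fit is exact, y = 1 - x/2 + x^2/2, which the floating-point coefficients reported above approximate closely.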
Figure 5. Bubble Sort Graph for X-Y values
4.2 Insertion Sort
As per the methodology used, calculations done on insertion sort are given in Table VII and then shown in Figure 6.

TABLE VII
INSERTION SORT CALCULATION USING LEAST SQUARE FITTING METHOD

X (No. of elements)  Y (No. of Comparison)  X^2             X^3              X^4              YX               YX^2
100000               2503921057             1E+10           1E+15            1E+20            2.50392E+14      2.50392E+19
300000               2.25E+10               9E+10           2.7E+16          8.1E+21          6.75001E+15      2.025E+21
500000               6.248E+10              2.5E+11         1.25E+17         6.25E+22         3.12446E+16      1.56223E+22
700000               1.2246E+11             4.9E+11         3.43E+17         2.401E+23        8.57251E+16      6.00075E+22
1000000              2.4993E+11             1E+12           1E+18            1E+24            2.49931E+17      2.49931E+23
1500000              5.6224E+11             2.25E+12        3.375E+18        5.0625E+24       8.43369E+17      1.26505E+24
∑X=4100000           ∑Y=1.0221E+12          ∑X^2=4.09E+12   ∑X^3=4.871E+18   ∑X^4=6.373E+24   ∑YX=1.21727E+18  ∑YX^2=1.59266E+24

The resulting normal equations are:

6 a1 + 4100000 a2 + 4090000000000 a3 = 1022139059039
4100000 a1 + 4090000000000 a2 + 4871000000000000000 a3 = 1217274671010100000
4090000000000 a1 + 4871000000000000000 a2 + 6.3733E+24 a3 = 1.592669E+24

Calculating the values of coefficients a1, a2 and a3 by using Matrix Inversion Method:

a1 = -9.2744141, a2 = 95.404689602554, a3 = 0.24983064989564

y = -9.2744141 + 95.404689602554 x + 0.24983064989564 x^2

4.3 Merge Sort
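As per the methodology used, the calculations for merge sort use the values in Table II, and the result is shown in Figure 7. The model is y = a x lg x + b x + c, where lg denotes the base-2 logarithm. Substituting the 5 lakh data point gives:

500000 lg(500000) a + 500000 b + c = 10475712
9465784.2846621 a + 500000 b + c = 10475712

Eliminating b using the 10 lakh and 15 lakh data points gives:

1000000 a - c = 1000000
2377443.7510817 a - 2c = 2475712
377443.75108170 a = 475712

Calculating the values of coefficients a, b and c by using Matrix Inversion Method:

a = 1.2603520355992, b = -2.909018983, c = 260352.0356

y = 1.260352036 x lg x - 2.909018983 x + 260352.0356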
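The three-point fit for the N lg N model can be sketched in Python as below. The function name `fit_nlogn` is hypothetical, and the sketch solves the 3x3 system by Gauss-Jordan elimination with partial pivoting in floating point; it is an illustration under these assumptions, not the authors' matrix inversion code. The data points are the Merge Sort comparison counts at 5, 10 and 15 lakh elements from Table II.

```python
import math


def fit_nlogn(points):
    """Solve y = a*x*lg(x) + b*x + c through exactly three (x, y) points."""
    # Augmented matrix: one row [x*lg(x), x, 1 | y] per data point.
    rows = [[x * math.log2(x), float(x), 1.0, float(y)] for x, y in points]
    for col in range(3):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot_row = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot_row] = rows[pivot_row], rows[col]
        pivot = rows[col][col]
        rows[col] = [v / pivot for v in rows[col]]
        for r in range(3):
            if r != col:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return tuple(row[3] for row in rows)


MERGE_POINTS = [(500_000, 10_475_712),
                (1_000_000, 21_951_424),
                (1_500_000, 33_902_848)]
a, b, c = fit_nlogn(MERGE_POINTS)
print(a, b, c)
```

Solved this way, a and c agree with the values above (about 1.26035 and 260352), while b comes out near -3.43 rather than the printed -2.909.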
Figure 6. Insertion Sort Graph for X-Y values.
Figure 7. Merge Sort Graph for X-Y values.

4.4 Heap Sort
As per the methodology used, the calculations for heap sort use the values in Table II, and the result is shown in Figure 8. Substituting the 5 lakh data point into y = a x lg x + b x + c gives:

500000 lg(500000) a + 500000 b + c = 29321482
9465784.2846621 a + 500000 b + c = 29321482

Eliminating b using the 10 lakh and 15 lakh data points gives:

1000000 a - c = 3003014
2377443.7510817 a - 2c = 7244989
377443.75108170 a = 1238961

Calculating the values of coefficients a, b and c by using Matrix Inversion Method:

a = 3.2825049996173, b = -3.500006479, c = 279490.9996

y = 3.2825049996173 x lg x - 3.500006479 x + 279490.9996
Figure 8. Heap Sort Graph for X-Y values.

4.5 Quick Sort
As per the methodology used, calculations are done on quick sort using Table II and are shown in Figure 9.

500000 lg(500000) a + 500000 b + c = 12439847
9465784.28466208 a + 500000 b + c = 12439847
19931568.5693241 a + 1000000 b + c = 25881883
30774796.605068 a + 1500000 b + c = 41478603

Calculating the values of coefficients a, b and c by using Matrix Inversion Method:

a = 5.7086227916717434, b = -12.6063574003357, c = 235321.896

Y = 5.7086227916717434 X lg X - 92.606357400272373 X + 4706433.79167138
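A fitted equation can be checked by evaluating it at the sample sizes and comparing the result with the measured counts. The Python sketch below does this for the final Quick Sort equation as printed above, using the Table II counts at 5, 10 and 15 lakh elements. The names `quick_model` and `residuals` are assumptions introduced for illustration, and lg is taken as the base-2 logarithm.

```python
import math

# Quick Sort comparison counts from Table II at 5, 10 and 15 lakh elements.
QUICK_POINTS = {500_000: 12_439_847,
                1_000_000: 25_881_883,
                1_500_000: 41_478_603}


def quick_model(x):
    """Evaluate the final printed equation Y = a*X*lg(X) + b*X + c."""
    a = 5.7086227916717434
    b = -92.606357400272373
    c = 4706433.79167138
    return a * x * math.log2(x) + b * x + c


# Residual = predicted count minus measured count at each sample size.
residuals = {x: quick_model(x) - y for x, y in QUICK_POINTS.items()}
for x, r in residuals.items():
    print(x, r)
```

The equation as printed reproduces all three counts to within one comparison. Note that this check uses the b and c of the final equation, which differ from the separately listed values of b and c.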
V. RESULT ANALYSIS
Based on the calculations in Section IV, the coefficients of the fitted equations are summarised in Table VIII. Graphs comparing the fitted equations for the N lg N and N^2 algorithms are shown in Figures 10 and 11 respectively.
Figure 9. Quick Sort Graph for X-Y values.
TABLE VIII
SUMMARY OF CALCULATIONS OF SORTING ALGORITHMS

Sorting Algorithm   Fitted equation          A                   B                   C
Merge Sort          Y = AX lg X + BX + C     1.2603520355992     -2.909018983        260352.0356
Heap Sort           Y = AX lg X + BX + C     3.2825049996173     -3.500006479        279490.9996
Quick Sort          Y = AX lg X + BX + C     5.70862279167174    -12.60635740033     235321.896
Insertion Sort      Y = AX^2 + BX + C        0.24983064989564    95.404689602554     -9.2744141
Bubble Sort         Y = AX^2 + BX + C        0.49999999999999    -0.50000003352761   0.98144531250000
Figure 10. Polynomial Equation Comparison for NlgN algorithm (Y values plotted against X values from 10 to 2000 for Merge, Quick and Heap sort)
VI. CONCLUSION
The empirical data obtained by using the program reveals the speed of each algorithm for very large lists and ranks them, from fastest to slowest, as follows:
1. Merge sort
2. Quick sort
3. Heap sort
4. Insertion sort
5. Bubble sort
There is a large difference in the time taken to sort very large lists between the fastest three and the slowest two. This is due to the efficiency advantage that Merge sort, Quick sort and Heap sort have over the others when the list to be sorted is sufficiently large.
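The ranking can be cross-checked against the measured comparison counts for the largest list (15 lakh elements) in Tables II and IV. The Python sketch below sorts the algorithms by that count; it treats the number of comparisons as a proxy for speed, which is an assumption of this illustration, and the variable names are hypothetical.

```python
# Comparison counts at 15 lakh (1,500,000) elements, from Tables II and IV.
COMPARISONS_AT_15_LAKH = {
    "Bubble sort": 1_124_999_250_001,
    "Heap sort": 95_209_435,
    "Insertion sort": 562_246_099_741,
    "Merge sort": 33_902_848,
    "Quick sort": 41_478_603,
}

# Fewest comparisons first.
ranking = sorted(COMPARISONS_AT_15_LAKH, key=COMPARISONS_AT_15_LAKH.get)
for position, name in enumerate(ranking, start=1):
    print(position, name)
```

Ordered this way, the measured counts give the same sequence as the ranking above.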
Figure 11. Polynomial Equation Comparison for N2 algorithm (Y values plotted against X values from 10 to 2000 for Insertion and Bubble sort)
REFERENCES
1. L. M. Busse, M. H. Chehreghani, J. M. Buhmann, “The information content in sorting algorithms”, IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 2746-2750, July 2012.
2. Wang Xiang, “Analysis of the Time Complexity of Quick Sort Algorithm”, International Conference on Information Management, Innovation Management and Industrial Engineering (ICIII), vol. 1, pp. 408-410, Nov. 2011.
3. Zhiyi Huang, S. Kannan, S. Khanna, “Algorithms for the Generalized Sorting Problem”, IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS), pp. 738-747, Oct. 2011.
4. T. Cormen, C. Leiserson, R. Rivest, “Introduction to Algorithms”, MIT Press and McGraw-Hill, 3rd edition, Aug. 2009.
5. Juliana Pena Ocampo, “An empirical comparison of the runtime of five sorting algorithms”, International Baccalaureate Extended Essay, 2008.
6. J. Pagter, T. Rauhe, “Optimal time-space trade-offs for sorting”, Proc. 39th Annual Symposium on Foundations of Computer Science, pp. 264-268, Nov. 1998.
7. A.V. Aho, J.E. Hopcroft, J.D. Ullman, “The Design and Analysis of Computer Algorithms”, Addison-Wesley, 1974.
8. Wikipedia, “Sorting algorithm”, retrieved on Aug. 20, 2013 from http://en.wikipedia.org/w/index.php?title=Sorting_algorithm&oldid=162949511
9. http://warp.povusers.org/SortComparison/index.html
10. Sorting Algorithms description, retrieved on Aug. 20, 2013 from http://www.cs.wm.edu/~wm/CS303/ln7.pdf