JCNC Journal of Computer Networks and Communications 2090-715X 2090-7141 Hindawi 10.1155/2019/3027578 3027578 Research Article SA Sorting: A Novel Sorting Technique for Large-Scale Data http://orcid.org/0000-0001-5106-7609 Shabaz Mohammad 1 Kumar Ashok 2 Xu Youyun 1 Apex Institute of Technology Chandigarh University S.A.S. Nagar India chandigarhuniversity.ac.in 2 Chitkara University Institute of Engineering and Technology Chitkara University Rajpura India chitkara.edu.in 2019 2312019 2019 19 08 2018 06 12 2018 2312019 2019 Copyright © 2019 Mohammad Shabaz and Ashok Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Sorting is a fundamental operation on data structures, defined as the arrangement of data or records in a particular logical order. A number of algorithms have been developed for sorting, with the aim of optimizing efficiency and complexity, and work on new sorting approaches is still ongoing. With the rise in the generation of big data, the concept of big numbers has come into existence. To sort thousands of records, either sorted or unsorted, traditional sorting approaches can be used; in those cases, the complexities can be ignored, as only a minute difference exists in their execution times. But when the data are very large, so that the processing time for billions or trillions of records is substantial, the complexity can no longer be ignored, and an optimized sorting approach is required. SA sorting is one approach developed to handle sorted big numbers, as it works better on sorted numbers than quick sort and many others. It can be used to sort unsorted records as well.

1. Introduction

Sorting big numbers in an optimized way is a challenging task. A number of optimized sorting algorithms exist, but their execution time can still be improved; sometimes these algorithms take the same amount of time to sort an already sorted record as an unsorted one. A sorting algorithm should be stable, effective, efficient, and of low complexity. SA sorting meets most of these criteria. It is introduced as a new approach that operates on both sorted and unsorted lists or records and shows better execution time on sorted lists.

The following section discusses the existing sorting approaches.

2. Sorting Techniques

2.1. Bubble Sort

It is a stable sorting algorithm in which each element in the list is compared with its next adjacent element, and the process is repeated until the elements are sorted. If we have n elements, then there are n − 1 passes and n(n − 1)/2 iterations in total. Mathematically,

(1) F(n) = n(n − 1)/2 = (n² − n)/2,

and thus,

(2) F(n) = O(n²).

The algorithm for bubble sort is given in Algorithm 1.

<bold>Algorithm 1: </bold>Bubble sort.

for x = 1 to n do
    Nswap ← 0
    for y = n down to x + 1 do
        if a[y] < a[y − 1] then
            T ← a[y]
            a[y] ← a[y − 1]
            a[y − 1] ← T
            Nswap ← 1
        end if
    end for
    if Nswap = 0 then
        break
    end if
end for

In this algorithm, if no swap occurs during a pass, the loop breaks and the algorithm ends; thus, only one pass executes, which gives the best-case complexity of O(n). On the contrary, the average- and worst-case complexity is O(n²). Bubble sort is highly inefficient and one of the worst sorting approaches, so professionals rarely use it.
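For concreteness, Algorithm 1 can be sketched in C++ as follows (a minimal sketch; names such as `bubbleSort` are ours, not from the paper):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Bubble sort with the early-exit flag from Algorithm 1: if a full pass
// performs no swap, the list is already sorted and we stop (best case O(n)).
void bubbleSort(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    for (int x = 0; x < n - 1; ++x) {
        bool swapped = false;              // Nswap in the pseudocode
        for (int y = n - 1; y > x; --y) {
            if (a[y] < a[y - 1]) {
                std::swap(a[y], a[y - 1]);
                swapped = true;
            }
        }
        if (!swapped) break;               // no swap in this pass: done
    }
}
```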

2.2. Insertion Sort

It is a stable sorting algorithm in which, starting from the second element, each element is compared with the elements of the already sorted prefix to find its sorted position; when the position is found, the element is inserted there. The algorithm for insertion sort is given in Algorithm 2.

<bold>Algorithm 2: </bold>Insertion sort.

for e = 2 to length(X) do
    k ← X[e]
    // insert X[e] into the sorted sublist X[1 .. e − 1]
    L ← e − 1
    while L > 0 and X[L] > k do
        X[L + 1] ← X[L]
        L ← L − 1
    end while
    X[L + 1] ← k
end for

Insertion sort works well for smaller datasets, since it is also inefficient for large lists or big numbers. Insertion sort is about 2 times faster than bubble sort. For the best case, it is O(n), while for the average and worst cases, it is O(n²).
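A hedged C++ rendering of Algorithm 2 (the name `insertionSort` is ours):

```cpp
#include <cassert>
#include <vector>

// Insertion sort: grow a sorted prefix, shifting larger elements right
// to make room for each new element k.
void insertionSort(std::vector<int>& x) {
    for (std::size_t e = 1; e < x.size(); ++e) {
        int k = x[e];                       // element to insert
        std::size_t l = e;
        while (l > 0 && x[l - 1] > k) {     // shift larger elements right
            x[l] = x[l - 1];
            --l;
        }
        x[l] = k;                           // insert into its sorted position
    }
}
```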

2.3. Selection Sort

By nature, selection sort is unstable, but it can be improved to become stable. In this sorting technique, we find the smallest number in the record, put it at the starting position, advance the position by one, and again find the smallest number in the remaining list. This process continues until the whole list is sorted. This algorithm is efficient for smaller records, but for larger records it is again inefficient. The algorithm is given in Algorithm 3.

<bold>Algorithm 3: </bold>Selection sort.

for d = 0 to S − 2 do
    sml ← d
    for f = d + 1 to S − 1 do
        if A[f] < A[sml] then
            sml ← f
        end if
    end for
    T ← A[d]
    A[d] ← A[sml]
    A[sml] ← T
end for

Its execution time is better for smaller records (up to hundreds of records). The best-, average-, and worst-case complexities are all O(n²).
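Algorithm 3 translates to C++ roughly as follows (a sketch; `selectionSort` is our name):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Selection sort: repeatedly select the minimum of the unsorted suffix
// and swap it into place; exactly one swap per pass.
void selectionSort(std::vector<int>& a) {
    const std::size_t s = a.size();
    for (std::size_t d = 0; d + 1 < s; ++d) {
        std::size_t sml = d;                  // index of current minimum
        for (std::size_t f = d + 1; f < s; ++f)
            if (a[f] < a[sml]) sml = f;
        std::swap(a[d], a[sml]);
    }
}
```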

2.4. Merge Sort

It is a stable sorting algorithm and is very efficient at handling big numbers. It is based on the following three steps:

Divide the given list or record into number of sublists in such a way that every list or sublist is half divided

Conquer the sublist by the recursion method

Combine the sorted sublist simply by merging

The sorting is actually done in the second step. Merge sort is a classic example of the divide-and-conquer strategy. It requires double the memory required by in-place sorting techniques. The algorithm for merge sort is given in Algorithms 4 and 5.

<bold>Algorithm 4: </bold>Merge sort (<italic>A</italic>, <italic>S</italic>, <italic>T</italic>).

if S < T then
    U ← (S + T)/2
    Merge Sort(A, S, U)
    Merge Sort(A, U + 1, T)
    Merge(A, S, U, T)
end if

<bold>Algorithm 5:</bold> Merge (<italic>A</italic>, <italic>S</italic>, <italic>U</italic>, <italic>T</italic>).

declare B[S .. T]
x ← S, y ← S
z ← U + 1
while x ≤ U and z ≤ T do
    if A[x] ≤ A[z] then
        B[y++] ← A[x++]
    else
        B[y++] ← A[z++]
    end if
end while
while x ≤ U do
    B[y++] ← A[x++]
end while
while z ≤ T do
    B[y++] ← A[z++]
end while
for x = S to T do
    A[x] ← B[x]
end for

From the algorithm, merge sort has the following recurrence relation:

(3) T(n) = 2T(n/2) + cn.

Using the master method, f(n) = cn, a = 2, and b = 2, and thus,

(4) n^(log_b a) = n^(log₂ 2) = n¹ = n.

Now,

(5) T(n) = O(n log n).

For all three cases, the complexity is O(n log n).
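Algorithms 4 and 5 can be sketched together in C++ as follows (a hedged sketch; the buffer `buf` is the O(n) auxiliary array behind the "double the memory" remark, and all names are ours):

```cpp
#include <cassert>
#include <vector>

// Merge step (Algorithm 5): merge the sorted halves a[s..u] and a[u+1..t].
void merge(std::vector<int>& a, std::vector<int>& buf, int s, int u, int t) {
    int x = s, z = u + 1, y = s;
    while (x <= u && z <= t) buf[y++] = (a[x] <= a[z]) ? a[x++] : a[z++];
    while (x <= u) buf[y++] = a[x++];   // drain the left half
    while (z <= t) buf[y++] = a[z++];   // drain the right half
    for (int i = s; i <= t; ++i) a[i] = buf[i];
}

// Recursive divide-and-conquer step (Algorithm 4).
void mergeSort(std::vector<int>& a, std::vector<int>& buf, int s, int t) {
    if (s < t) {
        int u = (s + t) / 2;
        mergeSort(a, buf, s, u);
        mergeSort(a, buf, u + 1, t);
        merge(a, buf, s, u, t);
    }
}

// Convenience wrapper allocating the auxiliary buffer once.
void mergeSort(std::vector<int>& a) {
    if (a.empty()) return;
    std::vector<int> buf(a.size());
    mergeSort(a, buf, 0, static_cast<int>(a.size()) - 1);
}
```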

2.5. Quick Sort

It is unstable but efficient and is one of the fastest sorting algorithms. It is based on the divide-and-conquer strategy. Central to quick sort is the concept of the pivot element: a randomly chosen member of the list under sorting. It works well for both smaller and larger lists or records, but if the list is already sorted, its performance can degrade unexpectedly. It is built on recursion. The algorithm is given in Algorithms 6 and 7.

<bold>Algorithm 6: </bold>Quick sort (<italic>A</italic>, <italic>S</italic>, <italic>T</italic>).

if T > S then
    X ← random index from [S, T]
    interchange A[X] with A[S]
    U ← Partition(A, S, T)
    Quick Sort(A, S, U − 1)
    Quick Sort(A, U + 1, T)
end if

<bold>Algorithm 7: </bold>Partition (<italic>A</italic>, <italic>S</italic>, <italic>T</italic>).

Z ← A[S]
U ← S
for L = S + 1 to T do
    if A[L] < Z then
        U ← U + 1
        interchange A[U] with A[L]
    end if
end for
interchange A[S] with A[U]
return U

The partition function maintains the following invariant:

A[S] = Z // pivot

A[S + 1 .. U] // values < Z

A[U + 1 .. L − 1] // values ≥ Z

A[L .. T] // unknown values

Complexity analysis: let m be the size of the list and let n be the recursion depth. The number of comparisons is

m + 2(m/2) + 4(m/4) + ⋯ + m(m/m) = m + m + ⋯ + m (n terms) = O(nm).

If n = m, this is the worst-case scenario and the complexity is O(m²). But for the average- and best-case scenarios, n = log₂ m, and the complexity is O(m log₂ m).
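Algorithms 6 and 7 can be sketched in C++ with a random pivot and the partition scheme above (a hedged sketch; the names and the use of `std::rand` are our choices):

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Partition (Algorithm 7): a[s] is the pivot; on return it sits at its
// final index u, with smaller values to its left and larger to its right.
int partition(std::vector<int>& a, int s, int t) {
    int z = a[s];                           // pivot value
    int u = s;
    for (int l = s + 1; l <= t; ++l)
        if (a[l] < z) std::swap(a[++u], a[l]);
    std::swap(a[s], a[u]);                  // place pivot in final position
    return u;
}

// Quick sort (Algorithm 6) with a randomly chosen pivot.
void quickSort(std::vector<int>& a, int s, int t) {
    if (t > s) {
        int x = s + std::rand() % (t - s + 1);  // random pivot index
        std::swap(a[x], a[s]);
        int u = partition(a, s, t);
        quickSort(a, s, u - 1);
        quickSort(a, u + 1, t);
    }
}
```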

2.6. Tree Sort

It is an unstable sorting algorithm built on the binary search tree (BST). The elements are sorted using an in-order traversal. Tree sort requires extra memory space, and the complexity changes from a balanced to an unbalanced BST: O(n²) for the worst case and O(n log n) for the average and best cases.

2.7. Gnome Sort

It is a stable sorting approach. When we think that a list or record is sorted but are not sure, we need an algorithm which works best on a sorted list; for this purpose, we use gnome sort. It also performs on unsorted lists. The algorithm is given in Algorithm 8.

<bold>Algorithm 8: </bold>Gnome sort (<italic>X</italic>[]).

K ← 1
while K < X.length do
    if X[K] ≥ X[K − 1] then
        K ← K + 1
    else
        T ← X[K]
        X[K] ← X[K − 1]
        X[K − 1] ← T
        if K > 1 then
            K ← K − 1
        end if
    end if
end while

From the algorithm, it is clearly seen that if the list is sorted, no interchange of elements is done; hence, it executes linearly. Thus, for the best case, it is O(n), and for the average and worst cases, it is O(n²).
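Algorithm 8 in C++ might look as follows (a sketch; `gnomeSort` is our name). On an already sorted list the cursor only ever moves forward, which is the O(n) best case described above:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Gnome sort: step forward while adjacent elements are in order;
// on an inversion, swap and step back to recheck the previous pair.
void gnomeSort(std::vector<int>& x) {
    std::size_t k = 1;
    while (k < x.size()) {
        if (x[k] >= x[k - 1]) {
            ++k;                            // in order: step forward
        } else {
            std::swap(x[k], x[k - 1]);      // out of order: swap, step back
            if (k > 1) --k;
        }
    }
}
```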

2.8. Counting Sort

It is a stable and easily understandable sorting algorithm. As the name depicts, counting sort works by finding the largest element in the given list, counting the frequency of each value, and then producing the sorted list while maintaining the order of occurrence of equal elements. It is useful in those cases where both the range of the numbers and the dataset are small. The step-by-step procedure of counting sort is given in Algorithm 9.

<bold>Algorithm 9: </bold>Counting sort (<italic>P</italic>, <italic>Q</italic>, <italic>m</italic>).

for a = 1 to m do
    D[a] ← 0
end for
for b = 1 to length(P) do
    D[P[b]] ← D[P[b]] + 1
end for
for a = 2 to m do
    D[a] ← D[a] + D[a − 1]    // cumulative counts
end for
for b = length(P) down to 1 do
    Q[D[P[b]]] ← P[b]
    D[P[b]] ← D[P[b]] − 1
end for
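A hedged C++ sketch of counting sort for non-negative integers (names are ours; the right-to-left placement pass is what keeps the sort stable):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counting sort: count frequencies, take cumulative counts, then place
// elements from right to left so equal values keep their relative order.
std::vector<int> countingSort(const std::vector<int>& p) {
    if (p.empty()) return {};
    int m = *std::max_element(p.begin(), p.end());
    std::vector<int> d(m + 1, 0), q(p.size());
    for (int v : p) ++d[v];                        // frequency of each value
    for (int a = 1; a <= m; ++a) d[a] += d[a - 1]; // cumulative counts
    for (int b = static_cast<int>(p.size()) - 1; b >= 0; --b)
        q[--d[p[b]]] = p[b];                       // stable placement
    return q;
}
```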

2.9. Grouping Comparison Sort (GCS)

Suleiman and his team of three other members proposed the GCS algorithm. Their methodology divides the given list into groups of three elements, and comparison is done in such a way that every element of one group is compared with those in the other groups. The main drawback of this algorithm is that the input size must be at most 25000 records to get good results. The complexity is O(n²) for all three cases.

2.10. Heap Sort

Heap sort is an efficient, though unstable, sorting algorithm which is based on the complete binary tree and follows the heap order. A min heap has the minimum value at the root node, and a max heap has the maximum value at the root node. The procedure of heap sort is explained through Algorithms 10 and 11.

<bold>Algorithm 10: </bold>Heap sort (<italic>A</italic>, <italic>X</italic>).

Build Heap(A, X)
a ← X
while a ≥ 2 do
    interchange A[1] with A[a]
    a ← a − 1
    Heapy(A, 1, a)
end while

<bold>Algorithm 11: </bold>Heapy (<italic>A</italic>, <italic>k</italic>, <italic>a</italic>).

L ← left(k)
R ← right(k)
maximum ← k
if L ≤ a and A[L] > A[maximum] then
    maximum ← L
end if
if R ≤ a and A[R] > A[maximum] then
    maximum ← R
end if
if maximum ≠ k then
    t ← A[k]
    A[k] ← A[maximum]
    A[maximum] ← t
    Heapy(A, maximum, a)
end if

For heap sort, in all three cases (i.e., best, average, and worst), the complexity is the same, O(n log n), where n is the number of records in the list.
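Because the process in Algorithms 10 and 11 is exactly "build a max heap, then repeatedly swap the root to the end and re-heapify," it can be sketched compactly in C++ using the standard library's heap primitives (a hedged sketch, not the paper's own implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Heap sort via the standard heap operations: make_heap builds a max
// heap in O(n); each pop_heap moves the current maximum just past the
// shrinking heap, leaving the vector sorted in ascending order.
void heapSort(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());
    for (auto end = a.end(); end != a.begin(); --end)
        std::pop_heap(a.begin(), end);
}
```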

2.11. Radix Sort

It is a stable sorting algorithm, efficient when the size of the list is small. Internally, it acts like counting sort. One drawback of radix sort is that it processes each number once per significant digit: for the number 169, radix sort operates on it three times, sorting from the least significant digit 9 to the most significant digit 1. Radix sort compares the least significant digits of all numbers first and proceeds digit by digit to produce the sorted list. The procedure of radix sort is explained in Algorithm 12.

<bold>Algorithm 12: </bold>Radix sort.

change ← 1
for k = 1 to number of digits do
    for e = 1 to a do
        z ← (X[e]/change) mod 10
        append X[e] to bucket[z]
    end for
    X ← combine buckets in order
    change ← change × 10
end for

If the longest number has length m and there are n elements, the complexity is O(mn). If m is bounded by a constant, it can be ignored and the complexity becomes O(n) for all three cases.
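An LSD radix sort for non-negative integers, following Algorithm 12, might be sketched as (names are ours):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// LSD radix sort: distribute values into ten digit buckets, recombine
// in bucket order, then advance to the next decimal digit.
void radixSort(std::vector<int>& x) {
    if (x.empty()) return;
    int maxVal = *std::max_element(x.begin(), x.end());
    for (long change = 1; maxVal / change > 0; change *= 10) {
        std::vector<std::vector<int>> bucket(10);
        for (int v : x) bucket[(v / change) % 10].push_back(v);
        std::size_t i = 0;
        for (const auto& b : bucket)        // combine buckets in order
            for (int v : b) x[i++] = v;
    }
}
```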

2.12. Cocktail Sort

It is a stable sorting algorithm, more efficient than bubble sort, of which it is an extended version. Cocktail sort works from both ends of the list: during sorting, it moves the largest element to the tail side and the smallest element to the head side. The head side and tail side are shown in Figure 1.

Figure 1: Working of cocktail sort.

Bubble sort puts the biggest element at the tail side after every pass, while cocktail sort puts the smallest element at the head side and the biggest element at the tail side after every pass. The complexity of cocktail sort for the worst and average cases is O(n²), but for the best case, it is O(n).

2.13. Comb Sort

It is another improved version of bubble sort: instead of always comparing adjacent elements, it shrinks the gap between compared elements by a factor of about 1.3 on every iteration. The gap tells the algorithm which elements to compare and swap. With larger gaps, the number of swaps decreases; thus, in the average-case scenario, comb sort performs better. For the worst case, it remains O(n²); for the best case, it is O(n), as no swap is done.

2.14. Enhanced Selection Sort

It is an extended version of selection sort, made stable by progressively decreasing the size of the list. First the biggest element is found and swapped into place, the size of the list is decreased by 1, and the process is repeated in the same way until the list is sorted. Although the complexity is the same as that of selection sort, in the best-case scenario the number of swaps is zero for enhanced selection sort. The complexity for all three cases is O(n²).

2.15. Shell Sort

It is an unstable but efficient sorting algorithm which is an extended version of insertion sort. Shell sort works well if the given list is partially sorted, i.e., in the average-case scenario. Shell sort uses Knuth's formula to calculate the interval or spacing:

(6) x = 3x + 1,

where x starts at 1 and is called the interval/spacing. Shell sort divides the given list into sublists using an increment called the gap, compares the elements, and then interchanges them depending on the order, either increasing or decreasing. The algorithm for shell sort is given in Algorithm 13.

<bold>Algorithm 13: </bold>Shell sort.

x ← 1
while x < m do
    x ← 3x + 1
end while
while x > 0 do
    for i = 1 to x do
        insertion-sort the sublist X[i], X[i + x], X[i + 2x], … // invariant: each x-sublist is sorted
    end for
    x ← x/3
end while

For the best and worst cases, the complexity is O(n log n) and O(n²), respectively. For the average case, it depends upon the gap sequence.
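A C++ sketch of shell sort with Knuth's gap sequence (names are ours; each pass is a gapped insertion sort, ending with an ordinary insertion sort at gap 1):

```cpp
#include <cassert>
#include <vector>

// Shell sort using Knuth's sequence h = 3h + 1 (1, 4, 13, 40, ...).
void shellSort(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    int h = 1;
    while (h < n / 3) h = 3 * h + 1;   // largest Knuth gap below n/3
    while (h >= 1) {
        for (int i = h; i < n; ++i) {  // gapped insertion sort
            int k = a[i], j = i;
            while (j >= h && a[j - h] > k) { a[j] = a[j - h]; j -= h; }
            a[j] = k;
        }
        h /= 3;                        // next smaller gap
    }
}
```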

2.16. Bucket Sort

It is a stable sort that distributes the elements into buckets, after which sorting is applied within each bucket. Bucket sort does not compare elements directly; instead, the bucket index is computed from each value in such a way that the buckets themselves respect the order of the numbers inserted. The procedure is explained in Algorithm 14.

<bold>Algorithm 14: </bold>Bucket sort ().

a ← X.length
create buckets Y[0 .. a − 1]
for k = 0 to a − 1 do
    Y[k] ← empty
end for
for k = 0 to a − 1 do
    put X[k] into Y[⌊a · X[k]⌋]
end for
for k = 0 to a − 1 do
    sort Y[k] using the insertion sort technique
end for
join Y[0], Y[1], …, Y[a − 1]

The complexity of bucket sort is O(n) for all three cases.

2.17. Tim Sort

It is a combination of insertion and merge sort. It is a stable and efficient sorting technique in which the list or record is split into blocks called "runs." If the size of the list is less than the run size, it is sorted using insertion sort alone. The maximum run size is 64, depending upon the list. But if the unsorted list is very large, both insertion and merge sorting techniques are used. The complexity for the best, average, and worst cases is O(n log n).

2.18. Even-Odd/Brick Sort

It is another extension of bubble sort, in which each pass is partitioned into two phases: an even phase, comparing the element pairs at even-odd index positions, and an odd phase, comparing the pairs at odd-even index positions. The two phases are executed alternately, and at last the combined result is obtained and the records are sorted. It is a stable algorithm with the same complexity as bubble sort:

(7) best case O(n), worst case O(n²), average case O(n²).

2.19. Bitonic Sort

It is introduced through the concept of merge sort. In bitonic sort, we move the list to level L − 1 with two parts: the left part is arranged in increasing order, and the right part in decreasing order. These parts are merged, moved to level L, and then sorted to form the sorted sequence. The complexity of bitonic sort is O(n²) in the best, worst, and average cases.

3. Literature Review and Related Work

Ali discussed a number of sorting algorithms, evaluating their time complexity, stability, and in-place behaviour. He measured their running times in virtual and real environments and suggested where to use each algorithm for efficient results, concluding that quick sort is a better option in the average-case scenario, while counting, bucket, and radix sorts are efficient for smaller lists and integer-type data.

Hammad compared three sorting algorithms, namely, bubble, selection, and gnome sorts, based on their average running time. Hammad took a number of readings and concluded that whenever the list is sorted, gnome sort is the fastest of the three, but when the list is unsorted, gnome sort takes the same running time as bubble sort or selection sort in their worst or average case (i.e., O(n²)).

Elkahlout and Maghari discussed two advanced versions of bubble sort, namely, comb and cocktail sort, and one linear-time technique, namely, counting sort. Comparing these techniques, they concluded that cocktail sort performs better on the average evaluation of processing time. All these algorithms are graphically compared in their paper, with the main focus on time complexity.

Jehad and Rami made changes to bubble sort and selection sort so as to reduce the number of swaps during the sorting operation. They compared the enhanced versions of bubble sort and selection sort with the originals and reduced the execution time. The complexity of enhanced bubble sort is reduced from O(n²) to O(n log n), while that of enhanced selection sort remains the same.

Pankaj, using the C programming language, compared five sorting techniques, namely, bubble, selection, quick, merge, and insertion sort, on the basis of average running time. The execution time was measured in microseconds, and he concluded that quick sort is a better option for sorting between 10 and 10000 elements; the paper graphically represents the average running time of each algorithm.

Khalid et al. proposed a new algorithm, namely, grouping comparison sort, and compared it with traditional sorting techniques. The proposed algorithm has an input-size limitation of 25000 elements; as the input size grows beyond this, the results degrade badly. All the above-discussed papers use traditional sorting algorithms, compare their average running times, and conclude which of them is better to use.

4. SA Sorting

Starting from the left end of the list, the first element is taken as the target. The target is compared with the following elements; whenever a smaller element is found, the target is swapped with it, and the comparison continues to the right end of the list. Then, going back to the target position, a new target element is taken there and processed in the same way. The starting position is not advanced until the element found there has already been operated on; when it has, the position advances by 1. This is how SA sorting works. The step-by-step process of SA sorting is given in Algorithm 15.

<bold>Algorithm 15: </bold>SA sort ().

int d[n], f[n] = {0}, p, j, t
for i = 0 to n − 1 do
    j ← 0
    while j < n and f[j] ≠ 0 do
        j ← j + 1
    end while
    if j < n then
        p ← j
        f[p] ← 1
    else
        exit(0)
    end if
    j ← p + 1
    while j < n do
        if d[p] > d[j] then
            t ← d[p]
            d[p] ← d[j]
            d[j] ← t
            t ← f[p]
            f[p] ← f[j]
            f[j] ← t
            p ← j
        end if
        j ← j + 1
    end while
end for
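Following Algorithm 15, a hedged C++ transcription might look as follows. The name `saSort` and the use of `return` (in place of the pseudocode's `exit(0)`) are our additions; `f` flags the elements that have already served as the target:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// SA sort per Algorithm 15: each outer pass picks the first unflagged
// position as the target, then scans rightward, swapping whenever a
// smaller element is found and following the displaced target.
void saSort(std::vector<int>& d) {
    const int n = static_cast<int>(d.size());
    std::vector<int> f(n, 0);                 // 1 = already operated on
    for (int i = 0; i < n; ++i) {
        int j = 0;
        while (j < n && f[j] != 0) ++j;       // first unflagged position
        if (j >= n) return;                   // every element processed
        int p = j;
        f[p] = 1;
        for (j = p + 1; j < n; ++j) {
            if (d[p] > d[j]) {                // smaller element found
                std::swap(d[p], d[j]);
                std::swap(f[p], f[j]);        // the flag travels with it
                p = j;
            }
        }
    }
}
```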

The number of comparisons, C, for SA sorting is given as follows:

(8) C = n(n − 1)/2 + S,

where S is the number of swaps. In the best case, S = 0, so

(9) C = n(n − 1)/2 = ∑_{i=1}^{n} (i − 1).

Let T(n) = n(n − 1)/2, so T(1) = 0, T(2) = 1, and T(3) = 3.

Now creating the recurrence relation,

(10) T(n) = 9T(n/3) + n.

Using induction to verify:

At n = 3, T(3) = 9T(3/3) + 3 = 9T(1) + 3 = 3.

At n = 6, T(6) = 9T(6/3) + 6 = 9T(2) + 6 = 15.

At n = 9, T(9) = 9T(9/3) + 9 = 9T(3) + 9 = 36, and so on.

Solving using the master method, a = 9, b = 3, f(n) = n, and n^(log_b a) = n^(log₃ 9) = n² > n; thus, T(n) = O(n²).

Here f(n) is polynomially smaller than n^(log_b a). SA sorting can be optimized in the future, but in comparison with the optimized quick sort and merge sort, it performs better on an already sorted list. For the worst and average cases, the complexity is O(n² + S); since S is itself at most O(n²), it can be absorbed, and thus for all three cases the complexity is O(n²).

5. Results

The proposed sorting technique is implemented in C++ and tested with different numbers of elements. The performance of SA sorting is measured in terms of execution time and memory required. The comparison of execution time and memory used by existing sorting techniques with SA sorting is shown in Tables 1 and 2. Moving from the smaller to the larger sorted datasets, we found that SA sorting improves and performs better. Regarding memory requirements from the smaller to the larger datasets, only a slight change can be seen.

Table 1: Comparison of execution time in microseconds.

S. no. No. of entries Nature of entry Quick Merge Heap Insertion Selection Shell SA
1 100 Unsorted 67 75 44 65 76 33 109
2 500 Unsorted 150 165 185 207 495 114 1078
3 1k Unsorted 232 271 231 553 1722 373 4223
4 5k Unsorted 1207 1389 2141 19335 39865 1452 124973
5 10k Unsorted 2359 2826 4588 67283 158382 3100 506885
6 20k Unsorted 4932 5934 9747 268644 627031 6710 2054339
7 1k Sorted 3492 265 377 144 1736 156 3146

Table 2: Comparison of memory used in MB.

S. no. No. of entries Nature of entry Quick Merge Heap Insertion Selection Shell SA
1 1k Unsorted 1.503 1.503 3.110 1.502 1.514 3.109 1.350
2 5k Unsorted 1.561 1.570 3.242 1.620 1.520 3.242 1.611
3 10k Unsorted 1.630 1.640 3.300 1.600 1.678 3.343 1.615
4 20k Unsorted 1.800 1.795 3.385 1.622 1.771 3.424 1.786
5 1k Sorted 1.470 1.491 3.230 1.420 1.510 3.105 1.413
6. Conclusion

While implementing all these sorting techniques and comparing them with SA sorting, the following points are concluded:

If we increase the space, the time reduces, as shell sort and heap sort show

The sorting techniques which work well on unsorted records are not very good on sorted records, as quick sort and merge sort show

In the worst-case scenario, most of the sorting techniques, like SA sorting, run in O(n²)

No sorting technique is universally best; the choice depends upon the nature of the data and the user's requirements

SA sorting needs to be improved and optimized in the future

Data Availability

This article is purely based on algorithm design. Our results are derived from unsorted and sorted data files containing different records, in order to compare the algorithm with already established algorithms. Thus, no external dataset has been used.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

Cormen, T. H., Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2009.

Horowitz, E. and Sahni, S., Fundamentals of Computer Algorithms, Computer Science Press, 1978.

Ali, K., "A comparative study of well known sorting algorithms," International Journal, vol. 8, no. 1, 2017.

Hammad, J., "A comparative study between various sorting algorithms," International Journal of Computer Science and Network Security (IJCSNS), vol. 15, no. 3, p. 11, 2015.

Elkahlout, A. H. and Maghari, A. Y. A., "A comparative study of sorting algorithms comb, cocktail and counting sorting," 2017.

Jehad, A. and Rami, M., "An enhancement of major sorting algorithms," International Arab Journal of Information Technology, vol. 7, no. 1, 2010.

Sareen, P., "Comparison of sorting algorithms (on the basis of average case)," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 3, pp. 522–532, 2013.

Al-Kharabsheh, K. S., AlTurani, I. M., AlTurani, A. M. I., and Zanoon, N. I., "Review on sorting algorithms: a comparative study," International Journal of Computer Science and Security (IJCSS), vol. 7, no. 3, pp. 120–126, 2013.