Update: Working on a void method now to preserve the in-place operation.
Update 2: Working alternative, doesn't seem to help enough, though. Previous implementation sorted a 6-million-index array in 45+ seconds, this one 10~15 seconds. Merge Sort takes 1.3~1.6 seconds.
proper in-place implementation
PHP Code:
public static Random RNG = new Random();
// sort input array from/to specified indices
// use to/from to indicate "new" arrays
// to - from = array's supposed length
public static void quick_sort(int[] A, int from, int to) {
if(to-from >= 1) {
int p = RNG.nextInt(to - from) + from; // random index from 'from' to 'to'
// in-place partitioning subroutine
// swap index p and first index
int temp1 = A[from]; A[from] = A[p]; A[p] = temp1;
// process A[from, to] and swap A[j] and A[i] if A[j] < A[0]
int i = from+1;
for(int j=from+1; j<to; j++) {
if(A[j] < A[from]) {
int temp2 = A[i]; A[i] = A[j]; A[j] = temp2; i++;
}
}
// swap pivot to appropriate position, i.e. i-1
int temp3 = A[from]; A[from] = A[i-1]; A[i-1] = temp3;
// recursively sort smaller portions of A
quick_sort(A, from, i-1);
quick_sort(A, i, to);
}
}
public static int[] quick_sort(int[] A) {
quick_sort(A, 0, A.length);
return A;
}
This is my attempt at an in-place randomized Quick Sort implementation. As it turns out, the code works fine, just too very slowly compared with Merge Sort.
From what I do know, the run time is expected to be O(nlogn), practically better than Merge Sort due to smaller coefficients and in-place swapping.
So, what am I doing so too horribly wrong to have jeopardized the algorithm so badly? Aside from maybe building two copies of the array itself and rebuilding into a brand new array, which kinda destroys the whole in-place thing.
Code:
outdated sauce
PHP Code:
public static int[] quick_sort(int[] A) {
if(A.length < 2) {
return A;
}
else {
int p = choose_pivot(A); // got pivot index
// in-place partitioning subroutine
// swap index p and first index
int temp1 = A[0]; A[0] = A[p]; A[p] = temp1;
// process A and swap A[j] and A[i] if A[j] < A[0]
int i = 1;
for(int j=1; j<A.length; j++) {
if(A[j] < A[0]) {
int temp2 = A[i]; A[i] = A[j]; A[j] = temp2; i++;
}
}
// swap pivot to appropriate position, i.e. i-1
int temp3 = A[0]; A[0] = A[i-1]; A[i-1] = temp3;
int[] less = Arrays.copyOfRange(A, 0, i-1);
int[] more = Arrays.copyOfRange(A, i, A.length);
less = quick_sort(less); more = quick_sort(more); // recursion
A = rebuild(less, A[i-1], more);
return A;
}
}
public static int choose_pivot(int[] A) {
Random rng = new Random();
return rng.nextInt(A.length);
}
public static int[] rebuild(int[] A, int pval, int[] B) {
int[] ret = new int[A.length + B.length + 1];
/*for(int x=0; x<A.length; x++) {
ret[x] = A[x];
}
ret[A.length] = pval;
for(int y=0; y<B.length; y++) {
ret[A.length+1+y] = B[y];
}*/
System.arraycopy(A, 0, ret, 0, A.length);
ret[A.length] = pval;
System.arraycopy(B, 0, ret, A.length+1, B.length);
return ret;
}
public static void log(Object obj) {
System.out.println(String.valueOf(obj));
}
public static String str(int[] A) {
return Arrays.toString(A);
}
public static void main(String[] args) {
// TODO Auto-generated method stub
Random rng = new Random();
int size = 4000000; // problematic starting at around 7,000,000 indices
int[] a = new int[size];
for(int i=0; i<a.length; i++) {
a[i] = rng.nextInt(1000);
}
long start = System.currentTimeMillis();
a = quick_sort(a);
long end = System.currentTimeMillis();
log("Time taken to sort array of size " + size + ": " + Long.toString(end - start) + "ms.");
}
The recursion doesn't seem to be faulty, the work outside of each recursion step is as expected: splitting the array into two subproblems, then re-combine their sorted versions. I just can't figure out where the issue might be.
EDIT: There's also a minor Stack Overflow issue with inputs of size larger than 6,000,000. Seems to have something to do with the process of generating random integers.
console
PHP Code:
Exception in thread "main" java.lang.StackOverflowError
at java.lang.Number.<init>(Number.java:49)
at java.util.concurrent.atomic.AtomicLong.<init>(AtomicLong.java:87)
at java.util.Random.<init>(Random.java:94)
at java.util.Random.<init>(Random.java:77)
at fun_programming.Sorts.choose_pivot(Sorts.java:94)
at fun_programming.Sorts.quick_sort(Sorts.java:68)
2012-04-29, 05:21 PM
Russt
Re: Quicksort implementation issue - Java
Hmm. So, is it just slower by a constant factor? It looks to me like you're doing maybe ~4 times as many array accesses as you need to, so you could speed it up by a small constant factor by being a bit more clever (e.g. by storing A[from] in a temporary variable and eliminating some unnecessary swaps).
But I don't see anything fundamentally wrong here that would make it more than about 10x slower than it should.
Also is the merge sort you're comparing to one that you wrote in Java too?
2012-05-01, 12:04 PM
Kalovale
Re: Quicksort implementation issue - Java
Yes, here's the merge sort, there should be plenty of things to improve on, but I'll save that for after I get quick sort to work:
sauce
PHP Code:
public static int[] merge_sort(int[] A) {
if(A.length >= 2) {
int[] left = Arrays.copyOfRange(A, 0, Math.round(A.length/2));
int[] right = Arrays.copyOfRange(A, Math.round(A.length/2), A.length);
left = merge_sort(left);
right = merge_sort(right);
int[] sorted_A = merge(left, right);
A = sorted_A;
}
return A;
}
public static int[] merge(int[] left, int[] right) {
int i = 0; int j = 0; int k = 0;
int[] sorted_A = new int[left.length + right.length];
// inversion counter for different problem - not used here
// piggyback riding on merge_sort's nlogn run time
int inversion_count = 0;
while(i+j < left.length + right.length) {
// Not exhausted, can continue adding
if(i == left.length) {
// exhausted left, just add right
while(j < right.length) {
sorted_A[k] = right[j]; j++; k++;
}
}
else if(j == right.length) {
// exhausted right, just add left
while(i < left.length) {
sorted_A[k] = left[i]; i++; k++;
}
}
else {
// not exhausted, compare and add back
if(left[i] <= right[j]) {
sorted_A[k] = left[i]; i++; k++;
}
else {
sorted_A[k] = right[j]; j++; k++; inversion_count++;
}
}
}
return sorted_A;
}
I'll run a few tests with different input sizes and see how the run time grows.
EDIT:
results
merge_sort|
input size
milliseconds
1000000
331
2000000
606
3000000
914
4000000
1148
5000000
1413
6000000
1726
quick_sort
input size
milliseconds
1000000
372
2000000
1288
3000000
2715
4000000
4652
5000000
7139
6000000
10148
Obviously it's not just off by a constant.
The boolean check is indicative of a successful sort, it's done outside of the timing period.
main()
PHP Code:
public static void main(String[] args) {
// TODO Auto-generated method stub
for(int iter=1; iter<=6; iter++) {
int size = 1000000*iter; // problematic starting at above 6,000,000 indices
int[] a = new int[size];
for(int i=0; i<a.length; i++) {
a[i] = RNG.nextInt(1000);
}
long start = System.currentTimeMillis();
int[] b = quick_sort(a);
long end = System.currentTimeMillis();
log("Time taken to quick_sort array of size " + size + ": " + Long.toString(end - start) + "ms.");
log(check_sort(b));
}
2012-05-02, 02:39 AM
happylight
Re: Quicksort implementation issue - Java
This is because of your choice of array to be sorted. You're using random numbers less than 1000 to fill your 6 million size array. This is going result in a lot of ties, which is quick sort's worst case situation.
If you change the RNG.nextInt(1000) to RNG.nextInt(size), you will see quick sort being faster than merge sort.
2012-05-02, 02:36 PM
Corn
Re: Quicksort implementation issue - Java
Quote:
Originally Posted by happylight
This is because of your choice of array to be sorted. You're using random numbers less than 1000 to fill your 6 million size array. This is going result in a lot of ties, which is quick sort's worst case situation.
If you change the RNG.nextInt(1000) to RNG.nextInt(size), you will see quick sort being faster than merge sort.
Was going to mention this. Quicksort is only faster than mergesort in certain scenarios.
Also an FYI quicksort has to sort completely random situations. If there's any semblance of order mergesort can be faster.
2012-05-03, 05:58 PM
Kalovale
Re: Quicksort implementation issue - Java
I'm not convinced that is the case, for three possibly related reasons:
1. I neglected to mention that I copied someone's implementation and it ran quicker than my merge_sort on this same input.
2. As well as I can remember my own code, I do nothing in the case of ties; It's just one check, move on. Swaps only happen when the accessed index has smaller value than the pivot, which is why I also believe that my code is doing the (almost) minimum number of swaps necessary (one swap for each index less than pivot lying on the right side of pivot).
3. quickSort's worst case involves choosing pivot values on either extreme ends, thus ending up doing the maximum number of swaps, i.e. n-1. E.g. Choosing 9 or 1 in the [9;8;7;6;5;4;3;2;1] array will both result in 8 swaps for the partitioning step; whereas choosing 4 in [4;4;4;4;4;4;4;4;4] will result in no swaps at all (just linear scan and comparison).
@ Corn: Randomized quickSort makes no assumptions about the input's order, due to the internal randomization step. (Random + more random = random // deterministic + random = random). Its runtime is "on average" O(nlogn) regardless what input is given, with a smaller constant coefficient than mergeSort's. Theoretically, quickSort's worst runtime is O(n^2), but this almost never happens (regardless of input). I still haven't quite mastered the proof, though.
2012-05-03, 09:28 PM
happylight
Re: Quicksort implementation issue - Java
1. Maybe you should paste their code here then.
2.3. The whole point is that quicksort is only fast when two partitions are roughly the same size. When you have a large number of duplicate values one partition will be much larger than the other, causing more recursive calls. In the case of the entire array being duplicates, it becomes the same as selection sort, i.e. O(n^2). You should be counting the number of comparisons there, not number of swaps.
When people discuss sorting algorithms, they generally assume there are no duplicate values.
2012-05-03, 10:34 PM
thinbear
Re: Quicksort implementation issue - Java
When the array is passed into the function, does it passed as reference or by value?
If the array is cloned many times, it would slow down a lot; delete the intermediate arrays immediately when it become useless (vs. waiting for garbage collection) will help speeding up the recursive process;
I haven't deal with java for a long time, so I could be wrong.
2012-05-03, 11:07 PM
happylight
Re: Quicksort implementation issue - Java
Quote:
Originally Posted by thinbear
When the array is passed into the function, does it passed as reference or by value?
If the array is cloned many times, it would slow down a lot; delete the intermediate arrays immediately when it become useless (vs. waiting for garbage collection) will help speeding up the recursive process;
I haven't deal with java for a long time, so I could be wrong.
It's passed by reference in Java so there's no problem there.
2012-05-04, 12:54 AM
nRxUs
Re: Quicksort implementation issue - Java
Merge sort is supposed to be faster than quick sort.
On worst case scenarios quick sort is O(n^2) while Merge sort is O(nlogn).
2012-05-04, 06:27 AM
Spaz
Re: Quicksort implementation issue - Java
Profile your code. Find a Java profiler or use the poor man's profiler: run your code in the debugger, hit break a few times, see where it usually is when you break. Performance problems are often not where you think.
2012-05-04, 06:38 AM
Corn
Re: Quicksort implementation issue - Java
Quote:
Originally Posted by nRxUs
Merge sort is supposed to be faster than quick sort.
On worst case scenarios quick sort is O(n^2) while Merge sort is O(nlogn).
int partition(int arr[], int left, int right) { int i = left, j = right; int tmp; int pivot = arr[(left + right) / 2];
while (i <= j) { while (arr[i] < pivot) i++; while (arr[j] > pivot) j--; if (i <= j) { tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp; i++; j--; } };
return i; }
void quickSort(int arr[], int left, int right) { int index = partition(arr, left, right); if (left < index - 1) quickSort(arr, left, index - 1); if (index < right) quickSort(arr, index, right); }
Quote:
Originally Posted by happylight
2.3. The whole point is that quicksort is only fast when two partitions are roughly the same size. When you have a large number of duplicate values one partition will be much larger than the other, causing more recursive calls. In the case of the entire array being duplicates, it becomes the same as selection sort, i.e. O(n^2). You should be counting the number of comparisons there, not number of swaps.
When people discuss sorting algorithms, they generally assume there are no duplicate values.
I think I get what you're trying to say, which is essentially the same thing I said, just from a different perspective. To put it into my language:
"When dealing with complete duplicates, any values chosen to be pivot will end up being the extreme value that quickSort sucks at dealing with. In the same manner, with a large number of duplicates, the probability of picking a good pivot (one that gives a 25-75 split or better) becomes significantly smaller when the duplicates happen to be a bad pivot choice. With no duplicates, this probability is 50%."
This is the new execution time for quick_sort where possible values are of the range [0, 100000+):
results
input size
milliseconds
1000000
176
2000000
265
3000000
410
4000000
549
5000000
682
6000000
828
7000000
988
8000000
1138
9000000
1276
10000000
1412
11000000
1569
12000000
1704
13000000
1857
14000000
2005
15000000
2143
16000000
2288
17000000
2433
18000000
2607
19000000
2782
20000000
2910
merge_sort did not benefit from this change.
merge_sort run time
input size
milliseconds
1000000
370
2000000
685
3000000
1053
4000000
1372
5000000
1688
6000000
1996
I would call this a satisfactory result, thanks happylight, though two issue remain:
1. Why is their implementation efficient even in the case of duplicates?
2. Why did I get StackOverFlow errors?
2012-05-04, 04:13 PM
happylight
Re: Quicksort implementation issue - Java
Quote:
Originally Posted by Kalovale
1. Why is their implementation efficient even in the case of duplicates?
2. Why did I get StackOverFlow errors?
1. The magic is in this segment:
Code:
while (i <= j) {
while (arr[i] < pivot)
i++;
while (arr[j] > pivot)
j--;
if (i <= j) {
tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
i++;
j--;
}
};
It moves any value less than the pivot to one partition and any value larger than the pivot to the other partition. So any value equal to the pivot stays where they were. Whereas your code would move all values equal to the pivot to a single partition. I've never actually seen quick sort implemented their way. It's quite cool.
2. Essentially each duplicate value causes an extra recursive call. You ran out of memory for function stacks.
2012-05-04, 05:15 PM
Kalovale
Re: Quicksort implementation issue - Java
I can see how that makes sense. Leaving the equal values in both arrays lowers their concentration, which in turn lowers their probability of being picked, and even if they are picked, their effects are lowered due to low concentration. It's quite cool.
This would mean that for arrays of exact duplicates, this implementation would take just as long to run as mine does.
2012-05-04, 05:31 PM
happylight
Re: Quicksort implementation issue - Java
Quote:
Originally Posted by Kalovale
I can see how that makes sense. Leaving the equal values in both arrays lowers their concentration, which in turn lowers their probability of being picked, and even if they are picked, their effects are lowered due to low concentration. It's quite cool.
This would mean that for arrays of exact duplicates, this implementation would take just as long to run as mine does.
Er no? With an array of all duplicates, their implementation would still run as fast as merge sort whereas yours would be as slow as selection sort.
2012-05-04, 05:43 PM
Kalovale
Re: Quicksort implementation issue - Java
Quote:
Originally Posted by happylight
Er no? With an array of all duplicates, their implementation would still run as fast as merge sort whereas yours would be as slow as selection sort.
Let's see.
Theirs does:
- n comparisons
- n/2 swaps
- (50<>50) split
- The total work done outside of the recursion is O(n), and they have logn number of recursion layers.
Mine does:
- n comparisons
- no swaps
- (1<>n-1) split
- The total work done outside of the recursion is O(n) -albeit with a lower coefficient-, and I have n-1 number of recursion layers.
Seems to be the case. I should somehow circumvent this.
2012-05-05, 02:55 AM
Russt
Re: Quicksort implementation issue - Java
Ah, yes. Duplicate values would do it.
The other thing is, if you anticipate having to sort arrays with many duplicate values, you can also treat values equal to the pivot as a third "partition" and not recurse on it at all. This requires a bit more overhead but would yield O(n) time on an array of all duplicates, which is the best possible result.