Talk:Quicksort/Archive 1
This is an archive of past discussions about Quicksort. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Archive 1 | Archive 2 | Archive 3 |
(Older discussions)
Quicksort can actually be done in O(n log n) time worst case, by carefully choosing the pivot - the algorithm to do so is a bit complex though. See http://www.comp.mq.edu.au/courses/comp333/Lecture/divide-and-conquer_4.pdf for details. -- SJK
- Just choose the true median as the pivot. You can find the true median in Θ(n) time, and since partitioning takes Θ(n) time too, you are not increasing the complexity. Since the median guarantees perfect halving, you have O(n log n). You definitely increase the constant factor, though. Also, finding the median in Θ(n) time is quite complicated. -- Hari
- For the record, this is mentioned in the Relationship to selection section. Derrick Coetzee 17:11, 17 Sep 2004 (UTC)
One simple way to avoid worst-case of sorting sorted input is to add a loop before Quicksort to check if the elements are already sorted. This addition takes only O(n) time and does not lead to change in the overall complexity of O(nlogn). Ashvin
The article uses list to mean sequence of elements. But the terminology is confusing because quicksort does not work on linked lists. I think list should be changed to array everywhere in the article. -- Arvindn
Er, QuickSort works just fine on linked lists, as long as you write your implementation using linked lists and not arrays... Or so I'd think Chinju
- Yep. The key operation is "partition", which is essentially an operation to split a queue/stack/bag into two queues/stacks/bags. You can define a nice partition function for linked lists. -- Hari
- Quicksort actually is particularly suitable for doubly linked lists, because it only accesses elements sequentially (although in both orders). It doesn't even require any extra storage. For singly linked lists though mergesort is probably the way to go. Heapsort doesn't work on linked lists at all. Perhaps this is worth adding to some comparison sections. Derrick Coetzee 17:11, 17 Sep 2004 (UTC)
This so-called "realistic data" in which one needs to use a random number generator to ensure proper performance... What kind of data would that be? Data that's already been pre-sorted to make things hard for you? It seems to me this is business about using a random number generator is the result of inaccurate thinking about probability rather than a realistic concern (except, of course, in the case mentioned in the article where the data is fed to the algorithm by an adversary). Does anyone mind if I remove most of the discussion about random number generator usage?
Also, the article mentions that Quicksort can be optimized since every element needs only to be compared against the (mostly unchanging) pivot element ("The inner loop which performs the partition is often very amenable to optimization as all comparisons are being done with a single pivot element"). What exactly is meant by this? How exactly can one take advantage of that situation for optimization? Just wondering...Chinju 21:53 Mar 6, 2003 (UTC)
- I adjusted the random pivot wording based on real-world knowledge that lists are often partially sorted and random isn't the best for that situation. Random is fine if the data isn't partially sorted but humans don't have to be purists when optimizing algorithms.:) The seldom changing pivot element may let it be held in a register or improve the locality of memory references. JamesDay 15:46, 22 Oct 2003 (UTC)~
"This works well in practice but chance can still place the lowest element in the middle and cause worst case performance."
Unclear. Putting the list's lowest element in the middle would not cause worst case performance, or even make it worse than O(n log (n)). If it kept happening at every level of the recursion, that would cause worst case performance all right, but that's very unlikely, and is a possibility for any quick and simple way of choosing a pivot.
- I agree. I've reworded that and added a bit to the one I described as "best", since that does have higher overhead and can't always be best. In the rewriting I did I was trying to get away from the over-emphasis of the performance with attack data and left in a criticism which middle element choice didn't deserve so much in the general case. JamesDay 12:47, 28 Oct 2003 (UTC)
Sorry about the unnecessary alphabetizing. I must be going blind since I completely overlooked the intro to the section that the explained ordering of algorithms. Richss 03:17, Sep 8, 2004 (UTC)
- Frankly, I found the ordering by length of code peculiar, considering most of the shorter examples are in non-mainstream languages and aren't efficient in practice. In particular, 5 different functional language examples may be overkill. Perhaps alphabetical would be less biased. Derrick Coetzee 03:49, 8 Sep 2004 (UTC)
- 5 different functional language examples may be overkill —actually, the first eight samples are written in functional languages... nevertheless, the first four are all fundamentally different from each other (even the Miranda and NGL entries, which look somewhat alike, are really different: the former is value-level, the latter function-level). The Erlang, Haskell, and Miranda entries are indeed completely analogous, though (however, which to leave out and how to justify the choice? for then we should also choose between the Java and C# implementations) The OCaml and CL entries are closely analogous in spirit to those three, but expressed very differently. Regarding efficiency, some of those first eight implementations are not only clear, but very fast. If speed were the only concern, then the right example would be in assembler.
- Anyway, the article is about the algorithm not about popular or mainstream prog langs, and this provides some support to having the shortest implementations come first. Alphabetical order is best when political correctness or the like are in order. IMO, this is not the case: it is not a matter of giving preference to one or another prog lang, it is a matter of keeping the interests of the reader of the article above everything else. This is, in fact, one reason for moving the samples section to the end of the article, right past the External Links, See Also, and References section. With some of the longest samples among the first, an alphabetical ordering makes it very hard for the article reader to realize that there are vastly different ways of expressing the quicksort algorithm in different prog langs. It becomes Just imagine if someone comes with an assembler implementation that takes 200 lines... Furthermore, the shortest-first arrangement is more informative than the alphabetical one in the sense that with the latter it is very hard for the user to find out several things that can be readily glanced with the former: which implementations are alike, wich is the shortest one, etc. If there were dozens of implementations, then the alphabetical listing would indeed come in handy in order to locate a particular one, but this doesn't apply for small numbers. — danakil 05:06, Sep 8, 2004 (UTC)
- My thought had not been about somebody reading the article for pleasure (whom you wish to keep interested). Rather, I was thinking about what someone like myself would be doing, which is using it as a reference (encyclopedia) to see an example in whatever language I am using.Richss 12:04, Sep 8, 2004 (UTC)
- I'm sorry, Richss. I hadn't intended to convey the idea of reading for pleasure (although now that you mention it, it doesn't seem to be a goal to strive to keep the reader entertained). By keeping the interests of the reader [above all] I certainly meant to tender to the interests of those who use Wikipedia as a reference tool.—danakil 16:24, Sep 8, 2004 (UTC)
- My thought had not been about somebody reading the article for pleasure (whom you wish to keep interested). Rather, I was thinking about what someone like myself would be doing, which is using it as a reference (encyclopedia) to see an example in whatever language I am using.Richss 12:04, Sep 8, 2004 (UTC)
- It's true that efficiency is not the main concern, and I concede that the functional examples are sufficiently different, but can you really call a sort which isn't O(n log n) at all quicksort, even if it makes the same exchanges in the same order? (referring here to the example using list comprehensions - I'm not sure how the others work out) I agree that the number of examples is so far small enough that it's easy enough to spot your favourite language in the listing. Derrick Coetzee 14:25, 8 Sep 2004 (UTC)
- I agree with you: it seems reasonable to require that a sort be actually O(n log n) in order for it to be called quicksort. Nevertheless, I'm sure there would be quite a few moors entrenched in that coast. Furthermore, there are quite a few different Haskell implementations, I wonder if all of them suffer from the same problem (I've heard several people claim that Miranda is much faster than the common Haskell implementations—Miranda is a commercial product while Haskell's use is mainly academic). On the other hand, the J,NGL,OCaml entries are, IME, certainly O(n log n), though. And I would be surprised if the Erlang and CL ones weren't. One final remark about performance: if someone was implementing production quicksort using Java, C, C++ or C#, he/she would likely go the non-recursive route. It would be nice to have an example of that. — danakil 16:15, Sep 8, 2004 (UTC)
Just out of curiosity, can somebody provide an explanation (possibly just a link to an explanation) of why the list comprehensions make the Haskell implementation of quicksort have an average case complexity of Theta(n^2) instead of Theta(n log n)? Thanks. Chinju 05:28, 26 Sep 2004 (UTC)
Doesn't seem Theta(n^2) to me. It is inefficient since partitioning that way means traversing the list twice, but it's still a Theta(n) operation, as well as the performed list concatenation (which could also be avoided). So the final complexity just depends on the number of times you'll perform those operations (which depends on how the partitions turn out). --jadrian 00:35, 29 Sep 2004 (UTC)
Yeah, it seems Theta(n log n) in the average case to me, despite the inefficiencies you pointed out, which is why I was curious as to how the quadratic complexity was derived. If I can't find any explanations soon, I think I'll remove the notice about being "O(n^2) on average". --Chinju 19:38, 30 Sep 2004 (UTC)
- I don't oppose this change. The original statement probably had to do with excessive delay of computation in Haskell implementations without memoization, in that if you fully expand out the quicksort expression for a given Haskell list before evaluating it, you end up with a very large expression. Just a guess though. Derrick Coetzee 20:32, 6 Oct 2004 (UTC)
"Note that the partition procedure only requires the ability to traverse the list sequentially in both directions; therefore, quicksort is not confined to operating on arrays (it can be used, for example, on doubly-linked lists)."
No, it does not require that. Doing it with singly linked lists is trivial. --jadrian 23:34, 16 Oct 2004 (UTC)
- You're right, this isn't hard, since the order of elements in the resulting sublists is unimportant, allowing you to build them both backwards. I'll update the comment. Derrick Coetzee 04:40, 17 Oct 2004 (UTC)
Capitalization
Perhaps this is a stupid question, but is quicksort capitalized (Quicksort) or not (quicksort)? What does the original paper do? What's common practice? Derrick Coetzee 04:48, 17 Oct 2004 (UTC)
Reduce recursion
A well-implemented quicksort requires at most lg(N) partition boundaries on the stack. Instead of recursing for both left and right partitions, recurse for the *smaller* partition, then "adjust" the partition boundaries at the current call level and loop. For N, say, one billion, the depth of recursion will not exceed 30 (as at each level of recursion the number of elements is at most half what it was at the preceding level). One-sided recursion is an *old* idea.
Pseudo-C Pseudo-code to illustrate the point:
quicksort(array, left, right) //say right-left+1 = Z, then...
{
while (left-right>threshold)
{
pivotPosition = partition ( array, left, right);
if (pivotPosition-left < right-pivotPosition)
{
quicksort(array, left, pivotPosition-1); //at most Z/2;
left = pivotPosition+1;
}
else
{
quicksort(array, pivotPosition+1, right); //at most Z/2
right = pivotPosition-1;
}
}
insertionSort(array, left, right);
}
-- james_barbetti@yahoo.com
- If you read a little further in the article you'll find a description of partially tail-recursive quicksort (which is what you describe) including pseudocode. Deco 23:59, 20 Nov 2004 (UTC)
My recent drastic changes
I moved almost all of the examples from this article to quicksort implementations; I think they were getting far too numerous, exceeding the rest of the article in size by multiple times. I attempted to leave behind a small set such that no two were too similar, the popular languages were mostly covered, and most of the basic implementation techniques were covered. I also extracted some of the introsort discussion to an article on introsort, not because it was too long but because I'd like to see introsort expanded in the future and it wasn't poised to do that as a section in this article. I've probably upset some people whose favourite language was moved out, and I removed some I rather like myself, but I think it benefits our readers to keep the focus on the quicksort algorithm itself. Deco 00:54, 21 Nov 2004 (UTC)
changing Θ() to O() ?
I see someone changed a bunch of Θ() to O(). Is this correct, or a misunderstanding of the debate at Talk:Big_O_notation ? --DavidCary 07:15, 15 Dec 2004 (UTC)
- It's called Big "O" Notation, but I thought it was technically written as Θ().--НиколайTalk 00:55, 27 September 2006 (UTC)
- There's a difference between Big O, which is denoted O(...) and big theta, which is denoted Θ(...). Θ(nlog(n)) means the algorithm's complexity is bounded tightly by nlog(n), while O(nlog(n)) just means it's bounded above. If I recall correctly, quicksort is in fact Θ(nlog(n)). Adamwg 07:30, 9 April 2007 (UTC)
- Correct. If you wanted to get technical, since Quicksort's worst case running time is n^2, it's complexity is O(n^2). Usually people use Big-O to mean Θ though, even though it's not technically correct.
- 76.104.149.106 05:13, 3 May 2007 (UTC)
C implementation
Has anyone tested the C implementation? There seems to be a bug in the algorithm, and the swap function won't work, as C uses call by value.
- Swap could be implemented as a macro — it's really acting as pseudocode there. I'll have a look at the rest of it later. Deco 17:52, 28 Dec 2004 (UTC)
The C implementation does not distinguish between pivot keys and pivot indices. The implementation shown works, but only because both keys and indices are integers.
numbers[left] = pivot // pivot key pivot = left // pivot index
To clarify and ease adaptation to other key types, I think the pivot index should be named pivotIndex, i.e. the algorithm should read
void q_sort(int numbers[], int left, int right) { int l_hold = left; int r_hold = right; int pivot = numbers[left]; int pivotIndex; int t; ... numbers[left] = pivot; pivotIndex = left; left = l_hold; right = r_hold; if (left < pivotIndex) q_sort(numbers, left, pivotIndex-1); if (right > pivotIndex) q_sort(numbers, pivotIndex+1, right); ...
C++ Code
The C++ code was taken from [link removed; blocked by spamfilter —Ruud 20:22, 23 April 2006 (UTC)] without notification of the original author. That's very bad style and shouldn't happen on wikipedia. I removed it for now, hoping someone comes up with another version.
That partition algorithm seems wrong
The partition algorithm given seems wrong to me. If the first item is larger than the pivot, for example, it swaps it with itself and then moves on. Am I missing something, or is this just a mistake? RSpeer 20:57, Feb 21, 2005 (UTC)
- I added some statements to try to help address your confusion. The swap is only done if the pivot is less than or equal to the pivot (not greater), but it might so happen that an element larger than the pivot is swapped into a position preceding the pivot's final position. The trick is that an element may be moved multiple times before reaching its final place, so this is okay. I hope that makes some sense. Deco 02:24, 24 Feb 2005 (UTC)
It seems that other people are confused by the partitioning algorithm as well. (I guess the frustrated anon defacing the page would be one example.) What is the advantage to this algorithm, and why don't we use the one from CLRS? RSpeer 03:07, Mar 21, 2005 (UTC)
- Currently article shows partitioning algorithm which has only minor differences compared to CLRS one (Cormen, Leiserson, Rivest, Stein, Introduction to algorithms, 2ed). So I think your complaint is obsolete.
- But the current partitioning algorithm in my opinion has one drawback which can be observed then we sort array which has identical values (v, v, ..., v). In this case we get worst case of quicksort. This easily follows from this statement which is in article: "It partitions the portion of the array between indexes left and right, inclusively, by moving all elements less than or equal to a[pivotIndex] to the beginning of the subarray, leaving all the greater elements following them." So we can very easily get worst case of quicksort by sorting array which contains equal values. Of course this worst case can also cause stack overflow.
- Some other partitioning algorithms can avoid this problem (eg partitioning code used Sedgewick book). Aleksas 15:03, 21 July 2007 (UTC)
Does "select" remove the pivot from the array? If not, and if one chooses the last element in the array as the pivot, and if the last element in the list is the largest, then the given algorithm does not terminate. Try this for example on [-2,5,3,10]: choosing 10 as the pivot, less=[-2,5,3,10] and greater=[] every time. This is of course a moron's mistake but hey, I'm a moron. The article should clarify that "selecting" the pivot also means "removing" the pivot, or else use three lists, less, equal, and greater, putting any element equal to the pivot in equal. Simplex 17:11, 13 November 2007 (UTC)
Partitioning statement seems wrong
This statement looks wrong to me:
- Because the above pseudocode procedure will sometimes swap elements which are equal, it is slightly unstable.
To my knowledge, there is no known in-place stable partitioning algorithm (and I've looked). The issue is not merely that it sometimes swaps equal elements (which is easy to fix), but that it can swap an element over an element which is equal to it. Consider this list:
- 7, 7, 3, 1, 4
The very first exchange would exchange the 4 with the first 7. Now, the first 7 is following the second 7, where before it was preceding it. This is an unstable exchange.
Of course, if you partition not-in-place, by building a list of those less-or-equal and those greater, each preserving the original list order, then the sort is stable.
In short, I'm deleting it, because it's wrong. :-) Deco 06:11, 4 Mar 2005 (UTC)
in-place sort
For the edit that was just made (deleting "it's an in-place sort"): the definition of in-place sort was wrong. Whoever wrote it said that it meant that elements already in their proper place would not be moved. That is completely false: in-place sort means that the algorithm has O(1) (constant) memory requirement. That's on the page now; everything is now as it should be regarding that. Decrypt3 18:56 EDT, 4 April 2004
- When Deco claims "there is no known in-place stable partitioning algorithm", I'm pretty sure that *includes* Quicksort. If Wikipedia is going to standardize on a definition of the term In-place algorithm that excludes Quicksort, we need some other term to describe algorithms like Quicksort and Tree traversal. What term would you suggest? What do you call algorithms that are not in-place" ? --DavidCary 05:03, 13 Mar 2005 (UTC)
- The partitioning step of quicksort is in-place. It's the recursive step that isn't. RSpeer 06:56, Mar 13, 2005 (UTC)
- Yes, my comment was meant to emphasize that most in-place partitioning algorithms are unstable, and most stable ones are not in place. Most variants of quicksort, however, are not in-place because they use a stack of depth at least log n (in fact, if the values being sorted may be arbitrarily large, it requires log2n space, where n is the total number of bits). The distinction between additional storage on the call stack and explicit additional storage is really a purely artificial one created by hardware and programming languges. You can easily write a version of quicksort that makes no recursive calls but manages an explicit stack of O(log n) size.
- The closest thing I can think of for describing the class of algorithms you refer to is DSPACE(log2 n), which is the class of problems that can be solved in space log2 n, where n is the total number of bits in the input. Any algorithm that has log n recursive depth and log n bits in each stack frame lies in it, including quicksort and recursive tree traversal. Tree traversal has a number of simple in-place versions, though. Deco 07:59, 13 Mar 2005 (UTC)
- To clarify, I refer here to the partially tail-recursive variant of quicksort, and to tree traversal of a complete binary tree. Other cases can require more space. Deco 08:10, 13 Mar 2005 (UTC)
- Some additional comments: quicksort is not a partitioning algorithm. It is a sorting algorithm. It uses partitioning as a subroutine. Algorithms that aren't in-place are sometimes called not-in-place, and I don't know of any briefer name. Deco 20:35, 13 Mar 2005 (UTC)
Capitalization
This article uses both "Quicksort" and "quicksort", in a hideously inconsistent way. Which way should it be? - Fredrik | talk 09:47, 21 Mar 2005 (UTC)
- I've asked before — no one can seem to decide. I would vote we do it however Hoare did it, but unfortunately I have no access to his paper. Mathworld seems to use lowercase quicksort however, and this seems nicer to me and more consistent with other sort names. I'm leaning towards quicksort if there is some consent. Deco 04:54, 29 Mar 2005 (UTC)
Average complexity
The average complexity given can't be right. Quicksort is O(n log n), which makes Θ(1.39 log n) impossible. --Doradus 15:27, May 12, 2005 (UTC)
Could anyone write this addition on the article?
[This information has been removed]
- I'm afraid this violates our Wikipedia: No original research policy. Deco 19:27, 20 May 2005 (UTC)
- You could have informed me before. It took me two hours writing that explanation in english. Oh, well. I thought this would be the perfect page to release the optimization for everyone. I'm going to remove the C source code, too. Thank you, anyways.
- Who could have informed you before? --Doradus 23:51, May 20, 2005 (UTC)
- Hey people, isn't a kind of harsh to delete the post and just tell a newbie that wikipedia is not a place for original research? While this cannot be put to the article, we can suggest that he (or possibly she) post it to Usenet or something else. Also, I don't think we have to remove the post. We tend to keep this kind of somehow irrelevant posts at the talkpage. See Japanese war crimes for example. Finally, I think P3do, your comment is a little too unhelpful or unfriendly; apparently, this guy is upset. I mean aren't we allowed to have little more fun and friendly yet irrelevant conversations in the talkpage?
- Since his method appears to be neat, I restored it to my user page. See User:TakuyaMurata/Barcelo's method. I am very curious if this really results in actual improvement. -- Taku 00:27, May 21, 2005 (UTC)
- He removed the information himself after being told that it violates the NOR policy. Fredrik | talk 07:13, 21 May 2005 (UTC)
- I don't get where he stores his 'blank' space. Furthermore, I don't see how it's any improvement over swapping. -- AllyUnion (talk) 07:09, 21 May 2005 (UTC)
- I'll try to explain it a bit better. As you have to take one element from the list - the pivot - to store it in a variable/register in order to make the comparisons, you can use the position it occupied as a 'blank space'. The pivot is already stored in a variable, so you only have to return the pivot to the 'blank' space at the end of the algorithm, and use it meanwhile.
- This is very helpful if the 'blank' space is at the start (or end) of the list, because then you can search the list from the other side, and when you find an element which is lower than the pivot, copy it to the 'blank space' directly, overwriting it. After you copy an element, it becomes the new 'blank space', so you can start searching from the other side.
- It's important to note that the 'blank space' stores old info until it's overwritten, so this might cause that an index moves too far, even outside the limits of the array. However, this problem can be fixed without losing efficiency.
- This requires much less reading/writing than the swapping function, which must preserve the values of both elements, and thus needs to store one of them in a 'temp' variable. This improvement should be more noticeable as the size of the elements increases.
- Hmm... my version of understanding of quicksort was that the pivot was stored in the list... and that the only temporary variables you had was the index positions which were passed recursively. -- AllyUnion (talk) 07:52, 25 May 2005 (UTC)
- Sorry if it looks confusing, but it's already confusing to explain it in my own language. -- DrJones 10:28, 21 May 2005 (UTC)
- Could you please change the name to Hoyos' method? Barcelo is my second surname, which works much like the middle initial in english. -- DrJones 10:28, 21 May 2005 (UTC)
- I already put an encouraging comment and suggested he post it to Usenet at Talk: Quicksort implementations. I have nothing against the guy, but of course we're not putting original research in the article. It's still cool that he wrote it up though and I'd encourage him to pursue these ideas and show them to people. If it does end up in a paper somewhere, we really could add it then. Deco 23:33, 21 May 2005 (UTC)
- My bad :) As Fredrik pointed out, that was all my misunderstanding. Deco, I should have known what kind of person you are by now. Also I missed Talk:Quicksort implementations. I was unaware of that article. -- Taku 23:49, May 21, 2005 (UTC)
Just a detail. For test purposes, sort routines are shown sorting an array of simple integers or the like. In usage, there is in general a data aggregrate (a "record", perhaps, in other terminology) being sorted on the basis of some aspect of the aggregrate, the key, which might be an integer (or a string of integers, or a text string, or whatever) which has its value stored in the pivot variable v (as distinct to using an index to the location in the array: the values in the array are going to be moved). Well, if there is a plan to take advantage of the "space" available by copying out the pivot's key value, this space is only fully available if the associated data is copied out as well. Having a pivot key whose value is not necessarily one of the values in the array being sorted but one chosen via computation (if the keys are amenable to computation) such as PivotKey:=(A[Left].Key + A[Right].Key)/2 or otherwise are further variants. NickyMcLean 21:38, 9 October 2006 (UTC)
Introductory pseudocode change
I tried to rewrite the first pseudocode to be easier to understand, per Fredrik's comments at Wikipedia:Featured article candidates/Quicksort. I based this version on the Haskell version, but I avoided using list comprehensions or funny things like that, expressing it using more familiar imperative constructs and some English. I moved the other version into a new section and added explanation for the in-place partition algorithm, which is really rather subtle. Deco 03:09, 25 May 2005 (UTC)
I have to say, i don't find the pseudocode any easier to read than the C version - in particular it seems extremely verbose for what is really a very simple algorithm. Further, i would imagine that most people have more experience with a C-like language. Is the pseudocode merely designed to mimic that found in Cormen + Rivest (which benefits from much nicer formatting than here) or is there some other reason for having it? --Shashmik11 12:52, 13 Jun 2005 (UTC)
The idea of pseudocode is that you can understand it without knowing what language-specific symbols like & mean. Sure, it's perfectly natural to you, because you know C. But actually, lots of people are now learning to program in languages other that C. RSpeer 13:28, Jun 13, 2005 (UTC)
I accept that some people might not know C, but the idea that the pseudo-code shown is free from language-specific details is wrong - it seems to be some odd variant of algol. I still contend that code written in this way isn't actually any more readable, especially things like select a pivot value which should really be typeset differently from the rest of the code. All in all i think the pseudocode would be much better if it wasn't typeset using a fixed width font, but maybe that's just me. Shashmik11 14:45, 13 Jun 2005 (UTC)
- I agree it should ideally not be typeset in this way. However, typesetting it properly makes it impossible to edit, which I think would be against the wiki spirit. The point of the change was not to make it easier to read but easier to understand - the typical C version uses an in-place partition algorithm that is, while simple, not as obvious as the out-of-place partition algorithm used in some functional implementaions and the first pseudocode example. The language/syntax is largely irrelevent, as long as it doesn't rely on symbols that are unfamiliar to a large portion of programmers (like the typical functional language example). Incidentally, I've heard people compare my pseudocode to Pascal, ML, C, Python, and now Algol — I really wish they'd make up their mind which language I ripped off. Deco 04:18, 14 Jun 2005 (UTC)
- Point taken. Maybe wikipedia should get some sort of pseudo-code renderer in a similar manner to the way equations are rendered. How would one campaign for something like this?--Shashmik11 11:41, 18 Jun 2005 (UTC)
- That'd be cool. I'd start with Wikipedia: Village pump (technical), since this would be a feature request. You might want to talk somewhere on meta about it, since it affects all Wikimedia projects. Somehow I imagine, though, that you might have trouble garnering support for something with an "easy workaround" (doing it how we do it now). Deco 00:33, 19 Jun 2005 (UTC)
- The pseudo code is not understandable. I'm a good C programmer, and I can't figure out what the hell the iterative one means. Its such an ugly mess, and it has smart-ass comments that don't help at all. It doesn't even attempt to explain what a two-argument Push() is supposed to do - tho I think it loads all the values from a range.. Fresheneesz 18:03, 4 November 2006 (UTC)
- This may come as a surprise to some, but programmes can be written in computer languages other than C. The pseudocode is (was) a slightly edited version of a programme written in Pascal, and the special point is that being from an actual source file of a working programme, it is known to work, rather than being some hands-waving generalised description that alas proves vague to someone actually using it to prepare code. Removed from the source was stuff that related to the animation of this sort method on a screen and certain other details relating to accounting of compares and swaps. Yes, I erred in failing to fully describe exactly what I meant by Push(a,b) and Pop(a,b): given languages such as Hesketh, intensely tricky things might be triggered thereby, but in fact they are simple enough, so simple that I failed to imagine the necessity for explication since I thought the notions of push and pop well-known. I agree that Hesketh code is odd, due to unfamiliarity (hah, try APL!), but the author has explained its operation. Transliteration to a pseudo-C may soothe some fear of the different, at the usual price of drivel trailing around the page. It is beyond me why a simple statement such as if L2 = R2 then swap(a,b); is thought better shown sprawling across three lines. And one of the brilliances of C is the difficulty over using =, thus the introduction of "equals". Capitalisation of some variables but not others had better be done carefully as some compilers distinguish the two forms. Avoiding such trivial mistakes is one reason for starting with a working source file. Removing "smart-arse" (not ass: read Chaucer) comments may sooth frustration, but also removed are comments noting that a one-element span needs no further consideration, that the scan and swap process is striving to attain a certain condition, and hints that a few "go to" statements might reduce wasted repetition. Comments encased in {...}, the odd Pascal style, may well cause cognitive dissonance to a C-monoglot, but //(space) is just flabby... I have a version of QuickSort that does not use a pivot element but the reference I have gives a source code (not Pascal, and shock awe, not C) featuring a number of "go to" statements that are difficult to finesse with concealments such as "while true do" and the like especially if wasteful repetition is to be avoided. Delicate minds might be upset at the sight.
- The pseudo code is not understandable. I'm a good C programmer, and I can't figure out what the hell the iterative one means. Its such an ugly mess, and it has smart-ass comments that don't help at all. It doesn't even attempt to explain what a two-argument Push() is supposed to do - tho I think it loads all the values from a range.. Fresheneesz 18:03, 4 November 2006 (UTC)
- I agree that many typefaces show damn all differences between the glyphs for Il1, but the courier typeface for the fixed-spacing components does well enough. Fixed-spacing typefaces allow indentation and alignment of code pieces to be presented without difficulty, in the absence of proper typesetting arrangements. NickyMcLean 21:01, 5 November 2006 (UTC)
I see that months after you C-favouring fellows insisted on converting the pseudocode of the iterative version to C there are still mistakes in your pseudocode being fixed (the latest being due to typography yet again: misreading l and 1 is a favourite), whereas the decried Pascal/Algolish pseudocode was taken from the source of an actual working programme that had been checked. But C is so wonderful that the rest of us have to endure recondite gibberish such as { and } scattered across the page. So is the new pseudocode correct yet? I don't know. Prior to the latest change, the enthusiasts would have answered yes to that, and they would have been wrong.
So here is a stripped-down version of a Pascal programme that does work, that (being Pascal with its rigid typing of parameters) has the array A to be sorted for elements first to last as a global. Not explicitly declared is its type which might be some sort of data aggregate, which aggregate contains a component of type SortStuffKeyType that defines the type of the component Key upon which the ordering is to be based, and likewise, a procedure Swap to interchange two elements of the array A whose precise details surely need not be explicated.
Procedure QuickSort(First,Last: longint); var v: SortStuffKeyType; var sp: integer; var l,l2,p,r,r2: longint; type pair = record l,r: longint; end; Var Stash: array[1..32] of pair; BEGIN sp:=1; stash[1].l:=first; stash[1].r:=last; while sp > 0 do begin l:=stash[sp].l; r:=stash[sp].r; sp:=sp - 1; While l < r do begin p:=(l + r) div 2; v:=A[p].key; l2:=l; r2:=r; repeat while A[l2].key < v do l2:=l2 + 1; while A[r2].key > v do r2:=r2 - 1; if l2 <= r2 then begin if l2 <> r2 then Swap(A[l2],A[r2]); l2:=l2 + 1; r2:=r2 - 1; end; until l2 >= r2; if r2 - l > r - l2 then begin if l < r2 then begin sp:=sp + 1; stash[sp].l:=l; stash[sp].r:=r2; end; l:=l2; end else begin if l2 < r then begin sp:=sp + 1; stash[sp].l:=l2; stash[sp].r:=r; end; r:=r2; end; end; end; END;
And to keep Fresheneesz happy, there are no comments. NickyMcLean 20:04, 12 February 2007 (UTC)
omega?
What is Ω(n)? (In the section "Version with in-place partition") -- Tarquin 07:51, 17 Jun 2005 (UTC)
It's asymptotically bounded from below by n, not from above. Ω(n) means it takes time at least on the order of n, where O(n) would mean at most on the order of n.
too complex
im an advanced cmputer user yet i didnt understand anything can somebody please explain it more simply
- If you are indeed an 'advanced cmputer user' then you should be able to rephrase the explanation yourself, if you find it too complex. I believe it is sufficiently clear.
- Pick an element (generally referred to as the pivot) from the list, any element will do.
- Put everything bigger than the pivot to the right, and everything smaller to the left. (right and left are actually lists of their own)
- So long as there is more than one element in either of these lists, apply steps one and two again (to each list, left and right).
Eventually the entire list will be in order. If you have a pouch of coins, and you pick out a random coin. It's 20 cents. That means you put all the 1,2,5 and 10 cent pieces to the left and the 50, $1 and $2 coins to the right. (If there are multiple copies of the coin, then choose whether they go to the left, right or stay with the pivot 20cent piece (this is up to the programmer). Then, in your first pile you have 1,2,5 and 10 cent coins. Pick a random one -> you get 2cents. then put to the left all of the 1cent coins, to the right everything else. now, you dont have to sort the 1cent coins because they are all the same (essentially). So you just sort the 2, 5 and 10 cent pieces. When that is done, you add them all together, including the pivot. Then you go back to the 50, $1 and $2 coins. Sort them in the same way.
I agree with the OP in this section. This explanation with the coins was helpful - why doesn't the article include a section like this for the layperson (without all the jargon)? As it is, the article is targeted to a fairly sophisticated audience. However, it's an interesting idea that young teens (for example) would be able to comprehend, but I doubt many young teens would be able to comprehend the article itself as it currently stands. Canberran 12:59, 19 October 2007 (UTC)
Quicksort implementations overload
Once again the quicksort implementations section is growing too large. I moved several implementations to the quicksort implementations article.
To inhibit this growth and to prevent forking between this page and quicksort implementations, I have created a template called {{Quicksort core implementations}} containing the implementations listed on both pages. Any further edits will have to be made to Template:Quicksort_core_implementations or to quicksort implementations (hopefully most will be to the latter). I will put an HTML comment in both pages with a direct link to the template editing page. Deco 00:23, 29 November 2005 (UTC)
- Update: I didn't realise this, but the section editing links for the sections transcluded from the template do work correctly and edit the correct sections of the template. This should keep editing them relatively easy while preventing divergence. Deco 00:30, 29 November 2005 (UTC)
- Another update: If you have this article on your watchlist, please also keep Template:Quicksort_core_implementations on your watchlist to help prevent transcluded vandalism to this page. This occurred today. Thanks. Deco 20:47, 30 November 2005 (UTC)
Line Removed
", especially if a cryptographically secure pseudo-random number generator is used to reduce the chance of an attacker predicting the "random" elements".
Actually the cryptographic strength of the generator has little effect. Cryptographic strengh only prevents your adversary working out your seed from generated numbers. Since the generated numbers are restricted to the computer using the generator, an attacker can just as easily find out the seed itself as numbers that might reveal the seed.
- The point of this comment was that the seed should not be a chosen in a predictable fashion, such as a fixed value or the current time, such that the random numbers can be determined without access to the sorting program's state. Not sure how best to put this. Deco 01:38, 10 January 2006 (UTC)
Simply state that it helps if the seed is not predictable. Linking cryptographically secure generators is a little misleading because even the strongest generators won't help if they are seeded the same way every time.
- Good idea. I added the following:
- One defense against this is to use highly unpredictable random seeds, such as numbers generated from hardware random number generators or entropy pools, and to avoid exposing partial information that might reveal information about the random seed or sequence.
- Edit if you like. Deco 22:53, 10 January 2006 (UTC)
Changes to quicksort core implementations
I have removed all languages from this template, execpt for C, Haskell and Prolog. The C++ example offered nothing spectacularly new over the C example. J being an esoteric language and Joy and SML not offering much over Haskell. Also, however well meant using templates to store encyclopedic contents is a bad idea. Unless someone objects, I will substitute it. Cheers, —Ruud 12:57, 28 January 2006 (UTC)
All the other implementations would belong on WikiSource or WikiForge, in my humble opinion. —Ruud 12:59, 28 January 2006 (UTC)
- Reducing number of languages sounds good to me. The purpose of the template, however, is to avoid duplicated content between two articles. This was done not on a whim but to fix an immediate problem where duplicated content was forking as people made "fixes" and "improvements" to one version or the other of the code. You haven't given any concrete reason for why "using templates to store encyclopedic contents is a bad idea", but I don't believe this to be true.
- I honestly wouldn't mind moving the other impls to another wiki, but we need to provide an outlet for these people or they'll just keep adding them here. Either that or someone needs to take on the onerous burden of moving stuff out every time somebody adds something. And believe me, the HTML comments don't stop them.
- Also, you should've moved these implementations to quicksort implementations, or if you believe some of them should've been deleted, at least removed textual references to them. It's not clear to me whether or not you intended to delete them instead of move them so I haven't taken any action. Deco 07:34, 2 February 2006 (UTC)
- I did not move them because I did not see that the implentations at quicksort implementations differed from those in the core implementation. Wikipedia:Template_namespace states that templates should not be used to store article content. —Ruud 02:30, 3 February 2006 (UTC)
- I don't believe the makers of that policy anticipated this specific circumstance. It's good advice in general, but here it solves a specific problem. The deletions of the other implementations is okay I guess, I can fix up the references. Deco 06:15, 3 February 2006 (UTC)
Changes to partition implementation
Recently a couple anonymous editors made some changes to the partition pseudocode which made it incorrect (and also really confusing). I was pretty sure the original was correct, but just to make sure I did a direct translation to C++ and tested it for a bit:
#include <algorithm> using std::swap; template <class T> int partition(T a[], int left, int right, int pivotIndex) { int pivotValue = a[pivotIndex]; swap(a[pivotIndex], a[right]); // Move pivot to end int storeIndex = left; for (int i=left; i<=right-1; i++) { if (a[i] <= pivotValue) { swap(a[storeIndex], a[i]); storeIndex = storeIndex + 1; } } swap(a[right], a[storeIndex]); // Move pivot to its final place return storeIndex; }
It seems to check out. Thought someone else might find this useful. Deco 10:20, 2 February 2006 (UTC)
Check these recent changes
I don't have time to check these right now, but these recent changes by 213.228.191.42 should be verified. This user also made these changes to Quicksort implementations. The 'print' comments in the second set of changes makes me pretty sure these edits are not correct. If nobody gets to this in a couple days, I'll look at it myself. – Doug Bell talk•contrib 02:17, 3 February 2006 (UTC)
- I already reverted them here. This is, as far as I can tell, intentional sneaky vandalism. Deco 06:16, 3 February 2006 (UTC)
Introsort
Should introsort be added to the "other optimizations" section of this article? --Deryck C. 06:29, 13 March 2006 (UTC)
- Already there. Deco 00:00, 21 June 2006 (UTC)
"Beatiful" Haskell-version
Removed...
qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
Because, well... what the heck is this? Is it just a call to an existing qsort function in the Haskell API? I don't know Haskell. ~ Booya Bazooka 19:16, 1 June 2006 (UTC)
- No, it's a definition for a function called "qsort". It's a recursive definition. Read it like this: "To qsort the empty list, return the empty list. To qsort a list whose first element is x (call the rest of the list xs), return the result of qsorting those elements of xs less than x, followed by x, followed by the result of qsorting those elements of xs greater than or equal to x." —Bkell (talk) 19:51, 1 June 2006 (UTC)
- Well, if you say it works, I suppose I believe it. Transwikiing to the Wikibook. ~ Booya Bazooka 21:01, 1 June 2006 (UTC)
- I was surprised that this wasn't on the main article page - it's well-known as a very elegant (and yes, correct) implementation of quicksort. Haskell fans like to use it to tout the conciseness of the language (it really is a pretty precise declarative description of how a quicksort algorithm works). Now I see it was removed to the talk page here. Is someone going to restore it, or is there some consensus that haskell is too obscure (outside of academia) for this to add value? 69.17.15.214 20:53, 22 April 2007 (UTC)
- No way is it a correct implementation of Quicksort, because it is not in-place. This simplification allows it to avoid the tricky parts of Quicksort, which involve working from both ends of the array towards the middle. If you're not going to implement Quicksort in-place, you're better off using Merge Sort. See Quicksort in Haskell: things that amuse me for an accurate implementation of Quicksort in Haskell, and it is just as ugly as if it were any other language. cojoco (talk) 21:17, 15 March 2009 (UTC)
Animation
Would it be ok to replace the static image with something like this animation?
The file size is (even after some tradeoffs) 273kiB; I'm not quite sure if that is too large for the article (or if it still needs to be improved). --RolandH 22:12, 20 June 2006 (UTC)
- Cool animation, I like it, very well done. I'm not sure exactly where the best place in the article is for it; ideally everything that it illustrates should be relatively close to it in the text. Deco 23:58, 20 June 2006 (UTC)
- Thanks for the acknowledgement. Hm, I've had a go and tried inserting the new image at various places, but it either overlaps and clashes with the non-wrapping source code boxes or doesn't fit well near the beginning, especially when your image (which in one respect is better as it shows the algorithm workings in a less specific way -- not necessarily the in-place version of quicksort) is kept in place. I've also tried to simply replace your image, but even then the layout of the leading section is less balanced afterwards. --RolandH 11:23, 22 June 2006 (UTC)
- Update: I should note that although this image is cool, it shouldn't replace the static image, since we still need a static image for print. Deco 23:06, 9 October 2006 (UTC)
Something seems wrong with the second animation (the diamond points one), at least me: I'm assuming that the vertical offset it the criteria by which the array is sorted. So then if you look towards the end of the animation you have points that are really low and yet they somehow become the highest points int the array. Did anyone else notice this?
I think the animation is great. It should be nominated as the picture of the day. Is there any way to slow it down? Epachamo 01:11, 31 January 2007 (UTC)
saving stack space
this is from Chuck Bradley, bradley at tiac.net. In the section about saving stack space by deferring the larger of the two portions identified by the partition function, I think the +1 and -1 can be eliminated in the IF statement. The code as written does not seem to cause the wrong branch to be taken, but just makes the reader work harder. The +1 and -1 in the recursive calls are needed. If more clarity is worth a few more lines, consider something like this:
LEFTSIZE = Q - FIRST \ RIGHTSIZE = LAST - Q \ IF LEFTSIZE < RIGHTSIZE \ etc
Thanks for all the effort that went into the article.
- Thanks for your comments, Chuck. I liked your idea of introducing a left size and right size, but I prefer a computation that is based on the inclusive bounds (upper - lower + 1), since this aligns more closely with the recursive call arguments. Deco 23:05, 9 October 2006 (UTC)
IIterative version
Iterative version pseudocode is a mess with many bugs (I don't dislike pseudocode, I mean line 6: " { Pop(L, r)/2;" and others [where "L" is written instead of "1"]).
The initial version was a slightly pseudocoded Pascal source file which in its original compiled and worked (tested against less than two elements, all elements equal, already sorted, reverse sorted, nearly sorted, as well as random) which for an algorithm notorious for mistypes (< for <=, etc) is a comfort. However someone disliked Pascal and transliterated it to pseudo-C, randomly using L and l for no apparent reason (just typing with or without shift, presumably) and someone since has mangled the pseudosource. What "Pop(L,r)/2;" is meant to mean is obscure to me. NickyMcLean 20:15, 30 November 2006 (UTC)
Second animation
I removed the animation on the right from the page. First of all it seems wrong, and second of all - it doesn't help understanding quicksort in the least. Fresheneesz 20:58, 15 December 2006 (UTC)
Basic C implementation is incorrect
Greetings;
I have tested the basic implementation and it is not correct. Just copy the code and give the function a randomly generated number arrays and you will notice that some of them input disappear in favor of others.
Why?
Simply because during the partition section, we have this:
if(left != right){
numbers[left] = numbers[right]; //numbers[left] disappears here, it should be swapped using a tmp
left++;
}
same for the right side of it..
So correct code is this:
if(left != right){
tmp = numbers[left]; numbers[left] = numbers[right]; numbers[right] = tmp; //swapping
left++;
Indentation Style
The code indentations on this page are despicably inconsistent. pick one method and stick with it!
do you want do indent thus:
int function() { doStuffHere(); return 1; }
thus:
int function() { doStuffHere(); return 1; }
or thus:
int function() { doStuffHere(); return 1; }
I don't want a tabbing-style argument here, but do want consistency. —The preceding unsigned comment was added by 69.248.112.19 (talk) 18:55, 11 February 2007 (UTC).
Java??
What on earth is the Java pseudocode doing at the third code block? Not only it's entirely redundant, inconsistent, messy (check the spacing) and frustratingly hard to grasp, it also does a terrible job of getting the point across. Either change it to C (if it really needs to be there) or just remove it (much better). --124.120.189.96 13:07, 8 March 2007 (UTC)
Common Lisp
Hello, here is a simple implementation of the Quick Sort algorithm in Common Lisp [1]:
(defun qsort (lst) (when lst (let* ((x (car lst)) (xs (cdr lst)) (lt (loop for y in xs when (< y x) collect y)) (gte (loop for y in xs when (>= y x) collect y))) (append (qsort lt) (list x) (qsort gte)))))
Visual Basic Quicksort
I thought I would add some visual basic code to use quicksort to sort a database by two fields. It shows how to use recursive calls in VB and Microsoft Access 2000/2003. I used this algorithm to sort a random deck of cards with a number and suite. The basic idea is that we declare our pivot as the last item in the list (N). Then start with the first item and item N-1, switch them if they are on both sides of the pivot if the first item is greater than the pivot and the N-1st item is less than the pivot, then increase/decrease the two counters. If not, the increase/decrease the counters if the number is on the correct side of the pivot. Then run the program recursively between the beginning and the (pivot -1) and (pivot + 1) and the end. The recursion stops when the number of records is 2 or less.
Option Compare Database Global Array1array(100), Array2array(100) As String Global counter As Integer Public Function Quick_Sort_Array1s() Dim TempArray1, TempArray2 As String Dim I As Integer Dim rs As DAO.Recordset Set rs = CurrentDb().OpenRecordset("tbl_Cards_Random") DoCmd.SetWarnings False DoCmd.RunSQL "DELETE * from tbl_Cards_Sorted" counter = 1 rs.MoveFirst Do Until rs.EOF Array1array(counter) = rs("Card") Array2array(counter) = rs("Suite") counter = counter + 1 rs.MoveNext Loop counter = counter - 1 Call Quicksort(1, counter) I = 1 Do Until I > counter DoCmd.RunSQL "INSERT INTO tbl_Cards_Sorted (Card, Suite)" & _ "VALUES ('" & Array1array(I) & "','" & Array2array(I) & "');" I = I + 1 Loop rs.Close DoCmd.SetWarnings True MsgBox "File completed Quicksort" End Function Sub Quicksort(a As Integer, b As Integer) Dim NumRec, J, K, L As Integer Dim TempArray1, TempArray2, Flag As String NumRec = b - a + 1 If NumRec <= 1 Then ElseIf NumRec = 2 Then If (Array1array(a) > Array1array(b)) Or (Array1array(a) = Array1array(b) And _ Array2array(a) > Array2array(b)) Then TempArray1 = Array1array(a) TempArray2 = Array2array(a) Array1array(a) = Array1array(b) Array2array(a) = Array2array(b) Array1array(b) = TempArray1 Array2array(b) = TempArray2 End If ElseIf NumRec > 2 Then J = a K = b - 1 Do Until J >= K If ((Array1array(J) > Array1array(b)) Or (Array1array(J) = Array1array(b) And _ Array2array(J) > Array2array(b))) And ((Array1array(K) < Array1array(b)) _ Or (Array1array(K) = Array1array(b) And Array2array(K) < Array2array(b))) Then TempArray1 = Array1array(J) TempArray2 = Array2array(J) Array1array(J) = Array1array(K) Array2array(J) = Array2array(K) Array1array(K) = TempArray1 Array2array(K) = TempArray2 J = J + 1 K = K - 1 Else If (Array1array(J) < Array1array(b)) Or (Array1array(J) = Array1array(b) And _ Array2array(J) <= Array2array(b)) Then J = J + 1 End If If (Array1array(K) > Array1array(b)) Or (Array1array(K) = Array1array(b) And _ Array2array(K) >= Array2array(b)) Then K = K - 1 End If End If Loop L = a Flag = "N" Do While L <= b And Flag = "N" If (Array1array(L) > Array1array(b)) Or (Array1array(L) = Array1array(b) And _ Array2array(L) > Array2array(b)) Then TempArray1 = Array1array(b) TempArray2 = Array2array(b) Array1array(b) = Array1array(L) Array2array(b) = Array2array(L) Array1array(L) = TempArray1 Array2array(L) = TempArray2 Flag = "Y" End If L = L + 1 Loop L = L - 1 Call Quicksort(a, L - 1) Call Quicksort(L + 1, b) End If End Sub
--Trust101 02:35, 1 May 2007 (UTC)
Here is Wikibooks and the Quicksort code repository. NevilleDNZ 05:33, 3 May 2007 (UTC)
Termination
"The algorithm always terminates because it puts at least one element in its final place on each iteration (the loop invariant)." I don't think this is the reason. It terminates because in each recursive call, the quicksort calls itself to sort a smaller array, so all recursive call branches must sooner or later hit the base case of sorting an array of size 1.
- The two statements are roughly equivalent in intention; to say it puts an element "in its final place" is to imply that it's "done" with it and will not process it in recursive calls, so that the sublists are smaller. Your statement is clearer however. Dcoetzee 12:14, 31 May 2007 (UTC)
Partition function for in place sorting incorrect
The partition function for in place sorting is incorrect if the pivot chosen is such that it isthe largest value.
If you have an input of 2,5,9,3,6,2,4
If the third input (9) is chosen as pivot after the partition function it will be: - 2,5,4,3,6,9,2
New pivot value = 9
However the last element 2 which is smaller than the pivot occurs to the right of the pivot value.
The correct Partition function should be: -
function partition(array, left, right, pivotIndex)
pivotValue := array[pivotIndex] swap( array, pivotIndex, right) // Move pivot to end storeIndex := left for i from left to right-1 if array[i] <= pivotValue swap( array, storeIndex, i) storeIndex := storeIndex + 1 if array[right] <= array[storeIndex] swap( array, right, storeIndex) // Move pivot to its final place return storeIndex else return right
- no, thats not the case. You forget that, if i is right-1, and the expression "array[1] <= pivotValue" is true, storageIndex gets incremented, so that storageIndex and right are the same values, so nothing gets changed. (I implemented the algorithm as it was before in C, see http://theclaw.dnsalias.org/misc/qsort_wikipedia.c) - so maybe the algorithm before wasn't wrong. (thats really important! I really got confused about this new piece of code, so others may be as well ;).) --Theclawen 19:34, 4 July 2007 (UTC)
- I'll simply revert it until somebody finds an example where the old partition-algorithm doesn't work --Theclawen 19:43, 4 July 2007 (UTC)
Optimizations
why isn't there a section about the various optimizations to the standard Quicksort. median of three pivot selection is listed, but what about the versions that finish up with introsort as a speed optimization? or otheroptimizations that have been tried? —Preceding unsigned comment added by 137.229.48.43 (talk) 21:41, 22 October 2007 (UTC)
Problems with style
The wording in some sections of this article seems silly. For example "But what makes random pivots a good choice?" is asking the reader a question, which is unprofessional in technical writing and unbecoming of an encyclopedia. The question is rhetorical and therefore redundant. The author then starts to answer the question in the next paragraph with "Suppose we sort the list and then divide it into four parts". The worse thing here is the plural pronoun in the first person, "we". The author is trying to explain some details of the algorithm by politely using the Royal Pronoun, and that works fine in a lecture hall or a tutorial. I think there are enough tutorials on this subject that effort should be made not to turn this article into one.
There are more examples of this writing style through out the article. I decided to write in the discussion rather than edit it myself because it’s more than likely some one will disagree with me. —Preceding unsigned comment added by 203.206.178.34 (talk) 02:24, 24 October 2007 (UTC)
- I agree. Wikipedia is not a manual or guidebook, and we should have a neutral point of view. Using the first person "you" or "we" is not acceptable in this case. See WP:MOS#Avoid_first-person_pronouns. SmileToday☺(talk to me , My edits) 14:28, 27 October 2007 (UTC)
- No argument from me - first person writing is a bad habit of mine. Dcoetzee 07:51, 11 April 2008 (UTC)
I would like to point out that the wording is inaccurate : "Reorder the list so that all elements which are less than the pivot come before the pivot and so that all elements greater than the pivot come after it (equal values can go either way)". The method works on descending sorts too, where greater values come before the pivot and lesser after. The sort can go either way. —Preceding unsigned comment added by 72.224.30.108 (talk) 15:39, 27 October 2007 (UTC)
- This isn't really an inaccuracy - when reverse sorting, typically a reversed comparator is used, so that "greater" elements become "less"; relative to the order being used, the lesser elements always end up first, which is after all the definition of sorting. Dcoetzee 07:51, 11 April 2008 (UTC)
Pseudocode doesn't handle the pivot
Does "select a pivot" mean that it's taken out of the array to partition? If yes, it needs to be re-added in the concatenate step. If no, the pivot is always included in the less-than-or-equal partition rather than being put in place, so it will never terminate. Mikel Ward 00:18, 2 November 2007 (UTC)
- Perhaps this would be better.
function quicksort(array) if length(array) ≤ 1 return array var list lesser, greater var int sameAsPivot select a pivot value pivot from array for each x in array if x > pivot then add x to greater if x < pivot then add x to lesser else add 1 to sameAsPivot lesser = quicksort(lesser) for 0 up to sameAsPivot add pivot to lesser greater = quicksort(greater) var list sorted for each x in lesser add x to sorted for each x in greater add x to sorted return sorted
Link explaining Omega doesn't help
Under Quicksort#Version with in-place partition, it links to the article Big-O notation for an explanation on the meaning of omega. However, I didn't find any such explanation in that article. Can someone add a short explanation in this article, or link to someplace where it's explained? Thanks, Tamuz (Talk) 18:16, 11 November 2007 (UTC)
- It's there, under Big-O_notation#Other_related_notations. Kinda buried though - wouldn't hurt to have a short parenthetical explanation. Dcoetzee 07:47, 11 April 2008 (UTC)
No swap needed
i have found it in a paper about gotos by donald knuth (structured programming with goto statements). this implementation is supposed to be found by edgster dijkstra (mentioned in knuth paper). it basically does in-situ rotation
... A) preparation lp = adress of left boundary rp = address of right boundary pivot = *rp; /* copy this _right_ guy off, thus making him safe place to copy */ /* and choose him as a pivot */
B) 2 "beautiful loops" while(lp < rp) { /* until there is no space to move */ /* find incorectly placed from left , *lp >= pivot*/ for(; *lp<pivot; ++lp) ; *rp = *lp;/* move to safe place */ /* lp is now safe place */
/* find incorectly placed from right, *rp <= pivot */ for(; *rp>pivot; --rp) ; *lp = *rp;/* move to safe place */ /* rp is now safe place */
/* move ahead of corrected */ ++lp; --rp; }
C) *rp = pivot; ...
- --- i believe Takuya Murata found it also, see http://en.wikipedia.org/wiki/User:TakuyaMurata/Barcelo's_method .
- --- if you want any other to be the pivot just add this before beggining of A)
swapvalues(p_wanted_pivot, rp); /* makes value *p_wanted_pivot to be on _right_ guy */
- --- could this be added ? i believe, that with some image(s)/animation it can has bigger teaching potential.
In-place partition algorithm doesn't work
The in-place partition algorithm doesn't work. I tested it. Moreover, it is inconsistent with the diagram -- which shows values greater than pivot being swapped, while algorithm says only values less than pivot are swapped (which doesn't even seem to make sense). Furthermore, the diagram is not a good choice because it avoids the ambiguous cases. for example, the first two swaps have a value that is greater than pivot on the left, and smaller than pivot on the right. What happens if the right side value is greater than the pivot?
April 10, 08 —Preceding unsigned comment added by Yahastu (talk • contribs) 03:59, 11 April 2008 (UTC)
- It works and is correct, and is consistent with the diagram. Your implementation probably contained an error. Could you show it? The diagram doesn't "avoid ambiguous cases" - it shows all the swaps that occur, as swaps with a right side value greater than the pivot are not performed (this is the point of the "less than pivot" test in the code, it's examining the right-side value, not the left-side value). In short, it looks like you're misunderstanding things and I'd appreciate feedback on how to make it less confusing. Dcoetzee 07:44, 11 April 2008 (UTC)
Can we change the code to show that the for loop is to right-1, not right?
This line in the example code through me for a small loop (no pun intended):
for i from left to right // left ≤ i < right
Is there some reason this is shown as "right" instead of "right-1" ?
12.188.106.66 (talk) 22:50, 30 April 2008 (UTC)
- It used to say that. Somebody screwed it up, and somebody patched it instead of changing it back. I'll fix it... (*sigh*) Dcoetzee 23:20, 30 April 2008 (UTC)
Data Structure: Array
Why does the square on the right says that Quicksort's datatype is an array? It's perfectly possible to use quicksort for sorting arrays (only the implementation would be different). - João Jerónimo (talk) 03:52, 11 May 2008 (UTC)
- It's conventionally arrays, primarily because quicksort requires random access to make good pivot choices. That said, the box is vague and just putting "Data Structure" next to it doesn't explain anything. So I changed it to "Varies". Dcoetzee 08:17, 11 May 2008 (UTC)
Why QuickSort is (on average) faster than MergeSort and HeapSort?
They are both O(nlogn), and the worst case of quicksort is O(n^2), which is rather slow. Can anyone tell me why QuickSort is so fast in reality? Thanks! Visame (talk) 23:23, 18 June 2008 (UTC)
- Its good cache behaviour, mainly. The partition phase makes a single linear scan through the array, which makes good use of the cache (it only triggers a cache miss once every k reads and once every k writes, where k is the cache line size). Heapsort's accesses are essentially random; almost every one of them triggers a cache miss. Mergesort has good cache performance, but (in typical implementations) has more overhead, because it needs to allocate and fill auxiliary arrays, then copy the result back to the original array, whereas partition is in-place. On the other hand, mergesort generalizes to k-way mergesort which is much better for disk-based sorting, whereas quicksort does not generalize in this manner. There are more modern cache-aware and cache-oblivious sorting algorithms such as funnelsort that make even better use of the cache hierarchy than quicksort and perform better. Dcoetzee 00:47, 19 June 2008 (UTC)
Error in first pseudocode?
Imagine feeding an array like [4,4,4,4] -- what happens? Or better yet, [4,5,5,8,3]. —Preceding unsigned comment added by Atypic (talk • contribs) 19:34, 19 June 2008 (UTC)
- I think you're right. Elements equal to the pivot should be appended to one of the sub-arrays or they'll get lost.
if x ≤ pivot then append x to lessOrEqual
- R4p70r (talk) 20:33, 10 July 2008 (UTC)
- Just noticed this as well--this is a clear bug. I'm going to correct it. Also, the second if-statement is unnecessary; an else statement is faster.
- Jnoring (talk) 18:45, 16 July 2008 (UTC)
- That's exactly what it used to say a few edits ago. User:205.228.12.236 broke it on May 19th in this edit. It really drives me crazy how people continually break the pseudocode, when it was already correct years ago. Dcoetzee 18:03, 17 July 2008 (UTC)
- "Which anyone can edit" - and anyone does, usually with good faith. All the time. But alas, not always to positive benefit. Perhaps there should be some sort of protection scheme. There is, and the editing elite allowed to mess with this are not always well-regarded. Programming styles differ, and there is no consensus. The fault lies within the programming community. One might add a plaintive note to the pseudocode "Please don't mess with this" or similar, but it will forestall only the thoughtful, who are being worn down. The advantage of an actual textbook (such as Algorithms + Data Structures = Programmes) is that there is one author, one style, and whatever disputes there may have been during preparation, the resulting ink splotches on actual paper can't be fiddled with. The descriptions and examples of QuickSort versions there are as I recall both simple and clearly expressed.NickyMcLean (talk) 21:31, 17 July 2008 (UTC)
- I'm half-tempted to create a guideline advising no substantive pseudocode changes without prior review on the talk page, but that wouldn't be very wiki. I once put extensively verified code on the red-black tree article, and I have no idea whether it's accurate anymore. I think what it comes down to is that changes to code and other fragile artifacts will just have to be more regularly and strictly scrutinized. One good way to do this is by putting the code into a template, so that it has its own separate history and can be independently watched. I'm going to propose this at Wikipedia:Village pump (proposals) and would like to solicit your comments there. Dcoetzee 00:01, 18 July 2008 (UTC)
- For all the talk of the ill benefit of editing, the first psuedocode appears to be still wrong, in that it once again doesn't correctly handle the equals case correctly. The first psuedocode as it stands recurses infinitely for some arrays when you pick a pivot from the existing values in the array (such as
pivot=array[0]
, or[length(array)/2]
, because you will get a 'less' set that is 100% the same and try to quicksort that, eternally. (Credit to tstout for noticing this.) The simplest answer to fix the psuedocode appears to be a third list where same-as-pivot values are stored, and concatenating it in the middle of the return. Wiser heads than I should determine exactly how to fix this on-page, as I've never edited a Wikipedia page before. (Later edit: ...or make it clear that 'select a pivot value' means 'remove the selected element', which is not clear at all from the pseudocode.)--74.203.63.66 (talk) 02:08, 19 July 2008 (UTC)
- For all the talk of the ill benefit of editing, the first psuedocode appears to be still wrong, in that it once again doesn't correctly handle the equals case correctly. The first psuedocode as it stands recurses infinitely for some arrays when you pick a pivot from the existing values in the array (such as
- I'm half-tempted to create a guideline advising no substantive pseudocode changes without prior review on the talk page, but that wouldn't be very wiki. I once put extensively verified code on the red-black tree article, and I have no idea whether it's accurate anymore. I think what it comes down to is that changes to code and other fragile artifacts will just have to be more regularly and strictly scrutinized. One good way to do this is by putting the code into a template, so that it has its own separate history and can be independently watched. I'm going to propose this at Wikipedia:Village pump (proposals) and would like to solicit your comments there. Dcoetzee 00:01, 18 July 2008 (UTC)
here C# code for that
please add it. i would do that by myseld, but im sure no matter ill do, its won't be according to your rules. here it is, any way:
static class QuickSort { public static void Sort(int[] arr) { Sort(arr, 0, arr.Length - 1); }
private static void Sort(int[] arr, int start, int end) { if (start < end) { int pivoPosition = Seperate(arr, start, end); Sort(arr, start, pivoPosition - 1); Sort(arr, pivoPosition + 1, end); } }
private static int Seperate(int[] arr, int start, int end) { int pivo = arr[start]; int s = start + 1, e = end; while (s <= e) { if (arr[s] <= pivo) s++; else if (arr[e] > pivo) e--; else { int temp = arr[s]; arr[s] = arr[e]; arr[e] = temp; s++; e--; } } arr[start] = arr[e]; arr[e] = pivo; return e; } }
- Thanks for the effort, but we really don't need implementations in N different languages, the pseudo-code should suffice. That's why all the language-specific examples have been moved to the Wikibooks article. Oli Filth(talk|contribs) 18:52, 7 August 2008 (UTC)
Please tone down the claims made about "Median of Medians" pivot selection
I don't agree with the claim that using the quoted pseudomedian selection subroutine "[maintains] the asymptotically optimal run time complexity of Θ(nlogn) (by preventing worst case partitions)".
If all the elements in the input array are equal, then the running time is proportional to the square of the number of elements, because values equal to the pivot are all put in the same partition. Admittedly that is the fault of the partitioning subroutine outlined in the article and coded in the routine.
But, even when the partitioning subroutine handles duplicates well, I strongly doubt the claim that selecting pseudomedians results in Θ(nlogn) running time, in the worst case.
At each level of selection, the number of inputs increases by a factor of 5, but the lower bound on the final rank of the selected value increases by a factor of only 3. For a quicksort with a good partitioning subroutine, using the quoted pseduomedian selection routine to choose its pivots, I would expect a worst case complexity (in terms of comparison count) somewhat closer to
n ^ ( 2 - log(3)/log(5) )
and although the shuffling of the elements that takes place in the selection routine might well lower the worst-case complexity somewhat, I would be very surprised indeed by a proof that the worst case was Θ(nlogn). James Barbetti (talk) 10:25, 3 September 2008 (UTC)
- No, see the article selection algorithm. The algorithm does not find a pseudomedian, it finds the actual median; the proof is nontrivial and was the subject of a publication. Duplicates are an issue but the partitioning algorithm could be modified to deal with this, or we could assume there are a constant number of each list value. But this is an impractically slow algorithm in practice and I'm skeptical that we even need code to demonstrate it. Dcoetzee 18:02, 26 September 2008 (UTC)
- I was not talking about the "median of medians" algorithm outlined in the "Selection Algorithm" article (and yes, the algorithm in that other article is a linear method for finding the median, as shown by Rivest and Tarjan in 1973). I was talking about the subroutine actually quoted in the quicksort article - and I said so. The selection subroutine quoted in the quicksort article finds a pseudomedian (the comments next to it even say so). As shown by the following (C++ code):
- int notthetruemedian()
- {
- int n = 10000;
- int *x = new int[n];
- for (int i = 0; i < n ; ++i)
- {
- x[i] = (i % 10)/2;
- }
- int answer = findMedianOfMedians(x, 0, n-1); /* n/5 copies of: 0,1,2,3,4... so correct answer (median value) should be *2*, shouldn't it? */
- delete [] x;
- return answer; /*but it won't be. It'll be either 1 or 3, if n is a multiple of 10. */
- }
- James Barbetti (talk) 12:41, 6 November 2008 (UTC)
- Oh, I agree about that. I have previously called for the removal of that code, which I believe is incorrect and irrelevant. Dcoetzee 02:56, 31 January 2009 (UTC)
Stable or Unstable?
Near the top of the page it indicates that most efficient implementations are not stable, but later on in the discussion it calls the algorithm stable. Please clarify this, as it's very confusing to readers. —Preceding unsigned comment added by 24.86.192.195 (talk) 17:46, 8 September 2008 (UTC)
- No, the algorithm is not stable, and is not said to be stable. The only place, where you find the stability claim is this comment
- This version is also a stable sort (considering that the "for each" method retrieves elements in original order, and the pivot selected is the last among those of equal value).
- to the explicit list implementation of the quicksort. Please note, however, that the claim bases explicitly on two important assumptions I underlined above, and implicitly on one even more important, that data are given in lists. Lists are dynamic structures to keep ordered data, that allow data to be added at the beginning or appended at the end or inserted somewhere inside a list—with no change to the rest of the structure (see List (computing) article).
- They also allow concatenating two or more lists into one list in a simple step, that is without copying all the list's items from one place to another, which is crucial for effectiveness of this algorithm — see the expression in the
return
instruction. (This comment added 05:24, 2 October 2008 (UTC).) - To offer such flexibility list is usually implemented as unsorted array of data with additional array of indices, that keeps the data order, or as a dynamically linked list, which is a set of data randomly put in the computer memory with accompanying 'pointers' that link each item with the next (and possibly with the previous) item on the list. Either way the presence of those additional indices or pointers goes beyond requirements of the original quicksort, which was designed for arrays.
- So, if you have a 'list' of data, you can easily implement a stable quicksort for it; however the original quicksort, an algorithm to sort arrays of data with no additional ordering indices or links is UNSTABLE.
- CiaPan (talk) 14:03, 18 September 2008 (UTC)
- The important thing here isn't the data structure - it's trivial to implement an out-of-place stable quicksort for arrays, by using a stable out-of-place partition operation (just count the number of elements in each partition, and keep two pointers into the output array while copying elements first from the left partition, then the right partition). Typically, in-place implementations are unstable, while out-of-place implementations are stable - as far as I know there is no in-place stable quicksort algorithm. The article used to say something like this. Dcoetzee 07:31, 2 October 2008 (UTC)
- Also, incidentally, I think the stable version of quicksort is much easier to describe using the version with the equal partition, to avoid the caveat about the pivot choice. Dcoetzee 07:33, 2 October 2008 (UTC)
Parallel Quicksort gives Linear speedup?
- The page currently claims that quicksort gives a linear speedup and also claims that given n processors, sorting can be done in n time. This is self contradictory. If it was a linear speedup, sorting could be accomplished in log n time with n processors. The article goes on to say that this is in fact possible. Basically, I think the claim that quicksort gives a linear speedup is confusing and should either be moved to the discussion of David Powers's paper or simply removed. —Preceding unsigned comment added by 70.41.37.167 (talk) 19:28, 4 October 2008 (UTC)
- I added the original claim, and it's wrong. :-P I wasn't counting preprocessing, which is dishonest. Feel free to remove it. Dcoetzee 02:55, 31 January 2009 (UTC)
Correctness
The correctness of the overall algorithm follows from inductive reasoning: for zero or one element, the algorithm leaves the data unchanged; for a larger data set it produces the concatenation of two parts, elements less than or equal to the pivot and elements greater than it, themselves sorted by the recursive hypothesis.
- I don't think this statement is right. I could apply this argument to any algorithm which leaves one element unchanged and claim by induction my algorithm is correct. You must prove the algorithm actually sorts at some stage. —Preceding unsigned comment added by 86.24.195.98 (talk) 16:24, 21 October 2008 (UTC)
But it does! The sorting property follows from these words:
- 'concatenation of two parts, elements less than or equal to the pivot and elements greater than it.'
This implies sorting of two intervals of elements' values: the current pivot separates them. Recursively pivots chosen in following steps will separate sub-intervals, eventually sorting the whole set. --CiaPan (talk) 06:26, 22 October 2008 (UTC)
In-place operation is essential to Quicksort
Although the discussion and analysis of Quicksort is all very well, I think that it is important to emphasize that, without an in-place implementation, Quicksort is practically no better than any other sort. To state this another way, there is no point implementing a quicksort algorithm over any other sort unless it is in place. In fact, Quicksort is theoretically worse than other algorithms, as it has a worst case time of . It is only used in practice because of its low memory usage, (no other arrays are needed), and faster constant performance (data copying is not necessary). The current article shows several extremely simple non-in-place implementations which miss the reason for the complexity of the in-place version cojoco (talk) 23:18, 15 March 2009 (UTC)
- Yes, these versions are not really a good idea for implementation - they're for exposition of concepts. This could use some explanation. Dcoetzee 21:46, 21 April 2009 (UTC)
the C swap function is a misguided hack.
Here are two C implementations of swap
void swap(int * array, int a, int b){
array[a] -= array[b];
array[b] += array[a];
array[a] = array[b] - array[a];
}
void swap2(int * array, int a, int b){
int tmp = array[a];
array[a] = array[b];
array[b] = tmp;
}
As you can see the swap2 which uses a temporary variable produces less assembly.
It also writes out to main memory less often which can result in improved performance.
This result was produced using gcc 4.01 with
cc swap.c -save-temps -03;
.globl _swap
_swap:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 12(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %edx
movl 16(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %eax
subl %eax, %edx
movl 12(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl %edx, (%eax)
movl 16(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %edx
movl 12(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %eax
addl %eax, %edx
movl 16(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl %edx, (%eax)
movl 16(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %edx
movl 12(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %eax
subl %eax, %edx
movl 12(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl %edx, (%eax)
leave
ret
.globl _swap2
_swap2:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 12(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %eax
movl %eax, -12(%ebp)
movl 16(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %edx
movl 12(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl %edx, (%eax)
movl 16(%ebp), %eax
sall $2, %eax
movl %eax, %edx
addl 8(%ebp), %edx
movl -12(%ebp), %eax
movl %eax, (%edx)
leave
ret
- There's no need to go into performance details - the "optimization" of swap should not have been made because the code is for exposition and it makes it less clear. User:147.251.208.254 made this change about a month ago and it slipped under the radar. Dcoetzee 21:48, 21 April 2009 (UTC)
But the Raison d'être for the "optimization" has been removed and has shown to only be obfuscatory. - Anon —Preceding unsigned comment added by 71.233.237.153 (talk) 05:08, 22 April 2009 (UTC)
Dispute on the history?
The "History" section says:
The quicksort algorithm was developed by C. A. R. Hoare in 1962 while in the Soviet Union
However, this contradicts Hoare's own words,
Around Easter 1961, a course on ALGOL 60 was offered in Brighton, England [...] It was there that I wrote the procedure, immodestly named QUICKSORT
— C.A.R. Hoare, The Emperor’s Old Clothes
Perhaps Hoare was misremembering? Its impossible for me to evaluate since the 1963 claim is made without attrribution. I just wanted to raise the topic. — Adam Di Carlo (talk) 19:43, 7 June 2009 (UTC)
updated the C code
i added a little more code to the C example.
i felt that the original example provided on the wiki was elegant and good for explanatory purposes, however, i felt it would be prudent to include a C example that could be copy-pasted into a source file.
i added an optimization to the original code's partitioning logic because i noticed it copied a ton of elements unnecessarily.
i also pulled a variable out of the if() block and made it static to save some memory.
using the first element as the pivot is a clever, elegant solution to the pivot selection problem; however, i wanted demonstrate that this solution could be adapted to an arbitrarily selected pivot very easily. hence, i used a bit of 'forbidden' preprocessor magic to keep the code usable while still conveying this point.
hope my edits are okay with everyone and thank you all for this page, it's very informative!—Preceding unsigned comment added by Bayard (talk • contribs) 13:39, 7 March 2005 (UTC)
i cleaned up the C++ code (made the pivot more clear) and added an optimization from the C code.—Preceding unsigned comment added by Bayard (talk • contribs) 00:45, 8 March 2005 (UTC)
In-place partition
This article states that "The disadvantage of the simple version above is that it requires Ω(n) extra storage space". This comes immediately after the actual C code in the previous section. But, that C code is in-place. What is not in-place is the pseudocode in the previous section (that appends each of the n items to be partitioned into one of three list objects). I found this confusing; the C code seems misplaced because it really doesn't match the pseudocode in light of how the text immediately following it makes it sound like it is not in-place.—Preceding unsigned comment added by 69.170.132.57 (talk) 22:09, 21 October 2006 (UTC)
C code minor change
I've moved the q_sort() function above the function quickSort(), thus making it a bit more C confirmant (q_sort() calls quickSort() , and the previous snippet wouldn't compile without a previous definition (which is not presented) of quickSort() )—Preceding unsigned comment added by Stdazi (talk • contribs) 11:38, 5 January 2007 (UTC)
Minor C code optimization:
if (left < pivot) q_sort(numbers, left, pivot-1); if (right > pivot) q_sort(numbers, pivot+1, right);
should be
if (left < pivot - 1) q_sort(numbers, left, pivot-1); if (right > pivot + 1) q_sort(numbers, pivot+1, right);
no need to run q_sort again for single items.—Preceding unsigned comment added by Stafil (talk • contribs) 19:47, 21 January 2007 (UTC)
for i from left to right // left ≤ i < right -1
I tried using the loop for i from left to right // left ≤ i < right -1 and my array was not sorted properly, it looked like only part of the array was sorted. I took out the - 1 part then my array sorted perfectly. It looks like if you use - 1 the loop process all the way to the element 2 before the pivot. I noticed there was a discusion that for i from left to right // left ≤ i < right is the way the article used to be and it through the writer of that discussion for a loop. The same thing happened to me with the - 1 in loop condition. —Preceding unsigned comment added by 132.228.195.207 (talk) 16:23, 13 October 2009 (UTC)
- I've added a comment to the pseudo code ("// left ≤ i < right") about this. This is an unfortunate side effect of pseudo-code: ambiguity. Rami R 13:04, 14 October 2009 (UTC)
Linear time removal of duplicates
- Assume that there are no duplicates as duplicates could be handled with linear time pre- and post-processing, or considered cases easier than the analyzed.
How can you remove the duplicates in pre processing in linear time? Best algorithm I can think of is to sort the array first, then look for runs. That's of course O(n log n). If there is such an algorithm, it'd be worthy of it's own article and linking to it from this one perhaps? Themania (talk) 00:31, 6 November 2009 (UTC)
- No, there is no linear time duplicate removal on an unsorted array. This statement is vague, but it is quite correct that duplicates do not substantially affect the analysis. This may be less trivial than suggested here. Dcoetzee 10:05, 6 November 2009 (UTC)
Python
There seems to be a new move in Wikipedia to remove all real language examples of algorithms. I think that this is an enormous mistake, since pseudocode examples necessarily have a computational model behind them. Most of them in fact resemble Pascal. Having different models of computation is important for understanding. Examples of algorithms using procedural as well as functional paradigms (for example) are highly instructive.
But, alas, it is what it is. People with more time to argue than I do will perform the edits that correspond to their biases.
def QuickSort(l):
if l == []:
return [] # Empty lists are sorted by definition
left = [] # Less than the split element
right = [] # Greater than or equal to the split element
split = l.pop()
middle = [split] # The split element is in the middle
for item in l:
if item < split:
left.append(item)
else:
right.append(item)
return QuickSort(left) + middle + QuickSort(right)
—Preceding unsigned comment added by 76.193.116.39 (talk) 16:45, 24 October 2009 (UTC)
- I've removed the python example. It doesn't add anything beyond what is given by the pseudo-code, and the claim that it represents a different approach seems like original research (not to mention off-topic: what does the claim really have to do with quicksort?). Rami R 08:56, 10 February 2010 (UTC)
- Language specific examples do help to understand an algorithm. Examples help people to run the algorithm in a debugger and trace what's going on. In fact, I am pretty sure, that most people who come to this page are looking for a version of quicksort in their language of choice. --this unsigned comment was added 26 April 2010 by 66.194.72.10 (talk)
Partial qsort
Something that might be worth covering is the possibility of modifying quicksort to only sort part of a list. The change to the algorithm is trivial:
quicksort(array, left, pivotNewIndex - 1) quicksort(array, pivotNewIndex + 1, right)
Becomes:
if (pivotNewIndex > offset) quicksort(array, left, pivotNewIndex - 1, offset, limit) if (pivotNewIndex < offset + limit) quicksort(array, pivotNewIndex + 1, right, offset, limit)
Quicksort can now be instructed to avoid fully sorting most of the list. This yields impressive speedups in cases where only part of the list is needed; e.g. for paginating a large list of elements.
The second part of this is described in this note, and also this MSDN article, but I'm finding it difficult to get an explicit mention of the fully generalized partial qsort.
Fweeky (talk) 16:05, 30 January 2009 (UTC)
- This is already mentioned briefly in Selection algorithm, but is certainly worth a mention here. Perhaps we ought to have a whole article on partial sorting. Dcoetzee 02:53, 31 January 2009 (UTC)
a lazy sort is a partial sort —Preceding unsigned comment added by 168.251.194.19 (talk) 22:34, 20 January 2010 (UTC)
Correctness of the partition function?
Hello,
can somebody confirm the partition function is actually correct? When I look at the CLRS version and turn it into actual code it works, if I do the same with the Wikipedia version it doesn't.
// CLRS version in pseudo code function partition(A, p, r) x = A[r] i = p - 1 for j = p to r - 1 if A[j] ≤ x i = i + 1 exchange A[i] with A[j] exchange A[i + 1] with A[r] return i + 1
// CLRS version in Java public static int partition(int[] A, int p, int r) { int x = A[r]; int i = p - 1; for (int j = p; j < r; j++) { if (A[j] <= x) { i = i + 1; exchange(A, i, j); } } exchange(A, i + 1, r); return i + 1; }
// Wikipedia version in Java public static int partition(int[] array, int left, int right, int pivotIndex) { int pivorValue = array[pivorIndex]; swap(array, pivotIndex, right); int storeIndex = left; for (int i = left; i < right; i++) { if (array[i] <= pivotValue) { swap (array, i, storeIndex); storeIndex = storeIndex + 1; } } swap(array, storeIndex, right); return storeIndex; }
Results:
Input: [86, -28, -15, 70, -115] Wiki-Output: [86, -28, -15, -115, 70] CLRS-Output: [-115, -28, -15, 70, 86]
—Preceding unsigned comment added by 24.215.230.83 (talk) 17:35, 31 December 2009 (UTC)
- Wikipedia's code is incorrect
- The in-place quicksort code currently on wikipedia is incorrect. Consider what happens when you try to sort [1,3,2]. (Your result will still be [1,3,2].)
- Slythfox (talk) 07:25, 21 April 2010 (UTC)
- The entry has been split by CiaPan (talk) to discuss both parts separately.
- Really? Where is the error?
- Let's see how it works. Assume the array is indexed from 1 to 3. We start with:
quicksort( [1,3,2], 1, 3)
Then
pivotIndex := 1 + (3-1)/2 = 2
and we call:
partition( [1,3,2], 1, 3, 2)
pivotValue := array[2] = 3
and the first swap makes
array := [1,2,3] - The loop scans array items from left = 1 to right-1 = 2. They are both less than the pivotValue, so each of them gets swapped with itself (storeIndex is incremented along with i, the loop control variable), thus the whole loop changes nothing in the array. However as storeIndex gets incremented on every pass of the loop, it is finally equal 3. Then the last swap swaps the third item of the array with itself (storeIndex=3 and right=3).
The first partition call leaves the array sorted and returns storeIndex equal 3. - After return to the quicksort the pivotNewIndex is set to the returned value of 3, then the recursive call of quicksort( [1,2,..], 1, 2) swaps twice elements no.1 and 2, so it effectively doesn't change the array.
- All remaining calls handle one-item or empty subarrays, so they do not affect the ordering. Finally the array gets sorted.
- Can you show, how you got the final order of [1,3,2]? --CiaPan (talk) 08:46, 21 April 2010 (UTC)
- Notice how the CLRS code initially sets the storeIndex (i in CLRS code) to right - 1 instead of right. It also does the storeIndex (i in CLRS code) before swapping. And lastly, CLRS and returns swaps storeIndex + 1 instead of storeIndex.
- Slythfox (talk) 07:25, 21 April 2010 (UTC)
- You don't read carefully enough. Both samples are equivalent. In fact they are almost the same. The only two differences are
- the CLRS code assumes the rightmost item of the subarray being the pivot, while the wikipedia code allows the caller function to specify the pivot's index, and
- the CLRS keeps its i less by 1 than Wiki's storeIndex — it initializes the variable with p–1 and increments before each swap in the loop (so, after the loop terminates, the variable is too small by 1, so we must add 1 to it before the final swap and before returning its value), while Wiki's code initializes the corresponding variable with the left value and increments it after each swap in a loop, so after the loop it is immediately ready for the final swap and return.
- These differences affect the code efficiency but have no meaning for the algorithm correctness — although they're written different, on same input both routines handle same data in the same way (if we simplify the wiki code to always use pivotIndex := right) and consequently both routines give same results. That means both are correct (or none of them). --CiaPan (talk) 20:58, 21 April 2010 (UTC)
- Goodness, I never considered #1. I implemented both the CLRS partition code and the wiki's partition code with some (but clearly not enough) consideration to how the storeIndex value was used when it came to recursion for each.
- Slythfox (talk) 06:29, 22 April 2010 (UTC)
- You don't read carefully enough. Both samples are equivalent. In fact they are almost the same. The only two differences are
Images
Could someone please consider using this? http://corte.si/posts/code/visualisingsorting/index.html 84.182.107.93 (talk) 16:54, 9 February 2010 (UTC)
Parallelization
I think the parallelization section could use some more details. One thing that seems to be missing is how you'd combine the results from the p sublists. I'm guessing you'd do something similar to merge sort, but that doesn't seem right, because in the case where p = n, you'd end up with a linear insertion sort, which runs in O(n^2), which is actually slower than the standard non-parallel quicksort. Anyway, more details would be appreciated. Danielx (talk) 02:56, 28 April 2010 (UTC)
- There's no 'combine'—once the array got partitioned it has a structure of
[ items less than or equal to X | item X | items greater than X ]
where X is the pivot. At that point the pivot is in its final position and no item from the left subarray needs moving to the right of the pivot and no item from the right subarray needs moving to the left of it. So the two subarrays, although not sorted yet, are combined already. No further 'combining' is needed, just sorting subarrays. --CiaPan (talk) 20:20, 28 April 2010 (UTC)
left + (right-left)/2
I just restored this code and added a small explanation of why it's needed. I know I've seen this problem described, I think by some Google blog entry on algorithms, but I don't remember where. I want to add "this is a somewhat theoretical problem, since it only arises when you are binary-searching an array with at least half the number of items in it that your integer type can represent, which means either that your integers are much shorter than your pointers; or your array doesn't actually fit in your RAM, in which case you're probably encountering disk seek latencies and should therefore use a B+-tree instead, you douchebag; or your array consists of single bytes, in which case you should probably encode it as a 256-word table instead of gigabytes of "aaaaaa"." But I think that would be "original research".
It may be that the paragraph I added would be better somewhere else.
Kragen Javier Sitaker (talk) 17:48, 28 August 2010 (UTC)
- "Note the left + (right-left)/2 expression. (left + right)/2 would seem to be adequate, but in the presence of overflow, [blah]"
- How would (left + right)/2 seem adequate? BS. That would only work if left was zero, in which case, it equals right/2, which is the same as left + (right-left)/2 (for left = 0). True, right/2 might look correct at first. But since left is not always zero, neither this nor (left + right)/2 works. Additionally, imagining the latter to be adequate is absurd if you have any knowledge of arrays and indices. --84.62.36.217 (talk) 15:15, 27 September 2010 (UTC)
- (left + right)/2 is absolutely correct in context. If we neglect integer overflow then left + (right-left)/2 = left/2 + right/2 = (left + right)/2 which are both obviously the middle of the array being sorted. I suppose if you for some reason suppose that / doesn't do integer division you might have problems, but otherwise I can't see your issue. Perhaps if you explained the issue and a suggestion for how to fix it without spewing vitriol we might improve the article. bou·le·var·dier (talk) 15:50, 27 September 2010 (UTC)
- This is probably the link you were looking for: googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html 67.162.90.113 (talk) 20:21, 2 November 2010 (UTC)
Simple Pseudocode Error?
The concatenation appears to put the lesser table, the pivot record and the greater table together. However, just before it, the pivot record is included in the lesser table. That yields a double record for the pivot, no? —Preceding unsigned comment added by 75.27.121.68 (talk) 18:15, 20 September 2010 (UTC)
- Do you mean the part which says
select and remove a pivot value pivot from array
...? --CiaPan (talk) 06:14, 24 September 2010 (UTC)
- Right, this is something that was right until it was changed in August. I changed it back a few days ago. As CiaPan mentions, the original pivot is removed from the array when it's selected. The equals case ensures that duplicates of the pivot element in the array are still sorted - if the equals case is not placed into either the lesser or greater sets, the duplicates of the pivot element disappear from the sorted array. This is something that was alluded to in the text before the algorithm but had been changed in the algorithm itself. bou·le·var·dier (talk) 08:03, 27 September 2010 (UTC)
Avoiding Unnecessary Swaps
Regarding the in-place partition function..
Question 1: Should "if array[i] ≤ pivotValue" be "if array[i] < pivotValue" instead? Why swap equal elements -- that's pointless. After all, if elements equal to pivotValue stay to the right of the pivot - that's still a valid partitioning result.
Question 2: Should "swap array[i] and array[storeIndex]" be "if (i != storeIndex) swap array[i] and array[storeIndex]" -- why swap with itself? That's even more pointless. —Preceding unsigned comment added by 131.107.0.98 (talk) 15:42, 25 October 2010 (UTC)
- Answer 1: that doesn't matter, actually. Usually we don't have many equal items in the array, so skipping equal items gives no significant gain.
Generally the quicksort works quite poorly on arrays with lots of equal items. Those items lead to unbalanced partitioning results (often it is 'one and the rest') which causes a rapid growth of recursion depth. If you need to sort such arrays and want to use quicksort for it, consider some more sophisticated partitioning routine, which would give three sub-arrays: left one with items less than the pivot, central part with (possibly many) items equal to the pivot, and the right one with items greater than the pivot. - Answer 2: That is even more pointless. In optimistic case, partitioning routine would split the array into halves, which means approx. half of items are bigger than the pivot. So it's 1:2 chance the first item will get 'swapped' with itself, same 1:2 for the second one and so on... The expected number of self-swapped items for this case is 2.
For less optimistic case it might be bigger, anyway it is still a small number. Your change inserts additional code which is executed for every item swapped during partitioning to avoid a few self-swaps. That will effectively make the routine slower.
Your modification would bring some important speed-up if your input data are already sorted or 'almost' sorted. But in that case you can use other algorithms, which work better on such data than a very general quicksort. - CiaPan (talk) 21:54, 29 October 2010 (UTC)
Follow-up 1: So, changing from '<=' to '<' doesn't have any effect if all elements are different (obviously), and improves performance in presence of equal elements.. Why is it not a change worth making? Isn't most of quicksort analysis focused on how to improve its performance in the worst case? It seems to me that the argument for "Oh, quicksort isn't good for XYZ, so it's not worth improving it for XYZ" is like throwing out the baby with the bathwater. This is an optimization with 0 cost, but with benefits. Why does it not matter?
Follow-up 2: storeIndex would be == i, when the array is already partitioned/sorted appropriately. So, again, this is about improving the worst-case performance, avoiding a large number of pointless swaps. It all comes down to the cost of a CPU-register compare operation vs. a Swap (two memory reads&writes), which may be orders of magnitude apart. One would have to run some measurements to know, but dismissing this optimization outright is premature.
As a general comment, choosing a more appropriate sorting algorithm based on some euristics, is a different discussion entirely. —Preceding unsigned comment added by 67.243.31.51 (talk) 19:58, 31 October 2010 (UTC)
- Answer 1
Yes, strictly speaking, you're right in that the change makes some improvement. What you miss is the fact, that the improvement is negligible compared to the huge loss of efficiency the routine suffers from equal elements. You want to consider the worst case – well, here it is: the worst case for Quicksort is an array of equal elements. In that case the routine performs one initial swap, then scans the given interval to be partitioned, and finally makes another swap. Your modification saves n−1 swaps, but can't help for necessary n−1 comparisons nor for the resulting one-to-(n−1) partitioning result. As a result the total cost in the worst case drops from to , anyway it is still . That's why I said it gives no significant gain. - Answer 2
If you consider things on the CPU internals level, you should also take the jumps prediction into account — additional conditional instruction makes another point where the CPU instruction pipeline may temporarily stop for re-filling; so the if is not just a register operation, but also a possible waste of time.
Another efficiency factor is a CPU cache: once a relatively small portion of data (an element of the array) has been read for comparison, it is already in the cache; swapping it with itself means overwriting it twice with the same data (for copying from A[storeIndex] to A[i] and for copying from A[i] to A[storeIndex]). Those writes do not change the memory contents, so they might well execute entirely in the cache; actually nothing would be written back to the main memory. The same applies to the 'temp' variable used during swap (provided that the cache works in the 'copy-back' mode and that 'cache misses' do not conflict with that particular location too often, which would force flushing it to the main memory). As a result you add instructions, which must be read and executed, in effort to reduce non–existent costs. - CiaPan (talk) 22:32, 2 November 2010 (UTC) (and 19:56, 4 November 2010 (UTC))
- Since we're dealing with detail, I dispute your supposition of nothing being written back to main memory for the equivalent of A[i]:=A[j]; when i = j. When something is to be stored in the location corresponding to A[i] the buffering scheme will receive the value, and mark the location as "changed" so as later to write from the buffer to the main memory, etc. For this to not happen, the buffer/cache controller would have to be checking that the new value for A[i]'s location is different from the existing value, and only set "changed" if so. This is possible, but I doubt that it is done. For every write to memory. Computer hardware/firmware does not embody transcendent understanding, to wave hands at in awe. Similarly, how is the expression compiled? If say to a single instruction, Move x,y where x and y are addresses (or the names of registers, or the names of registers containing the addresses of the data to be moved) it is possible that within the microcode for the Move opcode a no-action would ensue if x = y. But the code may be more complex, as when an element of A is not a computer word.
- Otherwise, idiot swaps take place. But code to check for and evade this effort on every swap is itself extra effort. It is better to evade these timewasters at a higher level... NickyMcLean (talk) 20:38, 4 November 2010 (UTC)
Their partition algorithm does not work on all data sets
The comparison "<" should probably be replaced by a "<=", but even that does not fix it.
In particular, if the set is: [ 3, 5, 8, 5, 2, 1, 9, 7, 4]
Then the 5 will incorrectly remain in index 1.
Does anyone have a correct algorithm which correctly accounts for duplicates of the pivot? —Preceding unsigned comment added by 75.85.57.215 (talk) 18:58, 9 March 2010 (UTC)
- I'm not sure which comparison you are referring to. "<" doesn't exist in any of the pseudo-codes, except for one comment. An "imaginary" run seemed to sort the array correctly. Rami R 14:55, 10 March 2010 (UTC)
- no, it doesn't. [9,8,5,7,6,5,4,5,3,2,1] with pivot =5, index = 5, results in [5,1,4,5,3,2,5,7,6,8,9]. —Preceding unsigned comment added by 95.65.28.117 (talk) 21:41, 11 March 2010 (UTC)
- And this is a correct result of partitioning: the array has been reordered so that it contains the pivot (5) at some position (index 6) and all items less than or equal to the pivot in left subarray (indices 0 through 5) and all items greater than the pivot to the right from it (indices 7 through 10). Perfect partitioning.
You probably forgot, that the single partitioning does not complete sorting. To sort the array you need now to partition recursively both subarrays [5,1,4,5,3,2] and [7,6,8,9], then the resulting sub-sub-arrays, and so on until you step down to single-item parts. Then the task will be done. --CiaPan (talk) 21:41, 21 April 2010 (UTC)
- And this is a correct result of partitioning: the array has been reordered so that it contains the pivot (5) at some position (index 6) and all items less than or equal to the pivot in left subarray (indices 0 through 5) and all items greater than the pivot to the right from it (indices 7 through 10). Perfect partitioning.
- no, it doesn't. [9,8,5,7,6,5,4,5,3,2,1] with pivot =5, index = 5, results in [5,1,4,5,3,2,5,7,6,8,9]. —Preceding unsigned comment added by 95.65.28.117 (talk) 21:41, 11 March 2010 (UTC)
Okay, I've implemented the pseudo-code in Java:
public class Quicksort{
public static void swap(int[] array, int i, int j){
int temp=array[i];
array[i]=array[j];
array[j]=temp;
}
public static int partition(int[] array, int left, int right,
int pivotIndex){
int pivotValue = array[pivotIndex];
swap(array,pivotIndex,right); // Move pivot to end
int storeIndex = left;
for(int i=left;i<right;i++) // left ? i < right
if(array[i]<=pivotValue){
swap(array,i,storeIndex);
storeIndex++;
}
swap(array,storeIndex,right); // Move pivot to its final place
return storeIndex;
}
public static void quicksort(int[] array, int left, int right){
if(right > left){
int pivotIndex = (left+right)/2;
int pivotNewIndex = partition(array, left, right, pivotIndex);
quicksort(array, left, pivotNewIndex - 1);
quicksort(array, pivotNewIndex + 1, right);
}
}
public static void main(String[] args){
int[] array={3, 5, 8, 5, 2, 1, 9, 7, 4};
quicksort(array,0,array.length-1);
for(int i=0;i<array.length;i++)
System.out.print(" "+array[i]);
System.out.print("\n");
}
}
This code is no different than the pseudo-code, and it will correctly sort the array. You've probably implemented the pseudo-code wrong. Rami R 08:49, 12 March 2010 (UTC)
This partition function does not work correctly when pivotIndex is set to zero. --this unsigned comment was added 25 April 2010 by 67.183.108.57 (talk)
- Can you give an example? Rami R 10:17, 27 April 2010 (UTC)
--
I agree with the original poster of this thread, the algorithm as written does not correctly handle duplicate pivot elements.
I have written a simple working version in C++ that showed up a few issues in the algorithm as written.
A couple of points:
1. Notably, 'storeIndex' can be incremented past the end of the segment, in which case the final swap will do the wrong thing. This must be checked for.
2. The pivot element cannot be excluded from the sort result (ie. the part stating to recursively sort two subarrays: [0, pivotIndex-1] and [pivotIndex+1, end] is wrong. It should read [0, pivotIndex-1] and [pivotIndex, end]. Modifying the real code and observing the unsorted results below can verify this.
int partition(int* start, int* end, size_t pi, int pv) { swap(start[pi], *(end-1)); int* store = start; for (int* i = start; i<end; ++i) { if (*i <= pv) { swap(*i, *store); ++store; } } if (store >= end) store = end -1; swap(*store, *(end - 1)); return store - start; } void sort(int* start, int* end) { size_t size = end - start; if (size < 2) return; size_t pi = size / 2; size_t pv = start[pi]; pi = partition(start, end, pi, pv); sort(start, start + pi); sort(start + pi, end); }
-- Matthew Herrmann (Dec 1, 2010)
--
- It's working, that's probably a problem with your implementation, like the way you're handling boundaries. I've done several tests using the following Python 3 script; note that the quicksort is implemented exactly as in the pseudo-code:
from random import randint
def partition(array, left, right, pivotIndex):
pivotValue = array[pivotIndex]
array[pivotIndex], array[right] = array[right], array[pivotIndex]
storeIndex = left
for i in range(left, right): # left <= i < right
if array[i] <= pivotValue:
array[i], array[storeIndex] = array[storeIndex], array[i]
storeIndex += 1
array[storeIndex], array[right] = array[right], array[storeIndex]
return storeIndex
def quicksort(array, left, right):
if right > left:
pivotIndex = int(left + (right - left) / 2)
pivotNewIndex = partition(array, left, right, pivotIndex)
quicksort(array, left, pivotNewIndex - 1)
quicksort(array, pivotNewIndex + 1, right)
# Testing:
num_tests = int(input('Number of tests: '))
size = int(input('Length of array: '))
limit = int(input('Items ranging from 0 to ...? '))
failed = False
for i in range(num_tests):
a = []
for j in range(size):
a.append(randint(0, limit))
b = sorted(a)
quicksort(a, 0, len(a) - 1)
if a != b:
failed = True
break
if failed:
print('Got/should be:\n{0}\n{1}'.format(a, b))
else:
print('Sorted correctly.')
Too much tests in the complex version (one function )
I'm talking about the last function example on the page, right before the "Implementation issues".
while leftIdx <= pivot and rightIdx >= pivot while array[leftIdx] < array[pivot] and leftIdx <= pivot leftIdx = leftIdx + 1
I think that the last test of the inner while ("leftIdx <= pivot") is unnecessary.
The first condition "array[leftIdx] < array[pivot]" can be interpreted as "stop whenever you find something that is not smaller than the pivot value".
We know from the outer while that leftIdx starts smaller or equal to pivot, and we increment leftIdx by 1 only. I see two ways it can end:
- there's something not smaller than the pivot value
- we've reached the pivot, and the condition "array[leftIdx] < array[pivot]" is no longer met.
In the two cases, the loop will exit. Thus, the second test "leftIdx <= pivot" will never yield false.
The same is true for the second inner while.
I think it's worth modifying the page for this, or did I miss something ?
Modification to Simple version
I have made a minor change to the code for the simple version. The original state of lines 2-4 was:
create empty lists 'less' and 'greater'
if length('array') ≤ 1
return 'array' // an array of zero or one elements is already sorted
I have rearranged these lines as follows:
if length('array') ≤ 1
return 'array' // an array of zero or one elements is already sorted
create empty lists 'less' and 'greater'
The change is a small one, but it seems to me like this will save some memory in the base case since the lists will not have to be allocated. Is it okay to leave this modification or can anyone see something wrong with it? —Entropy (T/C) 17:58, 1 November 2011 (UTC)
In-place/complex algorithms
Originally from my talk page. bou·le·var·dier (talk) 02:19, 11 March 2011 (UTC)
I think you missed my point. I completely understand the potential advantages of using a more advanced technique to select the pivot. However, two things should be taken into consideration here: 1) the recommendation in the pseudo-code (in the form of a comment) is an arbitrary choice and it's as bad as picking the last element; and 2) most implementations of quicksort do not attempt to select a pivot in a particularly smart way (and that includes textbooks in the subject). The current article has more code than what is needed to describe an in-place version of quicksort and it's harder to read because of that. If we really want to keep that parameter (for generality sake), then I think we should elaborate more on the choice of the pivot (in the pseudo-code itself).
I also find that the split in "Simple version" and "Complex version" is not a good one. Perhaps Complex version should be renamed to In-place version, which is really what that section talks about.
Finally, the comment about the overflow, while correct, is completely out of place and adds a lot of clutter without adding any useful information about quicksort itself. (Thiagohirai (talk) 18:03, 10 March 2011 (UTC))
- Bringing this to article talk because it's a good point and I agree. We could do better in the algorithms we provide and our discussion of them. Right now we give four psuedocode algorithms:
- The basic quicksort, with O(n) additional space complexity
- An in-place partition algorithm
- An in-place quicksort
- Another in-place quicksort, with no discussion
- I see no reason to keep the 4th algorithm - it is provided naked with no discussion and it's not even obvious to me what merit it has apart from not having a separate partition routine. I do agree we can move the overflow discussion away to the section about pivot selection (and clarify it, it's ugly prose), and leave the selection in algorithm 3 to something like "select a pivot index // see #Whatever the section is". And yes, with these changes I think it's reasonable to change "complex version" to "in-place version". If there's no contention I'll do all this in a couple of days. bou·le·var·dier (talk) 02:19, 11 March 2011 (UTC)
- Support rename to "In-place version" or similar heading.
- Support deletion of the 4th algorithm
- Suggest using a "pivotSelect(array, left, right)" call in the example code
- Support containment of the overflow issue to discussion on pivot selection
- Glrx (talk) 19:13, 13 March 2011 (UTC)
Sorry for the delay. I've just made these changes as I outlined above. I'd appreciate someone fact-checking the small bits of prose I added (in particular the bit about termination). bou·le·var·dier (talk) 14:31, 19 March 2011 (UTC)
On the subject of in-place, the synopsis currently states: "Although quicksort is usually not implemented as an in-place sort, it is possible to create such an implementation." The reference cited actually states the opposite, so this needs to be changed at any rate, but from what I gather it was originally changed (from stating that QS can be done in-place) because, when the recursion is factored in, the amount of storage used is not constant; this change was done due to the text of the In-place_algorithm article. Whether the present article classifies QS as in-place doesn't matter to me, but either way the text of the introduction needs to be changed. I leave this to somebody who feels strongly about making a case for whether QS is, in fact, in-place. Ocvailes (talk) 19:33, 8 April 2011 (UTC)
Optimization
NickyMcLean wrote on 18 August 2010:
- A separate insertion sort of each small segment as they are identified adds the overhead of starting and stopping many small sorts, but, avoids wasting effort comparing keys across the many segment boundaries, which keys will be in order due to the workings of the quicksort process.
Is there really any 'effort comparing keys across boundaries' to avoid? As far as I understand the code, insertion sort performs a linear scan for every new item to find a right place for it, and stops the scan as soon as it finds a key not greater than the new one. Every quick-sorted segment contains keys not less than keys in the previous segment, so each such linear scan terminates at the segment boundary (if not earlier). The basic concept of the quick sort algorithm is partitioning the array into sub-arrays, which do not mix ever after – once the segment is defined, all its elements will remain in it; they may move fore and back during following partitioning, but are never swapped between segments. The insertion sort obviously doesn't need to make such swaps, either, so there's no need to compare keys across segments' boundaries.
I think the whole underlined part is simply false. The single insertion sort invoked on the whole almost-sorted array will perform exactly the same number of comparisions as all insertions sorts on separate segments together. The only advantage of multiple soon invocation is better cache performance, and the only advantage of the single late invocation is less call and return overhead. --CiaPan (talk) 19:10, 24 March 2011 (UTC)
- Huh? Consider two segments, say having elements 1-4 and 5-7. Both segments are yet to be sorted internally, but, all elements of the second segment follow all elements of the first segment. This is the basis of partitioning. An insertionsort across the whole span 1-7 will compare elements 4 and 5, whereas separate insertion sorts for the two segments will not. In both cases, there will be no difference in the number of swaps, but there will be a difference in the number of comparisons. NickyMcLean (talk) 21:14, 24 March 2011 (UTC)
- Of course, you were perfectly right. I'm sorry for my temporary blindness. --CiaPan (talk) 20:54, 22 November 2011 (UTC)
in-place
"Quicksort can be implemented as an in-place sort, requiring only O(log n) additional space." This statement is contradicting the article about in-place algorithms, in which "in-place" is defined as using only a constant amount of extra storage space. 148.87.19.202 (talk) 19:31, 25 November 2011 (UTC)
- The partition step can be in place -- that is, requiring only O(1) additional space. I changed the lead to make that clear. The In-place algorithm article also allows an informal designation of "in-place" that uses more than O(1) (but presumably << O(n)) space. Glrx (talk) 19:46, 28 November 2011 (UTC)
Quick-sort with Hungarian (Küküllőmenti legényes) folk dance
This video is a great visualization of quicksort. This and similar methods are very useful in mathematics & computer science education.
- The article has other visualizations. Is it really the best visualization for QS? Folk dance does not belong in the article. Glrx (talk) 03:18, 22 December 2011 (UTC)
expensive "in place" partition for stable sort
I'm have some technical trouble with [this edit]. The algorithm uses O(log n) stack space to do the stable partition instead of O(1) space for an unstable partition. On top of that, the stable partition makes a lot of moves (is it O(n log n) moves at each level for n log-squared n moves total?). The partition step is also reminiscent of a merge sort step. Is the definition of QS being stretched too much? Does making a stable QS algorithm throw out too many of the benefits of QS? I'm leaning toward saying QS is unstable, but then having a section about an academically interesting stable version. Glrx (talk) 20:18, 26 December 2011 (UTC)
Choice of Pivot
In the "choice of pivot" section, the article talks about taking the median of the first, middle, and last elements as the pivot. Wouldn't this violate the comparison sort property? — Preceding unsigned comment added by Shades97 (talk • contribs) 16:51, 27 May 2012 (UTC)
- The basic property of comparison sort is that it compares items in pairs and the comparison result indicates which of two items should go first in a list. Of course the ordering relation must be transitive.
Pivot selection just makes 2 or 3 comparisons on three items chosen from the (sub)array to finally use one of them as a pivot. I can't see how that would break the algorithm property... --CiaPan (talk) 06:31, 28 May 2012 (UTC)
Three-Way partitioning strategies (Quick3)
There is no mention of the various ways a quicksort can implement a 3 way partitioning strategy. Examples are: Djikstra, Bentley/McIlroy, and more recently, Dual Pivot, which is officially implemented by JDK7. http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/DualPivotQuicksort.java?av=f
Just figured the maintainer might be interested in mentioning these in this article.
Regards
Why remove the pivot? ("simple" version)
In many languages, that would actually involve copying the remainder of the array. Instead, simply place the pivot value in the "<=" array. — Preceding unsigned comment added by 68.183.23.57 (talk) 16:23, 17 January 2013 (UTC)
- If you happen to chose the largest item of an array as a pivot value, you will end the first patritioning with the empty 'right' part and the 'left' part equal the whole array. That would cause the infinite
looprecursion. Removing the pivot item causes the next level of recursion to process one item less and thus guarantees the recursion to stop (somewhere). --CiaPan (talk) 18:41, 26 February 2013 (UTC)
Inefficient "In-Place" Code?
Why is the In-Place code not coded the way the all the animated examples are? (by iterating from each end and swapping elements larger/smaller than pivot and inserting pivot between them when the iterators meet/pass) This way would use less comparisons and much less swaps in most cases than the current listed code. 66.176.138.89 (talk) 02:23, 14 March 2013 (UTC) T.M.
- I didn't check 'all animated examples' but the first one amused me: it displays animations with two pointers walking from both ends of an array to each other, but it contains partitioning loop corresponding to that in our article:
for i = 2:n, if a[i] < a[1], swap a[++k,i]
CiaPan (talk) 16:54, 14 March 2013 (UTC)
- Yes, that's what I'm speaking of. It is similiar, but walking from both ends and swapping "> pivot" elements on the right with "< pivot" elements on the left seems to half the number of swaps needed in most cases. It is similiar, but just seems more efficient and has always been the way I've coded quicksorts. If you look at the "In Place" code, it's has two iterators going left to right (i and storeIndex) causing unnecessary swaps. I could add the code myself but I wanted to be sure it would be right to change it. 66.176.138.89 (talk) 21:05, 14 March 2013 (UTC) T.M.
- No, it is not worth replacing.
- the simple algorithm used here makes same number of comparisions as that with two iterators walking opposite to one another (each item gets compared with the pivot in both algorithms);
- indeed it makes about twice more swaps on average data: each item less than the pivot gets swapped once, while the double-iterator algorithm swaps each item less than the pivot found by the right iterator and each item greater than the pivot found by the left iterator AND every item equal to the pivot, if the keys are not unique; so simple routine may make up to ~N swaps while complex routine may make up to ~N/2 swaps, the difference is in what is the worst case (choosing the greatest item as a pivot vs. having an array of equal keys);
- but it is much simpler;
- both algorithms may make the worst partitioning possible i.e. 1:(N–1) if we choose the least or the greatest item as a pivot;
- you can skip swapping equal items in the double-iterator algorithm to achieve small improvement on average data, but that leads to worst partitioning and O(n2) time on degenerate case of all keys equal.
- Generally there's no general algorithm, which is best for all possible cases. Every algorithm has its own advantages and its own worst or best cases; for every specific case there may exist a specifically tailored algorithm which will handle the case in best time. In addition every algorithm may be implemented in different manners, in different languages, on different platforms with different environment limitations – and it's always the implementor's responsibility to recognize all conditions (like integer overflows and zero wrapping, memory locality etc).
I think Wikipedia need not give 'best' algorithms, but it should show 'good enough' examples and discuss their main properties. The 'simple' algorithm here is fast enough for most applications, is short enough to be understood without long studying and contains probably minimum possible number of implementing traps. That makes it IMHO the best example of qsort. We should admit it's not the only possible implementation, and we may present the more sophisticatd approach, but I think we should not replace it with more complicated version. The simpler example, the more enlightening. --CiaPan (talk) 00:52, 15 March 2013 (UTC)- I think the problem is one of clarity in the actual article. If someone learning quicksort were to refer to the in-place algorithm in the pseudocode, then look at the animated GIFs that demonstrate quicksort, they'd realize that none of the pseudocode on the page actually matches up with them. I'm all for presenting alternative ways to do it, but it's just confusing this way. Shouldn't the animated pictures show the method described in the pseudocode?
- The pictures are all over the place on the page, too. I think it could use some organization. 74.102.66.11 (talk) 20:03, 19 March 2013 (UTC)
- NO, the animated pictures should not show the method described in the pseudocode! It's the pseudocode, which should describe the method shown on pictures.
The problem is the pseudocode is NOT the quicksort. Quicksort- works in-place on the input array, which can be directly addressed in random order, so its item can be easily swapped, and
- due to the previous, does not actually create or concatenate any temporary 'lists'.
- Describing it in terms of 'lists', 'appending' and 'concatenating' results in an algorithm similar to the merge sort, which obviously is not the Hoare's quicksort. --CiaPan (talk) 06:30, 8 May 2013 (UTC)
- NO, the animated pictures should not show the method described in the pseudocode! It's the pseudocode, which should describe the method shown on pictures.
- No, it is not worth replacing.
An optimization that isn't anymore.
This article repeats the old advice of leaving small subarrays unsorted and then performing a single insertion sort pass at the end. This was correct advice back in the days when CPUs didn't have fast memory caches, but these days it is a disaster. It is much better now to sort the small subarrays immediately they are produced, since they are probably already in the cache. Leaving them until later will mean they have to be fetched again from the main memory chip (assuming we are sorting a very big array). It can make a large difference in practice. I'm not immediately changing the text since I don't have a reference handy, but I know that this is reliably published and the article needs to be fixed. McKay (talk) 04:45, 9 August 2013 (UTC)
- Refs: LaMarca 1997, Cole 2004. The price of missing the cache is greater than when those papers were written, but I can't find a more recent comparison. McKay (talk) 07:30, 9 August 2013 (UTC)
- Why not state both: old practice with k-sorted blocks and single pass and modern cache-conscious calls to insertion sort small blocks. Glrx (talk) 20:38, 9 August 2013 (UTC)
Wrong In-place Version Code
Shouldn't the line in the in-place version pseudocode
storeIndex := left
be
storeIndex := pivotIndex
?
It seems from the graphic that's where the swap occurs, but I'm too reticent to go ahead and make the change myself. 172.249.46.244 (talk) 21:54, 4 December 2013 (UTC)
- No. storeIndex starts at the left and advances only when an element is less than the pivot. Glrx (talk) 00:10, 6 December 2013 (UTC)
Worst case behaviour triggered by [nearly] all-equal input data
If there are many elements with the same value, chances are that one of them will be selected as the pivot. While partitioning, if we put all of them in the same side, we are provoking quicksort's worst case behaviour.
I suppose that the current in-place partition method was inspired by the great "Introduction to Algorithms" (3rd edition). If so, prease see exercises 7.1-2 and 7.2-2, which cover this issue. Note that, in Section 7.4.2 ("Expected running time"), they warn: "We assume throughout that the values of the elements being sorted are distinct."
I see several optinos here:
- Add a big warning indicating the flaw of the current partition method
- Fix the current partition method as exercise 7.1-2 suggests
- Resort back to the good old Hoare partition method, with the additional precaution of leaving the pivot in the center and excluding it in the recursive calls
Quicksort is famous for its speed. Otherwise we would use heapsort or smoothsort. Therefore, I would rather choose the third option, maybe in combination with the first one. The current partition method looks simple, but it makes more swaps and it won't be so simple after we fix it.
I was in a bookstore yesterday, checking the quicksort code in every book on algorithms. Interestingly, most of them had this wrong. This calls for a big warning. Como (talk) 07:59, 5 September 2013 (UTC)
- Agreed, this is one of the most important problems of quicksort but many textbooks and courses simply skip over it. Sedgewick and Bentley both wrote about this. I added a subsection to the article; I'll add the references later. QVVERTYVS (hm?) 20:09, 9 January 2014 (UTC)
Needs reformatting
The image description under section Algorithm overlaps with the simple algorithm. Otherwise a stellar article. OceanicMeerkat (talk) 00:15, 28 January 2014 (UTC)
What does "a.o." mean?
In the "History" section there appears:
- "Quicksort gained widespread adoption, appearing a.o. in Unix as the default library sort function" . . .
What does "a.o." mean? I've been a UNIX hacker for long enough to wonder if somebody misspelled "a.out". (No, because "a.out" wouldn't make any sense in such a usage.)
Accordingly I suggest that "a.o." is overly obscure whatever it was intended to mean.
And I am curious about what was intended. — Preceding unsigned comment added by 134.134.137.71 (talk • contribs)
- "Among others". Actually the grammar may be a bit off: the intended meaning is that quicksort appeared in Unix and other systems, but it seems to say quicksort and other sorting algorithms appeared in Unix. (Also true because
sort(1)
was not a quicksort, IIRC, but not intented.) QVVERTYVS (hm?) 10:02, 13 March 2014 (UTC)
Choice of Pivot - Median
The text says:
..., choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by Sedgewick) [ref]
I've added "Clarification needed" because according to the Median article, the median of three elements is the middle element; therefore, according to the text, there would be no difference between such a Median method and the middle index method. Would someone please expand the article to better explain what that means? --pgimeno (talk) 20:40, 6 April 2014 (UTC)
- The median of [3, 1, 2] is 2. The middle element is 1. QVVERTYVS (hm?) 21:32, 6 April 2014 (UTC)
- Guess I was confusing indexes and values. Thank you, it makes sense now. --pgimeno (talk) 15:50, 7 April 2014 (UTC)