Wikipedia:Reference desk/Archives/Computing/2022 March 24

Computing desk
< March 23 << Feb | March | Apr >> March 25 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


March 24

edit

Getting from the choose function to a combination

edit

I'm building a combinatorical Monte Carlo simulation and I'm trying to think of a neat way of mapping 1, 2, 3, ...,   to (1,2, ..., k), (1, 2, ..., k+1), ....., (n-k+1, n-k+2, ..., n) or some other logical ordering. Doing a single RNG call for 1 to   and reading out the corresponding combination would be faster than the way I'm doing it now. Does such a thing exist? I sort of remember reading about it a while ago. 93.136.96.43 (talk) 02:39, 24 March 2022 (UTC)[reply]

You want a random subset of 1, 2, 3, ...,  , right? Bubba73 You talkin' to me? 03:00, 24 March 2022 (UTC)[reply]
I want to choose k different random numbers from 1, 2, 3, ..., n in the fastest way possible. If I could do that by selecting one random number between 1 and "n choose k", that seems like a reasonable candidate. 93.136.96.43 (talk) 04:32, 24 March 2022 (UTC)[reply]
It is likely that this method will give some duplicates. Is that OK? Bubba73 You talkin' to me? 05:54, 24 March 2022 (UTC)[reply]

If you want to disallow duplicates, what you want is a random subset. See https://stackoverflow.com/questions/136474/best-way-to-pick-a-random-subset-from-a-collection Bubba73 You talkin' to me? 06:07, 24 March 2022 (UTC)[reply]

Not all solutions presented at stackoverflow pick the subset with a uniform probability of   If you start with an array of length   containing   different values in any order, the following algorithm will reorder them to produce a random subset of size   in the first   elements of the array. By repeating this, each next execution will produce another random pick. The following pseudo code assumes the elements are stored in a[0], a[1], ..., a[n−1]. It uses a function rindex(i) which returns a random index in the range 0 to i−1, and a void function swap whose meaning is hopefully obvious.
for i = 0, ..., k−1:
r = rindex(n-i) + i
if ri:
swap(a[i], a[r])
 --Lambiam 11:29, 24 March 2022 (UTC); correction applied 16:59, 24 March 2022 (UTC)[reply]
I think OP is asking about enumerating k-combinations as described in Combinatorial_number_system#Applications. The most efficient or fastest way to do this may be different depending on whether you want to traverse all the k-combinations in order (in which case you can make a very fast iterator) or only visit a few with random access (in which case that solution won't work for you). --Amble (talk) 19:37, 24 March 2022 (UTC)[reply]
If you need to visit only a few with random access, you should be able to do something based on  . For example:
function i2subset(n,k,i)
ofs = choose(n-1,k)
if i > ofs
return union({n},i2subset(n-1,k-1,i-ofs))
else
return i2subset(n-1,k,i)
You will of course need to check for k=n, k=0, k>n, etc. and you might want to unroll the recursion. --Amble (talk) 19:55, 24 March 2022 (UTC)[reply]
I don't think that he wants to enumerate all combinations - he wants to select a certain number of random ones. Combinatorial Algorithms by Nijenhaus and Wilf has a Fortran routine for generating random subsets. Many years ago I translated it into Pascal, that I have used, but it is ugly - it uses GOTO statements (from the direct translation). This must be in Knuth and others. Bubba73 You talkin' to me? 20:10, 24 March 2022 (UTC)[reply]
I don't think OP wants to list all possible combinations, but he does want the bijection from the first   integers to the set of combinations, as described in Combination#Enumerating_k-combinations. --Amble (talk) 20:28, 24 March 2022 (UTC)[reply]
I agree. A random subset will do what I think the OP wants. This is the NIST page on getting a random subset. And here is Knuth's algorithm in Rosetta Code. Bubba73 You talkin' to me? 23:44, 24 March 2022 (UTC)[reply]
Knuth provided a proof that his shuffling algorithm is fair. It is also O(n). So, his algorithm for this is simple. Shuffle. Then get the first k items. You have a random subset in O(n) time. 97.82.165.112 (talk) 13:56, 25 March 2022 (UTC)[reply]
The one I gave, if repeated, is O(k) per run.  --Lambiam 00:13, 26 March 2022 (UTC)[reply]
I agree, but I don't agree with the use of "+i" when fetching the random index. It should be fair. For every index, 0 through k-1, the swap should be 0 through n-1. What you did is specifically discussed by Knuth in TAoCP and shown to make the shuffle unfair. Many people (not just you) implement his shuffle as you did, which is why he discusses it and shows a mathematical proof. He does not cover shortcutting the shuffle after k iterations. In a full shuffle, each index (0 to k) has a chance of being swapped with another index n times. You only let them swap a maximum of k times (if you didn't offset the random index). I know that we are talking about very minor discrepencies. I am only saying that I would not offset the random index by i. It is more calculation (subtraction and addition) and it doesn't make the result as fair, though by a negligible degree. Personally, I wouldn't check for i not equal to r either if the langage has a fast swap. The control structure overhead could be more expensive than swapping. But, that is language dependent. 97.82.165.112 (talk) 19:10, 26 March 2022 (UTC)[reply]
We need to draw a sequence of indices of length  . If each of the   rounds draws a random index in the range  , there are   possible sequences. Each sequence uniquely determines a subset, but some subsets are obtained by more than one sequence. Since all   subsets are to have the same probability, the value   needs to evenly divide   But if   we have   while   and   It follows that not all indices can be drawn from the same range. (This was precisely why I warned above that not all solutions presented at stackoverflow pick the subset with a uniform probability; they fell into the same trap.)  --Lambiam 00:52, 27 March 2022 (UTC)[reply]
Perhaps the OP can clarify whether the unbiased random subset meets their needs, or whether they really need a recipe the bijection. --Amble (talk) 20:09, 25 March 2022 (UTC)[reply]
It’s worth noting that the bijection between integers and combinations may not be a good way to generate random combinations. The original set does not have to get very large before   will overflow commonly supported integer data types like uint64. For example, n=100 and k=50 will need ~92 bits. To support even moderately large sets you’ll need an arbitrary-width integer library. On the other hand, an algorithm like Lambiam’s will very easily handle even much larger sets. —Amble (talk) 17:06, 26 March 2022 (UTC)[reply]
In case anyone is interested, here is an (informal but intuitive) proof that the algorithm I presented above is correct. We start with an algorithm in set-theoretic notation whose correctness is hopefully obvious. Given is a set   of size   and the task is to form a random subset   of size   while replacing   by   Then this will do:
 
 
 
 
To implement this, we represent the pair   and   by an integer i and an array a, where i    a[i] a[n−1]  and  a[0] a[i−1]  So the first i elements of the array represent  , while the remainder form   Since the elements of   are initially stored in array a, setting i equal to zero sets   to   and establishes the representation invariant. Picking a random element of   means, by the representation, picking one of  a[i] a[n−1]  at random, which is achieving by picking a random index r from the set  i n−1  The swap of a[r] and a[i], together with the subsequent incrementing of i, then implements the transfer of the chosen element from   to    --Lambiam 09:25, 29 March 2022 (UTC)[reply]
I have not read previous responses so far, but there's this: Wikipedia:Reference desk/Archives/Mathematics/2016 June 19#to index a permutation — a combination is equivalent to a permutation of 000...000111...111, but rather than return a list of bits you'd modify this algo to return a list of where the 1s land. —Tamfang (talk) 02:12, 31 March 2022 (UTC)[reply]