Re: Help with generating a random matrix with row and column constraint



From: Ruchika Chhetri
: Hi All,
:
: I would greatly appreciate if someone could help me with
: the following problem.
:
: I am trying to generate a 300X5 matrix of uniformly distributed
: random numbers with the following constraints.
:
: 1)All the elements in the matrix should be
: one of the following numbers : 0,1,2,3,4,5.
: 2)The sum of all the elements in each column should be 100
: 3)The sum across the row should be <= 5
: 4)I want to generate all these numbers
: as independent of each other as possible.
:
: I understand that at one stage these numbers will be depending
: on one another to meet the constaint but is it possible
: to simulate the data in such a way that these numbers can be
: independent to a large extent.
:
: I am trying to write a code to generate the data set in Matlab,
: but am struck with the algorithm. I shall be very thankful
: if someone can suggest an algorithm that can be coded in Matlab.
:
: Thankyou,
:
: Best Regards,
:
: Ruchika

Perhaps the simplest method is to generate the columns independently,
and then test whether the row constraint is satisfied.
If the row constraint fails, discard everything and try again.
This will produce samples that are uniform in the combinatoric sense.
Giving preference to combinations that use more of the bigger numbers
will require a more complicated algorithm.

The coefficient of x^100 in (1 + x + x^2 + x^3 + x^4 + x^5)^300
is the number of column possibilities.

To generate a column, since 299 fences partition the 100 marbles
among the 300 bins in (399 choose 100) ways, randomly choose either
a fence or a marble according to the probability calculation:
P(fence) = f/(f+m)
P(marble) = m/(f+m)
where f is the number of unchosen fences,
and m is the number of unchosen marbles.
Afterwards, if any bin has more than 5 marbles in it,
discard the column and try again.
This seems to happen only about 10% of the time,
or about 40% of the time in a set of 5 columns.

Satisfying the row constraint seems to require more than 500 retries
on average, given 5 valid columns for each try.

GWBASIC program:

1 RANDOMIZE TIMER
10 DIM A(299,4),C(5)
20 FOR J=0 TO 4: F=299: M=100
30 IF RND(1)<F/(F+M) THEN F=F-1: GOTO 40 ELSE M=M-1: A(F,J)=A(F,J)+1
40 IF F+M THEN 30
50 NEXT J
60 FOR J=0 TO 4: FOR I=0 TO 299: IF 5<A(I,J) THEN E=E+1: GOTO 130
70 NEXT I: NEXT J: T=T+1
100 FOR I=0 TO 299: R=0: FOR J=0 TO 4: R=R+A(I,J): NEXT J
110 IF R<6 THEN NEXT I: S=S+1: GOSUB 200: IF S>=20 THEN 400
130 FOR I=0 TO 299: FOR J=0 TO 4: A(I,J)=0: NEXT J: NEXT I: GOTO 20
199 END
200 FOR J=0 TO 4
210 FOR I=0 TO 299: A=A(I,J): C(A)=C(A)+1: NEXT I
220 NEXT J
230 PRINT S;T,T/S,E/T;
240 FOR I=0 TO 5: PRINT C(I);: C(I)=0: NEXT I: PRINT
250 RETURN
399 END
400 IF ""=INKEY$ THEN 400
410 PRINT: FOR I=0 TO 299
420 PRINT " ";: FOR J=0 TO 4: PRINT CHR$(48+A(I,J));: NEXT J
430 IF I<>149 THEN 460
440 IF ""=INKEY$ THEN 440
450 PRINT
460 IF 9=I MOD 10 THEN PRINT
470 NEXT I

# OK tries avg. tries >5 in bin 0's 1's 2's 3's 4's 5's

1 390 390 .3461539 1110 298 75 16 1 0
2 395 197.5 .3518987 1101 315 68 15 1 0
3 989 329.6667 .3842265 1121 286 69 20 4 0
4 1436 359 .3983287 1097 327 62 8 5 1
5 1490 298 .3946309 1126 272 80 20 2 0
6 2066 344.3334 .4070668 1120 291 63 22 3 1
7 2682 383.1429 .4011932 1121 286 68 22 3 0
8 3617 452.125 .3995023 1126 276 74 20 4 0
9 4227 469.6667 .3986279 1129 273 73 19 6 0
10 4659 465.9 .3983688 1114 299 62 23 2 0
11 6492 590.1818 .3935613 1098 322 64 14 2 0
12 6611 550.9167 .3947966 1121 284 70 24 1 0
13 6777 521.3077 .3938321 1128 269 81 19 3 0
14 7101 507.2143 .3954373 1138 264 67 23 7 1
15 8791 586.0666 .3971107 1116 290 74 18 2 0
16 9304 581.5 .3948839 1126 281 68 18 6 1
17 9325 548.5295 .3944236 1126 278 73 17 5 1
18 9730 540.5556 .3942446 1124 275 80 19 2 0
19 10554 555.4737 .3925526 1130 266 83 16 5 0
20 11321 566.05 .3930748 1123 280 77 15 4 1

00110 01211 00000 00100 00001 00032 00000 00000 10040 20000
10110 01000 01200 01000 00100 00000 00000 00210 00011 00001
00100 20000 00102 01000 00001 01000 01000 10000 00000 01000
00000 00000 00110 01010 10101 00120 12000 00120 00010 10000
00001 01110 10000 00000 00000 10002 00000 01100 01110 20100
01001 00000 01003 11201 01100 10000 30000 01000 01000 01100
00000 00000 00200 00102 00000 10000 02000 21011 01012 00000
00111 00100 00300 01101 11000 00000 12000 00000 10200 01000
00010 00000 00100 00001 20000 00000 01010 00000 00101 00200
00000 00000 10001 10200 00000 00111 00000 31000 01000 01001
00010 13000 11001 10000 21100 20000 00000 00000 00000 00001
00110 00200 10001 10001 00001 00010 00000 01001 02101 00000
00001 00100 00010 00020 00100 00211 00000 11200 00000 00001
00200 00001 10000 00000 01200 21010 00010 00000 00001 00000
00000 22100 01000 02100 00010 10000 10020 00000 10000 00000
00102 00030 00002 10110 01100 00000 00200 01000 00001 01101
00010 00010 10000 00010 20110 00000 01003 10000 11000 01010
00010 10000 00000 10100 00020 00000 21000 02001 00011 01102
00000 11100 00020 31000 00000 00000 00000 00201 01000 20100
00000 01020 00120 01011 00010 00002 01000 00020 10100 00000
00000 02010 10001 00000 00020 00010 00201 00101 01000 00000
01010 00005 00100 01011 30010 00010 00000 00001 00000 00000
11000 00103 00000 00110 00001 00002 00021 10201 01000 00010
00031 20000 41000 03000 11100 00014 00010 20000 00000 00010
00000 00001 00000 10000 00000 10000 02111 00002 00000 01210
11100 00000 01000 00000 02000 00000 00021 00100 00000 20000
00100 10000 00100 00002 00000 11000 10110 20001 01010 10201
01100 00011 00010 00000 00030 00100 21100 00030 01002 01002
01000 00000 00210 12100 00000 01100 00040 10000 01000 00000
02002 00000 10010 00001 20000 00000 00000 01000 00000 01001

.