Re: Logistic Regression



Thanks to everyone who replied to this thread.

It seems clear that I need to look into this further. I have isolated
one problem that fails to converge; its independent variables are the
frequencies of the 18 most common words and punctuation symbols.
Going through large amounts of debugging output, I found that the
algorithm seems to start fine. Within a reasonably small number of
iterations, it reaches a point where all the 0 rows are assigned a
probability very close to 0 and the 1 rows are assigned about
0.95-0.96. Then it suddenly stops converging and, more or less, goes
crazy. The sum of the absolute values of the updates to the b matrix
stops dropping and fluctuates wildly, and the probabilities, which were
getting quite close to correct, become erratic. The coefficients reach
a magnitude of about 1000 at most before the whole process seems to
break down; after that they become very large (XXXe+10, etc.).
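One possible cause worth ruling out is complete separation: if some combination of the inputs puts every 0 row on one side and every 1 row on the other, the log-likelihood has no finite maximum, so an iterative fit keeps pushing the coefficients outward while the fitted probabilities approach 0 and 1. The sketch below runs plain Newton-Raphson (an assumption; the post doesn't say which update rule the program uses) on a hypothetical four-point separable data set and records the coefficients after each iteration. The function names are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, b):
    # Numerically stable form: sum(y*z - log(1 + exp(z))) with z = X @ b
    z = X @ b
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def newton_history(X, y, n_iter=10):
    """Plain (undamped) Newton-Raphson for logistic regression.
    Returns the coefficient vector after every iteration."""
    b = np.zeros(X.shape[1])
    history = []
    for _ in range(n_iter):
        p = sigmoid(X @ b)
        grad = X.T @ (y - p)                        # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])   # X^T W X with W = diag(p(1-p))
        b = b + np.linalg.solve(hess, grad)         # Newton step
        history.append(b.copy())
    return history

# Hypothetical toy data that is perfectly separable: y = 1 exactly when x > 0.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])  # first column plays the role of x0 = 1

history = newton_history(X, y)
for it, b in enumerate(history, start=1):
    # The slope keeps growing and the log-likelihood creeps toward 0 without reaching it.
    print(it, np.round(b, 3), round(log_likelihood(X, y, b), 8))
```

If the loop ran much longer, the weights p(1 - p) would shrink toward zero (and round to exactly zero for large positive linear predictors in double precision), so X^T W X would become badly ill-conditioned and each Newton step would be dominated by rounding error. That is one plausible route to the erratic updates described above.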

Today I checked the relevant sections of Alpaydin's book (Intro to
Machine Learning). Alpaydin gives different algorithms for fitting a
logistic regression model from the one I used. The book also recommends
initialising the weight matrix to random numbers in the range -0.01 to
0.01.
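For comparison, here is a minimal batch gradient-ascent sketch with the weights initialised uniformly in [-0.01, 0.01], as recommended above. The learning rate `eta`, the iteration count, the seed and the toy data are assumptions for illustration; this is not claimed to be the book's exact pseudocode or the algorithm the program uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, eta=0.1, n_iter=1000, seed=0):
    """Batch gradient ascent on the logistic log-likelihood."""
    rng = np.random.default_rng(seed)
    # Small random starting weights in [-0.01, 0.01), one per column of X.
    w = rng.uniform(-0.01, 0.01, size=X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)            # current fitted probabilities
        w += eta * (X.T @ (y - p))    # move along the log-likelihood gradient
    return w

# Hypothetical toy data whose classes overlap (not separable).
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column, like x0 = 1

w = fit_logistic_gd(X, y)
print(np.round(w, 4))
```

Because the gradient is summed over rows, a suitable `eta` depends on the number of rows and the scale of the inputs; with word frequencies of order 0.01, as in the attached data, the gradient is small and many more iterations (or a larger `eta`) would likely be needed.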

I modified my program to start with random weights between 0 and 1.
Of two trials, the first failed to converge, while the second converged
quickly to a good solution and stopped. So it is possible to fit a
logistic model to that data.
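Since the outcome depended on the random start, it may help to run several starts with an explicit convergence test and compare where they end up. A minimal sketch, assuming Newton-Raphson with step halving (a common safeguard that rejects any step which lowers the log-likelihood; not necessarily what the program does), random starts in [0, 1) as in the trials above, a stopping rule on the sum of the absolute coefficient updates, and hypothetical non-separable toy data. The names `fit_from` and `tol` are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, b):
    # Stable form of sum(y*log(p) + (1-y)*log(1-p)); never produces NaN.
    z = X @ b
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def fit_from(X, y, b0, tol=1e-10, max_iter=100):
    """Newton-Raphson with step halving; stops when sum(|update|) < tol."""
    b = b0.copy()
    for _ in range(max_iter):
        p = sigmoid(X @ b)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        # Halve the step while it would lower the log-likelihood.
        ll_old = log_likelihood(X, y, b)
        while log_likelihood(X, y, b + step) < ll_old and np.sum(np.abs(step)) > tol:
            step /= 2.0
        b = b + step
        if np.sum(np.abs(step)) < tol:
            return b, True
    return b, False

# Hypothetical toy data whose classes overlap, so a finite maximum exists.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

rng = np.random.default_rng(1)
results = [fit_from(X, y, rng.uniform(0.0, 1.0, size=X.shape[1])) for _ in range(5)]
for b, ok in results:
    print(ok, np.round(b, 4))
```

When a finite maximum exists, every start should reach the same coefficients to within the tolerance, as here; starts that disagree, or that never meet the tolerance, are worth investigating, and separation is one candidate.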

I'm not sure what format I should use to send data to people, so I've
attached the X and Y matrices for the problem mentioned above as CSV.

I'll try running this through R's glm() function with family = binomial
(polr(), in the MASS package, is meant for ordered responses). However,
given that I can fit a model, even if not every time, I'd imagine that
R or SPSS would have no difficulty fitting one. I'll also check out the
Classification Society.

Cheers,

Ross-c

x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18
1,0.0720183,0.037871,0.0308037,0.0299751,0.0304528,0.0216796,0.0219038,0.0190281,0.0145538,0.0135887,0.0102062,0.00898767,0.0101477,0.00995272,0.00728177,0.00978701,0.00770093,0.00687235
1,0.074273,0.0328923,0.0305713,0.0331134,0.0316545,0.018303,0.0219282,0.0211103,0.0152525,0.0173635,0.0107983,0.00916255,0.00560363,0.00551521,0.00879782,0.00522785,0.0102236,0.0124673
1,0.0700869,0.0384123,0.0295585,0.0193777,0.0396019,0.0176619,0.0239991,0.0191261,0.0173073,0.0175132,0.0111874,0.00893388,0.00236788,0.00974605,0.00769847,0.00266529,0.00642873,0.0148479
1,0.0759439,0.0404344,0.0306368,0.0258251,0.0333168,0.0154446,0.0227014,0.0187249,0.0155142,0.0138088,0.01025,0.0101369,0.00684783,0.00924065,0.00735249,0.00684783,0.00617783,0.00937117
1,0.0647153,0.035388,0.0301734,0.00170839,0.0362259,0.0379669,0.0222578,0.0190851,0.0171489,0.0137973,0.0112591,0.00940426,0.00770401,0.00792366,0.00792366,0.0088592,0.00752504,0.00975407
1,0.0769764,0.0340408,0.0323392,0.0279275,0.0321253,0.022772,0.0195217,0.0194504,0.0168421,0.0186149,0.0111567,0.00925143,0.00642913,0.0109428,0.00753971,0.0065514,0.00770273,0.0103824
1,0.0779948,0.0347583,0.0354439,0.0319094,0.0364722,0.0159814,0.0213516,0.0183199,0.0146559,0.0141456,0.0112509,0.00982648,0.00960557,0.00561404,0.00774692,0.00882859,0.00785356,0.00652813
1,0.0809327,0.0305512,0.0383111,0.027119,0.0338934,0.0229455,0.0215272,0.0185364,0.0143715,0.0171524,0.0101851,0.00868973,0.0057503,0.00392066,0.00765279,0.00470051,0.00886541,0.0136773
1,0.0805173,0.0273328,0.0318756,0.0341889,0.0309518,0.0251569,0.0211562,0.0189574,0.013758,0.017041,0.00942143,0.0094596,0.00611553,0.00432134,0.00881064,0.00419918,0.00918475,0.0132541
1,0.0667417,0.0303387,0.0310891,0.0308972,0.0292829,0.0183062,0.0230442,0.0189781,0.0159678,0.0196499,0.0105754,0.00838525,0.00749524,0.00221629,0.00893496,0.00655289,0.0110989,0.0121721
1,0.055775,0.0524985,0.0415416,0.0302791,0.0270236,0.0179315,0.0173415,0.0199859,0.0242633,0.00582614,0.0113573,0.0108094,0.0125267,0.0107568,0.00893412,0.0122107,0.00620542,0.00742754
1,0.042524,0.05554,0.0441282,0.0196517,0.0248897,0.0121896,0.0203444,0.0213531,0.0265547,0.00700023,0.011266,0.0138546,0.0150213,0.0131376,0.0100507,0.0159085,0.00398624,0.00606444
1,0.0449953,0.0444433,0.0558275,0.036642,0.0186439,0.0190605,0.0197688,0.0204354,0.0232372,0.0133528,0.0106656,0.0135194,0.0135715,0.0136965,0.0109676,0.0120821,0.0102593,0.00520779
1,0.0476545,0.0499341,0.0498566,0.0295107,0.0244708,0.0167481,0.0206094,0.0202373,0.0240056,0.00945956,0.00956812,0.0137551,0.0174769,0.00983174,0.0111034,0.0159882,0.00834303,0.00114755
1,0.0446411,0.0418778,0.0524321,0.0415785,0.0206197,0.0159212,0.0195024,0.0207893,0.0209489,0.0122302,0.0112525,0.013916,0.0136168,0.0107338,0.0107039,0.012719,0.0114022,0.00682335
1,0.0476168,0.0503946,0.047891,0.0424188,0.022342,0.0233077,0.0194091,0.0179904,0.021579,0.0123513,0.0105272,0.00897732,0.0108252,0.0128162,0.00768974,0.0112544,0.0106226,0.00380314
1,0.0480272,0.0497692,0.0443951,0.0336382,0.0261301,0.0217838,0.0161571,0.0206167,0.0231774,0.0107743,0.0114363,0.0120721,0.0137967,0.0185959,0.00922394,0.0124641,0.00756903,0.00485149
1,0.0507474,0.0440578,0.0546267,0.0425406,0.0192034,0.0220645,0.019295,0.0200485,0.0209547,0.0123814,0.0101006,0.0120454,0.0140411,0.00722926,0.0119639,0.0126767,0.0107624,0.00290189
1,0.0477706,0.0472193,0.0464802,0.0381112,0.0266102,0.0165625,0.0158985,0.0216114,0.0225761,0.0116514,0.00930856,0.0108746,0.0102858,0.0144702,0.00957166,0.0101855,0.0104612,0.00898282
1,0.055857,0.0390281,0.048471,0.0475185,0.0203907,0.0216332,0.021081,0.0196038,0.0200732,0.0151584,0.00993995,0.00967764,0.0128115,0.0104784,0.0125768,0.00813143,0.0141782,0.00706841


y
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
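One check the attached data makes possible is whether any single column already separates the two classes. In the rows above, every y = 0 case has x1 of at least 0.0647 and every y = 1 case has x1 of at most 0.0559, so x1 alone separates them; with perfectly separated classes the maximum-likelihood coefficients are not finite, which would be consistent with the blow-up described earlier. Each row also has 19 columns (including the constant x0 = 1), so the model has 19 coefficients to estimate from 20 rows. A minimal sketch of the check, where `separating_threshold` is a hypothetical helper applied to the x1 column copied from the data:

```python
import numpy as np

def separating_threshold(values, labels):
    """Return a threshold t if this feature alone separates the classes
    (all of one class strictly below t, all of the other above), else None."""
    v0 = values[labels == 0]
    v1 = values[labels == 1]
    if v0.max() < v1.min():
        return (v0.max() + v1.min()) / 2.0
    if v1.max() < v0.min():
        return (v1.max() + v0.min()) / 2.0
    return None

# x1 column copied from the posted rows, in the same order as y.
x1 = np.array([
    0.0720183, 0.074273, 0.0700869, 0.0759439, 0.0647153,
    0.0769764, 0.0779948, 0.0809327, 0.0805173, 0.0667417,
    0.055775, 0.042524, 0.0449953, 0.0476545, 0.0446411,
    0.0476168, 0.0480272, 0.0507474, 0.0477706, 0.055857,
])
y = np.array([0] * 10 + [1] * 10)

print(separating_threshold(x1, y))
```

Running the same function over each column of X, or over simple combinations of columns, is a cheap way to look for separation before fitting.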
