Re: Cross-validation & ridge regression R-programming
From: Michael LT (ewtc82_at_yahoo.com)
Date: 02/07/05
- Next message: KT: "Taking the average of an average"
- Previous message: Alkan Sousal: "eigenvalue of expectation vs expectation of eigenvalue"
- In reply to: Marc Schwartz: "Re: Cross-validation & ridge regression R-programming"
- Next in thread: Marc Schwartz: "Re: Cross-validation & ridge regression R-programming"
- Reply: Marc Schwartz: "Re: Cross-validation & ridge regression R-programming"
- Messages sorted by: [ date ] [ thread ]
Date: Mon, 7 Feb 2005 13:46:29 +0000 (UTC)
Thanks, Marc. I really appreciate your guidance in this discussion.
No offence, but I really think this forum has helped me
tremendously to learn from other people's thoughts.
Anyway, I've tried all the functions you suggested. May I ask,
if you don't mind, whether there is any special package or command I
have to use to plot the prediction test error versus the degrees of
freedom for the test?
Thanks a bunch.
On Sun, 06 Feb 2005 20:52:08 GMT, Marc Schwartz wrote:
>Michael LT wrote:
>> Thanks Paul for your reply. Do appreciate it.
>>
>> Anyway,
>> I received a mail stating the unhappiness of me trying to put up an
>> assignment question for help.
>
>For the record, that e-mail came from me, as a query for clarification.
>I did so, largely because of the involvement of R in the query.
>
>As I noted privately, this forum is not for the posting of homework
>problems. Both the nature of your query and the fact that it was posted
>from The Math Forum (as noted in your post headers) provided some clues.
>
>In the case of TMF, the increasing traffic emanating from there has
>become an issue here and it has been communicated back to the folks at
>TMF. They have indicated that they will be making some changes sometime
>this Spring to the organization of their site to reduce such traffic.
>
>I am not an academician, but there are many others here who are, and
>who would be more reluctant to respond.
>
>Hence, I also pointed out to you that, besides it being a weekend,
>this is a likely reason for the lack of direct replies.
>
>> However, to clear up some doubts about
>> this, I was just quite unsure and confused about some concepts that
>> are required for the use of cross-validation and ridge-regression.
>> Thus, I asked for the MOTIVATIONS towards solving those problems.
>
>As was noted, the motivation is that presumably your professor wishes
>you to learn about methods to deal with collinearity in regression
>models (the use of ridge regression being one way to deal with it) and
>how to validate regression models using one (k-fold CV) of several
>techniques to do so.
>
>> On
>> the R-programming part, I just needed someone to verify if I am on the
>> right track, and if I'm not, to point out the mistakes and I'll
>> rethink the next step.
>
>As I also mentioned in my e-mail, I referred you to the r-help e-mail
>list, which is the primary forum for R specific assistance. This forum
>is for non-software specific queries.
>
>While there are R users who read and post here (me being one of them),
>you will avail yourself of more focused and timely support by utilizing
>R specific resources.
>
>Just be acutely aware that on the r-help list, there will be a similar
>hesitancy to respond to homework related queries, so you may wish to
>consider how you post your questions, should you decide to do so.
>
>> This post is by no means a way for me to get an answer from anyone.
>> It's just to clarify doubts that I may have about these particular
>> problems. After all, this is a forum for us to share whatever knowledge
>> we have.
>> I think this is a non-issue, cos I am here in this forum to learn from
>> other people's advice. I don't believe in having an easy way out, but
>> I do believe that someone out there can help me clarify some doubts
>> that I may have. So, please try to understand the situation right
>> here. I'm not here to get answers for assignments, I'm here to learn
>> of the motivation towards solving a problem. Hope it clears the air
>
>Well, I think that the area of motivation has been covered, but it
>seems strange to me that your professor didn't provide it within the
>context of whatever class it is you are taking.
>
>I'll provide some pointers on R, and will leave it to you to proceed
>from there. I'll also point you to both the documentation provided by R
>Core, including An Introduction to R, as well as the contributed
>documentation by users, available from the main R site under
>"Documentation". There are also search engine resources there to enable
>you to search the e-mail list archives, which will also be helpful.
>
>First, with respect to randomly subsetting the dataset, see the sample()
>function (in the base R installation) which would allow you to extract
>training and test observations. The function has an option to define
>whether you want the sampling to be done with or without replacement.
>
>See ?sample for more information and examples.
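A minimal sketch of the random split described above, using only base R. The data frame `dat`, its column names, and the 70/30 proportion are assumptions made for illustration; they are not taken from the original question.

```r
# Hypothetical data: 100 rows, a response y and two predictors.
set.seed(42)                        # fix the seed so the split is reproducible
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

n        <- nrow(dat)
n_train  <- round(0.7 * n)          # assumed 70/30 train/test split

# Draw row indices without replacement, so no row is picked twice.
train_id <- sample(n, size = n_train, replace = FALSE)

train <- dat[train_id, ]            # rows used to fit the model
test  <- dat[-train_id, ]           # remaining rows, held out for testing
```

Setting `replace = TRUE` instead would draw a bootstrap-style sample in which some rows can appear more than once.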
>
>While I do not engage in the use of ridge regression (just don't have
>the need), my brief read of the lm.ridge function is that it seems to be
>primarily used for the selection of 'lambda' by using the plot() and
>select() functions, which are shown in the examples in ?lm.ridge.
>
>lm.ridge, however, does not appear to provide any model diagnostics or
>prediction methods which would be required to take you to the next step
>of model validation.
>
>So, one option would be to create your own function to generate the
>predicted values of your dependent variable based upon the coefficients
>returned from lm.ridge, once you have determined the value(s) of lambda
>to use.
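One way the "write your own prediction function" option might look is sketched below. The simulated data, the lambda grid, the choice of the GCV minimiser, and the helper name `predict_ridge` are all assumptions for illustration. The sketch relies on `coef()` for an `lm.ridge` fit returning coefficients on the original scale with the intercept first.

```r
library(MASS)   # lm.ridge() is in MASS, which ships with R

set.seed(1)
# Hypothetical data with two strongly correlated predictors.
n   <- 60
x1  <- rnorm(n)
x2  <- x1 + rnorm(n, sd = 0.1)
dat <- data.frame(y = 1 + 2 * x1 - x2 + rnorm(n), x1 = x1, x2 = x2)
train <- dat[1:40, ]
test  <- dat[41:60, ]

# Fit over an assumed grid of lambda values; pick the one minimising GCV.
lambdas <- seq(0, 10, by = 0.1)
fit     <- lm.ridge(y ~ x1 + x2, data = train, lambda = lambdas)
best    <- which.min(fit$GCV)

# With several lambdas, coef() gives one row per lambda (intercept first).
b <- coef(fit)[best, ]

# Hypothetical helper: prediction = [1, X] %*% coefficients.
predict_ridge <- function(b, newdata, vars) {
  drop(cbind(1, as.matrix(newdata[, vars])) %*% b)
}

pred     <- predict_ridge(b, test, c("x1", "x2"))
test_mse <- mean((test$y - pred)^2)   # prediction error on held-out rows
```

A quick sanity check on such a helper is that the `lambda = 0` row reproduces the ordinary `lm()` predictions.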
>
>Another option would be to use the ols() function, which is in Frank
>Harrell's 'Design' package and is available for downloading and
>installation from CRAN. The ols() function has an argument called
>'penalty' which would enable you to specify the lambda value from
>lm.ridge as an argument. ols() provides more complete information with
>respect to model fit and also has a predict method, which is important
>for the k-fold CV part of the process.
>
>For the k-fold CV part of the process itself, you have at least two
>options (there are others available for differing types of models).
>
>The first would be the crossval() function, which is in the 'bootstrap'
>package on CRAN. bootstrap provides R ports of Robert Tibshirani's
>original S functions, the port having been done by Fritz Leisch and now
>being maintained by Kjetil Halvorsen.
>
>The second option would be the errorest() function, which is in the
>'ipred' package, also on CRAN. ipred provides a variety of functions by
>Andrea Peters and Torsten Hothorn for various modeling, prediction and
>validation methods.
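For readers who want to see what k-fold CV does before reaching for either package, here is a hand-rolled sketch in base R plus MASS. It does not use crossval() or errorest(). The simulated data, `k = 5`, and the fixed `lambda = 1` are assumptions for illustration only.

```r
library(MASS)   # for lm.ridge()

set.seed(2)
# Hypothetical data with two correlated predictors.
n   <- 100
x1  <- rnorm(n)
x2  <- x1 + rnorm(n, sd = 0.1)
dat <- data.frame(y = 1 + 2 * x1 - x2 + rnorm(n), x1 = x1, x2 = x2)

k      <- 5     # assumed number of folds
lambda <- 1     # assumed to have been chosen beforehand

# Randomly assign each row to one of k folds of equal size.
fold <- sample(rep(1:k, length.out = n))

cv_mse <- numeric(k)
for (i in 1:k) {
  train <- dat[fold != i, ]           # fit on the other k - 1 folds
  test  <- dat[fold == i, ]           # evaluate on the held-out fold
  fit   <- lm.ridge(y ~ x1 + x2, data = train, lambda = lambda)
  b     <- coef(fit)                  # single lambda: vector, intercept first
  pred  <- drop(cbind(1, as.matrix(test[, c("x1", "x2")])) %*% b)
  cv_mse[i] <- mean((test$y - pred)^2)
}

cv_error <- mean(cv_mse)              # k-fold estimate of prediction error
```

Repeating the loop over a grid of lambda values would give one CV error per lambda, which is the quantity usually plotted when comparing penalties.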
>
>Of the two, I would probably point you towards the second.
>
>HTH,
>
>Marc Schwartz