Re: data scaling and validation for learning classifier
- From: Greg Heath <heath@xxxxxxxxxxxxxxxx>
- Date: Thu, 23 Apr 2009 06:48:37 -0700 (PDT)
On Apr 22, 4:23 pm, Tim <timlee...@xxxxxxxxx> wrote:
> Hi,
> I now scale the features before feeding them to the classifier. The
> scales are computed on the training set, stored, and then applied to
> the test set.

OK

> I also tune some parameters of the classifier using cross
> validation.

This requires a separate holdout set that is used neither
for training nor for testing.

> I was wondering which way is proper regarding the order
> between cross validation and data scaling:
> 1. first scale the features in the whole training set,

No.

> then do the
> cross validation. When testing, use the same scales on the training
> set.

Not clear. Separate validation and testing XVALs? Or
validation and testing within one XVAL experiment?

> 2. in each step of cross validation, just before feeding the shrinking

What does "shrinking" mean?

> training set to training, do the scaling, and the specific scales only
> apply to the corresponding validation set. After cross validation, do
> another scaling on the original training set when applying the tuned
> parameter to train on the original training set. These scales will be
> applied to the testing set.

Very unclear.

> I think this question also applies to any kind of preprocessing
> transformation besides scaling.
> Thanks and regards!

f-fold XVAL:

1. Randomly partition the data into f subsets.
2. At each stage:
   a. Combine f-2 subsets for training (i.e., determining
      scale factors and regression coefficients).
   b. Use 1 holdout subset for validation (i.e., tuning model
      topology and learning algorithm parameters).
   c. Use the remaining holdout subset for testing (estimating
      performance parameters).
3. Obtain the summary stats (e.g., min, median, mean, stdv, max)
   of the f performance estimates.

Therefore, at each stage,
1. There is a separate scaling using parameters estimated
from the training subset.
2. There may be multiple validation trials to determine
topology and learning algorithm parameters.
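
The procedure above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a reference implementation: the k-nearest-neighbour classifier, the candidate values of k, the z-score scaling, the synthetic data, and all function names (fit_scaler, apply_scaler, xval, ...) are made up for the example. The one point it is meant to show is that the scale factors are re-estimated from the training subset at every stage and then applied unchanged to the validation and test subsets.

```python
import random
import statistics

def fit_scaler(rows):
    """Per-feature mean and standard deviation, estimated from the given rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant features
    return means, stds

def apply_scaler(rows, scaler):
    """Standardize rows with previously estimated scale factors."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dist = lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=dist)[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)

def accuracy(train_X, train_y, X, y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(X, y))
    return hits / len(y)

def xval(X, y, f=5, ks=(1, 3, 5), seed=0):
    """f-fold XVAL with a validation fold and a test fold at each stage."""
    if f < 3:
        raise ValueError("need f >= 3 for train/validation/test folds")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::f] for i in range(f)]           # step 1: random partition
    scores = []
    for t in range(f):                              # step 2: one stage per test fold
        v = (t + 1) % f                             # assumed choice among the f-1 candidates
        trn = [i for j in range(f) if j not in (t, v) for i in folds[j]]
        scaler = fit_scaler([X[i] for i in trn])    # scale factors from training subset only
        Xtr, ytr = apply_scaler([X[i] for i in trn], scaler), [y[i] for i in trn]
        Xv, yv = apply_scaler([X[i] for i in folds[v]], scaler), [y[i] for i in folds[v]]
        Xt, yt = apply_scaler([X[i] for i in folds[t]], scaler), [y[i] for i in folds[t]]
        best_k = max(ks, key=lambda k: accuracy(Xtr, ytr, Xv, yv, k))  # tune on validation
        scores.append(accuracy(Xtr, ytr, Xt, yt, best_k))              # estimate on test
    return scores

# Synthetic two-class data with features on very different scales (illustrative only).
rng = random.Random(1)
X = [[rng.gauss(2 * c, 1), rng.gauss(2000 * c, 1000)] for c in (0, 1) for _ in range(30)]
y = [c for c in (0, 1) for _ in range(30)]

scores = xval(X, y, f=5)                            # step 3: summary stats
print(min(scores), statistics.median(scores), statistics.fmean(scores),
      statistics.stdev(scores), max(scores))
```
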
Notice that, for each of the f test subsets, there are
f-1 ways to choose a validation subset. Some experimenters
just make sure that the f pair selections are unique. Others
use all f*(f-1) pair selections to try to obtain more precision.
I haven't seen any comparisons of the two techniques.
However, Warren Sarle has suggested averaging over
M separate repartitioned f-fold XVAL experiments with
f unique val/tst combinations instead of using f*(f-1)
combinations in one XVAL experiment.
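
To make the counting concrete, here is a small Python sketch of the two pairing schemes and of the repartitioned alternative. The function names and the cyclic rule for picking the f unique pairs are assumptions made for illustration; any rule that yields f distinct (test, validation) pairs would fit the description above.

```python
import random
from itertools import permutations

def unique_pairs(f):
    """f (test, validation) fold pairs; a cyclic pairing is assumed here."""
    return [(t, (t + 1) % f) for t in range(f)]

def all_pairs(f):
    """All f*(f-1) ordered (test, validation) fold pairs with test != validation."""
    return list(permutations(range(f), 2))

def repartitioned_splits(n, f, M, seed=0):
    """Yield (repeat, test_idx, val_idx, train_idx) for M fresh random
    partitions of n samples, using f unique val/tst pairs per partition."""
    rng = random.Random(seed)
    for m in range(M):
        idx = list(range(n))
        rng.shuffle(idx)                        # new random partition for each repeat
        folds = [idx[i::f] for i in range(f)]
        for t, v in unique_pairs(f):
            train = [i for j in range(f) if j not in (t, v) for i in folds[j]]
            yield m, folds[t], folds[v], train

print(len(unique_pairs(5)), len(all_pairs(5)))            # f versus f*(f-1) pairs
print(sum(1 for _ in repartitioned_splits(20, 5, M=4)))   # M*f stages in total
```

With M = f-1 repeats, the repartitioned scheme runs the same number of stages as the f*(f-1) scheme, but each repeat draws a different random partition.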
Hope this helps.
Greg