How to adapt two tests so they have the same difficulty level?



I am a student of linguistics working on my thesis experiment and I
have a problem analyzing my data. I hope there's somebody out there
who knows more about this subject than I do and can help me with some
advice.

I designed an experiment testing around 250 subjects in two listening
conditions (X and Y). I used a within-subjects design, so each subject
did two listening tests, one in condition X and one in condition Y. The
conditions, the order of the tests and the test versions (A and B) were
crossed, so there were basically 4 ways of administering the tests, each
subject taking one of them:
1: first test  - test version A - condition X
   second test - test version B - condition Y
2: first test  - test version B - condition Y
   second test - test version A - condition X
3: first test  - test version A - condition Y
   second test - test version B - condition X
4: first test  - test version B - condition X
   second test - test version A - condition Y
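
In case it helps to see the design written out, here is a small Python
sketch of the four ways of administering the tests. The names SCHEMES
and pairing_counts are just labels I made up for this post. The check
only confirms that, on paper, every combination of position, test
version and condition occurs exactly once across the four schemes.

```python
from collections import Counter

# The four administration schemes exactly as listed above:
# scheme number -> ((version, condition) of first test,
#                   (version, condition) of second test)
SCHEMES = {
    1: (("A", "X"), ("B", "Y")),
    2: (("B", "Y"), ("A", "X")),
    3: (("A", "Y"), ("B", "X")),
    4: (("B", "X"), ("A", "Y")),
}


def pairing_counts(schemes):
    """Count how often each (position, version, condition) combination occurs."""
    counts = Counter()
    for first, second in schemes.values():
        counts[("first",) + first] += 1
        counts[("second",) + second] += 1
    return counts


counts = pairing_counts(SCHEMES)
for key in sorted(counts):
    print(key, counts[key])
```

The design is only balanced like this if the same number of subjects
takes each scheme, which is not the case in the subgroups I want to
analyze.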

However, after the tests had been administered I found out that test
versions A and B are not of the same difficulty. Equal difficulty is
essential to my analysis, since I am concerned with finding out whether
there is any difference between condition X and condition Y, not
between the two test versions A and B: they are supposed to test exactly
the same thing. When I analyze different groups of subjects, the number
of subjects taking each of the 4 ways of administering the tests is not
the same, so the difference between test versions A and B will interfere
with my results.
I have to be able to calculate the difference between the scores a
person had for condition X and condition Y and use that to conduct
further statistical testing.

My question: how can I make test version A & B more similar regarding
their level of difficulty after the tests have been administered?

I think I can use z-scores, but I am not sure what implication that has
for the rest of my analyses. Can I still calculate the difference
between condition X and Y using z-scores and use this value as the
basis for further analysis?
Are there any other ways of adapting two tests so they will become more
similar?
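
To make the z-score idea concrete, here is a minimal Python sketch of
what I have in mind. The scores are made up purely for illustration (they
are not my data), and the function names are my own. It standardizes the
raw scores within each test version and then takes the X minus Y
difference per subject. Whether this is statistically sound is exactly
what I am asking.

```python
from statistics import mean, stdev

# Made-up records for illustration only:
# (subject, test version, condition, raw score)
records = [
    ("s1", "A", "X", 14), ("s1", "B", "Y", 9),
    ("s2", "B", "Y", 11), ("s2", "A", "X", 16),
    ("s3", "A", "Y", 12), ("s3", "B", "X", 10),
    ("s4", "B", "X", 8), ("s4", "A", "Y", 15),
]


def standardize_by_version(records):
    """Return {(subject, condition): z}, with z computed within each test version."""
    scores_by_version = {}
    for _, version, _, score in records:
        scores_by_version.setdefault(version, []).append(score)
    # Mean and sample standard deviation of the raw scores per version.
    stats = {v: (mean(s), stdev(s)) for v, s in scores_by_version.items()}
    z = {}
    for subject, version, condition, score in records:
        version_mean, version_sd = stats[version]
        z[(subject, condition)] = (score - version_mean) / version_sd
    return z


def condition_differences(z):
    """Return {subject: z-score in condition X minus z-score in condition Y}."""
    subjects = sorted({subject for subject, _ in z})
    return {s: z[(s, "X")] - z[(s, "Y")] for s in subjects}


z_scores = standardize_by_version(records)
for subject, diff in condition_differences(z_scores).items():
    print(subject, round(diff, 2))
```

One thing I am unsure about: the mean of each version is computed over
subjects in both conditions, so if the conditions are not evenly spread
within a version, part of the condition effect might be removed along
with the difficulty difference.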

It would be great if you could spare the time to help me out. I know some
of the basics of statistics, but it would help me if you could explain
your answer in layman's terms so I can follow it.
Thanks!
-Chrissy
