Re: ping Jörg: on what 'raw data' is



On Fri, 11 Dec 2009 17:18:49 -0800, Jon Kirwan
<jonk@xxxxxxxxxxxxxxxxxxx> wrote:

On Fri, 11 Dec 2009 16:05:52 -0800, Joerg <invalid@xxxxxxxxxxxxxxx>
wrote:

Jon Kirwan wrote:
Hi. I just came across (for entirely different reasons) two articles
by someone I very slightly know, Bob Grumbine. His blogs neatly
address the discussion you and I had regarding raw data.

This one nicely discusses _some_ of the problems:
http://moregrumbinescience.blogspot.com/2009/11/data-set-reproducibility.html

As does this one:
http://moregrumbinescience.blogspot.com/2009/11/where-is-surface.html

On both links, a poster, EliRabett, makes some interesting comments.

On the first, EliRabett writes, "Any auditor who demanded all of the
data would be fired summarily. A good audit provides reasonable
certainty that the records are in good shape without tying up the firm
forever. McIntyre's 'audit' demands all of the records at the start.
The purpose is to burden the scientist. He then yells and screams
about every little dot and jot almost 99% of which is due to his not
understanding what was being done. At the end maybe one or two points
remain. October's Briffa Fest is an excellent example."

On the second, EliRabett writes, "What the surface of a liquid is, is
by itself an interesting question. With the exception of liquids that
have vanishing vapor pressure such as gallium and mercury (at room
temperature), it is hard to say, as the molecules in the vapor can
interact with the molecules in the bulk over several atomic distances.
Ron Shen did a lot of early work on this with a very imaginative
technique. Since the molecules on the surface are in an anisotropic
environment, only they can participate in non-linear sum frequency
generation." And he then goes on to add an interesting comment from a
paper that I won't quote here. But if you don't see the relevance of
EliRabett's comment to the blog written by Bob, then you missed
understanding some of the difficulties that face instrumentation
designers and those scientists who must then also interpret their raw
data.

Regardless of your view, at the very least I hope you find these two
blogs interesting to read.

They are interesting. However, I don't think Eli has ever witnessed the
audit of a business. I have, many times. The auditors used me at times
to translate stuff for them because I spoke Dutch and they didn't on
days when the South African auditor was not there. I was amazed at how
fast and how completely they crunched through tons of data.

I'm not going to argue that. I have no real knowledge here. I wasn't
thinking so much about the details of his being wrong about some other
specialty, as about the general thrust. Earlier, I talked about the
fact that scientists replicate, not by duplicating the steps of
another, but instead by being creative about another cross-cutting
approach that attacks a similar problem. His comments, though they
don't specifically say so, are really addressed in that vein.

Furthermore, do we know McIntyre demanded _all_ the data at once? Many
other sources say very different things. Just one example:

http://jennifermarohasy.com/blog/2009/08/raw-temperature-data-no-longer-available/

Again, I was more focused on having you read Bob's article. I
selected Eli's, only as a segue to the article.

Quote: "... Steve McIntyre was denied access to specific data files at
the Climate Research Unit ...". Specific does not mean all.

I wouldn't know.

Secondly, the assertion that they do that to "burden the scientist" is
not a proper fact statement, it is a clear case of judgement and the
writer had better back that up. I don't see where he did. That lessens
my interest in a certain piece of writing rather dramatically because in
my eyes it makes it lose credibility, whether it's an answer to a blog
or whatever.

Well, we are talking at cross-purposes.

Now tell me what you think about what Bob had to say in those articles
rather than focusing upon Eli. You make me sorry I even brought him
up.

We had earlier discussed the concept of 'raw data.' Bob talks a
little about that subject. What did you think?

Just drop Eli into a bucket for all I care.

Jon

Hi Jon,
Hope you don't mind my stepping in.

There was a lot of good information in that blog, much of it more
telling than you might think because of the assumptions behind it.

First, there were all the data sets, many stored in a binary format
that was probably unique to this researcher. He worried about being
able to use that data later, as he could easily forget how to
translate it to a usable format if he lost or used the incorrect data
translator. The data was represented by the blogger as already
'evaluated' to throw out outliers and probably bad data. Whatever
original observations they were based on didn't seem to be kept. This
concerns me, as that means we are never dealing with 'reality' (i.e.
the actual measurements) but instead we are working on a representation
of the representation. Every data set is actually a filtered data set,
with the filters being assumed to be correct. Those filters should be
the first thing documented, and should be very reproducible.
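
One way to picture "document the filter first", as a minimal sketch only:
keep the original observations untouched and store the filter's name and
parameters next to the filtered result, so the step can be re-run and
checked. The function name filter_outliers, the median-absolute-deviation
rule, the threshold, and the sample readings are all illustrative
assumptions, not anything taken from the blog.

```python
import json
import statistics


def filter_outliers(raw, threshold=5.0):
    """Return (kept, record) without modifying the raw observations.

    Hypothetical rule: drop values more than `threshold` median absolute
    deviations (MAD) away from the median.
    """
    median = statistics.median(raw)
    mad = statistics.median(abs(x - median) for x in raw)
    if mad == 0:
        # All deviations are zero for at least half the data; keep everything.
        kept = list(raw)
    else:
        kept = [x for x in raw if abs(x - median) / mad <= threshold]
    # The record describes exactly which filter produced `kept`.
    record = {
        "filter": "median_absolute_deviation",
        "threshold": threshold,
        "n_raw": len(raw),
        "n_kept": len(kept),
    }
    return kept, record


raw_observations = [14.1, 14.3, 13.9, 14.0, 99.9, 14.2]  # made-up readings
filtered, filter_record = filter_outliers(raw_observations)
print(json.dumps(filter_record, sort_keys=True))
```

With the raw list and the record both kept, anyone can rerun the filter
with a different threshold and see what changes.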

Next, there were all his concerns about platform. Now in software
design, this is a common problem, and many of the comments were about
how these problems have already been solved in professional software
circles. My concerns here were that the algorithms seemed so platform
sensitive, requiring exact builds to get the same results. If your
algorithms are that platform sensitive, it indicates serious stability
problems in those underlying algorithms.
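
A small sketch of one way such sensitivity can arise (an assumption on my
part; the blog does not say this is the cause): floating-point addition
is not associative, so a different compiler or build that reorders a sum
can change the answer. The values below are contrived to make the effect
obvious. The standard library's math.fsum returns a correctly rounded
sum, so it does not depend on the order of the inputs.

```python
import math


def naive_sum(values):
    """Add values left to right in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]  # same numbers, different order

# The naive loop loses the 1.0 in order_a, because 1e16 + 1.0 rounds
# back to 1e16 in double precision.
print(naive_sum(order_a), naive_sum(order_b))

# fsum tracks the lost low-order bits and gives the same result either way.
print(math.fsum(order_a), math.fsum(order_b))
```

The naive loop gives 0.0 for one ordering and 1.0 for the other, while
fsum gives 1.0 for both. An algorithm whose results survive that kind of
reordering is the more stable one.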

Finally, on auditing, an auditor needs access to ANY data he desires,
and it is a huge red flag if some is not available. Now, a business
auditor doesn't review every single data point, but does review a very
large and carefully sampled set of them. When there are parts not
available, it becomes pretty much impossible to verify anything.
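
As a rough sketch of the sampling idea, assuming simple random sampling
with a fixed seed (real audits may well use stratified or risk-weighted
schemes): draw a reproducible sample of record IDs and report which of
the sampled records cannot be produced. The function audit_sample, the
record IDs, and the seed are hypothetical.

```python
import random


def audit_sample(record_ids, available, sample_size, seed):
    """Draw a reproducible random sample and list sampled records that are missing."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn later
    sample = rng.sample(sorted(record_ids), sample_size)
    missing = [rid for rid in sample if rid not in available]
    return sample, missing


all_ids = list(range(1, 101))          # made-up record IDs 1..100
on_file = set(all_ids) - {17, 42}      # two records are not available
sample, missing = audit_sample(all_ids, on_file, sample_size=20, seed=2009)
print(len(sample), missing)
```

Any ID that lands in the missing list is exactly the red flag described
above: the auditor asked for it and it could not be produced.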

Charlie