Re: Big Table - Google's offer in large scale data handling
- From: "zzbunker@xxxxxxxxxxxx" <zzbunker@xxxxxxxxxxxx>
- Date: Fri, 13 Feb 2009 20:09:54 -0800 (PST)
On Feb 2, 3:55 pm, Ian Parker <ianpark...@xxxxxxxxx> wrote:
http://209.85.229.132/search?q=cache:zEsK8QInfUYJ:labs.google.com/pap....
This is a rare find. I have decided to make it a separate posting
rather than simply putting it in highlights.
The British Government has budgeted to spend £18bn on IT systems in
the next few years. What if large databases could be purchased off the
shelf? Big Table is an Internet-wide database that we can all purchase
from Google. As it stands, you approach Google, buy the data
storage and the number of virtual servers you need, and they will do
the rest. The problem with other commercial database systems is that
they are not infinitely scalable. This one is. Big Table has been used
for such things as Google Earth.
At the moment the database has just 3 keys. It has a string (the row),
which is the main key. It has a column, which Google recommends you use
to describe the type of data you are storing, and it has a
timestamp.
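As a rough illustration of the three-key lookup described above, here is a minimal in-memory sketch in Python. It is not Google's implementation or API; the class name TinyTable, its methods, and the example row and column names are all hypothetical, chosen only to show a value being addressed by (row, column, timestamp).

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory map: (row, column, timestamp) -> value. Hypothetical."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        # Keep versions ordered newest first so reads can stop early.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, at=None):
        """Return the newest value, or the newest one with timestamp <= at."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if at is None or ts <= at:
                return value
        return None


table = TinyTable()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 5, "<html>v2</html>")
latest = table.get("com.example/index", "contents:html")
as_of_3 = table.get("com.example/index", "contents:html", at=3)
```

Here the column name carries the "type of data" (page contents), and several timestamped versions of the same cell sit side by side.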
This database can be used to store any type of information. In fact
any database is describable in terms of this simple retrieval system,
although Google is planning to give us a database similar to such
things as Oracle. The only difference is that with Oracle you are
responsible for the hardware. With Google you simply buy terabytes and
virtual servers. Google are responsible for the functioning of all
real hardware, the backing up of data etc. All you have to do is
buy a terabyte and start entering data. You can get a second terabyte
seamlessly when that one fills up.
Are there political and ethical questions? Yes and no. More complete
data can be collected about any individual. In one sense the ethical
question concerns the fact that Google works and is light years ahead
of any government system. How much data should be collected about us?
Remember that Google provides a timestamp. If we are dealing (say)
with criminal convictions, the timestamp can be used to force erasure,
non-access etc. The expiration of convictions will simply be more
transparent. No real ethical question there beyond collecting
information at all. We need to decide politically how data is to be
used.
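The timestamp-driven expiry mentioned above could look something like the sketch below. It assumes a retention rule is applied on top of the stored timestamps. The function name drop_expired, the year-based timestamps, and the 10-year period are hypothetical illustrations, not real legal rules or a Google API.

```python
def drop_expired(cells, now, max_age):
    """Keep only (timestamp, value) cells no older than max_age at time now."""
    cutoff = now - max_age
    return [(ts, value) for ts, value in cells if ts >= cutoff]


# Hypothetical records: (year recorded, description).
convictions = [(1990, "conviction A"), (2005, "conviction B")]

# With an assumed 10-year retention period, the 1990 entry is no longer visible.
visible = drop_expired(convictions, now=2009, max_age=10)
```

The political decision is then reduced to choosing max_age for each kind of record; the mechanism itself is trivial.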
The second political question concerns the position of Google. Google
is a private enterprise company that is increasingly acquiring a
monopoly position. Sure, the Google board (at least so far) have not
abused their position. If Google has this technology which no one else
possesses, it is a monopoly. Governments, as we have seen, are light
years behind. Clearly, if Google is to be a monopoly supplier of this
technology, regulations have to be in place to ensure competition.
There are two possible approaches.
A1) Force Google to sell its technology to the Government (say) and
encourage other people to set up large databases. I do not consider
this to be a very good approach as it would fragment effort. It would
mean breaking up the extremely powerful Google research team, which is
in fact the goose that is laying golden eggs.
There is no reason for that, since the government invented that
database technology, not Google. Which is also why the people
with economic brains invented Optical Computers, Parallel Processing,
XML, USB, HDTV, CD, DVD, Post Ford Batteries, and Post McDonald's
Holograms, rather than the Government, IBM, Intel, or Google.
A2) Impose price controls on Google's database technology. Force them
to sell space to competitors in such areas as machine translation and
search engine technology. This I would regard as a far better
approach.
I have the greatest respect for Larry Page, Sergey Brin and the person
who has just married into the company. She wore a white bikini when
she got married, and she is interested in DNA and Biotech. I am
NOT accusing them of having done anything wrong, unlike Microsoft, who
have used many unfair business practices. What I am saying, though, is
that the capitalist system demands competition. The trio all seem to
have a touch of idealism about them. That idealism cannot, however, be
binding on their successors, who could behave just like Microsoft.
The same competitive rules hold with specific products. Google need to
be rewarded fairly for what they have done for Biotech. However, their
databases should be opened up via APIs to all comers. Perhaps the
rewards should come through the application of some sort of patent
system. Legally holders of patents are obliged to make them public. A
patent is invalid if it is simply sat upon. This should apply both to
their dictionaries and statistics in language translation and to DNA.
A royalty to Google for cancer drugs would be in order (assuming the
DNA database was a part of the discovery), but keeping the information
in house would not be.
http://cordis.europa.eu/ictresults/index.cfm/section/news/tpl/article...
Cancer is to a large extent due to genes, which is why I have
specifically mentioned it.
In terms of translation I have been looking at ways to translate
Arabic - English myself. I feel that basically Google Translate is a
phrasebook on the "Big Table". It has outperformed anything else
though with monotonous consistency. It could be made considerably
better in the following way.
1) Take account of Arabic grammar. In Arabic the adjective comes after
the noun. Sometimes machine translations put the adjective after the
noun in English. This indicates a glorified phrasebook.
2) Use a feedback mechanism by which search engine technology feeds
into the translation. Look at the discussion on the Stefan-Boltzmann
law at http://sites.google.com/site/aitranslationproject/deepknowled
Feeding translation results back into the translation would improve
the standard of translation.
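To make point 1 concrete, here is a toy sketch of the reordering step, assuming the words have already been translated and tagged with parts of speech. The function name reorder_adjectives and the tag names NOUN, ADJ and DET are assumptions for illustration. This is not how Google Translate works, and real Arabic grammar needs far more than this.

```python
def reorder_adjectives(tagged):
    """Move adjectives that trail a noun in front of that noun.

    tagged is a list of (word, tag) pairs in source (noun-adjective) order.
    """
    out = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag == "NOUN":
            # Collect the run of adjectives that follows this noun.
            j = i + 1
            adjectives = []
            while j < len(tagged) and tagged[j][1] == "ADJ":
                adjectives.append(tagged[j])
                j += 1
            out.extend(adjectives)    # adjectives first, source order kept
            out.append((word, tag))   # then the noun
            i = j
        else:
            out.append((word, tag))
            i += 1
    return out


# Word-for-word gloss in Arabic order: "the house big"
gloss = [("the", "DET"), ("house", "NOUN"), ("big", "ADJ")]
english = " ".join(word for word, _ in reorder_adjectives(gloss))
```

A pure phrasebook lookup would leave the gloss as "the house big"; even this crude rule yields "the big house".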
If Google does this, AI will advance in ways that perhaps Google has
not imagined.
- Ian Parker