Re: A weird requirement - unable to think of anything...



On Oct 14, 10:14?pm, Legend <rahul...@xxxxxxxxx> wrote:
On Oct 14, 1:40 am, "mensana...@xxxxxxxxxxx" <mensana...@xxxxxxx>
wrote:





On Oct 13, 11:20?pm, Legend <rahul...@xxxxxxxxx> wrote:

Suppose that I have some data:

12,30
12,45
2,3
7,8
3,9
30, 8
45,54
56,65

Where (a,b) indicates that a is connected to b. I want to get all
connected nodes to one point. For instance, the output of the above
example should be something like:

Group 1
2,3
3,9
Group 2
12,30
12,45
30,8
7,8
Group 3
45,54
Group 4
56,65

The order is not important as long as the whole group stays together.
Reason why they are grouped like that:

1. 2 is connected to 3 and 3 is connected to 9 and so we put all the
three, i.e. 2,3,9 into one group.
2. 12 is connected to 45 and 12 is also connected to 30 so we put
these in the same group but 30 is connected to 8 and 8 is connected to
7 so ultimately we put all these into the same group.
3. 45 and 54 are connected but not related to any other numbers so we

Incorrect, 45 is connected to group 2.

put them into another group
4. 56 and 65 are connected but not related to any other numbers so we
put them into another group

I am unable to figure out an algorithm for this. Can someone guide me?

You can do it with sets.

# Python

def link_sets(the_sets):
done = False
outer_limit = len(the_sets)-1
inner_limit = len(the_sets)
if inner_limit == 1: return []
m = 0
n = 1
while not done:
i = the_sets[m][0].intersection(the_sets[n][0])
if i:
return [m,n]
n += 1
if n == inner_limit:
m += 1
n = m + 1
if m == outer_limit:
return []

the_data = [(12,30),(12,45),(2,3),(7,8),(3,9),(30,8),(45,54),(56,65)]

the_sets = []

for d in the_data:
temp_set = set(d)
if not the_sets: # when list is empty
the_sets.append([temp_set,[d]])
else:
last_set = len(the_sets)
found = False
done = False
current = 0
while not done:
ts = the_sets[current][0]
i = ts.intersection(d)
if i: # the data matches an existing set
found = True
done = True
data_list = the_sets[current][1]
data_list.append(d)
set_union = [ts.union(d),data_list]
the_sets[current] = set_union
else:
current += 1
if current == last_set:
done = True
if not found: # doesn't match, create new set
the_sets.append([temp_set,[d]])

# first pass complete

done = False

# second pass, see if any of the sets intersect

while not done:
to_consolidate = link_sets(the_sets)
if to_consolidate:
data_list = the_sets[to_consolidate[0]]
[1]+the_sets[to_consolidate[1]][1]
set_union = [the_sets[to_consolidate[0]]
[0].union(the_sets[to_consolidate[1]][0]),data_list]
the_sets[to_consolidate[0]] = set_union

the_sets.pop(to_consolidate[1])
else:
done = True

for s in the_sets: print s # the set followed by list of data points

## [set([54, 7, 8, 12, 45, 30]), [(12, 30), (12, 45), (30, 8), (45,
54), (7, 8)]]
## [set([9, 2, 3]), [(2, 3), (3, 9)]]
## [set([56, 65]), [(56, 65)]]

When I'm trying to run this program, it gives me a syntax error...

SyntaxError: invalid syntax
set_union = [the_sets[to_consolidate[0]]
set_union: command not found

Is that program supposed to work or is there something else to change?

It has to be Python 2.4 or later. It should work,
the ## lines at the end are actual program output.

The error message looks to be a broken line, Google
warpped the source code, even when you view original.

Try this version, I've kept the line length to about
40. Unfortunately, the Google cocksuckers no longer
feel it necessary to allow you to preview your post.

BTW, I have no idea whether this is an efficient
algorithm for large data sets, but my eyes glazed
over reading the other replies.

def link_sets(the_sets):
done = False
outer_limit = len(the_sets)-1
inner_limit = len(the_sets)
if inner_limit == 1: return []
m = 0
n = 1
while not done:
i = the_sets[m][0].intersection \
(the_sets[n][0])
if i:
return [m,n]
n += 1
if n == inner_limit:
m += 1
n = m + 1
if m == outer_limit:
return []
the_data = [(12,30),(12,45),(2,3), \
(7,8),(3,9),(30,8), \
(45,54),(56,65)]
the_sets = []
for d in the_data:
temp_set = set(d)
if not the_sets:
the_sets.append([temp_set,[d]])
else:
last_set = len(the_sets)
found = False
done = False
current = 0
while not done:
ts = the_sets[current][0]
i = ts.intersection(d)
if i:
found = True
done = True
data_list = the_sets[current][1]
data_list.append(d)
set_union = [ts.union(d), \
data_list]
the_sets[current] = set_union
else:
current += 1
if current == last_set:
done = True
if not found:
the_sets.append([temp_set,[d]])
# first pass complete
done = False
# second pass, see if any of the
# sets intersect
while not done:
to_consolidate = link_sets(the_sets)
if to_consolidate:
data_list = \
the_sets \
[to_consolidate[0]] \
[1] + the_sets \
[to_consolidate[1]][1]
set_union = \
[the_sets \
[to_consolidate[0]][0] \
.union(the_sets \
[to_consolidate \
[1]][0]),data_list]
the_sets[to_consolidate[0]] = set_union
the_sets.pop(to_consolidate[1])
else:
done = True
for s in the_sets:
print s

.


Quantcast