Re: Quantales, Dioids and Formal Language (was: Power Monoid)
- From: Rock Brentwood <markwh04@xxxxxxxxx>
- Date: Wed, 5 Aug 2009 16:59:06 -0700 (PDT)
A few minor points of correction and some further commentary:
On Aug 3, 8:55 pm, Rock Brentwood <markw...@xxxxxxxxx> wrote:
Denoting the least upper bound of a subset A of M by sum(A), the
result is a structure called a "dioid" (i.e. a semiring with unit in
which + is idempotent -- a + a = a). Because of the closure property,
the sum extends to an infinitary idempotent operation, A |-> sum(A).
The structure that is a dioid is P(M), itself; not the generic monoid
M.
The construction of algebras, this way, through an adjunction is a
standard exercise in category theory that amounts to building a "T-
algebra" from a given adjunction. The dioids D would, themselves, be
monoids -- like the generic garden variety monoid M described above --
but would also have a partial ordering with the above-mentioned least
upper bound property.
The categories D(A) (the notation is not standard or settled on, so
there was a slippage in the article), would be the T-algebras
corresponding to the functor A: Monoid -> Dioid.
In the "algebraic approach" papers, if I recall, the properties listed
in the article were called:
A0 * A(M) is a subset of P(M)
A1 * Every finite subset of M is in A(M)
A2 * A(M) is closed under set-wise products
A3 * A(M) be closed under unions from A(A(M))
A4 * A(M) be closed under homomorphisms
A5 * closure under A-substitutions
A property, A6, was listed in the paper:
A6 * Surjectivity -- a surjective monoid homomorphism f: M -> N yields
a surjective map f:A(M) -> A(N).
This is the closest -- within this framework -- one gets to the
"inverse morphism" property of classical formal language and automata
theory.
A7 * Finite generativity
was mentioned in the papers but never developed. The "context-free"
and "turing" functors, C and T respectively, both satisfy finite
generativity, by the very natural of context-free and general
grammars.
(The notion of grammars, itself, had to be generalized to monoids.
This is an obvious notion, but seems to be absent in the literature.)
A concrete construction for R -> C is obtained as follows:
(1) Take a R-closed Kleene algebra K
(2) Enlargen it to its free extension K[D2], where D2 = { u0, u1, v0,
v1 }
(3) Subject K[D2] to the relations r2:
k d = d k, k in K, d in D2
u0 v0 = 1 = u1 v1
u0 v1 = 0 = u1 v0
v0 u0 + v1 u1 = 1
The relation to the Chomsky-Schuetzenberger Theorem is as follows. For
a given "alphabet" X, one extends X by adding in the set D2. When
expressed in terms of generic monoids, this amounts to extending the
free monoid X* freely to X*[D2] = (X union D2)* (this isomorphism, in
turn, requires that D2 be disjoint from X).
For generic monoids, one would extend M to M[D2].
The theorem states that C X* is "represented" within R X*[D2] by
specialized regular expressions that I end up calling the "Chomsky-
Schuetzenberger kernel" of the original context free language in C X*.
This generalizes to a "representation" of C(M) in R(M[D2]).
The representation is in quotes because it's not a strict equality of
sets of related structures or a strict embedding. Rather, the result
is only obtained by a map of the form
L in P(M[D2]) |-> {{L}} = h(L intersect L(D2))
which takes an arbitrary subset of M[D2] and performs the following
operation on it:
(1) Intersects it with the context-free language L(D2) given by the
grammar
S -> u0 S v0 | u1 S v1 | x (for x in X) | S S | e (the empty word)
(2) "Erases" the operator symbols -- h: M[D2] -> M maps D2 to {e} and
its restriction on M is the idfentity I_M.
The relation to the algebraic representation is that the mapping may
be described effectively by an equivalence relation over P(M[D2])
generated by the relations which mimick r2, itself:
{u0v0} = {e} = {u1v1}, {u0v1} = {} = {u1v0}
{v0u0, v1u1} = {e}
{dx} = {xd}, for x in X, d in D.
This equivalence almost captures the relation between a Chomsky-
Schuetzenberger kernel L in R(M[D2]) and its map {{L}} -- the proviso
has to do with the unmatched v's on the left, and unmatched u's on the
right.
This observation, in turn, leads to what might be termed a "normal
form theorem".
Through this equivalence, a close relation between the rational
subsets R(M[D2]) and the context-free subsets C(M) can be established.
In fact, there is a "normal form" representation of the elements of R(M
[D2]) as finite sums of the form
sum(q1) sum(p1) + sum(q2) sum(p2) + ... sum(qn) sum(pn)
where the q's and p's reside in P(P(M[D2])) as restricted subsets of
the form:
q's in R(C(M)[v0,v1])
p's in R(C(M)[u0,u1]).
That is, the q's are regular expressions formed of context-free
subsets of M, along with v0 and v1; while the p's are regular
expressions formed of the context-free subsets of M along with u0 and
u1.
A matrix representation of P(M[D2]) within the infinite dimensional
matrix algebra over P(M) (noting that P(M) is closed under infinite
sums) can be given.
A kind of "Schur's" lemma allows one to show, through such a
representation, that the only elements of P(M[D2]) that commute with
the elements of D2 are elements from P(M).
The only "normal form" that commutes with the elements of D2, are thus
elements of C(M).
In the Chomsky-Schuetzenberger Theorem, the expression of a context-
free language L in C(M) as a "regular language" in R(M[D2]) was the
closest the classical theory came to devising an extension of the then-
standard theory of regular expressions up the Chomsky Hierarchy to
Chomsky type 2. These observations complete that process -- nearly 40
years after that step was taken.
A standard representation for a language L given by a context-free
grammar G = (Q, X, H, S) over a monoid M can be given. For reference,
a grammar G over a monoid M is defined by a set of variables Q, a
distringuished subset X of M (which is not really a necessary
ingredient in the definition), a relation H over the free extension <X>
[Q] of the closure of <X> in M, and a "starting configuration" S in <X>
[Q].
The relation H is closed to a relation =>, by closing it under
products (i.e. if (a,b) in H then cad => cbd) and under reflexivity
and transitivity. The subset of M derived by the grammar is the set L
(G) = { m in M: S => m }. The subset H is assumed to be finite.
For a context-free grammar the elements of H comprise "one-step" rules
of the form q -> b, where q is a single variable in Q, and b an
arbitrary member of <X>[Q]. One can then write the Chomsky-
Schuetzenberger kernel of L(G) as
u(0) u(S) (H' + X')* v(0)
where ()* denotes the Kleene star. The expressions H' and X' are,
respectively,
H' = sum (v(q) u(bn) ... u(b2) u(b1): (q, b1 b2 ... bn) in H)
X' = sum v(x) x: x in X
The elements u(z), v(z) for z in X union Q union {0} are encoded by a
suitable binary code in u0, u1, v0, v1, such that v(z) is the reversal
of u(z), for each z.
This particular representation mimicks the process of a top-down
parser. Other expressions can be devised to mimick other parsing or
generation strategies (e.g. an "LR" expression suitable for LR
parsers).
All the above is done within the setting of the "concrete" C-algebras
-- that is, the members of the cateoory D(C) which have the form C(M)
for some monoid M. The tricky part is generalizing it to cover all the
algebras in the category D(C) and to show that the construction
mentioned in the previous article, when starting with a R-algebra K,
actually DOES land you to a result K' that resides in the category D
(C).
.
- References:
- Quantales, Dioids and Formal Language (was: Power Monoid)
- From: Rock Brentwood
- Quantales, Dioids and Formal Language (was: Power Monoid)
- Prev by Date: Quantales, Dioids and Formal Language (was: Power Monoid)
- Next by Date: number theory stuckness
- Previous by thread: Quantales, Dioids and Formal Language (was: Power Monoid)
- Next by thread: number theory stuckness
- Index(es):
Relevant Pages
|