Characteristic samples

Characteristic samples is a concept in the field of grammatical inference, related to passive learning. In passive learning, an inference algorithm is given a set of pairs of strings and labels , and returns a representation that is consistent with . Characteristic samples consider the scenario when the goal is not only finding a representation consistent with , but finding a representation that recognizes a specific target language.

A characteristic sample of language is a set of pairs of the form where:

  1. if and only if
  2. if and only if

Given the characteristic sample , 's output on it is a representation , e.g. an automaton, that recognizes .

Formal Definition

edit

The Learning Paradigm associated with Characteristic Samples

edit

There are three entities in the learning paradigm connected to characteristic samples, the adversary, the teacher and the inference algorithm.

Given a class of languages   and a class of representations for the languages  , the paradigm goes as follows:

  • The adversary   selects a language   and reports it to the teacher
  • The teacher   then computes a set of strings and label them correctly according to  , trying to make sure that the inference algorithm will compute  
  • The adversary can add correctly labeled words to the set in order to confuse the inference algorithm
  • The inference algorithm   gets the sample and computes a representation   consistent with the sample.

The goal is that when the inference algorithm receives a characteristic sample for a language  , or a sample that subsumes a characteristic sample for  , it will return a representation that recognizes exactly the language  .

Sample

edit

Sample   is a set of pairs of the form   such that  

Sample consistent with a language

edit

We say that a sample   is consistent with language   if for every pair   in  :

  1.  
  2.  

Characteristic sample

edit

Given an inference algorithm   and a language  , a sample   that is consistent with   is called a characteristic sample of   for   if:

  •  's output on   is a representation   that recognizes  .
  • For every sample   that is consistent with   and also fulfils  ,  's output on   is a representation   that recognizes  .

A Class of languages   is said to have charistaristic samples if every   has a characteristic sample.

edit

Theorem

edit

If equivalence is undecidable for a class   over   of cardinality bigger than 1, then   doesn't have characteristic samples.[1]

Proof

edit

Given a class of representations   such that equivalence is undecidable, for every polynomial   and every  , there exist two representations   and   of sizes bounded by  , that recognize different languages but are inseparable by any string of size bounded by  . Assuming this is not the case, we can decide if   and   are equivalent by simulating their run on all strings of size smaller than  , contradicting the assumption that equivalence is undecidable.

Theorem

edit

If   is a characteristic sample for   and is also consistent with  , then every characteristic sample of  , is inconsistent with  .[1]

Proof

edit

Given a class   that has characteristic samples, let   and   be representations that recognize   and   respectively. Under the assumption that there is a characteristic sample for  ,   that is also consistent with  , we'll assume falsely that there exist a characteristic sample for  ,   that is consistent with  . By the definition of characteristic sample, the inference algorithm   must return a representation which recognizes the language if given a sample that subsumes the characteristic sample itself. But for the sample  , the answer of the inferring algorithm needs to recognize both   and  , in contradiction.

Theorem

edit

If a class is polynomially learnable by example based queries, it is learnable with characteristic samples.[2]

Polynomialy characterizable classes

edit

Regular languages

edit

The proof that DFA's are learnable using characteristic samples, relies on the fact that every regular language has a finite number of equivalence classes with respect to the right congruence relation,   (where   for   if and only if  ). Note that if  ,   are not congruent with respect to  , there exists a string   such that   but   or vice versa, this string is called a separating suffix.[3]

Constructing a characteristic sample

edit

The construction of a characteristic sample for a language   by the teacher goes as follows. Firstly, by running a depth first search on a deterministic automaton   recognizing  , starting from its initial state, we get a suffix closed set of words,  , ordered in shortlex order. From the fact above, we know that for every two states in the automaton, there exists a separating suffix that separates between every two strings that the run of   on them ends in the respective states. We refer to the set of separating suffixes as  . The labeled set (sample) of words the teacher gives the adversary is   where   is the correct lable of   (whether it is in   or not). We may assume that  .

Constructing a deterministic automata

edit

Given the sample from the adversary  , the construction of the automaton by the inference algorithm   starts with defining   and  , which are the set of prefixes and suffixes of   respectively. Now the algorithm constructs a matrix   where the elements of   function as the rows, ordered by the shortlex order, and the elements of   function as the columns, ordered by the shortlex order. Next, the cells in the matrix are filled in the following manner for prefix   and suffix  :

  1. If  
  2. else,  

Now, we say row   and   are distinguishable if there exists an index   such that  . The next stage of the inference algorithm is to construct the set   of distinguishable rows in  , by initializing   with   and iterating from the first row of   downwards and doing the following for row  :

  1. If   is distinguishable from all elements in  , add it to  
  2. else, pass on it to the next row

From the way the teacher constructed the sample it passed to the adversary, we know that for every   and every  , the row   exists in  , and from the construction of  , there exists a row   such that   and   are indistinguishable. The output automaton will be defined as follows:

  • The set of states is  .
  • The initial state is the state corresponding to row  .
  • The accepting states is the set  .
  • The transitions function will be defined  , where   is the element in   that is indistinguishable from  .

Other polynomially characterizable classes

edit
  • Class of languages recognizable by multiplicity automatons[4]
  • Class of languages recognizable by tree automata[5]
  • Class of languages recognizable by multiplicity tree automata[6]
  • Class of languages recognizable by Fully-Ordered Lattice Automata[7]
  • Class of languages recognizable by Visibly One-Counter Automata[8]
  • Class of fully informative omega regular languages[9][10]

Non polynomially characterizable classes

edit

There are some classes that do not have polynomially sized characteristic samples. For example, from the first theorem in the Related theorems segment, it has been shown that the following classes of languages do not have polynomial sized characteristic samples:

  •   - The class of context-free grammars Languages over   of cardinality larger than  [1]
  •   - The class of linear grammar languages over   of cardinality larger than  [1]
  •   - The class of simple deterministic grammars Languages[1]
  •   - The class of nondeterministic finite automata Languages[1]

Relations to other learning paradigms

edit

Classes of representations that has characteristic samples relates to the following learning paradigms:

Class of semi-poly teachable languages

edit

A representation class   is semi-poly   teachable if there exist 3 polynomials  , a teacher   and an inference algorithm  , such that for any adversary   the following holds:[2]

  •   Selects a representation   of size   from  
  •   computes a sample that is consistent with the language that   recognize, of size bounded by   and the strings in the sample bounded by length  
  •   adds correctly labeled strings to the sample computed by  , making the new sample of size  
  •   then computes a representation equivalent to   in time bounded by  

The class of languages that there exists a polynomial algorithm that given a sample, returns a representation consistent with the sample is called consistency easy.

Polynomially characterizable languages

edit

Given a representation class  , and   a set of identification algorithms for  ,   is polynomially characterizable for   if any   has a characteristic sample of size polynomial of  's size,  , that for every  ,  's output on   is  .

Releations between the paradigms

edit

Theorem

edit

A consistency-easy class   has characteristic samples if and only if it is semi-poly   teachable.[1]

Proof
edit

Assuming   has characteristic samples, then for every representation  , its characteristic sample   holds the conditions for the sample computaed by the teacher, and the output of   on every sample   such that   is equivalent to   from the definition of characteristic sample.

Assuming that   is semi-poly   teachable, then for every representation  , the computed sample by the teacher   is a characteristic sample for  .

Theorem

edit

If   has characteristic sample, then   is polynomially characterizable.[1]

Proof
edit

Assuming falsely that   is not polynomially characterizable, there are two non equivalent representations  , with characteristic samples   and   respectively. From the definition of characteristic samples, any inference algorithm   need to infer from the sample   a representation compatible with   and  , in contradiction.

See also

edit

References

edit
  1. ^ a b c d e f g h De La Higuera, Colin (1997). "[No title found]". Machine Learning. 27 (2): 125–138. doi:10.1023/A:1007353007695.
  2. ^ a b Goldman, Sally A.; Mathias, H.David (April 1996). "Teaching a Smarter Learner". Journal of Computer and System Sciences. 52 (2): 255–267. doi:10.1006/jcss.1996.0020. ISSN 0022-0000.
  3. ^ Oncina, J.; García, P. (January 1992), Inferring Regular Languages in Polynomial Updated Time, Series in Machine Perception and Artificial Intelligence, vol. 1, WORLD SCIENTIFIC, pp. 49–61, doi:10.1142/9789812797902_0004, ISBN 978-981-02-0881-3, retrieved 2024-05-21
  4. ^ Beimel, Amos; Bergadano, Francesco; Bshouty, Nader H.; Kushilevitz, Eyal; Varricchio, Stefano (May 2000). "Learning functions represented as multiplicity automata". Journal of the ACM. 47 (3): 506–530. doi:10.1145/337244.337257. ISSN 0004-5411.
  5. ^ Burago, Andrey (1994). "Learning structurally reversible context-free grammars from queries and counterexamples in polynomial time". Proceedings of the seventh annual conference on Computational learning theory - COLT '94. New York, New York, USA: ACM Press. pp. 140–146. doi:10.1145/180139.181075. ISBN 0-89791-655-7.
  6. ^ Habrard, Amaury; Oncina, Jose (2006), Sakakibara, Yasubumi; Kobayashi, Satoshi; Sato, Kengo; Nishino, Tetsuro (eds.), "Learning Multiplicity Tree Automata", Grammatical Inference: Algorithms and Applications, vol. 4201, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 268–280, doi:10.1007/11872436_22, ISBN 978-3-540-45264-5, retrieved 2024-05-20
  7. ^ Fisman, Dana; Saadon, Sagi (2022), "Learning and Characterizing Fully-Ordered Lattice Automata", Automated Technology for Verification and Analysis, Cham: Springer International Publishing, pp. 266–282, doi:10.1007/978-3-031-19992-9_17, ISBN 978-3-031-19991-2, retrieved 2024-05-20
  8. ^ Berman, Piotr; Roos, Robert (October 1987). "Learning one-counter languages in polynomial time". 28th Annual Symposium on Foundations of Computer Science (SFCS 1987). IEEE. pp. 61–67. doi:10.1109/sfcs.1987.36. ISBN 0-8186-0807-2.
  9. ^ Angluin, Dana; Fisman, Dana (2022), Constructing Concise Characteristic Samples for Acceptors of Omega Regular Languages, arXiv:2209.09336, doi:10.1007/978-3-319-11662-4_10
  10. ^ Angluin, Dana; Fisman, Dana; Shoval, Yaara (2020), "Polynomial Identification of $$\omega $$-Automata", Tools and Algorithms for the Construction and Analysis of Systems, Cham: Springer International Publishing, pp. 325–343, doi:10.1007/978-3-030-45237-7_20, ISBN 978-3-030-45236-0