paper-reading-group

Syntax and Structure in NLP

Overview


  1. Syntax in NLP
  2. LSTMs and Syntax (Linzen et al. 2016, TACL)
  3. Probing word representations (Hewitt & Liang 2019, EMNLP)
  4. BERTology (Jawahar et al. 2019, ACL)

Syntax in NLP


Syntax refers to the rules and grammar of a language, whereas semantics refers to the meanings assigned to the units of the language. In computational linguistics, syntax follows a hierarchical structure.

Ex: Words > Phrases > Clauses > Sentences > Paragraphs …

In NLP we use Dependency Grammar to obtain tree-like representations of syntax at the sentence level. A dependency tree connects the words of a sentence to one another with labelled “relations”. In each relation, the superior word is called the “governor” and the inferior word the “dependent”. The top-most governing word of a sentence/phrase is called the “head” or the “ROOT” of that particular sentence/phrase.

[Figure: example dependency parse tree. N: Noun, V: Verb, D: Determiner, A: Adjective, NP: Noun Phrase]
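
To see these relations concretely, here is a minimal sketch using spaCy (assuming the library and its en_core_web_sm model are installed; any dependency parser would do):

```python
# Sketch: inspecting governor/dependent relations with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

for token in doc:
    # token.head is the governor; token.dep_ is the relation label.
    print(f"{token.text:>6} <-{token.dep_}- {token.head.text}")

# In spaCy's convention the ROOT ("jumps") is its own head.
```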

As far as deep learning is concerned, we have the following ways to deal with tree-like structures:

  1. Recursive Neural Networks (Earlier)
  2. Graph Neural Networks (More Recent)

But the most commonly used NLP models are still sequential in nature!
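
As a rough illustration of the first option above, a Recursive Neural Network composes child vectors bottom-up along the parse tree. The sketch below is purely illustrative (class names, dimensions, and the toy tree are assumptions), using PyTorch:

```python
# Sketch: the core idea of a Recursive NN, composing a binary tree bottom-up.
import torch
import torch.nn as nn

class TreeNode:
    """A parse-tree node: either a leaf word or an internal node with children."""
    def __init__(self, word=None, children=()):
        self.word, self.children = word, list(children)

class RecursiveNN(nn.Module):
    """Compose child vectors bottom-up into one vector per (binary) subtree."""
    def __init__(self, vocab, dim=50):
        super().__init__()
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.embed = nn.Embedding(len(vocab), dim)
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, node):
        if not node.children:  # leaf: look up the word embedding
            idx = torch.tensor([self.vocab[node.word]])
            return self.embed(idx).squeeze(0)
        left, right = (self.forward(c) for c in node.children)
        return torch.tanh(self.compose(torch.cat([left, right])))

# Toy binary tree for "(the dog) barks"
tree = TreeNode(children=[
    TreeNode(children=[TreeNode("the"), TreeNode("dog")]),
    TreeNode("barks"),
])
model = RecursiveNN(vocab=["the", "dog", "barks"])
print(model(tree).shape)  # torch.Size([50])
```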

LSTMs and Syntax


  1. Recurrent NNs are intuitively good at sequential patterns (perhaps good for semantics), e.g. they can capture things like n-gram co-occurrence.
  2. It remains to be inspected whether recurrent NNs can also learn knowledge that is hierarchical in nature (like the syntactic dependencies of natural language).
  3. Linzen et al. (2016, TACL) was one of the earliest thorough works to inspect the ability of recurrent NNs to capture such syntax-sensitive dependencies.

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies (Tal Linzen, Emmanuel Dupoux, Yoav Goldberg)


The goal of this work was to probe the ability of LSTMs to learn the hierarchical (syntactic) structure of natural language from a corpus without syntactic annotations (sentences from Wikipedia).

Subject-Verb Agreement Task
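
In the number-prediction setup, the model reads a sentence prefix up to (but not including) the verb and must predict whether that verb is singular or plural; intervening “attractor” nouns of the opposite number make this hard without hierarchical knowledge. A minimal sketch of building such an example (the data format and helper name are hypothetical):

```python
# Sketch: turning a sentence into a number-prediction example, in the spirit
# of Linzen et al. (2016). The data format here is hypothetical.
SINGULAR, PLURAL = 0, 1

def make_example(tokens, verb_index, verb_number):
    """Return (prefix, label): the words before the verb and the verb's number."""
    return tokens[:verb_index], verb_number

# "The keys to the cabinet are on the table." -> predict PLURAL from the prefix,
# even though the closest noun ("cabinet") is singular (an agreement attractor).
tokens = "The keys to the cabinet are on the table .".split()
prefix, label = make_example(tokens, verb_index=5, verb_number=PLURAL)
print(prefix, label)  # ['The', 'keys', 'to', 'the', 'cabinet'] 1
```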

[Figure: subject-verb agreement results (Linzen et al. 2016)]

Probing Word Representations

Designing and Interpreting Probes with Control Tasks (John Hewitt & Percy Liang)

[Figure: example probing setup]

The problems with probing:

  1. How strong and complex should your probe be to inspect a particular linguistic property in your word representations?
  2. How much training data should you use while probing? (With enough data, the probe could simply memorize everything and still give good results.)
  3. Is the performance on the probing task due to knowledge hidden in the word representations, or did the probe learn the pattern itself?

The most naive way to handle these problems is to train the probe on random inputs and check its standalone capacity to fit the task. To handle them more rigorously, one can measure the probe’s “selectivity”: its accuracy on the linguistic task is compared against its accuracy on a control task, where each word type is assigned an arbitrary output that can only be memorized.

                selectivity = linguistic task accuracy - control task accuracy

Selectivity puts linguistic task accuracy in the context of the probe’s ability to memorize arbitrary outputs for word types.
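
A minimal sketch of how selectivity could be estimated with a linear probe (scikit-learn here; the variable names, choice of probe, and data split are assumptions, not the paper's exact setup):

```python
# Sketch: selectivity = linguistic task accuracy - control task accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def control_labels(word_types, n_classes, seed=0):
    """Control task: assign each word *type* a fixed, randomly chosen label."""
    rng = np.random.default_rng(seed)
    mapping = {t: int(rng.integers(n_classes)) for t in sorted(set(word_types))}
    return np.array([mapping[w] for w in word_types])

def probe_accuracy(X, y):
    """Train a linear probe on frozen representations, report held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# X: (n_tokens, dim) frozen word representations; y_ling: linguistic labels
# (e.g. POS tags); word_types: the word type of each token. All hypothetical.
# ling_acc = probe_accuracy(X, y_ling)
# ctrl_acc = probe_accuracy(X, control_labels(word_types, len(set(y_ling))))
# selectivity = ling_acc - ctrl_acc
```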

BERTology

What Does BERT Learn about the Structure of Language? (Ganesh Jawahar, Benoît Sagot, Djamé Seddah)

[Figure: phrase-level span representations across BERT layers]

[Figure: probing task results per BERT layer]

  1. SentLen: length of the sentence
  2. WC: word content (which words occur in the sentence)
  3. TreeDepth: depth of the syntactic tree
  4. TopConst: sequence of top-level constituents in the syntax tree
  5. BShift: word-order sensitivity (has a bigram been shifted?)
  6. Tense: tense of the main-clause verb
  7. SubjNum: number of the subject of the main clause
  8. ObjNum: number of the object of the main clause
  9. SOMO: sensitivity to random replacement of a noun/verb
  10. CoordInv: sensitivity to random swapping of coordinated clausal conjuncts
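
These tasks are probed layer by layer. A minimal sketch of extracting one sentence vector per BERT layer with the HuggingFace transformers library (the model name and mean pooling are illustrative choices, not necessarily the paper's setup):

```python
# Sketch: per-layer sentence representations from BERT for probing.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_representations(sentence):
    """Return one mean-pooled vector per layer (embeddings + 12 transformer layers)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states: tuple of 13 tensors, each (1, seq_len, hidden_dim)
    return [h.mean(dim=1).squeeze(0) for h in outputs.hidden_states]

vectors = layer_representations("The keys to the cabinet are on the table.")
print(len(vectors), vectors[0].shape)  # 13 torch.Size([768])
```

Each of these per-layer vectors can then be fed to a probe such as the one sketched in the previous section.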

[Figure: subject-verb agreement results per BERT layer]

[Figure: TPDN (Tensor Product Decomposition Network) analysis of BERT representations]

Concluding Thoughts:

  1. What do the current language models learn?
  2. How do the current language models learn?
  3. How is it different from human language learning?
  4. Must read paper: How Can We Accelerate Progress Towards Human-like Linguistic Generalization? (T. Linzen; ACL 2020)