
Noam Chomsky on Languages. Source: Shameem. 2018.
Any language is a structured medium of communication whether it is a spoken or written natural language, sign or coded language, or a formal programming language. Languages are characterised by two basic elements – syntax (grammatical rules) and semantics (meaning). In some languages, the meaning might vary depending upon a third factor called context of usage.
Depending on the restrictions and complexity present in its grammar, a language finds its place in the hierarchy of formal languages. Noam Chomsky, the celebrated American linguist and cognitive scientist, defined this hierarchy in 1956, and hence it's called the Chomsky hierarchy.
Although the concept is quite old, there's renewed interest in it because of its relevance to Natural Language Processing. The Chomsky hierarchy helps us answer questions like “Can a natural language like English be described (‘parsed’, ‘compiled’) with the same methods as used for formal/artificial (programming) languages in computer science?”
What are the different levels in the Chomsky hierarchy?

Chomsky Hierarchy Levels. Source: Fitch. 2014.
There are four levels: Type-3 (regular), Type-2 (context-free), Type-1 (context-sensitive) and Type-0 (recursively enumerable). Moving from Type-3 to Type-0, the grammar rules become less restrictive but harder to automate. Each level is also a proper subset of the next: every regular language is context-free, every context-free language is context-sensitive, and so on.
What are the common terms and definitions used while studying Chomsky Hierarchy?
What are the corresponding language characteristics in each level?

Languages, Automaton, Grammar, Recognition. Source: Hauser and Watumull 2016, fig. 1.
Type-3 (regular) grammars have production rules too restrictive to describe entire languages. However, constructs describable by regular expressions come under this type.
For instance, the rule for naming an identifier in a programming language is a regular expression: any combination of case-insensitive letters, digits and some special characters, but it must start with a letter.
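As a sketch, such an identifier rule can be written with Python's `re` module (the exact character set varies by language; this pattern is only illustrative):

```python
import re

# Illustrative identifier rule: starts with a letter, then any mix of
# letters, digits or underscores. A regular (Type-3) construct.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_identifier(s: str) -> bool:
    """Return True if the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(s) is not None
```

For example, `is_identifier("count_1")` accepts, while `is_identifier("1count")` rejects because the string starts with a digit.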
Context-free languages classified as Type-2 are capable of handling an important language construct called nested dependencies. English example – Recursive presence of “If <phrase> then <phrase>” – “If it rains today and if I don’t carry an umbrella, then I'd get drenched”. For programming languages, the matching parentheses of functions or loops get covered by this grammar.
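No regular expression can check arbitrarily deep nesting, but a pushdown automaton, the recogniser for Type-2 languages, can. A minimal sketch of a matching-parentheses check using an explicit stack:

```python
def parens_balanced(text: str) -> bool:
    """Recognise well-nested parentheses with a stack: the work a
    pushdown automaton does for a context-free (Type-2) language."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:      # closing paren with nothing to match
                return False
            stack.pop()
    return not stack           # every opening paren must be closed
```

The stack depth grows with the nesting depth, which is exactly what a finite-state (Type-3) machine cannot track.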
Type-1 languages become context-sensitive by restricting productions α → β of a phrase structure grammar so that β is at least as long as α. They permit replacement of α by β only in a ‘context’: [context] α [context] → [context] β [context].
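The textbook example of a context-sensitive (but not context-free) language is {aⁿbⁿcⁿ | n ≥ 1}: three equal-length runs that no CFG can generate. A membership check is still easy to sketch:

```python
def in_anbncn(s: str) -> bool:
    """Check membership in {a^n b^n c^n : n >= 1}, a classic
    context-sensitive language that is not context-free."""
    n, rem = divmod(len(s), 3)
    if n == 0 or rem != 0:
        return False
    return s == "a" * n + "b" * n + "c" * n
```

The check itself is computationally trivial; the point is that no context-free grammar can generate exactly these strings.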
Finally, Type-0 languages place no restrictions on their grammar. They are recognised by Turing machines, but a membership test may loop forever: there is no algorithm that is guaranteed to halt and decide whether a given string belongs to the language.
What type of automaton recognises the grammar at each level?
Can you give a quick example for each type of grammar/language?
In which level of the hierarchy do formal programming languages fall?
Reading a text file containing a high-level language program and compiling it according to its syntax is done in two steps.
Finite state models associated with Type-3 grammar are used for performing the first step of lexical analysis. Raw text is aggregated into keywords, strings, numerical constants and identifiers in this step.
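A minimal sketch of such a lexer, using one regular (Type-3) pattern per token class. The token names and the toy language here are assumptions for illustration, not any particular compiler's:

```python
import re

# Toy token classes, each described by a regular (Type-3) pattern.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),                  # numerical constants
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers and keywords
    ("STRING", r'"[^"]*"'),              # string literals
    ("OP",     r"[+\-*/=]"),             # operators
    ("SKIP",   r"\s+"),                  # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    """Aggregate raw text into (kind, value) tokens."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(text)
            if m.lastgroup != "SKIP"]
```

For example, `lex('x = 42')` yields `[("IDENT", "x"), ("OP", "="), ("NUMBER", "42")]`.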
In the second step, to parse the program constructs of any high level language according to its syntax, a Context-Free Grammar is required. Usually these grammars are specified in Backus-Naur Form (BNF).
For example, to build a grammar for an IF statement, the grammar would begin with a non-terminal start symbol S. The rules would be of the form:
S → IF-STATEMENT
IF-STATEMENT → if CONDITION then BLOCK endif
BLOCK → STATEMENT | STATEMENT ; BLOCK
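A grammar of this shape maps naturally onto a recursive-descent recogniser. A minimal sketch, with the simplifying assumption that CONDITION and STATEMENT are each a single word:

```python
def parse_if(tokens):
    """Recognise: if CONDITION then BLOCK endif, where BLOCK is one or
    more STATEMENT words separated by ';'. CONDITION and STATEMENT are
    reduced to single words here, a simplification for illustration."""
    def expect(i, word):
        if i < len(tokens) and tokens[i] == word:
            return i + 1
        raise SyntaxError(f"expected {word!r} at position {i}")

    i = expect(0, "if")
    i += 1                                  # CONDITION: any one word
    i = expect(i, "then")
    i += 1                                  # first STATEMENT
    while i < len(tokens) and tokens[i] == ";":
        i += 2                              # ';' plus the next STATEMENT
    i = expect(i, "endif")
    return i == len(tokens)
```

For example, `parse_if(["if", "raining", "then", "stay", ";", "read", "endif"])` returns `True`, while ill-formed input raises `SyntaxError`.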
Conventionally, all high-level programming languages can be covered under the Type-2 grammar in Chomsky’s hierarchy.
The Python language has a distinctive feature of being whitespace-sensitive. To make this feature fit into a conventional CFG, Python's tokeniser emits two additional tokens, INDENT and DEDENT, to represent changes in line indentation.
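These tokens are visible through the standard library's `tokenize` module, which emits them explicitly:

```python
import io
import tokenize

src = "if x:\n    y = 1\nz = 2\n"

# Tokenise the source; indentation changes appear as explicit
# INDENT/DEDENT tokens so a conventional CFG can parse the result.
names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

# names now contains 'INDENT' (entering the if-body) and 'DEDENT'
# (leaving it), alongside NAME, OP and NUMBER tokens.
```

With indentation turned into tokens, the rest of Python's syntax can be handled by an ordinary context-free parser.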
However, just syntactic analysis does not guarantee that the language will be entirely ‘understood’. Semantics need to match too.
Where can we place natural languages in the hierarchy?
Natural languages are an infinite set of sentences constructed out of a finite set of characters. The number of words in a sentence has no defined upper limit either. When natural languages are reverse engineered into their component parts, they break down into four components: syntax, semantics, morphology and phonology.
Tokenising words and identifying nested dependencies work as explained in the previous section.
Part-of-Speech Tagging is a challenge. In “He runs 20 miles every day” and “The batsman scored 150 runs in one day”, the same word ‘runs’ is a verb in the first sentence and a noun in the second. Finite state grammars can be used for resolving such lexical ambiguity.
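A toy illustration of finite-state disambiguation, choosing the tag of ‘runs’ purely from the preceding word, as a finite automaton would. The rules here are invented for the example and are not a real tagger:

```python
# Toy finite-state disambiguation: the tag of 'runs' depends only on
# the class of the previous word (pronoun -> verb, number -> noun).
PRONOUNS = {"he", "she", "it", "i", "we", "they", "you"}

def tag_runs(prev_word: str) -> str:
    """Tag the word 'runs' from one word of left context."""
    if prev_word.lower() in PRONOUNS:
        return "VERB"    # "He runs 20 miles every day"
    if prev_word.isdigit():
        return "NOUN"    # "The batsman scored 150 runs"
    return "UNKNOWN"
```

Real taggers use probabilistic finite-state models (such as hidden Markov models) over much richer contexts, but the state-based principle is the same.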
Identifying cases (subjective – I, possessive – mine, objective – me, etc.) for nouns varies across languages: Old English (5), Modern English (3), Sanskrit and Tamil (8). Each case also has interrogative forms. A clear definition of cases enables free word order. The CFGs defined for these languages take care of this.
Natural languages are believed to be at least context-free. However, Dutch and Swiss German contain grammatical constructions with cross-serial dependencies which make them context sensitive.
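Cross-serial dependencies pattern like the copy language {ww | w ∈ Σ*}, a standard example of a language no context-free grammar can generate: position i in the first half must match position i in the second half, so the dependencies cross rather than nest. A membership sketch:

```python
def in_copy_language(s: str) -> bool:
    """Check membership in {ww}, the copy language. Its crossing
    dependencies (1st-half position i matches 2nd-half position i)
    mirror cross-serial constructions in Swiss German."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]
```

Again, checking membership is easy; the point is that generating exactly these strings requires more than context-free power.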
Languages with a single, clearly documented source of grammar are easier to classify.
Are there any exceptional cases in natural languages that make its classification ambiguous?
NLP practitioners have successfully managed to assign a majority of natural language aspects to the regular and CFG category. However, some aspects don't easily conform to a particular grammar and require special handling.
What are the important extensions to Chomsky hierarchy that find relevance in NLP?

Mildly Context Sensitive Languages. Source: Jäger and Rogers. 2012.
There are two extensions to the traditional Chomsky hierarchy that have proved useful in linguistics and cognitive science:
1928
Avram Noam Chomsky is born on December 7, 1928. Decades later, Chomsky is credited with the creation of the theory of generative grammar, considered to be one of the most significant contributions to the field of linguistics made in the 20th century.
1936
Turing machines, first described by Alan Turing in 1936–7, are simple abstract computational devices intended to help investigate the extent and limitations of what can be computed.