Introduction to Quantitative Linguistics

Credits: 6
Types:
  • MIRI: Elective
  • MDS: Elective
  • MEI: Elective
Requirements: This subject has no formal requirements, but it assumes some previous capacities.
Department: CS
Quantitative linguistics is a branch of linguistics that is concerned primarily with the statistical patterns of language (the so-called linguistic laws), their explanation, and theory construction. The course is relevant to anybody interested in what languages (and animal communication systems) are like and why.
This course covers many statistical laws of language (beyond the scope of traditional courses on information retrieval or natural language processing), how to analyze them, and their origins.
A fundamental working hypothesis is that these laws emerge from the need to reduce the cognitive effort of speakers or listeners. The course emphasizes potential explanations in terms of general principles of cognition in humans and other species, and it covers the mathematical and computational models that have been developed to explain these regularities. Along the way, students will enrich their current knowledge with concepts and tools from linguistics, biology, cognitive science, information theory and multidisciplinary physics, all seen from the hawk-eye perspective of the philosophy of science.
The course is relevant to researchers interested in getting the most out of linguistic data, as well as in evaluating or adapting algorithms, machine learning methods and the like based on the real statistical properties of language and the underlying theory. As these regularities are often the result of reducing the cognitive effort of language users, the course is also relevant to researchers interested in developing resources or systems that are easier for humans to use or understand, or in developing language processing tools that exploit the real constraints of the human brain.

Teachers

Person in charge

  • Ramon Ferrer Cancho

Weekly hours
Theory: 2.5
Problems: 0.5
Laboratory: 1
Guided learning: 0
Autonomous learning: 7.11

Objectives

  1. Know the foundations of science and the scientific method. Understand the difference between hypothesis and theory, between modeling and understanding, between describing and explaining, between manifestation and principle. Understand the value of prediction and the types of prediction.
    Related competences: CTR6
  2. Learn about the statistical laws of language and their origins.
    Related competences: CTR4, CTR6, CTR7
  3. Know and understand the principles of organization of languages and other communication systems.
    Related competences: CTR6, CTR7
  4. Know the mathematical foundations of quantitative linguistics. Know basic probability theory and information theory.
    Related competences: CTR6, CTR7
  5. Know the statistical analysis methods of quantitative linguistics.
    Related competences: CTR4
  6. Learn how to write a scientific article. Know how to distinguish between a laboratory report and a research paper.
    Related competences: CTR3, CTR4, CTR6, CTR7, CTR9

Contents

  1. Introduction to Quantitative Linguistics
    What is quantitative linguistics? Overview of linguistic laws, key concepts and research problems in quantitative linguistics.
  2. Law of abbreviation and problem of compression
    The law of abbreviation in humans and other species. Methods of analysis of the law of abbreviation. Introduction to information theory. Predictions of optimal coding.
  3. Information theory
    Classic information theory and extensions for natural communication systems.
  4. Theory of power laws
    Relationships between power laws. Inference of power laws. Power-law analysis methods.
  5. Models of Zipf's law for word frequencies
    Debowski's bounds. Classic models. Zipfian optimization models of communication.
  6. The statistical structure of symbolic sequences
    Word returns. Correlations in symbolic sequences. Persistence and antipersistence. n-gram models. Generative models.
  7. Dependency syntax
    Introduction to dependency syntax. Formal constraints on syntactic dependency structures.
  8. Word order theory
    Word order principles. Predictions. Order of subject (S), direct object (O) and verb (V).
  9. Theory construction
    The scientific method. A general theory. Closing.
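As a hedged illustration of the kind of analysis touched on in topics 2 and 5 (not material taken from the course itself), the following Python sketch builds a rank-frequency table of word types and measures the association between word frequency and word length. Kendall's tau-b is assumed here as the correlation statistic; the function names and the toy sentence are hypothetical. Under the law of abbreviation, more frequent words tend to be shorter, so a negative correlation is expected.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (word, frequency) pairs sorted by decreasing frequency.

    Position 0 holds the rank-1 word; ties are broken alphabetically.
    """
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (O(n^2) pairwise version, handles ties)."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx != 0 and dy != 0:
                # Concordant if both differences have the same sign.
                if (dx > 0) == (dy > 0):
                    concordant += 1
                else:
                    discordant += 1
    n0 = n * (n - 1) // 2
    denominator = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator


def abbreviation_tau(tokens):
    """Correlation between frequency and length of word types.

    The law of abbreviation predicts a negative value
    (more frequent words tend to be shorter).
    """
    table = rank_frequency(tokens)
    frequencies = [f for _, f in table]
    lengths = [len(w) for w, _ in table]
    return kendall_tau_b(frequencies, lengths)


if __name__ == "__main__":
    # Toy example only; real analyses need large corpora.
    tokens = "the cat and the dog and the bird".split()
    print(rank_frequency(tokens))
    print(abbreviation_tau(tokens))  # negative for this toy sentence
```

The pairwise loop is quadratic in the number of word types, which keeps the sketch short; for real vocabularies a library implementation such as `scipy.stats.kendalltau` (which computes tau-b by default) would be the more practical choice.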

Activities



Introduction

Introduction to quantitative linguistics. Introduction to the course.
Objectives: 1 2
Hours: Theory 3h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 5.3h

Law of abbreviation and the problem of compression


Objectives: 4 5
Hours: Theory 3.8h, Problems 1h, Laboratory 3.5h, Guided learning 0h, Autonomous learning 6.7h

Information theory


Objectives: 3 4
Hours: Theory 4h, Problems 1h, Laboratory 0h, Guided learning 0h, Autonomous learning 7.1h

Theory of power laws


Objectives: 2 4
Hours: Theory 4h, Problems 1h, Laboratory 3.5h, Guided learning 0h, Autonomous learning 7.1h

Models of Zipf's law for word frequencies


Objectives: 1 2 3 4
Hours: Theory 4h, Problems 0.8h, Laboratory 0h, Guided learning 0h, Autonomous learning 7.1h

The statistical structure of symbolic sequences


Objectives: 2 3 4 5
Hours: Theory 4h, Problems 1h, Laboratory 3h, Guided learning 0h, Autonomous learning 7.1h

Dependency syntax


Objectives: 2 5
Hours: Theory 4h, Problems 1h, Laboratory 0h, Guided learning 0h, Autonomous learning 7.1h

Word order theory


Objectives: 3 4 5
Hours: Theory 4h, Problems 1h, Laboratory 3.5h, Guided learning 0h, Autonomous learning 7.1h

Theory construction


Objectives: 1 3
Hours: Theory 3h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 5.3h

Research project


Objectives: 1 2 3 4 5 6
Hours: Theory 0h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 36h

Teaching methodology

Theory sessions will be given mainly by the professor, using either the blackboard or projected slides.

Lab work will be done at the computer. Students are expected to work on their assignment, and at the beginning of each session the professor will explain everything needed to follow the class. Each lab session will be accompanied by a thorough guide describing the work to be done.

The research project will be carried out under the supervision of the professor.

All the material relevant to the course will be available from Racó or the course's website.

Evaluation methodology

Grading is based on exams and on reports on various tasks (labs and a research project) carried out throughout the course.

There will be two partial exams, which together count for 30% of the final grade. Students are expected to hand in 4 lab reports, each about two weeks after the corresponding lab session; together they count for 30% of the final grade. Finally, students will have to deliver a research project by the end of the course, which accounts for 40% of the final grade. The research project is the most important activity and must be understood as a course project (not as one more lab). The labs must be understood as training for the research project.

The formula to compute the final grade is therefore

0.3 * (P1 + P2) / 2 + 0.3 * (L1 + L2 + L3 + L4) / 4 + 0.4 * RP

where P1 and P2 are the scores of the first and second partial exams, Li is the grade of the i-th lab, and RP is the grade of the research project.
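The weighting described above (30% for the two partial exams, 30% for the four labs, 40% for the research project) can be sketched in Python as follows. This assumes that all scores are on the same scale (for example, 0 to 10) and that each weight applies to the average of its component; `final_grade` is a hypothetical helper, not an official grading tool.

```python
def final_grade(p1, p2, labs, rp):
    """Weighted final grade: 30% exams, 30% labs, 40% research project.

    Assumption: every score uses the same scale (e.g. 0-10) and each
    weight is applied to the mean of its component.
    """
    if len(labs) != 4:
        raise ValueError("expected exactly 4 lab grades")
    exam_mean = (p1 + p2) / 2
    lab_mean = sum(labs) / len(labs)
    return 0.3 * exam_mean + 0.3 * lab_mean + 0.4 * rp


if __name__ == "__main__":
    # Exams average 7, labs average 7, project 8.
    print(round(final_grade(6, 8, [5, 7, 9, 7], 8), 2))  # 7.4
```

With every score at the top of the scale the result equals that top score, which is a quick sanity check that the three weights sum to 1.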


Previous capacities

Programming
Basic probability and statistics