Credits
6
Types
- MIRI: Elective
- MDS: Elective
- MEI: Elective
Requirements
This subject has no requirements, but it assumes some previous capacities.
Department
CS
This course covers a myriad of statistical laws of language (beyond the scope of traditional courses on information retrieval or natural language processing), how to analyze them and their origins.
A fundamental working hypothesis is that these laws emerge from the need to reduce the cognitive effort of speakers or listeners. The course emphasizes potential explanations in terms of general principles of cognition in humans and other species, and it covers the mathematical and computational models that have been developed to explain these regularities. Along the way, students will enrich their current knowledge with concepts and tools from linguistics, biology, cognitive science, information theory and multidisciplinary physics, all viewed from the bird's-eye perspective of the philosophy of science.
The course is relevant to researchers interested in getting the most out of linguistic data, as well as in evaluating or adapting algorithms, machine learning methods and similar tools based on the real statistical properties of language and the underlying theory. As these regularities are often the result of reducing the cognitive effort of language users, the course is also relevant to researchers interested in developing resources or systems that are easier for humans to use or understand, or in developing language processing tools that exploit the real constraints of the human brain.
Teachers
Person in charge
- Ramon Ferrer Cancho ( rferrericancho@cs.upc.edu )
Weekly hours
Theory
2.5
Problems
0.5
Laboratory
1
Guided learning
0
Autonomous learning
7.11
Competences
Transversals
Objectives
- Know the foundations of science and the scientific method. Understand the difference between hypothesis and theory, between modeling and understanding, between describing and explaining, between manifestation and principle. Understand the value of prediction and the types of prediction.
  Related competences: CTR6
- Learn about the statistical laws of language and their origins.
  Related competences: CTR4, CTR6
- Know and understand the principles of organization of languages and other communication systems.
  Related competences: CTR6
- Know the mathematical foundations of quantitative linguistics. Know basic probability theory and information theory.
  Related competences: CTR6
- Know the statistical analysis methods of quantitative linguistics.
  Related competences: CTR4
- Learn how to write a scientific article. Know how to distinguish between a laboratory report and a research paper.
  Related competences: CTR3, CTR4, CTR6, CTR9
Contents
- Introduction to Quantitative Linguistics
  What is quantitative linguistics? Overview of linguistic laws, key concepts and research problems in quantitative linguistics.
- Law of abbreviation and problem of compression
  The law of abbreviation in humans and other species. Methods of analysis of the law of abbreviation. Introduction to information theory. Predictions of optimal coding.
- Information theory
  Classic information theory and extensions for natural communication systems.
- Theory of power laws
  Relationships between power laws. Inference of power laws. Power-law analysis methods.
- Models of Zipf's law for word frequencies
  Debowski's bounds. Classic models. Zipfian optimization models of communication.
- The statistical structure of symbolic sequences
  Word returns. Correlations in symbolic sequences. Persistence and antipersistence. n-gram models. Generative models.
- Dependency syntax
  Introduction to dependency syntax. Formal constraints on syntactic dependency structures.
- Word order theory
  Word order principles. Predictions. The order of subject (S), direct object (O) and verb (V).
- Theory construction
  The scientific method. A general theory. Closing.
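As a rough illustration of the kind of data behind two of the topics above (word frequencies and the law of abbreviation), the following minimal sketch builds a rank-frequency table and compares word length across frequency classes. The toy sentence and the function names `rank_frequency` and `mean_length_by_frequency` are assumptions made for illustration only; they are not part of the course material, and a real analysis would use a large corpus and proper statistical tests.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by decreasing frequency; break ties alphabetically so the
    # ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


def mean_length_by_frequency(tokens):
    """Map each observed frequency to the mean length (in characters)
    of the word types that occur with that frequency."""
    counts = Counter(tokens)
    lengths = {}
    for word, freq in counts.items():
        lengths.setdefault(freq, []).append(len(word))
    return {freq: sum(ls) / len(ls) for freq, ls in lengths.items()}


# Hypothetical toy text, invented for this example.
tokens = "the cat saw the dog and the dog saw a caterpillar".split()

for rank, word, freq in rank_frequency(tokens):
    print(rank, word, freq)

# Under the law of abbreviation, more frequent words tend to be shorter.
print(mean_length_by_frequency(tokens))
```

On a text this small the numbers only hint at a tendency; they are meant to show the shape of the data (ranks, frequencies, lengths) rather than to support any conclusion.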
Activities
Activity Evaluation act
- Theory: 4h. Problems: 1h. Laboratory: 3.5h. Guided learning: 0h. Autonomous learning: 7.1h.
- Theory: 4h. Problems: 1h. Laboratory: 0h. Guided learning: 0h. Autonomous learning: 7.1h.
- Theory: 4h. Problems: 1h. Laboratory: 3.5h. Guided learning: 0h. Autonomous learning: 7.1h.
- Theory: 3h. Problems: 0h. Laboratory: 0h. Guided learning: 0h. Autonomous learning: 5.3h.
Teaching methodology
The theory sessions will be led primarily by the professor, using either the blackboard or projected slides. The lab work will be done at the computer. Students are expected to work on their assignment, and the professor will explain everything needed to follow the class at the beginning of the session. Each lab session will be accompanied by a thorough guide describing the work that needs to be done.
The research project will be carried out under the supervision of the professor.
All the material relevant for the course will be available from Racó or the course's website.
Evaluation methodology
Grading is based on exams and on reports on various tasks (labs and a research project) throughout the course. There will be two partial exams, which together count toward 30% of the final grade. Students are expected to hand in 4 lab reports, each about two weeks after its corresponding lab session; together they count toward 30% of the final grade. Finally, students will have to deliver a research project by the end of the course that accounts for 40% of the final grade. The research project is the most important activity and must be understood as a course project (not as one more lab). Labs must be understood as training for the research project.
The formula to compute the final grade is therefore
0.15 * (P1 + P2) + 0.075 * (L1 + L2 + L3 + L4) + 0.4 * RP
where P1 is the score of the first partial exam, P2 is the score of the second partial exam, Li is the grade of the i-th lab and RP is the grade of the research project. With these coefficients the exams contribute 30%, the labs 30% and the research project 40%, in line with the percentages stated above.
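The weighting described in the text (30% exams, 30% labs, 40% research project) can be sketched as a small calculation. This is a minimal illustration, assuming that every item is graded on the same scale and that items within a component weigh equally; the function name `final_grade` and the sample scores are hypothetical.

```python
def final_grade(partials, labs, research_project):
    """Combine scores using the stated weights: 30% for the two partial
    exams, 30% for the four lab reports, 40% for the research project.

    Assumes all scores share one scale and that items within a
    component are weighted equally.
    """
    if len(partials) != 2:
        raise ValueError("expected exactly 2 partial exam scores")
    if len(labs) != 4:
        raise ValueError("expected exactly 4 lab report scores")
    exam_mean = sum(partials) / len(partials)
    lab_mean = sum(labs) / len(labs)
    return 0.3 * exam_mean + 0.3 * lab_mean + 0.4 * research_project


# Hypothetical scores, invented for this example.
grade = final_grade(partials=[7.0, 8.0],
                    labs=[9.0, 8.0, 10.0, 9.0],
                    research_project=8.5)
print(round(grade, 2))
```

Averaging within each component before applying the weights keeps the result on the same scale as the individual scores, which makes it easy to check that the weights add up to 100%.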
Bibliography
Basic
- Linguistic laws in biology
  Semple, S., Ferrer-i-Cancho, R. & Gustison, M., Trends in Ecology and Evolution, 2022.
  https://doi.org/10.1016/j.tree.2021.08.012
- Human behavior and the principle of least effort
  Zipf, George K., Addison-Wesley Press, 1949.
- Quantitative linguistics, an invitation
  Best, Karl-Heinz; Rottmann, Otto, Ram-Verlag, 2017. ISBN: 9783942303514
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004215239706711&context=L&vid=34CSUC_UPC:VU1
- Adaptive languages: an information-theoretic account of linguistic diversity
  Bentz, Christian, 2018. ISBN: 9783110557770
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005066679206711&context=L&vid=34CSUC_UPC:VU1
- Statistical universals of language: mathematical chance vs human choice
  Tanaka-Ishii, Kumiko, Springer, 2021. ISBN: 9783030593773
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001812669706711&context=L&vid=34CSUC_UPC:VU1
- Communicative efficiency: Language structure and use
  Levshina, Natalia, Cambridge University Press, 2023.
- Analyzing linguistic data: a practical introduction to statistics using R
  Baayen, R. Harald, Cambridge University Press, 2008. ISBN: 9780521709187
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003354579706711&context=L&vid=34CSUC_UPC:VU1
- La Ciencia: su método y su filosofía
  Bunge, Mario, Sudamericana, 1995. ISBN: 9500710439
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002752309706711&context=L&vid=34CSUC_UPC:VU1
Complementary
- The calculus of linguistic observations
  Herdan, Gustav, 1962. ISBN: 9783112415443
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005066679006711&context=L&vid=34CSUC_UPC:VU1
- Information theory meets power laws: Stochastic processes and language models
  Debowski, Lukasz, Wiley, 2021.
Web links
- Laws of language outside human language. Statistical laws of language in the behavior of other species, genomes and beyond https://cqllab.upc.edu/biblio/laws/
Previous capacities
- Programming
- Basic probability and statistics