ORIE 5260: Machine Learning in Finance

This is an archive of course materials for ORIE 5260, taught at Cornell Tech in 2018.

Course description

Machine learning is a field at the intersection of computer science and statistics that aims to develop computational systems that learn from data and improve with experience. Though its origins lie in the field of artificial intelligence, modern machine learning has transformed a huge variety of areas, such as biology, medicine, e-commerce, retail, marketing, operations, logistics, politics, journalism, and, of course, finance.

This course provides a general introduction to machine learning with a view towards applications in finance. The goal is to provide both a solid grounding in the foundations of machine learning and a conceptual map of the field and its relation to areas like statistics and optimization. The focus is on mathematical and conceptual understanding; the course will occasionally touch on implementation issues and financial examples, but will not emphasize either aspect in coursework.

Topics include linear regression, logistic regression, exponential families, generalized linear models, generative models, support vector machines, loss functions and regularization, sparsity, Bayesian methods, model selection, the EM algorithm, clustering, principal components analysis, and convex optimization and optimization algorithms.

Prerequisites

The course requires background in linear algebra, probability, and optimization at the level of MATH 2940, ORIE 5500, and ORIE 5300.

Course information

Course requirements and grading

The course grade will depend on two factors: attendance and problem sets (there will be no exams). Each problem set will generally span 2-3 weeks, depending on the material being covered, for a total of roughly 6 problem sets. Homework should be typed, preferably in LaTeX, and submitted to the TA by email.

Every student is expected to abide by the Code of Academic Integrity of Cornell University. In particular, you must work on the problem sets alone; you can discuss the problems with other students, but only at the level of a hallway discussion. You should also not consult external references, though it is fine to look up standard mathematical results as long as they are not the subject of a given problem.

Syllabus

The syllabus may be adjusted through the course of the semester. Several diagrams throughout are due to Andrew Ng; Boyd and Vandenberghe; and Hastie, Tibshirani, and Friedman.

Homework

Readings

These readings will be posted intermittently through the semester and are entirely optional. Their goal is to give some exposure to the history, culture, and debates of machine learning, statistics, and data science, and to give additional perspective. Some are included only for historical interest and are not intended to be read cover to cover.