Instructor: Prof. Elisa Celis
Schedule: TR 1:00pm-2:15pm, WLH 203
Course email address: email@example.com
Office hours: M 2:30-3:30 and by appointment, Dana House Room 203
Note: Counts towards EP&E requirements
IMPORTANT: REGISTRATION REQUIREMENTS
Space is limited. If you intend to register, please fill out this form as soon as possible, and no later than 5pm on Tuesday, Jan 15th: https://goo.gl/forms/WvChom5rMv64SOLu1
You will be informed by 5pm on Wednesday, Jan 16th whether you are admitted to the class, waitlisted, or not eligible. I will admit students off the waitlist in as timely a manner as possible. If you are offered a spot, please respond within 24 hours to accept it so that I can promptly inform others who may be interested.
In this course, we will introduce, discuss, and analyze ethical issues, algorithmic challenges, and policy decisions that arise when addressing real-world problems through the lens of data science. We will do this by first grappling with the normative questions of what constitutes bias, fairness, discrimination, or ethics when it comes to data science and machine learning in applications such as policing, health, journalism, and employment. We will incorporate technical precision by introducing quantitative measures that allow us to study how algorithms codify, exacerbate, and/or introduce biases of their own, and we will study analytic methods for correcting or eliminating these biases. Lastly, we will study the social implications of these decisions and examine the legal, political, and policy decisions that could govern data-driven decision making by making it transparent and auditable. We will read critical commentary by practitioners, state-of-the-art technical papers by data scientists and computer scientists, and samples of legal scholarship, moral and ethical philosophy, sociology, and policy documents. We will often ground our discussions in recent case studies, controversies, and current events.
I encourage students from all majors to join the class. While the topic will be focused on data science applications and methods, the assignments and assessment will focus on the ethical definitions, controversies, and challenges and can be completed without explicit knowledge of programming. Please feel free to reach out via the course email address if you have any questions.
- Statistics (required): S&DS 238 OR S&DS 241 OR S&DS 242 OR similar
- Data Analysis (required): S&DS 230 OR ECON 131 OR similar
- AI/ML/Algorithms (recommended): CPSC 470 OR S&DS 365 OR ECON 429b OR similar
- Ethics/Philosophy (recommended): EP&E 215 OR PHIL 175 OR PHIL 177 OR SOCY 144 OR PLSC 262 OR PLSC 320 OR similar
COURSE LEARNING OBJECTIVES
- Develop fluency in the key technical, ethical, policy, and legal terms and concepts related to data science.
- Learn about algorithmic and data-driven approaches for mitigating biases in AI/ML systems.
- Reason through problems with no clear answer in a systematic manner, taking and defending different viewpoints, and justifying your conclusions in a rigorous manner.
- Improve writing and communication skills for both technical and lay audiences.
- Listen to, understand, and communicate with people of varying opinions, viewpoints, and ideas. Disagreement and debate are expected, as is respectful, open communication.
ASSIGNMENTS AND GRADING
- 40% Technical Report. Write a technical paper on one of 5 provided case studies/datasets, analyzing it from one or more perspectives discussed in class. In particular, the paper should address the potential pitfalls of the original study/analysis, propose an alternative approach/analysis, and develop initial results with regard to this approach. The paper should be 4-5 pages, single-spaced (not including references). It should draw extensively from the course materials, lectures, and in-class discussions. It should discuss the technical or societal challenges or unavoidable trade-offs in developing a solution and provide a thoughtful justification of the specific solution you propose. A draft of the report will be due 3 weeks before the end of class, and you will be asked to give feedback to each other on these initial drafts.
- 25% Blog post. You will be assigned one lecture to blog about (7-8 minute read) for a lay audience. The post should clearly introduce the problem at hand, preferably with a real-world scenario as an example, and discuss the ethical, technical, or societal implications in line with the course readings, material, and discussion. The blog post should be in your own words and present your thoughts and views on the topic. It should not be a direct summary of everything covered in class, though it can be inspired by and build on our discussions.
- 15% In-class Participation. Please plan on reading the assigned material before the beginning of class so that we can have a productive discussion. This grade is for participation in class, both in asking and answering questions thoughtfully and in contributing to the discussion. There will also be occasional short in-class writing prompts to be turned in at the end of class.
- 10% Article Presentation. At the beginning of one lecture, you will be asked to give a very brief (2 minutes) informal presentation on a news article on a recent event in which data science / machine learning is affecting a person or society more broadly. The presentation should outline the data set / algorithm and ethical dilemma in question, and connect the topic to the ideas we have been discussing in the course. You will also write a short (2 minute read) blog post summarizing your presentation.
- 10% Colloquium responses. Parallel to this course, there will be a series of 5 (or more) special colloquia organized around the topics of fairness and ethics in data science. I strongly encourage you to attend all of these talks and expect you to attend at least two. There will be one or more question prompts for each talk to which you are expected to respond in writing. The response need only be 1-2 paragraphs, but it should be articulate and make an insightful point that expands on the talk by connecting it to other areas, critiquing it, or otherwise adding something new, rather than summarizing what was covered.
The schedule and readings may change slightly as we progress through the semester based on time and interest. Please check back here for the most up-to-date version.
Welcome/Introduction (2 lectures)
Data Collection and Representation and Privacy (5 lectures)
- Data Sampling and Collection
- Managing Datasets Responsibly and Data Cannibalism
- Is the premise of data science flawed?
- Inference and Privacy
- Re-identification of Data
What is Machine Bias? (4 lectures)
- How can a machine be biased?
- Bias vs Correlation vs Causation
- What is fair? What is discrimination?
- Should we just leave humans in charge?
Solutions to Bias via Algorithmic Fairness? (5 lectures)
- Preprocessing approaches & debiasing datasets
- Impossibility Results
- In-Processing approaches to fairness
- Fairness in Deep Learning
- Representative Fairness
Social Implications and Feedback Loops (4 lectures)
- Polarization and feedback loops
- Algorithmic Persuasion
- Employment, advertising, and opportunity
- Who is data science?
Controlling ML Systems (5 lectures)
- Accountability — Who is responsible when a machine takes a decision?
- Auditing Algorithms — How can we check?
Honesty: I expect honesty in your work and in your communication. All submitted work must be yours and original with all sources cited appropriately. Please also review the resources at the writing center regarding academic integrity and plagiarism, and contact me if you have any questions or concerns about appropriately acknowledging others’ work in your submitted assignments. Note that I may use software to check for plagiarism. https://ctl.yale.edu/writing/wr-instructor-resources/addressing-academic-integrity-and-plagiarism
Respect: I expect you to exhibit respect for self, for others, for scholarship and research, and for the intellectual heritage of yourself and of your peers. This includes speaking and writing to each other respectfully and responding thoughtfully to others, present or otherwise, especially but not only in discussions of potentially controversial topics and ideas.
Responsibility: I expect you to assume shared responsibility for promoting the academic integrity, rigor, and production of our joint work as a class, both in our classroom and in our public blog. Please support each other, support ethical behavior, commit to excellence and depth of thought so that we can collectively consume and produce work with intellectual and social value.