Post written by an S&DS150 student!
77% of the United States population used social media in 2018. The largest social media platform, Facebook, thus holds an enormous amount of data. Not only does Facebook have data from its users’ profiles, it also has data on their private messages, who they interact with, how long they look at different ads, what news sources they read, and the list goes on.
Even as we hear of data breaches, the vast majority of us continue to use Facebook. It is a central part of our world, and we value the benefits of the social network more than we value our private data. Facebook agrees. It has argued that consumers cannot sue it even when it directly violates the law, because there is no private right of action (the ability of individuals to sue a company for violating a statute); as Neema Singh Guliani writes in the New York Times, “Facebook is paradoxically arguing that privacy itself has no price.”
Many of us consider our privacy to be a fundamental right, but are we protected from privacy breaches, especially in this technological age? The Universal Declaration of Human Rights states that we are: “No one shall be subjected to arbitrary interference with his privacy, family, home, or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.” Yet privacy is never even mentioned in the United States Constitution. Instead, it falls under the catch-all of the Ninth Amendment: “The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.”
With no federal data privacy law, most invasions of privacy fall under tort law, with cases brought by individuals and tried in civil court. If we can sue companies for damages under tort law, what’s the problem? Tort law requires identifiable damages to the individual that can be traced causally to the company’s actions. When another company buys data collected by a Facebook app, as in the Cambridge Analytica case, multiple problems arise. First, how do you know that your data was sold? Second, how do you quantify the monetary damage that the sale of your data caused you? In cases where the release of your data to the public or to your employer causes job loss or difficulty in securing a job, monetary damages are clearer. But when your data is used in a political campaign, how do you make a case for damages? Your data may have helped the campaigners sway your vote and elect a president who is not in your best interest. But how do you prove that the campaign changed your vote? How do you prove that your data changed enough votes to change the outcome of the election? How do you prove that the other candidates would have been better for you? None of this is clear, and such a case becomes very difficult to try in civil court.
If there is a law for data privacy, however, the state must try the case in criminal court. Companies such as Facebook, Amazon, and Google employ so many people that hurting them could hurt the states and cities that house them. Will the government ever file cases against these giants? More likely, it will allow them to keep fueling local economies until a breach affects so many people so deeply that it simply cannot be ignored. But by that point, users will have abandoned the platforms as well, making the court case less relevant.
If we do move toward a federal data privacy law, such as the European Union’s General Data Protection Regulation (GDPR), what will this law contain? How should we propose that our data be protected? Given the number of secondary data sets available for linkage and the power of modern re-identification algorithms, removing personally identifiable information is not enough to keep data anonymous.
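A toy sketch makes this linkage concrete. All records and field names below are invented for illustration: an “anonymized” medical table (names stripped) is joined to a hypothetical public table, such as a voter roll, on quasi-identifiers like ZIP code, birth date, and gender, re-identifying every record.

```python
# "Anonymized" data: names removed, but quasi-identifiers retained.
medical = [
    {"zip": "06511", "birth": "1985-03-02", "gender": "F", "diagnosis": "asthma"},
    {"zip": "06520", "birth": "1990-07-19", "gender": "M", "diagnosis": "diabetes"},
]

# Public data that still carries names, e.g. a voter roll.
voters = [
    {"name": "Alice Smith", "zip": "06511", "birth": "1985-03-02", "gender": "F"},
    {"name": "Bob Jones", "zip": "06520", "birth": "1990-07-19", "gender": "M"},
]

def link(medical, voters):
    """Re-identify 'anonymous' records by joining on quasi-identifiers."""
    key = lambda r: (r["zip"], r["birth"], r["gender"])
    names_by_key = {key(v): v["name"] for v in voters}
    return {names_by_key[key(m)]: m["diagnosis"]
            for m in medical if key(m) in names_by_key}

print(link(medical, voters))
# {'Alice Smith': 'asthma', 'Bob Jones': 'diabetes'}
```

No names ever appeared in the medical table, yet every diagnosis is now attached to a name — which is why stripping direct identifiers alone is considered insufficient.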
Do we add a data curator and only allow meta-level queries into a database? The problem with this approach is that there is no way to truly know that the curator will stay trustworthy. The curator has access to all the raw data and could leak or misuse it. To solve this, we can add noise. There are two frameworks: local privacy and global privacy, both of which involve adding noise. In a local privacy framework, noise is added to each raw data point before it is given to the untrusted curator. The curator then holds only noisy data, which preserves the privacy of each data point because noisy data is much harder to re-identify. The second option is global privacy, where the raw data is given to a trusted curator, and the curator adds noise to the answer to each query. Beyond requiring a trusted curator, this poses another problem: a querier could ask the same query over and over and average the answers, essentially eliminating the noise by the law of large numbers. We then need to limit the number of queries allowed, as well as the type of queries allowed. How do you control this? If you limit the number of queries allowed per day, then over a long enough time the noise could still be canceled out.
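The averaging attack on global privacy can be demonstrated numerically. This is a minimal sketch, not any real system’s mechanism: a curator answers a count query by adding Laplace noise (a common choice for this kind of perturbation), and a querier who simply repeats the query many times recovers the true count almost exactly.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def noisy_count(true_count, scale, rng):
    """Global-privacy style answer: the true count plus Laplace noise."""
    return true_count + laplace_noise(scale, rng)

rng = random.Random(42)
TRUE_COUNT = 120  # hypothetical: number of users matching some query

# A single answer is well hidden by the noise...
one_answer = noisy_count(TRUE_COUNT, scale=10.0, rng=rng)

# ...but averaging many repeated answers cancels the noise out:
# the law of large numbers working against the curator.
answers = [noisy_count(TRUE_COUNT, scale=10.0, rng=rng) for _ in range(10_000)]
average = sum(answers) / len(answers)

print(f"single answer: {one_answer:.1f}")
print(f"average of 10,000 answers: {average:.2f}")  # very close to 120
```

The standard deviation of the average shrinks with the square root of the number of queries, so any fixed per-day limit only delays the attack rather than preventing it.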
One common framework for reasoning about this noise is differential privacy. Differential privacy quantifies how much noise must be added so that the output of an analysis is essentially the same whether or not any one person’s data is included. Since an observer cannot even tell whether a person is in the data set, nothing can be determined about that person. Differential privacy thus provides a principled way to decide whether data is private “enough.” However, the level of privacy must still be set through a parameter, epsilon: the smaller the epsilon, the more noise and the stronger the guarantee. The statement “we use differential privacy” can also become a label that lets consumers conclude a company is handling their data ethically and stop worrying about it. Apple, thought to be the best tech giant in terms of data privacy, uses differential privacy but does not release the value of epsilon it uses or its source code. Researchers who tested the system found that Apple does not meet current standards for an appropriate level of differential privacy.
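To make the role of epsilon concrete, here is a sketch of one classic epsilon-differentially-private mechanism, randomized response. The attribute, epsilon value, and population numbers are illustrative choices, not Apple’s actual mechanism: each user flips their true yes/no answer with a probability set by epsilon, yet the analyst can still debias the noisy reports into an accurate population-level estimate.

```python
import math
import random

def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    This satisfies epsilon-differential privacy for one yes/no attribute."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else 1 - true_bit

def estimate_proportion(reports, epsilon):
    """Debias the noisy reports to estimate the true fraction of 'yes' answers."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = q * p_truth + (1 - q) * (1 - p_truth); solve for q.
    return (observed + p_truth - 1.0) / (2.0 * p_truth - 1.0)

rng = random.Random(0)
TRUE_FRACTION = 0.30  # hypothetical: 30% of users have some sensitive attribute
EPSILON = 1.0         # smaller epsilon = more flipping = stronger privacy
users = [1 if rng.random() < TRUE_FRACTION else 0 for _ in range(50_000)]

reports = [randomized_response(bit, EPSILON, rng) for bit in users]
estimate = estimate_proportion(reports, EPSILON)
print(f"estimated fraction: {estimate:.3f}")  # close to 0.30
```

Any individual report is deniable (with epsilon = 1, a user answers truthfully only about 73% of the time), which is why the choice of epsilon matters: a company that quietly uses a large epsilon gets accurate data with little real privacy, and without the published value there is no way to tell.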
Is requiring that companies satisfy a certain level of differential privacy before giving away their data enough? GDPR grants data subjects the following rights: breach notification, the right to access, the right to be forgotten, data portability, and privacy by design. This means users must be informed of any breach promptly, can access their data, can have their data erased at any time, can move their data, and have only minimal data stored about them. GDPR also requires that consent forms be easy to understand and that withdrawing consent be just as easy as giving it. Would a similar regulation in the US be enough? What duty does Facebook have to its users to protect their privacy? What duty does it have to allow users the freedom to use the platform under their own conditions?
Currently, if users don’t accept the terms and conditions, they simply cannot make an account and use the platform. For sites such as Facebook, opting out doesn’t feel like an option to many people. Facebook is how many stay connected to their friends, check the news, talk to their relatives, find events to attend, and get invited to important events by their friends. Not having a Facebook account puts people at a social disadvantage. They have to find out about events through other channels and put in more effort to do things many of us do instantaneously on Facebook, such as keeping up with current news. And by not having a Facebook account, they are excluded from massive data sets, which may actually harm them later. When they are underrepresented in the data sets that drive decisions affecting their lives, their decision to opt out of generally unreasonable terms and conditions has denied them not only the opportunity to use the platform but also the right to be represented in decisions that affect their lives.
If we instead allow users to modify the terms and conditions, opting in or out of each intended use of their data, Facebook would serve its users far better. It would still produce large amounts of data to fuel research and decision-making systems, but the data would be opt-in, and anyone could be included in or excluded from any part of the conditions without the threat of losing access to the site. The current system of accepting every condition or being unable to use the platform is not in the best interests of the users.
One caveat of this approach is that opt-in data suffers from selection bias, but this bias is not much greater than the selection bias that already comes from collecting data only from Facebook users. As our statistical methods for dealing with imperfect data improve, this is something we as a community should be willing to work around in order to give users the flexibility to control how their data is used without being penalized for their choices.
Making the terms transparent and flexible would allow users to regain their trust in Facebook and its research and business practices, letting more users continue using the site happily and allowing research to continue. If nothing changes and data breaches keep occurring without penalty, users will likely turn elsewhere in hopes of finding more respect for their privacy online.
“Percentage of US Population with a Social Network Profile,” Statista, 2019.
Neema Singh Guliani, “We Should Be Able to Take Facebook to Court,” The New York Times, 2019.
Andy Greenberg, “How One of Apple’s Key Privacy Safeguards Falls Short,” Wired, 2017.