‘Legal by Design’ or ‘Legal Protection by Design’?
Abstract and Keywords
This chapter focuses on how machine learning (ML) and distributed ledger technologies (DLTs) change the environment of the law, the substance of legal goods, and on the extent to which these changes affect legal protection. ML applications, for example, can decide a person's credit worthiness or employability. Moreover, DLTs can, for instance, self-execute transactions and policies without and beyond the law. One of the main challenges here thus concerns the regulatory effects of these novel technologies and the potential incompatibility of legal protection with techno-regulation (defined as the regulatory effects of a technology, whether or not intended). This challenge will be discussed in terms of automated compliance (‘legal by design’) and technological articulation of fundamental rights (‘legal protection by design’).
Policymakers, lawyers, and other folk often speak of ‘regulating technologies’. This is an interesting phrase, because it can mean many things, depending on how you ‘read’ it. In the old days, most lawyers and policymakers would understand it in the sense of technologies being the object of legal regulation. The law can, for instance, impose requirements on the fabrication, design, sale, and use of cars, knives, guns, housing, office space, washing machines, toys, or medical instruments. These requirements may concern safety, privacy, or a technology’s potential to violate copyright, to disseminate child pornography, or to generate pollution of the environment. They may be aimed at protecting weaker parties, critical infrastructure, national or public security, or the environment. The default response that technologies are the object of regulation may, however, be changing.
The same phrase (‘regulating technologies’) can also refer to technology as a ‘subject’ that is regulating human behaviour, for example, by way of speed bumps, digital rights management (DRM) technologies, news feed algorithms that determine what news we perceive, and other default settings that determine our ‘choice architecture’. Here, the object of regulation is not a technology but human behaviour. So, technology can be either the object or the subject of regulation (and maybe both), whereas law is usually only seen as a subject of regulation (that which regulates).
This may be about to change due to the pervasive effects of two types of technologies that impact the environment of the law: machine learning (ML) applications that, for example, decide a person’s credit worthiness or employability, and distributed ledger technologies (DLTs) that allegedly self-execute transactions and agreements without and beyond the law.
10.1 Machine Learning (ML)
To understand the relevance of ML for legal protection, it may help to look at a very simple example, such as AB testing. Imagine that the provider of a website wants to ‘optimize’ it to achieve higher performance in terms of influencing its visitors’ purchasing behaviours, their reading habits, or their political preferences.
Let’s see if this qualifies as an example of ML. In his handbook on Machine Learning, Tom Mitchell recounts that:
A computer program is said to learn
• from experience E
• with respect to some class of tasks T
• and performance measure P,
• if its performance at tasks in T,
• as measured by P,
• improves with experience E.
As to type of task T: this clearly sets out that machines do not learn anything if no task is defined. In this case, the task will be defined by the website ‘owner’, together with the software provider, because the definition of what counts as desirable behaviour needs to be translated into machine-readable language. A webshop may find increased purchasing behaviour desirable, though they may also formulate more complex tasks, based on a segmentation of the visitors: they may prefer to increase the purchasing behaviour of people who buy expensive products, or of people who are likely to buy more than one product over the course of a specified period of time.
As to experience E: note that the experience of this software is limited to clickstream behaviours of visitors of the page, even if they can be followed on other sites. It may be that their behaviours on other sites are not within the tracking-scope of the software provider (e.g. in offline shops or via another browser), whereas those unknown behaviours are actually more relevant for an inference about their preferences. The software’s experience, however, is necessarily limited to the available training data.
As to performance metric P: it may be that a simple performance metric, such as ‘clicks on one product’ or ‘buys at least two products’, does not really say much about the preferences of the visitors, because these behaviours are instances of situated behaviour that depends on many other factors. These other factors may be more indicative of their preferences. To test both versions against each other, one may need to test six or seven different performance metrics to obtain a better picture of what qualifies as an accurate measure of achieving desirable behaviour.
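To make the three elements concrete, the following is a minimal sketch of an AB test in Mitchell's terms. It is deliberately simplified and only shows where T, E, and P enter. The class `Variant`, the function names, and all counts are hypothetical and invented for illustration; real AB testing software would also have to deal with sampling noise and with the choice among several metrics.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    visitors: int   # experience E: visits logged for this version of the page
    purchases: int  # the logged behaviour that the task T is about


def conversion_rate(v: Variant) -> float:
    """Performance measure P: share of visitors who bought something."""
    return v.purchases / v.visitors if v.visitors else 0.0


def pick_variant(a: Variant, b: Variant) -> Variant:
    """Task T: choose the version of the page that scores higher on P."""
    return a if conversion_rate(a) >= conversion_rate(b) else b


# Hypothetical counts, invented for illustration only.
page_a = Variant("A", visitors=1000, purchases=30)
page_b = Variant("B", visitors=1000, purchases=42)
winner = pick_variant(page_a, page_b)
print(winner.name, conversion_rate(winner))
```

Note that changing P (for instance to ‘buys at least two products’) may change the winner, which is exactly why the choice of metric matters.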
10.1.1 Exploratory and confirmatory ML research design
AB testing can be done by way of an exploratory research design, meant to generate hypotheses about what kind of behaviour is more lucrative for the webshop. This implies recognition that such AB testing is a matter of real-time experimentation. As Hofman, Sharma, and Watts write:
In exploratory analyses, researchers are free to study different tasks, fit multiple models, try various exclusion rules, and test on multiple performance metrics. When reporting their findings, however, they should transparently declare their full sequence of design choices to avoid creating a false impression of having confirmed a hypothesis rather than simply having generated one (3). Relatedly, they should report performance in terms of multiple metrics to avoid creating a false appearance of accuracy.
Claiming success based on such AB testing is a very bad idea, and usually amounts to what statisticians call p-hacking. For a reliable prediction one needs a confirmatory research design that provides tested and testable hypotheses about the preferences of visitors. As Hofman, Sharma, and Watts write:
To qualify research as confirmatory, however, researchers should be required to preregister their research designs, including data preprocessing choices, model specifications, evaluation metrics, and out-of-sample predictions, in a public forum such as the Open Science Framework (https://osf.io).
As one can understand, providers of marketing software that enables micro-targeting or underlies behavioural advertising will not be inclined to deposit their research design, including pre-processing choices, at the OSF.
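The risk of p-hacking when many metrics are tried can be illustrated with a small calculation. The sketch below assumes that the tests are statistically independent and uses the conventional significance level of 0.05; both are assumptions made for illustration, not claims about any particular AB testing tool. The function name is hypothetical.

```python
def chance_of_spurious_win(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_metrics` independent tests
    looks 'significant' at level `alpha` although there is no real effect."""
    return 1.0 - (1.0 - alpha) ** num_metrics


# One metric, the 'six or seven' metrics of an exploratory design, and twenty.
for k in (1, 7, 20):
    print(k, round(chance_of_spurious_win(k), 3))
```

Under these assumptions, trying seven metrics and reporting only the one that ‘worked’ yields a false success in roughly three out of ten cases, which is why an exploratory finding should not be presented as a confirmed one.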
10.1.2 Implications of micro-targeting
Instead, the result of micro-targeting based on flawed research design may be that visitors of websites are confronted with a personalized choice architecture that is meant to lure them into what others find desirable behaviour.
These consequences are not necessarily envisaged by developers or users of the software; they are brought about by mistaking—potentially crappy—exploratory research design for robust confirmatory research design.
This raises issues for legal protection. For instance, the mining and inferencing of behavioural data may interfere with specified fundamental rights, such as privacy, data protection, non-discrimination and freedom of expression. Behavioural data are often personal data and the mining of such data may infringe the privacy of those unaware of the rich profiles that can be built from such data, often combined with features that are inferred from such data. This may be in direct violation of the fundamental right to data protection, depending on how the data is mined and shared, on what ground, and with what purpose (see above, section 5.5.2). Based on micro-targeting, the mining and inferencing of behavioural data may also violate the freedom of expression, since this right includes the freedom to receive information free of censorship. Micro-targeting based on AB testing could shield information from certain people, because there is no added value for the website owner in providing them with such information. We have entered the era of ad-driven content, where the algorithms that infer what content is most conducive to attracting visitors may be prioritized in order to increase ad revenue. The use of ‘low hanging fruit’ to train ML algorithms will easily result in all kinds of unwarranted bias, due to the bias that is inherent in the so-called ‘training data’. Even if the right kind of data is available, the choice of the feature space, the hypothesis space, the task that is formulated, and the performance metric that is chosen may result in a biased outcome that systematically discriminates against people based on their race, ethnicity, religion, political preferences, gender, or sexual orientation.
An example of such bias is the proprietary COMPAS software, sold by Equivant (formerly Northpointe), where COMPAS stands for Correctional Offender Management Profiling for Alternative Sanctions. COMPAS is used by courts in the United States to assess the risk that an offender will recidivate (i.e. commit another offence after being released). This risk co-determines the parole or sentencing decisions. The risk score is based on a limited number of data points that have been found to correlate with re-offending. COMPAS is the result of an ML research design that tested 137 features to infer which six features were actually predictive. After Julia Angwin conducted her own research on similar training data, she claimed that COMPAS discriminates against people based on their race.
According to Equivant, this was the result of the fact that black persons on average had a higher risk of reoffending. Equivant accused Angwin of using a flawed methodology, implying that the laws of statistics were responsible for the disparate outcome of the risk score. As a use case, the accusation of racial discrimination has generated a flood of scientific literature on fairness in ML, underpinning requests for transparency and accountability, basically demanding that business and government employ FAT ML (fair, accountable, transparent machine learning applications).
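A minimal sketch may help to see how a dispute of this kind can arise even when nobody miscalculates. It uses a standard identity that follows from the definitions of the confusion-matrix cells, and assumes (hypothetically) that a risk score is equally predictive for two groups, in the sense of having the same positive predictive value and the same false negative rate. All numbers are invented; they are not taken from COMPAS or from any study, and the sketch is not a reconstruction of either party's analysis.

```python
def false_positive_rate(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by a group's base rate of reoffending,
    the score's positive predictive value (PPV) and its false negative
    rate (FNR). Derived from the definitions of the confusion-matrix cells."""
    return (base_rate / (1.0 - base_rate)) * ((1.0 - ppv) / ppv) * (1.0 - fnr)


# Hypothetical numbers: same PPV and FNR, different base rates per group.
ppv, fnr = 0.6, 0.3
fpr_group_1 = false_positive_rate(base_rate=0.5, ppv=ppv, fnr=fnr)
fpr_group_2 = false_positive_rate(base_rate=0.3, ppv=ppv, fnr=fnr)
print(fpr_group_1, fpr_group_2)
```

With different base rates, the group with the higher base rate ends up with the higher share of people wrongly flagged as high risk. Which of these measures should count as ‘fair’ is a normative choice that the statistics cannot settle.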
10.1.3 Implications of micro-targeting for the rule of law
The second issue for legal protection concerns the extent to which decisions based on ML-inferences violate core principles of the Rule of Law, such as transparency and accountability.
The second and third requirements concern the decision. In public administration, decisions must be taken in accordance with the legality principle, meaning that the justification must be based on law and citizens have a right to contest the decision in a court of law (see above, section 3.1.2). In the private sector, however, the freedom of contract and the freedom to dispose of one’s property as one wishes may provide the justification. These freedoms, however, are restricted, for instance due to the prohibition of discrimination in the context of employment, or of discrimination based on gender or race. Both in public administration and commercial enterprise, ML-based decision-making may involve invisible discrimination that is actually prohibited, for instance based on race. Such discrimination will often be unintended and invisible because it is based on a concerted set of features that correlate with race and therefore act as proxies for race. This means that such discrimination need not be based on a deliberate attempt to use race as a relevant feature; even if one removes race as a feature altogether, the proxies will probably sustain the discrimination.
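The proxy effect can be sketched with a toy example. The data below are invented, the group labels ‘x’ and ‘y’ are placeholders, and the postcode rule is a hypothetical stand-in for whatever features a real system would use. The point is only that the decision rule never reads the protected attribute, yet the outcomes differ sharply between groups.

```python
# Invented applicants; 'group' stands for a protected attribute.
applicants = [
    {"group": "x", "postcode": "north"},
    {"group": "x", "postcode": "north"},
    {"group": "x", "postcode": "north"},
    {"group": "x", "postcode": "south"},
    {"group": "y", "postcode": "south"},
    {"group": "y", "postcode": "south"},
    {"group": "y", "postcode": "south"},
    {"group": "y", "postcode": "north"},
]


def decide(applicant: dict) -> bool:
    """Decision rule that only looks at postcode, never at 'group'."""
    return applicant["postcode"] == "south"


def acceptance_rate(group: str) -> float:
    """Share of positive decisions among the members of one group."""
    members = [a for a in applicants if a["group"] == group]
    return sum(decide(a) for a in members) / len(members)


print(acceptance_rate("x"), acceptance_rate("y"))
```

Because postcode correlates with group membership in this toy data, removing the ‘group’ field from the input changes nothing about the disparity.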
Apart from prohibited discrimination, decision-making based on applied ML may have other repercussions.
In that case, individuals are basically treated on the basis of a score that probably does not apply to them. Even if such classification of individuals does not involve prohibited discrimination, it may be seen as unfair. For instance, on average women may have a risk of one out of eight of suffering from breast cancer. Depending on a woman’s age, the occurrence of breast cancer in her ancestry and family, her lifestyle, and other factors, her risk will stray from ‘one out of eight’, to a potentially much higher or lower risk. Treating each and every woman as if her risk is one out of eight would therefore be unwise, and in the case of, for example, a health insurance premium one might argue this is unfair. This explains why the explainability of decisions based on the application of ML has become a serious issue of legal protection.
In terms of the General Data Protection Regulation (GDPR), personalized targeting based on ML would most often fall within the scope of Article 4(4):
‘profiling’ means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements;
10.2 Distributed Ledger Technologies (DLTs), Smart Contracts, and Smart Regulation
As the development and usage of DLTs and/or Blockchains are in full flux, so is the terminology.
DLTs are often promoted as providing ‘trustless’ computing that enables immutable, transparent, and secure storage of transactions, with a guarantee against ex-post manipulation of previous transactions, thus ensuring the integrity of both the sequence and the content of the transactions (where the integrity of the sequence protects against ‘double spending’). Often DLTs are ‘sold’ as enabling disintermediation, meaning that users need not connect with a traditional institution (such as banks) to engage in trustworthy transactions with parties they do not know or do not trust. The idea is that the ledger allows them to interact with others in a fully transparent way, with certainty that neither the other party nor any third party can manipulate stored transactions. In a sense, the promise is that the technology can take over the role of a trusted intermediary by way of a fully predictable sequence of events that self-executes tamper-free transactions.
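The promise of protection against ex-post manipulation rests on chaining cryptographic hashes. The following is a minimal, single-machine sketch of that one ingredient, using Python's standard `hashlib`. It is not an implementation of any actual DLT: distribution over many nodes, consensus, and protection against double spending are left out, and the function names and transactions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder 'previous hash' for the first entry


def block_hash(index: int, transaction: str, previous_hash: str) -> str:
    """Hash that binds a transaction to its position and its predecessor."""
    payload = json.dumps([index, transaction, previous_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(ledger: list, transaction: str) -> None:
    """Add a transaction, linking it to the hash of the previous entry."""
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS
    index = len(ledger)
    ledger.append({
        "index": index,
        "transaction": transaction,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transaction, previous_hash),
    })


def is_intact(ledger: list) -> bool:
    """Recompute every hash; any change to content or sequence shows up."""
    previous_hash = GENESIS
    for block in ledger:
        expected = block_hash(block["index"], block["transaction"], previous_hash)
        if block["previous_hash"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True


ledger = []
append(ledger, "Alice pays Bob 5")
append(ledger, "Bob pays Carol 2")
print(is_intact(ledger))
ledger[0]["transaction"] = "Alice pays Bob 500"  # ex-post manipulation
print(is_intact(ledger))
```

In this sketch tampering is detectable rather than impossible; whoever controls the only copy could simply recompute all hashes, which is one reason why the distributed nature of the ledger matters for the promise of immutability.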
The difference between public and private DLTs can be defined as depending on who can ‘read’ the content, and the difference between permissioned and non-permissioned can be defined as depending on who can add or ‘write’ new content. Bitcoin builds on public non-permissioned DLTs, meaning that anybody can check the content and submit new content. By now, commercial enterprises, financial institutions, as well as government agencies probe business cases for DLTs, often resorting to private permissioned versions that lack part of the lure of a decentralized system, because with private permissioned DLTs only a specified set of players is allowed to read and write on the ledger.
Taking into account that most users do not understand computer code, such DLTs basically reinforce the role of the institutions that employ them; they require more trust, not less, and they certainly do not achieve disintermediation.
10.2.1 Smart contracts and smart regulation
For this chapter, the relevance of DLTs concerns so-called smart contracts and smart regulation, that is, the use of DLT to self-execute either an agreed contract or a specified policy based on regulatory competence. As to the first, we can think of a contract of sale that self-executes once triggered (when the system detects payment it transfers the object, or the other way around). Note that this may work perfectly if both the payment (e.g. cryptocurrency) and the assets (e.g. an electronic proof of ownership) are within the system (often referred to as being on-chain). Off-chain payments or off-chain transfer of assets, however, will require the use of ‘oracles’, that is, software applications that interface between the ledger and the real world, or other systems.
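The difference between the on-chain and the off-chain case can be sketched as follows. This is a toy model in plain Python, not code for any actual DLT platform; the class `SaleContract`, the `balances` dictionary, and the `bank_oracle` function are hypothetical names introduced for illustration.

```python
from typing import Callable


class SaleContract:
    """Toy 'smart contract': transfers the asset once payment is observed."""

    def __init__(self, seller: str, buyer: str, price: int,
                 payment_received: Callable[[], int]):
        self.seller, self.buyer, self.price = seller, buyer, price
        # Either an on-chain lookup or an off-chain oracle.
        self.payment_received = payment_received
        self.owner = seller

    def trigger(self) -> str:
        """Self-execution: once the condition holds, the transfer happens."""
        if self.owner == self.seller and self.payment_received() >= self.price:
            self.owner = self.buyer
        return self.owner


# On-chain case: the payment is a balance that the system itself can read.
balances = {"escrow": 10}
on_chain = SaleContract("seller", "buyer", 10, lambda: balances["escrow"])


# Off-chain case: a hypothetical oracle reports what it believes happened in
# the outside world; the contract code cannot check whether the report is true.
def bank_oracle() -> int:
    return 0  # e.g. the bank transfer has not (yet) been reported


off_chain = SaleContract("seller", "buyer", 10, bank_oracle)
print(on_chain.trigger(), off_chain.trigger())
```

In the off-chain case the outcome depends entirely on what the oracle reports, so trust shifts from the counterparty to whoever operates the oracle.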
Some have observed that this conflates legislation with its execution and even with adjudication (in case of disagreement about the content of the contract). This would mean that the checks and balances of the rule of law, notably the separation of the powers of legislation, administration, and adjudication, are disrupted. This, in turn, would require new types of safeguards (legal remedies) to enable the contestation of the ensuing decisions, thus ensuring that smart regulation and smart contracts remain ‘under the rule of law’.
From the perspective of law, the employment of DLTs raises many questions. In the context of this chapter, I focus on whether operating self-executing code via a DLT must be seen as ‘legal by design’ or as ‘legal protection by design’ (preparing the ground for the topic of section 10.3). Do smart contracts or smart regulations guarantee that the behaviour of parties to the contract or of addressees of regulation is ‘legal by design’ or ‘legally compliant by design’? To prepare the ground, I will first discuss the question whether smart contracts are contracts in the legal sense (section 10.2.2), and whether smart regulation is law in the legal sense (section 10.2.3).
10.2.2 The legal status of ‘smart contracts’ under private law
As to contracts in the legal sense, we need to investigate what legal conditions must be fulfilled for ‘something’ to qualify as a legally binding contract. These legal conditions can be found in private law, which—in Europe—is mostly national law, as there is no binding European private law. I refer to section 3.2.2, where some of the basics of a valid contract were discussed, based on Dutch private law.
Does a smart contract qualify as a legal contract?
The fact that most contracts have no formal requirements could be used as an argument that sending a specific message to the code on the DLT may count as an expression of one’s intent to enter into the contract as defined in the code. However, the jury is still out on whether computer code counts as an expression of the content of a contract just like a written contract supposedly does. To count as such an expression, the code must be sufficiently determinate for both parties to understand the legal effect of the contract (i.e. the legal obligations it generates).
If we assume that the contract is valid, we still need to look into the legal effect of a valid contract, because in most jurisdictions such legal effect is not limited to the literal wording of the contract.
The latter constraints may derive from either private or public mandatory law (see sections 3.1.2 and 8.1.1), which cannot be overruled by contractual stipulations (whether in speech, writing, or code). To build in flexibility, contracts and policies often contain concepts with an open texture that leave parties or competent authorities some room to adapt the contract to concrete circumstances that cannot all be foreseen. Think of terms such as reasonably, timely, state of the art, or trustworthy, which can only be interpreted in the light of the circumstances that parties confront when performing the contract. Unforeseen changes in circumstances may have an impact on the content of the ensuing legal obligations, as when one party can claim force majeure. Whereas the ‘smart contract’ will self-execute, force majeure may overrule the obligation to perform the contract, meaning the execution may have to be undone (which may be impossible, and/or the party that benefits may not be identifiable or may be in a far-away jurisdiction, meaning they cannot be sued).
All this also happens to ‘normal’ contracts, and with ‘normal’ decision-making in public administration, but it is crucial to highlight that smart contracts and algorithmic decision-making in the sense of smart regulation do not necessarily solve these problems and may indeed create extra problems, precisely due to the non-adaptive nature of self-executing code. Those who wish to remedy these new problems by creating adaptive code must realize that this implies foreseeing all possible future scenarios, which is by definition not possible. Though the attempt to foresee changing circumstances may prevent some problems, it still implies that legislation (a contract can be seen as legislating how parties should act), execution (a contract should clarify what counts as a performance), and interpretation (the meaning of a contract depends on the circumstances) are all predetermined upfront by whoever writes the code. This somehow scales the past while it freezes the future.
Legal scholar Allen argues that smart contracts will be part of what he has called the ‘contract-stack’, which involves speech acts, behaviour, written documents, deeds, electronically signed documents, and, potentially, also self-executing code. This implies that contract law will be transformed to accommodate the use of self-executing code, for example, by way of legislation, case law, and doctrinal innovation. Similar arguments can be made for smart regulation, which could similarly be seen as a ‘regulatory-stack’, involving legislative Acts that grant regulatory competences, policy documents, government agencies’ behaviour patterns, decision-making processes and procedures, and, potentially, also self-executing code.
10.2.3 The legal status of ‘smart regulation’ under public law
With the term ‘regulation’ I refer to rules promulgated by public administration, or by independent supervisors that have been instituted by an Act of the legislature (usually called ‘regulators’ in the United States and the United Kingdom, e.g. the Federal Trade Commission; in the EU we can think of the EDPS or the national DPAs).
Many government decisions affect individual citizens, such as the granting of a permit, social security, or a decision on taxation. Many of the arguments provided in the previous section can be repeated here, and do not merely apply to implementation via DLTs but also to other forms of algorithmic (automated) decision-making. It simply means that the relevant rules are interpreted and translated into non-ambiguous code, to enable their self-execution.
The need to formalize will, in a sense, freeze future responses into a template that necessarily overlooks changing circumstances and may not reflect developments in case law, which could result in the code violating rights instead of enforcing compliance. In that respect, it is crucial to remember that these rules and policies, as well as their machinic automation, fall under the rule of law.
This means that these rules and policies, as well as their machinic translations, must at some point be contestable in a court of law. Those subject to decisions based on smart regulation should be capable of requesting a justification of the decision in accordance with the legality principle. Note, however, that a justification is not equivalent to an explanation, which rather serves as a means to make the decision contestable as to its justification.
10.3 ‘Legal by Design’ or ‘Legal Protection by Design’?
Some authors claim that self-executing code could be used to ensure that the conduct of legal subjects will be ‘legal by design’ (LbD). What they mean to say is that one can interpret the content of a contract, the content of policy guidelines, or even the content of legislation such that it becomes amenable to a translation into computer code. So-called ‘Turing complete languages’ have been developed in the realm of DLTs, to write ‘smart contracts’ that—as we have seen in section 10.2—supposedly self-execute whatever has been agreed by the parties. One can imagine similar attempts to ensure compliance at the level of regulatory rules.
10.3.1 Legal by design (LbD)
LbD is a subset of what other authors have termed ‘techno-regulation’. This refers to the fact that technologies often induce or inhibit and enforce or preclude certain types of behaviours, which has a de facto regulatory effect.
In the latter cases, we speak of side-effects, though we should take note that such side-effects may be more prominent or influential than the intended effects.
Note that these steps can be analytically distinguished, but may be conflated in practice (thus hiding the act of interpretation). Due to the need to select an interpretation that can be translated into unambiguous machine language, such interpretations may be overinclusive or underinclusive compared to the relevant legal norm.
For example, a legal obligation for an employee to drive a truck from A to B within a reasonable time scale could be part of a smart contract between an employer and an employee. As the performance of the contract takes place off-chain, an oracle must be put in place to provide clear signals about whether or not this legal obligation has been fulfilled. To define what performance counts as ‘reasonable’, taking into account various types of circumstances, the contract must be interpreted beforehand and translated into a set of input variables for the oracle. As discussed in section 10.2.2, ‘reasonableness’ is not a subjective concept under contract law as it will have to be interpreted in line with relevant case law, while taking account of the unique circumstances of the case at hand. This makes it highly unlikely that a smart contract can be equated with ‘legal compliance by design’, due to the rigidity of the behaviour of computer code compared to the adaptiveness of the meaning of natural language.
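The rigidity at stake can be sketched as follows. The threshold `MAX_HOURS`, the trips, and the `excused` flag are all hypothetical; in particular, whether a delay is excused is a legal judgement that depends on the circumstances of the case, and it is hard-coded here only to show where code and law may part ways.

```python
MAX_HOURS = 8.0  # hypothetical fixed reading of 'within a reasonable time'


def oracle_says_performed(hours_taken: float) -> bool:
    """What a rigid oracle reports: one number checked against one threshold."""
    return hours_taken <= MAX_HOURS


# Invented trips; 'excused' marks circumstances a court might take into account.
trips = [
    {"hours": 7.5, "circumstances": "normal traffic", "excused": False},
    {"hours": 9.0, "circumstances": "motorway closed after an accident", "excused": True},
]


def mismatches(all_trips: list) -> list:
    """Trips the code treats as a breach although the delay may be excused."""
    return [t for t in all_trips
            if not oracle_says_performed(t["hours"]) and t["excused"]]


for trip in mismatches(trips):
    print("coded as breach, possibly reasonable:", trip["circumstances"])
```

Adding more input variables to the oracle may reduce such mismatches, but every added variable is again an interpretation that was fixed beforehand.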
Another example could be that the legally allowed level of pollution caused by a car is integrated into smart regulation that rules out delivery of non-compliant cars by the car manufacturer. To enable this, however, the cars must be tested before leaving the factory, which necessarily disregards the actual pollution caused on the motorway. This, again, implies that there is no absolute guarantee that the car manufacturer is ‘legally compliant by design’.
10.3.2 Legal protection by design (LPbD)
Legal protection by design (LPbD) is another matter. It does not aim to guarantee enforcement of whatever legal norm, but rather aims to ensure that legal protection is not ruled out by the affordances of the technological environment that determines whether or not we enjoy the substance of fundamental rights.
Techno-regulation in general does not include these requirements and neither does LbD, which is often focused on excluding the involvement of trusted third parties. These two requirements thus distinguish LPbD from other types of ‘by design’ solutions, for instance ‘value sensitive design’ or ‘privacy by design’. The latter are often proposed as ethical requirements, which is problematic for two reasons. First, as ethical norms cannot level the playing field, companies that apply such ethical design may be pushed out of the market. Second, ethical ‘by design’ approaches make protection dependent on the ethical inclinations of those who develop and market the choice architecture of citizens, instead of demanding that such choice architecture must meet minimum standards that provide effective and practical protection. For readers interested in the confrontation of law and ethics, see Chapter 11.
10.3.3 LPbD in the GDPR
10.3.3.1 Data protection impact assessment
Three interesting examples of LPbD can be found in the GDPR. First, the legal obligation to conduct a data protection impact assessment (DPIA) in Article 35, which is compulsory if the introduction of a new technology is likely to present a high risk to the rights and freedoms of data subjects:
1. Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data. A single assessment may address a set of similar processing operations that present similar high risks.
( … )
3. A data protection impact assessment referred to in paragraph 1 shall in particular be required in the case of:
a) a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person;
b) processing on a large scale of special categories of data referred to in Article 9(1), or of personal data relating to criminal convictions and offences referred to in Article 10; or
c) a systematic monitoring of a publicly accessible area on a large scale.
( … )
7. The assessment shall contain at least:
a) a systematic description of the envisaged processing operations and the purposes of the processing, including, where applicable, the legitimate interest pursued by the controller;
b) an assessment of the necessity and proportionality of the processing operations in relation to the purposes;
c) an assessment of the risks to the rights and freedoms of data subjects referred to in paragraph 1; and
d) the measures envisaged to address the risks, including safeguards, security measures and mechanisms to ensure the protection of personal data and to demonstrate compliance with this Regulation taking into account the rights and legitimate interests of data subjects and other persons concerned.
( … )
11. Where necessary, the controller shall carry out a review to assess if processing is performed in accordance with the data protection impact assessment at least when there is a change of the risk represented by processing operations.
Recital (75) adds some considerations concerning the question what constitutes the likelihood of a high risk to the rights and freedoms of natural persons.
The risk to the rights and freedoms of natural persons, of varying likelihood and severity, may result from personal data processing which could lead to physical, material or non-material damage, in particular:
• where the processing may give rise to discrimination, identity theft or fraud, financial loss, damage to the reputation, loss of confidentiality of personal data protected by professional secrecy, unauthorised reversal of pseudonymisation, or any other significant economic or social disadvantage;
• where data subjects might be deprived of their rights and freedoms or prevented from exercising control over their personal data;
• where personal data are processed which reveal racial or ethnic origin, political opinions, religion or philosophical beliefs, trade union membership, and the processing of genetic data, data concerning health or data concerning sex life or criminal convictions and offences or related security measures;
• where personal aspects are evaluated, in particular analysing or predicting aspects concerning performance at work, economic situation, health, personal preferences or interests, reliability or behaviour, location or movements, in order to create or use personal profiles;
• where personal data of vulnerable natural persons, in particular of children, are processed; or
• where processing involves a large amount of personal data and affects a large number of data subjects.
Article 35 basically requires controllers to err on the side of caution by foreseeing risks to the rights and freedoms of natural persons. One could qualify this as the introduction of the principle of precaution in data protection law. Note that the assessment does not merely regard potential violations of the rights and obligations stipulated in the GDPR but focuses on ‘rights and freedoms’ in a more general sense, which links up with the goal of the GDPR as formulated in Article 1.2: ‘[t]his Regulation protects fundamental rights and freedoms of natural persons and in particular their right to the protection of personal data’. Moreover, the assessment of such a risk is not limited to data subjects but refers to ‘natural persons’, which includes individuals that run a risk of being discriminated against even though their personal data are not (yet) being processed.
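The structure of the obligation in Article 35.1 and 35.3 quoted above can be sketched as a simple checklist. This is an illustration of the logical structure only, not a compliance tool: deciding whether any of the flags is true requires legal interpretation of the processing at hand. The class `Processing`, its field names, and the two example cases are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Processing:
    # Each flag paraphrases one listed case of Article 35.3; whether a flag
    # is true is a matter of legal interpretation, not of code.
    systematic_extensive_profiling_with_significant_effects: bool = False  # (a)
    large_scale_special_category_or_criminal_data: bool = False            # (b)
    large_scale_monitoring_of_public_area: bool = False                    # (c)
    otherwise_likely_high_risk: bool = False                               # 35.1


def dpia_required(p: Processing) -> bool:
    """True if a listed case applies or the general high-risk test is met."""
    listed_case = (
        p.systematic_extensive_profiling_with_significant_effects
        or p.large_scale_special_category_or_criminal_data
        or p.large_scale_monitoring_of_public_area
    )
    # Article 35.3 lists its cases 'in particular', so the general test of
    # Article 35.1 still applies when none of the listed cases does.
    return listed_case or p.otherwise_likely_high_risk


credit_scoring = Processing(systematic_extensive_profiling_with_significant_effects=True)
newsletter = Processing()
print(dpia_required(credit_scoring), dpia_required(newsletter))
```

The last flag carries most of the weight: the precautionary character of Article 35 lies precisely in the open-ended assessment that cannot be reduced to a fixed list.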
10.3.3.2 Data protection by default and by design (DPbDD)
Article 35.7(d) clearly indicates that a DPIA incorporates an assessment of the need for data protection by default and by design (DPbDD), as it requires an inventory of ‘the measures envisaged to address the risks, including safeguards, security measures and mechanisms to ensure the protection of personal data and to demonstrate compliance with this Regulation taking into account the rights and legitimate interests of data subjects and other persons concerned’. This brings us to Article 25, which requires controllers to design systems that process personal data in such a way that data minimization is achieved by default, while incorporating all other GDPR obligations into the design of the system:
1. Taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing, the controller shall,
• both at the time of the determination of the means for processing and at the time of the processing itself,
• implement appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles, such as data minimisation,
• in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this Regulation and protect the rights of data subjects.
2. The controller shall implement appropriate technical and organisational measures for ensuring that,
• by default,
• only personal data which are necessary for each specific purpose of the processing are processed.
That obligation applies to
• the amount of personal data collected,
• the extent of their processing,
• the period of their storage and
• their accessibility.
In particular, such measures shall ensure that by default personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons.
Here again, we can observe a requirement to err on the side of caution, basically echoing longstanding security principles, such as ‘select before you collect’. In paragraph 2, for instance, we read that technical and organizational measures must be in place to ensure that only data that is necessary for each specific processing purpose is processed (data minimization and purpose limitation). Though ‘privacy by design’ has deep roots in privacy engineering communities, the big difference with the new legal obligation is that this is no longer a matter of the arbitrary preferences of a company or public body that is ‘being ethical’ about their processing operations.
Though DPbDD is not to be taken lightly, it does not require what is not feasible. The obligation takes into account ‘the state of the art, the cost of implementation and the nature, scope, context and purposes of processing’ (first paragraph), meaning that measures must be doable, also in light of the business model. However, this does not mean that anything goes if the business model does not fly without taking disproportionate risks with the rights and freedoms of natural persons. Here again, as with the DPIA, those risks must be taken into account when designing (engineering) the processing operations. The proportionality depends on ‘the risks of varying likelihood and severity’, meaning that the higher the risks the more protection must be implemented ‘by design’.
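To make the ‘by default’ obligation of Article 25.2 more tangible, the following is a minimal sketch of how its four dimensions (amount collected, extent of processing, storage period, accessibility) could be reflected in the default settings of a system. All names, purposes, and values (`NECESSARY_FIELDS`, `ProcessingDefaults`, the 30-day retention period) are hypothetical assumptions made for illustration; which data are ‘necessary’ for a purpose remains a legal assessment that code cannot settle.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical purposes, each mapped to the fields assumed necessary for it.
NECESSARY_FIELDS = {
    "delivery": {"name", "address"},
    "newsletter": {"email"},
}


@dataclass(frozen=True)
class ProcessingDefaults:
    """Conservative defaults along the four dimensions of Article 25.2."""
    purpose: str
    # Period of storage (the value is an assumption, not a legal norm).
    retention: timedelta = timedelta(days=30)
    # Accessibility: not public unless the individual intervenes.
    publicly_accessible: bool = False
    # Extent of processing: only storage unless explicitly widened.
    allowed_operations: frozenset = frozenset({"store"})


def minimise(record: dict, defaults: ProcessingDefaults) -> dict:
    """Amount collected: keep only fields assumed necessary for the purpose."""
    needed = NECESSARY_FIELDS[defaults.purpose]
    return {key: value for key, value in record.items() if key in needed}


submitted = {
    "name": "A. Jansen",
    "address": "Main Street 1",
    "email": "a@example.org",
    "birthdate": "1990-01-01",
}
stored = minimise(submitted, ProcessingDefaults(purpose="delivery"))
```

In this sketch, the email address and birthdate are dropped because they are not listed for the hypothetical ‘delivery’ purpose, and wider access or longer retention requires a deliberate change of the defaults.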
10.3.3.3 Automated decisions
This brings us to a third example of LPbD in the context of the GDPR that is highly relevant for both ML applications and DLTs, as it targets the implications of automated decisions. Article 22 GDPR reads:
The data subject shall have the right not to be subject to a decision
• based solely
• on automated processing, including profiling,
• which produces legal effects concerning him or her or
• similarly significantly affects him or her.
The legal effect of the four legal conditions (two of which are alternative) is a prohibition. Even though this prohibition is formulated in a rather complicated way, the European Data Protection Board (EDPB, formerly Article 29 Working Party) has clarified that this ‘right not to be subject to’ must be understood as a prohibition.2 Note that each term in this set of legal conditions requires an act of interpretation that is not obvious in the light of technologies such as ML and DLT. For instance, which of the decisions taken by machines in the course of a machine learning operation qualify as a decision in the sense of Article 22.1: the decision of an algorithm to adapt weights within a neural net, where such a decision will result in a refusal to provide credit? Or the decision to select four of the nineteen features that have some impact on a specified health risk, where such a decision results, for example, in a person being advised to undergo an operation or in a person being charged with tax fraud? Does ‘solely’ refer to machine decisions that directly affect a data subject (e.g. online acceptance of health insurance), or also to decisions that have been prepared by a software program but are ‘stamped’ by a human person who, however, does not understand how the system came to its conclusion and cannot explain to the data subject why she was not, for example, selected for a job interview? The EDPB finds that ‘[t]he controller cannot avoid the Article 22 provisions by fabricating human involvement’.3 Does the fact that automated processing is qualified as ‘including profiling’ imply that ‘smart contracts’ that do not involve profiling in the sense of Article 4(4) do not fall within the scope of Article 22? Note that English grammar answers that question, due to the fact that a comma is inserted after ‘processing’ (check the rules for restrictive and non-restrictive modifiers).
When does a decision produce legal effect? The EDPB clarifies that this is the case if the decision ‘affects someone’s legal rights, such as the freedom to associate with others, vote in an election, or take legal action. ( … ) affects a person’s legal status or their rights under a contract’.4 Any other ‘similarly significant effect’ also results in a prohibition, for example, as the EDPB writes:5
For data processing to significantly affect someone the effects of the processing must be sufficiently great or important to be worthy of attention. In other words, the decision must have the potential to:
• significantly affect the circumstances, behaviour or choices of the individuals concerned;
• have a prolonged or permanent impact on the data subject; or
• at its most extreme, lead to the exclusion or discrimination of individuals.
It is difficult to be precise about what would be considered sufficiently significant to meet the threshold, although the following decisions could fall into this category:
• decisions that affect someone’s financial circumstances, such as their eligibility to credit;
• decisions that affect someone’s access to health services;
• decisions that deny someone an employment opportunity or put them at a serious disadvantage;
• decisions that affect someone’s access to education, for example university admissions.
Having laid out the scope of the prohibition, Article 22 continues with three exceptions:
2. Paragraph 1 shall not apply if the decision:
a) is necessary for entering into, or performance of, a contract between the data subject and a data controller;
b) is authorised by Union or Member State law to which the controller is subject and which also lays down suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests; or
c) is based on the data subject’s explicit consent.
Here again, a number of questions can be raised. The reader is advised to carefully study the EDPB Guidelines on Automated Individual Decision Making and Profiling, to gain a proper understanding of how these exceptions must be interpreted.
3. In the cases referred to in points (a) and (c) of paragraph 2, the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.
So, in the case of a decision based on automated processing that is necessary for a contract or a decision based on consent, access to human intervention is required, both to express one’s point of view and to contest the decision. This is related to recital (71), which adds another requirement:
In any case, such processing should be subject to suitable safeguards, which should include specific information to the data subject and the right to obtain human intervention, to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision.
Here we find the right to obtain an explanation of the decision, which many authors interpret as being a precondition to be able to contest the decision (as required in Article 22.3). By now, a number of scientific papers have been published on ‘the right to an explanation’ and ‘explainable AI’, which are deemed highly relevant also due to potential unwarranted bias. This ‘right to an explanation’ can also be read into the transparency requirements in Articles 13.2(f), 14.2(g), and 15.1(h), which all require that the following information will be provided:
• the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases,
• meaningful information about the logic involved, as well as
• the significance and the envisaged consequences of such processing for the data subject.
Data controllers have a legal obligation to provide such information, both when the data has been provided by the data subject (Article 13), and when data has not been obtained from the data subject (Article 14), while data subjects have a right to obtain such information (Article 15). Note that the obligation to provide these three types of information does not depend on a request by the data subject; the information must be provided anyway. Just imagine what this could mean for an IoT system that runs on real-time ML applications, or for online credit applications based on ML inferences of credit worthiness.
4. Decisions referred to in paragraph 2 shall not be based on special categories of personal data referred to in Article 9(1), unless point (a) or (g) of Article 9(2) applies and suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests are in place.
The exceptions generally do not apply to automated decisions that are based on Article 9 data. Now think of unintended machine bias based on proxies that result in indirect racial discrimination as described above in section 10.1. There is no case law yet on how this prohibition must be interpreted, but we can imagine that Article 22.4 may provide far-reaching protection if properly interpreted in a balanced way.
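The logical structure of Article 22 (cumulative and alternative conditions in paragraph 1, exceptions in paragraph 2, the carve-out for Article 9 data in paragraph 4) can be laid out as a small sketch. This is an illustration of the structure only, under the loud assumption that each boolean input is the outcome of a prior act of legal interpretation, which, as noted above, is far from obvious. The names `Decision`, `in_scope`, and `assess` are hypothetical, and the sketch is not a compliance test.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    # Conditions of Article 22.1 (each flag presupposes legal interpretation).
    solely_automated: bool
    legal_effect: bool
    similarly_significant: bool
    # Exceptions of Article 22.2.
    necessary_for_contract: bool = False   # point (a)
    authorised_by_law: bool = False        # point (b)
    explicit_consent: bool = False         # point (c)
    # Article 22.4.
    uses_special_categories: bool = False  # Article 9(1) data
    article_9_2_a_or_g: bool = False       # point (a) or (g) of Article 9(2)


def in_scope(d: Decision) -> bool:
    """Article 22.1: 'solely' AND (legal effect OR similarly significant)."""
    return d.solely_automated and (d.legal_effect or d.similarly_significant)


def assess(d: Decision) -> str:
    """Return a coarse label for the structure of Article 22 (not legal advice)."""
    if not in_scope(d):
        return "outside Article 22.1"
    if not (d.necessary_for_contract or d.authorised_by_law or d.explicit_consent):
        return "prohibited"
    if d.uses_special_categories and not d.article_9_2_a_or_g:
        return "prohibited"
    return "excepted, subject to suitable safeguards"


credit_refusal = Decision(
    solely_automated=True, legal_effect=False, similarly_significant=True
)
outcome = assess(credit_refusal)
```

Here `outcome` is "prohibited", because none of the exceptions is flagged. Even where an exception applies, the sketch only returns a label that points back to the safeguards of Article 22.3 and 22.4, which cannot be reduced to a boolean.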
Article 22 repeatedly speaks of ‘suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests’. The EDPB clarifies that this includes technical measures. They write:
Errors or bias in collected or shared data or an error or bias in the automated decision-making process can result in:
• incorrect classifications; and
• assessments based on imprecise projections; that
• impact negatively on individuals.
Controllers should carry out frequent assessments on the data sets they process to check for any bias, and develop ways to address any prejudicial elements, including any over-reliance on correlations.
Systems that audit algorithms and regular reviews of the accuracy and relevance of automated decision-making including profiling are other useful measures.
Controllers should introduce appropriate procedures and measures to prevent errors, inaccuracies or discrimination on the basis of special category data. These measures should be used on a cyclical basis; not only at the design stage, but also continuously, as the profiling is applied to individuals. The outcome of such testing should feed back into the system design.
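One way to picture the ‘frequent assessments’ the EDPB refers to is a recurring check that compares outcomes across groups. The sketch below is an assumption about what such a check might look like: the EDPB does not prescribe a metric, and the comparison of selection rates used here is only one of many possible measures. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def selection_rates(outcomes):
    """Compute the share of positive decisions per group.

    `outcomes` is an iterable of (group, selected) pairs.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in outcomes:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Toy data: group label and whether the automated decision was positive.
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
rates = selection_rates(decisions)
ratio = disparity_ratio(rates)
```

A low ratio does not by itself establish discrimination, and a high one does not rule it out (for instance where proxies are involved); it merely flags where the ‘cyclical’ review mentioned by the EDPB should look more closely.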
These types of ‘safeguards’ exemplify how LPbD can be turned into an operational requirement that guides the design of personal data processing systems, ruling out unwarranted violations of data protection law, while providing practical and effective protection at the level of the technical and organizational infrastructure.
On machine learning
Mitchell, Thomas. 1997. Machine Learning. 1st ed. New York: McGraw-Hill Education.
Mitchell, Tom M. 2017. ‘Key Ideas in Machine Learning’. In Machine Learning, draft for the 2nd ed., 1–11.
On p-hacking and other risks in ML
Berman, Ron, Leonid Pekelis, Aisling Scott, and Christophe Van den Bulte. 2018. ‘P-Hacking and False Discovery in A/B Testing’. SSRN Scholarly Paper ID 3204791. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=3204791.
Hildebrandt, Mireille. 2018. ‘Preregistration of Machine Learning Research Design. Against P-Hacking’. In Being Profiled: Cogitas Ergo Sum. Amsterdam: Amsterdam University Press.
Hofman, Jake M., Amit Sharma, and Duncan J. Watts. 2017. ‘Prediction and Explanation in Social Systems’. Science 355 (6324): 486–88. https://doi.org/10.1126/science.aal3856 (quotation at p. 487).
On bias in ML applications
Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. ‘Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks’. ProPublica. 23 May 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Barocas, Solon, and Andrew D. Selbst. 2016. ‘Big Data’s Disparate Impact’. California Law Review 104: 671–732.
Chouldechova, Alexandra. 2017. ‘Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments’. Big Data 5 (2): 153–63. https://doi.org/10.1089/big.2016.0047.
Yong, Ed. 2018. ‘A Popular Algorithm is No Better at Predicting Crimes than Random People’. The Atlantic, January. https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/.
On the potential and real effects of ML on public space, democracy, and freedom of expression
Pariser, Eli. 2011. The Filter Bubble: What the Internet is Hiding from You. London: Penguin.
Tufekci, Zeynep. 2018. ‘How Social Media Took Us from Tahrir Square to Donald Trump’. MIT Technology Review, September/October. https://www.technologyreview.com/s/611806/how-social-media-took-us-from-tahrir-square-to-donald-trump/.
Re the fundamentals of ‘smart contracts’
Buterin, Vitalik. 2014. ‘A Next-Generation Smart Contract and Decentralized Application Platform. White Paper’. Ethereum Platform.
Nakamoto, Satoshi. 2008. ‘Bitcoin: A Peer-to-Peer Electronic Cash System’. http://www.bitcoin.org/bitcoin.pdf.
Szabo, Nick. 1997. ‘Formalizing and Securing Relationships on Public Networks’. First Monday 2 (9). http://firstmonday.org/ojs/index.php/fm/article/view/548.
Re the writing of ‘smart contracts’
Seijas, Pablo Lamela, Simon J. Thompson, and Darryl McAdams. 2016. ‘Scripting Smart Contracts for Distributed Ledger Technology’. IACR Cryptology EPrint Archive 2016: 1156.
Re ‘blockchain’ and the GDPR
Finck, Michèle. 2018. ‘Blockchains and Data Protection in the European Union’. European Data Protection Law Review 4 (1): 17–35. https://doi.org/10.21552/edpl/2018/1/6.
Re compatibility of ‘smart contracts’ with Article 22 GDPR
Art. 29 Working Party WP251rev.01, Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679. https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=612053.
CNIL, September 2018, ‘Solutions for a responsible use of the blockchain in the context of personal data’. https://www.cnil.fr/sites/default/files/atoms/files/blockchain.pdf.
Re legal contracts and ‘smart contracts’
Allen, J.G. 2018. ‘Wrapped and Stacked: “Smart Contracts” and the Interaction of Natural and Formal Language’. European Review of Contract Law 14 (4): 307–43. https://doi.org/10.1515/ercl-2018-1023.
Raskin, M. 2017. ‘The Law and Legality of Smart Contracts’. Georgetown Law and Technology Review 1 (2): 304–41.
Verstraete, Mark. 2018. ‘The Stakes of Smart Contracts’. SSRN Scholarly Paper ID 3178393. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=3178393.
Re Legal by Design and Legal Protection by Design
Filippi, Primavera De, and Samer Hassan. 2016. ‘Blockchain Technology as a Regulatory Technology: From Code Is Law to Law Is Code’. First Monday 21 (12). http://firstmonday.org/ojs/index.php/fm/article/view/7113.
Hildebrandt, Mireille. 2017. ‘Saved by Design? The Case of Legal Protection by Design’. NanoEthics, August, 1–5. https://doi.org/10.1007/s11569-017-0299-0.
Lippe, Paul, Daniel Martin Katz, and Dan Jackson. 2015. ‘Legal by Design: A New Paradigm for Handling Complexity in Banking Regulation and Elsewhere in Law’. Oregon Law Review 93 (4). http://papers.ssrn.com/abstract=2539315.
Van den Berg, Bibi, and Ronald E. Leenes. 2013. ‘Abort, Retry, Fail: Scoping Techno-Regulation and Other Techno-Effects’. In Human Law and Computer Law: Comparative Perspectives, edited by Mireille Hildebrandt and Jeanne Gaakeer, 67–87. Ius Gentium: Comparative Perspectives on Law and Justice 25. Springer Netherlands. http://link.springer.com/chapter/10.1007/978-94-007-6314-2_4.
(1) Here, I use ‘distribution’ to refer to the physical location of the code or the data, and ‘centralization’ to refer to the power structure (who decides what).
(2) Article 29 Working Party WP251rev.01, Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679, at 19.
(3) Ibid. at 21.
(4) Ibid. at 21.
(5) Ibid. at 21–22.