{"title": "An Algorithmic Framework For Differentially Private Data Analysis on Trusted Processors", "book": "Advances in Neural Information Processing Systems", "page_first": 13657, "page_last": 13668, "abstract": "Differential privacy has emerged as the main definition for private data analysis and machine learning. The global model of differential privacy, which assumes that users trust the data collector, provides strong privacy guarantees and introduces small errors in the output. In contrast, applications of differential privacy in commercial systems by Apple, Google, and Microsoft, use the local model. Here, users do not trust the data collector, and hence randomize their data before sending it to the data collector. Unfortunately, local model is too strong for several important applications and hence is limited in its applicability. In this work, we propose a framework based on trusted processors and a new definition of differential privacy called Oblivious Differential Privacy, which combines the best of both local and global models. The algorithms we design in this framework show interesting interplay of ideas from the streaming algorithms, oblivious algorithms, and differential privacy.", "full_text": "An Algorithmic Framework For Differentially Private\n\nData Analysis on Trusted Processors\n\nJoshua Allen\nHarsha Nori\n\nBolin Ding\u2217\n\nOlga Ohrimenko\n\nMicrosoft\n\nAbstract\n\nJanardhan Kulkarni\n\nSergey Yekhanin\n\nDifferential privacy has emerged as the main de\ufb01nition for private data analysis and\nmachine learning. The global model of differential privacy, which assumes that\nusers trust the data collector, provides strong privacy guarantees and introduces\nsmall errors in the output.\nIn contrast, applications of differential privacy in\ncommercial systems by Apple, Google, and Microsoft, use the local model. Here,\nusers do not trust the data collector, and hence randomize their data before sending\nit to the data collector. 
Unfortunately, local model is too strong for several important\napplications and hence is limited in its applicability. In this work, we propose\na framework based on trusted processors and a new de\ufb01nition of differential\nprivacy called Oblivious Differential Privacy, which combines the best of both\nlocal and global models. The algorithms we design in this framework show\ninteresting interplay of ideas from the streaming algorithms, oblivious algorithms,\nand differential privacy.\n\n1\n\nIntroduction\n\nMost large IT companies rely on access to raw data from their users to train machine learning models.\nHowever, it is well known that models trained on a dataset can release private information about the\nusers that participate in the dataset [13, 50]. With new GDPR regulations and also ever increasing\nawareness about privacy issues in the general public, doing private and secure machine learning\nhas become a major challenge to IT companies. To make matters worse, while it is easy to spot a\nviolation of privacy when it occurs, it is much more tricky to give a rigorous de\ufb01nition of it.\nDifferential privacy (DP), introduced in the seminal work of Dwork et al. [20], is arguably the only\nmathematically rigorous de\ufb01nition of privacy in the context of machine learning and big data analysis.\nOver the past decade, DP has established itself as the defacto standard of privacy with a vast body\nof research and growing acceptance in industry. Among its many strengths, the promise of DP is\nintuitive to explain: No matter what the adversary knows about the data, the privacy of a single user\nis protected from output of the data-analysis. A differentially private algorithm guarantees that the\noutput does not change signi\ufb01cantly, as quanti\ufb01ed by a parameter \u0001, if the data of any single user is\nomitted from the computation, which is formalized as follows.\nDe\ufb01nition 1.1. 
A randomized algorithm A is (\u03b5, \u03b4)-differentially private if for any two neighboring databases D1, D2 and any subset of possible outputs S \u2286 Z, we have:\n\nPr [A(D1) \u2208 S] \u2264 e^\u03b5 \u00b7 Pr [A(D2) \u2208 S] + \u03b4.\n\n\u2217Current affiliation: Alibaba Group. Work done while at Microsoft.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nThe above definition of DP is often called global differential privacy (GDP). It assumes that users are willing to trust the data collector. There is a large body of work on GDP, and many non-trivial machine learning problems can be solved in this model very efficiently. See the authoritative book by Dwork and Roth [22] for more details. However, in the context of IT companies, adoption of GDP is not possible as there is no trusted data collector \u2013 users want privacy of their data from the data collector. Because of this, all industrial deployments of DP by Apple, Google, and Microsoft, with the exception of Uber [32], have been set in the so-called local model of differential privacy (LDP) [24, 19, 18]. In the LDP model, users randomize their data before sending it to the data collector.\nDefinition 1.2. A randomized algorithm A : V \u2192 Z is \u03b5-locally differentially private (\u03b5-LDP) if for any pair of values v, v\u2032 \u2208 V held by a user and any subset of outputs S \u2286 Z, we have:\n\nPr [A(v) \u2208 S] \u2264 e^\u03b5 \u00b7 Pr [A(v\u2032) \u2208 S] .\n\nDespite its very strong privacy guarantees, the local model has several drawbacks compared to the global model: many important problems cannot be solved in the LDP setting within a desired level of accuracy. Consider the simple task of understanding the number of distinct websites visited by users, or words in text data. 
This problem admits no good algorithms in the LDP setting, whereas in the global model the problem becomes trivial. Even for problems that can be solved in the LDP setting [24, 7, 6, 19], the errors and \u03b5 are significantly larger compared to the global model. For example, if one is interested in understanding the histogram of websites visited by users, in the LDP setting an optimal algorithm achieves an error of \u2126(\u221an), whereas in the global model the error is O(1/\u03b5). See experiments and details in [11] for scenarios where the errors introduced by (optimal) LDP algorithms are unacceptable in practice. Finally, in GDP there are several results that give much stronger guarantees than the standard composition theorems: for example, one can answer exponentially many linear queries (even online) using the private multiplicative weights update algorithm [21]. Such results substantially increase the practical relevance of GDP algorithms. However, the local model of differential privacy admits no such elegant solutions.\nThese drawbacks of LDP naturally lead to the following question:\nAre there ways to bridge the local and global differential privacy models such that users enjoy the privacy guarantees of the local model whereas the data collector enjoys the accuracy of the global model?\nThis question has attracted a lot of interest in the research community recently. In remarkable recent results, the authors of [5, 15, 23] propose a secure shuffle as a way to bridge the local and global models of DP. They show that if one has access to a user anonymization primitive, and if every user uses a local DP mechanism, then the overall privacy-accuracy trade-off is similar to the global model. However, access to anonymization primitives that users can trust is a difficult assumption to implement in practice, and only shifts the trust boundary. For example, implementing the anonymization primitive via mixnets requires an assumption of non-collusion between the mixing servers. 
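The accuracy gap between the two models discussed above can be made concrete with a small simulation (ours, not from the paper; all function names are illustrative): estimating a count of n private bits under \u03b5-LDP randomized response versus the global-model Laplace mechanism.

```python
import math
import random

def ldp_count(bits, eps, rng):
    """Estimate the sum of private bits when each user applies
    eps-LDP randomized response (Definition 1.2) before reporting."""
    p = math.exp(eps) / (1 + math.exp(eps))  # Pr[report the true bit]
    reports = [b if rng.random() < p else 1 - b for b in bits]
    # Debias: E[report] = (1 - p) + b * (2p - 1), so invert the linear map.
    return (sum(reports) - len(bits) * (1 - p)) / (2 * p - 1)

def gdp_count(bits, eps, rng):
    """Trusted-curator estimate: exact sum plus Lap(1/eps) noise
    (Definition 1.1 with delta = 0; a count has sensitivity 1)."""
    u = rng.random() - 0.5  # inverse-CDF sampling of Laplace noise
    noise = -math.copysign(math.log(1 - 2 * abs(u)), u) / eps
    return sum(bits) + noise

rng = random.Random(0)
bits = [1 if rng.random() < 0.3 else 0 for _ in range(10_000)]
true_count = sum(bits)
ldp_err = abs(ldp_count(bits, 1.0, rng) - true_count)  # typically Theta(sqrt(n))
gdp_err = abs(gdp_count(bits, 1.0, rng) - true_count)  # typically O(1/eps)
```

Over repeated runs the local-model error concentrates around \u0398(\u221an) while the global-model error stays around 1/\u03b5, matching the gap described above.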
Recall that the main reason most companies adopted the LDP setting is that users do not trust the data collector.\nIn this paper, we propose a different approach based on trusted processors (for example, Intel SGX [31]) and a new definition called Oblivious Differential Privacy (ODP) that together help to design algorithms that enjoy the privacy guarantees of both the local and global models; see Figure 1 (left) for an illustration. Our framework gives the following guarantees.\n\n1. Data is collected, stored, and used in an encrypted form and is protected from the data collector.\n2. The data collector obtains information about the data only through the results of a DP-algorithm.\n\nThe DP-algorithms themselves run within a Trusted Execution Environment (TEE) that guarantees that the data is decrypted only by the processor during the computation and is always encrypted in memory. Hence, raw data is inaccessible to anyone, including the data collector. To this end, our framework is similar to other systems for private data analysis based on trusted processors, including machine learning algorithms [41] and data analytics platforms such as PROCHLO [11, 48]. Recently, systems for supporting TEEs have been announced by Microsoft2 and Google3, and we anticipate a wide adoption of this model for doing private data analysis and machine learning.\n\n2\u201cIntroducing Azure confidential computing\u201d, accessed October 26, 2019.\n3\u201cIntroducing Asylo: an open-source framework for confidential computing\u201d, accessed October 26, 2019.\n\nFigure 1: Left: Secure differentially-private data analysis. Right: Visualization of the access pattern of a naive histogram computation over a database and four age-range counters (k = 4) stored in arrays a and b, respectively. The code reads a record from a, decrypts it, accesses the corresponding age bucket in b, decrypts its counter, increments it, encrypts it, and writes it back. 
The arrows indicate increment accesses to the histogram counters (b) and the numbers correspond to the records of a that were accessed prior to these accesses. An adversary observing the accesses to a and b learns the histogram and which database records belong to the same age range.\n\nThe private data analysis within trusted processors has to be done carefully since the rest of the computational environment is untrusted and is assumed to be under the adversary's control. Though the data is encrypted, the adversary can learn private information based on memory access patterns through caches and page faults. In particular, memory access patterns are often dependent on private information and have been shown to be sufficient to leak information [42, 10, 55]. Since differentially private algorithms in the global model have been designed with a trusted data collector in mind, they are also susceptible to information leakage through their access patterns.\nThe main goal of our paper is to formalize the design of differentially private algorithms in trusted processor environments. Our contributions are summarized below:\n\u2022 Building on the recent works of [41, 11], we propose a framework that enables collection and analysis of data in the global model of differential privacy without relying on a trusted curator. Our framework uses encryption and secure processors to protect data and computation such that only the final differentially private output of the computation is revealed.\n\u2022 Trusted execution environments impose certain restrictions on the design of algorithms. We formalize the mathematical model for designing differentially private algorithms in TEEs.\n\u2022 We define a new class of differentially-private algorithms (Definition 3.1) called obliviously differentially private algorithms (ODP), which ensure that the privacy leakage that occurs through the algorithm's memory access patterns and the output together satisfies the DP guarantees.\n\u2022 We design ODP algorithms with provable performance guarantees for some commonly used statistical routines such as computing the number of distinct elements, histograms, and heavy hitters. We prove that the privacy and error guarantees of our algorithms (Theorems 4.1, 4.4, 4.5) are significantly better than in the local model, and obliviousness does not come at a steep price.\n\nA technically novel aspect of our paper is that it draws ideas from various different fields: streaming algorithms, oblivious algorithms, and differentially private algorithms. This fact becomes clear in \u00a74 where we design ODP algorithms.\n\nRelated work There are several systems that propose confidential data analysis using TEEs [41, 56, 48, 11]. PROCHLO [11], in particular, provides support for differentially private data analysis. While PROCHLO emphasizes the system aspects (without formal proofs), our work gives a formal framework based on oblivious differential privacy for analyzing and designing algorithms for private data analysis in TEEs. Oblivious sampling algorithms proposed in [47] generate samples securely such that privacy amplification can be used when analyzing DP algorithms executed on the samples in a TEE.\n\n2 Preliminaries\n\n2.1 Secure Data Analysis with Trusted Processors and Memory Access Leakage\n\nA visualization of our framework is given in Figure 1 (left). We use Intel Software Guard Extensions (SGX) as an example of a trusted processor. 
Intel SGX [31] is a set of CPU instructions that allows user-level code to allocate a private region of memory, called an enclave (which we also refer to as a TEE), which is accessible only to the code running in the enclave. The enclave memory is available in raw form only inside the physical processor package, but it is encrypted and integrity-protected when written to memory. As a result, the code running inside of an enclave is isolated from the rest of the system, including the operating system. Additionally, Intel SGX supports software attestation [3] that allows the enclave code to get messages signed with a private key of the processor along with a digest of the enclave. This capability allows users to verify that they are communicating with a specific piece of software (i.e., a differentially-private algorithm) running in an enclave hosted by the trusted hardware. Once this verification succeeds, the user can establish a secure communication channel with the enclave (e.g., using TLS) and upload data. When the computation is over, the enclave, including the local variables and data, is deleted.\nAn enclave can access data that is managed by the trusted processor (e.g., data in registers and caches) or by the software that is not trusted (e.g., an operating system). As a result, in the latter case, data in the external memory has to be encrypted and integrity-protected by the code running inside of an enclave. Unfortunately, encryption and integrity are not sufficient to protect against the adversary described in the introduction, who can see the addresses of the data being accessed even if the data is encrypted. 
There are several ways the adversary can extract the addresses, i.e., the memory access pattern. Some typical examples are: an adversary with physical access to a machine can attach probes to a memory bus; an adversary that shares the same hardware as the victim enclave code (e.g., a co-tenant) can use shared resources such as caches to observe cache-line accesses; and a compromised operating system can inject page faults and observe page-level accesses. Memory access patterns have been shown to be sufficient to extract secrets and data from cryptographic code [34, 43, 10, 45, 42], from genome indexing algorithms [12], and from image and text applications [55]. (See Figure 1 (right) for a simple example of what can be extracted by observing accesses of a histogram computation.) As a result, accesses leaked through memory side-channels4 undermine the confidentiality promise of enclaves [55, 49, 30, 35, 17, 12, 38].\n\n4This should not be confused with vulnerabilities introduced by floating-point implementations [37].\n\n2.2 Data-Oblivious Algorithms\n\nData-oblivious algorithms [26, 41, 52, 27] are designed to protect memory addresses against the adversary described in \u00a72.1: they produce data access patterns that appear to be independent of the sensitive data they compute on. They can be seen as external-memory algorithms that perform computation inside of a small private memory while storing the encrypted data in the external memory and accessing it in a data-independent manner. We formally capture this property below. Suppose external memory is represented by an array a[1, 2, ..., M] for some large value of M.\nDefinition 2.1 (Access pattern). Let opj be either a read(a[i]) operation that reads data from the location a[i] to private memory or a write(a[i]) operation that copies some data from the private memory to the external memory a[i]. Then, let s := (op1, op2, . . . , opt) denote an access pattern of length t of algorithm A to the external memory.\nNote that the adversary can see only the addresses accessed by the algorithm and whether each access is a read or a write. It cannot see the data since it is encrypted using probabilistic encryption, which guarantees that the adversary cannot tell whether two ciphertexts correspond to the same record or to two different ones.\nDefinition 2.2 (Data-oblivious algorithm). An algorithm A is data-oblivious if for any two inputs I1 and I2, and any subset of possible memory access patterns S \u2286 S, where S is the set of all possible memory access patterns produced by the algorithm, we have:\n\nPr [A(I1) \u2208 S] = Pr [A(I2) \u2208 S]\n\nIt is instructive to compare this definition with the definition of differential privacy. The definition of oblivious algorithms can be thought of as a generalization of DP to memory access patterns, where \u03b5 = 0 and the guarantee should hold even for non-neighboring databases. Similar to external-memory algorithms, the overhead of a data-oblivious algorithm is measured in the number of accesses it makes to external memory, while computations on private memory are assumed to have constant cost. Some algorithms naturally satisfy Definition 2.2 while others require changes to how they operate. For example, scanning an array is a data-oblivious algorithm since for any array of the same size every element of the array is accessed. Sorting networks [9] are also data-oblivious as the element identifiers accessed by compare-and-swap operations are fixed based on the size of the array and not its content. On the other hand, quicksort is not oblivious as its accesses depend on comparisons of the elements with the pivot element. As can be seen from Figure 1 (right), a naive histogram algorithm is also not data-oblivious. 
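The distinction in Definition 2.2 can be illustrated with a toy trace recorder (our sketch, not from the paper; the function names are ours): the naive histogram's writes depend on the records, while a padded variant that touches every counter for every record produces a trace that depends only on n and k, at O(nk) cost.

```python
def naive_histogram(records, k, trace):
    """Data-dependent accesses: each write address reveals the record's
    bucket, as in Figure 1 (right)."""
    counts = [0] * k
    for i, r in enumerate(records):
        trace.append(("read", "a", i))
        trace.append(("write", "b", r))  # address depends on the record!
        counts[r] += 1
    return counts

def oblivious_histogram(records, k, trace):
    """Naive oblivious variant: touch every counter for every record, so
    the trace depends only on n and k (Definition 2.2), at O(nk) cost."""
    counts = [0] * k
    for i, r in enumerate(records):
        trace.append(("read", "a", i))
        for j in range(k):  # dummy writes hide the real increment
            trace.append(("write", "b", j))
            counts[j] += 1 if j == r else 0
    return counts

t1, t2 = [], []
naive_histogram([0, 2, 2, 1], 3, t1)
naive_histogram([1, 1, 0, 2], 3, t2)
assert t1 != t2  # naive traces leak the data

t3, t4 = [], []
oblivious_histogram([0, 2, 2, 1], 3, t3)
oblivious_histogram([1, 1, 0, 2], 3, t4)
assert t3 == t4  # oblivious traces are identical for equal n and k
```

Real enclave code would of course operate on encrypted records; the sketch only models which addresses an adversary observes.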
(See the supplementary material for overheads of several oblivious algorithms.)\nIn this paper, we focus on measuring the overhead in performance in terms of the number of memory accesses of oblivious algorithms. We omit the cost of setting up a TEE, which is a one-time cost proportional to the size of the code and data loaded in the TEE, and the cost of encryption and decryption, which is linear in the size of the data and is often implemented in hardware.\nOblivious RAM (ORAM) is designed to hide the indices of accesses to an array of size n, i.e., it hides how many times and when an index was accessed. There is a naive and inefficient way to hide an access by reading and writing to every index. Existing ORAM constructions incur sublinear overhead by using specialized data structures and re-arranging the external memory [26, 29, 46, 51, 54, 28]. The best known ORAM construction has O(log n) overhead [4]. Since it incurs high constants, Path ORAM [52], with an overhead of O((log n)^2), is a preferred option in practice. ORAM can be used to transform any RAM program whose number of accesses does not depend on sensitive content; otherwise, the number of accesses needs to be padded. However, if one is willing to reveal the algorithm being performed on the data, then for some computations the overhead of specialized constructions can be asymptotically lower than that of the one based on ORAM.\nA data-oblivious shuffle [40] takes as a parameter an array a (stored in external memory) of size n and a permutation \u03c0, and permutes a according to \u03c0 such that \u03c0 is not revealed to the adversary. The Melbourne shuffle [40] is a randomized data-oblivious shuffle that makes O(cn) deterministic accesses to external memory, assuming a private memory of size c\u221an, and fails with negligible probability. 
For example, for c = 2 the overhead of the algorithm is constant, as any non-oblivious shuffle algorithm has to make at least n accesses. The Melbourne shuffle with smaller private memories of size m = \u03c9(log n) incurs a slightly higher overhead of O(n log n/ log m), as shown in [44]. We will use oblivious_shuffle(a) to refer to a shuffle of a according to some random permutation that is hidden from the adversary.\nNote that the user anonymization primitive that the shuffle model of differential privacy [5, 15, 23] relies on can be implemented in TEEs with a data-oblivious shuffle [11]. However, in this case the trust model of the shuffle model of DP will be the same as described in the next section.\n\n3 Algorithmic Framework and Oblivious Differential Privacy\n\nWe now introduce the definition of Oblivious Differential Privacy (ODP), and give an algorithmic framework for the design of ODP-algorithms for a system based on trusted processors (\u00a72.1). As we mentioned earlier, off-the-shelf DP-algorithms may not be suitable for TEEs for two main reasons.\n\nSmall Private Memory: The private memory, which is protected from the adversary, available to an algorithm within a trusted processor is much smaller than the data the algorithm has to process. A reasonable assumption is that the size of private memory is polylogarithmic in the input size.\n\nAccess Patterns Leak Privacy: An adversary who sees the memory access patterns of an algorithm to external memory can learn useful information about the data, compromising the differential privacy guarantees of the algorithm.\n\nTherefore, the algorithm designer needs to guarantee that memory access patterns do not reveal any private information, and that the overall algorithm is differentially private5. To summarize, in our attacker model private information is leaked either by the output of a DP algorithm or through memory access patterns. 
We formalize this by introducing the notion of Oblivious Differential Privacy, which combines the notions of differential privacy and oblivious algorithms.\nDefinition 3.1. Let D1 and D2 be any two neighboring databases that have exactly the same size n but differ in one record. A randomized algorithm A that has small private memory (i.e., sublinear in n) and accesses external memory is (\u03b5, \u03b4)-obliviously differentially private (ODP) if for any subset of possible memory access patterns S \u2286 S and any subset of possible outputs O we have:\n\nPr [A(D1) \u2208 (O, S)] \u2264 e^\u03b5 \u00b7 Pr [A(D2) \u2208 (O, S)] + \u03b4.\n\n5Here, the data collector runs algorithms on premises. See the supplementary material for restrictions in the cloud setting.\n\nWe believe that the above definition gives a systematic way to design DP algorithms in TEE settings. An algorithm that satisfies the above definition guarantees that the private information released through the output of the algorithm and through the access patterns is quantified by the parameters (\u03b5, \u03b4). Similar to our definition, Wagh et al. [53] and, more recently in parallel work, Chan et al. [14] also consider relaxing the definition of obliviousness for hiding access patterns from an adversary. However, their definitions serve a complementary purpose to ours: they apply DP to oblivious algorithms, whereas we apply obliviousness to DP algorithms. This is crucial since algorithms that satisfy the definition in [53, 14] may not satisfy DP when the output is released, which is the main motivation for using differentially private algorithms. 
Our results together with [14] highlight that the combination of DP and oblivious algorithms is an interesting area for further research in private and secure ML.\nRemarks: In the real world, the implementation of a TEE relies on cryptographic algorithms (e.g., encryption and digital signatures) that are computationally secure and depend on a security parameter of the system. As a result, any differentially private algorithm operating inside of a TEE has a non-zero parameter \u03b4 that is negligible in the security parameter.\nIn this paper, we focus only on memory accesses; but our definitions and framework can be easily extended to other forms of side-channel attacks such as timing attacks (e.g., by incorporating the time of each access), albeit requiring changes to the algorithms presented in the next section to satisfy them.\nConnections to Streaming Algorithms: One simple strategy to satisfy Definition 3.1 is to take a DP-algorithm and guarantee that every time the algorithm makes an access to the external memory, it makes a pass over the entire data. However, such an algorithm incurs a multiplicative overhead of n on the running time, and the goal would be to minimize the number of passes made over the data. Interestingly, these algorithms precisely correspond to streaming algorithms, which are widely studied in big-data analysis. In the streaming setting, one assumes that we have only O(log n) bits of memory and the data stream consists of n items, and the goal is to compute functions over the data. Quite remarkably, several functions can be approximated very well in this model. See [39] for an extensive survey. Since there is a large body of work on streaming algorithms, we believe that many algorithmic ideas there can be used in the design of ODP algorithms. 
We give examples of such algorithms for the distinct elements problem in \u00a74.1 and the heavy hitters problem in \u00a74.3.\nPrivacy Budget: Finally, we note that the system based on TEEs can support interactive data analysis where the privacy budget is hard-coded (and hence verified by each user before they supply their private data). The data collector's queries decrease the budget appropriately, and the code exits when the privacy budget is exceeded. Since the budget is maintained as a local variable within the TEE, it is protected from replay attacks while the code is running. If the TEE exits, the adversary cannot restart it without notifying the users, since the code requires their secret keys. These keys cannot be reused for different instantiations of the same code as they are also protected by the TEE and are destroyed on exit.\n\n4 Obliviously Differentially Private Algorithms\n\nIn this section, we show how to design ODP algorithms for three of the most commonly used statistical queries: counting the number of distinct elements in a dataset, the histogram of the elements, and reporting heavy hitters. The algorithms for these problems exhibit two common themes: 1) for many applications it is possible to design DP algorithms without paying too much overhead to enforce obliviousness; 2) the interplay of ideas from the streaming and oblivious algorithms literature in the design of ODP algorithms.\nBefore we continue with the construction of ODP algorithms, we make a subtle but important point. Recall that in our Definition 3.1, we require that two neighboring databases have exactly the same size. If the neighboring databases were of different sizes, then the access patterns could be of different lengths, and it would be impossible to satisfy the ODP definition. This requirement does not change the privacy guarantees, as in many applications the size of the database is known in advance; e.g., the number of users of a system. 
However, it has implications for the sensitivity of the queries. For example, histogram queries in our definition have a sensitivity of 2, whereas in the standard definition it is 1.\n\n4.1 Number of Distinct Items in a Database\n\nAs a warm-up for the design of ODP algorithms, we begin with the distinct elements problem. Formally, suppose we are given a set of n users and each user i holds an item vi \u2208 {1, 2, ..., m}, where m is assumed to be much larger than n. This is the case if a company wants to understand the number of distinct websites visited by its users or the number of distinct words that occur in text data. Let nv denote the number of users that hold the item v. The goal is to estimate n\u2217 := |{v : nv > 0}|.\nWe are not aware of a reasonable solution that achieves an additive error better than \u2126(n) for this problem in the LDP setting. In sharp contrast, the problem becomes very simple in our framework. Indeed, a simple solution is to do an oblivious sorting [1, 9] of the database elements, and then count the number of distinct elements by making another pass over the database. Finally, one can add Laplace noise with parameter 1/\u03b5, which guarantees that our algorithm satisfies the definition of ODP. This is true as a) the sensitivity of the query is 1, since a single user can increase or decrease the number of distinct elements by at most 1; and b) we do oblivious sorting. Furthermore, the expected (additive) error of such an algorithm is 1/\u03b5. Recall that n\u2217 denotes the number of distinct elements in a database. Thus we get:\nTheorem 4.1. There exists an oblivious-sorting-based (\u03b5, 0)-ODP algorithm for the problem of finding the number of distinct elements in a database that runs in time O(n log n). 
With probability at least 1 \u2212 \u03b8, the number of distinct elements output by our algorithm is n\u2217 \u00b1 log(1/\u03b8) \u00b7 1/\u03b5.\nWhile the above algorithm is optimal in terms of error, we propose a more elegant streaming algorithm that does the entire computation in the private memory. The main idea is to use a sketching technique to maintain an approximate count of the distinct elements in the private memory and report this approximate count after adding noise from Lap(1/\u03b5). This guarantees that our algorithm is (\u03b5, 0)-ODP, as the entire computation is done in the private memory and the Laplace mechanism is (\u03b5, 0)-DP. There are many streaming algorithms (e.g., HyperLogLog) [25, 33] which achieve a (1 \u00b1 \u03b1)-approximation factor on the number of distinct elements with a space requirement of polylog(n). We use the following (optimal) algorithm from [33].\nTheorem 4.2. There exists a streaming algorithm that gives a (1 \u00b1 \u03b1) multiplicative approximation factor for the problem of finding the number of distinct elements in a data stream. The space requirement of the algorithm is at most log n/\u03b1^2 + (log n)^2, and the guarantee holds with probability 1 \u2212 1/n.\n\nIt is easy to convert the above algorithm into an ODP algorithm by adding noise sampled from Lap(1/\u03b5).\nTheorem 4.3. There exists a single-pass (or online) (\u03b5, 0)-ODP algorithm for the problem of finding the number of distinct elements in a database. The space requirement of the algorithm is at most log n/\u03b1^2 + (log n)^2. With probability at least 1 \u2212 1/n \u2212 \u03b8, the number of distinct elements output by our algorithm is (1 \u00b1 \u03b1)n\u2217 \u00b1 log(1/\u03b8) \u00b7 1/\u03b5.\nThe additive error of \u00b1 log(1/\u03b8) \u00b7 1/\u03b5 is introduced by the Laplace mechanism, and the multiplicative error of (1 \u00b1 \u03b1) is introduced by the sketching scheme. 
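A minimal sketch (ours, for illustration only) of the sort-and-count approach behind Theorem 4.1: we use Python's built-in sort as a stand-in for the oblivious sort, since a real TEE implementation would use a data-oblivious sorting network (e.g., bitonic sort), and add Lap(1/\u03b5) noise to the exact distinct count.

```python
import math
import random

def odp_distinct_count(items, eps, rng):
    """Count distinct elements, then release with Lap(1/eps) noise.
    sorted() is a placeholder for an oblivious sorting network; the
    sensitivity of the distinct count is 1, so the output is eps-DP."""
    a = sorted(items)  # stand-in for oblivious sort
    distinct = sum(1 for i in range(len(a)) if i == 0 or a[i] != a[i - 1])
    u = rng.random() - 0.5  # Lap(1/eps) via inverse-CDF sampling
    noise = -math.copysign(math.log(1 - 2 * abs(u)), u) / eps
    return distinct + noise

rng = random.Random(0)
est = odp_distinct_count([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], eps=2.0, rng=rng)
```

The counting pass touches a[0], a[1], ... in order regardless of content, so once the sort is oblivious the whole access pattern is data-independent.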
Although this algorithm is not optimal in terms of the error compared to Theorem 4.1, it has the advantage that it can maintain the approximate count in an online fashion.

4.2 Histogram

Let D be a database with n records. We assume that each record in the database has a unique identifier. Let 𝒟 denote all possible databases of size n. Each record (or element) r ∈ D has a type, which, without loss of generality, is an integer in the set {1, 2, . . . , k}. For a database D ∈ 𝒟, let n_i denote the number of elements of type i. Then the histogram function h : 𝒟 → R^k is defined as h(D) := (n_1, n_2, . . . , n_k).
A simple differentially private histogram algorithm A_hist returns h(D) + (X_1, X_2, . . . , X_k), where the X_i are i.i.d. random variables drawn from Lap(2/ε). This algorithm is not obliviously differentially private, as the access pattern reveals to the adversary much more information about the data than the actual output. In this section, we design an ODP algorithm for the histogram problem. Let n̂_i denote the number of elements of type i output by our histogram algorithm. We prove the following theorem in this paper.
Theorem 4.4. For any ε > 0, there is an (ε, 1/n²)-ODP algorithm for the histogram problem that runs in time O(ñ log ñ / log log ñ), where ñ = max(n, k log n/ε). With probability 1 − θ, it holds that max_i |n̂_i − n_i| ≤ log(k/θ) · 2/ε.
Observe that our algorithm achieves the same error guarantee as that of the global DP algorithm without much overhead in terms of running time.
To prove that our algorithm is ODP, we need to show that the distribution on the access patterns produced by our algorithm for any two neighboring databases is approximately the same. The same should hold true for the histogram output by our algorithm. We achieve this as follows.
We want to use the simple histogram algorithm that adds Laplace noise with parameter 2/ε, which we know is ε-DP. This is true since the histogram queries have sensitivity 2; that is, if we change the record associated with a single user, then the histogram output changes in at most 2 types. Note that if the private memory size is larger than k, then the task is trivial. One can build the entire DP histogram in the private memory by making a single pass over the database, which clearly satisfies our definition of ODP. However, in many applications k ≫ O(log n). A common example is a histogram on bigrams of words, which is commonly used in text prediction. When k ≫ log n, the private memory is not enough to store the entire histogram, and we need to make sure that memory access patterns do not reveal too much information to the adversary.
One can make the naive histogram algorithm satisfy Definition 3.1 by accessing the entire public memory for every read/write operation, incurring an overhead of O(nk). Another method to solve the histogram problem would be to rely on oblivious sorting algorithms. The overhead of this method would be the overhead of sorting. However, sorting is usually an expensive operation in practice. Here, we give an arguably simpler and faster algorithm for larger values of k that satisfies the definition of ODP. (At a high level, our algorithm is similar to the one which appeared in the independent work by Mazloom and Gordon [36], who use a differentially private histogram to protect access patterns of graph-parallel computation based on garbled circuits, as a result requiring a different noise distribution and shuffling algorithm.)
We give a high-level overview of the algorithm here, and defer the pseudo-code and analysis to the supplementary material. Let T = n + 20k log n/ε.
1. Sample k random variables X_1, X_2, . . . , X_k from Lap(2/ε).
If any |X_i| > 10 log n/ε, then we set X_i = 0 for all i = 1, 2, . . . , k. For all i, set X_i = ⌈X_i⌉.

2. We create (10 log n/ε + X_i) fake records of type i and append them to the database D. This step together with step 1 ensures that (10 log n/ε + X_i) is always positive. The main reason to restrict the Laplace distribution's support to [−10 log n/ε, 10 log n/ε] is to ensure that we only have positive noise. If the noise is negative, we cannot create fake records in the database simulating this noise.

3. Next, we create (10k log n/ε − Σ_i X_i) dummy records in the database D, which do not correspond to any particular type in 1..k. The dummy records have type k + 1. The purpose of this step is to ensure that the length of the output is exactly T.

4. Let D̂ be the augmented database that contains both dummy and fake records, where the adversary cannot distinguish between database, dummy, and fake records as they are encrypted using probabilistic encryption. Obliviously shuffle D̂ [40] so that the mapping of records to the array a[1, 2, ..., T] is uniformly distributed.

5. Initialise b with k zero counters in external memory. Scan every element of the array a[1, 2, ..., T] and increment the counter in histogram b associated with the type of a[i]. If the record corresponds to a dummy element, then access the array b[1, 2, ..., k] in round-robin fashion and do a fake write without modifying the actual content of b.

In the full version [2], we show that the above algorithm is (ε, 1/n²)-ODP for any ε > 0. While the proof is long, the intuition is simple: the shuffle operation gives a uniform distribution on how D̂ is stored in public memory.
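The five steps above can be sketched in Python as follows. This is illustrative only: `random.shuffle` stands in for the oblivious shuffle of [40], encryption is omitted, the rounding of the non-integer count 10 log n/ε to ⌈10 log n/ε⌉ fake records per type is our assumption, and the reported counter for type i equals n_i plus that fixed offset plus X_i (the fixed offset can be subtracted before release).

```python
import random
from math import log, copysign, ceil

def laplace(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) by inverse-CDF."""
    u = rng.random() - 0.5
    return -scale * copysign(1.0, u) * log(1.0 - 2.0 * abs(u))

def odp_histogram(records, k, epsilon, rng=None):
    """Sketch of the five-step ODP histogram. `records` holds types in
    1..k; type k+1 marks dummies. random.shuffle simulates an oblivious
    shuffle, and the round-robin fake writes are simulated in plain RAM."""
    rng = rng or random.Random()
    n = len(records)
    cap = 10 * log(n) / epsilon
    # Step 1: truncated, rounded-up Laplace noise per type.
    X = [laplace(2.0 / epsilon, rng) for _ in range(k)]
    if any(abs(x) > cap for x in X):
        X = [0.0] * k
    X = [ceil(x) for x in X]
    # Step 2: fake records so type i gets ceil(cap) + X_i extra copies.
    db = list(records)
    for i in range(k):
        db += [i + 1] * (ceil(cap) + X[i])
    # Step 3: dummy records (type k+1) pad the length to (about) T.
    T = n + ceil(20 * k * log(n) / epsilon)
    db += [k + 1] * max(0, T - len(db))
    # Step 4: oblivious shuffle (simulated).
    rng.shuffle(db)
    # Step 5: single scan; dummies trigger a fake round-robin access.
    b = [0] * k
    rr = 0
    for t in db:
        if t <= k:
            b[t - 1] += 1
        else:
            _ = b[rr]            # touch b[rr] without changing it
            rr = (rr + 1) % k
    return b
```

To an observer of the (simulated) memory trace, the scan looks like a fixed-length pass whose per-type access counts are the true counts plus truncated Laplace noise, which is exactly the intuition given above.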
Since our algorithm is deterministic after the shuffle operation – it just makes a single pass over D̂ – the only information the adversary learns from the memory access patterns is the histogram of the elements plus noise. Now, the Laplace noise guarantees that it is DP.

4.3 Heavy Hitters

As a final example, we consider the problem of finding frequent items, also called the heavy hitters problem, while satisfying the ODP definition. In this problem, we are given a database D of n users, where each user holds an item from the set {1, 2, ..., m}. In typical applications, such as finding the most frequent websites or finding the most frequently used words in a text corpus, m is usually much larger than n. Hence reporting the entire histogram on m items is not possible. In such applications, one is interested in the list of the k most frequent items, where we define an item as frequent if it occurs more than n/k times in the database. In typical applications, k is assumed to be a constant or sublinear in n. The problem of finding the heavy hitters is one of the most widely studied problems in the LDP setting [8, 6]. In this section, we show that the heavy hitters problem becomes very simple in our model. Let n̂_i denote the number of occurrences of the item i output by our algorithm and n_i denote the true count. We give the full proof of the theorem in the supplementary material.

Theorem 4.5. Let τ > 1 be some constant, and let ε > 0 be the privacy parameter. Suppose n/k > (τ/ε) log m. Then, there exists an (ε, 1/m^(τ−1))-ODP algorithm for the problem of finding the top k most frequent items that runs in time O(n log n).
Furthermore, for every item i output by our algorithm, it holds that i) with probability at least 1 − θ, |n̂_i − n_i| ≤ log(m/θ) · 2/ε, and ii) n_i ≥ n/k − log(m/θ) · 2/ε.

We remark that the k items output by an ODP algorithm do not exactly correspond to the top k items due to the additive error introduced by the algorithm. We can use the above theorem to return a list of approximate heavy hitters, which satisfies the following guarantees: 1) Every item with frequency higher than n/k is in the list. 2) No item with frequency less than n/k − 2 log(m/θ) · 2/ε is in the list.
We contrast the bound of this theorem with the bound one can obtain in the LDP setting. An optimal LDP algorithm can only achieve a guarantee of |n̂_i − n_i| ≤ (1/ε) · √(n · log(n/θ)) · log m. We refer the reader to [8, 6] for more details about the heavy hitters problem in the LDP setting. For many applications, such as text mining or finding the most frequently visited websites within a sub-population, this difference in the error turns out to be significant. See experiments and details in [11].
Our algorithm for Theorem 4.5 proceeds as follows: It sorts the elements in the database by type using oblivious sorting. It then initialises an encrypted list b and fills it in while scanning the sorted database as follows. It reads the first element and saves in private memory its type, say i, and creates a counter set to 1. It then appends to b a tuple: type i and the counter value. When reading the second database element, it compares its type, say i′, to i. If i = i′, it increments the counter. If i ≠ i′, it resets the counter to 1 and overwrites the type saved in private memory to i′. In both cases, it then appends to b another tuple: the type and the counter from private memory.
It proceeds in this manner for the rest of the database. Once finished, it makes a backward scan over b. For every new type it encounters, it adds Lap(2/ε) to the corresponding counter and, additionally, extends the tuple with a flag set to 0. For all other tuples, a flag set to 1 is added instead. It then sorts b: by the flag in ascending order and by the differentially private counter values in descending order.
Let n* be the number of distinct elements in the database. Then the first n* tuples of b hold all the distinct types of the database together with their differentially private frequencies. Since these elements are sorted, one can make a pass, returning the types of the top k most frequent items with the highest count (which includes the Laplace noise). Although this algorithm is not (ε, 0)-ODP, it is easy to show that it is (ε, 1/m^(τ−1))-ODP when n/k > τ log m, which is the case in all applications of the heavy hitters. Indeed, in most typical applications k is a constant. The proof that the algorithm satisfies the statement of Theorem 4.5 appears in the supplementary material.

Frequency Oracle Based on Count-Min Sketch Another commonly studied problem in the context of heavy hitters is the frequency oracle problem. Here, the goal is to answer the number of occurrences of an item i in the database. While this problem can be solved by computing the answer upon receiving a query and adding Laplace noise, there is a simpler approach which might be sufficient in many applications. One can maintain a count-min sketch, a commonly used algorithm in the streaming literature [16], of the frequencies of items by making a single pass over the data. An interesting aspect of this approach is that the entire sketch can be maintained in the private memory, hence one does not need to worry about obliviousness.
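A minimal sketch of such a frequency oracle is shown below. This is illustrative, not the paper's implementation: the class and method names are ours, and Python's built-in `hash` stands in for the pairwise-independent hash functions normally used with a count-min sketch [16]. Since a single record touches one cell per row, the ℓ1 sensitivity of the whole table is `depth`, so releasing it with Lap(depth/ε) noise per cell is ε-DP.

```python
import random
from math import log, copysign

def laplace(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) by inverse-CDF."""
    u = rng.random() - 0.5
    return -scale * copysign(1.0, u) * log(1.0 - 2.0 * abs(u))

class CountMinSketch:
    """Count-min sketch kept entirely in private memory; a noisy copy
    can be released by adding Laplace noise to every cell."""

    def __init__(self, width: int = 256, depth: int = 4, seed: int = 0):
        self.width, self.depth = width, depth
        self.seeds = [seed + r for r in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cols(self, item):
        # One column per row, chosen by a seeded hash of the item.
        return [hash((s, item)) % self.width for s in self.seeds]

    def add(self, item):
        for r, c in enumerate(self._cols(item)):
            self.table[r][c] += 1

    def estimate(self, item):
        # Count-min never underestimates the true frequency.
        return min(self.table[r][c] for r, c in enumerate(self._cols(item)))

    def noisy_release(self, epsilon: float, rng=None):
        """Release the whole table with Lap(depth/epsilon) per cell."""
        rng = rng or random.Random()
        b = self.depth / epsilon
        return [[v + laplace(b, rng) for v in row] for row in self.table]
```

Because the table never leaves private memory until `noisy_release`, the updates themselves need no oblivious machinery.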
Further, the entire count-min sketch can be released to the data collector by adding Laplace noise. An advantage of this approach is that the data collector can get the frequency of any item he wants by simply referring to the sketch, instead of consulting the DP algorithm. It would be interesting to find more applications of streaming techniques in the context of ODP algorithms.

References

[1] Miklós Ajtai, János Komlós, and Endre Szemerédi. An O(n log n) sorting network. In ACM Symposium on Theory of Computing (STOC), 1983.

[2] Joshua Allen, Bolin Ding, Janardhan Kulkarni, Harsha Nori, Olga Ohrimenko, and Sergey Yekhanin. An algorithmic framework for differentially private data analysis on trusted processors. CoRR, abs/1807.00736, 2018.

[3] Ittai Anati, Shay Gueron, Simon Johnson, and Vincent Scarlata. Innovative technology for CPU based attestation and sealing. In Workshop on Hardware and Architectural Support for Security and Privacy (HASP), 2013.

[4] Gilad Asharov, Ilan Komargodski, Wei-Kai Lin, Kartik Nayak, Enoch Peserico, and Elaine Shi. OptORAMa: Optimal oblivious RAM. Cryptology ePrint Archive, Report 2018/892, 2018. https://eprint.iacr.org/2018/892.

[5] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. In Advances in Cryptology—CRYPTO, pages 638–667, 2019.

[6] Raef Bassily, Kobbi Nissim, Uri Stemmer, and Abhradeep Guha Thakurta. Practical locally private heavy hitters. In Conference on Neural Information Processing Systems (NeurIPS), pages 2285–2293, 2017.

[7] Raef Bassily and Adam D. Smith. Local, private, efficient protocols for succinct histograms. In ACM Symposium on Theory of Computing (STOC), pages 127–135, 2015.

[8] Raef Bassily and Adam D. Smith. Local, private, efficient protocols for succinct histograms.
In ACM Symposium on Theory of Computing (STOC), pages 127–135, 2015.

[9] Kenneth E. Batcher. Sorting networks and their applications. In Spring Joint Computer Conf., 1968.

[10] Daniel J. Bernstein. Cache-timing attacks on AES. Technical report, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 2005.

[11] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In ACM Symposium on Operating Systems Principles (SOSP), 2017.

[12] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. Software grand exposure: SGX cache attacks are practical. In USENIX Workshop on Offensive Technologies (WOOT), 2017.

[13] Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The Secret Sharer: Evaluating and testing unintended memorization in neural networks. In USENIX Security Symposium, 2019.

[14] T-H. Hubert Chan, Kai-Min Chung, Bruce M. Maggs, and Elaine Shi. Foundations of differentially oblivious algorithms. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2448–2467, 2019.

[15] Albert Cheu, Adam Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via shuffling. In Advances in Cryptology—EUROCRYPT, 2019.

[16] Graham Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. J. Algorithms, 55(1):58–75, April 2005.

[17] Victor Costan, Ilia Lebedev, and Srinivas Devadas. Sanctum: Minimal hardware extensions for strong software isolation. In USENIX Security Symposium, 2016.

[18] Apple Differential Privacy Team. Learning with privacy at scale, 2017.

[19] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin.
Collecting telemetry data privately. In Conference on Neural Information Processing Systems (NeurIPS), 2017.

[20] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), pages 265–284, 2006.

[21] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil Vadhan. On the complexity of differentially private data release: Efficient algorithms and hardness results. In ACM Symposium on Theory of Computing (STOC), pages 381–390, 2009.

[22] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9, August 2014.

[23] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2019.

[24] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In ACM Conference on Computer and Communications Security (CCS), pages 1054–1067, 2014.

[25] Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2), September 1985.

[26] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM (JACM), 43(3), 1996.

[27] Michael T. Goodrich. Data-oblivious external-memory algorithms for the compaction, selection, and sorting of outsourced data. In Proceedings of the Twenty-third Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011.

[28] Michael T. Goodrich and Michael Mitzenmacher. Privacy-preserving access of outsourced data via oblivious RAM simulation.
In International Colloquium on Automata, Languages and Programming (ICALP), 2011.

[29] Michael T. Goodrich, Michael Mitzenmacher, Olga Ohrimenko, and Roberto Tamassia. Privacy-preserving group data access via stateless oblivious RAM simulation. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2012.

[30] Johannes Götzfried, Moritz Eckert, Sebastian Schinzel, and Tilo Müller. Cache attacks on Intel SGX. In European Workshop on System Security (EuroSec), 2017.

[31] Matthew Hoekstra, Reshma Lal, Pradeep Pappachan, Carlos Rozas, Vinay Phegade, and Juan del Cuvillo. Using innovative instructions to create trustworthy software solutions. In Workshop on Hardware and Architectural Support for Security and Privacy (HASP), 2013.

[32] Noah Johnson, Joseph P. Near, and Dawn Song. Towards practical differential privacy for SQL queries. PVLDB, 11(5):526–539, January 2018.

[33] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Symposium on Principles of Database Systems (PODS), pages 41–52, 2010.

[34] Paul C. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Advances in Cryptology—CRYPTO, 1996.

[35] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In IEEE Symposium on Security and Privacy (S&P), 2015.

[36] Sahar Mazloom and S. Dov Gordon. Secure computation with differentially private access patterns. In ACM Conference on Computer and Communications Security (CCS), 2018.

[37] Ilya Mironov. On significance of the least significant bits for differential privacy. In ACM Conference on Computer and Communications Security (CCS), 2012.

[38] Ahmad Moghimi, Gorka Irazoqui, and Thomas Eisenbarth. CacheZoom: How SGX amplifies the power of cache attacks. In Cryptographic Hardware and Embedded Systems (CHES), 2017.

[39] S.
Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1, 2003.

[40] Olga Ohrimenko, Michael T. Goodrich, Roberto Tamassia, and Eli Upfal. The Melbourne shuffle: Improving oblivious storage in the cloud. In International Colloquium on Automata, Languages and Programming (ICALP), volume 8573. Springer, 2014.

[41] Olga Ohrimenko, Felix Schuster, Cédric Fournet, Aastha Mehta, Sebastian Nowozin, Kapil Vaswani, and Manuel Costa. Oblivious multi-party machine learning on trusted processors. In USENIX Security Symposium, 2016.

[42] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and countermeasures: The case of AES. In RSA Conference Cryptographer's Track (CT-RSA), 2006.

[43] Dan Page. Theoretical use of cache memory as a cryptanalytic side-channel. Cryptology ePrint Archive, Report 2002/169, 2002.

[44] Sarvar Patel, Giuseppe Persiano, and Kevin Yeo. CacheShuffle: A family of oblivious shuffles. In International Colloquium on Automata, Languages and Programming (ICALP), 2018.

[45] Colin Percival. Cache missing for fun and profit. In Proceedings of BSDCan, 2005.

[46] Benny Pinkas and Tzachy Reinman. Oblivious RAM revisited. In Advances in Cryptology—CRYPTO, 2010.

[47] Sajin Sasy and Olga Ohrimenko. Oblivious sampling algorithms for private data analysis. In Conference on Neural Information Processing Systems (NeurIPS), 2019.

[48] Felix Schuster, Manuel Costa, Cédric Fournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and Mark Russinovich. VC3: Trustworthy data analytics in the cloud using SGX. In IEEE Symposium on Security and Privacy (S&P), 2015.

[49] Michael Schwarz, Samuel Weiser, Daniel Gruss, Clementine Maurice, and Stefan Mangard. Malware Guard Extension: Using SGX to conceal cache attacks.
In Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2017.

[50] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (S&P), 2017.

[51] Emil Stefanov, Elaine Shi, and Dawn Xiaodong Song. Towards practical oblivious RAM. In Symposium on Network and Distributed System Security (NDSS), 2012.

[52] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher W. Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path ORAM: An extremely simple oblivious RAM protocol. In ACM Conference on Computer and Communications Security (CCS), 2013.

[53] Sameer Wagh, Paul Cuff, and Prateek Mittal. Differentially private oblivious RAM. PoPETs, 2018(4):64–84, 2018.

[54] Peter Williams, Radu Sion, and Bogdan Carbunar. Building castles out of mud: Practical access pattern privacy and correctness on untrusted storage. In ACM Conference on Computer and Communications Security (CCS), 2008.

[55] Yuanzhong Xu, Weidong Cui, and Marcus Peinado. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In IEEE Symposium on Security and Privacy (S&P), 2015.

[56] Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. Opaque: An oblivious and encrypted distributed analytics platform.
In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017.