We store a significant amount of sensitive data online, such as personally identifying information (PII), trade secrets, family pictures, and customer information. The data that we store is often not protected in an appropriate manner. This specification describes a privacy-respecting mechanism for storing, indexing, and retrieving encrypted data at a storage provider. It is often useful when an individual or organization wants to protect data in a way that the storage provider cannot view, analyze, aggregate, or resell the data. This approach also ensures that application data is portable and protected from storage provider data breaches.
Comments regarding all aspects of this document are welcome. Please file issues directly on GitHub, or send them to public-credentials@w3.org.
We store a significant amount of sensitive data online, such as personally identifying information (PII), trade secrets, family pictures, and customer information. The data that we store is often not protected in an appropriate manner.
Legislation, such as the General Data Protection Regulation (GDPR), incentivizes service providers to better preserve individuals' privacy, primarily by making the providers liable in the event of a data breach. This liability pressure has revealed a technological gap, whereby providers are often not equipped with technology that can suitably protect their customers. Encrypted Data Vaults fill this gap and provide a variety of other benefits.
This specification describes a privacy-respecting mechanism for storing, indexing, and retrieving encrypted data at a storage provider. It is often useful when an individual or organization wants to protect data in a way that the storage provider cannot view, analyze, aggregate, or resell the data. This approach also ensures that application data is portable and protected from storage provider data breaches.
Explain why individuals and organizations that want to protect their privacy and trade secrets, and to ensure data portability, will benefit from using this technology. Explain how providing a standard API for the storage of user data empowers users to "bring their own storage", giving them control of their own information. Explain how applications that are written against a standard API and assume that users will bring their own storage can separate concerns and focus on the functionality of their application, removing the need to deal with storage infrastructure (instead leaving it to a specialist service provider that is chosen by the user).
Requiring client-side (edge) encryption for all data and metadata, while also enabling the user to store data on multiple devices, share data with others, and search or query that data, has historically been very difficult to implement in one system. Trade-offs are often made that sacrifice privacy in favor of usability, or vice versa.
Due to a number of maturing technologies and standards, we are hopeful that such trade-offs are no longer necessary, and that it is possible to design a privacy-preserving protocol for encrypted decentralized data storage that has broad practical appeal.
The problem of decentralized data storage has been approached from various angles, and personal data stores (PDS), decentralized or otherwise, have a long history in commercial and academic settings. Different approaches have resulted in variations in terminology and architectures. The diagram below shows the types of components that are emerging, and the roles they play. Encrypted Data Vaults fulfill a storage role.
This section describes the roles of the core actors and the relationships between them in an ecosystem where this specification is expected to be useful. A role is an abstraction that might be implemented in many different ways. The separation of roles suggests likely interfaces and protocols for standardization. The following roles are introduced in this specification:
These should be in a USE-CASES document.
The following four use cases have been identified as representative of common usage patterns (though they are by no means the only ones).
I want to store my data in a safe location. I don't want the storage provider to be able to see any data I store. This means that only I can see and use the data.
Over time, I will store a large amount of data. I want to search the data, but don't want the service provider to know what I'm storing or searching for.
I want to share my data with other people and services. I can decide to give other entities access to data in my storage area either when I first save the data or at a later stage. The storage should only give others access when I have explicitly given consent for each item.
I want to be able to revoke the access of others at any time. When sharing data, I can include an expiration date for a third party's access to my data.
I want to back up my data across multiple storage locations in case one fails. These locations can be hosted by different storage providers and can be accessible over different protocols. One location could be local on my phone, while another might be cloud-based. The locations should be able to synchronize with each other so that data is up to date in all places regardless of how I create or update data, and this should happen automatically and with as little involvement from me as possible.
Based on the use cases, we consider the following deployment topologies:
The following sections elaborate on the requirements that have been gathered from the core use cases.
One of the main goals of this system is ensuring the privacy of an entity's data so that it cannot be accessed by unauthorized parties, including the storage provider.
To accomplish this, the data must be encrypted both while it is in transit (being sent over a network) and while it is at rest (on a storage system).
Since data could be shared with more than one entity, it is also necessary for the encryption mechanism to support encrypting data to multiple parties.
It is necessary to have a mechanism that enables authorized sharing of encrypted information among one or more entities.
The system is expected to specify one mandatory authorization scheme, but also allow alternative authorization schemes. Examples of authorization schemes include OAuth2, Web Access Control, and [[ZCAP]]s (Authorization Capabilities).
The system should be identifier agnostic. In general, identifiers that are a form of URN or URL are preferred. While it is presumed that [[DID-CORE]] (Decentralized Identifiers, DIDs) will be used by the system in a few important ways, hard-coding the implementations to DIDs would be an anti-pattern.
It is expected that information can be backed up on a continuous basis. For this reason, it is necessary for the system to support at least one mandatory versioning strategy and one mandatory replication strategy, but also allow alternative versioning and replication strategies.
Large volumes of data are expected to be stored using this system, which then need to be efficiently and selectively retrieved. To that end, an encrypted search mechanism is a necessary feature of the system.
It is important for clients to be able to associate metadata with the data such that it can be searched. At the same time, since privacy of both data and metadata is a key requirement, the metadata must be stored in an encrypted state, and service providers must be able to perform those searches in an opaque and privacy-preserving way, without being able to see the metadata.
Since this system can reside in a variety of operating environments, it is important that at least one protocol is mandatory, but that other protocols are also allowed by the design. Examples of protocols include HTTP, gRPC, Bluetooth, and various binary on-the-wire protocols. An HTTPS API is defined later in this specification.
This section elaborates upon a number of guiding principles and design goals that shape Encrypted Data Vaults.
A layered architectural approach is used to ensure that the foundation for the system is easy to implement while allowing more complex functionality to be layered on top of the lower foundations.
For example, Layer 1 might contain the mandatory features for the most basic system, Layer 2 might contain useful features for most deployments, Layer 3 might contain advanced features needed by a small subset of the ecosystem, and Layer 4 might contain extremely complex features that are needed by a very small subset of the ecosystem.
This system is intended to protect an entity's privacy. When exploring new features, always ask "How would this impact privacy?". New features that negatively impact privacy are expected to undergo extreme scrutiny to determine if the trade-offs are worth the new functionality.
Servers in this system are expected to provide functionality strongly focused on the storage and retrieval of encrypted data. The more a server knows, the greater the risk to the privacy of the entity storing the data, and the more liability the service provider might have for hosting data. In addition, pushing complexity to the client enables service providers to provide stable server-side implementations while innovation can be carried out by clients.
TBD
The following sections outline core concepts, such as encrypted storage, which form the foundation of this specification.
An important consideration of encrypted data stores is which components of the architecture have access to the (unencrypted) data, or who controls the private keys. There are roughly three approaches: storage-side encryption, client-side (edge) encryption, and gateway-side encryption (which is a hybrid of the previous two).
Any data storage system that lets the user store arbitrary data also supports client-side encryption at the most basic level. That is, it lets the user encrypt data themselves and then store it. This does not mean, however, that such systems are optimized for encrypted data; querying and access control for encrypted data may be difficult.
Storage-side encryption is usually implemented as whole-disk encryption or filesystem-level encryption. This is widely supported and understood, and any type of hosted cloud storage is likely to use storage-side encryption. In this scenario the private keys are managed by the service provider or controller of the storage server, which may be a different entity than the user who is storing the data. Encrypting the data while it resides on disk is a useful security measure should physical access to the storage hardware be compromised, but does not guarantee that only the original user who stored the data has access.
Conversely, client-side encryption offers a high level of security and privacy, especially if metadata can be encrypted as well. Encryption is done at the individual data object level, usually aided by a keychain or wallet client, so the user has direct access to the private keys. This comes at a cost, however, since the significant responsibility of key management and recovery falls squarely onto the end user. In addition, the question of key management becomes more complex when data needs to be shared.
Gateway-side encryption systems take an approach that combines techniques from storage-side and client-side encryption architectures. These storage systems, typically encountered among multi-server clusters or some "encryption as a platform" cloud service providers, recognize that client-side key management may be too difficult for some users and use cases, and offer to perform encryption and decryption themselves in a way that is transparent to the client application. At the same time, they aim to minimize the number of components (storage servers) that have access to the private decryption keys. As a result, the keys usually reside on "gateway" servers, which encrypt the data before passing it to the storage servers. The encryption/decryption is transparent to the client, and the data is opaque to the storage servers, which can be modular/pluggable as a result. Gateway-side encryption provides some benefits over storage-side systems, but also shares their main drawback: the gateway sysadmin controls the keys, not the user.
The fundamental unit of storage in data vaults is the encrypted structured document which, when decrypted, provides a data structure that can be expressed in popular syntaxes such as JSON and CBOR. Documents can store structured data and metadata about the structured data. Structured document sizes are limited to 16MB.
For files larger than 16MB or for raw binary data formats such as audio, video, and office productivity files, a streaming API is provided that enables data to be streamed to/from a data vault. Streams are described using structured documents, but the storage of the data is separated from the structured document using a hashlink to the encrypted content.
Data vaults are expected to store a very large number of documents of varying kinds. This means that it is important to be able to search the documents in a timely way, which creates a challenge for the storage provider as the content is encrypted. Previously this has been worked around with a certain amount of unencrypted metadata attached to the data objects. Another possibility is unencrypted listings of pointers to filtered subsets of data.
In the case of data vaults, an encrypted search scheme is provided that enables data vault clients to index metadata without leaking that metadata to the storage provider.
Review this section for language that should be properly normative.
This section describes the architecture of the Encrypted Data Vault protocol in the form of a client-server relationship. The vault is regarded as the server, and the client acts as the interface used to interact with the vault.
This architecture is layered in nature, where the foundational layer consists of an operational system with minimal features, and where more advanced features are layered on top. Implementations can choose to implement only the foundational layer, or optionally, additional layers consisting of a richer set of features for more advanced use cases.
The server is assumed to be of low trust, and must have no visibility into the data that it persists. However, even in this model, the server still has a set of minimum responsibilities it must adhere to.
The client is responsible for providing an interface to the server, with bindings for each relevant protocol (HTTP, RPC, or binary over-the-wire protocols), as required by the implementation.
All encryption and decryption of data is done on the client side, at the edges. The data (including metadata) MUST be opaque to the server, and the architecture is designed to prevent the server from being able to decrypt it.
Layer 1 consists of a client-server system that is capable of encrypting data in transit and at rest.
When a vault client makes a request to store, query, modify, or delete data in the vault, the server validates the request. Since the actual data and metadata in any given request are encrypted, such validation is necessarily limited and largely depends on the protocol and the semantics of the request.
The mechanism a server uses to persist data, such as storage on a local, networked, or distributed file system, is determined by the implementation. The persistence mechanism is expected to adhere to the common expectations of a data storage provider, such as reliable storage and retrieval of data.
A vault has a global configuration that defines the following properties:
The configuration allows the client to perform capability discovery regarding things like authorization, protocol, and replication mechanisms that are used by the server.
When a client makes a request to store, query, modify, or delete data in the vault, the server enforces any authorization policy that is associated with the request.
An Encrypted Data Vault is capable of storing many different types of data, including large unstructured binary data. This means that storing a file as a single entry would be challenging for systems that have limits on single record sizes. For example, some databases set the maximum size for a single record to 16MB. As a result, large data needs to be chunked into sizes that a server can easily manage. It is the responsibility of the client to set the chunk size of each resource and to split large data into manageable chunks for the server. It is the responsibility of the server to deny requests to store chunks larger than it can handle.
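The division of responsibilities above can be sketched as follows. This is a minimal illustration, assuming a hypothetical 1 MiB client chunk size and a hypothetical 16 MiB server limit; neither value is fixed by this specification, and in practice each chunk would be encrypted before it is sent.

```python
from typing import Iterator

# Hypothetical values for illustration only; the specification fixes neither.
DEFAULT_CHUNK_SIZE = 1024 * 1024           # client-chosen chunk size (1 MiB)
SERVER_MAX_CHUNK_SIZE = 16 * 1024 * 1024   # largest chunk this server accepts


def split_into_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[bytes]:
    """Client side: yield consecutive slices of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


class ChunkTooLarge(Exception):
    """Raised by the (hypothetical) server when a chunk exceeds its limit."""


def server_accept_chunk(chunk: bytes, max_size: int = SERVER_MAX_CHUNK_SIZE) -> int:
    """Server side: refuse chunks larger than it can handle; return the stored size."""
    if len(chunk) > max_size:
        raise ChunkTooLarge(f"chunk of {len(chunk)} bytes exceeds {max_size}")
    return len(chunk)
```

For example, splitting 2,500 bytes with a chunk size of 1,000 yields chunks of 1,000, 1,000, and 500 bytes; only the final chunk may be shorter than the chosen size.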
Each chunk is encrypted individually using authenticated encryption. Doing so protects against attacks where an attacking server replaces chunks in a large file and requires the entire file to be downloaded and decrypted by the victim before determining that the file is compromised. Encrypting each chunk with authenticated encryption ensures that a client knows that it has a valid chunk before proceeding to the next one. Note that another authorized client can still perform an attack by doing authenticated encryption on a chunk, but a server is not capable of launching the same attack.
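Python's standard library has no authenticated encryption cipher, so the sketch below isolates only the authentication half of the idea: each chunk carries an HMAC-SHA256 tag, standing in for the tag an AEAD cipher such as AES-GCM would produce, and the reader checks every chunk as it arrives. Binding the chunk index into the tag is an assumption added here so that reordered chunks are also rejected. This code does not encrypt anything.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, List, Tuple


def _tag(key: bytes, index: int, chunk: bytes) -> bytes:
    # Bind the chunk's position into the MAC so chunks cannot be swapped or reordered.
    return hmac.new(key, index.to_bytes(8, "big") + chunk, hashlib.sha256).digest()


def tag_chunks(key: bytes, chunks: Iterable[bytes]) -> List[Tuple[bytes, bytes]]:
    """Writer: attach a per-chunk tag (stand-in for each encrypted chunk's AEAD tag)."""
    return [(chunk, _tag(key, i, chunk)) for i, chunk in enumerate(chunks)]


def verify_stream(key: bytes, tagged: Iterable[Tuple[bytes, bytes]]) -> Iterator[bytes]:
    """Reader: yield chunks one by one, failing at the first tampered chunk.

    Because each chunk is checked on arrival, a substituted chunk is detected
    without first downloading the rest of the file.
    """
    for i, (chunk, tag) in enumerate(tagged):
        if not hmac.compare_digest(tag, _tag(key, i, chunk)):
            raise ValueError(f"chunk {i} failed authentication")
        yield chunk
```

As the text notes, this only stops the server: any client holding the key can still produce valid tags, which is why the key never leaves the client side.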
The process of storing encrypted data starts with the creation of a Resource by the client, with the following structure.
Resource:
- id (required)
- meta
  - meta.contentType: MIME type
- content: the entire payload, or a manifest-like list of hashlinks to individual chunks

If the data is smaller than the chunk size, it is embedded directly into the content. Otherwise, the data is sharded into chunks by the client (see next section), and each chunk is encrypted and sent to the server. In this case, content contains a manifest-like listing of URIs to individual chunks (integrity-protected by [[HASHLINK]]).
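The embed-or-manifest decision can be sketched as below. Several details are assumptions for illustration: embedded bytes are base64-encoded to keep the Resource JSON-friendly, `upload` is a hypothetical callback that stores one (already encrypted) chunk and returns its URI, and a plain SHA-256 hex digest stands in for a real [[HASHLINK]], whose encoding is not reproduced here.

```python
import base64
import hashlib
from typing import Callable, Dict, List


def build_resource(
    resource_id: str,
    content_type: str,
    data: bytes,
    chunk_size: int,
    upload: Callable[[int, bytes], str],
) -> Dict:
    """Return a Resource dict with either embedded content or a chunk manifest.

    `upload(index, chunk)` is a hypothetical callback that stores one chunk
    on the server and returns that chunk's URI.
    """
    resource = {"id": resource_id, "meta": {"contentType": content_type}}
    if len(data) < chunk_size:
        # Small payload: embed it directly (base64 is an assumption, for JSON safety).
        resource["content"] = base64.b64encode(data).decode("ascii")
        return resource
    manifest: List[Dict[str, str]] = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        uri = upload(index, chunk)
        # Simplified integrity reference; a conforming client would emit a [[HASHLINK]].
        manifest.append({"uri": uri, "sha256": hashlib.sha256(chunk).hexdigest()})
    resource["content"] = manifest
    return resource
```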
The client then creates the Encrypted Resource. If the data was sharded into chunks, this is done after the individual chunks are written to the server.

Encrypted Resource:
- id
- index: encrypted index tags prepared by the client (for use with privacy-preserving querying over encrypted resources)
- jwe [[RFC7516]], cwe [[RFC8152]], or another appropriate mechanism
Layer 2 consists of a system that is capable of sharing data among multiple entities, of versioning and replication, and of performing privacy-preserving searches in an efficient manner.
To enable privacy-preserving querying (where the search index is opaque to the server), the client must prepare a list of encrypted index tags (which are stored in the Encrypted Resource, alongside the encrypted data contents).
Need details about salting and encryption mechanism of index tags.
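Pending those details, one common way to make index tags opaque is to blind each attribute name and value with an HMAC key that only the client holds, in the spirit of the `hmac` key in the vault configuration and the `indexed` entry of the EncryptedDocument example. The sketch assumes HMAC-SHA256 and unpadded base64url output; it is not the mechanism this specification will mandate, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Dict, List


def _blind(key: bytes, text: str) -> str:
    digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def index_tags(key: bytes, attributes: Dict[str, str]) -> List[Dict[str, str]]:
    """Client: turn plaintext name/value pairs into opaque tags for the server."""
    return [{"name": _blind(key, n), "value": _blind(key, v)} for n, v in attributes.items()]


def query_tag(key: bytes, name: str, value: str) -> Dict[str, str]:
    """Client: blind a query the same way so the server can match it opaquely."""
    return {"name": _blind(key, name), "value": _blind(key, value)}


def server_match(stored: List[Dict[str, str]], query: Dict[str, str]) -> bool:
    """Server: compare opaque strings only; it never sees names or values."""
    return query in stored
```

Deterministic blinding supports only equality queries and still reveals when two documents share a tag, which is one reason the salting question above matters.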
A server must support at least one versioning/change control mechanism.
Replication is done by the client, not by the server (since the client controls the keys, knows which other servers to replicate to, etc.). If an Encrypted Data Vault implementation aims to provide replication functionality, it MUST also pick a versioning/change control strategy (since replication necessarily involves conflict resolution). Some versioning strategies are implicit ("last write wins", e.g., rsync or uploading a file to a file hosting service), but keep in mind that a replication strategy always implies that some sort of conflict resolution mechanism should be involved.
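A client-side merge that applies "last write wins" while still surfacing conflicts might look like the sketch below. It assumes each replica holds EncryptedDocument-like records keyed by id, each carrying the `sequence` counter used in this specification's examples; `merge_replicas` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Record = Dict  # an EncryptedDocument-like dict with "sequence" and "jwe"


def merge_replicas(a: Dict[str, Record], b: Dict[str, Record]) -> Tuple[Dict[str, Record], List[str]]:
    """Merge two replicas' records by document id.

    The higher `sequence` wins ("last write wins"). Equal sequences with
    different encrypted payloads are reported as conflicts for the client to
    resolve; replica `a`'s copy is kept in the merged view meanwhile.
    """
    merged: Dict[str, Record] = {}
    conflicts: List[str] = []
    for doc_id in sorted(a.keys() | b.keys()):
        left, right = a.get(doc_id), b.get(doc_id)
        if left is None or right is None:
            merged[doc_id] = left if left is not None else right
        elif left["sequence"] != right["sequence"]:
            merged[doc_id] = max(left, right, key=lambda r: r["sequence"])
        else:
            merged[doc_id] = left
            if left.get("jwe") != right.get("jwe"):
                conflicts.append(doc_id)
    return merged, conflicts
```

Deletions are not modeled: a document missing from one replica is simply copied over, so a real strategy would also need tombstones or a similar deletion marker.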
An individual vault's choice of authorization mechanism determines how a client shares resources with other entities (authorization capability link or similar mechanism).
It is helpful if data storage providers are able to notify clients when changes to persisted data occur. A server may optionally implement a mechanism by which clients can subscribe to changes in the vault.
Vault-wide integrity protection is provided to prevent a variety of storage provider attacks where data is modified in a way that is undetectable, such as documents being reverted to older versions or deleted. This protection requires that a global catalog of all the resource identifiers that belong to a user, along with the most recent version of each, is stored and kept up to date by the client. Some clients may store a copy of this catalog locally (and include an integrity protection mechanism, such as [[HASHLINK]], to guard against interference or deletion by the server).
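A minimal sketch of such a catalog, assuming it maps each resource identifier to its latest `sequence`: the client compares it against what the server reports to detect rollbacks and deletions, and keeps a digest of the catalog to notice tampering with its own copy. A SHA-256 over canonical JSON stands in for a [[HASHLINK]] here, and both function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def catalog_digest(catalog: Dict[str, int]) -> str:
    """Digest over a canonical JSON encoding (stand-in for a [[HASHLINK]])."""
    canonical = json.dumps(catalog, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_server(catalog: Dict[str, int], server_view: Dict[str, int]) -> List[str]:
    """Compare the client's catalog (id -> latest sequence) with the server's listing.

    Reports documents the server dropped or rolled back to an older sequence.
    """
    problems = []
    for doc_id, expected in sorted(catalog.items()):
        actual = server_view.get(doc_id)
        if actual is None:
            problems.append(f"{doc_id}: missing")
        elif actual < expected:
            problems.append(f"{doc_id}: rolled back from {expected} to {actual}")
    return problems
```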
The following sections outline the data model for data vaults.
Data vault configuration isn't strictly necessary for using the other features of data vaults. This should have its own conformance section/class or potentially even be non-normative.
A data vault configuration specifies the properties a particular data vault will have.
Property | Description |
---|---|
sequence | A unique counter for the data vault in order to ensure that clients are properly synchronized to the data vault. The value is required and MUST be an unsigned 64-bit number. |
controller | The entity or cryptographic key that is in control of the data vault. The value is required and MUST be a URI. |
invoker | The root entities or cryptographic key(s) that are authorized to invoke an authorization capability to modify the data vault's configuration or to read from or write to it. The value is optional, but if present, MUST be a URI or an array of URIs. When this value is not present, the value of the controller property is used for the same purpose. |
delegator | The root entities or cryptographic key(s) that are authorized to delegate authorization capabilities to modify the data vault's configuration or to read from or write to it. The value is optional, but if present, MUST be a URI or an array of URIs. When this value is not present, the value of the controller property is used for the same purpose. |
referenceId | Used to express an application-specific reference identifier. The value is optional and, if present, MUST be a string. |
keyAgreementKey.id | An identifier for the key agreement key. The value is required and MUST be a URI. The key agreement key is used to derive a secret that is then used to generate a key encryption key for the receiver. |
keyAgreementKey.type | The type of key agreement key. The value is required and MUST be or map to a URI. |
hmac.id | An identifier for the HMAC key. The value is required and MUST be or map to a URI. |
hmac.type | The type of HMAC key. The value is required and MUST be or map to a URI. |
{ "sequence": 0, "controller": "did:example:123456789", "referenceId": "my-primary-data-vault", "keyAgreementKey": { "id": "https://example.com/kms/12345", "type": "X25519KeyAgreementKey2019" }, "hmac": { "id": "https://example.com/kms/67891", "type": "Sha256HmacKey2019" } }
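A client or server could check a configuration against the table above roughly as follows. This is a sketch under stated assumptions: "MUST be a URI" is approximated by requiring a string with a URI scheme (via `urllib.parse.urlparse`), which is far looser than full RFC 3986 validation, and "is or maps to a URI" is treated as any non-empty string, since a JSON-LD term such as `Sha256HmacKey2019` maps to a URI through the context.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

MAX_U64 = 2**64 - 1


def _is_uri(value: Any) -> bool:
    # Rough check only: a string with a scheme, e.g. "did:..." or "https://...".
    return isinstance(value, str) and bool(urlparse(value).scheme)


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors: List[str] = []
    seq = config.get("sequence")
    if not (isinstance(seq, int) and not isinstance(seq, bool) and 0 <= seq <= MAX_U64):
        errors.append("sequence must be an unsigned 64-bit integer")
    if not _is_uri(config.get("controller")):
        errors.append("controller must be a URI")
    for prop in ("invoker", "delegator"):
        if prop in config:
            value = config[prop]
            values = value if isinstance(value, list) else [value]
            if not values or not all(_is_uri(v) for v in values):
                errors.append(f"{prop} must be a URI or an array of URIs")
    if "referenceId" in config and not isinstance(config["referenceId"], str):
        errors.append("referenceId must be a string")
    kak = config.get("keyAgreementKey")
    kak = kak if isinstance(kak, dict) else {}
    hmac_key = config.get("hmac")
    hmac_key = hmac_key if isinstance(hmac_key, dict) else {}
    if not _is_uri(kak.get("id")):
        errors.append("keyAgreementKey.id must be a URI")
    # "is or maps to a URI": a JSON-LD term is accepted, so only require a non-empty string.
    for label, value in (("keyAgreementKey.type", kak.get("type")),
                         ("hmac.id", hmac_key.get("id")),
                         ("hmac.type", hmac_key.get("type"))):
        if not (isinstance(value, str) and value):
            errors.append(f"{label} is required")
    return errors
```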
A structured document is used to store application data as well as metadata about the application data. This information is typically encrypted and then stored on the data vault.
Property | Description |
---|---|
id | An identifier for the structured document. The value is required and MUST be a Base58-encoded 128-bit random value. |
meta | Key-value metadata associated with the structured document. |
content | Key-value content for the structured document. |
{ "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044", "meta": { "created": "2019-06-18" }, "content": { "message": "Hello World!" } }
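The `id` rule in the table (a Base58-encoded 128-bit random value) can be met with a few lines of code. The sketch assumes the Bitcoin Base58 alphabet, since the table only says "Base58", and optionally prepends `z`, which resembles the multibase base58btc prefix seen on the id in the EncryptedDocument example; whether that prefix is required is not stated here, so it is a parameter.

```python
import secrets

# Bitcoin Base58 alphabet (assumed; the table only says "Base58").
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58, preserving leading zero bytes as '1' characters."""
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = ALPHABET[rem] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * leading_zeros + encoded


def generate_document_id(multibase_prefix: bool = True) -> str:
    """Return a Base58-encoded 128-bit random identifier (hypothetical helper)."""
    encoded = base58_encode(secrets.token_bytes(16))
    return ("z" + encoded) if multibase_prefix else encoded
```

A 16-byte value encodes to at most 22 Base58 characters, so identifiers produced this way are short enough to appear in URL paths.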
Streams can be used to store images, video, backup files, and any other binary data of arbitrary length. This is done by using the stream property together with additional metadata that further identifies the type of stream being stored. The table below lists the metadata to be stored in addition to the values specified in StructuredDocument.
Property | Description |
---|---|
meta.chunks | Specifies the number of chunks in the stream. |
stream.id | The identifier for the stream. The stream identifier MUST be a URI that references a stream on the same data vault. Once the stream has been written to the data vault, the content identifier MUST be updated such that it is a valid hashlink. To allow for streaming encryption, the value of the digest for the stream is assumed to be unknowable until after the stream has been written. The hashlink MUST exist as a content hash for the stream that has been written to the data vault. |
{ "id": "urn:uuid:41289468-c42c-4b28-adb0-bf76044aec77", "meta": { "created": "2019-06-19", "contentType": "video/mpeg", "chunks": 16 }, "stream": { "id": "https://example.com/encrypted-data-vaults/zMbxmSDn2Xzz?hl=zb47JhaKJ3hJ5Jkw8oan35jK23289Hp" } }
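The `meta.chunks` value is a ceiling division of the stream length by the chunk size. The sketch below assembles a stream StructuredDocument shaped like the example above; the function names are hypothetical, and the later replacement of `stream.id` with a hashlink is not shown.

```python
from typing import Dict


def chunk_count(total_bytes: int, chunk_size: int) -> int:
    """Number of chunks needed: the ceiling of total_bytes / chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-total_bytes // chunk_size)


def stream_document(doc_id: str, content_type: str, created: str,
                    total_bytes: int, chunk_size: int, stream_url: str) -> Dict:
    """Build a StructuredDocument that describes a stream on the same vault."""
    return {
        "id": doc_id,
        "meta": {
            "created": created,
            "contentType": content_type,
            "chunks": chunk_count(total_bytes, chunk_size),
        },
        # Per the stream.id row above, this becomes a hashlink once the stream is written.
        "stream": {"id": stream_url},
    }
```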
An encrypted document is used to store a structured document in a way that ensures that no entity can read the information without the consent of the data controller.
While the table below describes a simple version of an EncryptedDocument, there is not yet a table that describes the indexed property and its subproperties, which may be present on an EncryptedDocument.
Property | Description |
---|---|
id | An identifier for the encrypted document. The value is required and MUST be a Base58-encoded 128-bit random value. |
sequence | A unique counter for the data vault in order to ensure that clients are properly synchronized to the data vault. The value is required and MUST be an unsigned 64-bit number. |
jwe or cwe | A JSON Web Encryption or COSE Encrypted value that, when decrypted, results in the corresponding StructuredDocument. |
Another example should be added that shows that a Diffie-Hellman key can be identified in the JWE recipients field. This type of key can be used for key agreement on a key wrapping key.
Another section should detail that data vault servers may omit certain fields or certain values in certain fields, such as the recipients field, based on whether or not the entity requesting an EncryptedDocument is authorized to see the field or its values. This can be finely controlled through the use of Authorization Capabilities.
{ "id":"z19x9iFMnfo4YLsShKAvnJk4L", "sequence":0, "indexed":[ { "hmac":{ "id":"did:ex:12345#key1", "type":"Sha256HmacKey2019" }, "sequence":0, "attributes":[ ] } ], "jwe":{ "protected":"eyJlbmMiOiJDMjBQIn0", "recipients":[ { "header":{ "kid":"urn:123", "alg":"ECDH-ES+A256KW", "epk":{ "kty":"OKP", "crv":"X25519", "x":"d7rIddZWblHmCc0mYZJw39SGteink_afiLraUb-qwgs" }, "apu":"d7rIddZWblHmCc0mYZJw39SGteink_afiLraUb-qwgs", "apv":"dXJuOjEyMw" }, "encrypted_key":"4PQsjDGs8IE3YqgcoGfwPTuVG25MKjojx4HSZqcjfkhr0qhwqkpUUw" } ], "iv":"FoJ5uPIR6HDPFCtD", "ciphertext":"tIupQ-9MeYLdkAc1Us0Mdlp1kZ5Dbavq0No-eJ91cF0R0hE", "tag":"TMRcEPc74knOIbXhLDJA_w" } }
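Some JWE members in this example are only base64url-encoded (without padding, as in RFC 7515), not encrypted, so anyone holding the document, including the server, can read them. The sketch below decodes them: the `protected` header of the example decodes to `{"enc":"C20P"}`, and the recipient's `apv` decodes to `urn:123`, matching its `kid`. The helper names are hypothetical.

```python
import base64
import json
from typing import Any, Dict


def b64url_decode(segment: str) -> bytes:
    """Decode base64url without padding, as used by JOSE."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def describe_jwe(jwe: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the unencrypted parts of a JWE in JSON serialization."""
    protected = json.loads(b64url_decode(jwe["protected"]))
    recipients = []
    for r in jwe.get("recipients", []):
        header = r.get("header", {})
        entry = {"kid": header.get("kid"), "alg": header.get("alg")}
        if "apv" in header:
            entry["apv"] = b64url_decode(header["apv"]).decode("utf-8")
        recipients.append(entry)
    return {"protected": protected, "recipients": recipients}
```

This visibility of recipient key identifiers is the reason a server might omit or filter fields such as `recipients` for unauthorized requesters.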
This section introduces the HTTPS API for interacting with data vaults and their contents.
A website may provide service endpoint discovery by embedding JSON-LD in its top-level HTML page (e.g., at https://example.com/):
<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>Example Website</title> <link rel="stylesheet" href="style.css"> <script src="script.js"></script> <script type="application/ld+json"> { "@context": "https://w3id.org/encrypted-data-vaults/v1", "id": "https://example.com/", "name": "Example Website", "dataVaultCreationService": "https://example.com/data-vaults" } </script> </head> <body> <!-- page content --> </body> </html>
Service descriptions may also be requested via content negotiation. In the following example, a JSON-compatible service description is provided (e.g., curl -H "Accept: application/json" https://example.com/):
{ "@context": "https://w3id.org/encrypted-data-vaults/v1", "id": "https://example.com/", "name": "Example Website", "dataVaultCreationService": "https://example.com/data-vaults" }
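A client can discover the creation endpoint from either form. The sketch below pulls JSON-LD out of an HTML page with the standard library's `html.parser`; for a JSON response obtained by content negotiation, `json.loads(body)[prop]` is enough. The property name is a parameter defaulting to `dataVaultCreationService`, the property referenced by the creation section, and the helper names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


class _JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def find_service(html: str, prop: str = "dataVaultCreationService") -> Optional[str]:
    """Return the first service URL advertised under `prop`, if any."""
    parser = _JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        description = json.loads(block)
        if isinstance(description, dict) and prop in description:
            return description[prop]
    return None
```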
A data vault is created by performing an HTTP POST of a DataVaultConfiguration to the dataVaultCreationService. The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
201 | Data vault creation was successful. The HTTP Location header will contain the URL for the newly created data vault. |
400 | Data vault creation failed. |
409 | A duplicate data vault exists. |
An example exchange of a data vault creation is shown below:
POST /data-vaults HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{ "sequence": 0, "controller": "did:example:123456789", "referenceId": "urn:uuid:abc5a436-21f9-4b4c-857d-1f5569b2600d", "keyAgreementKey": { "id": "https://example.com/kms/12345", "type": "X25519KeyAgreementKey2019" }, "hmac": { "id": "https://example.com/kms/67891", "type": "Sha256HmacKey2019" } }
Explain the purpose of the controller property to root authority. Explain how Authorization Capabilities can be created and invoked via HTTP signatures to authorize reading and writing from/to data vaults.
If the creation of the data vault was successful, an HTTP 201 status code is expected in return:
HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2019 18:35:33 GMT
Connection: keep-alive
Transfer-Encoding: chunked
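The creation exchange can be prepared with the standard library's `urllib.request`. This sketch only builds the request and interprets the status codes from the table; sending it with `urllib.request.urlopen` would raise `urllib.error.HTTPError` for 400 and 409, whose `code` attribute can be passed to the same interpreter. Authorization (see the note on the controller property) is omitted, and the function and exception names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


class DuplicateVault(Exception):
    """Hypothetical error for HTTP 409: a duplicate data vault exists."""


def build_create_vault_request(service_url: str, config: Dict[str, Any]) -> urllib.request.Request:
    """Prepare the HTTP POST of a DataVaultConfiguration to dataVaultCreationService."""
    return urllib.request.Request(
        service_url,
        data=json.dumps(config).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def interpret_create_vault(status: int, location: Optional[str]) -> str:
    """Map the status codes defined for vault creation to a vault URL or an error."""
    if status == 201:
        if not location:
            raise ValueError("201 Created without a Location header")
        return location  # URL of the newly created data vault
    if status == 409:
        raise DuplicateVault("a duplicate data vault exists")
    if status == 400:
        raise ValueError("data vault creation failed")
    raise RuntimeError(f"unexpected status {status}")
```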
A structured document is stored in a data vault by encoding a StructuredDocument as an EncryptedDocument and then performing an HTTP POST to the endpoint of a data vault created via the dataVaultCreationService. The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
201 | Structured document creation was successful. The HTTP Location header will contain the URL for the newly created document. |
400 | Structured document creation failed. |
In order to convert a StructuredDocument to an EncryptedDocument an implementer MUST encode the StructuredDocument as a JWE or a COSE Encrypted object. Once the document is encrypted, it can be sent to the document creation service.
A protocol example of a document creation is shown below:
POST /encrypted-data-vaults/z4sRgBJJLnYy/docs HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{ "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044", "sequence": 0, "jwe": { "protected": "eyJlbmMiOiJDMjBQIn0", "recipients": [{ "header": { "alg": "A256KW", "kid": "https://example.com/kms/zSDn2MzzbxmX" }, "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug" }], "iv": "i8Nins2vTI3PlrYW", "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I", "tag": "pfZO0JulJcrc3trOZy8rjA" } }
If the creation of the structured document was successful, an HTTP 201 status code is expected in return:
HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2019 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked
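The document-creation request can be prepared the same way. The sketch derives the collection URL by appending `/docs` to the vault URL, following the path used in the example, and performs two client-side sanity checks; the rule that a new document starts at `sequence` 0 is an assumption taken from the examples, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def docs_url(vault_url: str) -> str:
    """Collection URL for documents, following the /docs path used in the examples."""
    return vault_url.rstrip("/") + "/docs"


def build_create_document_request(vault_url: str, encrypted_doc: Dict[str, Any]) -> urllib.request.Request:
    """Prepare the HTTP POST that stores an EncryptedDocument in a vault."""
    if ("jwe" in encrypted_doc) == ("cwe" in encrypted_doc):
        raise ValueError("exactly one of 'jwe' or 'cwe' is expected")
    # Assumption: a newly created document starts at sequence 0, as in the examples.
    if encrypted_doc.get("sequence") != 0:
        raise ValueError("a new document is expected to have sequence 0")
    return urllib.request.Request(
        docs_url(vault_url),
        data=json.dumps(encrypted_doc).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

On a 201 response, the `Location` header gives the document URL that later reads, updates, and deletes use.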
Reading a document from a data vault is performed by retrieving the EncryptedDocument and then decrypting it to a StructuredDocument. The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
200 | EncryptedDocument retrieval was successful. |
400 | EncryptedDocument retrieval failed. |
404 | EncryptedDocument with given id was not found. |
In order to convert an EncryptedDocument to a StructuredDocument an implementer MUST decode the EncryptedDocument from a JWE or a COSE Encrypted object. Once the document is decrypted, it can be processed by the web application.
A protocol example of a document retrieval is shown below:
Explain that the URL path structure is fixed for all data vaults to enable portability and the use of stable URLs (such as through DID URLs) to reference certain documents while allowing users to change their data vault service providers. Explain how this enables portability.
GET /encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
Host: example.com
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate
If the retrieval of the encrypted document was successful, an HTTP 200 status code is expected in return:
HTTP/1.1 200 OK
Date: Fri, 14 Jun 2019 18:37:12 GMT
Connection: keep-alive

{ "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044", "sequence": 0, "jwe": { "protected": "eyJlbmMiOiJDMjBQIn0", "recipients": [{ "header": { "alg": "A256KW", "kid": "https://example.com/kms/zSDn2MzzbxmX" }, "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug" }], "iv": "i8Nins2vTI3PlrYW", "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I", "tag": "pfZO0JulJcrc3trOZy8rjA" } }
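Before attempting decryption, a client may want to sanity-check the retrieved body. In this sketch `expected_id` is the `id` the client recorded when it stored the document (for example in its catalog); it is not the URL path segment, which differs from the `id` member in these examples. `JWE_MEMBERS` lists the members every JWE example in this document carries; RFC 7516 makes some of them optional, so the check is an assumption that may need loosening.

```python
import json
from typing import Any, Dict, List

MAX_U64 = 2**64 - 1
JWE_MEMBERS = ("protected", "iv", "ciphertext", "tag")


def check_encrypted_document(body: str, expected_id: str) -> List[str]:
    """Sanity-check a retrieved EncryptedDocument; an empty list means no problems found."""
    doc: Dict[str, Any] = json.loads(body)
    problems: List[str] = []
    if doc.get("id") != expected_id:
        problems.append("id does not match the requested document")
    seq = doc.get("sequence")
    if not (isinstance(seq, int) and not isinstance(seq, bool) and 0 <= seq <= MAX_U64):
        problems.append("sequence is not an unsigned 64-bit integer")
    if ("jwe" in doc) == ("cwe" in doc):
        problems.append("expected exactly one of jwe or cwe")
    elif "jwe" in doc:
        missing = [m for m in JWE_MEMBERS if m not in doc["jwe"]]
        if missing:
            problems.append("jwe is missing " + ", ".join(missing))
    return problems
```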
A structured document is updated in a data vault by encoding the updated StructuredDocument as an EncryptedDocument and then performing an HTTP POST to the document's URL in the data vault (the URL returned in the Location header when the document was created). The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
200 | Structured document update was successful. |
400 | Structured document update failed. |
In order to convert a StructuredDocument to an EncryptedDocument, an implementer MUST encode the StructuredDocument as a JWE or a COSE Encrypted object. Once the document is encrypted, it can be sent to the document update service.
A protocol example of a document update is shown below:
POST /encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
  "sequence": 1,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}
If the update to the encrypted document was successful, an HTTP 200 status code is expected in return:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Date: Fri, 14 Jun 2019 18:39:52 GMT
Connection: keep-alive
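The update request can be assembled as in the following sketch, which only builds the request with the standard `urllib.request.Request` class and does not send it. It assumes, as the examples suggest, that the client increments `sequence` by one on every update; producing the new JWE is left to a JWE or COSE library, and the helper name `build_update_request` is hypothetical.

```python
import json
import urllib.request


def build_update_request(doc_url: str, previous: dict, new_jwe: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that replaces a document's ciphertext.

    `new_jwe` is the freshly encrypted StructuredDocument.
    """
    body = {
        "id": previous["id"],
        "sequence": previous["sequence"] + 1,  # 0 -> 1 in the examples above
        "jwe": new_jwe,
    }
    return urllib.request.Request(
        doc_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/plain, */*",
        },
        method="POST",
    )


previous = {"id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044", "sequence": 0}
req = build_update_request(
    "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz",
    previous,
    {"ciphertext": "..."},  # placeholder for a real JWE object
)
print(req.get_method(), json.loads(req.data)["sequence"])  # POST 1
```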
A structured document is deleted by performing an HTTP DELETE on the document's URL in the data vault (the URL returned in the Location header when the document was created). The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
200 | Structured document was deleted successfully. |
400 | Structured document deletion failed. |
404 | Structured document was not found. |
A protocol example of a document deletion is shown below:
DELETE /encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
Host: example.com
If the deletion of the encrypted document was successful, an HTTP 200 status code is expected in return:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Date: Fri, 14 Jun 2019 18:40:18 GMT
Connection: keep-alive
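A minimal sketch of the deletion call and of mapping its status codes to the outcomes in the table above. The helper names are hypothetical, and codes not listed in the table are reported as undefined rather than guessed at.

```python
import urllib.request

# Outcomes for DELETE on a structured document, taken from the status table above.
DELETE_OUTCOMES = {
    200: "deleted",
    400: "deletion failed",
    404: "not found",
}


def build_delete_request(doc_url: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE for a structured document."""
    return urllib.request.Request(doc_url, method="DELETE")


def interpret_delete_status(status: int) -> str:
    # Codes outside the table are not defined by this section.
    return DELETE_OUTCOMES.get(status, f"undefined status {status}")


req = build_delete_request(
    "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz"
)
print(req.get_method(), interpret_delete_status(404))  # DELETE not found
```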
This section is out of date; do not implement.
Another design is being considered that would transform a stream into a single index document plus a collection of documents, each containing one chunk of the stream. This would help prevent a decryption stream from being used before it has been authenticated. Implementing this approach in a Web browser also requires certain File or Blob APIs. Further investigation is needed to determine whether support for these APIs is sufficient for this design, which would be preferred because it prevents such data misuse and makes better use of native implementations of authenticated encryption modes.
A stream is stored in a data vault by writing a document containing metadata about the stream, encrypting the stream, writing it to a data vault, and then updating the document containing metadata about the stream. The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
201 | Stream creation was successful. The HTTP Location header will contain the URL for the newly created stream. |
400 | Stream creation failed. |
Implementations first encode the metadata associated with the stream into a StructuredDocument:
{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
  "meta": {
    "created": "2019-06-18",
    "contentType": "video/mpeg",
    "contentLength": 56735817
  },
  "content": {
    "id": "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz"
  }
}
In this case, the value of content.id is a reference to the stream located at https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz, which is the location that the stream MUST be written to. This content identifier MUST be updated to include a hashlink once the stream has been written and its digest is known.
The StructuredDocument above is then transformed to an EncryptedDocument and written to the data vault using the document creation procedure:
POST /encrypted-data-vaults/z4sRgBJJLnYy/docs HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
  "sequence": 0,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}
If the creation of the structured document was successful, an HTTP 201 status code is expected in return:
HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zp4H8ekWn
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2019 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked
Next, in order to convert a stream to an EncryptedStream an implementer MUST encrypt the stream. Once the stream is encrypted (or as it is encrypted), it can be sent to the stream creation service.
A protocol example of a stream creation is shown below:
POST /encrypted-data-vaults/z4sRgBJJLnYy/streams HTTP/1.1
Host: example.com
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

TBD
If the creation of the stream was successful, an HTTP 201 status code is expected in return:
HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2019 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked
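Once the stream has been written, its digest is known and the hashlink required for content.id can be computed. The sketch below assumes the digest is taken over the encrypted bytes exactly as written, and encodes it the way Hashlink values commonly are: a sha2-256 multihash (prefix bytes `0x12 0x20`) rendered in base58btc multibase (prefix `z`). Hashing chunk by chunk lets the digest be computed while the stream is uploaded. The helper names are hypothetical.

```python
import hashlib
from typing import Iterable

# Bitcoin base58 alphabet, as used by the base58btc multibase encoding ('z').
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Base58 encoding; each leading zero byte becomes a leading '1'."""
    n = int.from_bytes(data, "big")
    encoded = ""
    while n > 0:
        n, remainder = divmod(n, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * leading_zeros + encoded


def hashlink_value(chunks: Iterable[bytes]) -> str:
    """base58btc multibase of a sha2-256 multihash over the stream's bytes."""
    digest = hashlib.sha256()
    for chunk in chunks:  # update incrementally, e.g. as each chunk is uploaded
        digest.update(chunk)
    multihash = bytes([0x12, 0x20]) + digest.digest()  # 0x12 = sha2-256, 0x20 = 32-byte length
    return "z" + base58btc(multihash)


stream_url = "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz"
hl = hashlink_value([b"encrypted chunk 1", b"encrypted chunk 2"])  # placeholder ciphertext
print(f"{stream_url}?hl={hl}")
```

Because a sha2-256 multihash always begins with the same two bytes, values produced this way all start with `zQm` and are 47 characters long.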
Once a stream is created, the metadata related to the stream can be updated in the data vault using the document update procedure described earlier. An example of updating a link to a video file is shown below.
Implementations update the metadata associated with the stream in its StructuredDocument:
{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
  "sequence": 1,
  "meta": {
    "created": "2019-06-18",
    "contentType": "video/mpeg",
    "contentLength": 56735817
  },
  "content": {
    "id": "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz?hl=zb47JhaKJ3hJ5Jkw8oan35jK23289Hp",
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }
}
The value of content.id MUST be updated to include a hashlink now that the stream has been written and its digest is known.
The StructuredDocument above is then transformed to an EncryptedDocument and written to the data vault using the document update procedure:
POST /encrypted-data-vaults/z4sRgBJJLnYy/docs/zp4H8ekWn HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
  "sequence": 1,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}
If the update of the structured document was successful, an HTTP 200 status code is expected in return:
HTTP/1.1 200 OK
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zp4H8ekWn
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2019 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked
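The change between the two metadata documents above can be expressed as a small pure function: `sequence` is incremented, the hashlink is appended to content.id as an `hl` query parameter, and the stream's key-wrapping parameters are recorded under content.jwe without a `ciphertext` member (the ciphertext is the stream itself). This is a sketch: the function name is hypothetical, and treating a missing `sequence` as 0 is an assumption, since the first metadata example omits it.

```python
import copy


def next_stream_metadata(metadata: dict, hashlink: str, stream_jwe: dict) -> dict:
    """Return the updated stream-metadata StructuredDocument (input is not modified).

    - sequence goes up by one (a missing sequence is treated as 0)
    - content.id gains an ?hl= hashlink parameter
    - content.jwe records how the stream was encrypted, without any ciphertext
    """
    updated = copy.deepcopy(metadata)
    updated["sequence"] = metadata.get("sequence", 0) + 1
    stream_url = metadata["content"]["id"].split("?", 1)[0]  # drop any old hashlink
    updated["content"]["id"] = f"{stream_url}?hl={hashlink}"
    updated["content"]["jwe"] = stream_jwe
    return updated


metadata = {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "meta": {"created": "2019-06-18", "contentType": "video/mpeg", "contentLength": 56735817},
    "content": {"id": "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz"},
}
updated = next_stream_metadata(
    metadata,
    "zb47JhaKJ3hJ5Jkw8oan35jK23289Hp",  # hashlink value from the example above
    {
        "protected": "eyJlbmMiOiJDMjBQIn0",
        "recipients": [],  # omitted for brevity
        "iv": "i8Nins2vTI3PlrYW",
        "tag": "pfZO0JulJcrc3trOZy8rjA",
    },
)
print(updated["sequence"])  # 1
```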
This section is out of date; do not implement.
Reading a stream from a data vault is performed by retrieving and decrypting the associated metadata EncryptedDocument, decoding its hashlink information, retrieving the EncryptedStream, and finally decrypting it. The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
200 | Encrypted stream retrieval was successful. |
206 | Retrieval of the requested byte range of the encrypted stream was successful. |
400 | Encrypted stream retrieval failed. |
404 | Encrypted stream with given id was not found. |
In order to convert an EncryptedStream to a stream, an implementer MUST decrypt the EncryptedStream using the information provided in the associated EncryptedDocument. Once the stream is decrypted, it can be processed by the web application.
Implementers can perform random seeking in the stream by sending the Range HTTP request header; the server identifies the bytes it returned in the Content-Range response header.
A protocol example of a stream retrieval is shown below:
GET https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zn2XmSDzMbxz HTTP/1.1
Host: example.com
Range: bytes=0-1048575
Accept: application/octet-stream
Accept-Encoding: gzip, deflate
If the retrieval of the requested byte range of the encrypted stream was successful, an HTTP 206 (Partial Content) status code is expected in return:
HTTP/1.1 206 Partial Content
Date: Fri, 14 Jun 2019 18:37:12 GMT
Content-Range: bytes 0-1048575/*
Content-Length: 1048576
Connection: keep-alive

...
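Random access relies on standard HTTP range requests. The sketch below computes the Range header for fixed-size chunks (1 MiB here, matching the example request) and parses the Content-Range header of a partial response. The chunk size and helper names are assumptions. Seeking into ciphertext is only useful if the encryption scheme allows decryption to start at an arbitrary offset and, as noted at the start of the stream sections, using decrypted data before it has been authenticated carries risks.

```python
import re

CHUNK_SIZE = 1048576  # 1 MiB, matching the example request above


def range_header_for_chunk(index: int, chunk_size: int = CHUNK_SIZE) -> str:
    """Range request header value for the index-th chunk (byte positions are inclusive)."""
    start = index * chunk_size
    return f"bytes={start}-{start + chunk_size - 1}"


def parse_content_range(value: str):
    """Parse 'bytes first-last/complete', where complete may be '*' (unknown)."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value)
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value}")
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)


print(range_header_for_chunk(0))  # bytes=0-1048575
print(range_header_for_chunk(2))  # bytes=2097152-3145727
print(parse_content_range("bytes 0-1048575/*"))  # (0, 1048575, None)
```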
This section is out of date; do not implement.
A stream is deleted by performing an HTTP DELETE on the stream's URL in the data vault and then on the URL of its corresponding metadata document. The following HTTP status codes are defined for this service:
HTTP Status | Description |
---|---|
200 | Stream was deleted successfully. |
400 | Stream deletion failed. |
404 | Stream was not found. |
A protocol example of a stream deletion is shown below:
DELETE /encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz HTTP/1.1
Host: example.com
If the deletion of the encrypted stream was successful, an HTTP 200 status code is expected in return:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Date: Fri, 14 Jun 2019 18:40:18 GMT
Connection: keep-alive
Once the stream is deleted, implementations MUST also delete the corresponding metadata document:
DELETE /encrypted-data-vaults/z4sRgBJJLnYy/docs/zp4H8ekWn HTTP/1.1
Host: example.com
If the deletion of the metadata document was successful, an HTTP 200 status code is expected in return:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Date: Fri, 14 Jun 2019 18:40:18 GMT
Connection: keep-alive
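Because the metadata document points at the stream, a client would normally delete the stream first and the metadata document second, as above. The sketch below encodes that order with an injectable transport (a hypothetical `send` callable that returns the HTTP status) so that it can be exercised without a server. Proceeding to delete the metadata document when the stream is already gone (404) is a design assumption that makes interrupted deletions safe to retry.

```python
import urllib.request
from typing import Callable, List

# Hypothetical transport: sends a request and returns the HTTP status code.
Send = Callable[[urllib.request.Request], int]


def delete_stream(stream_url: str, metadata_doc_url: str, send: Send) -> List[int]:
    """Delete an encrypted stream and then its metadata document.

    The metadata document is only deleted after the stream is gone (200) or
    already absent (404), so a failed attempt can be retried later.
    """
    statuses = [send(urllib.request.Request(stream_url, method="DELETE"))]
    if statuses[0] in (200, 404):
        statuses.append(send(urllib.request.Request(metadata_doc_url, method="DELETE")))
    return statuses


sent = []


def fake_send(req: urllib.request.Request) -> int:
    """Stand-in transport that records requests instead of sending them."""
    sent.append((req.get_method(), req.full_url))
    return 200


vault = "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy"
print(delete_stream(vault + "/streams/zMbxmSDn2Xzz", vault + "/docs/zp4H8ekWn", fake_send))
# [200, 200]
```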
It is often useful to search a data vault for structured documents that contain specific metadata. Efficient searching requires the use of search indexes and local access to data. This poses an interesting challenge as the search has to be performed on the storage provider without leaking information that could violate the privacy of the entities that are storing information in the data vault. This section details how encrypted indexes can be created and used to perform efficient searching while protecting the privacy of entities that are storing information in the data vault.
When creating an EncryptedDocument, blinded index properties MAY be utilized to perform efficient searches. An example of the use of these properties is shown below:
{
  "id": "urn:uuid:698f3fb6-592f-4d22-9e04-462cc4606a23",
  "sequence": 0,
  "indexed": [{
    "sequence": 0,
    "hmac": {
      "id": "https://example.com/kms/z7BgF536GaR",
      "type": "Sha256HmacKey2019"
    },
    "attributes": [{
      "name": "CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
      "value": "RV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro",
      "unique": true
    }, {
      "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
      "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
    }]
  }],
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/z7BgF536GaR"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}
The example above demonstrates the use of unique index values as well as non-unique indexes.
Because only blinded names and values are stored, the storage provider can build efficient indexes over these properties, and storage agents can search them without revealing the underlying attribute names or values to the provider.
Provide instructions and examples for how indexes are blinded using an HMAC key.
Explain that multiple entities can maintain their own independent indexes (using their own HMAC key) provided they have been granted this capability. Explain that indexes can be sparse/partial. Explain that indexes have their own sequence number and that it will match the document's sequence number once it is updated.
Add a section showing the update index endpoint and how it works.
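Until the blinding procedure is specified, the following sketch shows one plausible construction. It assumes blinded names and values are the base64url-encoded (unpadded) HMAC-SHA256 of the plaintext strings, which matches the 43-character values in the example above. The function names, key handling, and attribute names are hypothetical; only the shape of the `indexed` entry (`sequence`, `hmac`, `attributes`, `unique`) is taken from the example.

```python
import base64
import hashlib
import hmac


def blind(hmac_key: bytes, text: str) -> str:
    """HMAC-SHA256 of UTF-8 text, base64url-encoded without padding (43 chars)."""
    mac = hmac.new(hmac_key, text.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).rstrip(b"=").decode("ascii")


def build_index_entry(hmac_key: bytes, key_id: str, sequence: int, attributes) -> dict:
    """Build one element of an EncryptedDocument's `indexed` array.

    `attributes` is an iterable of plaintext (name, value, unique) tuples;
    only their blinded forms end up in the returned entry.
    """
    blinded = []
    for name, value, unique in attributes:
        attr = {"name": blind(hmac_key, name), "value": blind(hmac_key, value)}
        if unique:
            attr["unique"] = True
        blinded.append(attr)
    return {
        "sequence": sequence,  # the index's own sequence number
        "hmac": {"id": key_id, "type": "Sha256HmacKey2019"},
        "attributes": blinded,
    }


key = bytes(32)  # all-zero key for illustration only; a real key stays in a KMS
entry = build_index_entry(
    key,
    "https://example.com/kms/z7BgF536GaR",
    0,
    [("email", "alice@example.com", True), ("tag", "invoices", False)],
)
print(len(entry["attributes"][0]["name"]))  # 43
```

Blinding each value on its own means that equal values under different attribute names produce equal blinded values, which a provider could observe; a design might mix the attribute name into the value's HMAC input to avoid this.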
The contents of a data vault can be searched using the encrypted indexes described above. There are two primary ways of searching for encrypted documents. The first is to search for a specific value associated with a specific index. The second is to check whether a specific index exists on a document.
The example below demonstrates how to search for a specific value associated with a specific index.
POST https://example.com/encrypted-data-vaults/z4sRgBJJLnYy HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "index": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
  "equals": [
    {"QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro": "dh327d234h8437hc34f43f43ZXGHDXG"}
  ]
}
A successful query will result in a standard HTTP 200 response with a list of identifiers for all encrypted documents that match the query:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Date: Fri, 14 Jun 2019 18:45:18 GMT
Connection: keep-alive

["https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz"]
The contents of a data vault can also be searched to see if a certain attribute name is indexed by using the has keyword.
POST https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/queries HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "has": ["CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ"]
}
If the query above is successful, an HTTP 200 status code is expected, with a list of identifiers for the EncryptedDocuments that contain the indexed attribute.
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Date: Fri, 14 Jun 2019 18:45:18 GMT
Connection: keep-alive

["https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz"]
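On the provider side, a has query needs nothing more than comparisons between opaque blinded strings. The sketch below evaluates such a query over an in-memory store keyed by document URL. Associating the indexed entry from the earlier example with the `zMbxmSDn2Xzz` URL, the second document, and the function names are all illustrative assumptions.

```python
from typing import Dict, List


def has_all(indexed: List[dict], names: List[str]) -> bool:
    """True if every blinded attribute name appears in the document's indexes."""
    present = {attr["name"] for entry in indexed for attr in entry["attributes"]}
    return all(name in present for name in names)


def run_has_query(store: Dict[str, List[dict]], query: dict) -> List[str]:
    """Return URLs of stored EncryptedDocuments matching a {"has": [...]} query."""
    return [url for url, indexed in store.items() if has_all(indexed, query["has"])]


base = "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/"
store = {
    # `indexed` array of the EncryptedDocument example above
    base + "zMbxmSDn2Xzz": [{
        "sequence": 0,
        "attributes": [
            {"name": "CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
             "value": "RV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro", "unique": True},
            {"name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
             "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"},
        ],
    }],
    # A hypothetical second document indexed under a different blinded name.
    base + "zOtherDoc": [{"sequence": 0, "attributes": [{"name": "hypothetical-name", "value": "x"}]}],
}

print(run_has_query(store, {"has": ["CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ"]}))
```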
Encrypted Data Vaults support a number of extension points:
This section details the general privacy considerations and specific privacy implications of deploying this specification into production environments.
Write privacy considerations.
There are a number of security considerations that implementers should be aware of when processing data described by this specification. Ignoring or not understanding the implications of this section can result in security vulnerabilities.
While this section attempts to highlight a broad set of security considerations, it is not a complete list. Implementers are urged to seek the advice of security and cryptography professionals when implementing mission-critical systems using the technology outlined in this specification.
While a service provider is not able to read data in an Encrypted Data Vault, it is possible for a service provider to delete, add, or modify encrypted data. Such deletion, addition, or modification can be detected, though not prevented, by keeping a global manifest of the data in the data vault.
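One way a data controller might maintain such a manifest is sketched below. The manifest format (document id mapped to sequence number and a SHA-256 digest of the JWE) and all names are hypothetical, and the manifest itself must be kept where the service provider cannot alter it, for example locally or encrypted and integrity-protected. Comparing it with what the server returns reveals deleted, added, and modified documents, including rollbacks to older versions.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Hypothetical manifest format: document id -> (sequence, SHA-256 of the JWE).
Manifest = Dict[str, Tuple[int, str]]


def fingerprint(doc: dict) -> Tuple[int, str]:
    """Sequence number plus a digest over the document's whole JWE object."""
    canonical = json.dumps(doc["jwe"], sort_keys=True, separators=(",", ":"))
    return doc["sequence"], hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(docs: Dict[str, dict]) -> Manifest:
    return {doc_id: fingerprint(doc) for doc_id, doc in docs.items()}


def audit(manifest: Manifest, server_docs: Dict[str, dict]) -> Dict[str, List[str]]:
    """Report documents the server deleted, added, or modified since the manifest."""
    seen = build_manifest(server_docs)
    return {
        "deleted": sorted(set(manifest) - set(seen)),
        "added": sorted(set(seen) - set(manifest)),
        "modified": sorted(d for d in set(manifest) & set(seen) if manifest[d] != seen[d]),
    }


original = {
    "doc-a": {"sequence": 0, "jwe": {"ciphertext": "ciphertext-a"}},
    "doc-b": {"sequence": 0, "jwe": {"ciphertext": "ciphertext-b"}},
}
manifest = build_manifest(original)

# What a misbehaving server might return later.
returned = {
    "doc-b": {"sequence": 0, "jwe": {"ciphertext": "tampered"}},
    "doc-c": {"sequence": 0, "jwe": {"ciphertext": "injected"}},
}
print(audit(manifest, returned))
# {'deleted': ['doc-a'], 'added': ['doc-c'], 'modified': ['doc-b']}
```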
An Encrypted Data Vault can be compromised if the data controller (the entity who holds the decryption keys and appropriate authorization credentials) accidentally grants access to an attacker. For example, a victim might accidentally authorize an attacker to access the entire vault, or mishandle their encryption key. Once attackers have access to the system, they can modify or remove data, or change the vault's configuration.
While it is normally difficult for a server to determine the identity of an entity, or the purpose for which that entity is accessing the Encrypted Data Vault, some metadata is always leaked when an entity accesses the vault, such as access patterns and approximate file sizes. The system has been designed not to leak information that would create privacy concerns; this approach protects against many, but not all, surveillance strategies that servers not acting in the best interest of the vault's users might employ.
It is prudent to assume that every encryption scheme will eventually be broken. For this reason, servers are advised not to use any sort of public storage network as a strategy for storing encrypted data.
While this system goes to great lengths to encrypt content and metadata, a handful of fields cannot be encrypted if the server is to provide the features outlined in this specification. For example, a version number associated with data reveals how often the data is modified, and the identifiers associated with encrypted content enable a server to gain knowledge by correlating identifiers across documents. Implementations are advised to minimize the amount of information that is stored unencrypted.
The encrypted indexes used by this system are designed to maximize privacy. As a result, there are a number of operations that are common in search systems that are not available with encrypted indexes, such as partial matching on encrypted text fields or searches over a scalar range. These features might be added in the future through the use of zero-knowledge encryption schemes.
While it is expected that most service providers are not malicious, it is also important to understand what a malicious service provider can and cannot do. The following attacks are possible given a malicious service provider:
There are a number of accessibility considerations implementers should be aware of when processing data described in this specification. As with any web standards or protocols implementation, ignoring accessibility issues makes this information unusable to a large subset of the population. It is important to follow accessibility guidelines and standards, such as [[WCAG21]], to ensure all people, regardless of ability, can make use of this data. This is especially important when establishing systems utilizing cryptography, which have historically created problems for assistive technologies.
This section details the general accessibility considerations to take into account when utilizing this data model.
Write accessibility considerations.
There are a number of internationalization considerations implementers should be aware of when publishing data described in this specification. As with any web standards or protocols implementation, ignoring internationalization makes it difficult for data to be produced and consumed across a disparate set of languages and societies, which would limit the applicability of the specification and significantly diminish its value as a standard.
This section outlines general internationalization considerations to take into account when utilizing this data model.
Write i18n considerations.