CBOR is a compact binary data serialization and messaging format. This specification defines CBOR-LD 1.0, a CBOR-based format to serialize Linked Data. The encoding is designed to leverage the existing JSON-LD ecosystem, which is deployed on hundreds of millions of systems today, to provide a compact serialization format for those seeking efficient encoding schemes for Linked Data. By using semantic compression, compression ratios more than 60% better than those of generalized compression schemes are possible. This format is primarily intended for using Linked Data in storage- and bandwidth-constrained programming environments, for building interoperable semantic wire-level protocols, and for efficiently storing Linked Data in CBOR-based storage engines.

This document is experimental.

There is a reference implementation that is capable of demonstrating the features described in this document.

Introduction


How to Read this Document

This document is a detailed specification for a serialization of Linked Data in CBOR. The document is primarily intended for the following audiences:

Contributing

There are a number of ways that one may participate in the development of this specification:

Design Goals and Rationale

CBOR-LD satisfies the following design goals:

Simplicity
CBOR-LD should be simple to implement given an existing JSON-LD implementation.
Efficient Storage
The encoding process should generate an aggressively compact Linked Data binary format.
Generalized Algorithm
The encoding algorithm must be generalized.
Semantic Compression
The encoding format should maximize compression of Linked Data URLs (terms and values). Focusing on URLs allows the algorithms to achieve better compression ratios than generalized compression algorithms.
Raw Binary
Base-encoded binary values and other compressible data types should be converted from their base-encoded forms to raw binary whenever this is possible without sacrificing generality.
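As a rough illustration of the Raw Binary goal, the Python sketch below compares a 32-byte value carried as unpadded base64url text with the same value carried as a CBOR byte string. The 32-byte value and the choice of base64url are assumptions for illustration, and the helper names are hypothetical; the length headers follow the CBOR major types defined in RFC 8949.

```python
import base64


def cbor_head(major: int, length: int) -> bytes:
    """CBOR initial byte plus length argument (lengths below 2**16 only)."""
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 256:
        return bytes([(major << 5) | 24, length])
    if length < 65536:
        return bytes([(major << 5) | 25]) + length.to_bytes(2, "big")
    raise ValueError("length too large for this sketch")


def cbor_text(s: str) -> bytes:
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data  # major type 3: UTF-8 text string


def cbor_bytes(b: bytes) -> bytes:
    return cbor_head(2, len(b)) + b  # major type 2: byte string


raw = bytes(range(32))  # hypothetical 32-byte value, e.g. a key or a digest
# The same value as unpadded base64url text, as it might appear in JSON-LD.
encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

as_text = cbor_text(encoded)
padding = "=" * (-len(encoded) % 4)  # restore padding before decoding
as_binary = cbor_bytes(base64.urlsafe_b64decode(encoded + padding))

print(len(encoded), len(as_text), len(as_binary))  # 43 45 34
```

Here the raw byte string is 11 bytes shorter than the text form before any other compression is applied.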

Similarly, the following are non-goals.

The following minefields have been identified while working on this specification:

Basic Concept

The general CBOR-LD encoding algorithm takes a JSON-LD Document and does the following:

Algorithms

JSON-LD to CBOR-LD Algorithm

This algorithm takes a JSON-LD object `jsonldDocument` and `options` as input.

  1. Let `result` be an empty CBOR-encoded byte array.
  2. Initialize `contextUrls` to the return value of the "Get Context URLs Algorithm", passing `jsonldDocument` as input.
  3. If the "Get Context URLs Algorithm" resulted in an error, set `result` to the return value of the "Uncompressed CBOR-LD Buffer Algorithm", passing `jsonldDocument` and `options` as input.
  4. Otherwise, set `result` to the return value of the "Compressed CBOR-LD Buffer Algorithm", passing `jsonldDocument` as input and `contextUrls` as `options.contextUrls`.
  5. Return `result`.
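The dispatch above can be sketched in Python as follows. The three algorithm implementations are passed in as functions so that the example stays self-contained; `ContextUrlError` and the stub functions are hypothetical stand-ins, not part of this specification.

```python
from typing import Callable, List


class ContextUrlError(Exception):
    """Hypothetical stand-in for ERR_NON_URL_JSONLD_CONTEXT_DETECTED."""


def jsonld_to_cborld(
    jsonld_document: dict,
    options: dict,
    get_context_urls: Callable[[dict], List[str]],
    compressed: Callable[[dict, dict], bytes],
    uncompressed: Callable[[dict, dict], bytes],
) -> bytes:
    try:
        context_urls = get_context_urls(jsonld_document)  # step 2
    except ContextUrlError:
        return uncompressed(jsonld_document, options)  # step 3: fall back
    # Step 4: hand the URLs to the compressed encoder as options.contextUrls.
    return compressed(jsonld_document, {**options, "contextUrls": context_urls})


# Hypothetical stubs used only to exercise the dispatch logic.
def stub_get_context_urls(doc: dict) -> List[str]:
    ctx = doc.get("@context")
    if not isinstance(ctx, str):
        raise ContextUrlError("non-URL @context value")
    return [ctx]


def stub_compressed(doc: dict, options: dict) -> bytes:
    if not options.get("contextUrls"):
        raise ValueError("compressed path expects options.contextUrls")
    return b"\xd9\x50\x01"  # only the compressed header, for brevity


def stub_uncompressed(doc: dict, options: dict) -> bytes:
    return b"\xd9\x50\x00"  # only the uncompressed header, for brevity


stubs = (stub_get_context_urls, stub_compressed, stub_uncompressed)
url_doc = {"@context": "https://www.w3.org/ns/did/v1"}
inline_doc = {"@context": {"name": "https://schema.org/name"}}
print(jsonld_to_cborld(url_doc, {}, *stubs).hex())     # d95001
print(jsonld_to_cborld(inline_doc, {}, *stubs).hex())  # d95000
```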

Uncompressed CBOR-LD Buffer Algorithm

This algorithm takes a JSON-LD object `jsonldDocument` and `options` as input.

  1. Let `result` be an empty CBOR-encoded byte array.
  2. Set the first three bytes of `result` to 0xd95000 (CBOR Tag - 0xd9, CBOR-LD - 0x50, Uncompressed - 0x00).
  3. For every key-value pair in `jsonldDocument`, convert the key and the value to their associated CBOR headers and values and append them to `result`. For complex values (maps, arrays), recursively convert the value to a form that losslessly encodes to CBOR and decodes back to JSON-LD.
  4. Return `result`.
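A minimal Python sketch of the uncompressed path follows. It assumes the tag number 0x5000 is written as the three-byte CBOR tag header `0xd9 0x50 0x00` (0xd9 marks a tag followed by a two-byte tag number) and that each JSON value maps onto its plain CBOR type from RFC 8949. It handles JSON types only, and the function names are illustrative.

```python
import struct
from typing import Any


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument, for 0 <= n < 2**64."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for CBOR")


def encode(value: Any) -> bytes:
    """Encode a JSON value as plain CBOR."""
    if value is False:  # check booleans before ints: bool is an int subclass
        return b"\xf4"
    if value is True:
        return b"\xf5"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # IEEE 754 double
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        return _head(5, len(value)) + b"".join(
            encode(k) + encode(v) for k, v in value.items()
        )
    raise TypeError(f"unsupported JSON value: {value!r}")


def uncompressed_cborld(jsonld_document: dict) -> bytes:
    # 0xd9 = tag with 2-byte number; 0x50 0x00 = CBOR-LD, uncompressed.
    return b"\xd9\x50\x00" + encode(jsonld_document)


print(uncompressed_cborld({"a": 1}).hex())  # d95000a1616101
```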

Compressed CBOR-LD Buffer Algorithm

This algorithm takes a JSON-LD object `jsonldDocument` and `options` as input. The `options` MUST contain:

`applicationContextMap`
A map of application-specific JSON-LD context URL strings to their encoded CBOR-LD values. Each value MUST be greater than 32767 (0x7FFF); values from 0 to 32767 (0x0000-0x7FFF) are reserved for globally recognized JSON-LD context URLs.
`applicationTermMap`
A map of JSON-LD terms and their associated CBOR-LD term codecs.
  1. Let `result` be an empty CBOR-encoded byte array.
  2. Set the first three bytes of `result` to 0xd95001 (CBOR Tag - 0xd9, CBOR-LD - 0x50, Compressed - CBOR-LD compression algorithm version 1 - 0x01).
  3. Initialize `termCodecMap` to the result of the "Get Term Codec Map Algorithm", passing `options.contextUrls` as input.
  4. Add to `result` by recursively processing every name-value pair in `jsonldDocument`:
    1. Let `termHint` be the value associated with the JSON name in the `termCodecMap`.
    2. Set the CBOR key to the `termHint.value` value.
    3. Set the CBOR value to the result of calling the `termHint.valueCompressor` function on the JSON value.
  5. Return `result`.
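The following Python sketch walks a document as step 4 describes, assuming a term codec map has already been built. `TermHint` mirrors `termHint.value` and `termHint.valueCompressor` (renamed `value_compressor` to follow Python naming). Applying compressors only to leaf values, the hand-built codec map, and compressing `@context` to the registry value 0x12 for `https://www.w3.org/ns/did/v1` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument (arguments below 2**32 only)."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise ValueError("argument too large for this sketch")


def encode(value: Any) -> bytes:
    """Plain CBOR for the JSON-like values produced below (no floats)."""
    if isinstance(value, bool):
        return b"\xf5" if value else b"\xf4"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        return _head(5, len(value)) + b"".join(encode(k) + encode(v) for k, v in value.items())
    raise TypeError(f"unsupported value: {value!r}")


@dataclass
class TermHint:
    value: int  # integer that replaces the term name (termHint.value)
    value_compressor: Callable[[Any], Any]  # termHint.valueCompressor


def _compress_value(value: Any, hint: TermHint, codec_map: Dict[str, TermHint]) -> Any:
    if isinstance(value, dict):
        return compress_node(value, codec_map)  # nested map: recurse
    if isinstance(value, list):
        return [_compress_value(v, hint, codec_map) for v in value]
    return hint.value_compressor(value)  # leaf: apply the term's compressor


def compress_node(node: Dict[str, Any], codec_map: Dict[str, TermHint]) -> Dict[int, Any]:
    out: Dict[int, Any] = {}
    for name, value in node.items():
        hint = codec_map[name]  # raises KeyError if the term is not in the map
        out[hint.value] = _compress_value(value, hint, codec_map)
    return out


def compressed_cborld(jsonld_document: Dict[str, Any], codec_map: Dict[str, TermHint]) -> bytes:
    return b"\xd9\x50\x01" + encode(compress_node(jsonld_document, codec_map))


DID_V1 = "https://www.w3.org/ns/did/v1"


def compress_context(url: str) -> int:
    return {DID_V1: 0x12}[url]  # hypothetical: registry lookup for one URL


def identity(value: Any) -> Any:
    return value


codec_map = {"@context": TermHint(0, compress_context), "id": TermHint(1, identity)}
doc = {"@context": DID_V1, "id": "did:example:1"}
print(compressed_cborld(doc, codec_map).hex())
# d95001a20012016d6469643a6578616d706c653a31
```

Most of the savings in this example come from replacing term names with small integers: the key `@context` costs 9 bytes as a CBOR text string but 1 byte as the integer 0.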

Get Term Codec Map Algorithm

This algorithm takes a list of URL strings `contextUrls` and returns a CBOR-LD term codec map that maps JSON-LD terms to their associated byte values and value compression functions.

  1. Let `result` be an ordered map.
  2. For each value in `contextUrls`, dereference the JSON-LD context it identifies and, for every entry in that context, add an entry to `result`:
    1. Set the entry key to the JSON-LD term key.
    2. Set the entry value to an unordered map with two entries:
      1. The first entry has the key `value` and an undefined value.
      2. The second entry has the key `valueCompressor`. Its value is a known global compressor function associated with the term's `@type` property, a known local compressor function that was provided to this algorithm, or the generic CBOR compressor function, which returns the bytes of a typical CBOR encoding of the given datatype.
  3. Let `sortedTerms` be the list of all keys in `result`, sorted.
  4. For each term in `sortedTerms`, set the `value` entry of that term's entry in `result` to the term's index in `sortedTerms`.
  5. Return `result`.
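A Python sketch of this algorithm follows. It assumes that contexts are dereferenced by a caller-supplied function (the hypothetical `load_context` parameter), that "sorted" means Python's default code-point ordering of term strings, and that global and local compressors are both supplied as one mapping from `@type` values to functions. `date_compressor` and the example context URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Mapping, Optional


def generic_compressor(value: Any) -> Any:
    """Stand-in for the generic CBOR compressor: leave the value for plain CBOR."""
    return value


@dataclass
class TermHint:
    value: Optional[int]  # undefined (None) until the sorting step
    value_compressor: Callable[[Any], Any]


def get_term_codec_map(
    context_urls: Iterable[str],
    load_context: Callable[[str], Mapping[str, Any]],
    type_compressors: Optional[Mapping[str, Callable[[Any], Any]]] = None,
) -> Dict[str, TermHint]:
    type_compressors = type_compressors or {}
    result: Dict[str, TermHint] = {}
    for url in context_urls:
        context = load_context(url)["@context"]  # hypothetical dereferencing step
        for term, definition in context.items():
            # Expanded term definitions may declare @type; string definitions cannot.
            type_iri = definition.get("@type") if isinstance(definition, dict) else None
            compressor = type_compressors.get(type_iri, generic_compressor)
            result[term] = TermHint(value=None, value_compressor=compressor)
    # Steps 3-4: sort the terms and use each term's position as its value.
    for index, term in enumerate(sorted(result)):
        result[term].value = index
    return result


XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
contexts = {
    "https://example.org/ctx/v1": {  # hypothetical, pre-dereferenced context
        "@context": {
            "name": "https://schema.org/name",
            "birthDate": {"@id": "https://schema.org/birthDate", "@type": XSD_DATE},
        }
    }
}


def date_compressor(value: str) -> int:
    """Hypothetical: pack an ISO date 'YYYY-MM-DD' into the integer YYYYMMDD."""
    return int(value.replace("-", ""))


codec_map = get_term_codec_map(
    ["https://example.org/ctx/v1"], contexts.__getitem__, {XSD_DATE: date_compressor}
)
for term, hint in codec_map.items():
    print(term, hint.value, hint.value_compressor.__name__)
```

Because every encoder and decoder must derive identical indices from the same contexts, the sort order has to be fixed exactly; code-point order is one unambiguous choice.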

Get Context URLs Algorithm

This algorithm takes a JSON-LD object `jsonldDocument` as input.

  1. Let `result` be an ordered map.
  2. Walk the JSON tree of `jsonldDocument` and, for each JSON name-value pair:
    1. If the name is `@context`:
      1. Add all values that are referenced by a URL to `result`, where the key in the map is set to the JSON value associated with `@id`.
      2. If a non-URL value is detected, throw an ERR_NON_URL_JSONLD_CONTEXT_DETECTED error.
  3. Return `result`.
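A Python sketch of the context URL walk follows. It assumes that a URL is a string with both a scheme and a host, accepts `@context` values that are a single string or an array, and records each URL once in first-seen order; it does not reproduce the `@id`-keyed layout of step 2.1.1. `NonUrlContextError` is a hypothetical exception class carrying the ERR_NON_URL_JSONLD_CONTEXT_DETECTED code.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


class NonUrlContextError(ValueError):
    """Hypothetical exception for ERR_NON_URL_JSONLD_CONTEXT_DETECTED."""

    code = "ERR_NON_URL_JSONLD_CONTEXT_DETECTED"


def _is_url(value: Any) -> bool:
    # Assumption: a URL here is a string with both a scheme and a host.
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def get_context_urls(jsonld_document: Any) -> List[str]:
    found: Dict[str, None] = {}  # dict keys act as an ordered, de-duplicated set

    def walk(node: Any) -> None:
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            for name, value in node.items():
                if name == "@context":
                    for ctx in value if isinstance(value, list) else [value]:
                        if not _is_url(ctx):
                            raise NonUrlContextError(f"non-URL @context value: {ctx!r}")
                        found.setdefault(ctx, None)
                else:
                    walk(value)

    walk(jsonld_document)
    return list(found)


doc = {
    "@context": ["https://www.w3.org/2018/credentials/v1", "https://w3id.org/age/v1"],
    "credentialSubject": {"@context": "https://w3id.org/age/v1", "id": "did:example:1"},
}
print(get_context_urls(doc))
# ['https://www.w3.org/2018/credentials/v1', 'https://w3id.org/age/v1']
```

An embedded context object such as `{"@context": {"name": "https://schema.org/name"}}` raises `NonUrlContextError`, which is the condition that sends the top-level algorithm down the uncompressed path.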

Term Codec Registry

The following is a registry of well-known term codecs. Entries are registered on a first-come, first-served basis.

| Value | Context URL | Context Name |
|---|---|---|
| 0x00 - 0x0F | RESERVED | Reserved for future use. |
| 0x10 | https://www.w3.org/ns/activitystreams | ActivityStreams 2.0 |
| 0x11 | https://www.w3.org/2018/credentials/v1 | Verifiable Credentials Data Model v1 |
| 0x12 | https://www.w3.org/ns/did/v1 | Decentralized Identifiers (DID) Core Spec v1 |
| 0x13 | https://w3id.org/security/suites/ed25519-2018/v1 | Ed25519Signature2018 Suite |
| 0x14 | https://w3id.org/security/suites/ed25519-2020/v1 | Ed25519Signature2020 Suite |
| 0x15 | https://w3id.org/cit/v1 | Concealed Id Token |
| 0x16 | https://w3id.org/age/v1 | Age Verification |
| 0x17 | https://w3id.org/security/suites/x25519-2020/v1 | X25519KeyAgreementKey2020 Suite |
| 0x18 | https://w3id.org/veres-one/v1 | Veres One DID Method |
| 0x19 | https://w3id.org/webkms/v1 | WebKMS (Key Management System) |
| 0x1A | https://w3id.org/zcap/v1 | Authorization Capabilities (zCap) |
| 0x1B | https://w3id.org/security/suites/hmac-2019/v1 | Sha256HmacKey2019 Crypto Suite |
| 0x1C | https://w3id.org/security/suites/aes-2019/v1 | AesKeyWrappingKey2019 Crypto Suite |
| 0x1D | https://w3id.org/vaccination/v1 | Vaccination Certificate Vocabulary v0.1 |
| 0x1E | https://w3id.org/vc-revocation-list-2020/v1 | Verifiable Credentials Revocation List 2020 |
| 0x1F | https://w3id.org/dcc/v1 | DCC (Decentralized Credentials Consortium) Core Context |
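To show how a registry value might appear on the wire, the Python sketch below looks up a context URL in the registry and encodes its value as a CBOR unsigned integer. Encoding the value as a plain CBOR integer, and raising an error for unregistered URLs instead of falling back to a text string, are assumptions of this sketch. The check on application values follows the rule that `applicationContextMap` values must be greater than 0x7FFF; the example application URL is hypothetical.

```python
from typing import Mapping, Optional

# Registry values from the table above.
CONTEXT_REGISTRY = {
    "https://www.w3.org/ns/activitystreams": 0x10,
    "https://www.w3.org/2018/credentials/v1": 0x11,
    "https://www.w3.org/ns/did/v1": 0x12,
    "https://w3id.org/security/suites/ed25519-2018/v1": 0x13,
    "https://w3id.org/security/suites/ed25519-2020/v1": 0x14,
    "https://w3id.org/cit/v1": 0x15,
    "https://w3id.org/age/v1": 0x16,
    "https://w3id.org/security/suites/x25519-2020/v1": 0x17,
    "https://w3id.org/veres-one/v1": 0x18,
    "https://w3id.org/webkms/v1": 0x19,
    "https://w3id.org/zcap/v1": 0x1A,
    "https://w3id.org/security/suites/hmac-2019/v1": 0x1B,
    "https://w3id.org/security/suites/aes-2019/v1": 0x1C,
    "https://w3id.org/vaccination/v1": 0x1D,
    "https://w3id.org/vc-revocation-list-2020/v1": 0x1E,
    "https://w3id.org/dcc/v1": 0x1F,
}


def encode_uint(n: int) -> bytes:
    """CBOR major type 0 (unsigned integer), for 0 <= n < 2**16."""
    if n < 24:
        return bytes([n])
    if n < 256:
        return bytes([0x18, n])
    if n < 65536:
        return b"\x19" + n.to_bytes(2, "big")
    raise ValueError("value out of range for this sketch")


def encode_context_url(
    url: str, application_context_map: Optional[Mapping[str, int]] = None
) -> bytes:
    app_map = application_context_map or {}
    for app_url, code in app_map.items():
        if code <= 0x7FFF:  # 0x0000-0x7FFF is reserved for global contexts
            raise ValueError(f"application value for {app_url} must exceed 0x7FFF")
    if url in CONTEXT_REGISTRY:
        return encode_uint(CONTEXT_REGISTRY[url])
    if url in app_map:
        return encode_uint(app_map[url])
    raise KeyError(f"no registered value for context {url}")


print(encode_context_url("https://www.w3.org/2018/credentials/v1").hex())  # 11
print(encode_context_url("https://w3id.org/veres-one/v1").hex())  # 1818
app = {"https://example.org/app/v1": 0x8000}
print(encode_context_url("https://example.org/app/v1", app).hex())  # 198000
```

Registry values 0x10 through 0x17 fit in the CBOR initial byte, while 0x18 and above need a one-byte argument, so the first eight registered contexts encode in one byte and the rest in two.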