Specification of canonical-form JSON for equivalence comparison.
This project is maintained by insilica
RFC 7159 defines JSON as “a text format for the serialization of structured data”, but allows many distinct serializations to describe the same data. Such human-friendly flexibility can hinder machine processing of JSON text, particularly when it is used as input for cryptographic hash functions that are expected to yield identical results for logically equivalent input (as when computing digital signatures). This specification defines a unique canonical form for every JSON value, the result being safe for comparison (in that logically equivalent structured data are guaranteed to have the same canonical form).
JSON text in canonical form, among its other requirements, uses two-character escape sequences where possible for characters in strings that require escaping:
\b  U+0008 BACKSPACE
\t  U+0009 CHARACTER TABULATION (“tab”)
\n  U+000A LINE FEED (“newline”)
\f  U+000C FORM FEED
\r  U+000D CARRIAGE RETURN
\"  U+0022 QUOTATION MARK
\\  U+005C REVERSE SOLIDUS (“backslash”), and
uses six-character \u00xx uppercase hexadecimal escape sequences for control characters that require escaping but lack a two-character sequence, and
uses six-character \uDxxx uppercase hexadecimal escape sequences for lone surrogates.

Example:
{"-0":0,"-1":-1,"0.1":1.0E-1,"1":1,"10.1":1.01E1,"emoji":"😃","escape":"\u001B","lone surrogate":"\uDEAD","whitespace":" \t\n\r"}
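The string escaping rules can be sketched in Python as a single pass over code points. This is an illustrative sketch, not a reference implementation: the function name `canonical_string` is hypothetical, and it assumes the input `str` came from a JSON parser such as `json.loads`, which combines escaped surrogate pairs into one astral code point, so any code point left in U+D800 through U+DFFF is a lone surrogate.

```python
def canonical_string(s: str) -> str:
    """Hypothetical helper: serialize a str as a canonical JSON string literal."""
    # Characters that have a two-character escape sequence
    short = {
        "\b": "\\b", "\t": "\\t", "\n": "\\n", "\f": "\\f",
        "\r": "\\r", '"': '\\"', "\\": "\\\\",
    }
    out = ['"']
    for ch in s:
        cp = ord(ch)
        if ch in short:
            out.append(short[ch])
        elif cp < 0x20 or 0xD800 <= cp <= 0xDFFF:
            # Remaining control characters and lone surrogates:
            # six-character escape with uppercase hexadecimal digits
            out.append(f"\\u{cp:04X}")
        else:
            # Everything else (including emoji) is emitted literally
            out.append(ch)
    out.append('"')
    return "".join(out)
```

Because every surrogate is escaped, the result can always be encoded with `.encode("utf-8")` to obtain the bytes that would be hashed; for instance `canonical_string(json.loads('"\\u001b"'))` yields `"\u001B"` with an uppercase `B`.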
The following projects are known to correctly implement this specification:
If you know of any others, please submit a pull request to add them!
This repository can be used to validate any implementation. Run
./test.sh /path/to/executable
from this repository, substituting the path to the candidate implementation's executable for the first argument. test.sh will provide known input and look for expected output, printing the results and exiting with a status of 0 if and only if the executable (and therefore the candidate implementation) adheres to this specification.

This specification updates the expired JSON Canonical Form internet draft to ensure a unique canonical representation of every JSON value.
Representation of non-integer numbers still matches the canonical float representation from section 3.2.4.2 of XML Schema Datatypes, but integer numbers now have a non-exponential representation matching the canonical integer representation (section 3.3.13.2) and RFC 7638 JSON Web Key (JWK) Thumbprint.
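A minimal sketch of this number treatment in Python, under stated assumptions: the function name `canonical_number` is hypothetical, numbers are assumed to be parsed as exact decimals (for example `json.loads(text, parse_float=Decimal, parse_int=Decimal)`) so that binary floating-point rounding never enters, and the exponential layout (one nonzero digit before the point, at least one digit after it, no trailing zeroes beyond a lone `0`, capital `E`, no `+` or leading zeroes in the exponent) is inferred from the example values `1.0E-1` and `1.01E1` and the XML Schema canonical float form.

```python
from decimal import Decimal


def canonical_number(d: Decimal) -> str:
    """Hypothetical helper: canonical text for an exactly parsed JSON number."""
    if not d.is_finite():
        raise ValueError("JSON has no NaN or Infinity")
    if d == d.to_integral_value():
        # Integer (zero-valued fractional part): plain digits, no exponent;
        # int() also turns negative zero into 0
        return str(int(d))
    sign, digits, exponent = d.as_tuple()
    digits = list(digits)
    # Drop insignificant trailing zeroes; a non-integer always keeps a nonzero digit
    while digits[-1] == 0:
        digits.pop()
        exponent += 1
    # Exponent for a significand with exactly one digit before the point
    sci_exponent = exponent + len(digits) - 1
    fraction = "".join(str(x) for x in digits[1:]) or "0"
    return f"{'-' if sign else ''}{digits[0]}.{fraction}E{sci_exponent}"
```

For example, `canonical_number(Decimal("10.1"))` gives `1.01E1` and `canonical_number(Decimal("-0"))` gives `0`, matching the example object. Working from `Decimal` rather than `float` is a design choice of this sketch: it avoids deciding how a binary float such as `1e23` should be spelled as an integer.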
The treatment of strings generalizes section 3.3 of RFC 7638 and Keybase canonical JSON packing (both of which cryptographically hash JSON text) to cover the full range of Unicode characters.
OLPC “Canonical JSON” (which is also intended to support meaningful hashes of structured data) describes a format that is not actually JSON, because its strings are sequences of bytes rather than sequences of Unicode code points. For example, a string consisting of a literal (unescaped) tab character between quotation marks is conforming OLPC “Canonical JSON” but not JSON, while "\t" is conforming JSON but not OLPC “Canonical JSON”.
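The JSON half of that example can be checked with Python's standard `json` module, whose decoder is strict by default and rejects unescaped control characters inside strings. The helper name `is_standard_json` is hypothetical; this only illustrates the JSON side, not OLPC parsing.

```python
import json


def is_standard_json(text: str) -> bool:
    """Hypothetical helper: True if Python's strict JSON decoder accepts text."""
    try:
        json.loads(text)  # strict=True by default: raw control characters are rejected
        return True
    except json.JSONDecodeError:
        return False


raw_tab = '"\t"'        # quotation mark, literal U+0009, quotation mark
escaped_tab = '"\\t"'   # quotation mark, backslash, "t", quotation mark
```

Here `is_standard_json(raw_tab)` is `False` and `is_standard_json(escaped_tab)` is `True`, and the escaped form decodes to a one-character string containing the tab.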
But where they overlap, this specification generalizes OLPC “Canonical JSON” to include floating point numbers and revises it for Unicode-aware string sorting.
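The exact ordering rule for object members is not restated in this section; assuming member names are ordered by Unicode code point (one reading of “Unicode-aware string sorting”, and consistent with the key order in the example object), Python's built-in `str` comparison already does this, treating a lone surrogate as just another code point. The sketch below contrasts it with UTF-16 code-unit ordering, which is what JavaScript's default `Array.prototype.sort` uses; both function names are hypothetical.

```python
def sort_by_code_point(names):
    # Python compares str values code point by code point
    return sorted(names)


def sort_by_utf16_code_unit(names):
    # For contrast: big-endian UTF-16 bytes compare in code-unit order;
    # "surrogatepass" allows lone surrogates to be encoded instead of raising
    return sorted(names, key=lambda s: s.encode("utf-16-be", "surrogatepass"))


# U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP and U+1F603 (the emoji from the example)
names = ["\uff61", "\U0001F603"]
```

The two orders disagree here: by code point U+FF61 sorts first, but in UTF-16 the emoji becomes the surrogate pair D83D DE03, and D83D sorts before FF61.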