jaq manual
jaq is an interpreter for the jq programming language.
It is designed to be usable as a drop-in replacement for the jq program,
which is the reference interpreter for the jq language written in C.
Written in Rust, jaq focuses on correctness, high performance, and simplicity.
In addition, jaq adds some functionality not present in jq:
- Support for multiple file formats, including JSON, YAML, CBOR, TOML, XML
- Support for invalid UTF-8 code units in text strings
- Byte strings
- Objects with non-string keys, such as {0: 1, [2]: 3}
- In-place replacement of input files
This manual aspires to:
- Be comprehensible, concise, and complete.
- Document all functionality in jaq, in particular jaq’s command-line interface, jq’s core language, and jq’s standard library. This covers the same concepts as the jq manual.
- Document all divergences between jq and jaq.
If this manual falls short of these goals, please open an issue or create a pull request. The same holds if you wish to propose new functionality for jaq. This project lives on your contributions!
The creation of this manual was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission’s Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429. Additional funding is made available by the Swiss State Secretariat for Education, Research and Innovation (SERI).
This manual uses “compatibility” blocks to
point out occasions where jaq diverges from jq.
In this manual,
“jq” on its own usually refers to the C implementation, whereas
“the jq language” refers to the language itself.
“Advanced” blocks are a kind of “making of” for this manual. They document experiments I made or ideas I had during the writing of the manual. As such, they are not essential for understanding the jq language. However, they might be useful if you wish to embark on a journey to become a master of the jq language.
To satisfy your hunger for an even deeper understanding of jq semantics, my jq language specification should provide sufficient material.
Feel free to skip these “advanced” blocks if you do not seek enlightenment.
Command-line interface
Running
jaq [OPTION]… [FILTER] [FILE]…
performs the following steps:
- Parse FILTER as a jq program; see jq language.
- For each FILE:
  - Parse FILE to a stream of values.
  - For each input value in the file:
    - Run FILTER on the input value and print its output values.
- If an uncaught error is encountered at any point, jaq stops.
For example, jaq '.name?' persons.json
parses the filter .name?, then
reads all values in persons.json one-by-one.
It then executes the filter .name? on each of the values and
prints the outputs of the filter as JSON.
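The steps above can be sketched in Python as follows. This is a simplified model, not jaq’s implementation: it assumes that a filter is a Python function from one input value to an iterable of outputs, handles only JSON input (each file is passed as its contents), and models the filter .name? by hand as the hypothetical function name_opt.

```python
import json
from typing import Any, Callable, Iterable, Iterator

# Assumption of this sketch: a filter maps one input value to an
# iterable of output values.
Filter = Callable[[Any], Iterable[Any]]


def parse_values(text: str) -> Iterator[Any]:
    """Lazily parse whitespace-separated JSON values, such as `1 2 3`."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        # Invalid input raises json.JSONDecodeError, which stops the run.
        value, pos = decoder.raw_decode(text, pos)
        yield value


def run(filt: Filter, files: list[str]) -> Iterator[str]:
    """For every file and every value in it, yield the filter's outputs as JSON."""
    for text in files:
        for value in parse_values(text):
            for output in filt(value):
                yield json.dumps(output)


def name_opt(value: Any) -> Iterator[Any]:
    """Hand-written stand-in for the filter `.name?`."""
    if isinstance(value, dict):
        yield value.get("name")
    elif value is None:
        yield None
    # For any other input, `.name` fails and `?` turns that into no output.


# Hypothetical contents of persons.json.
persons = '{"name": "Alice"} {"name": "Bob"} [1] null'
for line in run(name_opt, [persons]):
    print(line)  # "Alice", then "Bob", then null
```

Because all loops are generators, values are parsed, filtered, and printed one at a time.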
Input
If no FILE is given, jaq reads from standard input. jaq determines the format to parse a FILE as follows:
- If --from FORMAT is used, jaq uses that format.
- Otherwise, if FILE has a file extension known by jaq, such as .json, .yaml, .cbor, .toml, or .xml, jaq uses the corresponding format.
- Otherwise, jaq assumes JSON.
--from FORMAT
Interpret all input files as FORMAT.
For example,
jaq --from yaml . myfile.yml
parses myfile.yml as YAML.
Possible values of FORMAT include:
raw, json, yaml, cbor, toml, xml.
jaq automatically chooses the corresponding input format for
files with the extensions
.json, .yaml, .cbor, .toml, .xml, .xhtml.
That means that
jaq --from cbor . myfile.cbor is equivalent to
jaq . myfile.cbor.
jq does not have this option.
-n, --null-input
Feed null as input to the main program, ignoring any input files.
For example,
yes | jaq -n yields null,
which shows that jaq indeed does not read any input here.
The inputs can still be obtained via the inputs filter; for example,
yes true | jaq -n 'first(inputs)' yields true.
This can be useful to fold over all inputs with reduce / foreach.
-R, --raw-input
Read lines of the input as a sequence of strings.
For example,
echo -e "Hello\nWorld" | jaq -R yields two outputs; "Hello" and "World".
When combined with --slurp,
this yields the whole input as a single string.
For example,
echo -e "Hello\nWorld" | jaq -Rs yields "Hello\nWorld\n".
See --rawfile.
This is equivalent to --from raw.
-s, --slurp
Read (slurp) all input values into one array.
For example,
jaq -s <<< "1 2 3" yields a single output, namely the array [1, 2, 3], whereas
jaq <<< "1 2 3" yields three outputs, namely 1, 2, and 3.
When combined with --raw-input,
jaq reads the full input as a single string.
For example,
jaq -Rs <<< "1 2 3" yields the single output "1 2 3\n".
See --rawfile.
When multiple files are slurped in,
jq combines the inputs of all files into one single array, whereas
jaq yields an array for every file.
This is motivated by jaq’s --in-place option,
which could not work with the behaviour implemented by jq.
The behaviour of jq can be approximated in jaq;
for example, to achieve the output of
jq -s . a b, you may use
jaq -s . <(cat a b).
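The two slurping behaviours can be contrasted with a small Python model. Each file is represented by its contents, and parse_values is a simplified parser for whitespace-separated JSON values; both are assumptions of this sketch.

```python
import json
from typing import Any


def parse_values(text: str) -> list[Any]:
    """Parse all whitespace-separated JSON values in `text`."""
    decoder = json.JSONDecoder()
    values: list[Any] = []
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return values
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)


def slurp_jaq(files: list[str]) -> list[list[Any]]:
    """jaq -s: one array per file."""
    return [parse_values(text) for text in files]


def slurp_jq(files: list[str]) -> list[Any]:
    """jq -s: a single array holding the values of all files."""
    return [value for text in files for value in parse_values(text)]


# Contents of two hypothetical files a and b.
files = ["1 2", "3"]
print(slurp_jaq(files))  # [[1, 2], [3]]
print(slurp_jq(files))   # [1, 2, 3]
```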
Output
--to FORMAT
Print all output values in the given FORMAT.
Any FORMAT accepted by --from can be used here.
Note that not every value can be printed in every format.
For example, TOML requires that the root value is an object, so
jaq --to toml <<< []
yields an error.
jq does not have this option.
-c, --compact-output
Print JSON compactly, omitting whitespace.
For example, jaq -c <<< '[1, 2, 3]' yields the output [1,2,3].
-r, --raw-output
Write (text and byte) strings without escaping them and without surrounding them with quotes.
For example,
jaq -r <<< '"Hello\nWorld"' outputs two lines; Hello and World, whereas
jaq <<< '"Hello\nWorld"' outputs a single line; "Hello\nWorld".
This does not impact strings contained inside other values, i.e. arrays and objects.
For example,
jaq -r <<< '["Hello\nWorld"]' outputs ["Hello\nWorld"].
This is equivalent to --to raw.
-j, --join-output
Do not print a newline after each value.
For example,
jaq -j <<< 'true false' yields the output truefalse, without trailing newline.
This is particularly useful in combination with --raw-output (-r); for example,
jaq -jr <<< '"Hello" " " "World" "\n"' yields the output Hello World
(with trailing newline).
-i, --in-place
Overwrite input file with its output.
For example,
jaq -i . myfile.json reads the file myfile.json and
overwrites it with a formatted version of it.
Note that the input file is overwritten only
once there is no more output and
if there has not been any error.
jq does not have this option.
-S, --sort-keys
Print objects sorted by their keys.
For example,
jaq -Sc <<< '{"b": {"d": 3, "c": 2}, "a": 1}' yields
{"a":1,"b":{"c":2,"d":3}}, whereas
jaq -c <<< '{"b": {"d": 3, "c": 2}, "a": 1}' yields
{"b":{"d":3,"c":2},"a":1}.
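Python’s json module offers the same two knobs, which can help to predict the output of the examples above outside of jaq: sort_keys sorts keys at every nesting level, and the separators argument omits whitespace like -c. This is only an analogy; it does not describe how jaq formats its output.

```python
import json

value = {"b": {"d": 3, "c": 2}, "a": 1}

# Analogy to `jaq -Sc`: keys sorted at every nesting level, no whitespace.
sorted_compact = json.dumps(value, sort_keys=True, separators=(",", ":"))
# Analogy to `jaq -c`: keys keep their input order.
compact = json.dumps(value, separators=(",", ":"))

print(sorted_compact)  # {"a":1,"b":{"c":2,"d":3}}
print(compact)         # {"b":{"d":3,"c":2},"a":1}
```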
-C, --color-output
Always color output, even if jaq does not print to a terminal.
For example,
jaq -C <<< '{}' | jaq --from raw tobytes yields the byte string
b"\x1b[1m{\x1b[0m\x1b[1m}\x1b[0m", containing ANSI color sequences, whereas
jaq <<< '{}' | jaq --from raw tobytes yields
b"{}".
(Here, jaq --from raw tobytes prints a byte representation of its input.)
-M, --monochrome-output
Do not color output.
--tab
Use tabs for indentation rather than spaces.
For example,
jaq --tab <<< '[1, [2]]' | jaq -Rs yields
"[\n\t1,\n\t[\n\t\t2\n\t]\n]\n", whereas
jaq <<< '[1, [2]]' | jaq -Rs yields
"[\n  1,\n  [\n    2\n  ]\n]\n".
--indent N
Use N spaces for indentation (default: 2).
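Similarly, the indentation examples can be reproduced with Python’s json.dumps, whose indent argument accepts either a string such as a tab or a number of spaces. This is again an analogy; the trailing newline is added by hand because jaq ends each output value with a newline, as the outputs above show.

```python
import json

value = [1, [2]]

# Analogy to `jaq --tab`.
tabbed = json.dumps(value, indent="\t") + "\n"
# Analogy to jaq's default of two spaces (`--indent 2`).
spaced = json.dumps(value, indent=2) + "\n"

print(repr(tabbed))  # '[\n\t1,\n\t[\n\t\t2\n\t]\n]\n'
print(repr(spaced))  # '[\n  1,\n  [\n    2\n  ]\n]\n'
```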
Compilation
If no FILTER is given, jaq uses . (the identity filter) as filter.
When passing filters directly as FILTER argument on the command-line,
care has to be taken to properly escape the filter.
How to do this differs from platform to platform, but on Unix-like systems,
surrounding the filter with single quotes (') and
replacing occurrences of ' in filters by '\'' suffices.
For example, to run the filter "'" that
produces a string containing a single quote, you can use
jaq -n '"'\''"'.
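The quoting rule above is mechanical, so it can be scripted. The following Python sketch implements it in the hypothetical function quote_filter and uses Python’s POSIX-style shlex.split to check that a shell would pass the quoted text to jaq unchanged.

```python
import shlex


def quote_filter(filt: str) -> str:
    """Quote a filter for a POSIX shell: surround it with single quotes
    and replace every ' inside it by '\\''."""
    return "'" + filt.replace("'", "'\\''") + "'"


filt = '"\'"'  # the jq filter "'" (a string containing a single quote)
quoted = quote_filter(filt)
print("jaq -n " + quoted)  # jaq -n '"'\''"'
# The shell splits the quoted text back into exactly the original filter.
assert shlex.split(quoted) == [filt]
```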
Running filters that start with the negation operator,
such as jaq '-1', fails because - is interpreted as
start of a command-line switch rather than negation.
You can remedy this by using
jaq -- '-1' instead, or by surrounding the filter in parentheses, i.e.
jaq '(-1)'.
-f, --from-file
Read the filter from a file given by the FILTER argument.
With this option, jaq interprets the FILTER argument as
the name of a file containing the filter.
Note that the file name need not directly follow this option.
For example,
jaq --from-file -n script.jq
uses the contents of the file script.jq as filter.
-L, --library-path DIR
Search for modules and data in given directory.
jaq searches for modules and data in a set of directories called “search paths”.
Using --library-path adds a new directory to the global search paths.
For example,
jaq -L . -L .. 'include "script"; foo'
looks for script.jq first in the current directory, then in the parent directory.
If --library-path is not given, the following global search paths are used:
- ~/.jq
- $ORIGIN/../lib/jq
- $ORIGIN/../lib
See the modules section for more details.
Variables
--arg A V
Set variable $A to string V.
For example,
jaq --arg name "John Doe" -n '"Welcome, " + $name' yields "Welcome, John Doe".
--argjson A V
Set variable $A to JSON value V.
For example,
jaq --argjson song '{"name": "One of Us", "artist": "ABBA", "year": 1981}' -n '"Currently playing: \($song.name) (\($song.year))"'
yields
"Currently playing: One of Us (1981)".
If V contains more than a single value, e.g. 1 2, then jaq yields an error.
--slurpfile A F
Set variable $A to an array containing the JSON values in file F.
For example, if values.json contains 1 2 3, then
jaq --slurpfile xs values.json -n '$xs' yields [1, 2, 3].
--rawfile A F
Set variable $A to a string containing the contents of file F.
jaq tries to load the file via memory mapping,
which takes constant time and allows loading files that do not fit into memory.
If this fails, jaq loads the file regularly, taking linear time.
This is also what happens when using
-Rs (--raw-input and --slurp)
to load a file (as opposed to standard input).
Unlike jq, jaq does not verify that the file is valid UTF-8.
That permits loading arbitrary binary files;
these can be processed as byte strings via tobytes.
--args
Collect remaining positional arguments into $ARGS.positional.
If this option is given, then all further arguments that
would have been interpreted as input files are
instead collected into an array at $ARGS.positional.
For example, if the file input.json exists, then
jaq '$ARGS.positional' input.json --args foo -n bar -- baz -c qux yields
["foo", "bar", "baz", "-c", "qux"].
Note that here, input.json and -n are not collected into the array —
the former because it comes before --args, and
the latter because it would not have been interpreted as input file.
However, -c is collected into the array because it comes after --,
which leads every argument after it to be interpreted as input file.
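The collection rule can be summarised as a small Python model. It is a deliberate simplification, not jaq’s argument parser: it assumes that every argument starting with - is an option that takes no value (so it does not handle options such as --arg A V), and that the first remaining argument is FILTER.

```python
def positional_args(argv: list[str]) -> list[str]:
    """Simplified model of how $ARGS.positional is collected."""
    positional: list[str] = []
    seen_filter = False   # has the FILTER argument been consumed?
    args_mode = False     # has `--args` been seen?
    after_dashes = False  # has `--` been seen?
    for arg in argv:
        if not after_dashes and arg == "--":
            after_dashes = True
        elif not after_dashes and arg == "--args":
            args_mode = True
        elif not after_dashes and arg.startswith("-"):
            pass  # an option such as -n or -c (assumed to take no value)
        elif not seen_filter:
            seen_filter = True  # the first non-option argument is FILTER
        elif args_mode:
            positional.append(arg)
        # Otherwise, the argument is an input file such as input.json.
    return positional


argv = ["$ARGS.positional", "input.json", "--args",
        "foo", "-n", "bar", "--", "baz", "-c", "qux"]
print(positional_args(argv))  # ['foo', 'bar', 'baz', '-c', 'qux']
```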
Miscellanea
-e, --exit-status
Use the last output value as exit status code.
This enables the use of the exit codes 1 and 4, which are not used otherwise.
jaq uses the following exit codes:
- 0: No errors.
- 1: The last output value is false or null.
- 2: I/O or CLI error, e.g. file not found or unknown CLI option.
- 3: Filter parse/compilation error.
- 4: The filter did not yield any output.
- 5: Any other error, e.g. a call to the filter error.
The filters halt and halt_error
can be used to exit jaq with arbitrary exit codes.
For example:
$ jaq -n empty; echo $?
0
$ jaq -n false >/dev/null; echo $?
0
$ jaq -en false >/dev/null; echo $?
1
$ jaq . does_not_exist.json 2>/dev/null; echo $?
2
$ jaq --foo 2>/dev/null; echo $?
2
$ jaq '+' 2>/dev/null; echo $?
3
$ jaq -en empty; echo $?
4
$ jaq -n error 2>/dev/null; echo $?
5
$ jaq -n 'halt(9)'; echo $?
9
-h, --help
Print summary of CLI options.
-V, --version
Print jaq version.
Unsupported
The following command-line options are supported by jq, but not by jaq:
- --ascii-output, -a
- --raw-output0
- --unbuffered
- --stream
- --stream-errors
- --seq
- --jsonargs
Core language
The jq language is a lazy, functional streaming programming language
originally designed by Stephen Dolan.
The jq language is Turing-complete and can therefore be used to write
any program that can be written in any other programming language.
jq programs can be executed with several interpreters, including
jq, gojq, fq, and jaq.
A program written in the jq language is called a jq program or filter. A filter is a function that takes an input value and yields a stream of output values.
The stream of output values can be infinite; for example, the jq filter
repeat("Hi") yields an infinite sequence of strings "Hi".
The following sections document all filters with built-in syntax in jq.
Examples are written like 1 + 2, true or false ⟼ 3 true, which means that
running jaq -n '1 + 2, true or false' yields the outputs 3 and true.
An atomic filter is a filter that is not a binary operator.
For example, the filters
-1, map(.+1), and .[0] are all atomic, whereas the filters
1 + 2, explode | .[], and .[0] += 1 are all non-atomic.
To turn a non-atomic filter f into an equivalent atomic filter,
surround it with parentheses, i.e. (f).
Values
This section lists all potential values that jq filters can process, and how to produce them.
jaq extends the set of values that jq can process by byte strings and objects with non-string keys. Where jq reads and writes values by default as JSON, jaq reads and writes values by default as XJON, which is an extension of JSON. See those sections for how jaq serialises values.
null
The filter null returns the null value.
The null value can also be obtained in various other ways,
such as accessing a non-existent element of an array or object, e.g.
[] | .[0] ⟼ null or
{} | .a ⟼ null.
Booleans
The filters true and false return the boolean values true and false.
Booleans can also be produced by comparison operations, e.g.
0 == 0 ⟼ true or
[] == {} ⟼ false.
Every jq value can be mapped to a boolean value, namely
null and false have the boolean value false,
all other values have the boolean value true.
This is important for filters such as
if-then-else and
//.
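In Python terms, the boolean value of a jq value can be modelled as follows, representing null as None. Note that this differs from Python’s own truthiness, where 0, "", [], and {} are false.

```python
from typing import Any


def truthy(value: Any) -> bool:
    """Boolean value of a jq value: only null (None) and false are false.

    `is` is used on purpose: in Python, `0 == False` holds, whereas in jq
    the number 0 is true.
    """
    return value is not None and value is not False


for v in [None, False, True, 0, "", [], {}]:
    print(repr(v), truthy(v))
```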
Numbers
Numbers are filters that return their corresponding value, e.g.
0 ⟼ 0,
3.14 ⟼ 3.14, and
2.99e6 ⟼ 2.99e6.
Negative numbers can be constructed by
applying the negation operator to a number, e.g. -1 ⟼ -1.
Internally, jaq distinguishes integers, floating-point numbers (floats), and decimal numbers:
- A number without a dot (.) and without an exponent (e/E) is losslessly stored as an integer. jaq can store and calculate with integers of arbitrary size, e.g. 340282366920938463463374607431768211456 (2^128).
- Any non-integer number is initially stored as a decimal number, which is a string representation of the number. That means that jaq losslessly preserves any number matching the regular expression for numbers given at the end of this section, such as 1.0e500, if it occurs in a JSON file or jq filter.
- When calculating with a decimal number, jaq transparently converts it to a 64-bit IEEE-754 floating-point number. For example, 1.0e500 + 1 ⟼ Infinity, because jaq converts 1.0e500 to the closest floating-point number, which is Infinity.
The rules of jaq are:
- The sum, difference, product, and remainder of two integers is an integer, e.g. 1 + 2 ⟼ 3.
- Any other operation between two numbers yields a float, e.g. 10 / 2 ⟼ 5.0 and 1.0 + 2 ⟼ 3.0.
You can convert an integer to a floating-point number e.g.
by adding 0.0, by multiplying by 1.0, or by dividing by 1.
You can convert a floating-point number to an integer by
round, floor, or ceil, e.g.
1.2 | floor, round, ceil ⟼ 1 1 2.
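Python’s built-in numbers happen to follow the same rules for the examples above: integers are unbounded and exact, / always produces a float, and converting an overly large decimal to a float overflows to infinity. The sketch below uses this as an analogy only; it does not claim that jaq and Python agree in every corner case.

```python
import math

# Integers are exact and unbounded, like jaq's integers.
big = 2 ** 128
# `/` always yields a float, even for a whole quotient (cf. 10 / 2 ⟼ 5.0).
quotient = 10 / 2
# Mixing a float and an integer yields a float (cf. 1.0 + 2 ⟼ 3.0).
mixed = 1.0 + 2
# Converting the decimal 1.0e500 to a 64-bit float overflows to infinity.
huge = float("1.0e500") + 1
# floor, round, and ceil turn a float back into an integer.
# Caveat: Python's round() rounds halves to even (round(2.5) == 2),
# which may differ from jaq's round for such inputs.
rounded = [math.floor(1.2), round(1.2), math.ceil(1.2)]

print(big, quotient, mixed, huge, rounded)
# 340282366920938463463374607431768211456 5.0 3.0 inf [1, 1, 2]
```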
jq uses floats for any number,
meaning that it does not distinguish integers from floats.
Many operations in jaq, such as array indexing,
check whether the passed numbers are indeed integer.
The motivation behind this is to avoid
rounding errors that may silently lead to wrong results.
For example,
[0, 1, 2] | .[1] ⟼ 1, whereas
[0, 1, 2] | .[1.0000000000000001] yields
an error in jaq as opposed to 1 in jq.
Furthermore, jq prints NaN as null; e.g. nan | tojson yields "null".
In contrast, jaq prints NaN as NaN; e.g. nan | tojson ⟼ "NaN".
See the XJON section for details.
A number corresponds to the regular expression
[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?.
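The following Python sketch uses this regular expression to tell integers from decimal numbers as described above. The function classify is hypothetical and only mirrors the rules in this section; note how -1 and .5 are not number literals, because - is the negation operator and a number must start with a digit.

```python
import re

# The regular expression for numbers given above.
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def classify(literal: str) -> str:
    """Classify a literal the way this section describes jaq storing it."""
    if NUMBER.fullmatch(literal) is None:
        return "not a number literal"
    if any(c in literal for c in ".eE"):
        return "decimal"  # kept as its string representation
    return "integer"      # stored losslessly, of arbitrary size


for lit in ["42", "340282366920938463463374607431768211456",
            "3.14", "2.99e6", "1.0e500", "-1", ".5"]:
    print(lit, "->", classify(lit))
```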
Text strings
A text string is an array of bytes that can be constructed using the syntax "...".
Here, ... may contain any UTF-8 characters
in the range from U+0020 to U+10FFFF, excluding '"' and '\'.
For example,
"Hello 東京!" ⟼ "Hello 東京!".
Furthermore, ... may contain the following escape sequences:
- \b, \f, \n, \r, or \t
- \" or \\
- \uHHHH, where HHHH is a four-digit hexadecimal number
- \(f), where f is a jq filter (string interpolation)
Most characters below U+0020 can only be produced via a Unicode escape sequence, i.e.
"NUL = \u0000" ⟼ "NUL = \u0000".
A string containing an interpolated filter, such as
"...\(f)...", is equivalent to
"..." + (f | tostring) + "...".
For example:
- "Hello \("Alice", "Bob") 😀!\n" ⟼ "Hello Alice 😀!\n" "Hello Bob 😀!\n"
- 41 | "The successor of \(.) is \(.+1)." ⟼ "The successor of 41 is 42.".
A string is identifier-like if it matches the regular expression
[a-zA-Z_][a-zA-Z_0-9]*.
Strings may be prefixed with @x, where x is an identifier; e.g. @uri.
In such a case, the filters of interpolated strings are
piped through @x instead of tostring, i.e.
@x "...\(f)..." is equivalent to
@x "..." + (f | @x) + @x "...".
When "..." does not contain any interpolated filter, then
@x "..." is equivalent to "...".
For example,
"-[]?" | @uri "https://gedenkt.at/jaq/?q=\(.)" ⟼
"https://gedenkt.at/jaq/?q=-%5B%5D%3F".
Byte strings
A byte string is an array of bytes that is not interpreted as (UTF-8) text.
It can be produced from a text string via the filter tobytes, e.g.
"Hello, world! 🙂" | tobytes ⟼ b"Hello, world! \xf0\x9f\x99\x82".
Currently, there is no native syntax in jaq to produce a byte string directly.
Byte strings differ from text strings in a few respects; in particular, they can be indexed and sliced in constant time. That makes byte strings interesting, e.g., for parsing binary formats.
For compatibility reasons, jaq considers
both text strings and byte strings as strings.
That means that "Hello" | isstring and (tobytes | isstring) ⟼ true.
Furthermore, a text string and a byte string that
contain equivalent bytes are considered equal, e.g.
"Hello" | . == tobytes ⟼ true.
Byte strings do not exist in jq; however, they exist in fq.
Arrays
An array is a finite sequence of values.
An empty array can be constructed with the filter
[], which is a short form for
[empty] ⟼ [].
An array of values can be constructed using the syntax
[f], where f is a filter.
The filter [f] passes its input to f and runs it,
returning an array containing all outputs of f.
If f throws an error, then [f] returns that error instead of an array.
This syntax allows constructing arrays such as [1, 2, 3] ⟼ [1, 2, 3].
Here, 1, 2, 3 is just a filter that yields three numbers,
see concatenation.
There is no dedicated array syntax [x1, ..., xn].
That means that you can use arbitrary filters in [f]; for example,
[limit(3; repeat(0))] ⟼ [0, 0, 0].
You can use the input passed to [f] inside f;
for example, we can write the previous filter equivalently as
{count: 3, elem: 0} | [limit(.count; repeat(.elem))] ⟼ [0, 0, 0].
Objects
An object is a mapping from keys to values.
An empty object can be constructed with the filter {}.
An object with a single key and value can be constructed by {(k): v},
where k and v are both filters.
For example, {("a"): 1} ⟼ {"a": 1}.
If k is a string such as "a" or a variable such as $k,
then you can omit the parentheses, e.g. {"a": 1} or {$k: 1}.
If the string is identifier-like,
then you can also omit the quotes, e.g. {a: 1}.
To construct an object with multiple key-value pairs,
you can add multiple objects; e.g.
{(k1): v1} + ... + {(kn): vn}.
You can write this more compactly as
{(k1): v1, ..., (kn): vn}.
For example,
{a: 1, b: 2} ⟼ {"a": 1, "b": 2}.
Instead of {k: .k} (see indexing),
you can also write {k}; e.g.,
{a: 1, b: 2} | {a} ⟼ {"a": 1}.
Instead of {k: $k},
you can also write {$k}; e.g.,
1 as $k | {$k} ⟼ {"k": 1}.
The filter {(k): v} is equivalent to k as $k | v as $v | {$k: $v}.
That means that when either k or v yield multiple output values,
an object is produced for every output combination; for example,
{("a", "b"): (1, 2)} ⟼ {"a": 1} {"a": 2} {"b": 1} {"b": 2}.
Note that here, it is necessary to surround 1, 2 with parentheses,
in order to clarify that , does not start a new key-value pair.
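The order of the outputs follows from the nested variable bindings: keys vary slowest and values fastest. This Python sketch models the outputs of k and v as lists, which glosses over the fact that jaq runs v anew for every output of k.

```python
from itertools import product

key_outputs = ["a", "b"]  # outputs of the key filter ("a", "b")
value_outputs = [1, 2]    # outputs of the value filter (1, 2)

# `k as $k | v as $v | {$k: $v}`: keys vary slowest, values fastest.
objects = [{k: v} for k, v in product(key_outputs, value_outputs)]
print(objects)  # [{'a': 1}, {'a': 2}, {'b': 1}, {'b': 2}]
```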
In jq, keys must be strings, whereas in jaq, keys can be arbitrary values. Because object keys can be arbitrary values in jaq, you can write e.g.:
{(0): 1, "2": 3, ([4]): 5, ({}): 6} ⟼
{ 0 : 1, "2": 3, [4] : 5, {} : 6}
Note that in a jq filter, non-string keys must be surrounded with parentheses, whereas in XJON, parentheses must not be used.
This yields an error in jq.
Path operators
jq’s path operators retrieve parts of their input.
Indexing (.[x])
The indexing operator .[x] yields the x-th element of the input.
For example, [1, 2, 3] | .[1] ⟼ 2 and {a: 1} | .["a"] ⟼ 1.
If there is no element at that index in the input, then this operator returns null.
For example, [1, 2] | .[5] ⟼ null and {} | .["a"] ⟼ null.
This operator treats byte string input like an array of numbers in the range from 0 to 255.
For example,
"@ABC" | tobytes | .[0] ⟼ 64,
because the character '@' is encoded in UTF-8 as single byte 64 (0x40).
This operator yields an error if the input is neither an array, an object, nor a byte string, or if the input is an array and the index is not an integer.
For identifier-like strings like name,
you can write .name as short form for .["name"].
For example,
{a: 1} | .a ⟼ 1.
This works also for strings that are the same as reserved keywords; e.g.
{if: 1} | .if ⟼ 1.
For array input, you can also use negative numbers as index,
which will be interpreted as index counting from the end of the array.
For example,
[1, 2, 3] | .[-1] ⟼ 3.
jq accepts floating-point numbers as array indices; e.g.
[1, 2] | .[1.5] yields 2.
In jaq, indexing an array with a floating-point number yields an error.
The same applies to the slicing operator below.
Slicing (.[x:y])
The slicing operator .[x:y] yields a slice of a string or an array,
from the x-th element to (excluding) the y-th element.
For example,
[1, 2, 3] | .[1:3] ⟼ [2, 3], and
"Hello World!" | .[6:11] ⟼ "World".
When slicing a text string, the x-th element refers to the x-th UTF-8 character.
For example,
"老虎" | .[1:2] ⟼ "虎".
On the other hand, when slicing a byte string, the x-th element refers to the x-th byte.
For example,
"Färber" | tobytes, . | .[2:] | [., length] ⟼ [b"\xa4rber", 5] ["rber", 4],
because the letter ‘ä’ takes two bytes, but only one character.
Like the indexing operator, the slicing operator treats byte string input like an array of numbers and interprets negative indices as indices counting from the end.
The short form .[x:] creates a slice from the index x to the end, and
the short form .[:y] creates a slice from the beginning to (excluding) the index y.
For example, "Hello World!" | .[:5], .[6:] ⟼ "Hello" "World!".
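Python distinguishes text and bytes in the same way, which makes it convenient for checking the slicing examples above. Here, encoding to UTF-8 stands in for tobytes; this is an analogy, not jaq’s implementation.

```python
word = "Färber"
raw = word.encode("utf-8")  # stands in for `tobytes`

# Byte slicing counts bytes: 'ä' is encoded as 0xC3 0xA4,
# so index 2 points into the middle of it.
print(raw[2:], len(raw[2:]))    # b'\xa4rber' 5
# Text slicing counts characters (code points).
print(word[2:], len(word[2:]))  # rber 4

# Indexing bytes yields a number from 0 to 255 (cf. "@ABC" | tobytes | .[0]).
print("@ABC".encode("utf-8")[0])  # 64
print("老虎"[1:2])                 # 虎
# Short forms: from the start, and to the end.
print("Hello World!"[:5], "Hello World!"[6:])  # Hello World!
```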
Iterating (.[])
The iteration operator .[] yields all values v1 ... vn when given
an array [v1, ..., vn] or
an object {k1: v1, ..., kn: vn}.
For example,
[1, 2, 3] | .[] ⟼ 1 2 3 and
{a: 1, b: 2} | .[] ⟼ 1 2.
Compound paths
For each of the filters in this section,
you can employ any atomic filter instead of the leading . (identity).
For example, you can write explode[] as short form for explode | .[].
You can also terminate any filter in this section with a ?,
which silences errors from that operator.
For example, .[] yields an error if the input is neither an array nor an object,
but .[]? silences such errors, yielding no output instead.
You can chain together an arbitrary number of these operators;
for example, .[0][]?[:-1] is the same as .[0] | .[]? | .[:-1].
When you combine the techniques above, be aware that
the filters x and y in .[x] and .[x:y]
are executed on the original input and
are not influenced by error suppression with ?.
That means that f[][x]?[y:z] is equivalent to
f as $f | x as $x | y as $y | z as $z | $f | .[] | .[$x]? | .[$y:$z].
Nullary
A nullary filter does not take any arguments.
Identity (.)
The filter . yields its input as single output.
For example,
1 | . ⟼ 1.
Recursion (..)
The filter .. yields its input and all recursively contained values.
For example:
{"a": 1, "b": [2, ["3"]]} | .. ⟼
{"a": 1, "b": [2, ["3"]]}
1
[2, ["3"]]
2
["3"]
"3"
Unary
A unary filter takes a single argument.
Negation (-)
The prefix operator -f runs the atomic filter f and negates its outputs.
For example,
-1 ⟼ -1 and
-(1, 2) ⟼ -1 -2.
Error suppression (?)
The postfix operator f? runs the atomic filter f and
returns all its outputs until (excluding) the first error.
For example,
(1, error, 2)? ⟼ 1.
This operator is equivalent to try f (see try-catch).
For example,
try (1, error, 2) ⟼ 1.
We can see that error suppression has a higher precedence than negation by
try -[]? catch -1 ⟼ -1, which shows us that
-[]? yields the same as -([]?), which is equivalent to
-[] and yields an error.
If negation had a higher precedence,
then -[]? would be equivalent to (-[])?;
however, (-[])? yields no output at all,
because ? silences the error of -[].
Binary (complex)
A binary filter takes two arguments.
All binary operators that contain the characters | or = are right-associative.
All other binary operators are left-associative.
That means that for example,
f | g | h is equivalent to f | (g | h), whereas
f - g - h is equivalent to (f - g) - h.
This section lists all binary infix filters sorted by increasing precedence.
Composition (|)
The filter f | g runs f, and
for every output y of f, it runs g with y as input and yields its outputs.
For example, (1, 2) | (., 3) ⟼ 1 3 2 3.
If either f or g yields an error, then
f | g yields that error, followed by nothing.
For example, (1, 2) | (., error) yields the same as 1, error.
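The behaviour of | (and of , from the concatenation section) can be modelled with Python generators. This sketch assumes that a filter is a function from one input to an iterator of outputs, and represents errors as Python exceptions, which stop the enclosing loops just as an error stops f | g.

```python
from typing import Any, Callable, Iterator

# Assumption of this sketch: a filter maps one input to an iterator of outputs.
Filter = Callable[[Any], Iterator[Any]]


def const(value: Any) -> Filter:
    """A filter that ignores its input and yields `value`, like `1`."""
    def run(_input: Any) -> Iterator[Any]:
        yield value
    return run


def identity(x: Any) -> Iterator[Any]:
    """The filter `.`."""
    yield x


def comma(f: Filter, g: Filter) -> Filter:
    """`f, g`: all outputs of f, then all outputs of g, on the same input."""
    def run(x: Any) -> Iterator[Any]:
        yield from f(x)
        yield from g(x)
    return run


def pipe(f: Filter, g: Filter) -> Filter:
    """`f | g`: run g on every output of f, in order."""
    def run(x: Any) -> Iterator[Any]:
        for y in f(x):
            yield from g(y)
    return run


# (1, 2) | (., 3)
example = pipe(comma(const(1), const(2)), comma(identity, const(3)))
print(list(example(None)))  # [1, 3, 2, 3]
```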
Variable binding (as $x |)
The filter f as $x | g runs f, and
for every output x of f, it binds the value of x to the variable $x.
Here, x is an identifier.
It then runs g on the original input,
replacing any reference to $x in g by the value x.
For example,
"Hello" | length as $x | . + " has length \($x)" ⟼ "Hello has length 5".
Variables can be shadowed, such as
0 as $x | (1 as $x | $x), $x ⟼ 1 0.
Concatenation (,)
The filter f, g yields the concatenation of the outputs of f and g.
For example, 1, 2 ⟼ 1 2.
Plain assignment (=)
The filter f = g runs g with its input, and
for every output y of g, it
returns a copy of the input whose
values at the positions given by f are replaced by y.
For example:
[1, 2, 3] | .[0] = (length, 4) ⟼
[3, 2, 3]
[4, 2, 3]
Update assignment (|=)
The filter f |= g returns a copy of its input where
each value v at a position given by f is replaced by
the output of g run with input v.
For example,
[1, 2, 3] | .[] |= .*2 ⟼ [2, 4, 6].
When g yields no outputs, then the value at the position is deleted;
for example,
[1, 2, 3] | .[0] |= empty ⟼ [2, 3] and
{a: 1, b: 2} | .a |= empty ⟼ {"b": 2}.
When g yields multiple outputs, then it depends on
the input type and on f whether more than one output of g is considered.
For example, the following updates consider multiple outputs:
On the other hand, the following updates consider only the first output:
Arithmetic update assignment (+=, -=, …)
The filters
f += g,
f -= g,
f *= g,
f /= g,
f %= g, and
f //= g
are short-hand forms for
g as $y | f |= . + $y, and similarly for the other operators.
For example:
[1, 2, 3] | .[0] += (length, 4) ⟼
[4, 2, 3]
[5, 2, 3]
Alternation (//)
The filter f // g runs f and yields
all its outputs whose boolean value is true; that is,
all outputs that are neither null nor false.
If f yields no such outputs, then this filter yields the outputs of g.
For example, null // 1 ⟼ 1, (false, 2, null, 3) // 4 ⟼ 2 3, and empty // 1 ⟼ 1.
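A Python model of this rule, representing null as None, might look as follows. The helper names are hypothetical, g is passed as a function so that it only runs when f yields no true output, and the sketch does not model how errors in f are treated.

```python
from typing import Any, Callable, Iterable, Iterator


def truthy(value: Any) -> bool:
    # jq: only null (None) and false are false.
    return value is not None and value is not False


def alternative(f: Iterable[Any], g: Callable[[], Iterable[Any]]) -> Iterator[Any]:
    """Model of `f // g`: the true outputs of f; if there are none,
    the outputs of g. g is a thunk, so it only runs when needed."""
    found = False
    for value in f:
        if truthy(value):
            found = True
            yield value
    if not found:
        yield from g()


print(list(alternative([None], lambda: [1])))               # [1]
print(list(alternative([False, 2, None, 3], lambda: [4])))  # [2, 3]
print(list(alternative([], lambda: [1])))                   # [1]
```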
Logic (or, and)
The filter f or g (disjunction) runs f, and for every output of f,
returns true if its boolean value is true, else
returns the boolean values of all outputs of g.
The filter f and g (conjunction) runs f, and for every output of f,
returns false if its boolean value is false, else
returns the boolean values of all outputs of g.
For example:
(true, false) or (true, false) ⟼
true
true
false
(true, false) and (true, false) ⟼
true
false
false
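These outputs can be reproduced with a Python model in which g is passed as a function that is run anew for every output of f that does not short-circuit. The function names or_ and and_ are hypothetical, and null is represented as None.

```python
from typing import Any, Callable, Iterable, Iterator


def truthy(value: Any) -> bool:
    # jq: only null (None) and false are false.
    return value is not None and value is not False


def or_(f: Iterable[Any], g: Callable[[], Iterable[Any]]) -> Iterator[bool]:
    """`f or g`: for every output of f, true if it is true, else the
    boolean values of all outputs of g."""
    for x in f:
        if truthy(x):
            yield True
        else:
            yield from (truthy(y) for y in g())


def and_(f: Iterable[Any], g: Callable[[], Iterable[Any]]) -> Iterator[bool]:
    """`f and g`: for every output of f, false if it is false, else the
    boolean values of all outputs of g."""
    for x in f:
        if not truthy(x):
            yield False
        else:
            yield from (truthy(y) for y in g())


print(list(or_([True, False], lambda: [True, False])))   # [True, True, False]
print(list(and_([True, False], lambda: [True, False])))  # [True, False, False]
```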
The filter f and g has higher precedence than f or g.
We can see the higher precedence of and by
false and true or true ⟼ true, which yields the same as
(false and true) or true ⟼ true, but not the same as
false and (true or true) ⟼ false.
To find this formula, I used the following program:
def bool: true, false;
{x: bool, y: bool, z: bool} | select(
((.x and .y) or .z ) !=
( .x and (.y or .z))
) ⟼
{
"x": false,
"y": true,
"z": true
}
{
"x": false,
"y": false,
"z": true
}
It holds that f or g is equivalent to f as $x | $x or g;
similar for and.
Binary (simple)
Every simple binary operator such as + in this section fulfills the property that
f + g is equivalent to
f as $x | g as $y | $x + $y.
For example,
(1, 2) + (1, 3) ⟼ 2 4 3 5 and
(1, 2) * (1, 3) ⟼ 1 3 2 6.
This property does not hold for the complex binary filters above.
In jq, a slightly different property holds, namely that
f + g is equivalent to
g as $y | f as $x | $x + $y.
That means that in jq,
(1, 2) + (1, 3) yields 2 3 4 5.
As a result, in jaq,
(1,2) * (3,4) ⟼ 3 4 6 8 and
{a: (1,2), b: (3,4)} | .a * .b ⟼ 3 4 6 8
yield the same outputs, whereas jq yields 3 6 4 8 for the former example.
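The difference between jaq and jq boils down to which stream varies slowest. In the following Python model, the outputs of f and g are given as lists; this is a sketch of the output order only.

```python
import operator
from typing import Callable


def jaq_binop(op: Callable[[int, int], int], f: list[int], g: list[int]) -> list[int]:
    """jaq: `f as $x | g as $y | $x op $y`, i.e. f varies slowest."""
    return [op(x, y) for x in f for y in g]


def jq_binop(op: Callable[[int, int], int], f: list[int], g: list[int]) -> list[int]:
    """jq: `g as $y | f as $x | $x op $y`, i.e. g varies slowest."""
    return [op(x, y) for y in g for x in f]


print(jaq_binop(operator.add, [1, 2], [1, 3]))  # [2, 4, 3, 5]
print(jq_binop(operator.add, [1, 2], [1, 3]))   # [2, 3, 4, 5]
print(jaq_binop(operator.mul, [1, 2], [3, 4]))  # [3, 4, 6, 8]
print(jq_binop(operator.mul, [1, 2], [3, 4]))   # [3, 6, 4, 8]
```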
Equality (==, !=)
The operator $x == $y returns true if the two values $x and $y are equal, else false.
Similarly, $x != $y returns the negation of $x == $y, i.e. $x == $y | not.
Interesting cases include:
- NaN does not equal any value, including itself; i.e. nan == nan ⟼ false.
- An integer i equals a float f if f is finite and i converted to a float is equal to f; i.e. 1 == 1.0 ⟼ true.
- Arrays are equal if they have equal values in the same order; i.e. [1, 2, 3] == [1, 2, 3], but [3, 2, 1] != [1, 2, 3].
- Objects are equal if they have the same keys and, for every key, the associated values are equal; i.e. {a: 1, b: 2} == {b: 2, a: 1} ⟼ true.
Ordering (<, >, <=, >=)
The operator $x < $y returns true if $x is smaller than $y.
Similarly:
- $x > $y is equivalent to $y < $x,
- $x <= $y is equivalent to $x < $y or $x == $y, and
- $x >= $y is equivalent to $x > $y or $x == $y.
Values of different types are ordered as in the following list, from smallest to largest; values of the same type are ordered as described for each type:
- null
- Booleans: false < true ⟼ true
- Numbers: by their numeric value, e.g. 1 < 1.5 ⟼ true
- Strings: lexicographic ordering by underlying bytes, e.g. "Hello" < "Hello World" ⟼ true and "@B" < "A" ⟼ true.
- Arrays: lexicographic ordering, e.g. [1, 2] < [1, 2, 3] ⟼ true and [0, 2] < [1] ⟼ true.
- Objects: An object $x is smaller than an object $y if either:
  - the keys of $x are smaller than the keys of $y, or
  - the keys of $x are equal to the keys of $y and the values of $x are smaller than the values of $y.
More precisely, an object $x is smaller than an object $y if:
($x | to_entries | sort_by(.key)) as $ex |
($y | to_entries | sort_by(.key)) as $ey |
[$ex[].key] < [$ey[].key] or
[$ex[].key] == [$ey[].key] and
[$ex[].value] < [$ey[].value]
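Translated to Python, the definition above reads as follows. This sketch assumes string keys and values that Python can compare with each other (such as numbers only), because Python’s ordering of mixed types does not match jq’s.

```python
from typing import Any


def object_less(x: dict[str, Any], y: dict[str, Any]) -> bool:
    """Model of `$x < $y` for two objects, following the definition above."""
    ex = sorted(x.items(), key=lambda kv: kv[0])  # to_entries | sort_by(.key)
    ey = sorted(y.items(), key=lambda kv: kv[0])
    kx, ky = [k for k, _ in ex], [k for k, _ in ey]
    vx, vy = [v for _, v in ex], [v for _, v in ey]
    # Python compares lists lexicographically, like jq compares arrays.
    return kx < ky or (kx == ky and vx < vy)


print(object_less({"a": 5}, {"b": 0}))                  # True: ["a"] < ["b"]
print(object_less({"b": 1, "a": 2}, {"a": 2, "b": 3}))  # True: [2, 1] < [2, 3]
print(object_less({"a": 1, "b": 0}, {"a": 1}))          # False: ["a", "b"] > ["a"]
```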
Addition / subtraction (+, -)
The filter $x + $y adds two values as follows:
- null + $x and $x + null yield $x.
- Adding numbers yields their sum, which is an integer if both numbers are integers, else a floating-point number. For example, 1 + 2 ⟼ 3 and 1 + 2.0 ⟼ 3.0.
- Adding strings or arrays concatenates them, i.e. "Hello, " + "World!" ⟼ "Hello, World!" and [1, 2] + [3, 4] ⟼ [1, 2, 3, 4].
- Adding objects yields their union. If a key is present in both objects, then the resulting object will contain the key with the value of the object on the right; i.e. {a: 1, b: 2} + {b: 3, c: 4} ⟼ {"a": 1, "b": 3, "c": 4}.
- Adding anything else yields an error.
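The filter $x - $y subtracts $y from $x as follows:
- Subtracting numbers yields their difference, similar to addition.
- Subtracting arrays yields $x without all elements that occur in $y, e.g. [1, 2, 3, 2] - [2] ⟼ [1, 3].
- Subtracting anything else yields an error.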
Multiplication / division (*, /)
The filter $x * $y multiplies two values as follows:
- Multiplying numbers yields their product, similar to addition.
- Multiplying a string with an integer $n yields the $n-fold concatenation of the string, i.e. "abc" * 3 ⟼ "abcabcabc". If $n <= 0, then this yields null, i.e. 0 * "abc" ⟼ null.
- Multiplying two objects merges them recursively. In particular, $x * {k: v, ...} yields ($x + {k: $x[k] * v}) * {...} if both v and $x[k] are objects, else ($x + {k: v}) * {...}. For example, {a: {b: 0, c: 2}, e: 4} * {a: {b: 1, d: 3}, f: 5} ⟼ {"a": {"b": 1, "c": 2, "d": 3}, "e": 4, "f": 5}.
- Multiplying anything else yields an error.
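Both object operations can be modelled on Python dictionaries. The hypothetical functions below implement the union of + and the recursive merge of *, and reproduce the examples above; they are a sketch, not jaq’s implementation.

```python
def add_objects(x: dict, y: dict) -> dict:
    """Model of `$x + $y` for objects: union, values of y win."""
    return {**x, **y}


def multiply_objects(x: dict, y: dict) -> dict:
    """Model of `$x * $y` for objects: like `+`, except that two objects
    under the same key are merged recursively."""
    result = dict(x)  # copy, so that x is not modified
    for k, v in y.items():
        if isinstance(v, dict) and isinstance(result.get(k), dict):
            result[k] = multiply_objects(result[k], v)
        else:
            result[k] = v
    return result


print(add_objects({"a": 1, "b": 2}, {"b": 3, "c": 4}))
# {'a': 1, 'b': 3, 'c': 4}
print(multiply_objects({"a": {"b": 0, "c": 2}, "e": 4},
                       {"a": {"b": 1, "d": 3}, "f": 5}))
# {'a': {'b': 1, 'c': 2, 'd': 3}, 'e': 4, 'f': 5}
```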
The filter $x / $y divides two values as follows:
- Dividing a number by a number yields their quotient as a floating-point number. To perform this operation, both arguments are first converted to floating-point numbers. For example, 1 / 2 ⟼ 0.5.
- Dividing a string by a string splits $x by $y, yielding an array of strings. For example, "foobarfoobazfoo" / "foo" ⟼ ["", "bar", "baz", ""]. If $y is empty, then $x / $y yields an array with each character of the input as a separate string. For example, "🧑\u200d🔬 is 🤔" / "" ⟼ ["🧑", "\u200d", "🔬", " ", "i", "s", " ", "🤔"], where "\u200d" is the zero-width joiner that combines 🧑 and 🔬 into a single emoji.
You can round-trip string division with join($y).
For example:
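A minimal round trip, including an empty field between two separators:

```jq
"a,b,,c" / ",",                # ⟼ ["a", "b", "", "c"]
("a,b,,c" / "," | join(","))   # ⟼ "a,b,,c"
```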
In jq, division by 0 yields an error, whereas
in jaq, n / 0 yields
nan if n == 0,
infinite if n > 0, and
-infinite if n < 0.
jaq’s behaviour is closer to the IEEE standard for floating-point arithmetic (IEEE 754).
Modulus (%)
The filter $x % $y calculates the modulus of two numbers,
and fails for anything else.
For example,
5 % 2 ⟼ 1.
Either of the two numbers can also be a floating-point number;
however, due to floating-point rounding, the result may be unexpected.
For example,
5.1 % 2 ⟼ 1.0999999999999996 and
5.5 % 2 ⟼ 1.5.
Keywords
This section lists all filters that start with a reserved keyword.
if-then-else
The filter if p then f else g end runs the filter p with its input.
For every output of p, if its boolean value is true,
the outputs of f run with the original input are returned, else
the outputs of g run with the original input are returned.
Examples:
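A short sketch. The second line shows that the branch is chosen once per output of the condition:

```jq
((1, -1) | if . > 0 then "positive" else "non-positive" end),   # ⟼ "positive" "non-positive"
if (true, false) then 1 else 2 end                              # ⟼ 1 2
```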
There exists a longer form of this filter, namely
if p1 then f1 elif p2 then f2 ... else g end.
This is equivalent to
if p1 then f1 else (if p2 then f2 else (... else g end) end) end.
When the else g part is omitted, it is equivalent to else .; for example,
0 | if false then .+1 end ⟼ 0.
try-catch
The filter try f catch g runs the atomic filter f with its input,
and returns its outputs until (excluding) the first error.
If f yields an error, then
the atomic filter g is run with the error value as input,
and its outputs are returned.
Examples:
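A small sketch, with outputs derived from the description above. Outputs before the first error are kept, and later ones are never produced:

```jq
(try error("boom") catch .),       # ⟼ "boom"
[try (1, error("x"), 3) catch .]   # ⟼ [1, "x"]: 3 is never produced
```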
A short form of this filter is
try f, which is equivalent to
try f catch empty as well as to
f? (error suppression).
label-break
The filter label $x | f binds the label $x in f,
runs f and yields its outputs.
If the evaluation of f calls break $x,
then the evaluation of label $x | f is stopped and returns no more outputs.
For example,
label $x | 1, break $x, 2 ⟼ 1.
Labels are distinct from variables, which means that
0 as $x | label $x | $x, break $x ⟼ 0.
Like variables, labels can be shadowed; e.g.
label $x | 1, (label $x | 2, break $x, 3), 4, break $x, 5 ⟼ 1 2 4.
It is possible to break from a filter argument; e.g.
def f(g): 1, g, 2; label $x | f(break $x) ⟼ 1.
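Label-break is enough to stop a generator after its first output. The name my_first is hypothetical and chosen for this sketch; the standard library's first(f) has the same effect:

```jq
# my_first is a hypothetical helper: yield the first output of f, then stop evaluating f.
def my_first(f): label $out | (f | ., break $out);
my_first(range(10; 20))   # ⟼ 10
```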
reduce / foreach
The filters
reduce xs as $x (init; update) and
foreach xs as $x (init; update; project)
both run the atomic filter xs on their input.
Suppose that the outputs of xs are x1, …, xn.
Then the filters are equivalent to:
reduce x1, ..., xn as $x (init; update) :=
init
| x1 as $x | update
| ...
| xn as $x | update
foreach x1, ..., xn as $x (init; update; project) :=
init |
( x1 as $x | update | project,
( ...
( xn as $x | update | project,
( empty ))...))
Here, both update and project have access to the current $x.
The filter
foreach xs as $x (init; update) is equivalent to
foreach xs as $x (init; update; .).
As an example, we can calculate the sum and the cumulative sum using
reduce and foreach, respectively:
- reduce (1, 2, 3) as $x (0; . + $x) ⟼ 6
- foreach (1, 2, 3) as $x (0; . + $x) ⟼ 1 3 6
- foreach (1, 2, 3) as $x (0; . + $x; [$x, .]) ⟼ [1, 1] [2, 3] [3, 6]
Let us expand the first and the last example using the equivalences above to see what is calculated:
# reduce (1, 2, 3) as $x (0; . + $x)
0
| 1 as $x | . + $x # 1
| 2 as $x | . + $x # 3
| 3 as $x | . + $x # 6
⟼ 6
# foreach (1, 2, 3) as $x (0; . + $x; [$x, .])
0 |
( 1 as $x | . + $x | [$x, .], # [1, 1]
( 2 as $x | . + $x | [$x, .], # [2, 3]
( 3 as $x | . + $x | [$x, .], # [3, 6]
( empty ))))
⟼ [1, 1] [2, 3] [3, 6]
We can also reverse a list via
[1, 2, 3] | reduce .[] as $x ([]; [$x] + .) ⟼ [3, 2, 1].
(However, note that this has quadratic runtime and is thus quite inefficient.)
Note that when xs yields no outputs, then
reduce yields init, whereas
foreach yields no output.
For example:
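With an empty xs, the two filters behave as follows:

```jq
(reduce empty as $x (0; . + $x)),   # ⟼ 0: init is returned unchanged
[foreach empty as $x (0; . + $x)]   # ⟼ []: no outputs at all
```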
The execution of reduce and foreach differs between jq and jaq
when update yields multiple outputs.
However, the precise behaviour of jq in that case
is quite difficult to describe.
The interpretation of reduce/foreach in jaq has the following advantages over jq:
- It deals very naturally with filters that yield multiple outputs. In contrast, jq treats the outputs of update unequally: it outputs all of them, but recurses only on the last of them.
- It makes the implementation of reduce and foreach special cases of the same code, reducing the potential for bugs.
Consider the following example for an update yielding multiple values:
foreach (5, 10) as $x (1; .+$x, -.) ⟼
6 16 -6 -1 9 1 in jaq, whereas it yields
6 -1 9 1 in jq.
We can see that both jq and jaq yield the values
resulting from the first iteration (where $x is 5), namely
1 | 5 as $x | (.+$x, -.) ⟼ 6 -1.
However, jq performs the second iteration (where $x is 10)
only on the last value returned from the first iteration, namely -1,
yielding the values
-1 | 10 as $x | (.+$x, -.) ⟼ 9 1.
jaq yields these values too, but it also performs the second iteration
on all other values returned from the first iteration, namely 6,
yielding the values
6 | 10 as $x | (.+$x, -.) ⟼ 16 -6.
def
The filter def x: f; g binds the filter f to a filter with the name x.
Here, x is an identifier.
The filter g can contain calls to the filter x, and
any such calls will be replaced by the filter f.
For example, we can define a filter iter by
def iter: .[]; and use it subsequently by
def iter: .[]; [1, 2, 3] | iter ⟼ 1 2 3.
This is equivalent to writing
[1, 2, 3] | .[] ⟼ 1 2 3.
Definitions can be chained and nested. For example:
def foo:
def bar: 1;
def baz: 2;
bar + baz;
foo ⟼ 3
Here, we chained the definitions of bar and baz.
These definitions are only visible inside the definition of foo;
that means that at the place where we call foo,
we can use neither bar nor baz.
Definitions can be recursive, meaning that they call themselves.
For example,
def f: 0, f; f yields an infinite sequence of 0 values.
An example of a non-terminating filter is
def r: r; r.
Finally, the following filter yields an infinite
stream of integers starting from its input:
def ints_from: ., (. + 1 | ints_from);
1 | limit(3; ints_from) ⟼ 1 2 3
Definitions can also take arguments:
The filter def x(x1; ...; xn): f; g binds the filter f to
a filter with the name x and the arity n.
Here, x1 to xn are identifiers that are the arguments of x, and
f can contain references to these arguments.
The filter g can contain calls of the shape x(g1; ...; gn),
where g1 to gn are filters.
Any such calls will be replaced by the filter f, where
every argument xi is replaced by its corresponding filter gi.
For example, the filter map(f) in the standard library is defined by
def map(f): [.[] | f].
We can use it via
[1, 2, 3] | map( .+1) ⟼ [2, 3, 4], which is equivalent to
[1, 2, 3] | [.[] | .+1] ⟼ [2, 3, 4].
We can use variables as arguments of definitions.
For example, we can write
def singleton($x): [$x] as a short form of
def singleton(x): x as $x | [$x].
Note that this is not the same as
def singleton(x): [x]:
- def singleton($x): [$x]; singleton(1, 2, 3) ⟼ [1] [2] [3]
- def singleton( x): [ x]; singleton(1, 2, 3) ⟼ [1, 2, 3]
Arguments of definitions may capture variables, labels, and other definitions. For example:
def iter_map(f): .[] | f;
def double: .+.;
3 as $threshold |
label $lbl |
[1, 2, 3] | iter_map(if . < $threshold then double else break $lbl end) ⟼
2 4
Here, the argument to iter_map captures
the definition double,
the variable $threshold, and
the label $lbl.
We can always transform a
definition with variable arguments to an equivalent
definition without variable arguments.
For that, suppose that $x is the rightmost variable argument in a definition
def x(...; $x; ...): g.
We can replace it by
def x(...; x; ...): x as $x | g.
For example, consider the definition
def f($x1; x2; $x3; x4): g.
This is equivalent to
def f(x1; x2; x3; x4): x1 as $x1 | x3 as $x3 | g.
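One consequence of this transformation is that a definition with several variable arguments runs its body once for every combination of their outputs, with the leftmost argument varying slowest. The name pair is hypothetical:

```jq
# pair is a hypothetical definition; by the rule above, it is equivalent to
# def pair(a; b): a as $a | b as $b | [$a, $b];
def pair($a; $b): [$a, $b];
pair(1, 2; 3, 4)   # ⟼ [1, 3] [1, 4] [2, 3] [2, 4]
```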
Standard library
This section lists all named filters that are available by default in any jq module. These filters are also known as “builtins”.
Basic
error, error(f)
The filter error(f) throws an error for every output of f,
with the output as payload.
The filter error is equivalent to error(.).
It is possible to use error(f) also on the left-hand side of assignments.
Examples:
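A short sketch. As the second line shows, the error payload does not have to be a string:

```jq
(try error("oops") catch .),      # ⟼ "oops"
(try ({a: 1} | error) catch .a)   # ⟼ 1: error is error(.), so the payload is {"a": 1}
```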
length
The output of the filter length depends on its input type:
- null: 0, e.g. null | length ⟼ 0
- boolean: error, e.g. true | try length catch "fail" ⟼ "fail"
- number: the absolute value of the number, e.g. -1, 1 | length ⟼ 1 1
- text string: the number of characters, e.g. "ゼノギアス" | length ⟼ 5
- byte string: the number of bytes, e.g. "ゼノギアス" | tobytes | length ⟼ 15
- array: the number of values, e.g. [1, [2, 3], 4] | length ⟼ 3
- object: the number of key-value pairs, e.g. {a: 0, b: 1} | length ⟼ 2
keys, keys_unsorted
The filter keys_unsorted yields an array that contains
all keys if the input is an object or
all indices if the input is an array.
The filter keys is equivalent to keys_unsorted | sort.
For example:
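A small sketch. This assumes that objects keep their keys in insertion order, which determines the output of keys_unsorted:

```jq
({b: 1, a: 2} | keys_unsorted, keys),   # ⟼ ["b", "a"] ["a", "b"]
([5, 6] | keys)                         # ⟼ [0, 1]
```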
to_entries, from_entries, with_entries(f)
The filter to_entries takes as input an array or an object.
It converts them to an array of objects of the shape
{key: k, value: v}, such that .[k] on the original input yields v.
For example:
- [ 1, 2] | to_entries ⟼ [{"key": 0 , "value": 1}, {"key": 1 , "value": 2}]
- {a: 1, b: 2} | to_entries ⟼ [{"key": "a", "value": 1}, {"key": "b", "value": 2}]
The filter from_entries constructs an object from
an array of entries as given by to_entries.
For example, {a: 1, b: 2} | to_entries | from_entries ⟼ {"a": 1, "b": 2}.
The filter with_entries(f) is equivalent to to_entries | map(f) | from_entries.
For example, {"a": 1, "b": 2} | with_entries(.key |= ascii_upcase) ⟼ {"A": 1, "B": 2}.
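Because f runs on each entry, it can also drop entries. For example, here it keeps only the entries whose value exceeds 1:

```jq
{a: 1, b: 2, c: 3} | with_entries(select(.value > 1))   # ⟼ {"b": 2, "c": 3}
```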
type
The filter type returns the type of its input value as string. For example:
- null | type ⟼ "null"
- false | type ⟼ "boolean"
- 0 | type ⟼ "number"
- "foo" | type ⟼ "string"
- [1] | type ⟼ "array"
- {} | type ⟼ "object"
Note that text strings and byte strings both have the type "string".
The type filter can be relatively slow to run;
if you use it for simple comparisons such as
type == "string", then you can also use filters like
isstring.
Stream consumers
first, first(f), last, last(f)
The filter first(f) yields the first output of f if there is one, else nothing.
For example,
first(1, 2, 3) ⟼ 1 and
first(empty) ⟼ (no output).
This filter stops evaluating f after the first output, meaning that
it yields an output even if f yields infinitely many outputs.
For example,
first(repeat(0)) ⟼ 0 and
first(1, def f: f; f) ⟼ 1.
Similarly, last(f) yields the last output of f if there is one, else nothing.
If f yields an error, then the first error of f is yielded.
For example,
last(1, 2, 3) ⟼ 3,
last(empty) ⟼ (no output), and
try last(1, error("fail"), 3) catch . ⟼ "fail".
The filters first and last are short forms for
first(.[]) and last(.[]), respectively.
You can use them to retrieve the first/last element of an array, such as
[1, 2, 3] | first, last ⟼ 1 3.
limit($n; f)
The filter limit($n; f) yields the first $n outputs of f.
If $n <= 0, it yields no outputs.
For example:
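A few cases, with outputs following from the description above:

```jq
[limit(3; range(10))],   # ⟼ [0, 1, 2]
[limit(5; 1, 2)],        # ⟼ [1, 2]: f may yield fewer than $n outputs
[limit(0; 1, 2)]         # ⟼ []
```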
When $n < 0, jq yields an error instead.
skip($n; f)
The filter skip($n; f) yields all outputs after the first $n outputs of f.
If $n <= 0, it yields all outputs of f.
For example:
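A short sketch:

```jq
[skip(2; 1, 2, 3, 4)],   # ⟼ [3, 4]
[skip(0; 1, 2)]          # ⟼ [1, 2]
```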
nth($i), nth($i; f)
The filter nth($i; f) yields the $i-th output of f, counting from 0.
If f yields $i or fewer outputs, then this filter yields no output.
For example:
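A short sketch; note that counting starts at 0:

```jq
nth(1; 10, 20, 30),   # ⟼ 20
[nth(5; 1, 2)]        # ⟼ []
```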
The filter nth($i) is a short form for .[$i]; e.g.
[1, 2, 3] | nth(0) ⟼ 1.
isempty(f)
The filter isempty(f) yields true if f yields no outputs, else false.
If the first output of f is an error, isempty(f) yields that error instead.
For example:
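A sketch of the cases described above. Because only the first output matters, isempty also terminates on infinite generators:

```jq
isempty(empty),                    # ⟼ true
isempty(1, 2),                     # ⟼ false
isempty(repeat(0)),                # ⟼ false: only the first output of f is needed
(try isempty(error("x")) catch .)  # ⟼ "x"
```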
any, any(p), any(f; p)
The filter any(f; p) yields true if
any output of f | p has the boolean value true, else false.
For example:
The filters any(p) and any are short forms of
any(.[]; p) and any(.), respectively.
For example:
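A few cases for all three forms:

```jq
any(1, 2, 3; . > 2),        # ⟼ true
([1, 2] | any(. > 5)),      # ⟼ false
([false, null, 1] | any),   # ⟼ true: 1 has the boolean value true
([] | any)                  # ⟼ false
```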
all, all(p), all(f; p)
The filter all(f; p) yields true if
all outputs of f | p have the boolean value true, else false.
For example:
The filters all(p) and all are defined analogously to any(p) and any.
add, add(f)
The filter add(f) yields the sum of all elements yielded by f, or
null if f yields no outputs.
For example:
The filter add is a short form of add(.[]).
You can use it to add all values of an array or object:
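A short sketch. The outputs follow from the rules for + given in the section on addition:

```jq
add(1, 2, 3),            # ⟼ 6
([[1], [2, 3]] | add),   # ⟼ [1, 2, 3]
({a: 1, b: 2} | add),    # ⟼ 3
([] | add)               # ⟼ null
```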
Stream generators
empty
The filter empty yields no output.
This filter is defined as ([][] as $x | .).
While a simpler filter like [][] also yields no outputs,
this rather contrived-looking definition guarantees that
empty can be used on the left-hand side of assignments.
This comes into play when you use select(p),
which uses empty under the hood.
range($upto), range($from; $upto), range($from; $upto; $step)
The filter range($from; $upto; $step)
yields $from, $from + $step, $from + $step + $step, and so on,
stopping before the first value that reaches or passes $upto.
For example:
- range(1; 9; 2) ⟼ 1 3 5 7
- range(1; 10; 2) ⟼ 1 3 5 7 9
- range(9; 1; -2) ⟼ 9 7 5 3
- range(9; 0; -2) ⟼ 9 7 5 3 1
The filter range($from; $upto) is a short form of range($from; $upto; 1) and
the filter range($upto) is a short form of range(0; $upto).
For example:
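A few cases of the short forms:

```jq
[range(3)],      # ⟼ [0, 1, 2]
[range(2; 5)],   # ⟼ [2, 3, 4]
[range(5; 2)]    # ⟼ []: with the default step 1, 5 has already passed 2
```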
In jq, range/1 and range/2 are more restrictive versions of range/3
that prohibit non-numeric arguments.
The filter is equivalent to:
def range($from; $to; $by): $from |
if $by > 0 then while(. < $to; . + $by)
elif $by < 0 then while(. > $to; . + $by)
else while(. != $to; . + $by)
end;
range(1; 10; 2) ⟼
1 3 5 7 9
For that reason, we can also use it with other values than numbers:
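For instance, assuming the definition above, strings and arrays can serve as bounds and step. This works because they compare greater than 0 and support +:

```jq
[range("a"; "aaaa"; "a")],   # ⟼ ["a", "aa", "aaa"]
[range([]; [1, 1]; [1])]     # ⟼ [[], [1]]
```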
This makes it quite easy to accidentally create an infinite sequence, e.g. by
range(""; "b"; "a").
recurse, recurse(f)
The filter recurse(f) is equivalent to ., (f | recurse(f)).
It first outputs its input, then runs f and recurse(f) on its outputs.
This is useful to create infinite sequences.
You can create a finite sequence by having f return empty, e.g. via select.
For example:
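Here, squaring stops once the value is at least 100, because f then returns empty:

```jq
2 | [recurse(if . < 100 then . * . else empty end)]   # ⟼ [2, 4, 16, 256]
```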
The filter recurse(f; p) is equivalent to recurse(f | select(p)).
That means that it recurses only on
outputs of f for which p yields a true output.
For example:
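The same squaring sequence with the condition moved into p. Unlike the previous example, 256 is not output, because it fails p:

```jq
2 | [recurse(. * .; . < 100)]   # ⟼ [2, 4, 16]
```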
The filter recurse is a short form for recurse(.[]?).
It returns all values recursively contained in the input, e.g.
[1, [2], {a: 3}] | recurse ⟼ [1, [2], {"a": 3}] 1 [2] 2 {"a":3} 3.
We can write a Fibonacci generator as follows:
def fib: def next: [.[1], add]; [0, 1] | recurse(next)[1];
limit(5; fib) ⟼ 1 1 2 3 5
The next filter takes an array with
the two previous values (.[0] and .[1]), and
yields a new array containing
the more recent of these two values (.[1]) as well as
their sum (add).
repeat(f)
The filter repeat(f) runs f and yields its outputs over and over again.
For example,
2 | limit(7; repeat(1, ., 3)) ⟼ 1 2 3 1 2 3 1.
This filter does not cache the outputs of f.
while(p; f), until(p; f)
The filter while(p; f) yields its input if p returns true for it,
and then runs while(p; f) on each output of f; if p returns false, it yields nothing.
The filter until(p; f) applies f to its input until p returns true,
at which point the filter returns its input.
Examples:
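Doubling a number with both filters shows the difference: while yields every value that satisfies p, whereas until yields only the first value that satisfies p:

```jq
[1 | while(. < 100; . * 2)],   # ⟼ [1, 2, 4, 8, 16, 32, 64]
(1 | until(. >= 100; . * 2))   # ⟼ 128
```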
Booleans
The filters in this section classify their inputs or output them selectively.
toboolean
The filter toboolean is for booleans what tonumber is for numbers.
For example:
select(p)
The filter select(p) yields its input for each true output of p.
For example,
(0, 1, -1, 2, -2) | select(. >= 0) ⟼ 0 1 2.
nulls, booleans, numbers, strings, arrays, objects
Any of these filters yields its input if it is of the given type, else nothing. For example:
- null, true, 0, "Hi!", [1, 2], {a: 1} | nulls ⟼ null
- null, true, 0, "Hi!", [1, 2], {a: 1} | booleans ⟼ true
- null, true, 0, "Hi!", [1, 2], {a: 1} | numbers ⟼ 0
- null, true, 0, "Hi!", [1, 2], {a: 1} | strings ⟼ "Hi!"
- null, true, 0, "Hi!", [1, 2], {a: 1} | arrays ⟼ [1, 2]
- null, true, 0, "Hi!", [1, 2], {a: 1} | objects ⟼ {"a": 1}
These filters are equivalent to
select(. == null),
select(isboolean),
…,
select(isobject).
isboolean, isnumber, isstring, isarray, isobject
For every filter in this section, like isboolean, …, isobject, there is
a corresponding filter in the previous section, like booleans, …, objects.
Any of these filters yields true if
its corresponding filter in the previous section yields some output, else false.
For example:
- null | isboolean ⟼ false, because null | booleans ⟼ (no output).
- true | isboolean ⟼ true, because true | booleans ⟼ true.
jq does not implement these filters.
normals, finites
These filters yield their input if
isnormal or isfinite, respectively, yields true for it, else nothing.
values, iterables, scalars
The filter values yields its input if it is not null, else nothing.
If a value is either an array or an object,
it is said to be iterable; otherwise,
it is said to be scalar.
(The iteration filter .[] succeeds on any iterable value,
whereas it fails on any scalar.)
The filters iterables and scalars yield their input if
it is iterable or scalar, respectively, else nothing.
Examples:
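A sketch that applies all three filters to the same mixed array:

```jq
[null, 1, "a", [2], {}]
| map(values), map(iterables), map(scalars)
# ⟼ [1, "a", [2], {}] [[2], {}] [null, 1, "a"]
```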
isnan, isinfinite, isfinite, isnormal
The filter isnan yields true if
its input is NaN, else false.
Note that it is not equivalent to . == nan, because
nan is not equal to itself; see equality.
The filter isinfinite yields true if
its input is either Infinity or -Infinity, else false.
The filter isfinite yields true if
its input is a number that is not infinite, else false.
The filter isnormal yields true if
its input is a number that is neither 0, NaN, nor infinite.
Examples:
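A short sketch. The first line illustrates why isnan is needed:

```jq
(nan | isnan, . == nan),               # ⟼ true false: NaN is not equal to itself
([infinite, 1.5] | map(isinfinite)),   # ⟼ [true, false]
([1, infinite] | map(isfinite)),       # ⟼ [true, false]
([0, 1, infinite] | map(isnormal))     # ⟼ [false, true, false]
```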
not
The filter not converts its input to its boolean value and returns its negation.
For example:
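Only null and false have the boolean value false, so 0 and "" are negated to false:

```jq
null, false, 0, "" | not   # ⟼ true true false false
```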
Membership
contains($x), inside($x)
The filter contains($x) yields true
if any of the following conditions holds, else false.
- The input is a string and $x is a substring of it.
- The input is an array, $x is an array, and for every value v in $x, there is some value in the input that contains(v).
- The input is an object, $x is an object, and for every key-value pair {k: v} in $x, there is a value for the key k in the input that contains(v).
- The input is null, a boolean, or a number, and $x is equal to the input.
Examples:
- "Hello, world!" | contains("world") ⟼ true
- [1, 2, 3] | contains([1, 3]) ⟼ true
- [[1, 2], 3] | contains([3, [1]]) ⟼ true
- {a: 1, b: 2} | contains({a: 1}) ⟼ true
- {a: [1, 2]} | contains({a: [1]}) ⟼ true
- 0 | contains(0) ⟼ true
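Because contains recurses into arrays and objects, strings nested inside them are also compared by substring, which can be surprising:

```jq
(["foobar", "baz"] | contains(["bar"])),   # ⟼ true: "bar" is a substring of "foobar"
({a: "xyz"} | contains({a: "y"}))          # ⟼ true
```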
The filter inside($x) is a flipped version of contains.
For example,
"world" | inside("Hello, world") ⟼ true.
The filter inside($x) is equivalent to . as $i | $x | contains($i).
indices($x)
The filter indices($x) yields the following:
- If the input and $x are either both strings or both arrays, then it yields the indices i for which .[i:][:$x | length] == $x; e.g. "Alice, Bob, and Carol" | indices(", ") ⟼ [5, 10] and [0, 1, 2, 3, 1, 2, 3] | indices([1, 2]) ⟼ [1, 4].
- If the input is an array and $x is not an array, then it yields the indices i for which .[i] == $x; e.g. [0, 1, 2, 3, 1, 2, 3] | indices(1) ⟼ [1, 4].
- Otherwise, it yields an error.
This means that [[1, 2], 3] | indices([1, 2]) ⟼ [], because
the input array has neither 1 nor 2, just [1, 2] and 3.
index($x), rindex($x)
The filters index($x) and rindex($x) are shorthand for
indices($x) | first and
indices($x) | last, respectively.
For example:
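A short sketch for strings and arrays:

```jq
("a,b, cd, efg" | index(", "), rindex(", ")),   # ⟼ 3 7
([1, 2, 1] | index(1), rindex(1))               # ⟼ 0 2
```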
has($k), in($x)
The filter has($k) yields true if
$k is among the keys of the input, else false.
For example:
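For objects, the keys are strings; for arrays, they are indices:

```jq
({a: 1} | has("a"), has("b")),   # ⟼ true false
([1, 2] | has(0), has(2))        # ⟼ true false
```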
The filter in($x) is a flipped version of has, just like
inside is a flipped version of contains.
For example,
"a" | in({a: 1, b: 2}) ⟼ true.
Updates
map(f), map_values(f)
The filter map(f) obtains all values of the input (via .[]),
applies f to the values, and collects all results into an array.
For example:
- [1, 2, 3] | map(., .*2) ⟼ [1, 2, 2, 4, 3, 6].
- {a: 1, b: 2} | map(., .*2) ⟼ [1, 2, 2, 4].
- [1, 2, 3, 4] | map(select(. % 2 == 0)) ⟼ [2, 4].
The filter map_values(f) has the same effect as map(f)
when the input is an array, but when the input is an object,
map_values(f) also outputs an object.
For example:
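The same update applied to an object with both filters:

```jq
({a: 1, b: 2} | map_values(. * 10)),   # ⟼ {"a": 10, "b": 20}
({a: 1, b: 2} | map(. * 10))           # ⟼ [10, 20]
```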
The filter map(f) is equivalent to [.[] | f] and
the filter map_values(f) is equivalent to .[] |= f.
walk(f)
The filter walk(f) recursively updates its input with f.
For example:
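A sketch that scales every number, however deeply it is nested. Arrays and objects are passed through unchanged by f:

```jq
[1, [2, {a: 3}]] | walk(if type == "number" then . * 10 else . end)
# ⟼ [10, [20, {"a": 30}]]
```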
In jaq, walk(f) is defined as .. |= f, whereas
in jq, a definition similar to the following is used:
def walk(f): def rec: (.[]? |= rec) | f; rec;
This is a more efficient version of:
def walk(f): (.[]? |= walk(f)) | f;
We can show that in jaq, .. |= f and jq’s definition of walk(f) are equivalent.
For this, we will use equivalences about pathless updates.
First, let us recall that .. |= f is equivalent to the following in jaq:
def rec_up: (.[]? | rec_up), .; rec_up |= f
We can thus unfold .. |= f:
.. |= f === (unfolding .. |= f)
rec_up |= f === (unfolding rec_up)
((.[]? | rec_up), .) |= f === (because (l, r) |= f === (l |= f) | (r |= f))
((.[]? | rec_up) |= f) | (. |= f) === (because . |= f === f)
((.[]? | rec_up) |= f) | f === (because (l | r) |= f === l |= (r |= f))
(.[]? |= (rec_up |= f)) | f === (because rec_up |= f === .. |= f)
(.[]? |= (.. |= f)) | f
We can see thus that
.. |= f is equivalent to
(.[]? |= (.. |= f)) | f.
In the same sense,
walk(f) is equivalent to
(.[]? |= walk(f)) | f.
We can conclude that .. |= f is equivalent to walk(f).
Note, however, that this equivalence does not hold in jq,
because jq’s updates work differently than jaq’s.
The difference shows in particular when f returns multiple values.
del(f)
The filter del(f) deletes values at the locations given by f.
It is equivalent to f |= empty.
For example:
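Two simple deletions:

```jq
({a: 1, b: 2} | del(.a)),   # ⟼ {"b": 2}
([1, 2, 3] | del(.[0]))     # ⟼ [2, 3]
```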
Paths
path(f)
The filter path(f) records for each output of f its position in the input,
and yields that position as a path.
A path is an array that may contain indices or “slice objects”.
The latter must contain a "start" and/or an "end" key with an integer value.
For example:
- [{a: 1}, {a: 2}] | path(.[].a) ⟼ [0, "a"] [1, "a"]
- [1, 2, 3] | path(.[1:][:-1]) ⟼ [{"start": 1}, {"end": -1}]
- [1, 2, 3] | path(.[1: -1]) ⟼ [{"start": 1, "end": -1}]
If f returns values that do not point to the input, then path(f) yields an error.
The filter path(f) is at the heart of how
jq executes assignments such as p |= u,
whereas jaq pursues a different, “pathless” approach.
See the section on path-based updates for details on how
path(f) is calculated.
paths, paths(p)
The filter paths yields the paths to all descendant values of the input.
It is equivalent to skip(1; path(..)).
The filter paths(p) yields the paths to all descendant values of the input
for which p yields true.
Examples:
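A short sketch. The empty path to the input itself is not part of the output:

```jq
({a: [1]} | [paths]),                    # ⟼ [["a"], ["a", 0]]
({a: [1]} | [paths(type == "number")])   # ⟼ [["a", 0]]
```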
getpath($path)
The filter getpath($path) is the inverse filter for path(f).
If path(f) yields no error, then
getpath(path(f)) yields the same outputs as f.
For example:
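The second line illustrates the inverse relationship with path(f):

```jq
({a: {b: 1}} | getpath(["a", "b"])),   # ⟼ 1
({a: {b: 1}} | getpath(path(.a.b)))    # ⟼ 1
```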
setpath($path; $v)
The filter setpath($path; $v) sets the value at $path to $v.
It is equivalent to getpath($path) = $v.
For example:
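A short sketch. As with assignments, missing intermediate objects are created:

```jq
({a: 1} | setpath(["b", "c"]; 2)),   # ⟼ {"a": 1, "b": {"c": 2}}
([0, 1] | setpath([1]; "x"))         # ⟼ [0, "x"]
```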
delpaths($paths)
The filter delpaths($paths) takes an array of paths and
deletes all corresponding values in the order given by the array.
For example:
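Deleting two keys of an object at once:

```jq
{a: 1, b: 2, c: 3} | delpaths([["a"], ["c"]])   # ⟼ {"b": 2}
```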
In jq,
the $paths are interpreted relative to the original input value, whereas
in jaq, they are interpreted relative to the current value.
For example,
[1, 2, 3] | delpaths([[0], [0]]) ⟼ [3] in jaq, because
it first deletes the 0-th element 1 (yielding [2, 3]),
then it deletes the 0-th element 2 (yielding [3]).
Here, jq yields [2, 3], because the 0-th element always
refers to the 0-th element of the original input, which is 1.
To use delpaths in an interoperable fashion, use $paths such that:
- Paths to descendants come before paths to their ancestors.
- Paths to array elements to the right come before paths to elements to the left.
For example, .. returns
ancestors before descendants and
array elements to the left before elements to the right.
To use the output of .. in delpaths, it suffices to
reverse the order of its outputs:
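For instance, the following deletes all numbers from a nested array. The reversed paths are [1, 1], [1, 0], [0], so later array elements and descendants are deleted first. Under the two rules above, jq and jaq should therefore agree:

```jq
[1, [2, 3], "x"] | delpaths([path(.. | numbers)] | reverse)   # ⟼ [[], "x"]
```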
pick(f)
The filter pick(f) constructs an object that
contains only those parts of the input that f returns.
For example:
- {a: {b: 1, c: 2}, d: 3} | pick(. ) ⟼ {"a": {"b": 1, "c": 2}, "d": 3}
- {a: {b: 1, c: 2}, d: 3} | pick(.a.c ) ⟼ {"a": { "c": 2} }
- {a: {b: 1, c: 2}, d: 3} | pick(.a.c, .d) ⟼ {"a": { "c": 2}, "d": 3}
In jq, pick(f) also supports paths to arrays; for example:
- [1, 2, 3] | pick(.[0 ]) yields [1]
- [1, 2, 3] | pick(.[1 ]) yields [null, 2]
- [1, 2, 3] | pick(.[1:]) yields [2, 3]
While implementing this functionality in jaq, I found many corner cases
that would have made the proper documentation of this filter very complex.
I also found a few surprising behaviours in jq, e.g. that
[1, 2, 3] | pick(.[-1]) yields an error.
In the end, I decided to support only the simpler and well-understandable
subset of paths to objects.
We have the property that pick(f, g) is equivalent to pick(f) * pick(g).
Numbers
tonumber
The filter tonumber takes as input either a number or a string.
If the input is a number, it is returned unchanged;
if the input is a string, it is parsed to a number, failing if this does not succeed.
For example:
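A short sketch, including a string that cannot be parsed:

```jq
("42", "3.5", 7 | tonumber),                    # ⟼ 42 3.5 7
(try ("abc" | tonumber) catch "not a number")   # ⟼ "not a number"
```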
infinite, nan
The filters infinite and nan yield the floating-point numbers
Infinity and NaN:
abs
The filter abs
yields the negation of the input if the input is smaller than 0, else it
yields the input.
Note that due to this definition, strings, arrays, and objects
are also returned unchanged, because they are larger than 0;
see ordering.
Examples:
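Following the definition above, only values smaller than 0 are changed:

```jq
-3, 2.5, "abc" | abs   # ⟼ 3 2.5 "abc"
```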
floor, round, ceil
The filters floor, round and ceil round a number
to its closest smaller integer,
to its closest integer, and
to its closest larger integer, respectively.
For example:
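The three filters applied to a positive and a negative half. Here, round rounds halves away from zero:

```jq
1.5, -1.5 | [floor, round, ceil]   # ⟼ [1, 2, 2] [-2, -2, -1]
```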
Math
jaq implements many mathematical functions via libm.
If not specified otherwise, these filters take and return floating-point numbers.
Zero-argument filters:
- acos
- acosh
- asin
- asinh
- atan
- atanh
- cbrt
- cos
- cosh
- erf
- erfc
- exp
- exp10
- exp2
- expm1
- fabs
- frexp, which returns pairs of (float, integer).
- gamma
- ilogb, which returns integers.
- j0
- j1
- lgamma
- log
- log10
- log1p
- log2
- logb
- modf, which returns pairs of (float, float).
- nearbyint
- pow10
- rint
- significand
- sin
- sinh
- sqrt
- tan
- tanh
- tgamma
- trunc
- y0
- y1
Two-argument filters that ignore .:
- atan2
- copysign
- drem
- fdim
- fmax
- fmin
- fmod
- hypot
- jn, which takes an integer as first argument.
- ldexp, which takes an integer as second argument.
- nextafter
- nexttoward
- pow
- remainder
- scalb
- scalbln, which takes an integer as second argument.
- yn, which takes an integer as first argument.
Three-argument filters that ignore .:
- fma
Examples:
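A few calls, assuming the usual libm semantics. The floating-point results are printed by jaq with a fractional part:

```jq
(16 | sqrt),   # ⟼ 4.0
pow(2; 10),    # ⟼ 1024.0: two-argument filters ignore their input
(8 | frexp)    # ⟼ [0.5, 4], because 8 = 0.5 * 2^4
```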
Arrays
sort, sort_by(f)
The filter sort takes an array and sorts it.
For example:
[true, 1, "abc", [1], {"a": 1}, null, false, 0, "ABC", [], {}] | sort ⟼
[null, false, true, 0, 1, "ABC", "abc", [], [1], {}, {"a": 1}]
The filter sort_by(f) evaluates
the filter f for each value in the input array, and
sorts the values by the output of f.
For example:
- [0, 1, 2, 3] | sort_by(. % 2) ⟼ [0, 2, 1, 3]
- [{a: 1, b: 2}, {a: 0, b: 3}] | sort_by(. ) ⟼ [{"a": 0, "b": 3}, {"a": 1, "b": 2}]
- [{a: 1, b: 2}, {a: 0, b: 3}] | sort_by(.a) ⟼ [{"a": 0, "b": 3}, {"a": 1, "b": 2}]
- [{a: 1, b: 2}, {a: 0, b: 3}] | sort_by(.b) ⟼ [{"a": 1, "b": 2}, {"a": 0, "b": 3}]
We have the following correspondences:
- sort is equivalent to sort_by(.).
- sort_by(f) is equivalent to sort_by([f]).
group_by(f)
The filter group_by(f) sorts its input array by f, then
groups all values for which f produced identical outputs into the same array.
For example:
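Grouping numbers by parity. The group for . % 2 == 0 comes first, because groups are ordered by the output of f:

```jq
[1, 2, 3, 4, 5] | group_by(. % 2)   # ⟼ [[2, 4], [1, 3, 5]]
```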
unique, unique_by(f)
The filter unique_by(f) sorts its input array by f.
If f produces the same outputs for multiple values in the array,
only the first is kept.
For example:
- ["foo", "", "bar", "quux", "baz"] | unique_by(length) ⟼ ["", "foo", "quux"]
- [1, 2, 3, 4] | unique_by(. % 2) ⟼ [2, 1]
The filter unique is equivalent to unique_by(.).
It sorts the input array and removes any duplicates; e.g.
[3, 2, 1, 3, 4] | unique ⟼ [1, 2, 3, 4].
min, max, min_by(f), max_by(f)
The filters min and max yield the smallest and largest element of an array, respectively.
For example:
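A short sketch. Values of different types are compared according to the ordering of values:

```jq
([3, 1, 2] | min, max),       # ⟼ 1 3
([1, "a", null] | min, max)   # ⟼ null "a"
```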
The filters min_by(f) and max_by(f) evaluate
the filter f for each value in the input array, and
yield the value for which f produces the smallest or largest output,
respectively.
For example:
- ["abc", [1, 2], {"a": 1}] | min_by(length) ⟼ {"a": 1}
- ["abc", [1, 2], {"a": 1}] | max_by(length) ⟼ "abc"
You can yield multiple values in f to break ties such as:
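Here, .a ties for the first two objects, so .b decides between them:

```jq
[{a: 1, b: 2}, {a: 1, b: 1}, {a: 2, b: 0}] | min_by(.a, .b)   # ⟼ {"a": 1, "b": 1}
```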
We have the following correspondences:
- min and max are equivalent to min_by(.) and max_by(.), respectively.
- min_by(f) and max_by(f) are equivalent to min_by([f]) and max_by([f]), respectively.
reverse
The filter reverse takes an array and reverses it.
For example, [1, 2, 3] | reverse ⟼ [3, 2, 1].
transpose
The filter transpose takes an array of arrays and yields its transposition.
Examples:
- [[1 , 2, 3], [4, 5, 6]] | transpose ⟼ [[1, 4], [2, 5], [3, 6]]
- [[1], [2, 3], [4, 5, 6]] | transpose ⟼ [[1, 2, 4], [null, 3, 5], [null, null, 6]]
More precisely, transpose yields an array $t that contains
(map(length) | max) arrays, each of which has the same length as the input array, such that
$t[x][y] == .[y][x] for every x and y.
We can verify this:
def verify: transpose as $t |
($t | length) == (map(length) | max),
(range($t | length) as $x |
($t[$x] | length) == length,
(range(length) as $y |
$t[$x][$y] == .[$y][$x]
)
);
[[1, 2, 3], [4, 5, 6]],
[[1], [2, 3], [4, 5, 6]] | all(verify; .) ⟼ true true
flatten, flatten($depth)
The filter flatten flattens input arrays, and
the filter flatten($depth) flattens input arrays up to a certain depth.
For example:
- [1, [2, [3]], {a: [1, [2]]}] | flatten ⟼ [1, 2, 3 , {"a": [1, [2]]}]
- [1, [2, [3]], {a: [1, [2]]}] | flatten(0) ⟼ [1, [2, [3]], {"a": [1, [2]]}]
- [1, [2, [3]], {a: [1, [2]]}] | flatten(1) ⟼ [1, 2, [3] , {"a": [1, [2]]}]
- [1, [2, [3]], {a: [1, [2]]}] | flatten(2) ⟼ [1, 2, 3 , {"a": [1, [2]]}]
- null, true, 0, "Hi" | flatten ⟼ [null] [true] [0] ["Hi"]
Note that flatten does not impact arrays that are descendants of an object.
We can define flatten/0 and flatten/1 as:
def flattens : if isarray then .[] | flattens end;
def flattens($d): if isarray and $d >= 0 then .[] | flattens($d-1) end;
def flatten : [flattens ];
def flatten($d): [flattens($d)];
[1, [2, [3]], {"a": [1, [2]]}] | flatten, flatten(0), flatten(1), flatten(2) ⟼
[1, 2, 3 , {"a": [1, [2]]}]
[1, [2, [3]], {"a": [1, [2]]}]
[1, 2, [3] , {"a": [1, [2]]}]
[1, 2, 3 , {"a": [1, [2]]}]
bsearch($x)
The filter bsearch($x) takes a sorted array and
performs a binary search for $x in the array.
If the array contains $x, then
the filter yields a non-negative $i such that .[$i] == $x; otherwise,
the filter yields a negative $i such that inserting $x at the index -$i-1
in the array would preserve its ordering.
Examples:
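A short sketch. For 4, the insertion index would be 2, which is encoded as -2-1 = -3:

```jq
[1, 3, 5] | bsearch(3), bsearch(4), bsearch(0)   # ⟼ 1 -3 -1
```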
If the input array is not sorted, then the output of this filter is meaningless.
Text strings
Unless stated otherwise, all filters in this section take a text string as input, and fail if the input is of any other type.
tostring
The filter tostring converts its input to a string.
Its output depends on the type of its input:
utf8bytelength
The filter utf8bytelength yields the number of bytes of the input string.
It is equivalent to tobytes | length, but different from length,
which counts the number of characters.
For example,
"ゼノギアス" | length, utf8bytelength, (tobytes | length) ⟼ 5 15 15.
startswith($s), endswith($s)
The filter startswith($s) yields
true if the input string starts with the string $s, else false.
The filter endswith($s) works analogously for the end of the input string.
For example:
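Both checks on the same input:

```jq
"foobar" | startswith("foo"), endswith("foo")   # ⟼ true false
```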
trim, ltrim, rtrim
The filters ltrim and rtrim remove from the input string all
leading and trailing whitespace, respectively.
Here, whitespace corresponds to the White_Space Unicode property.
The filter trim is equivalent to ltrim | rtrim.
For example:
Note that there are a few quite unusual whitespace characters in this string.
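A simpler case with plain ASCII spaces:

```jq
"  hi  " | trim, ltrim, rtrim   # ⟼ "hi" "hi  " "  hi"
```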
ltrimstr($s), rtrimstr($s)
The filters ltrimstr($s) and rtrimstr($s) remove a single occurrence of
$s from the start or the end of the string, respectively.
If there is no such occurrence, the original string is returned.
For example:
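The two filters can be approximated by the following definitions. This is a rough sketch that assumes $s is a text string and that string slices count characters, just like length does. The *_sketch names are hypothetical and not part of jaq.

```jq
# Hypothetical definitions (not jaq's implementation), for text-string arguments only.
# Remove $s once from the start, if the input starts with it.
def ltrimstr_sketch($s):
  if startswith($s) then .[($s | length):] else . end;
# Remove $s once from the end, if the input ends with it.
def rtrimstr_sketch($s):
  if endswith($s) then .[:length - ($s | length)] else . end;
```

For example, "foofoo" | ltrimstr_sketch("foo") ⟼ "foo", because only a single occurrence is removed.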
explode, implode
The filter explode yields an array containing
a positive number for each valid Unicode code point of the input string and
a negative number for each byte of each invalid Unicode code unit.
For example:
"Dear ☀️" + (255 | tobytes | tostring) | explode ⟼
[68,101,97,114,32,9728,65039,-255]
Here, we can see that "☀️" has turned into two code points, namely
9728 and 65039, whereas the invalid FF byte (= 255) has become -255.
The inverse filter of explode is implode:
[68,101,97,114,32,9728,65039, -255] | implode[:-1] ⟼
"Dear ☀️"
(I omitted the FF byte at the end, because it is hard to save in a text editor.)
jq does not permit invalid code units in text strings, so it
returns and accepts only natural numbers in explode and implode.
split($s)
This filter yields . / $s if its input . and $s are both strings, else it fails.
See the section on division for details.
Note that there is also split($re; $flags) that splits by a regex.
join($s)
The filter join($s) takes as input an array [x1, ..., xn] and yields
"" if the array is empty, otherwise
"\(x1)" + $s + ... + $s + "\(xn)".
That is, it concatenates the string representations of the array values interspersed with $s.
For example, to memorise the hierarchy of values in jq:
["null", "boolean", "number", "string", "array", "object"] | join(" < ") ⟼
"null < boolean < number < string < array < object".
Unlike jq, jaq does not map null values in the array to "",
nor does it reject array or object values in the array.
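The formula above can be written down almost literally as a reduction over the array. join_sketch is a hypothetical name, and this is a sketch rather than jaq's implementation. Because every element goes through string interpolation, null elements become "null", which matches the jaq behaviour described above.

```jq
# Hypothetical definition following "\(x1)" + $s + ... + $s + "\(xn)".
def join_sketch($s):
  if length == 0 then ""
  # start with the first element, then append $s and each further element
  else reduce .[1:][] as $x ("\(.[0])"; . + $s + "\($x)")
  end;
```

For example, [1, null, "a"] | join_sketch("-") ⟼ "1-null-a".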
ascii_downcase, ascii_upcase
The filters ascii_downcase and ascii_upcase convert all
ASCII letters in the input string to their lower/upper case variants, respectively.
For example:
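One way to picture these filters is to work on code points: only the ranges of ASCII letters are shifted, and everything else stays as it is. The definitions below are a hypothetical sketch and not jaq's implementation.

```jq
# Hypothetical definitions: A-Z are code points 65..90, a-z are 97..122,
# and the two ranges are 32 apart. All other code points are left unchanged.
def ascii_downcase_sketch:
  explode | map(if 65 <= . and . <= 90 then . + 32 else . end) | implode;
def ascii_upcase_sketch:
  explode | map(if 97 <= . and . <= 122 then . - 32 else . end) | implode;
```

For example, "Hello, Wörld!" | ascii_downcase_sketch ⟼ "hello, wörld!".
The non-ASCII letter ö is not affected.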
Text string formatting
The filters in this section can be prefixed to strings to influence string interpolation. However, these filters can also be used outside of string interpolation. For example:
- "1 + 2 * 3" | @uri "https://duckduckgo.com/?q=\(.)" ⟼ "https://duckduckgo.com/?q=1%20%2B%202%20%2A%203"
- "1 + 2 * 3" | @uri ⟼ "1%20%2B%202%20%2A%203"
If not indicated otherwise, all filters in this section
convert their input to a string with tostring and
yield a single string output.
As a result, byte strings are treated like equivalent text strings; e.g.
"Hello world!" | (tobytes | @base64) == @base64 ⟼ true.
Unlike in jq, you can define your own filters that start with @; for example,
def @xml: @html;
@text
The filter @text is equivalent to tostring.
@json
The filter @json is equivalent to tojson.
@html, @htmld
The filter @html escapes a string so that it can be embedded in an HTML document.
It replaces the following characters by their HTML entities:
| Text | < | > | & | ' | " |
|---|---|---|---|---|---|
| HTML | &lt; | &gt; | &amp; | &#39; | &quot; |
The filter @htmld reverses the effect of @html.
For example:
"\"1 < 2 & 2 > 1\", that's what he said." | @html | ., @htmld ⟼
""1 < 2 & 2 > 1", that's what he said."
"\"1 < 2 & 2 > 1\", that's what he said."
jq does not support @htmld.
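The table above can be reproduced with plain string splitting. The sketch below uses the hypothetical name html_sketch and is not how jaq implements @html.

```jq
# Hypothetical definition reproducing the @html table with split/join (no regexes).
def html_sketch:
  tostring
  | split("&") | join("&amp;")   # must come first: later replacements insert "&"
  | split("<") | join("&lt;")
  | split(">") | join("&gt;")
  | split("'") | join("&#39;")
  | split("\"") | join("&quot;");
```

Replacing & first matters. Otherwise, the & of an already inserted entity such as &lt; would be escaped a second time.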
@base64, @base64d
The filter @base64 Base64-encodes its input.
The filter @base64d reverses this operation.
For example:
"Hello world!" | @base64 | ., @base64d ⟼
"SGVsbG8gd29ybGQh"
"Hello world!"
In jaq, @base64d only succeeds if its whole input is a valid Base64 string.
In contrast, jq also accepts strings where only a part is valid Base64,
thus potentially leading to hidden data corruption.
See #282 for a detailed discussion.
@uri, @urid
The filter @uri applies
percent-encoding
to encode arbitrary data in a uniform resource identifier (URI).
The filter @urid reverses this encoding.
For example:
The HTML version of this manual is created with jaq, and
@uri is used to encode the examples to create links to the jaq playground.
@sh
The filter @sh escapes data for constructing Unix command lines.
It performs different things depending on its input type:
- null, boolean, number: Convert to string via tostring.
- String: Replace occurrences of ' by '\'' and surround by '.
- Array of scalars: Apply @sh to elements and join the results with " " as separator.
- Fail for any other type of value.
Examples:
- null, true, 1 | @sh ⟼ "null" "true" "1"
- "It's green!" | @sh ⟼ "'It'\\''s green!'"
- ["jaq", "-n", "--arg", "slogan", "It's green!", "$slogan"] | @sh ⟼ "'jaq' '-n' '--arg' 'slogan' 'It'\\''s green!' '$slogan'"
When copy-pasting the output of the previous example into your terminal,
be sure to replace \\ by \ first.
That is, you should end up with 'jaq' '-n' '--arg' 'slogan' 'It'\''s green!' '$slogan',
which you can execute in good conscience.
(Unescaping can be avoided by running jaq with --raw-output,
which does not escape \ with \\ in the first place.)
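The three rules for @sh can be mirrored by the following sketch. It uses the hypothetical name sh_sketch, covers only text strings and JSON values, and is not jaq's implementation.

```jq
# Hypothetical definition of the @sh rules.
def sh_sketch:
  def quote:
    # strings: replace ' by '\'' and surround the result by '
    if type == "string" then "'" + (split("'") | join("'\\''")) + "'"
    elif type == "array" or type == "object" then error("cannot escape \(type)")
    else tostring end;                      # null, booleans, numbers
  # arrays of scalars: quote each element, separate with spaces
  if type == "array" then map(quote) | join(" ") else quote end;
```

For example, ["jaq", "It's green!"] | sh_sketch ⟼ "'jaq' 'It'\\''s green!'".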
@csv
The filter @csv takes an array of scalars.
It transforms each array element depending on its type:
- null: Yield "".
- Boolean, number: Transform it via tostring.
- String: Replace occurrences of " by "" and surround by ".
- Fail for any other type of value.
Finally, the filter joins the transformed elements with "," as separator.
For example:
[true, null, false, 1, "Give me \"quotes\" or die"] | @csv ⟼
"true,,false,1,\"Give me \"\"quotes\"\" or die\""
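Each rule above corresponds to one branch of the following sketch. It uses the hypothetical name csv_sketch and is not jaq's implementation.

```jq
# Hypothetical definition of the @csv rules.
def csv_sketch:
  map(if . == null then ""
      elif type == "boolean" or type == "number" then tostring
      # double every quote, then surround the whole string with quotes
      elif type == "string" then "\"" + (split("\"") | join("\"\"")) + "\""
      else error("not a scalar") end)
  | join(",");
```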
@tsv
The filter @tsv is similar to @csv, with the following differences:
- In strings, the characters '\n', '\r', '\t', '\\', and '\u0000' are replaced by the strings "\\n", "\\r", "\\t", "\\\\", and "\\0". (In raw output, these look like \n, \r, \t, \\, and \0.) The result is not surrounded by ".
- The transformed elements are joined with \t (tabulator).
For example:
[true, null, false, 1, "Newline\nBackslash\\NUL\u0000"] | @tsv ⟼
"true\t\tfalse\t1\tNewline\\nBackslash\\\\NUL\\0"
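The following sketch adapts the @csv rules to @tsv. It uses the hypothetical name tsv_sketch and is not jaq's implementation. The order of the replacements is the only subtle part.

```jq
# Hypothetical definition of the @tsv rules.
def tsv_sketch:
  # Escape backslashes first, so that the backslashes introduced by
  # the other replacements are not escaped a second time.
  def escape:
    split("\\") | join("\\\\")
    | split("\n") | join("\\n")
    | split("\r") | join("\\r")
    | split("\t") | join("\\t")
    | split("\u0000") | join("\\0");
  map(if . == null then ""
      elif type == "boolean" or type == "number" then tostring
      elif type == "string" then escape        # no surrounding quotes
      else error("not a scalar") end)
  | join("\t");
```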
Byte strings
tobytes
The filter tobytes converts its input to a byte string.
Its output depends on the type of input:
- Natural number in the range 0 to 255 (0xFF): Yields a byte string with a single byte, e.g. 0 | tobytes ⟼ b"\x00".
- Text string: Yields a byte string containing the underlying bytes of the text string, e.g. "Hi" | tobytes ⟼ b"Hi". This takes constant time.
- Byte string: Yields the byte string unchanged.
- Array: Converts each element to a byte string and yields their concatenation, e.g. [0, "Hi", [1, 255]] | tobytes ⟼ b"\x00Hi\x01\xFF". This is equivalent to map(tobytes) | add.
- Anything else: Yields an error.
This is inspired by Erlang’s iolist_to_binary function.
jq does not have byte strings and thus does not have tobytes.
fq, which has pioneered the tobytes filter, has both.
Serialisation & Deserialisation
The filters in this section read and write data in all formats supported by jaq. See the formats section for general information about how jaq interprets these formats.
jq supports only JSON, so it only implements the
fromjson/tojson filters in this section.
fromjson, tojson
The filter fromjson takes a string as input,
parses it to JSON values and yields them.
For example:
"null true 0 \"foo\" [1] {\"foo\": 1}" | fromjson ⟼
null true 0 "foo" [1] {"foo": 1}
The filter tojson takes an arbitrary value and
outputs a string containing its JSON representation.
For example:
[null, true, 0, "foo", [1], {"foo": 1}] | tojson ⟼
"[null,true,0,\"foo\",[1],{\"foo\":1}]"
Note that tojson behaves similarly to tostring, but
when its input is a string, it will also encode it to JSON,
instead of returning it unchanged; i.e.
"Hi" | tojson ⟼ "\"Hi\"".
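Read the other way round, the paragraph above suggests the following sketch of tostring for JSON values. tostring_sketch is a hypothetical name, and byte strings are deliberately not covered.

```jq
# Hypothetical sketch of tostring for JSON values (byte strings are not covered):
# strings are returned unchanged, everything else is encoded as JSON.
def tostring_sketch: if type == "string" then . else tojson end;
```

For example, [1, "Hi", null] | map(tostring_sketch) ⟼ ["1", "Hi", "null"].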
In jq, fromjson yields an error when its input string contains multiple JSON values.
Furthermore, in jaq,
tojson | fromjson is equivalent to identity (.), whereas in jq,
this is not the case, because
nan | tojson | fromjson yields null, not nan.
fromyaml, toyaml
The filter fromyaml takes a text string and parses it as a sequence of
YAML documents.
It can yield an arbitrary number of outputs. For example:
The filter toyaml always yields exactly one output, namely
a text string containing the current value encoded as YAML.
fromcbor, tocbor
The filter fromcbor takes a byte string and parses it as a sequence of
CBOR values.
For example:
[0, 1, 32, 64, 96, 128, 160, 244, 245, 246] | tobytes | fromcbor ⟼
0 1 -1 b"" "" [] {} false true null
The filter tocbor always yields exactly one output, namely
a byte string containing the current value encoded as CBOR.
fromtoml, totoml
The filter fromtoml takes a text string and parses it as a single
TOML document.
It always yields exactly one output, because every TOML document encodes exactly one value.
For example:
"
[database]\n
enabled = true\n
ports = [ 8000, 8001, 8002 ]\n
data = [ [\"delta\", \"phi\"], [3.14] ]\n
temp_targets = { cpu = 79.5, case = 72.0 }\n
" | fromtoml ⟼
{"database": {
"enabled": true,
"ports": [8000,8001,8002],
"data": [["delta", "phi"], [3.14]],
"temp_targets": {"cpu": 79.5, "case": 72.0}
}}
The filter totoml fails if
the input is not an object or
the input contains any jaq value not supported by TOML.
It converts invalid UTF-8 sequences like CBOR.
fromxml, toxml
The filter fromxml takes a text string and parses it as a sequence of
XML tags.
For example:
"<?xml version='1.0'?>
<html>
<body xmlns='http://www.w3.org/1999/xhtml'>
Hello HTML!
</body>
</html>" | fromxml ⟼
{"xmldecl": {"version": "1.0"}}
{"t": "html","c": [{
"t": "body",
"a": {"xmlns": "http://www.w3.org/1999/xhtml"},
"c": ["Hello HTML!"]
}]}
The filter toxml takes data in the format produced by fromxml and
yields a corresponding text string. For example:
[
{"xmldecl": {"version": "1.0"}},
{"t": "html","c": [{
"t": "body",
"a": {"xmlns": "http://www.w3.org/1999/xhtml"},
"c": ["Hello HTML!"]
}]}
] | toxml ⟼
"<?xml version=\"1.0\"?>
<html>
<body xmlns=\"http://www.w3.org/1999/xhtml\">
Hello HTML!
</body>
</html>"
Date & Time
The filters in this section serve to convert between different time formats, such as:
- Unix epoch: Marks a point in time by the number of seconds passed since January 1, 1970 00:00:00 (UTC). You can obtain the current time as Unix epoch via now.
- ISO-8601 datetime string: Represents a date, a time, and a time zone as a string, such as "1970-01-01T00:00:00Z" (corresponding to Unix epoch 0).
- “Broken down time” (BDT) array: Represents a date and a time as an array of the shape [year, month, day, hour, minute, second, weekday, yearday]. All components are integers, except for second, which may be a floating-point number. The month is counted from 0, the weekday is counted from Sunday (which is 0), and the yearday is the day in the year counted from 0. When a BDT array is used as input, only the first six components are considered.
You can convert between these representations via:
- Unix epoch from/to ISO 8601: fromdate, todate
- BDT to Unix epoch: mktime
- Unix epoch to BDT: gmtime, localtime
- Unix epoch or BDT from/to custom string: strptime, strftime, strflocaltime
As an example, let us consider the moment when the
Hill Valley courthouse’s clock tower was struck by lightning, namely
Saturday, November 12, 1955, at 10:04 p.m. PST.
The corresponding date can be written in ISO 8601 as
"1955-11-12T22:04:00-08:00".
We can convert that to a Unix epoch and from there to a (UTC) BDT via:
"1955-11-12T22:04:00-08:00" | fromdate | gmtime ⟼
[1955, 10, 13, 6, 4, 0, 0, 316]
We can infer that at this moment, in UTC,
it was November 13
(the BDT month is 10 and not 11, because BDT months are counted from 0),
at 06:04:00.
Furthermore, that day
was a Sunday (because the weekday is 0), and
its yearday was 316; because yeardays are counted from 0, this makes it the 317th day of the year.
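The yearday 316 can be checked by hand:
- The months January to October of 1955 have 31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 = 304 days in total.
- 1955 is not a leap year.
- November 13 is 12 days after November 1.

This gives 304 + 12 = 316. The following helper performs the same calculation. Its name is hypothetical, and it is not part of jaq.

```jq
# Hypothetical helper: the zero-based BDT yearday for a given year,
# month (counted from 0, as in BDT arrays), and day of month (counted from 1).
def yearday_sketch($year; $month; $day):
  [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] as $lengths
  | ($year % 4 == 0 and ($year % 100 != 0 or $year % 400 == 0)) as $leap
  | ($lengths[:$month] | add // 0)                # days in all preceding months
    + (if $leap and $month > 1 then 1 else 0 end) # February 29, if already passed
    + $day - 1;
```

For example, yearday_sketch(1955; 10; 13) ⟼ 316.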
jq does not allow time zone information in ISO 8601 datetime strings.
fromdate, todate, fromdateiso8601, todateiso8601
These filters convert between Unix time and ISO-8601 timestamps.
For example, the
Apollo 13 accident
happened at 03:08 UTC on April 14, 1970.
Its corresponding Unix time is
"1970-04-14T03:08:00Z" | fromdate ⟼ 8910480.
We can get back the ISO-8601 timestamp via
8910480 | todate ⟼ "1970-04-14T03:08:00Z".
These filters can handle floating-point numbers, e.g.
0.123456 | todate ⟼ "1970-01-01T00:00:00.123456Z" and
"1970-01-01T00:00:00.123456Z" | fromdate ⟼ 0.123456.
In particular, fromdate yields a floating-point number if
the time cannot be represented losslessly as an integer.
The filters
fromdateiso8601 and todateiso8601 are synonyms of
fromdate and todate, respectively.
strftime($fmt), strflocaltime($fmt)
The filters strftime($fmt) and strflocaltime($fmt) take as input either
a number that is interpreted as Unix epoch, or
a BDT array.
The filters yield a string representation of the input time, using the format $fmt.
If the input is a Unix epoch,
both strftime and strflocaltime interpret it as a UTC timestamp.
If the input is a BDT array, then
strftime interprets the input as UTC and
strflocaltime interprets the input as user local time.
strftime outputs the time as UTC and
strflocaltime outputs the time as user local time.
For example, if the user is in the CET zone (+0100):
- 0 | strftime("%T %z (%Z)") ⟼ "00:00:00 +0000 (UTC)"
- [1970, 0, 1, 0, 0, 0] | strftime("%T %z (%Z)") ⟼ "00:00:00 +0000 (UTC)"
- 0 | strflocaltime("%T %z (%Z)") yields "01:00:00 +0100 (CET)"
- [1970, 0, 1, 0, 0, 0] | strflocaltime("%T %z (%Z)") yields "00:00:00 +0100 (CET)"
jq prints GMT instead of UTC in the examples above; however,
GMT is not the same as UTC.
strptime($fmt)
The filter strptime($fmt) takes a string and parses it using the format $fmt,
yielding a BDT array.
If no time zone is inferred from the input (e.g. via %Z), it is assumed to be UTC.
For example:
gmtime, localtime
The filters gmtime and localtime take a Unix epoch as input and
yield a corresponding BDT array, containing
the time in UTC (gmtime) or in the user local time (localtime).
For example, if the user is in the CET zone (+0100):
mktime
The filter mktime takes a BDT array that is assumed to be in UTC,
and yields the corresponding Unix epoch.
For example, [1970, 0, 1, 0, 0, 0] | mktime ⟼ 0.
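For integer BDT arrays, the computation behind mktime can be sketched with Howard Hinnant's well-known days_from_civil algorithm for the proleptic Gregorian calendar. This illustration rests on those assumptions and is not jaq's implementation. The name mktime_sketch is hypothetical, and leap seconds are ignored.

```jq
# Hypothetical sketch of mktime for BDT arrays [year, month, day, hour, minute, second, ...]
# with month counted from 0; days_from_civil with floor division, leap seconds ignored.
def mktime_sketch:
  (.[1] + 1) as $m                                    # month counted from 1
  | (if $m <= 2 then .[0] - 1 else .[0] end) as $y    # years are shifted to start in March
  | ($y / 400 | floor) as $era                        # 400-year cycle
  | ($y - $era * 400) as $yoe                         # year of era: 0..399
  # days in the months before this one, counting from March
  | ((153 * (if $m > 2 then $m - 3 else $m + 9 end) + 2) / 5 | floor) as $before
  | ($before + .[2] - 1) as $doy                      # day of the March-based year: 0..365
  | ($yoe * 365 + ($yoe / 4 | floor) - ($yoe / 100 | floor) + $doy) as $doe  # day of era
  | ($era * 146097 + $doe - 719468) as $days          # days since 1970-01-01
  | $days * 86400 + .[3] * 3600 + .[4] * 60 + .[5];
```

For example, [1970, 3, 14, 3, 8, 0] | mktime_sketch ⟼ 8910480. This matches the Apollo 13 timestamp from the section on fromdate.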
Regular expressions
All the filters in this section, such as test, take a string as input and
fail if they receive any other type of value.
Furthermore, they all take two string arguments, namely
the regular expression $re and
the $flags that determine how the regular expression is interpreted.
Omitting $flags is equivalent to passing "" as $flags.
For example,
test($re) is equivalent to test($re; "").
The supported flags are:
- g: global search
- n: ignore empty matches
- i: case-insensitive
- m: multi-line mode: ^ and $ match begin/end of line
- s: single-line mode: allow . to match \n
- l: greedy
- x: extended mode: ignore whitespace and allow line comments (starting with #)
jaq uses the regex-lite crate to
compile and run regular expressions (regexes).
See the crate documentation for a description of the supported regex syntax.
test
The filter test yields
true if some part of the input matches the regular expression, else false.
For example:
scan
The filter scan yields all parts of the input that match the regular expression.
For example:
match
The filter match yields an object for every part of the input that matches the regular expression, containing:
- "offset": the character index of the start of the match
- "length": the number of characters of the match
- "string": the contents of the match
- "captures": an array with an object for every capture group, containing:
  - "offset",
  - "length",
  - "string": as above, but for the capture group instead of the whole match
  - "name": the name of the capture group if it has one, else this key is omitted
Example:
"v2.0, v3.0" | match("v(?<maj>[0-9]+)\\.([0-9]+)"; "g") ⟼
{
"offset": 0,
"length": 4,
"string": "v2.0",
"captures": [
{
"offset": 1,
"length": 1,
"string": "2",
"name": "maj"
},
{
"offset": 3,
"length": 1,
"string": "0"
}
]
}
{
"offset": 6,
"length": 4,
"string": "v3.0",
"captures": [
{
"offset": 7,
"length": 1,
"string": "3",
"name": "maj"
},
{
"offset": 9,
"length": 1,
"string": "0"
}
]
}
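To relate the filters of this section, test and scan can be approximated in terms of match. The definitions below use hypothetical names and are not jaq's implementation. In particular, scan_sketch ignores capture groups and always searches globally.

```jq
# Hypothetical definitions in terms of match.
# test: does match yield at least one output?
def test_sketch($re; $flags): first(match($re; $flags) | true) // false;
# scan: the matched strings of a global search (capture groups are ignored here).
def scan_sketch($re; $flags): match($re; "g" + $flags) | .string;
```

For example, "a1b22" | scan_sketch("[0-9]+"; "") ⟼ "1" "22".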
capture
The filter capture yields an object for every part of the input that matches the regular expression, containing
for each named capture group an entry with
the group name as key and its matched string as value.
Example:
"v2.0, v3.0" | capture("v(?<maj>[0-9]+)\\.(?<min>[0-9]+)"; "g") ⟼
{
"maj": "2",
"min": "0"
}
{
"maj": "3",
"min": "0"
}
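Likewise, capture can be approximated by keeping only the named groups of each match and turning them into object entries. capture_sketch is a hypothetical name, and this is not jaq's implementation.

```jq
# Hypothetical definition in terms of match.
def capture_sketch($re; $flags):
  match($re; $flags)
  # keep only named groups, turning each into a {key, value} entry
  | [.captures[] | select(.name != null) | {key: .name, value: .string}]
  | from_entries;
```

For example, "v2.0" | capture_sketch("v(?<maj>[0-9]+)\\.([0-9]+)"; "") ⟼ {"maj": "2"}, because the second group has no name.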
split, splits
The filter split($re; $flags) yields an array of
those parts of the input string that do not match the regular expression $re.
For example:
- "Here be\tspaces" | split("\\s"; "") ⟼ ["Here", "be", "spaces"]
- " More\n\n" | split("\\s+"; "") ⟼ ["", "More", ""]
- "" | split("\\s"; "") ⟼ [""]
Note that split($re; $flags) is equivalent to split($re; "g" + $flags),
meaning that the string is split not only by the first match, but by all matches.
Furthermore, unlike all other filters in this section,
split($s) is not equivalent to split($s; ""), because
split($s) splits a string by
a separator that is not interpreted as regular expression;
see split.
The filter splits($re; $flags) yields the elements of the array yielded by split($re; $flags).
For example,
"Here be\tspaces" | splits("\\s") ⟼ "Here" "be" "spaces".
The filter splits($re) is equivalent to splits($re; "").
sub, gsub
The filter sub($re; f; $flags) replaces
the first part of the input string that matches $re
(or every such part, if $flags contains g) by
the output of f.
Here, f receives an object as returned by capture; that is,
for every named capture group, it contains
its name as key and its matched string as value.
For example:
"Mr. 高橋 & Mrs. 嵯峨" | sub("(?<title>(Mr|Ms|Mrs)\\.) (?<name>\\S+)"; "\(.name) (\(.title))"; "g") ⟼
"高橋 (Mr.) & 嵯峨 (Mrs.)"
When the filter f yields multiple outputs,
then all potential combinations are output.
For example:
"Thanks, fine." | sub("(?<word>\\w+)"; .word, (.word | ascii_upcase); "g") ⟼
"Thanks, fine."
"Thanks, FINE."
"THANKS, fine."
"THANKS, FINE."
We have the following short forms:
- The filter gsub($re; f; $flags) is equivalent to sub($re; f; "g" + $flags).
- The filter gsub($re; f) is equivalent to gsub($re; f; "").
- The filter sub($re; f) is equivalent to sub($re; f; "").
I/O
This section contains filters that interact with the system. These filters may yield different outputs when given equal inputs.
input, inputs
The filter inputs yields all the inputs in the current input file.
For example, jaq -n '[inputs]' <<< '1 2 3' yields [1, 2, 3].
This can be useful to fold over large (potentially infinite) amounts of values;
for example, to create a cumulative sum over all input integers, you can use
jaq -n 'foreach inputs as $x (0; .+$x)'.
The filter input yields the next input in the current input file.
When there is no more input value left,
in jq, input yields an error, whereas in jaq, it yields no output value.
That is, in jaq, input is equivalent to first(inputs).
Both input and inputs have a side effect, i.e. they advance the input stream.
That means that unlike most jq filters, inputs is not referentially transparent.
It is advised to use it sparingly and with caution,
lest you are devoured by the evil dragons of evaluation order.
debug, debug(f)
The filter debug(f) prints a debug message for every output of f
to the standard error stream (stderr), then yields its input.
For example, the filter 0 | debug(1, 2) yields
the following output on the command-line:
["DEBUG:",1]
["DEBUG:",2]
0
The filter debug is equivalent to debug(.).
stderr
The filter stderr prints its input to the standard error stream
in raw and compact mode without newline.
It then yields its input.
halt, halt($exit_code)
The filter halt($exit_code) terminates jaq with the given exit code.
The filter halt terminates jaq with exit code 0.
jq does not implement halt($exit_code), only halt.
halt_error, halt_error($exit_code)
The filter halt_error($exit_code) prints its input via stderr.
It then quits the jaq process with the given exit code.
# jaq -n '"Hi!\n" | halt_error(42)'
Hi!
# echo $?
42
The filter halt_error is equivalent to halt_error(5).
jq prints a newline after the input if it is not a string.
now
This filter yields the current time as a Unix epoch, i.e. as a floating-point number of seconds.
$ENV, env
The variable $ENV holds an object that contains an entry for every environment variable, where
the key is the name of the variable and the value is its value.
For example, {"EDITOR": "vim", "SHELL": "/usr/bin/bash"}.
The filter env is equivalent to $ENV.
Unsupported
This section lists filters present in jq, but not in jaq.
- combinations
- $__loc__
- modulemeta
- have_literal_numbers
- have_decnum
- $JQ_BUILD_CONFIGURATION
- builtins
- input_filename
- input_line_number
jaq supports none of jq’s SQL-style operators, mostly for aesthetic reasons (uppercase names) and because jq is not SQL:
- INDEX
- JOIN
- IN
jaq does not support jq’s --stream option;
therefore, it also does not implement the related filters:
- truncate_stream
- fromstream
- tostream
Advanced features
Assignments
jq allows for assignments of the form p |= f, where p is an arbitrary filter.
This makes assignments in jq uniquely powerful compared to other languages.
For example, a program from the jq manual that blew my mind was the following:
(.posts[] | select(.author == "stedolan") | .comments) += ["terrible."]
This iterates over all posts, selects those whose author is “stedolan”, takes their comments, and adds a not very flattering comment to them. (This does not reflect my opinion about Stephen Dolan; I think that he did a great job creating jq.)
jaq and jq pursue different approaches to execute assignments:
- Path-based: In jq, an assignment p |= f constructs paths to all values that match p and applies the filter f to these values.
- Pathless: In jaq, an assignment p |= f is transformed to a different filter that does not construct any paths.
For example, consider the update
[1, 2, 3] | .[] |= .+1 ⟼ [2, 3, 4]:
When jq executes this, it calculates
[1, 2, 3] | path(.[]) ⟼ [0] [1] [2] and applies
.+1 on each value at these paths.
On the other hand, jaq transforms the update to
[1, 2, 3] | [.[] | .+1] ⟼ [2, 3, 4] —
the assignment does not involve any path construction.
Fortunately, as in the example above,
the results of both approaches are the same in most cases.
The following sections explain the two approaches in more detail,
and how to write updates that behave the same in both jq and jaq.
Path-based
The path-based update model used by jq executes p |= u by
first collecting the paths corresponding to p,
then updating the input at these paths by u.
We can approximate this behaviour by getpath(path(p)) |= u —
the actual jq update behaviour is much more complex,
only sparsely documented, and
has changed in backwards-incompatible ways between minor versions.
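The two phases can be made explicit with reduce, setpath, and getpath: first collect the paths, then update the value at each of them. The sketch below is a simplified, hypothetical model, and path_update is not a jq or jaq filter. As stated above, the actual jq behaviour is more complex. The sketch assumes that u yields at least one output.

```jq
# Hypothetical, simplified model of jq's path-based `p |= u`:
# 1. collect the paths of all values matched by p (on the original input),
# 2. replace the value at each path by the first output of u.
def path_update(p; u):
  reduce path(p) as $path (.; setpath($path; getpath($path) | first(u)));
```

For example, 0 | path_update((., .); (., .+1)) ⟼ 0. This matches what jq yields for 0 | (., .) |= (., .+1), as discussed in the section on pathless updates.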
We have a few equivalences for path(p):
| p | path(p) |
|---|---|
| . | [] |
| .[] | keys_unsorted[] \| [.] |
| .[$i] | [$i] |
| .[$i:$j] | [{start: $i, end: $j}] |
| f, g | path(f), path(g) |
| f \| g | path(f) as $p \| $p + (getpath($p) \| path(g)) |
| f as $x \| g | f as $x \| path(g) |
| if $p then f else g end | if $p then path(f) else path(g) end |
Examples:
- true | path(.) ⟼ []
- [1, 2, 3] | path(.[]) ⟼ [0] [1] [2]
- [1, 2, 3] | keys_unsorted[] | [.] ⟼ [0] [1] [2]
- {a: 1, b: 2} | path(.[]) ⟼ ["a"] ["b"]
- {a: 1, b: 2} | keys_unsorted[] | [.] ⟼ ["a"] ["b"]
- [1, 2, 3] | path(.[0]) ⟼ [0]
- [1, 2, 3] | path(.[1:-1]) ⟼ [{"start": 1, "end": -1}]
- {a: 1, b: 2} | path(.a, .b) ⟼ ["a"] ["b"]
- {a: 1, b: 2} | path(.a), path(.b) ⟼ ["a"] ["b"]
- [[1], [2]] | path(.[][]) ⟼ [0, 0] [1, 0]
- [[1], [2]] | path(.[]) as $p | $p + (getpath($p) | path(.[])) ⟼ [0, 0] [1, 0]
- [1, 2, 3] | path(0, 2 as $x | .[$x]) ⟼ [0] [2]
- [1, 2, 3] | 0, 2 as $x | path(.[$x]) ⟼ [0] [2]
The filters reduce / foreach are both defined in jaq
in terms of simpler filters f that path(f) can evaluate.
Therefore, in jaq, you can use reduce / foreach inside path(f),
as well as on the left-hand side of updates.
jq does not support this.
Pathless
The pathless update model that is used by jaq
reduces updates p |= u to simpler expressions, depending on p.
It yields the same results as path-based updates in most common cases,
while having the following advantages:
- It does not need to construct paths, resulting in higher performance.
- It considers multiple outputs by u where possible, whereas path-based updates consider at most one output. For example, 0 | (., .) |= (., .+1) ⟼ 0 1 1 2 in jaq, whereas it yields only 0 in jq. However, {a: 1} | .a |= (2, 3) ⟼ {"a": 2} in both jaq and jq, because an object can only associate a single value with any given key, so we cannot use multiple outputs in a meaningful way here.
- It avoids iterator invalidation problems that path-based updates are prone to.
However, pathless updates do not support a few filters on the left-hand side of updates that path-based updates support, such as try, first, limit, skip, and last.
For example, the following updates all yield an error in jaq:
- [1, 2, 3] | try .[] -= 1 yields [0, 1, 2] in jq.
- [1, 2, 3] | first(.[]) -= 1 yields [0, 2, 3] in jq.
- [1, 2, 3] | limit(2; .[]) -= 1 yields [0, 1, 3] in jq.
- [1, 2, 3] | skip(1; .[]) -= 1 yields [1, 1, 2] in jq.
- [1, 2, 3] | last(.[]) -= 1 yields an error in jq.
In such cases, you can fall back to path-based updates in jaq by writing
getpath(path(p)) |= u instead of p |= u.
For example, the following filters yield the same outputs in jaq and jq:
- [1, 2, 3] | getpath(path(try .[])) -= 1 ⟼ [0, 1, 2]
- [1, 2, 3] | getpath(path(first(.[]))) -= 1 ⟼ [0, 2, 3]
- [1, 2, 3] | getpath(path(limit(2; .[]))) -= 1 ⟼ [0, 1, 3]
- [1, 2, 3] | getpath(path(skip(1; .[]))) -= 1 ⟼ [1, 1, 2]
- [1, 2, 3] | getpath(path(last(.[]))) -= 1 ⟼ [1, 2, 2] (this example yields an error in jq, whereas it works in jaq)
The following table shows how jaq executes an update p |= u.
In this table, the case for
f as $x | g assumes that
f yields single outputs f1, …, fn.
| p | p \|= u |
|---|---|
| . | u |
| .. | def rec_up: (.[]? \| rec_up), .; rec_up \|= u |
| (f \| g) | f \|= (g \|= u) |
| (f, g) | f \|= u \| g \|= u |
| f as $x \| g | (f1 as $x \| g) \|= u \| ... \| (fn as $x \| g) \|= u |
| f // g | if first(f // false) then f \|= u else g \|= u end |
| if $p then f else g end | if $p then f \|= u else g \|= u end |
| .[] | iter_upd(u; error) |
| .[$i] | index_upd($i; u; error) |
| .[$i:$j] | slice_upd($i; $j; u; error) |
| .[]? | iter_upd(u; .) |
| .[$i]? | index_upd($i; u; .) |
| .[$i:$j]? | slice_upd($i; $j; u; .) |
It follows from the table that empty |= f is equivalent to . (identity).
We now give definitions for iter_upd, index_upd, and slice_upd:
# .[] |= u
def iter_upd(u; fail):
if isarray then [.[] | u]
elif isobject then with_entries(.value |= u)
else fail end;
all(
([1, 2, 3] | iter_upd(.+1 ; .) == [2, 3, 4]),
([1, 2, 3] | iter_upd(.+1,. ; .) == [2, 1, 3, 2, 4, 3]),
([1, 2, 3] | iter_upd(select(.%2 == 1); .) == [1, 3]),
({a: 1, b: 2} | iter_upd(.+1 ; .) == {"a": 2, "b": 3}),
({a: 1, b: 2} | iter_upd(.+1,. ; .) == {"a": 2, "b": 3})
; .) ⟼ true
# .[$i] |= u
def index_upd($i; u; fail):
if isarray then
if 0 <= $i and $i < length then .[:$i] + [.[$i] | first(u)] + .[$i+1:]
elif -length <= $i and $i < 0 then index_upd(length + $i; u; fail)
else fail end
elif isobject then
if has($i) then with_entries(if .key == $i then {key, value: first(.value | u)} end)
else . + ([{key: $i, value: first(null | u)}] | from_entries) end
else fail end;
all(
([1, 2, 3] | index_upd( 0 ; .+1 ; .) == [2, 2, 3]),
([1, 2, 3] | index_upd( 0 ; .+1,.; .) == [2, 2, 3]),
([1, 2, 3] | index_upd( 0 ; empty; .) == [ 2, 3]),
([1, 2, 3] | index_upd(-1 ; .+1 ; .) == [1, 2, 4]),
([1, 2, 3] | index_upd(-3 ; .+1 ; .) == [2, 2, 3]),
({a: 1, b: 2} | index_upd("a"; .+1 ; .) == {"a": 2, "b": 2}),
({a: 1, b: 2} | index_upd("a"; .+1,.; .) == {"a": 2, "b": 2}),
({a: 1, b: 2} | index_upd("a"; empty; .) == { "b": 2})
; .) ⟼ true
# .[$i:$j] |= u
def slice_upd($i; $j; u; fail):
first(.[:$i] + (.[$i:$j] | u) + .[$j:]) // .[:$i] + .[$j:];
all(
([1, 2, 3, 4] | slice_upd(1; -1; map(.+1) ; .) == [1, 3, 4, 4]),
([1, 2, 3, 4] | slice_upd(1; -1; map(.+1),. ; .) == [1, 3, 4, 4]),
([1, 2, 3, 4] | slice_upd(1; -1; empty ; .) == [1, 4]),
("abcd" | slice_upd(1; -1; ascii_upcase; .) == "aBCd" )
; .) ⟼ true
In jq, . |= empty yields null for any input, whereas jaq yields no output.
Similarly,
in jq, . |= (., .) yields its input once, whereas jaq yields its input twice.
In jq, [0, 1] | .[3] = 3 yields [0, 1, null, 3]; that is,
jq pads the array with nulls if we update an index beyond its length.
In contrast, jaq fails with an out-of-bounds error in such a case.
In jq,
null | .a = 1 yields {"a": 1} and
null | .[0] = 1 yields [1], meaning that
jq treats null as empty array or object when
updating it with a string or integer index.
In jaq, which supports non-string object keys, this would be ambiguous, because
it is not clear whether
null | .[0] = 1 should yield
{0: 1} or [1].
For that reason, jaq yields an error on updates of null with any kind of index.
Patterns
The filter f as $x | g binds the outputs of f to a
variable $x.
At the place of $x, we can use a pattern to
destructure the input into multiple variables.
Consider the following filter:
[ 1, {a: 2}] |
.[0] as $x |
.[1].a as $y |
$x, $y ⟼
1 2
We can write this more compactly using a pattern as follows:
[ 1, {a: 2}] as
[$x, {a: $y}] |
$x, $y ⟼
1 2
Here, [$x, {a: $y}] is a pattern that is used to match the value [1, {a: 2}].
It binds
1 to $x and
2 to $y.
Similarly to object construction,
{$x} is equivalent to {x: $x} also for object patterns.
For example, we could have written the previous example equivalently as
[1, {a: 2}] as [$x, {$a}] | $x, $a ⟼ 1 2
When a part of a pattern has no counterpart in its input,
its corresponding variables are bound to null:
- [1, {b: 2}] as [$x, {$a}] | $x, $a ⟼ 1 null
- [1] as [$x, $y] | $x, $y ⟼ 1 null
- [1, {a: 2}] as [$x, [$y]] | $x, $y ⟼ 1 null
If the types of a pattern and its input do not match, an error is thrown:
Patterns do not have to match their whole input:
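The two helpers below illustrate both rules. Their names are hypothetical and not part of jaq. The failing case deliberately uses a number as input: in jaq, objects may have integer keys, so an array pattern applied to an object does not necessarily fail.

```jq
# Hypothetical helpers illustrating the two rules above.
# An array pattern only looks at the indices it mentions, so [$x] ignores the rest:
def first_element_sketch: . as [$x] | $x;
# A number cannot be indexed, so destructuring it with an array pattern fails:
def destructure_or_report_sketch: try (. as [$x] | $x) catch "type mismatch";
```

For example, [1, 2, 3] | first_element_sketch ⟼ 1 and 1 | destructure_or_report_sketch ⟼ "type mismatch".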
Patterns can be arbitrarily nested:
- {a: [1, {b: [2]}]} as {a: [$x, {b: [$y]}]} | $x, $y ⟼ 1 2
- [[[1]]] as [[[$x]]] | $x ⟼ 1
- {a: {b: {c: 1}}} as {a: {b: {c: $x}}} | $x ⟼ 1
We can use any filter (f) as an object key in a pattern:
{a: 1, b: 2, c: 3, d: 4} as
{("a", "b"): $x, ("c", "d"): $y} |
[$x, $y] ⟼
[1, 3] [1, 4] [2, 3] [2, 4]
We can also use patterns in reduce and foreach:
[{"a": 1, "b": 2}, {"a": 3, "b": 4}] |
foreach .[] as {("a", "b"): $x} ([]; . + [$x]) ⟼
[1]
[1,2]
[1,2,3]
[1,2,3,4]
jaq does not support jq’s destructuring alternative operator
?//.
A pattern p is either:
- a variable $x,
- an array pattern [p1, ..., pn] containing n patterns, or
- an object pattern {e1, ..., en} containing n object entries. An object entry e is either:
  - a variable $x or
  - a key-value pair (k): p (where k is a filter and p is a pattern).
An array pattern [p1, ..., pn] is equivalent to
an object pattern {(0): p1, ..., (n-1): pn}.
Because of this, you can use
object patterns with integer keys to destructure arrays, or
array patterns to destructure objects with integer keys.
Furthermore, you can also destructure byte strings:
- [1, 2, 3] as {(0): $x, (2): $y} | $x, $y ⟼ 1 3
- {(0): 1, (2): 3} as [$x, $_, $y] | $x, $y ⟼ 1 3
- [1, 2, 3] | tobytes as [$x, $y] | $x, $y ⟼ 1 2
When using a filter (f) as object key in a pattern, then
f is run with the input that was matched
by its parent object pattern, not
by the whole pattern.
For example,
[{"k": "a", "a": 1}] as [{(.k): $x}] | $x ⟼ 1
This is equivalent to:
[{"k": "a", "a": 1}] |
.[0] as $p0 |
$p0[$p0 | .k] as $x |
$x
⟼ 1
Here, we can see that (.k) is run with the input $p0, which is
the value that the parent object pattern of (.k), namely {(.k): $x},
is trying to match.
Compare this with the following wrong transformation, where
(.k) would be run with the input matched by the whole pattern:
[{"k": "a", "a": 1}] |
try (
.[0] as $p0 |
$p0[.k] as $x | # fails here because .k is run with whole input
$x
) catch "fail" ⟼ "fail"
Modules
jq allows dividing programs into multiple files that are called modules.
At the beginning of any jq module, there is a module header that consists of
a (potentially empty) sequence of instructions listed in this section.
The module header is then followed by a sequence of definitions.
Finally, the main module that is called from the command-line interface
(via --from-file or inline)
must contain a single filter at the end, which is the filter that is executed.
All include/import instructions search for files as explained in the
search paths section.
Module metadata
The instruction module meta; sets the metadata of the current module to
the output of meta, where meta is a filter.
This instruction may occur
only at the beginning of the module header and
only once.
For example, module "My module"; 1 ⟼ 1.
jaq ignores this instruction, whereas
jq uses it to provide the output of meta via the modulemeta filter.
Module inclusion
The instructions
include "mod"; and
include "mod" meta;
make all filters defined in the module mod.jq accessible in the current module.
For example, if foo.jq in the current working directory contains def bar: 1;, then
jaq -L . -n 'include "foo"; bar' yields 1.
Module import
The instructions
import "mod" as name; and
import "mod" as name meta;
make all definitions in the module mod.jq accessible in the current module
with the prefix name::.
For example, if foo.jq in the current working directory contains def bar: 1;, then
jaq -L . -n 'import "foo" as myfoo; myfoo::bar' yields 1.
Data import
The instructions
import "data" as $name; and
import "data" as $name meta;
load all JSON values in data.json into an array,
bind it to the variable $name, and
make it accessible in the current module.
For example, if foo.json in the current working directory contains 1 2 3, then
jaq -L . -n 'import "foo.json" as $myfoo; $myfoo' yields [1, 2, 3].
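Loading a data file thus amounts to reading a stream of whitespace-separated JSON values and collecting them into one array. A minimal Python sketch of that step, assuming plain JSON input (Python's json module also accepts NaN and Infinity, but none of the other XJON extensions); the name load_json_stream is hypothetical:

```python
import json


def load_json_stream(text: str) -> list:
    """Parse whitespace-separated JSON values into a list, like a data import."""
    decoder = json.JSONDecoder()
    values, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        # (str.isspace is slightly more permissive than JSON whitespace)
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return values
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)


print(load_json_stream("1 2 3"))  # [1, 2, 3]
```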
Search paths
An include/import instruction searches for its given file
in the following directories, in the following order:
- The global search paths given via --library-path. They are interpreted relative to the current working directory.
- The local search paths given via metadata: When an include/import instruction has meta of the shape {..., search: ..., ...}, then the value at the key "search" sets the local search paths for that instruction to:
  - If the value is a string: Just that string.
  - If the value is an array: All strings in the array.
  - Otherwise: Nothing.

  If the module containing the include/import instruction is a file, then these paths are interpreted relative to the parent directory of that module. Otherwise, these paths are interpreted relative to the current working directory. (That is the case if the module containing the include/import instruction is given inline on the command-line.)
Every global and local search path is substituted as follows:
- If it starts with ~, then ~ is substituted with the user’s home directory, given by the environment variable HOME on Linux and USERPROFILE on Windows.
- If it starts with $ORIGIN, then $ORIGIN is substituted by the directory in which the jaq executable resides.
For example,
jaq -L ~/foo -L bar 'include "decode" {search: ["baz", "$ORIGIN/quux"]}; 1'
searches for the file decode.jq at the following paths in the given order:
- ~/foo/decode.jq (where ~ is substituted by the user’s home directory)
- ./bar/decode.jq
- ./baz/decode.jq
- $ORIGIN/quux/decode.jq (where $ORIGIN is substituted by the parent directory of the jaq executable)
The first path that corresponds to an existing file is taken.
Now, suppose that decode.jq contains the instruction
import "binary" as $binary {search: "."};
This searches for binary.json at the following paths:
- ~/foo/binary.json
- ./bar/binary.json (relative to the current working directory)
- ./binary.json (relative to the parent directory of decode.jq)
If a file to load has been given without extension,
such as decode and binary above, then
jaq adds an extension (.jq for modules or .json for data).
jq adds an extension unconditionally; that is,
even if an extension has been given as part of the file name,
jq adds an extension.
jaq’s behaviour is motivated by allowing instructions like
import "binary.cbor" as $binary in the future.
Here, unconditionally adding the .json extension would be counterproductive.
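The search order, the ~/$ORIGIN substitution, and jaq's extension rule can be summarised in a short Python sketch. It only computes the candidate paths in order and leaves the existence check to the caller; the function names and the parameters home, origin, and module_dir are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath


def substitute(path: str, home: str, origin: str) -> str:
    """Apply the ~ and $ORIGIN substitutions to a search path."""
    if path.startswith("~"):
        return home + path[1:]
    if path.startswith("$ORIGIN"):
        return origin + path[len("$ORIGIN"):]
    return path


def local_paths(meta) -> list:
    """Local search paths taken from the "search" key of the metadata."""
    search = meta.get("search") if isinstance(meta, dict) else None
    if isinstance(search, str):
        return [search]
    if isinstance(search, list):
        return [s for s in search if isinstance(s, str)]
    return []


def candidates(name, ext, global_paths, meta, cwd, module_dir, home, origin):
    """Candidate files for an include/import, in search order.

    module_dir is the parent directory of the importing module,
    or None if that module was given inline on the command line.
    """
    # Like jaq (unlike jq): add the extension only if the name has none
    if not posixpath.splitext(name)[1]:
        name += ext
    local_base = module_dir if module_dir is not None else cwd
    dirs = [posixpath.join(cwd, substitute(p, home, origin)) for p in global_paths]
    dirs += [posixpath.join(local_base, substitute(p, home, origin))
             for p in local_paths(meta)]
    return [posixpath.normpath(posixpath.join(d, name)) for d in dirs]


# jaq -L ~/foo -L bar 'include "decode" {search: ["baz", "$ORIGIN/quux"]}; 1'
for path in candidates("decode", ".jq", ["~/foo", "bar"],
                       {"search": ["baz", "$ORIGIN/quux"]},
                       "/work", None, "/home/u", "/opt/jaq"):
    print(path)
```

A caller would then load the first candidate for which os.path.isfile returns true.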
Formats
jaq supports reading and writing several data formats. This section describes these data formats.
You can load and write data in these formats using either:
- the command-line options --from/--to, or
- (de-)serialisation filters such as fromjson and toyaml.

The command-line options --from and --to always perform
at least as well as the corresponding filters.
JSON
JSON (JavaScript Object Notation) is specified in RFC 8259.
jaq can read all valid JSON values; however, like jq,
it also accepts certain values that are invalid JSON.
This set of values is documented in the XJON section.
XJON
The native data format of jaq is a superset of JSON called XJON (eXtended JavaScript Object Notation, pronounced like “action”).
XJON extends JSON with the following constructs:
- Line comments:
  # ... \n is interpreted as a comment.
- Special floating-point numbers:
  NaN, Infinity, -Infinity
- Numbers starting with +:
  Every number that may be prefixed with - (minus) may also be prefixed with + (plus), e.g. +7, +Infinity.
- UTF-8 strings with invalid code units:
  The JSON standard is slightly ambiguous about whether strings may contain invalid UTF-8 code units. XJON explicitly allows for invalid code units in UTF-8 strings, e.g. the output of printf '"\xFF"'. This increases compatibility with tools that output such strings (e.g. file names). Furthermore, it allows for constant-time loading of strings via --rawfile, where jq takes linear time due to UTF-8 validation.
- Byte strings:
  A byte string is created via b"...", where ... is a sequence of:
  - bytes in the range 0x20 to (including) 0xFF, excluding the ASCII characters '"' and '\', or
  - an escape sequence, starting with a backslash ('\') and followed by b, f, n, r, t, '"', '\', or xHH, where HH is a two-digit hexadecimal number.

  For example: b"Here comes \xFF, dadadada\nHere comes \xFF\nAnd I say: \"It's alright\"\x00". Byte strings of this shape can also be found in other languages, like Rust & Python (with leading b) and JavaScript & C (without leading b).
- Objects with non-string keys:
  Where JSON limits object keys to strings, XJON allows arbitrary values as object keys. For example:
  {null: 0, true: 1, 2: 3, "str": 4, ["arr"]: 5, {}: 6}
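The byte string grammar above is small enough to check with a short Python sketch that decodes a complete b"..." literal into raw bytes. It is illustrative only: the function name parse_byte_string is hypothetical, and malformed input is reported with a plain ValueError rather than jaq's error messages.

```python
# Escape letters allowed after a backslash, mapped to the byte they denote
ESCAPES = {ord("b"): 0x08, ord("f"): 0x0C, ord("n"): 0x0A, ord("r"): 0x0D,
           ord("t"): 0x09, ord('"'): 0x22, ord("\\"): 0x5C}
HEX_DIGITS = b"0123456789abcdefABCDEF"


def parse_byte_string(literal: bytes) -> bytes:
    """Decode an XJON byte string literal (including b"...") into raw bytes."""
    if len(literal) < 3 or not literal.startswith(b'b"') or not literal.endswith(b'"'):
        raise ValueError("not a byte string literal")
    body, out, i = literal[2:-1], bytearray(), 0
    while i < len(body):
        byte = body[i]
        if byte == 0x5C:  # a backslash starts an escape sequence
            if i + 1 >= len(body):
                raise ValueError("incomplete escape sequence")
            esc = body[i + 1]
            if esc == ord("x"):
                digits = body[i + 2:i + 4]
                if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                    raise ValueError("\\x must be followed by two hex digits")
                out.append(int(digits, 16))
                i += 4
            elif esc in ESCAPES:
                out.append(ESCAPES[esc])
                i += 2
            else:
                raise ValueError(f"unknown escape \\{chr(esc)}")
        elif byte < 0x20 or byte == 0x22:
            raise ValueError(f"byte 0x{byte:02X} must be escaped")
        else:
            out.append(byte)  # any byte from 0x20 to 0xFF except '"' and '\'
            i += 1
    return bytes(out)


print(parse_byte_string(rb'b"a\xFF\n\"\\"'))  # b'a\xff\n"\\'
```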
The goal behind XJON was to support a set of values present in YAML and CBOR, namely byte strings and objects with non-string keys, while keeping the format both human-readable and simple & performant to parse, like JSON.
XJON can losslessly encode any jaq value; in particular, decoding an XJON-encoded value yields the original value.
That means that tojson | fromjson is equivalent to . in jaq, whereas
it is not equivalent in jq, in particular because of NaN and Infinity.
Currently, wherever jaq accepts JSON, it also accepts XJON.
That means that
jaq --from json <<< 'NaN b"Bytes" {1: 2} # Over and out' yields
NaN, b"Bytes", and {1: 2}, although the input is XJON, not valid JSON.
YAML
YAML (YAML Ain’t Markup Language™) is “a human-friendly data serialization language for all programming languages”. It is also a superset of JSON and XJON. That means that every jaq value can be encoded as a YAML value.
jaq supports reading YAML with anchors (&foo) and aliases (*foo).
These allow the creation of shared data structures. For example:
- "[&a 1, &b 2, *a, *b]" | fromyaml ⟼ [1, 2, 1, 2]
- "[&b [&a [], *a], *b]" | fromyaml ⟼ [[[], []], [[], []]]
jaq validates tags for scalar YAML values, such as
null, booleans, numbers, and strings.
On the other hand, jaq ignores tags for arrays and objects.
jaq produces YAML that is very close to JSON/XJON.
It differs from XJON only by writing
byte strings as Base64-encoded !!binary string and
special floating-point values as .inf, -.inf, and .nan:
[infinite, -infinite, nan, ("a" | tobytes), {"a": 1}] | tojson, toyaml ⟼
"[Infinity,-Infinity,NaN,b\"a\",{\"a\":1}]"
"[.inf,-.inf,.nan,!!binary YQ==,{\"a\":1}]"
jaq preserves invalid UTF-8 sequences in text strings when writing YAML. However, jaq yields an error when trying to parse YAML containing invalid UTF-8 sequences.
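The two differences from XJON described above (Base64 !!binary for byte strings and .inf/-.inf/.nan for special floats) can be mimicked in a few lines of Python. This is a sketch that produces compact, flow-style output like the toyaml example; it is not jaq's YAML writer, it does not model strings with invalid UTF-8, and the name to_yaml_like is hypothetical.

```python
import base64
import json
import math


def to_yaml_like(value) -> str:
    """Compact YAML rendering that differs from JSON only for bytes and special floats."""
    if isinstance(value, float) and math.isnan(value):
        return ".nan"
    if isinstance(value, float) and math.isinf(value):
        return ".inf" if value > 0 else "-.inf"
    if isinstance(value, (bytes, bytearray)):
        # byte strings become Base64-encoded !!binary scalars
        return "!!binary " + base64.b64encode(value).decode("ascii")
    if isinstance(value, list):
        return "[" + ",".join(to_yaml_like(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + ",".join(f"{to_yaml_like(k)}:{to_yaml_like(v)}"
                              for k, v in value.items()) + "}"
    return json.dumps(value)  # null, booleans, numbers, and strings as in JSON


print(to_yaml_like([math.inf, -math.inf, math.nan, b"a", {"a": 1}]))
# [.inf,-.inf,.nan,!!binary YQ==,{"a":1}]
```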
When using --to yaml, jaq writes
--- before every output value and
... after every output value.
This is done to indicate the start/end of YAML documents.
For example:
$ jaq --to yaml <<< '1 2'
---
1
...
---
2
...
Both --from yaml and the filter fromyaml
load the full input into memory before parsing it.
CBOR
CBOR (Concise Binary Object Representation) is a binary format specified in RFC 8949.
CBOR values are a superset of jaq values. That means that there are CBOR values for which there are no equivalent jaq values.
jaq fails when trying to decode such CBOR values.
Every jaq value can be encoded losslessly as a CBOR value, except for
text strings with invalid UTF-8 code units.
Invalid UTF-8 sequences are replaced with U+FFFD, which looks like this: “�”.
jaq writes sequences of CBOR values by concatenating them without any separator.
That means that --to cbor is equivalent to --to cbor --join-output.
jaq can also read sequences of concatenated CBOR values.
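To illustrate both the U+FFFD replacement and separator-free sequences, here is a minimal Python sketch that covers only two CBOR major types: unsigned integers (major type 0) and text strings (major type 3). It is not jaq's CBOR codec; the function names are hypothetical, and jaq text strings are modelled as Python bytes that may contain invalid UTF-8.

```python
def head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus its argument n (RFC 8949)."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | info]) + n.to_bytes(size, "big")
    raise ValueError("argument too large")


def encode_uint(n: int) -> bytes:
    return head(0, n)


def encode_text(raw: bytes) -> bytes:
    # Invalid UTF-8 sequences are replaced with U+FFFD before encoding
    utf8 = raw.decode("utf-8", errors="replace").encode("utf-8")
    return head(3, len(utf8)) + utf8


def decode_sequence(data: bytes) -> list:
    """Decode concatenated CBOR values (major types 0 and 3 only)."""
    values, pos = [], 0
    while pos < len(data):
        major, info = data[pos] >> 5, data[pos] & 0x1F
        pos += 1
        if info < 24:
            n = info
        elif info <= 27:
            size = 1 << (info - 24)  # 1, 2, 4, or 8 argument bytes
            if pos + size > len(data):
                raise ValueError("truncated argument")
            n = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("unsupported additional information")
        if major == 0:
            values.append(n)
        elif major == 3:
            if pos + n > len(data):
                raise ValueError("truncated text string")
            values.append(data[pos:pos + n].decode("utf-8"))
            pos += n
        else:
            raise ValueError(f"major type {major} not supported in this sketch")
    return values


# Three values written back to back, without any separator
stream = encode_uint(25) + encode_text(b"a\xffb") + encode_uint(1000)
print(decode_sequence(stream))  # [25, 'a\ufffdb', 1000]
```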
TOML
TOML is a configuration file format.
Compared to jaq values,
TOML has date-time values, but
TOML has neither
null,
byte strings, nor
non-string object keys.
When writing TOML, jaq converts invalid UTF-8 sequences as for CBOR.
XML
jaq reads data adhering to the XML 1.0 standard. However, it only supports XML data encoded as UTF-8.
jaq can read XHTML files, but it cannot directly read HTML files. You can use tools such as html2xhtml to convert HTML to XHTML.
Mappings between XML and JSON generally have to make a compromise between “friendliness” and round-tripping; see “Experiences with JSON and XML Transformations”. Here, “friendliness” means that JSON generated from XML has a flat structure, making it easy to consume. Stefan Goessner gives a nice discussion of different “friendly” mappings in “Converting Between XML and JSON”. The take-away message is: “Friendly” mappings lose information. For that reason, jaq does not use a “friendly” mapping, but rather a mapping that preserves XML information perfectly, making it suitable for round-tripping.
As an example, consider the following input:
<a href="https://www.w3.org">World Wide Web Consortium (<em>W3C</em>)</a>
We can see its internal representation in jaq by:
$ echo '<a href="https://www.w3.org">World Wide Web Consortium (<em>W3C</em>)</a>' | jaq --from xml .
This yields the following JSON:
{
"t": "a",
"a": { "href": "https://www.w3.org" },
"c": [
"World Wide Web Consortium (",
{ "t": "em", "c": [ "W3C" ] },
")"
]
}
TAC objects
Tags are represented by TAC objects. A TAC object may have the following fields:
- t: Name of the tag, such as h1 for <h1>...</h1>. This field must always be present in a TAC object.
- a: Attributes of the tag, such as {"id": "foo", "style": "color:blue;"}. If this field is present, it must contain an object with string values.
- c: Children of the tag. If this field is not present, this tag will be interpreted as self-closing (such as <br/>). When a TAC object produced by jaq (either via --from xml or fromxml) has the c field, it always holds an array of XML values. When writing XML values (either via --to xml or toxml), jaq accepts any XML value at the c field.
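For experimenting outside jaq, the TAC shape can be approximated with Python's xml.etree.ElementTree. This is a rough sketch, not jaq's XML reader: ElementTree drops comments, processing instructions, and the XML declaration by default, merges CDATA into ordinary text, reports namespaced tags as {uri}name, and cannot tell <br/> from <br></br> (the sketch omits c for any element without content). The function name to_tac is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_tac(elem: ET.Element) -> dict:
    """Convert an element into a TAC-like dict with keys t, a, and c."""
    tac = {"t": elem.tag}
    if elem.attrib:
        tac["a"] = dict(elem.attrib)
    children = []
    if elem.text:
        children.append(elem.text)       # text before the first child element
    for child in elem:
        children.append(to_tac(child))
        if child.tail:
            children.append(child.tail)  # text following this child element
    if children:
        tac["c"] = children
    return tac


src = '<a href="https://www.w3.org">World Wide Web Consortium (<em>W3C</em>)</a>'
print(to_tac(ET.fromstring(src)))
```

For the example input from above, this yields a dict with the same content as the JSON that jaq --from xml produces.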
An example query to obtain all links in an XHTML file:
.. | select(.t? == "a") | .a.href
We can also transform input XML and yield output XML.
For example, to transform all em tags to i tags:
(.. | select(.t? == "em") | .t) = "i"
To yield XML output instead of JSON output, use the option --to xml:
$ echo '<a href="https://www.w3.org">World Wide Web Consortium (<em>W3C</em>)</a>' | jaq --from xml --to xml '(.. | select(.t? == "em") | .t) = "i"'
<a href="https://www.w3.org">World Wide Web Consortium (<i>W3C</i>)</a>
Finally, we can extract all text from an XML file (discarding CDATA blocks):
def xml_text: if isstring then . else .c[]? | xml_text end; [xml_text]
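The three queries above translate into small recursive Python functions over TAC-shaped data. This sketch assumes the value has already been loaded as dicts, lists, and strings (for example via json.loads on the output of jaq --from xml); descendants, links, rename, and xml_text are hypothetical names chosen to mirror the jq filters.

```python
def descendants(value):
    """Yield value and everything nested in it, in pre-order (like jq's `..`)."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from descendants(child)
    elif isinstance(value, list):
        for child in value:
            yield from descendants(child)


def links(value):
    # .. | select(.t? == "a") | .a.href
    return [(v.get("a") or {}).get("href") for v in descendants(value)
            if isinstance(v, dict) and v.get("t") == "a"]


def rename(value, old, new):
    # (.. | select(.t? == old) | .t) = new, returning a modified copy
    if isinstance(value, list):
        return [rename(v, old, new) for v in value]
    if isinstance(value, dict):
        out = {k: rename(v, old, new) for k, v in value.items()}
        if out.get("t") == old:
            out["t"] = new
        return out
    return value


def xml_text(value):
    # def xml_text: if isstring then . else .c[]? | xml_text end
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict) and isinstance(value.get("c"), list):
        for child in value["c"]:  # comments and CDATA have no c and are skipped
            yield from xml_text(child)


doc = {"t": "a", "a": {"href": "https://www.w3.org"},
       "c": ["World Wide Web Consortium (", {"t": "em", "c": ["W3C"]}, ")"]}
print(links(doc))                 # ['https://www.w3.org']
print("".join(xml_text(doc)))     # World Wide Web Consortium (W3C)
```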
Other values
- Strings are neither escaped nor unescaped; that is, Tom &amp; Jerry in the source XML becomes "Tom &amp; Jerry" in the target JSON. The @html/@htmld filters can be used for manual (un-)escaping.
- A comment such as <!-- this comment --> is converted to {"comment": " this comment "}.
- A CDATA block such as <![CDATA[Tom & Jerry]]> is converted to {"cdata": "Tom & Jerry"}.
- An XML declaration such as <?xml version="1.0" encoding="UTF-8" standalone="yes"?> is converted to {"xmldecl": {"version": "1.0", "encoding": "UTF-8", "standalone": "yes"}}. (Note that the values given in this declaration, such as the encoding, are ignored by jaq’s XML parser.)
- A processing instruction such as <?xml-stylesheet href="common.css"?> is converted to {"pi": {"target": "xml-stylesheet", "content": "href=\"common.css\""}}.
To put all of this together, consider the following XML file (examples/test.xhtml):
<?xml version='1.0'?>
<?xml-stylesheet href="common.css"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<!-- CDATA blocks do not require escaping -->
<![CDATA[Hello & goodbye!]]><br/>
</body>
</html>
Running jaq . examples/test.xhtml yields the following output:
{
"xmldecl": {
"version": "1.0"
}
}
{
"pi": {
"target": "xml-stylesheet",
"content": "href=\"common.css\""
}
}
{
"doctype": {
"name": "html"
}
}
{
"t": "html",
"a": {
"xmlns": "http://www.w3.org/1999/xhtml"
},
"c": [
"\n ",
{
"t": "body",
"c": [
"\n ",
{
"comment": " CDATA blocks do not require escaping "
},
"\n ",
{
"cdata": "Hello & goodbye!"
},
{
"t": "br"
},
"\n "
]
},
"\n"
]
}
The output contains several values consisting only of whitespace, such as "\n ".
These are preserved by jaq because XML is a whitespace-sensitive format.
Examples
The following examples should give an impression of what jaq can currently do. You should obtain the same outputs by replacing jaq with jq.
Access a field:
$ echo '{"a": 1, "b": 2}' | jaq '.a'
1
Add values:
$ echo '{"a": 1, "b": 2}' | jaq 'add'
3
Construct an array from an object in two ways and show that they are equal:
$ echo '{"a": 1, "b": 2}' | jaq '[.a, .b] == [.[]]'
true
Apply a filter to all elements of an array and filter the results:
$ echo '[0, 1, 2, 3]' | jaq 'map(.*2) | [.[] | select(. < 5)]'
[0, 2, 4]
Read (slurp) input values into an array and get the average of its elements:
$ echo '1 2 3 4' | jaq -s 'add / length'
2.5
Repeatedly apply a filter to itself and output the intermediate results:
$ echo '0' | jaq '[recurse(.+1; . < 3)]'
[0, 1, 2]
Lazily fold over inputs and output intermediate results:
$ seq 1000 | jaq -n 'foreach inputs as $x (0; . + $x)'
1 3 6 10 15 [...]
Lewis’s Puzzle
The following puzzle was communicated to me at a workshop by a certain Mr. Lewis, where I solved it together with him in jq. It goes as follows:
We have a sequence of strings:
X
XYZX
XYZXABXYZX
For example, the 4th letter of the 2nd string (always counting from zero) is ‘A’. What is the 10244th letter of the 30th string?
First, let us understand how this sequence is built. To get the next sequence of letters, we take the previous sequence, concatenate it with the next two letters in the alphabet, then concatenate it with the previous sequence again.
If we take numbers instead of letters, we can write this down as:
X(0 ) = 0
X(N+1) = X(N) (m+1) (m+2) X(N), where m is the largest element in X(N)
We can now write the strings as JSON arrays.
The first array is [0], and we can produce each following array by
a filter next, on which we recurse to get a sequence of all arrays.
def next: . + [max + (1, 2)] + .;
[0] | limit(3; recurse(next)) ⟼
[0]
[0,1,2,0]
[0,1,2,0,3,4,0,1,2,0]
However, this does not scale well — getting to the 30th array will take a very long time, because the arrays grow exponentially. Feel free to try it, but watch out not to get your RAM eaten. :) (I recommend monitoring RAM usage when doing this experiment. Otherwise, you may very well crash your computer due to memory exhaustion. Guess how I know?)
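The exponential growth can be quantified. Writing L(N) for the number of elements of X(N), the definition above gives a recurrence whose closed form follows by adding 2 to both sides:

```latex
\begin{aligned}
L(0) &= 1, \qquad L(N+1) = 2\,L(N) + 2,\\
L(N+1) + 2 &= 2\,\bigl(L(N) + 2\bigr), \quad L(0) + 2 = 3
  \;\Longrightarrow\; L(N) = 3 \cdot 2^{N} - 2,\\
L(30) &= 3 \cdot 2^{30} - 2 = 3\,221\,225\,470.
\end{aligned}
```

For example, L(1) = 4 and L(2) = 10 match the arrays shown above, and the 30th array holds more than three billion numbers.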
To solve this problem, we can exploit the fact that jq shares values instead of copying them.
Note that each array contains
a portion to the left that is equal to
a portion on the right;
for example, [0, 1, 2, 0] in the 2nd array.
We can therefore choose a slightly different array representation that
allows us to share all the equal parts of the array, just by
inserting the previous arrays into a new array.
def next: [., .[2] + (1,2), .];
[0] | limit(3; recurse(next)) ⟼
[0]
[[0],1,2,[0]]
[[[0],1,2,[0]],3,4,[[0],1,2,[0]]]
In all arrays produced by next, the first and the last elements are now
shared, meaning that they are stored only a single time in memory.
That allows us to store exponentially large data in linear memory,
thus cracking the puzzle.
To get the largest number of the previous array, we used the fact that
the 2nd element of each array contains the largest number in the array.
For example,
the 2nd element of the 1st array is 2, and
the 2nd element of the 2nd array is 4.
(For the 0th array, the 2nd element is null,
but the maximum element of that array is 0.
However, null is interpreted by addition just like 0,
so this difference does not matter to us.)
We can therefore get the two next largest numbers very elegantly via
.[2] + (1, 2).
We can now get the numbers of any such array with .. | numbers.
For example, the numbers of the 2nd array are:
[[[0],1,2,[0]],3,4,[[0],1,2,[0]]] | .. | numbers ⟼
0 1 2 0 3 4 0 1 2 0
Putting all this together, we get our solution via:
def next: [., .[2] + (1,2), .];
[0] | nth(30; recurse(next)) | nth(10244; .. | numbers) ⟼
2
This now runs almost instantaneously, and gives us the answer 2.
Going back to the original puzzle, because X = 0, Y = 1, Z = 2,
the final answer to the puzzle is Z.
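The same sharing trick works in any language whose containers hold references. Here is a Python transcription of the jq solution, offered as an illustration rather than a reference implementation: step and numbers are hypothetical names mirroring next and .. | numbers, the traversal is lazy so that only the first 10245 numbers are visited, and letter assumes that the alphabet wraps around (X, Y, Z, A, B, ...).

```python
from itertools import islice


def step(prev):
    """Python version of `def next: [., .[2] + (1,2), .];`."""
    m = prev[2] if len(prev) > 2 else 0  # jq's .[2] is null for [0], and null + 1 == 1
    return [prev, m + 1, m + 2, prev]    # prev is referenced twice, not copied


def numbers(value):
    """Lazily yield all numbers in pre-order, like `.. | numbers`."""
    if isinstance(value, list):
        for item in value:
            yield from numbers(item)
    else:
        yield value


def letter(k):
    """Map 0, 1, 2, 3, ... to X, Y, Z, A, ..."""
    return chr(ord("A") + (ord("X") - ord("A") + k) % 26)


x = [0]
for _ in range(30):
    x = step(x)  # memory grows linearly: each level is one small list
answer = next(islice(numbers(x), 10244, None))
print(answer, letter(answer))  # 2 Z
```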
HTML scraping
I wanted to extract the
list of examples from the CBOR specification,
in order to create a test suite for the CBOR encoder/decoder in jaq.
For this, I copied the relevant section from the HTML source code and
pasted it into examples/cbor-examples.xhtml.
The interesting parts look like this:
<table>
<tbody>
<tr>
<td class="text-left" rowspan="1" colspan="1">25</td>
<td class="text-left" rowspan="1" colspan="1">0x1819</td>
</tr>
<tr>
<td class="text-left" rowspan="1" colspan="1">100</td>
<td class="text-left" rowspan="1" colspan="1">0x1864</td>
</tr>
</tbody>
</table>
Here, I was interested in the pairs 25 / 0x1819 and 100 / 0x1864,
meaning that the number 25 is encoded in CBOR as 0x1819.
Finally, I came up with a jq program to extract this data. It consists of the following tasks:
- Select all <tr> elements with .. | select(.t? == "tr").
- Get its children with .c.
- Iterate over the children with .[] and get their children with .c?[].
(I use .c? here instead of .c because XML is whitespace-sensitive, so in
<tr> <td></td> </tr>, the <tr> element actually has three children, namely
two space strings " " and the <td> element.
Indexing the strings with .c yields an error, whereas
indexing them with .c? just yields nothing, allowing us to ignore the space.)
We can see the effects of that on a slightly simplified version of the HTML:
"<table><tr> <td>25</td> <td>0x1819</td> </tr><tr> <td>100</td> <td>0x1864</td> </tr></table>" | fromxml |
.. | select(.t? == "tr").c | [.[].c?[]] ⟼
[ "25", "0x1819"]
["100", "0x1864"]
To run this via the CLI:
jaq '.. | select(.t? == "tr").c | [.[].c?[]]' examples/cbor-examples.xhtml
We can create a series of tests as follows:
jaq '.. | select(.t? == "tr").c | [.[].c?[]] | @json "jc(\(.[0]), \(.[1][2:]));"' examples/cbor-examples.xhtml -r
I used exactly this command to create a draft for jaq’s CBOR parsing test suite.
Comments
A comment starts with
# and ends with the first newline that is not preceded by an odd number of backslashes (\).
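The termination rule for comments can be checked with a small Python sketch that strips comments from a program. It simplifies under an explicit assumption: it knows nothing about string literals, so a # inside a string would wrongly start a comment. The function name strip_comments and the sample program are hypothetical.

```python
def strip_comments(src: str) -> str:
    """Remove #-comments, keeping the newline that ends each comment."""
    out, i = [], 0
    while i < len(src):
        if src[i] != "#":
            out.append(src[i])
            i += 1
            continue
        backslashes = 0  # length of the current run of backslashes
        i += 1
        while i < len(src):
            c = src[i]
            if c == "\n" and backslashes % 2 == 0:
                break  # newline not escaped: the comment ends here
            backslashes = backslashes + 1 if c == "\\" else 0
            i += 1
    return "".join(out)


# The first comment line ends with one backslash (odd), so the comment
# continues; the second ends with two backslashes (even), so it stops there.
program = "[1, 2, # comment \\\ncontinued comment \\\\\n3, 4, 5]"
print(strip_comments(program))
```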