jaq manual

Michael Färber

jaq is an interpreter for the jq programming language. It is designed to be usable as a drop-in replacement for the jq program, the reference interpreter for the jq language, which is written in C.

Written in Rust, jaq focuses on correctness, high performance, and simplicity. In addition, jaq adds some functionality not present in jq:

This manual aspires to:

If this manual falls short of these goals, please open an issue or create a pull request. The same holds if you wish to propose new functionality for jaq. This project thrives on your contributions!

The creation of this manual was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission’s Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429. Additional funding is made available by the Swiss State Secretariat for Education, Research and Innovation (SERI).

This manual uses “compatibility” blocks to point out occasions where jaq diverges from jq. In this manual, “jq” on its own refers to the C implementation, whereas “the jq language” refers to the language that both jq and jaq implement.

These “advanced” blocks are a kind of “making of” for this manual. They document experiments I made or ideas I had during the writing of the manual. As such, they are not essential for understanding the jq language. However, they might be useful if you wish to embark on a journey to become a master of the jq language.

To satisfy your hunger for an even deeper understanding of jq semantics, my jq language specification should provide sufficient material.

Feel free to skip these “advanced” blocks if you do not seek enlightenment.

Command-line interface

Running jaq [OPTION]… [FILTER] [FILE]… performs the following steps:

  • Parse FILTER as a jq program; see jq language
  • For each FILE:
    • Parse FILE to a stream of values
    • For each input value in the file:
      • Run FILTER on the input value and print its output values
  • If an uncaught error is encountered at any point, jaq stops.

For example, jaq '.name?' persons.json parses the filter .name?, then reads all values in persons.json one-by-one. It then executes the filter .name? on each of the values and prints the outputs of the filter as JSON.

Input

If no FILE is given, jaq reads from standard input. jaq determines the format to parse a FILE as follows:

  • If --from FORMAT is used, jaq uses that format.
  • Otherwise, if FILE has a file extension known by jaq, such as .json, .yaml, .cbor, .toml, .xml, jaq uses the corresponding format.
  • Otherwise, jaq assumes JSON.
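The three rules above can be modelled in a few lines of Python. This is only a sketch: the function name input_format is hypothetical, standard input is represented by None, and both mapping .xhtml to xml and matching extensions case-sensitively are assumptions rather than documented jaq behaviour.

```python
from pathlib import PurePath
from typing import Optional

# Extensions listed in this section. Mapping ".xhtml" to "xml" and matching
# extensions case-sensitively are assumptions of this sketch.
KNOWN_EXTENSIONS = {
    ".json": "json",
    ".yaml": "yaml",
    ".cbor": "cbor",
    ".toml": "toml",
    ".xml": "xml",
    ".xhtml": "xml",
}


def input_format(path: Optional[str], from_format: Optional[str] = None) -> str:
    """Pick the input format for one FILE (None stands for standard input)."""
    # Rule 1: an explicit --from FORMAT always wins.
    if from_format is not None:
        return from_format
    # Rule 2: otherwise, use the format belonging to a known file extension.
    if path is not None:
        fmt = KNOWN_EXTENSIONS.get(PurePath(path).suffix)
        if fmt is not None:
            return fmt
    # Rule 3: otherwise, assume JSON.
    return "json"
```

For example, input_format("myfile.yml", "yaml") returns "yaml", because --from takes precedence over any file extension.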

--from FORMAT

Interpret all input files as FORMAT. For example, jaq --from yaml . myfile.yml parses myfile.yml as YAML. Possible values of FORMAT include: raw, json, yaml, cbor, toml, xml.

jaq automatically chooses the corresponding input format for files with the extensions .json, .yaml, .cbor, .toml, .xml, .xhtml. That means that jaq --from cbor . myfile.cbor is equivalent to jaq . myfile.cbor.

jq does not have this option.

-n, --null-input

Feed null as input to the main program, ignoring any input files. For example, yes | jaq -n yields null, which shows that jaq indeed does not read any input.

The inputs can still be obtained via the inputs filter; for example, yes true | jaq -n 'first(inputs)' yields true. This can be useful to fold over all inputs with reduce / foreach.

-R, --raw-input

Read lines of the input as a sequence of strings. For example, echo -e "Hello\nWorld" | jaq -R yields two outputs: "Hello" and "World".

When combined with --slurp, this yields the whole input as a single string. For example, echo -e "Hello\nWorld" | jaq -Rs yields "Hello\nWorld\n". See --rawfile.

This is equivalent to --from raw.

-s, --slurp

Read (slurp) all input values into one array. For example, jaq -s <<< "1 2 3" yields a single output, namely the array [1, 2, 3], whereas jaq <<< "1 2 3" yields three outputs, namely 1, 2, and 3.

When combined with --raw-input, jaq reads the full input as a single string. For example, jaq -Rs <<< "1 2 3" yields the single output "1 2 3\n". See --rawfile.

When multiple files are slurped in, jq combines the inputs of all files into one single array, whereas jaq yields an array for every file. This is motivated by jaq’s --in-place option, which could not work with the behaviour implemented by jq. The behaviour of jq can be approximated in jaq; for example, to achieve the output of jq -s . a b, you may use jaq -s . <(cat a b).
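A small Python model makes the difference concrete. Here, each input file is represented as the list of values parsed from it; the function names are hypothetical.

```python
from typing import Any, List

# Each input file is modelled as the list of values parsed from it.
Files = List[List[Any]]


def slurp_like_jq(files: Files) -> List[List[Any]]:
    """jq: the values of all files end up in one single array."""
    return [[value for values in files for value in values]]


def slurp_like_jaq(files: Files) -> List[List[Any]]:
    """jaq: one array per file, so each file gets its own output."""
    return [list(values) for values in files]
```

With jaq -s . <(cat a b), there is only a single input, so both models produce the same single array.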

Output

--to FORMAT

Print all output values in the given FORMAT. Any FORMAT accepted by --from can be used here.

Note that not every value can be printed in every format. For example, TOML requires that the root value is an object, so jaq --to toml <<< [] yields an error.

jq does not have this option.

-c, --compact-output

Print JSON compactly, omitting whitespace. For example, jaq -c <<< '[1, 2, 3]' yields the output [1,2,3].

-r, --raw-output

Write (text and byte) strings without escaping them and without surrounding them with quotes. For example, jaq -r <<< '"Hello\nWorld"' outputs two lines: Hello and World, whereas jaq <<< '"Hello\nWorld"' outputs a single line: "Hello\nWorld".

This does not impact strings contained inside other values, i.e. arrays and objects. For example, jaq -r <<< '["Hello\nWorld"]' outputs ["Hello\nWorld"].

This is equivalent to --to raw.

-j, --join-output

Do not print a newline after each value. For example, jaq -j <<< 'true false' yields the output truefalse, without trailing newline.

This is particularly useful in combination with --raw-output (-r); for example, jaq -jr <<< '"Hello" " " "World" "\n"' yields the output Hello World (with trailing newline).

-i, --in-place

Overwrite the input file with its output. For example, jaq -i . myfile.json reads the file myfile.json and overwrites it with a formatted version of it. Note that the input file is overwritten only once all outputs have been produced, and only if no error has occurred.

jq does not have this option.

-S, --sort-keys

Print objects sorted by their keys. For example, jaq -Sc <<< '{"b": {"d": 3, "c": 2}, "a": 1}' yields {"a":1,"b":{"c":2,"d":3}}, whereas jaq -c <<< '{"b": {"d": 3, "c": 2}, "a": 1}' yields {"b":{"d":3,"c":2},"a":1}.

-C, --color-output

Always color output, even if jaq does not print to a terminal. For example, jaq -C <<< '{}' | jaq --from raw tobytes yields the byte string b"\x1b[1m{\x1b[0m\x1b[1m}\x1b[0m", containing ANSI color sequences, whereas jaq <<< '{}' | jaq --from raw tobytes yields b"{}". (Here, jaq --from raw tobytes prints a byte representation of its input.)

-M, --monochrome-output

Do not color output.

--tab

Use tabs for indentation rather than spaces.

For example, jaq --tab <<< '[1, [2]]' | jaq -Rs yields "[\n\t1,\n\t[\n\t\t2\n\t]\n]\n", whereas jaq <<< '[1, [2]]' | jaq -Rs yields "[\n  1,\n  [\n    2\n  ]\n]\n".

--indent N

Use N spaces for indentation (default: 2).

Compilation

If no FILTER is given, jaq uses . (the identity filter) as filter.

When passing filters directly as the FILTER argument on the command line, care has to be taken to properly escape the filter. How to do this varies from platform to platform, but on Unix-like systems, surrounding the filter with single quotes (') and replacing occurrences of ' in filters by '\'' suffices. For example, to run the filter "'", which produces a string containing a single quote, you can use jaq -n '"'\''"'.
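The quoting rule for Unix-like shells can be written as a small Python function. The name shell_single_quote is hypothetical, and the sketch assumes a POSIX-style shell, in which no character inside single quotes is special.

```python
def shell_single_quote(filter_text: str) -> str:
    """Quote a jq filter as one argument for a POSIX-style shell."""
    # Inside '...' the shell treats every character literally, except ' itself.
    # Each ' is therefore written as '\'' : close the quotes, add an escaped
    # quote, and reopen the quotes.
    return "'" + filter_text.replace("'", "'\\''") + "'"
```

Applied to the filter "'", this produces '"'\''"', which is the argument used in the jaq -n example above.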

Running filters that start with the negation operator, such as jaq '-1', fails because - is interpreted as start of a command-line switch rather than negation. You can remedy this by using jaq -- '-1' instead, or by surrounding the filter in parentheses, i.e. jaq '(-1)'.

-f, --from-file

Read the filter from the file given by the FILTER argument.

With this option, jaq interprets the FILTER argument as the name of a file containing the filter. Note that the file name need not directly follow this option. For example, jaq --from-file -n script.jq uses the contents of the file script.jq as filter.

-L, --library-path DIR

Search for modules and data in given directory.

jaq searches for modules and data in a set of directories called “search paths”. Using --library-path adds a new directory to the global search paths.

For example, jaq -L . -L .. 'include "script"; foo' looks for script.jq first in the current directory, then in the parent directory.

If --library-path is not given, the following global search paths are used:

  • ~/.jq
  • $ORIGIN/../lib/jq
  • $ORIGIN/../lib

See the modules section for more details.

Variables

--arg A V

Set variable $A to string V.

For example, jaq --arg name "John Doe" -n '"Welcome, " + $name' yields "Welcome, John Doe".

--argjson A V

Set variable $A to JSON value V.

For example, jaq --argjson song '{"name": "One of Us", "artist": "ABBA", "year": 1981}' -n '"Currently playing: \($song.name) (\($song.year))"' yields "Currently playing: One of Us (1981)".

If V contains more than a single value, e.g. 1 2, then jaq yields an error.

--slurpfile A F

Set variable $A to array containing the JSON values in file F.

For example, if values.json contains 1 2 3, then jaq --slurpfile xs values.json -n '$xs' yields [1, 2, 3].

--rawfile A F

Set variable $A to string containing the contents of file F.

jaq tries to load the file via memory mapping, which takes constant time and allows loading files that do not fit into memory. If this fails, jaq loads the file regularly, which takes linear time. This is also what happens when using -Rs (--raw-input and --slurp) to load a file (as opposed to standard input).

Unlike jq, jaq does not verify that the file is valid UTF-8. That permits loading arbitrary binary files; these can be processed as byte strings via tobytes.

--args

Collect remaining positional arguments into $ARGS.positional.

If this option is given, then all further arguments that would have been interpreted as input files are instead collected into an array at $ARGS.positional.

For example, if the file input.json exists, then jaq '$ARGS.positional' input.json --args foo -n bar -- baz -c qux yields ["foo", "bar", "baz", "-c", "qux"]. Note that here, input.json and -n are not collected into the array — the former because it comes before --args, and the latter because it would not have been interpreted as input file. However, -c is collected into the array because it comes after --, which leads every argument after it to be interpreted as input file.
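The collection rule can be approximated by the following Python model. It is a simplification: it treats every argument starting with - as an option that takes no value (so options such as --arg A V are not handled), and the function name split_args is hypothetical.

```python
from typing import List, Optional, Tuple


def split_args(argv: List[str]) -> Tuple[Optional[str], List[str], List[str], List[str]]:
    """Split arguments into (filter, options, input files, $ARGS.positional)."""
    filter_text: Optional[str] = None
    options: List[str] = []
    files: List[str] = []
    positional: List[str] = []
    collecting = False    # becomes True once --args has been seen
    after_dashes = False  # becomes True once -- has been seen
    for arg in argv:
        if not after_dashes and arg == "--":
            after_dashes = True
        elif not after_dashes and arg == "--args":
            collecting = True
        elif not after_dashes and arg.startswith("-"):
            options.append(arg)       # simplification: options take no value
        elif filter_text is None:
            filter_text = arg         # the first non-option argument is FILTER
        elif collecting:
            positional.append(arg)    # would otherwise have been an input file
        else:
            files.append(arg)
    return filter_text, options, files, positional
```

Fed the arguments of the example above, the model collects ["foo", "bar", "baz", "-c", "qux"], keeps input.json as input file, and treats -n as an option. It also reproduces jaq -- '-1', where -1 becomes the filter.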

Miscellanea

-e, --exit-status

Use the last output value as exit status code.

This enables the use of the exit codes 1 and 4, which are not used otherwise.

jaq uses the following exit codes:

  • 0: No errors.
  • 1: The last output value is false or null.
  • 2: I/O or CLI error, e.g. file not found or unknown CLI option.
  • 3: Filter parse/compilation error.
  • 4: The filter did not yield any output.
  • 5: Any other error, e.g. call to the filter error.

The filters halt and halt_error can be used to exit jaq with arbitrary exit codes.

For example:

$ jaq -n empty; echo $?
0
$ jaq -n false >/dev/null; echo $?
0
$ jaq -en false >/dev/null; echo $?
1
$ jaq . does_not_exist.json 2>/dev/null; echo $?
2
$ jaq --foo 2>/dev/null; echo $?
2
$ jaq '+' 2>/dev/null; echo $?
3
$ jaq -en empty; echo $?
4
$ jaq -n error 2>/dev/null; echo $?
5
$ jaq -n 'halt(9)'; echo $?
9
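The exit-code table can be summarised as a Python function that mirrors the session above. This is a sketch: the error categories "io", "parse", and "other" are hypothetical labels for the rows of the table, halt and halt_error are not modelled, and letting errors take precedence over -e is an assumption.

```python
from typing import List, Optional

# Hypothetical labels for the error rows of the exit-code table.
ERROR_CODES = {"io": 2, "parse": 3, "other": 5}


def exit_code(outputs: List[object], exit_status: bool, error: Optional[str] = None) -> int:
    """Exit code for a run that produced `outputs` (halt/halt_error not modelled)."""
    if error is not None:
        return ERROR_CODES[error]      # assumption: errors take precedence over -e
    if exit_status:                    # -e / --exit-status
        if not outputs:
            return 4                   # the filter yielded no output
        if outputs[-1] is None or outputs[-1] is False:
            return 1                   # the last output is null or false
    return 0
```

Note that the check uses identity with False, so an output of 0 does not count as false, in line with the boolean values of jq.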

-h, --help

Print summary of CLI options.

-V, --version

Print jaq version.

Unsupported

The following command-line options are supported by jq, but not by jaq:

  • --ascii-output, -a
  • --raw-output0
  • --unbuffered
  • --stream
  • --stream-errors
  • --seq
  • --jsonargs

Core language

The jq language is a lazy, functional streaming programming language originally designed by Stephen Dolan. The jq language is Turing-complete and can therefore be used to write any program that can be written in any other programming language. jq programs can be executed with several interpreters, including jq, gojq, fq, and jaq.

A program written in the jq language is called a jq program or filter. A filter is a function that takes an input value and yields a stream of output values.

The stream of output values can be infinite; for example, the jq filter repeat("Hi") yields an infinite sequence of strings "Hi".

The following sections document all filters with built-in syntax in jq. Examples are written like 1 + 2, true or false ⟼ 3 true, which means that running jaq -n '1 + 2, true or false' yields the outputs 3 and true.

An atomic filter is a filter that is not a binary operator. For example, the filters -1, map(.+1), and .[0] are all atomic, whereas the filters 1 + 2, explode | .[], and .[0] += 1 are all non-atomic. To turn a non-atomic filter f into an equivalent atomic filter, surround it with parentheses, i.e. (f).

Values

This section lists all potential values that jq filters can process, and how to produce them.

jaq extends the set of values that jq can process with byte strings and objects with non-string keys. Where jq reads and writes values by default as JSON, jaq reads and writes values by default as XJON, which is an extension of JSON. See those sections for how jaq serialises values.

null

The filter null returns the null value.

The null value can also be obtained in various other ways, such as indexing a non-existent element of an array or object, e.g. [] | .[0] ⟼ null or {} | .a ⟼ null.

Booleans

The filters true and false return the boolean values true and false. Booleans can also be produced by comparison operations, e.g. 0 == 0 ⟼ true or [] == {} ⟼ false.

Every jq value can be mapped to a boolean value: null and false have the boolean value false, and all other values have the boolean value true. This is important for filters such as if-then-else and //.

You can get the boolean value of a value by not | not; i.e. "" | not | not ⟼ true.

Numbers

Numbers are filters that return their corresponding value, e.g. 0 ⟼ 0, 3.14 ⟼ 3.14, and 2.99e6 ⟼ 2.99e6. Negative numbers can be constructed by applying the negation operator to a number, e.g. -1 ⟼ -1.

Internally, jaq distinguishes integers, floating-point numbers (floats), and decimal numbers:

  • A number without a dot (.) and without an exponent (e/E) is losslessly stored as integer. jaq can store and calculate with integers of arbitrary size, e.g. 340282366920938463463374607431768211456 (2^128).
  • Any non-integer number is initially stored as a decimal number, which is a string representation of the number. That means that jaq losslessly preserves any number matching the number syntax given at the end of this section, such as 1.0e500, if it occurs in a JSON file or jq filter.
  • When calculating with a decimal number, jaq converts it transparently to a 64-bit IEEE-754 floating-point number. For example, 1.0e500 + 1 ⟼ Infinity, because jaq converts 1.0e500 to the closest floating-point number, which is Infinity.

The rules of jaq are:

  • The sum, difference, product, and remainder of two integers is an integer, e.g. 1 + 2 ⟼ 3.
  • Any other operation between two numbers yields a float, e.g. 10 / 2 ⟼ 5.0 and 1.0 + 2 ⟼ 3.0.
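The following Python sketch models these rules, with Python's int standing in for jaq's arbitrary-size integers, float for 64-bit floats, and str for decimal numbers that have not yet been used in a calculation. The remainder operator is left out, since its sign convention for negative operands is not described here, and the function name arith is hypothetical.

```python
import operator

# Operators whose result stays an integer when both operands are integers.
# The remainder operator % is left out of this sketch.
INTEGER_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def arith(op: str, x, y):
    """Model of jaq's result type for x op y.

    int   : integer (arbitrary size, like Python's int)
    float : 64-bit floating-point number
    str   : decimal number that has not yet been used in a calculation
    """
    if op in INTEGER_OPS and type(x) is int and type(y) is int:
        return INTEGER_OPS[op](x, y)   # exact integer result
    # Everything else is computed on floats; float("1.0e500") is infinity.
    fx, fy = float(x), float(y)
    if op == "/":
        return fx / fy
    return INTEGER_OPS[op](fx, fy)
```

In this model, arith("*", 2**64, 2**64) gives the exact integer 2^128, whereas arith("+", "1.0e500", 1) gives infinity, matching the examples above.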

You can convert an integer to a floating-point number e.g. by adding 0.0, by multiplying by 1.0, or by dividing by 1. You can convert a floating-point number to an integer by round, floor, or ceil, e.g. 1.2 | floor, round, ceil ⟼ 1 1 2.

jq uses floats for any number, meaning that it does not distinguish integers from floats. Many operations in jaq, such as array indexing, check whether the passed numbers are indeed integers. The motivation behind this is to avoid rounding errors that may silently lead to wrong results. For example, [0, 1, 2] | .[1] ⟼ 1, whereas [0, 1, 2] | .[1.0000000000000001] yields an error in jaq as opposed to 1 in jq.

Furthermore, jq prints NaN as null; e.g. nan | tojson yields null. In contrast, jaq prints NaN as NaN; e.g. nan | tojson ⟼ "NaN". See the XJON section for details.

A number corresponds to the regular expression [0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?.

Text strings

A text string is an array of bytes that can be constructed using the syntax "...". Here, ... may contain any UTF-8 characters in the range from U+0020 to U+10FFFF, excluding '"' and '\'. For example, "Hello 東京!" ⟼ "Hello 東京!".

Furthermore, ... may contain the following escape sequences:

  • \b, \f, \n, \r, or \t
  • \" or \\
  • \uHHHH, where HHHH is a hexadecimal number
  • \(f), where f is a jq filter (string interpolation)

Most characters below U+0020 can only be produced via a Unicode escape sequence, e.g. "NUL = \u0000" ⟼ "NUL = \u0000".

A string containing an interpolated filter, such as "...\(f)...", is equivalent to "..." + (f | tostring) + "...". For example:

  • "Hello \("Alice", "Bob") 😀!\n" ⟼ "Hello Alice 😀!\n" "Hello Bob 😀!\n"
  • 41 | "The successor of \(.) is \(.+1)." ⟼ "The successor of 41 is 42.".

A string is identifier-like if it matches the regular expression [a-zA-Z_][a-zA-Z_0-9]*.

Strings may be prefixed with @x, where x is an identifier; e.g. @uri. In such a case, the filters of interpolated strings are piped through @x instead of tostring, i.e. @x "...\(f)..." is equivalent to @x "..." + (f | @x) + @x "...". When "..." does not contain any interpolated filter, then @x "..." is equivalent to "...". For example, "-[]?" | @uri "https://gedenkt.at/jaq/?q=\(.)" ⟼ "https://gedenkt.at/jaq/?q=-%5B%5D%3F".
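Interpolation with a format can be modelled in Python by splitting a string into literal parts and interpolated values, and sending only the latter through the format. This sketch rests on assumptions: the hypothetical Interp wrapper marks interpolated values, @uri is approximated by urllib.parse.quote with safe="" (which percent-encodes everything except letters, digits, and -_.~), and Python's str stands in for tostring, which roughly agrees only for integers and strings.

```python
from typing import Any, Callable, List, Union
from urllib.parse import quote


class Interp:
    """Marks a value produced by an interpolated filter."""

    def __init__(self, value: Any) -> None:
        self.value = value


Part = Union[str, Interp]


def render(parts: List[Part], fmt: Callable[[Any], str] = str) -> str:
    """Literal parts are kept as-is; only interpolated values go through fmt."""
    return "".join(fmt(p.value) if isinstance(p, Interp) else p for p in parts)


def uri(value: Any) -> str:
    """Approximation of @uri: percent-encode all but A-Z a-z 0-9 - _ . ~"""
    return quote(str(value), safe="")
```

With the @uri example above, the literal ?, /, and : of the URL stay untouched, because only the interpolated value "-[]?" passes through uri.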

Byte strings

A byte string is an array of bytes that is not interpreted as (UTF-8) text. It can be produced from a text string via the filter tobytes, e.g. "Hello, world! 🙂" | tobytes ⟼ b"Hello, world! \xf0\x9f\x99\x82". Currently, there is no native syntax in jaq to produce a byte string directly.

Byte strings differ from text strings in a few regards; in particular, they can be indexed and sliced in constant time. That makes byte strings interesting e.g. for parsing binary formats.

For compatibility reasons, jaq considers both text strings and byte strings as strings. That means that "Hello" | isstring and (tobytes | isstring) ⟼ true. Furthermore, a text string and a byte string that contain equivalent bytes are considered equal, e.g. "Hello" | . == tobytes ⟼ true.

Byte strings do not exist in jq; however, they exist in fq.

To find out whether a value is a byte string, we can use:

def isbytes: isstring and try (.[0] | true) catch false;
"Hello" | ., tobytes | isbytes  ⟼  false true

This works because we can index byte strings, but not text strings.

Arrays

An array is a finite sequence of values.

An empty array can be constructed with the filter [], which is a short form for [empty] ⟼ []. An array of values can be constructed using the syntax [f], where f is a filter. The filter [f] passes its input to f and runs it, returning an array containing all outputs of f. If f throws an error, then [f] returns that error instead of an array.

This syntax allows constructing arrays such as [1, 2, 3] ⟼ [1, 2, 3]. Here, 1, 2, 3 is just a filter that yields three numbers, see concatenation. There is no dedicated array syntax [x1, ..., xn]. That means that you can use arbitrary filters in [f]; for example, [limit(3; repeat(0))] ⟼ [0, 0, 0]. You can use the input passed to [f] inside f; for example, we can write the previous filter equivalently as {count: 3, elem: 0} | [limit(.count; repeat(.elem))] ⟼ [0, 0, 0].

Objects

An object is a mapping from keys to values.

An empty object can be constructed with the filter {}. An object with a single key and value can be constructed by {(k): v}, where k and v are both filters. For example, {("a"): 1} ⟼ {"a": 1}.

If k is a string such as "a" or a variable such as $k, then you can omit the parentheses, e.g. {"a": 1} or {$k: 1}. If the string is identifier-like, then you can also omit the quotes, e.g. {a: 1}.

To construct an object with multiple key-value pairs, you can add multiple objects; e.g. {(k1): v1} + ... + {(kn): vn}. You can write this more compactly as {(k1): v1, ..., (kn): vn}. For example, {a: 1, b: 2} ⟼ {"a": 1, "b": 2}.

Instead of {k: .k} (see indexing), you can also write {k}; e.g., {a: 1, b: 2} | {a} ⟼ {"a": 1}. Instead of {k: $k}, you can also write {$k}; e.g., 1 as $k | {$k} ⟼ {"k": 1}.

The filter {(k): v} is equivalent to k as $k | v as $v | {$k: $v}. That means that when either k or v yield multiple output values, an object is produced for every output combination; for example, {("a", "b"): (1, 2)} ⟼ {"a": 1} {"a": 2} {"b": 1} {"b": 2}. Note that here, it is necessary to surround 1, 2 with parentheses, in order to clarify that , does not start a new key-value pair.

In jq, keys must be strings, whereas in jaq, keys can be arbitrary values. Because object keys can be arbitrary values in jaq, you can write e.g.:

{(0): 1, "2": 3, ([4]): 5, ({}): 6}  ⟼ 
{ 0 : 1, "2": 3,  [4] : 5,  {} : 6}

Note that in a jq filter, non-string keys must be surrounded with parentheses, whereas in XJON, parentheses must not be used.

This yields an error in jq.

Path operators

jq’s path operators retrieve parts of their input.

Indexing (.[x])

The indexing operator .[x] yields the x-th element of the input. For example:

  • [1, 2, 3] | .[1] ⟼ 2
  • {a: 1, b: 2} | .["a"] ⟼ 1

If there is no element at that index in the input, then this operator returns null. For example:

  • [1, 2, 3] | .[3] ⟼ null
  • {a: 1, b: 2} | .["c"] ⟼ null

This operator treats byte string input like an array of numbers in the range 0 to 255. For example, "@ABC" | tobytes | .[0] ⟼ 64, because the character '@' is encoded in UTF-8 as the single byte 64 (0x40).

This operator yields an error if the input is not an array, a byte string, or an object, or if the input is an array and the index is not an integer.

For identifier-like strings like name, you can write .name as short form for .["name"]. For example, {a: 1} | .a ⟼ 1. This works also for strings that are the same as reserved keywords; e.g. {if: 1} | .if ⟼ 1.

For array input, you can also use negative numbers as index, which will be interpreted as index counting from the end of the array. For example, [1, 2, 3] | .[-1] ⟼ 3.

jq accepts floating-point numbers as array indices; e.g. [1, 2] | .[1.5] yields 2. In jaq, indexing an array with a floating-point number yields an error. The same applies to the slicing operator below.

Slicing (.[x:y])

The slicing operator .[x:y] yields a slice of a string or an array, from the x-th element to (excluding) the y-th element. For example, [1, 2, 3] | .[1:3] ⟼ [2, 3], and "Hello World!" | .[6:11] ⟼ "World".

When slicing a text string, the x-th element refers to the x-th UTF-8 character. For example, "老虎" | .[1:2] ⟼ "虎". On the other hand, when slicing a byte string, the x-th element refers to the x-th byte. For example, "Färber" | tobytes, . | .[2:] | [., length] ⟼ [b"\xa4rber", 5] ["rber", 4], because the letter ‘ä’ takes two bytes, but only one character.

Like the indexing operator, the slicing operator treats byte string input like an array of numbers and interprets negative indices as indices counting from the end.

The short form .[x:] creates a slice from the index x to the end, and the short form .[:y] creates a slice from the beginning to (excluding) the index y. For example, "Hello World!" | .[:5], .[6:] ⟼ "Hello" "World!".
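Python draws the same distinction between str (sliced by character) and bytes (sliced by byte), so the examples above can be reproduced with two hypothetical helpers. Encoding a string to UTF-8 is assumed here to model tobytes; Python slices likewise count negative indices from the end and accept omitted bounds for the short forms.

```python
from typing import Optional


def slice_text(s: str, start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Model of .[start:end] on a text string: indices count characters."""
    return s[start:end]


def slice_bytes(s: str, start: Optional[int] = None, end: Optional[int] = None) -> bytes:
    """Model of tobytes | .[start:end]: indices count UTF-8 bytes."""
    return s.encode("utf-8")[start:end]
```

For "Färber", slice_text(..., 2) yields "rber" (4 characters), whereas slice_bytes(..., 2) yields b"\xa4rber" (5 bytes), since "ä" occupies two bytes.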

Iterating (.[])

The iteration operator .[] yields all values v1 ... vn when given an array [v1, ..., vn] or an object {k1: v1, ..., kn: vn}. For example, [1, 2, 3] | .[] ⟼ 1 2 3 and {a: 1, b: 2} | .[] ⟼ 1 2.

It holds that .[] has the same effect as .[keys[]]. For example, [1, 2, 3] | .[keys[]] ⟼ 1 2 3.

Compound paths

For each of the filters in this section, you can employ any atomic filter instead of the leading . (identity). For example, you can write explode[] as short form for explode | .[].

You can also terminate any filter in this section with a ?, which silences errors from that operator. For example, .[] yields an error if the input is neither an array nor an object, but .[]? silences such errors, yielding no output instead.

You can chain together an arbitrary number of these operators; for example, .[0][]?[:-1] is the same as .[0] | .[]? | .[:-1].

When you combine the techniques above, be aware that the filters x and y in .[x] and .[x:y] are executed on the original input and are not influenced by error suppression with ?. That means that f[][x]?[y:z] is equivalent to f as $f | x as $x | y as $y | z as $z | $f | .[] | .[$x]? | .[$y:$z].

Nullary

A nullary filter does not take any arguments.

Identity (.)

The filter . yields its input as single output. For example, 1 | . ⟼ 1.

Recursion (..)

The filter .. yields its input and all recursively contained values. For example:

{"a": 1, "b": [2, ["3"]]} | ..  ⟼ 
{"a": 1, "b": [2, ["3"]]}
1
[2, ["3"]]
2
["3"]
"3"

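The order of these outputs corresponds to a pre-order traversal. A Python model of .. over JSON-like values (lists for arrays, dicts for objects) might look as follows; the name descend is hypothetical.

```python
from typing import Any, Iterator


def descend(value: Any) -> Iterator[Any]:
    """Model of `..`: yield the value itself, then everything inside it."""
    yield value
    if isinstance(value, list):
        children = value
    elif isinstance(value, dict):
        children = list(value.values())
    else:
        return  # scalars contain no further values
    for child in children:
        yield from descend(child)
```

Running descend on the object of the example above yields the same six values in the same order.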
Unary

A unary filter takes a single argument.

Negation (-)

The prefix operator -f runs the atomic filter f and negates its outputs. For example, -1 ⟼ -1 and -(1, 2) ⟼ -1 -2.

Error suppression (?)

The postfix operator f? runs the atomic filter f and returns all its outputs until (excluding) the first error. For example, (1, error, 2)? ⟼ 1.

This operator is equivalent to try f (see try-catch). For example, try (1, error, 2) ⟼ 1.
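If a filter is modelled as a Python generator and an error as an exception, error suppression becomes a generator that stops at the first exception. This is an illustration only; the names are hypothetical.

```python
from typing import Any, Iterable, Iterator


def suppress(outputs: Iterable[Any]) -> Iterator[Any]:
    """Model of f?: pass outputs through until the first error, then stop."""
    try:
        for value in outputs:
            yield value
    except Exception:
        return  # the error and everything after it are discarded


def one_error_two() -> Iterator[Any]:
    """Model of the filter (1, error, 2)."""
    yield 1
    raise ValueError("error")
    yield 2  # never reached
```

Consuming suppress(one_error_two()) yields only 1, like (1, error, 2)? above.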

We can see that error suppression has a higher precedence than negation by try -[]? catch -1 ⟼ -1, which shows us that -[]? yields the same as -([]?), which is equivalent to -[] and yields an error. If negation had a higher precedence, then -[]? would be equivalent to (-[])?; however, that filter yields no output at all.

Binary (complex)

A binary filter takes two arguments.

All binary operators that contain the characters | or = are right-associative. All other binary operators are left-associative. That means that for example, f | g | h is equivalent to f | (g | h), whereas f - g - h is equivalent to (f - g) - h.

This section lists all binary infix filters sorted by increasing precedence.

Composition (|)

The filter f | g runs f, and for every output y of f, it runs g with y as input and yields its outputs. For example, (1, 2) | (., 3) ⟼ 1 3 2 3.

If either f or g yields an error, then f | g yields that error, followed by nothing. For example, (1, 2) | (., error) yields the same as 1, error.
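With filters modelled as Python functions from an input to an iterator of outputs, composition is a nested loop. The sketch below is illustrative, and all names in it are hypothetical; an exception raised by f or g simply propagates and ends the stream, which corresponds to the error behaviour described above.

```python
from typing import Any, Callable, Iterator

# A filter maps one input value to an iterator of output values.
Filter = Callable[[Any], Iterator[Any]]


def pipe(f: Filter, g: Filter) -> Filter:
    """Model of f | g: run g on every output of f, in order."""
    def run(x: Any) -> Iterator[Any]:
        for y in f(x):
            yield from g(y)  # an exception from f or g ends the stream
    return run


def one_two(_: Any) -> Iterator[Any]:
    """Model of the filter 1, 2 (ignores its input)."""
    yield 1
    yield 2


def identity_three(x: Any) -> Iterator[Any]:
    """Model of the filter ., 3."""
    yield x
    yield 3
```

Here, pipe(one_two, identity_three) models (1, 2) | (., 3) and yields 1, 3, 2, 3.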

Variable binding (as $x |)

The filter f as $x | g runs f, and for every output x of f, it binds the value of x to the variable $x. Here, x is an identifier. It then runs g on the original input, replacing any reference to $x in g by the value x. For example, "Hello" | length as $x | . + " has length \($x)" ⟼ "Hello has length 5".

Variables can be shadowed; for example, 0 as $x | (1 as $x | $x), $x ⟼ 1 0.

Concatenation (,)

The filter f, g yields the concatenation of the outputs of f and g. For example, 1, 2 ⟼ 1 2.

Plain assignment (=)

The filter f = g runs g with its input, and for every output y of g, it returns a copy of the input whose values at the positions given by f are replaced by y. For example:

[1, 2, 3] | .[0] = (length, 4)  ⟼ 
[3, 2, 3]
[4, 2, 3]

Update assignment (|=)

The filter f |= g returns a copy of its input where each value v at a position given by f is replaced by the output of g run with input v. For example, [1, 2, 3] | .[] |= .*2 ⟼ [2, 4, 6].

When g yields no outputs, then the value at the position is deleted; for example, [1, 2, 3] | .[0] |= empty ⟼ [2, 3] and {a: 1, b: 2} | .a |= empty ⟼ {"b": 2}.

When g yields multiple outputs, then it depends on the input type and on f whether more than one output of g is considered. For example, the following updates consider multiple outputs:

  • 1 | . |= (., .*2) ⟼ 1 2
  • [1, 2, 3] | .[] |= (., .*2) ⟼ [1, 2, 2, 4, 3, 6]

On the other hand, the following updates consider only the first output:

  • [1, 2, 3] | .[0] |= (., .*2) ⟼ [1, 2, 3]
  • {a: 1, b: 2} | .a |= (., .*2) ⟼ {"a": 1, "b": 2}
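The array examples above are consistent with the following Python model, where g maps a value to an iterable of outputs. It is a sketch only: the helper names are hypothetical, and it covers just .[] and .[i] on arrays (with i assumed to be in range), not objects or other paths.

```python
from typing import Any, Callable, Iterable, List

# g maps one value to the outputs it yields for that value.
Update = Callable[[Any], Iterable[Any]]


def update_each(arr: List[Any], g: Update) -> List[Any]:
    """Model of .[] |= g on an array: all outputs of g are spliced in."""
    return [z for y in arr for z in g(y)]


def update_at(arr: List[Any], i: int, g: Update) -> List[Any]:
    """Model of .[i] |= g on an array (i assumed to be in range)."""
    out = list(arr)
    for z in g(arr[i]):
        out[i] = z       # only the first output of g is used
        return out
    del out[i]           # g yielded nothing: delete the element
    return out
```

Because update_at returns as soon as g yields its first output, any further outputs of g are never computed.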

Arithmetic update assignment (+=, -=, …)

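The filters f += g, f -= g, f *= g, f /= g, f %= g, and f //= g are short-hand forms for g as $y | f |= . + $y, g as $y | f |= . - $y, … That is, g is run on the original input, and for each of its outputs, the values at the positions given by f are updated. For example: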

[1, 2, 3] | .[0] += (length, 4)  ⟼ 
[4, 2, 3]
[5, 2, 3]

Alternation (//)

The filter f // g runs f and yields all its outputs whose boolean value is true; that is, all outputs that are neither null nor false. If f yields no such outputs, then this filter yields the outputs of g. For example:

  • (null, 1, false, 2) // (3, 4) ⟼ 1 2
  • (null, false) // (3, 4) ⟼ 3 4
  • empty // 3 ⟼ 3
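A Python sketch of this rule follows, with the outputs of f given as a list and g passed as a function so that it only runs when needed. It ignores errors raised by f, and the names are hypothetical.

```python
from typing import Any, Callable, Iterable, List


def truthy(v: Any) -> bool:
    """Boolean value of a jq value: only null and false are false."""
    return v is not None and v is not False


def alternative(f_outputs: Iterable[Any], g: Callable[[], Iterable[Any]]) -> List[Any]:
    """Model of f // g; g is only run if f yields no truthy output."""
    found = [v for v in f_outputs if truthy(v)]
    return found if found else list(g())
```

Note that 0 and "" count as true here, so alternative([0, ""], ...) returns [0, ""] without running g.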

Logic (or, and)

The filter f or g (disjunction) evaluates f and returns true if its boolean value is true, else it returns the boolean values of g. The filter f and g (conjunction) evaluates f and returns false if its boolean value is false, else it returns the boolean values of g.

For example:

(true, false) or (true, false)  ⟼ 
true
true
false
(true, false) and (true, false)  ⟼ 
true
false
false

The filter f and g has higher precedence than f or g.
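The definitions above can be mirrored in Python with generators, where g is passed as a function so that it only runs when the output of f does not already decide the result. This is an illustrative model with hypothetical names, not jaq's implementation.

```python
from typing import Any, Callable, Iterable, Iterator


def truthy(v: Any) -> bool:
    """Boolean value of a jq value: only null and false are false."""
    return v is not None and v is not False


def or_(f_outputs: Iterable[Any], g: Callable[[], Iterable[Any]]) -> Iterator[bool]:
    """Model of f or g."""
    for x in f_outputs:
        if truthy(x):
            yield True                       # g is not run for this x
        else:
            yield from (truthy(y) for y in g())


def and_(f_outputs: Iterable[Any], g: Callable[[], Iterable[Any]]) -> Iterator[bool]:
    """Model of f and g."""
    for x in f_outputs:
        if not truthy(x):
            yield False                      # g is not run for this x
        else:
            yield from (truthy(y) for y in g())
```

Applied to (true, false) on both sides, or_ yields True, True, False and and_ yields True, False, False, matching the examples above.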

We can see the higher precedence of and by false and true or true ⟼ true, which yields the same as (false and true) or true ⟼ true, but not the same as false and (true or true) ⟼ false.

To find this formula, I used the following program:

def bool: true, false;
{x: bool, y: bool, z: bool} | select(
  ((.x and  .y) or .z ) !=
  ( .x and (.y  or .z))
)  ⟼ 
{
  "x": false,
  "y": true,
  "z": true
}
{
  "x": false,
  "y": false,
  "z": true
}

It holds that f or g is equivalent to f as $x | $x or g; similarly for and.

Binary (simple)

Every simple binary operator such as + in this section fulfills the property that f + g is equivalent to f as $x | g as $y | $x + $y. For example, (1, 2) + (1, 3) ⟼ 2 4 3 5 and (1, 2) * (1, 3) ⟼ 1 3 2 6. This property does not hold for the complex binary filters above.

In jq, a slightly different property holds, namely that f + g is equivalent to g as $y | f as $x | $x + $y. That means that in jq, (1, 2) + (1, 3) yields 2 3 4 5.

As a result, in jaq, (1,2) * (3,4) ⟼ 3 4 6 8 and {a: (1,2), b: (3,4)} | .a * .b ⟼ 3 4 6 8 yield the same outputs, whereas jq yields 3 6 4 8 for the former example.
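The difference between jaq and jq comes down to the nesting order of two loops. In the following Python sketch, f and g are represented by lists of their outputs, which ignores laziness and the fact that g is rerun for every output of f; the function names are hypothetical.

```python
from typing import Any, Callable, List

BinOp = Callable[[Any, Any], Any]


def binop_jaq(op: BinOp, f_outputs: List[Any], g_outputs: List[Any]) -> List[Any]:
    """jaq: f as $x | g as $y | $x op $y (outer loop over f)."""
    return [op(x, y) for x in f_outputs for y in g_outputs]


def binop_jq(op: BinOp, f_outputs: List[Any], g_outputs: List[Any]) -> List[Any]:
    """jq: g as $y | f as $x | $x op $y (outer loop over g)."""
    return [op(x, y) for y in g_outputs for x in f_outputs]
```

For (1, 2) * (3, 4), binop_jaq gives [3, 4, 6, 8] and binop_jq gives [3, 6, 4, 8].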

Equality (==, !=)

The operator $x == $y returns true if the two values $x and $y are equal, else false. Similarly, $x != $y returns the negation of $x == $y, i.e. $x == $y | not.

Interesting cases include:

  • NaN does not equal any value, including itself; i.e. nan == nan ⟼ false
  • An integer i equals a float f if f is finite and i converted to a float is equal to f; i.e. 1 == 1.0 ⟼ true.
  • Arrays are equal if they contain equal values in the same order; i.e. [1, 2, 3] == [1, 2, 3], but [3, 2, 1] != [1, 2, 3].
  • Objects are equal if they have the same keys and for every key, the associated value is equal; i.e. {a: 1, b: 2} == {b: 2, a: 1} ⟼ true

There are values that are equal, yet yield different results when fed to the same filter. For example:

{a: 1, b: 2} as $x |
{b: 2, a: 1} as $y |
$x == $y, ($x | tojson) == ($y | tojson)  ⟼ 
true false
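Python dictionaries behave analogously: equality ignores key order, while serialisation follows insertion order. This is an analogy to illustrate the point, not a model of jaq's tojson.

```python
import json

x = {"a": 1, "b": 2}
y = {"b": 2, "a": 1}

# Equality ignores the order in which keys were inserted ...
same_value = x == y                          # True
# ... but serialisation follows insertion order, so the texts differ.
same_json = json.dumps(x) == json.dumps(y)   # False
```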

Ordering (<, >, <=, >=)

The operator $x < $y returns true if $x is smaller than $y. Similarly:

  • $x > $y is equivalent to $y < $x,
  • $x <= $y is equivalent to $x < $y or $x == $y, and
  • $x >= $y is equivalent to $x > $y or $x == $y.

Values are ordered as follows:

  • null
  • Booleans: false < true ⟼ true
  • Numbers:
    • NaN is smaller than any other number, including itself; i.e. nan < nan ⟼ true
    • -Infinity, e.g. -infinite < -99999999999999999999999999 ⟼ true
    • Finite numbers, e.g. 0 < 1 ⟼ true
    • Infinity, e.g. infinite > 99999999999999999999999999 ⟼ true
  • Strings: lexicographic ordering by underlying bytes, e.g. "Hello" < "Hello World" ⟼ true and "@B" < "A" ⟼ true.
  • Arrays: lexicographic ordering, e.g. [1, 2] < [1, 2, 3] ⟼ true and [0, 2] < [1] ⟼ true.
  • Objects: An object $x is smaller than an object $y if either:
    • the keys of $x are smaller than the keys of $y or
    • the keys of $x are equal to the keys of $y and the values of $x are smaller than the values of $y.

More precisely, an object $x is smaller than an object $y if:

($x | to_entries | sort_by(.key)) as $ex |
($y | to_entries | sort_by(.key)) as $ey |
[$ex[].key]   < [$ey[].key] or
[$ex[].key]  == [$ey[].key] and
[$ex[].value] < [$ey[].value]
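
The following examples, a small sketch based on the definition above, show how keys take precedence over values when comparing objects. Each line yields true.

```jq
# The keys are compared first: ["a"] < ["b"]
{a: 2} < {b: 1},
# Equal keys: the values are compared in key order, [1, 3] < [2, 0]
{a: 1, b: 3} < {b: 0, a: 2},
# Keys are compared as arrays, so ["a"] < ["a", "b"]
{a: 9} < {a: 0, b: 0}
```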

Addition / subtraction (+, -)

The filter $x + $y adds two values as follows:

  • null + $x and $x + null yield $x.
  • Adding numbers yields their sum, which is an integer if both numbers are integers, else a floating-point number. For example, 1 + 2 ⟼ 3 and 1 + 2.0 ⟼ 3.0.
  • Adding strings or arrays concatenates them; e.g. "Hello, " + "World!" ⟼ "Hello, World!" and [1, 2] + [3, 4] ⟼ [1, 2, 3, 4].
  • Adding objects yields their union. If a key is present in both objects, then the resulting object contains the key with the value of the object on the right; e.g. {a: 1, b: 2} + {b: 3, c: 4} ⟼ {"a": 1, "b": 3, "c": 4}.
  • Adding anything else yields an error.

The filter $x - $y subtracts $y from $x as follows:

  • Subtracting numbers yields their difference, similar to addition.
  • Subtracting arrays yields the left array with all elements contained in the right array removed; e.g. [1, 2, 3, 4] - [2, 4] ⟼ [1, 3].
  • Subtracting anything else yields an error.
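
A few edge cases of + and -, shown as a small sketch:

```jq
# null is the neutral element of +: yields 1 and [1]
null + 1, [1] + null,
# Array subtraction removes every occurrence of each element: yields [2, 3]
[1, 2, 1, 3] - [1],
# On conflicting keys, the right-hand value wins: yields {"a": 3, "b": 2}
{a: 1, b: 2} + {a: 3}
```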

Multiplication / division (*, /)

The filter $x * $y multiplies two values as follows:

  • Multiplying numbers yields their product, similar to addition.
  • Multiplying a string by an integer $n yields the $n-fold concatenation of the string; e.g. "abc" * 3 ⟼ "abcabcabc". If $n <= 0, then this yields null; e.g. 0 * "abc" ⟼ null.
  • Multiplying two objects merges them recursively. In particular, $x * {k: v, ...} yields ($x + {k: $x[k] * v}) * {...} if both v and $x[k] are objects, else ($x + {k: v}) * {...}. For example, {a: {b: 0, c: 2}, e: 4} * {a: {b: 1, d: 3}, f: 5} ⟼ {"a": {"b": 1, "c": 2, "d": 3}, "e": 4, "f": 5}.
  • Multiplying anything else yields an error.
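
To contrast recursive merging with addition, the following sketch applies both operators to the same objects. Note that only nested objects are merged; the array under c is simply replaced in both cases.

```jq
# Yields {"a": {"b": 1, "d": 2}, "c": [2]}
{a: {b: 1}, c: [1]} * {a: {d: 2}, c: [2]},
# Yields {"a": {"d": 2}, "c": [2]}, because + replaces the value of a shared key
{a: {b: 1}, c: [1]} + {a: {d: 2}, c: [2]}
```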

The filter $x / $y divides two values as follows:

  • Dividing a number by a number yields their quotient as a floating-point number. To perform this operation, both arguments are first converted to floating-point numbers. For example, 1 / 2 ⟼ 0.5.
  • Dividing a string by a string splits $x by $y, yielding an array of strings. For example, "foobarfoobazfoo" / "foo" ⟼ ["", "bar", "baz", ""]. If $y is empty, then $x / $y yields an array with each character of $x as a separate string. For example, "🧑‍🔬 is 🤔" / "" ⟼ ["🧑","‍","🔬"," ","i","s"," ","🤔"].

You can round-trip string division with join($y). For example:

  • "foobarfoobazfoo" / "foo" | join("foo") ⟼ "foobarfoobazfoo"
  • "🧑‍🔬 is 🤔" / "" | join("") ⟼ "🧑‍🔬 is 🤔"

In jq, division by 0 yields an error, whereas in jaq, n / 0 yields nan if n == 0, infinite if n > 0, and -infinite if n < 0. jaq’s behaviour is closer to the IEEE standard for floating-point arithmetic (IEEE 754).
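
Because of this divergence, programs meant to run under both jq and jaq may want to check for a zero divisor explicitly. A minimal sketch, where safe_div is a hypothetical helper (not a builtin) that yields null instead of dividing by zero:

```jq
# Hypothetical helper: never divides by zero, so it behaves the same in jq and jaq
def safe_div($y): if $y == 0 then null else . / $y end;
# Yields [null, null, null] and 1.5
[(1, 0, -1) | safe_div(0)], (6 | safe_div(4))
```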

Modulus (%)

The filter $x % $y calculates the modulus of two numbers, and fails for anything else. For example, 5 % 2 ⟼ 1. Either of the two numbers can also be a floating-point number; however, due to floating-point rounding, the result may be unexpected. For example, 5.1 % 2 ⟼ 1.0999999999999996 and 5.5 % 2 ⟼ 1.5.

Keywords

This section lists all filters that start with a reserved keyword.

if-then-else

The filter if p then f else g end runs the filter p with its input. For every output of p, if its boolean value is true, it yields the outputs of f run with the original input, else it yields the outputs of g run with the original input.

Examples:

  • if true then 0 else 1 end ⟼ 0
  • if [], null then 0 else 1 end ⟼ 0 1

There exists a longer form of this filter, namely if p1 then f1 elif p2 then f2 ... else g end. This is equivalent to if p1 then f1 else (if p2 then f2 else (... else g end) end) end.

When the else g part is omitted, it is equivalent to else .; for example, 0 | if false then .+1 end ⟼ 0.
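
As a sketch of the longer form, the following hypothetical filter sign maps a number to 1, -1, or 0:

```jq
# Hypothetical helper, using elif to distinguish three cases
def sign: if . > 0 then 1 elif . < 0 then -1 else 0 end;
# Yields [1, -1, 0]
[(5, -3, 0) | sign]
```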

try-catch

The filter try f catch g runs the atomic filter f with its input, and returns its outputs until (excluding) the first error. If f yields an error, then the atomic filter g is run with the error value as input, and its outputs are returned.

Examples:

  • try (1, error(42), 2) catch (. + 1) ⟼ 1 43
  • try (1, 2) catch (. + 1) ⟼ 1 2

A short form of this filter is try f, which is equivalent to try f catch empty as well as to f? (error suppression).
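
Because try f stops at the first error of f, it matters where the ? is placed. In the following sketch, suppressing errors per element keeps the later outputs, whereas suppressing errors around the whole stream does not:

```jq
["1", "x", "3"] |
# ? per element: only the failing "x" is dropped, yields [1, 3]
[.[] | tonumber?],
# ? around the whole stream: stops at the first error, yields [1]
[(.[] | tonumber)?]
```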

label-break

The filter label $x | f binds the label $x in f, runs f and yields its outputs. If the evaluation of f calls break $x, then the evaluation of label $x | f is stopped and returns no more outputs. For example, label $x | 1, break $x, 2 ⟼ 1.

Labels are distinct from variables, which means that 0 as $x | label $x | $x, break $x ⟼ 0.

Like variables, labels can be shadowed; e.g. label $x | 1, (label $x | 2, break $x, 3), 4, break $x, 5 ⟼ 1 2 4.

It is possible to break from a filter argument; e.g. def f(g): 1, g, 2; label $x | f(break $x) ⟼ 1.
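
label and break are enough to stop a filter after its first output. The following sketch defines a hypothetical my_first(f) that behaves like first(f):

```jq
# Hypothetical re-implementation of first(f): yield one output, then break
def my_first(f): label $out | f | ., break $out;
# Yields 1, [], and 5
my_first(1, 2, 3), [my_first(empty)], my_first(range(10) | select(. > 4))
```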

reduce / foreach

The filters reduce xs as $x (init; update) and foreach xs as $x (init; update; project) both run the atomic filter xs on its input. Suppose that the outputs of xs are x1, …, xn. Then the filters are equivalent to:

reduce x1, ..., xn as $x (init; update) :=
init
| x1 as $x | update
| ...
| xn as $x | update
foreach x1, ..., xn as $x (init; update; project) :=
init |
( x1 as $x | update | project,
( ...
( xn as $x | update | project,
( empty ))...))

Here, both update and project have access to the current $x.

The filter foreach xs as $x (init; update) is equivalent to foreach xs as $x (init; update; .).

As an example, we can calculate the sum and the cumulative sum using reduce and foreach, respectively:

  • reduce (1, 2, 3) as $x (0; . + $x) ⟼ 6
  • foreach (1, 2, 3) as $x (0; . + $x) ⟼ 1 3 6
  • foreach (1, 2, 3) as $x (0; . + $x; [$x, .]) ⟼ [1, 1] [2, 3] [3, 6]

Let us expand the first and the last examples using the equivalences above to see what is calculated:

# reduce  (1, 2, 3) as $x (0; . + $x)
0
| 1 as $x | . + $x  # 1
| 2 as $x | . + $x  # 3
| 3 as $x | . + $x  # 6
 ⟼  6
# foreach (1, 2, 3) as $x (0; . + $x; [$x, .])
0 |
( 1 as $x | . + $x | [$x, .],  # [1, 1]
( 2 as $x | . + $x | [$x, .],  # [2, 3]
( 3 as $x | . + $x | [$x, .],  # [3, 6]
( empty ))))
 ⟼  [1, 1] [2, 3] [3, 6]

We can also reverse a list via [1, 2, 3] | reduce .[] as $x ([]; [$x] + .) ⟼ [3, 2, 1]. (However, note that this has quadratic runtime and is thus quite inefficient.)

Note that when xs yields no outputs, then reduce yields init, whereas foreach yields no output. For example:

  • reduce empty as $x (0; . + $x) ⟼ 0 and
  • foreach empty as $x (0; . + $x) ⟼ (no output).
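
Two further sketches of common patterns: counting occurrences with reduce, and computing a running maximum with foreach. The latter relies on null being smaller than any number; see ordering.

```jq
# Count occurrences: .[$w] starts as null, and null + 1 yields 1
reduce ("a", "b", "a") as $w ({}; .[$w] += 1),
# Running maximum: yields [3, 3, 4, 4, 5]
[foreach (3, 1, 4, 1, 5) as $x (null; [., $x] | max)]
```

The first filter yields {"a": 2, "b": 1}.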

The execution of reduce and foreach differs between jq and jaq when update yields multiple outputs. However, the precise behaviour of jq in that case is quite difficult to describe.

The interpretation of reduce/foreach in jaq has the following advantages over jq:

  • It deals very naturally with update filters that yield multiple outputs. In contrast, jq treats the outputs of update unequally: it outputs all of them, but recurses only on the last one.
  • It makes the implementation of reduce and foreach special cases of the same code, reducing the potential for bugs.

Consider the following example for an update yielding multiple values: foreach (5, 10) as $x (1; .+$x, -.) ⟼ 6 16 -6 -1 9 1 in jaq, whereas it yields 6 -1 9 1 in jq. We can see that both jq and jaq yield the values resulting from the first iteration (where $x is 5), namely 1 | 5 as $x | (.+$x, -.) ⟼ 6 -1. However, jq performs the second iteration (where $x is 10) only on the last value returned from the first iteration, namely -1, yielding the values -1 | 10 as $x | (.+$x, -.) ⟼ 9 1. jaq yields these values too, but it also performs the second iteration on all other values returned from the first iteration, namely 6, yielding the values 6 | 10 as $x | (.+$x, -.) ⟼ 16 -6.

def

The filter def x: f; g binds the filter f to a filter with the name x. Here, x is an identifier. The filter g can contain calls to the filter x, and any such calls will be replaced by the filter f.

For example, we can define a filter iter by def iter: .[]; and use it subsequently by def iter: .[]; [1, 2, 3] | iter ⟼ 1 2 3. This is equivalent to writing [1, 2, 3] | .[] ⟼ 1 2 3.

Definitions can be chained and nested. For example:

def foo:
  def bar: 1;
  def baz: 2;
  bar + baz;
foo  ⟼  3

Here, we chained the definitions of bar and baz. These definitions are only visible inside the definition of foo; that means that at the place where we call foo, we can use neither bar nor baz.

Definitions can be recursive, meaning that they call themselves. For example, def f: 0, f; f yields an infinite sequence of 0 values. An example of a non-terminating filter is def r: r; r. Finally, the following filter yields an infinite stream of integers starting from its input:

def ints_from: ., (. + 1 | ints_from);
1 | limit(3; ints_from)  ⟼  1 2 3

Definitions can also take arguments: The filter def x(x1; ...; xn): f; g binds the filter f to a filter with the name x and the arity n. Here, x1 to xn are identifiers that are the arguments of x, and f can contain references to these arguments. The filter g can contain calls of the shape x(g1; ...; gn), where g1 to gn are filters. Any such calls will be replaced by the filter f, where every argument xi is replaced by its corresponding filter gi.

For example, the filter map(f) in the standard library is defined by def map(f): [.[] | f]. We can use it via [1, 2, 3] | map( .+1) ⟼ [2, 3, 4], which is equivalent to [1, 2, 3] | [.[] | .+1] ⟼ [2, 3, 4].

We can use variables as arguments of definitions. For example, we can write def singleton($x): [$x] as a short form of def singleton(x): x as $x | [$x]. Note that this is not the same as def singleton(x): [x]:

  • def singleton($x): [$x]; singleton(1, 2, 3) ⟼ [1] [2] [3]
  • def singleton( x): [ x]; singleton(1, 2, 3) ⟼ [1, 2, 3]

Arguments of definitions may capture variables, labels, and other definitions. For example:

def iter_map(f): .[] | f;
def double: .+.;
3 as $threshold  |
label $lbl |
[1, 2, 3] | iter_map(if . < $threshold then double else break $lbl end)  ⟼ 
 2  4

Here, the argument to iter_map captures the definition double, the variable $threshold, and the label $lbl.

We can always transform a definition with variable arguments to an equivalent definition without variable arguments. For that, suppose that $x is the rightmost variable argument in a definition def x(...; $x; ...): g. We can replace it by def x(...; x; ...): x as $x | g.

For example, consider the definition def f($x1; x2; $x3; x4): g. This is equivalent to def f(x1; x2; x3; x4): x1 as $x1 | x3 as $x3 | g.
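
The following sketch checks this transformation for two hypothetical definitions pair and pair2. Both yield the Cartesian product of their arguments, in the same order:

```jq
# pair uses variable arguments, pair2 is its transformed counterpart
def pair($a; $b): [$a, $b];
def pair2(a; b): a as $a | b as $b | [$a, $b];
# Both yield [[1, 3], [1, 4], [2, 3], [2, 4]]
[pair(1, 2; 3, 4)], [pair2(1, 2; 3, 4)]
```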

Standard library

This section lists all named filters that are available by default in any jq module. These filters are also known as “builtins”.

Basic

error, error(f)

The filter error(f) throws an error for every output of f, with the output as payload. The filter error is equivalent to error(.). The filter error(f) can also be used on the left-hand side of assignments.

Examples:

  • error(empty) ⟼ (no output)
  • try error(41) catch (. + 1) ⟼ 42
  • try (41 | error ) catch (. + 1) ⟼ 42
  • try (error(41) = 1) catch (. + 1) ⟼ 42
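
Error payloads can be arbitrary values, not only strings. In the following sketch, the hypothetical filter nonneg fails with an object payload, which the catch handler then inspects:

```jq
# Hypothetical validation helper that fails with a structured payload
def nonneg: if . < 0 then error({msg: "negative", value: .}) else . end;
# Yields [1, "rejected -2", 3]
[(1, -2, 3) | try nonneg catch "rejected \(.value)"]
```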

length

The output of the filter length depends on its input type:

  • null: 0, e.g. null | length ⟼ 0
  • boolean: error, e.g. true | try length catch "fail" ⟼ "fail"
  • number: the absolute value of the number, e.g. -1, 1 | length ⟼ 1 1
  • text string: the number of characters, e.g. "ゼノギアス" | length ⟼ 5
  • byte string: the number of bytes, e.g. "ゼノギアス" | tobytes | length ⟼ 15
  • array: the number of values, e.g. [1, [2, 3], 4] | length ⟼ 3
  • object: the number of key-value pairs, e.g. {a: 0, b: 1} | length ⟼ 2

keys, keys_unsorted

The filter keys_unsorted yields an array that contains all keys if the input is an object or all indices if the input is an array. The filter keys is equivalent to keys_unsorted | sort. For example:

  • {c: 1, b: 2, a: 1} | keys_unsorted ⟼ ["c", "b", "a"]
  • {c: 1, b: 2, a: 1} | keys ⟼ ["a", "b", "c"]
  • [1, 2, 3] | keys_unsorted ⟼ [0, 1, 2]
  • [1, 2, 3] | keys ⟼ [0, 1, 2]

The filter keys_unsorted is equivalent to to_entries | .[] |= .key; for example, {a: 1, b: 2} | to_entries | .[] |= .key ⟼ ["a", "b"].

to_entries, from_entries, with_entries(f)

The filter to_entries takes as input an array or an object. It converts it to an array of objects of the shape {key: k, value: v}, such that .[k] on the original input yields v. For example:

  • [ 1, 2] | to_entries ⟼ [{"key": 0 , "value": 1}, {"key": 1 , "value": 2}]
  • {a: 1, b: 2} | to_entries ⟼ [{"key": "a", "value": 1}, {"key": "b", "value": 2}]

The filter from_entries constructs an object from an array of entries as given by to_entries. For example, {a: 1, b: 2} | to_entries | from_entries ⟼ {"a": 1, "b": 2}.

The filter with_entries(f) is equivalent to to_entries | map(f) | from_entries. For example, {"a": 1, "b": 2} | with_entries(.key |= ascii_upcase) ⟼ {"A": 1, "B": 2}.
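
with_entries(f) is convenient for filtering or renaming keys. A small sketch of both uses (the prefix "n_" is arbitrary):

```jq
{a: 1, b: "x", c: 3} |
# Keep only entries with a numeric value: yields {"a": 1, "c": 3}
with_entries(select(.value | type == "number")),
# Prefix every key: yields {"n_a": 1, "n_b": "x", "n_c": 3}
with_entries(.key |= "n_" + .)
```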

type

The filter type returns the type of its input value as string. For example:

  • null | type ⟼ "null"
  • false | type ⟼ "boolean"
  • 0 | type ⟼ "number"
  • "foo" | type ⟼ "string"
  • [1] | type ⟼ "array"
  • {} | type ⟼ "object"

Note that text strings and byte strings both have the same type "string".

The type filter can be relatively slow to run; if you use it for simple comparisons such as type == "string", then you can also use filters like isstring.

Stream consumers

first, first(f), last, last(f)

The filter first(f) yields the first output of f if there is one, else nothing. For example, first(1, 2, 3) ⟼ 1 and first(empty) ⟼ (no output).

This filter stops evaluating f after the first output, meaning that it yields an output even if f yields infinitely many outputs. For example, first(repeat(0)) ⟼ 0 and first(1, def f: f; f) ⟼ 1.

Similarly, last(f) yields the last output of f if there is one, else nothing. If f yields an error, then the first error of f is yielded. For example, last(1, 2, 3) ⟼ 3, last(empty) ⟼ (no output), and try last(1, error("fail"), 3) catch . ⟼ "fail".

The filters first and last are short forms for first(.[]) and last(.[]), respectively. You can use them to retrieve the first/last element of an array, such as [1, 2, 3] | first, last ⟼ 1 3.
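
Because first(f) stops after the first output, it can be used to search a stream for the first value satisfying a condition. A small sketch:

```jq
# The first square greater than 20; yields 25
first(range(10) | . * . | select(. > 20))
```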

limit($n; f)

The filter limit($n; f) yields the first $n outputs of f. If $n <= 0, it yields no outputs. For example:

  • limit( 3; 1, 2 ) ⟼ 1 2
  • limit( 3; 1, 2, 3, 4) ⟼ 1 2 3
  • limit(-1; 1, 2 ) ⟼ (no output)

When $n < 0, jq yields an error instead.

skip($n; f)

The filter skip($n; f) yields all outputs after the first $n outputs of f. If $n <= 0, it yields all outputs of f. For example:

  • skip( 3; 1, 2 ) ⟼ (no output)
  • skip( 1; 1, 2, 3, 4) ⟼ 2 3 4
  • skip(-1; 1, 2 ) ⟼ 1 2

nth($i), nth($i; f)

The filter nth($i; f) yields the $i-th output of f, counting from 0. If f yields $i or fewer outputs, then this filter yields no output. For example:

  • nth(0; 1, 2, 3) ⟼ 1
  • nth(2; 1, 2, 3) ⟼ 3
  • nth(3; 1, 2, 3) ⟼ (no output)

The filter nth($i) is a short form for .[$i]; e.g. [1, 2, 3] | nth(0) ⟼ 1.
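
For non-negative $i, nth($i; f) can be thought of as first(skip($i; f)). A sketch with a hypothetical my_nth; it illustrates the semantics and is not necessarily how jaq implements nth:

```jq
# Hypothetical re-implementation of nth/2, valid for non-negative $i only
def my_nth($i; f): first(skip($i; f));
# For $i = 3, f has too few outputs, so nothing is collected; yields ["a", "b", "c"]
[range(4) as $i | my_nth($i; "a", "b", "c")]
```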

isempty(f)

The filter isempty(f) yields true if f yields no outputs, else false. If the first output of f is an error, isempty(f) yields that error instead. For example:

  • isempty( empty) ⟼ true
  • isempty(1, 2) ⟼ false
  • isempty(1, error) ⟼ false
  • try isempty(error, 1) catch "fail" ⟼ "fail"

any, any(p), any(f; p)

The filter any(f; p) yields true if any output of f | p has the boolean value true, else false. For example:

  • any(0, 1, 2; . == 42) ⟼ false
  • any(0, 1, 2; . == 42, . == 2) ⟼ true

The filters any(p) and any are short forms of any(.[]; p) and any(.), respectively. For example:

  • [1, 2, 3] | any(. % 2 == 0) ⟼ true
  • [false, true] | any ⟼ true
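
The following sketch defines a hypothetical my_any(f; p) that mirrors the description above. Because it stops at the first true output, it also terminates on an infinite f as long as some output satisfies p:

```jq
# Hypothetical re-implementation of any/2
def my_any(f; p): isempty(first(f | p | select(.))) | not;
# Yields true, false, true
my_any(0, 1, 2; . == 2),
my_any(empty; true),
my_any(range(infinite); . > 5)
```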

all, all(p), all(f; p)

The filter all(f; p) yields true if all outputs of f | p have the boolean value true, else false. For example:

  • all(0, 1, 2; . > 0) ⟼ false
  • all(0, 1, 2; . >= 0) ⟼ true

The filters all(p) and all are defined analogously to any(p) and any.

add, add(f)

The filter add(f) yields the sum of all elements yielded by f, or null if f yields no outputs. For example:

  • add(1, 2, 3) ⟼ 6
  • add(empty) ⟼ null

The filter add is a short form of add(.[]). You can use it to add all values of an array or object:

  • [1, 2, 3] | add ⟼ 6
  • {a: 1, b: 2} | add ⟼ 3

The filter add(f) is equivalent to reduce f as $x (null; . + $x). For example: reduce (1, 2, 3) as $x (null; . + $x) ⟼ 6.
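
Since add is built on +, it works for every type that + supports. A small sketch:

```jq
# Yields "abc"
(["a", "b", "c"] | add),
# Yields [1, 2, 3]
([[1], [2, 3]] | add),
# Yields {"a": 3, "b": 2}, because later keys win
([{a: 1}, {b: 2}, {a: 3}] | add)
```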

Stream generators

empty

The filter empty yields no output.

This filter is defined as ([][] as $x | .). While a simpler filter like [][] also yields no outputs, this rather contrived-looking definition guarantees that empty can be used on the left-hand side of assignments. This comes into play when you use select(p), which uses empty under the hood.

range($upto), range($from; $upto), range($from; $upto; $step)

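The filter range($from; $upto; $step) yields $from, $from + $step, $from + 2 * $step, and so on, as long as the current value is smaller than $upto (if $step is positive) or larger than $upto (if $step is negative). For example: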

  • range(1; 9; 2) ⟼ 1 3 5 7
  • range(1; 10; 2) ⟼ 1 3 5 7 9
  • range(9; 1; -2) ⟼ 9 7 5 3
  • range(9; 0; -2) ⟼ 9 7 5 3 1

The filter range($from; $upto) is a short form of range($from; $upto; 1) and the filter range($upto) is a short form of range(0; $upto). For example:

  • range(5) ⟼ 0 1 2 3 4
  • range(2; 5) ⟼ 2 3 4

In jq, range/1 and range/2 are more restrictive versions of range/3 that prohibit non-numeric arguments.

The filter is equivalent to:

def range($from; $to; $by): $from |
   if $by > 0 then while(.  < $to; . + $by)
 elif $by < 0 then while(.  > $to; . + $by)
   else            while(. != $to; . + $by)
   end;
range(1; 10; 2)  ⟼ 
1 3 5 7 9

For that reason, we can also use it with values other than numbers:

  • range(""; "aaa"; "a") ⟼ "" "a" "aa"
  • range([]; [1, 1, 1]; [1]) ⟼ [] [1] [1, 1]

This makes it quite easy to accidentally create an infinite sequence, e.g. by range(""; "b"; "a").

recurse, recurse(f)

The filter recurse(f) is equivalent to ., (f | recurse(f)). It first outputs its input, then runs f and recurse(f) on its outputs. This is useful to create infinite sequences. You can create a finite sequence by having f return empty, e.g. via select. For example:

  • 0 | limit(5; recurse(.+1)) ⟼ 0 1 2 3 4
  • 0 | recurse(.+1 | select(. < 5)) ⟼ 0 1 2 3 4

The filter recurse(f; p) is equivalent to recurse(f | select(p)). That means that it recurses only on outputs of f for which p yields a true output. For example:

  • 0 | recurse(.+1; . < 5) ⟼ 0 1 2 3 4

The filter recurse is a short form for recurse(.[]?). It returns all values recursively contained in the input, e.g. [1, [2], {a: 3}] | recurse ⟼ [1, [2], {"a": 3}] 1 [2] 2 {"a":3} 3.
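
recurse(f) is also handy for traversing tree-shaped data. In the following sketch, the field names name and children are just assumptions about the shape of the input:

```jq
{name: "root", children: [
  {name: "a", children: []},
  {name: "b", children: [{name: "c", children: []}]}
]} |
# Depth-first, parents before children: yields "root" "a" "b" "c"
recurse(.children[]) | .name
```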

We can write a Fibonacci generator as follows:

def fib: def next: [.[1], add]; [0, 1] | recurse(next)[1];
limit(5; fib)  ⟼  1 1 2 3 5

The next filter takes an array with the two most recent values (.[0], .[1]), and yields a new array containing the most recent value (.[1]) as well as the sum of both values (add).

repeat(f)

The filter repeat(f) runs f and yields its outputs over and over again. For example, 2 | limit(7; repeat(1, ., 3)) ⟼ 1 2 3 1 2 3 1.

This filter does not cache the outputs of f.

while(p; f), until(p; f)

The filter while(p; f) yields its input and then runs itself on the outputs of f applied to the input, as long as p returns true.

The filter until(p; f) applies f to its input until p returns true, at which point the filter returns its input.

Examples:

  • 0 | while(. <= 3; . + 1) ⟼ 0 1 2 3
  • 0 | until(. >= 3; . + 1) ⟼ 3
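
Both filters can be expressed with recursion. The following sketch uses hypothetical names my_while and my_until and is meant to illustrate the semantics, not to reproduce jaq's own definitions:

```jq
# Yield the input, then continue on the outputs of f, while p holds
def my_while(p; f): def rec: if p then ., (f | rec) else empty end; rec;
# Apply f until p holds, then yield the result
def my_until(p; f): def rec: if p then . else f | rec end; rec;
# Yields [0, 1, 2, 3] and 3
[0 | my_while(. <= 3; . + 1)], (0 | my_until(. >= 3; . + 1))
```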

Booleans

The filters in this section classify their inputs or output them selectively.

toboolean

The filter toboolean is for booleans what tonumber is for numbers. For example:

  • true | toboolean ⟼ true
  • "true" | toboolean ⟼ true
  • "[true]" | try toboolean catch "fail" ⟼ "fail"

select(p)

The filter select(p) yields its input for each true output of p. For example, (0, 1, -1, 2, -2) | select(. >= 0) ⟼ 0 1 2.
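
Since select(p) yields its input once per true output of p, a p with multiple outputs can duplicate the input. A small sketch:

```jq
# Typical use inside map: yields [3, 4]
([1, 2, 3, 4] | map(select(. > 2))),
# Two true outputs of p: yields 1 twice
(1 | select(true, false, true))
```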

nulls, booleans, numbers, strings, arrays, objects

Any of these filters yields its input if it is of the given type, else nothing. For example:

  • null, true, 0, "Hi!", [1, 2], {a: 1} | nulls ⟼ null
  • null, true, 0, "Hi!", [1, 2], {a: 1} | booleans ⟼ true
  • null, true, 0, "Hi!", [1, 2], {a: 1} | numbers ⟼ 0
  • null, true, 0, "Hi!", [1, 2], {a: 1} | strings ⟼ "Hi!"
  • null, true, 0, "Hi!", [1, 2], {a: 1} | arrays ⟼ [1, 2]
  • null, true, 0, "Hi!", [1, 2], {a: 1} | objects ⟼ {"a": 1}

These filters are equivalent to select(. == null), select(isboolean), …, select(isobject).

isboolean, isnumber, isstring, isarray, isobject

For every filter in this section, like isboolean, …, isobject, there is a corresponding filter in the previous section, like booleans, …, objects. Any of these filters yields true if its corresponding filter in the previous section yields some output, else false. For example:

  • null | isboolean ⟼ false, because null | booleans ⟼ (no output).
  • true | isboolean ⟼ true , because true | booleans ⟼ true.

jq does not implement these filters.

normals, finites

These filters yield their input if isnormal or isfinite, respectively, yields true for it, else nothing.
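
A small sketch applying both filters to the elements of an array (NaN is left out here, because jq and jaq may disagree on whether it counts as finite):

```jq
# 0 is not normal, and infinite is neither normal nor finite
# Yields [1, -2.5] and [0, 1, -2.5]
[0, 1, infinite, -2.5] | map(normals), map(finites)
```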

values, iterables, scalars

The filter values yields its input if it is not null, else nothing.

If a value is either an array or an object, it is said to be iterable; otherwise, it is said to be scalar. (The iteration filter .[] succeeds on any iterable value, whereas it fails on any scalar.)

The filters iterables and scalars yield their input if it is iterable or scalar, respectively, else nothing.

Examples:

  • null, true, 0, "Hi!", [1, 2], {a: 1} | values ⟼ true 0 "Hi!" [1, 2] {"a": 1}
  • null, true, 0, "Hi!", [1, 2], {a: 1} | scalars ⟼ null true 0 "Hi!"
  • null, true, 0, "Hi!", [1, 2], {a: 1} | iterables ⟼ [1, 2] {"a": 1}

isnan, isinfinite, isfinite, isnormal

The filter isnan yields true if its input is NaN, else false. Note that it is not equivalent to . == nan, because nan is not equal to itself; see equality.

The filter isinfinite yields true if its input is either Infinity or -Infinity, else false.

The filter isfinite yields true if its input is a number that is not infinite, else false.

The filter isnormal yields true if its input is a number that is neither 0, NaN, infinite, nor subnormal, else false.

Examples:

  • nan | isnan, isinfinite, isfinite, isnormal ⟼ true false true false
  • infinite | isnan, isinfinite, isfinite, isnormal ⟼ false true false false
  • 0 | isnan, isinfinite, isfinite, isnormal ⟼ false false true false
  • 1 | isnan, isinfinite, isfinite, isnormal ⟼ false false true true

not

The filter not converts its input to its boolean value and returns its negation. For example:

  • true | not ⟼ false
  • false | not ⟼ true
  • "foo" | not ⟼ false

The filter not is equivalent to if . then false else true end. We can obtain the boolean value of a value by not | not; e.g. "" | not | not ⟼ true.

Membership

contains($x), inside($x)

The filter contains($x) yields true if any of the following conditions holds, else false.

  • The input is a string and $x is a substring of it.
  • The input is an array, $x is an array, and for every value v in $x, there is some value in the input that contains(v).
  • The input is an object, $x is an object, and for every key-value pair {k: v} in $x, there is a value for the key k in the input that contains(v).
  • The input is null, boolean, or a number, and $x is equal to the input.

Examples:

  • "Hello, world!" | contains("world") ⟼ true
  • [1, 2, 3] | contains([1, 3]) ⟼ true
  • [[1, 2], 3] | contains([3, [1]]) ⟼ true
  • {a: 1, b: 2} | contains({a: 1}) ⟼ true
  • {a: [1, 2]} | contains({a: [1]}) ⟼ true
  • 0 | contains(0) ⟼ true

The filter inside($x) is a flipped version of contains. For example, "world" | inside("Hello, world") ⟼ true.

The filter inside($x) is equivalent to . as $i | $x | contains($i).
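
A consequence of the definition is that strings inside arrays are matched by substring, not by equality. The following sketch contrasts contains with an exact element test via any:

```jq
# contains matches "bar" inside "foobar": yields true
# an exact test finds no element equal to "bar": yields false
["foobar", "baz"] | contains(["bar"]), any(.[]; . == "bar")
```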

indices($x)

The filter indices($x) yields the following:

  • If the input and $x are either both strings or both arrays, then it yields the indices i for which .[i:][:$x | length] == $x; e.g. "Alice, Bob, and Carol" | indices(", ") ⟼ [5, 10] and [0, 1, 2, 3, 1, 2, 3] | indices([1, 2]) ⟼ [1, 4].
  • If the input is an array and $x is not an array, then it yields the indices i for which .[i] == $x; e.g. [0, 1, 2, 3, 1, 2, 3] | indices(1) ⟼ [1, 4].
  • Otherwise, it yields an error.

This means that [[1, 2], 3] | indices([1, 2]) ⟼ [], because the input array has neither 1 nor 2, just [1, 2] and 3.

We can verify the property given above:

def verify($x): all(indices($x)[] as $i | .[$i:][:$x | length]; . == $x);
("Alice, Bob, and Carol" | verify(", "  )),
([0, 1, 2, 3, 1, 2, 3]   | verify([1, 2]))
 ⟼  true true

index($x), rindex($x)

The filters index($x) and rindex($x) yield the first and the last element of indices($x), respectively, or null if there is none; that is, they are equivalent to indices($x) | .[0] and indices($x) | .[-1]. For example:

  • "Alice, Bob, and Carol" | index(", ") ⟼ 5
  • [0, 1, 2, 3, 1, 2, 3] | rindex([1, 2]) ⟼ 4
  • "Hello world!" | index(", ") ⟼ null
  • [0, 1, 2] | rindex(3) ⟼ null

has($k), in($x)

The filter has($k) yields true if $k is among the keys of the input, else false. For example:

  • [1, 2, 3] | has( 0 , 3 ) ⟼ true false
  • {a: 1, b: 2} | has("a", "c") ⟼ true false

The filter in($x) is a flipped version of has, just like inside is a flipped version of contains. For example, "a" | in({a: 1, b: 2}) ⟼ true.

The filter has($k) is a more performant version of any(keys[]; . == $k). For example:

  • [1, 2, 3] | any(keys[]; . == 0 ) ⟼ true
  • {a: 1, b: 2} | any(keys[]; . == "a") ⟼ true

Updates

map(f), map_values(f)

The filter map(f) obtains all values of the input (via .[]), applies f to the values, and collects all results into an array. For example:

  • [1, 2, 3] | map(., .*2) ⟼ [1, 2, 2, 4, 3, 6].
  • {a: 1, b: 2} | map(., .*2) ⟼ [1, 2, 2, 4].
  • [1, 2, 3, 4] | map(select(. % 2 == 0)) ⟼ [2, 4].

The filter map_values(f) has the same effect as map(f) when the input is an array, but when the input is an object, map_values(f) also outputs an object. For example:

  • [1, 2, 3, 4] | map_values(.*2) ⟼ [2, 4, 6, 8]
  • {a: 1, b: 2} | map_values(.*2) ⟼ {"a": 2, "b": 4}

The filter map(f) is equivalent to [.[] | f] and the filter map_values(f) is equivalent to .[] |= f.
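
When f yields no output, the two filters behave differently: map(f) simply collects fewer values, whereas map_values(f) removes the corresponding key. A small sketch:

```jq
# Yields [1, 3] and {"a": 1, "c": 3}
{a: 1, b: 2, c: 3} | map(select(. != 2)), map_values(select(. != 2))
```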

walk(f)

The filter walk(f) recursively updates its input with f. For example:

  • [[1, 2], [3]] | walk(numbers += 1) ⟼ [[2, 3], [4]]
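
A sketch of a common use of walk(f), namely sorting all arrays contained in a value at any depth:

```jq
# Sort every array; leave all other values unchanged
{b: [3, 1, 2], a: {c: [2, 1]}} | walk(if type == "array" then sort else . end)
```

This yields an object equal to {"a": {"c": [1, 2]}, "b": [1, 2, 3]}.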

In jaq, walk(f) is defined as .. |= f, whereas in jq, a definition similar to the following is used:

def walk(f): def rec: (.[]? |= rec) | f; rec;

This is a more efficient version of:

def walk(f): (.[]? |= walk(f)) | f;

We can show that in jaq, .. |= f and jq’s definition of walk(f) are equivalent. For this, we will use equivalences about pathless updates.

First, let us recall that .. |= f is equivalent to the following in jaq:

def rec_up: (.[]? | rec_up), .; rec_up |= f

We can thus unfold .. |= f:

..                   |= f  === (unfolding .. |= f)
rec_up               |= f  === (unfolding rec_up)
((.[]? | rec_up), .) |= f  === (because (l, r) |= f  ===  (l |= f) | (r |= f))
((.[]? | rec_up) |= f)  | (. |= f)  === (because . |= f  ===  f)
((.[]? | rec_up) |= f)  | f         === (because (l | r) |= f  ===  l |= (r |= f))
(.[]? |= (rec_up |= f)) | f         === (because rec_up |= f  ===  .. |= f)
(.[]? |= (.. |= f))     | f

We can thus see that .. |= f is equivalent to (.[]? |= (.. |= f)) | f. In the same sense, walk(f) is equivalent to (.[]? |= walk(f)) | f. We can conclude that .. |= f is equivalent to walk(f).

Note, however, that this equivalence does not hold in jq, because jq’s updates work differently from jaq’s. The difference shows in particular when f returns multiple values.

del(f)

The filter del(f) deletes values at the locations given by f. It is equivalent to f |= empty. For example:

  • [1, 2, 3, 4] | del(.[] | select(. % 2 == 0)) ⟼ [1, 3]
  • [1, 2, 3] | del(.[1]) ⟼ [1, 3]
  • [1, 2, 3] | del(.[1:]) ⟼ [1]
  • {a: 1, b: 2} | del(.a) ⟼ {"b": 2}

Paths

path(f)

The filter path(f) records for each output of f its position in the input, and yields that position as a path. A path is an array that may contain indices or “slice objects”. The latter must contain a "start" and/or an "end" key with an integer value. For example:

  • [{a: 1}, {a: 2}] | path(.[].a) ⟼ [0, "a"] [1, "a"]
  • [1, 2, 3] | path(.[1:][:-1]) ⟼ [{"start": 1}, {"end": -1}]
  • [1, 2, 3] | path(.[1: -1]) ⟼ [{"start": 1, "end": -1}]

If f returns values that do not point to the input, then path(f) yields an error.

  • try path(0 ) catch "fail" ⟼ "fail"
  • try path(. as $x | $x) catch "fail" ⟼ "fail"

The filter path(f) is at the heart of how jq executes assignments such as p |= u, whereas jaq pursues a different, “pathless” approach. See the section on path-based updates for details on how path(f) is calculated.

paths, paths(p)

The filter paths yields the paths to all descendant values of the input, excluding the input itself. It is equivalent to skip(1; path(..)).

The filter paths(p) yields the paths to all descendant values of the input for which p yields true.

Examples:

  • [1, {a: 2}] | paths ⟼ [0] [1] [1, "a"]
  • [1, {a: 2}] | paths(isnumber) ⟼ [0] [1, "a"]
  • [1, {a: 2}] | paths(isobject) ⟼ [1]

We have that paths is equivalent to paths(true). Furthermore, paths(p) is equivalent to:

def paths(p): paths as $path | if getpath($path) | p then $path else empty end;
[1, {a: 2}] | paths(isnumber)  ⟼  [0] [1, "a"]

getpath($path)

The filter getpath($path) is the inverse filter for path(f). If path(f) yields no error, then getpath(path(f)) yields the same outputs as f. For example:

  • [{a: 1}, {a: 2}] | getpath([0, "a"], [1, "a"]) ⟼ 1 2
  • [1, 2, 3] | getpath([{"start": 1}, {"end": -1}]) ⟼ [2]
  • [1, 2, 3] | getpath([{"start": 1, "end": -1}]) ⟼ [2]
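
The following sketch first checks the round-trip property for the first example above, and then combines paths(p) with getpath to flatten a nested value into an object with dotted keys (the key format is just an illustration):

```jq
# Round trip: getpath(path(f)) yields the same outputs as f; yields true
([{a: 1}, {a: 2}] | [getpath(path(.[].a))] == [.[].a]),
# Flatten scalar leaves into dotted keys: yields {"a.b": 1, "c.0": 2}
({a: {b: 1}, c: [2]} |
  [paths(scalars) as $p | {($p | map(tostring) | join(".")): getpath($p)}] | add)
```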

setpath($path; $v)

The filter setpath($path; $v) sets the value at $path to $v. It is equivalent to getpath($path) = $v. For example:

  • [[1, 2], [3, 4]] | setpath([1, 0]; 5) ⟼ [[1, 2], [5, 4]]
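
setpath also creates missing objects along the path. Combined with reduce, this gives a small sketch for building a nested object from a list of paths:

```jq
# Start from null and set each path to 0: yields {"x": 0, "y": {"z": 0}}
reduce (["x"], ["y", "z"]) as $p (null; setpath($p; 0))
```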

delpaths($paths)

The filter delpaths($paths) takes an array of paths and deletes all corresponding values in the order given by the array. For example:

  • [1, 2, 3] | delpaths([[0]]) ⟼ [2, 3]
  • [{a: 1, b: 2}, 3] | delpaths([[0, "b"], [1]]) ⟼ [{"a": 1}]

In jq, the $paths are interpreted relative to the original input value, whereas in jaq, they are interpreted relative to the current value. For example, [1, 2, 3] | delpaths([[0], [0]]) ⟼ [3] in jaq, because it first deletes the 0-th element 1 (yielding [2, 3]), then it deletes the 0-th element 2 (yielding [3]). Here, jq yields [2, 3], because the 0-th element always refers to the 0-th element of the original input, which is 1.

To use delpaths in an interoperable fashion, use $paths such that:

  • Paths to descendants come before paths to their ancestors.
  • Paths to array elements to the right come before paths to elements to the left.

For example, .. returns ancestors before descendants and array elements to the left before elements to the right. To use the output of .. in delpaths, it suffices to reverse the order of its outputs:

  • ["a", 0, "b", 1] | delpaths([path(.. | strings)] | reverse) ⟼ [0, 1 ] (right)
  • ["a", 0, "b", 1] | delpaths([path(.. | strings)] ) ⟼ [0, "b"] (wrong)
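
For paths that consist only of array indices and object keys, sorting the paths and then reversing the result satisfies both conditions: a path sorts after its prefixes, and larger indices sort after smaller ones. A sketch with a hypothetical helper delpaths_portable (slice paths are not considered here):

```jq
# Hypothetical helper: order paths so that delpaths gives the same result in jq and jaq
def delpaths_portable($ps): delpaths($ps | sort | reverse);
# Yields [0, 1]
["a", 0, "b", 1] | delpaths_portable([path(.. | strings)])
```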

pick(f)

The filter pick(f) constructs an object that contains only those parts of the input that f returns. For example:

  • {a: {b: 1, c: 2}, d: 3} | pick(. ) ⟼ {"a": {"b": 1, "c": 2}, "d": 3}
  • {a: {b: 1, c: 2}, d: 3} | pick(.a.c ) ⟼ {"a": { "c": 2} }
  • {a: {b: 1, c: 2}, d: 3} | pick(.a.c, .d) ⟼ {"a": { "c": 2}, "d": 3}

In jq, pick(f) also supports paths to arrays; for example:

  • [1, 2, 3] | pick(.[0 ]) yields [1]
  • [1, 2, 3] | pick(.[1 ]) yields [null, 2]
  • [1, 2, 3] | pick(.[1:]) yields [2, 3]

While implementing this functionality in jaq, I found many corner cases that would have made the proper documentation of this filter very complex. I also found a few surprising behaviours in jq, e.g. that [1, 2, 3] | pick(.[-1]) yields an error. In the end, I decided to support only the simpler, well-understood subset of paths to objects.

We have the property that pick(f, g) is equivalent to pick(f) * pick(g).
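
A small sketch checking this property on the example input from above; it yields true:

```jq
{a: {b: 1, c: 2}, d: 3} as $x |
# Both sides yield {"a": {"b": 1}, "d": 3}
($x | pick(.a.b, .d)) == (($x | pick(.a.b)) * ($x | pick(.d)))
```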

Numbers

tonumber

The filter tonumber takes as input either a number or a string. If the input is a number, it is returned unchanged; if the input is a string, it is parsed to a number, failing if this does not succeed. For example:

  • 42 | tonumber ⟼ 42
  • "42" | tonumber ⟼ 42
  • "[42]" | try tonumber catch "fail" ⟼ "fail"

infinite, nan

The filters infinite and nan yield the floating-point numbers Infinity and NaN:

  • infinite ⟼ Infinity
  • nan | isnan ⟼ true (we cannot test for equality with NaN here, because nan == nan ⟼ false)

We can also produce Infinity and NaN by:

  • 1 / 0 ⟼ Infinity
  • 0 / 0 | isnan ⟼ true

abs

The filter abs yields the negation of the input if the input is smaller than 0, else it yields the input. Note that due to this definition, strings, arrays, and objects are also returned unchanged, because they are larger than 0; see ordering.

Examples:

  • -2.0, -1, 0, 1, 2.0 | abs ⟼ 2.0 1 0 1 2.0
  • "foo", [], {} | abs ⟼ "foo" [] {}

floor, round, ceil

The filter floor rounds a number down to the largest integer that is not greater than it, round rounds it to the nearest integer (rounding halfway cases away from zero), and ceil rounds it up to the smallest integer that is not smaller than it. For example:

  • 0.5 | floor, round, ceil ⟼ 0 1 1
  • 0.4 | floor, round, ceil ⟼ 0 0 1
  • 0.0 | floor, round, ceil ⟼ 0 0 0
  • -0.4 | floor, round, ceil ⟼ -1 0 0
  • -0.5 | floor, round, ceil ⟼ -1 -1 0
  • 0, 1 | round ⟼ 0 1
  • nan | round | isnan ⟼ true
  • infinite | round ⟼ Infinity

Math

jaq implements many mathematical functions via libm. If not specified otherwise, these filters take and return floating-point numbers.

Zero-argument filters:

  • acos
  • acosh
  • asin
  • asinh
  • atan
  • atanh
  • cbrt
  • cos
  • cosh
  • erf
  • erfc
  • exp
  • exp10
  • exp2
  • expm1
  • fabs
  • frexp, which returns pairs of (float, integer).
  • gamma
  • ilogb, which returns integers.
  • j0
  • j1
  • lgamma
  • log
  • log10
  • log1p
  • log2
  • logb
  • modf, which returns pairs of (float, float).
  • nearbyint
  • pow10
  • rint
  • significand
  • sin
  • sinh
  • sqrt
  • tan
  • tanh
  • tgamma
  • trunc
  • y0
  • y1

Two-argument filters that ignore .:

  • atan2
  • copysign
  • drem
  • fdim
  • fmax
  • fmin
  • fmod
  • hypot
  • jn, which takes an integer as first argument.
  • ldexp, which takes an integer as second argument.
  • nextafter
  • nexttoward
  • pow
  • remainder
  • scalb
  • scalbln, which takes an integer as second argument.
  • yn, which takes an integer as first argument.

Three-argument filters that ignore .:

  • fma

Examples:

  • (3.141592 | sin) < (-5 | pow10) ⟼ true establishes that the sine of 3.141592 (an approximation of π) is smaller than 10^-5.
  • fmax(2; 3) ⟼ 3.0
  • fma(2; 3; 4) ⟼ 10.0
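The pair-returning functions follow the C library functions of the same name, which Python's math module wraps as well. The snippet below uses Python to show what the two components of such pairs mean; that jaq orders them the same way (mantissa before exponent for frexp, fractional before integral part for modf) is an assumption based on the libm conventions.

```python
import math

# For nonzero x, frexp splits x into a mantissa m with 0.5 <= |m| < 1 and an
# integer exponent e such that x == m * 2**e; ldexp(m, e) undoes the split.
m, e = math.frexp(8.0)
print(m, e)                           # 0.5 4
print(math.ldexp(m, e))               # 8.0

# modf splits x into its fractional and integral parts, both floats.
print(math.modf(3.5))                 # (0.5, 3.0)

# The sin example above: 3.141592 is close to pi, so its sine is tiny.
print(math.sin(3.141592) < 10 ** -5)  # True
```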

Arrays

sort, sort_by(f)

The filter sort takes an array and sorts it. For example:

[true, 1, "abc", [1], {"a": 1}, null, false, 0, "ABC", [], {}] | sort  ⟼ 
[null, false, true, 0, 1, "ABC", "abc", [], [1], {}, {"a": 1}]

The filter sort_by(f) evaluates the filter f for each value in the input array, and sorts the values by the output of f. For example:

  • [0, 1, 2, 3] | sort_by(. % 2) ⟼ [0, 2, 1, 3]
  • [{a: 1, b: 2}, {a: 0, b: 3}] | sort_by(. ) ⟼ [{"a": 0, "b": 3}, {"a": 1, "b": 2}]
  • [{a: 1, b: 2}, {a: 0, b: 3}] | sort_by(.a) ⟼ [{"a": 0, "b": 3}, {"a": 1, "b": 2}]
  • [{a: 1, b: 2}, {a: 0, b: 3}] | sort_by(.b) ⟼ [{"a": 1, "b": 2}, {"a": 0, "b": 3}]

We have the following correspondences:

  • sort is equivalent to sort_by(.).
  • sort_by(f) is equivalent to sort_by([f]).
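The ordering that sort relies on can be written as a Python sort key. This sketch assumes that JSON values are represented as None, bool, int/float, str, list, and dict, and it ignores NaN; the names jq_key, sort, and sort_by are illustrative. In sort_by, f returns the list of outputs of the jq filter, so sorting by it corresponds to sort_by([f]).

```python
def jq_key(v):
    """Key for null < false < true < numbers < strings < arrays < objects."""
    if v is None:
        return (0,)
    if v is False:
        return (1,)
    if v is True:
        return (2,)
    if isinstance(v, (int, float)):
        return (3, v)
    if isinstance(v, str):
        return (4, v)  # strings compare by code points
    if isinstance(v, list):
        return (5, [jq_key(x) for x in v])  # arrays compare element-wise
    if isinstance(v, dict):
        keys = sorted(v)
        # Objects: first compare their sorted keys, then their values key by key.
        return (6, keys, [jq_key(v[k]) for k in keys])
    raise TypeError(f"not a JSON value: {v!r}")


def sort(xs):
    return sorted(xs, key=jq_key)


def sort_by(xs, f):
    # sorted() is stable, so values with equal keys keep their input order.
    return sorted(xs, key=lambda x: jq_key(f(x)))


print(sort([True, 1, "abc", [1], {"a": 1}, None, False, 0, "ABC", [], {}]))
print(sort_by([0, 1, 2, 3], lambda x: [x % 2]))  # [0, 2, 1, 3]
```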

group_by(f)

The filter group_by(f) sorts its input array by f, then groups all values for which f produced identical outputs into the same array. For example:

  • ["foo", "", "bar", "quux", "baz"] | group_by(length) ⟼ [[""], ["foo", "bar", "baz"], ["quux"]]
  • [1, 2, 3, 4] | group_by(. % 2) ⟼ [[2, 4], [1, 3]]

unique, unique_by(f)

The filter unique_by(f) sorts its input array by f. If f produces the same outputs for multiple values in the array, only the first is kept. For example:

  • ["foo", "", "bar", "quux", "baz"] | unique_by(length) ⟼ ["", "foo", "quux"]
  • [1, 2, 3, 4] | unique_by(. % 2) ⟼ [2, 1]

The filter unique is equivalent to unique_by(.). It sorts the input array and removes any duplicates; e.g. [3, 2, 1, 3, 4] | unique ⟼ [1, 2, 3, 4].
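Both filters can be sketched in Python as a stable sort followed by itertools.groupby. The sketch assumes that f returns keys that Python already orders the way jq does, such as only numbers or only strings; group_by and unique_by are illustrative names.

```python
from itertools import groupby


def group_by(xs, f):
    """Sort xs by f (stably, like sort_by), then collect runs of equal keys."""
    ordered = sorted(xs, key=f)
    return [list(run) for _, run in groupby(ordered, key=f)]


def unique_by(xs, f):
    """Keep only the first value of every group."""
    return [run[0] for run in group_by(xs, f)]


words = ["foo", "", "bar", "quux", "baz"]
print(group_by(words, len))                      # [[''], ['foo', 'bar', 'baz'], ['quux']]
print(unique_by(words, len))                     # ['', 'foo', 'quux']
print(group_by([1, 2, 3, 4], lambda x: x % 2))   # [[2, 4], [1, 3]]
```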

min, max, min_by(f), max_by(f)

The filters min and max yield the smallest and largest element of an array, respectively. For example:

  • [1, 2, 3] | min ⟼ 1
  • [1, 2, 3] | max ⟼ 3

The filters min_by(f) and max_by(f) evaluate the filter f for each value in the input array, and yield the value for which f produces the smallest or largest output, respectively. For example:

  • ["abc", [1, 2], {"a": 1}] | min_by(length) ⟼ {"a": 1}
  • ["abc", [1, 2], {"a": 1}] | max_by(length) ⟼ "abc"

You can make f yield multiple values to break ties; for example:

  • ["abc", [1, 2], {"a": 1, "b": 3}] | min_by(length, add?) ⟼ [1, 2]

We have the following correspondences:

  • min and max are equivalent to min_by(.) and max_by(.), respectively.
  • min_by(f) and max_by(f) are equivalent to min_by([f]) and max_by([f]), respectively.

reverse

The filter reverse takes an array and reverses it. For example, [1, 2, 3] | reverse ⟼ [3, 2, 1].

transpose

The filter transpose takes an array of arrays and yields its transposition.

Examples:

  • [[1 , 2, 3], [4, 5, 6]] | transpose ⟼ [[1, 4], [2, 5], [3, 6]]
  • [[1], [2, 3], [4, 5, 6]] | transpose ⟼ [[1, 2, 4], [null, 3, 5], [null, null, 6]]

More precisely, transpose yields an array $t that contains map(length) | max arrays, each of which has the same length as the input array (i.e. length), such that $t[x][y] == .[y][x] for every x and y. We can verify this:

def verify: transpose as $t |
  ($t | length) == (map(length) | max),
  (range($t | length) as $x |
    ($t[$x] | length) == length,
    (range(length) as $y |
      $t[$x][$y] == .[$y][$x]
    )
  );
[[1,   2, 3], [4, 5, 6]],
[[1], [2, 3], [4, 5, 6]] | all(verify; .)  ⟼  true true
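The same specification carries over to Python almost literally. In this sketch (transpose is an illustrative name), rows are Python lists and None plays the role of null for the positions that short rows do not have.

```python
def transpose(rows):
    """Transpose a list of lists, padding short rows with None (null)."""
    width = max((len(row) for row in rows), default=0)
    return [[row[x] if x < len(row) else None for row in rows] for x in range(width)]


print(transpose([[1, 2, 3], [4, 5, 6]]))    # [[1, 4], [2, 5], [3, 6]]
print(transpose([[1], [2, 3], [4, 5, 6]]))  # [[1, 2, 4], [None, 3, 5], [None, None, 6]]
```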

flatten, flatten($depth)

The filter flatten flattens input arrays, and the filter flatten($depth) flattens input arrays up to a certain depth. For example:

  • [1, [2, [3]], {a: [1, [2]]}] | flatten ⟼ [1, 2, 3 , {"a": [1, [2]]}]
  • [1, [2, [3]], {a: [1, [2]]}] | flatten(0) ⟼ [1, [2, [3]], {"a": [1, [2]]}]
  • [1, [2, [3]], {a: [1, [2]]}] | flatten(1) ⟼ [1, 2, [3] , {"a": [1, [2]]}]
  • [1, [2, [3]], {a: [1, [2]]}] | flatten(2) ⟼ [1, 2, 3 , {"a": [1, [2]]}]
  • null, true, 0, "Hi" | flatten ⟼ [null] [true] [0] ["Hi"]

Note that flatten does not impact arrays that are descendants of an object.

We can define flatten/0 and flatten/1 as:

def flattens    : if isarray             then .[] | flattens       end;
def flattens($d): if isarray and $d >= 0 then .[] | flattens($d-1) end;
def flatten    : [flattens    ];
def flatten($d): [flattens($d)];
[1, [2, [3]], {"a": [1, [2]]}] | flatten, flatten(0), flatten(1), flatten(2)  ⟼ 
[1,  2,  3  , {"a": [1, [2]]}]
[1, [2, [3]], {"a": [1, [2]]}]
[1,  2, [3] , {"a": [1, [2]]}]
[1,  2,  3  , {"a": [1, [2]]}]

bsearch($x)

The filter bsearch($x) takes a sorted array and performs a binary search for $x in the array. If the array contains $x, then the filter yields a non-negative $i such that .[$i] == $x; otherwise, the filter yields a negative $i such that inserting $x at the index -$i-1 in the array would preserve its ordering.

Examples:

  • [0, 4, 8] | bsearch(8, 4, 0) ⟼ 2 1 0
  • [0, 4, 8] | bsearch(-2, 2, 6, 10) ⟼ -1 -2 -3 -4

If the input array is not sorted, then the output of this filter is meaningless.

We can verify the property above for negative $i. First, let us search for the value 6 that is not in the input array:

[0, 4, 8] | bsearch(6) ⟼ -3.

Now, the definition postulates that we can insert 6 at the index -$i-1, which is --3-1 ⟼ 2:

[0, 4, 8] | .[2:2] = [6] ⟼ [0, 4, 6, 8].

We can see that the resulting array is sorted.
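Python's bisect module performs the same search, and encoding a missing element as -i-1 takes one more line. A sketch (bsearch is an illustrative name); note that bisect_left returns the leftmost matching index if the array contains duplicates, whereas the description above only promises some $i with .[$i] == $x.

```python
from bisect import bisect_left


def bsearch(xs, x):
    """Index of x in the sorted list xs, or -i-1 where i is its insertion point."""
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return i
    return -i - 1


xs = [0, 4, 8]
print([bsearch(xs, x) for x in (8, 4, 0)])       # [2, 1, 0]
print([bsearch(xs, x) for x in (-2, 2, 6, 10)])  # [-1, -2, -3, -4]
pos = -bsearch(xs, 6) - 1                        # 2
print(xs[:pos] + [6] + xs[pos:])                 # [0, 4, 6, 8]
```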

Text strings

Unless stated otherwise, all filters in this section take a text string as input, and fail if the input is of any other type.

tostring

The filter tostring converts its input to a string. Its output depends on the type of its input:

  • Text strings are returned unchanged, e.g. "Hi" | tostring ⟼ "Hi".
  • Byte strings are interpreted as text strings, e.g. "Hi" | tobytes | tostring ⟼ "Hi". This takes constant time.
  • Any other value is formatted compactly as if output by jaq -c. For example:
    • null | tostring ⟼ "null",
    • [0, 1] | tostring ⟼ "[0,1]",
    • {a: 1} | tostring ⟼ "{\"a\":1}".

String interpolation without an explicit format, such as "\(null) and \([0, 1])" ⟼ "null and [0,1]", behaves as if the output of every interpolated filter was piped through tostring.

utf8bytelength

The filter utf8bytelength yields the number of bytes of the input string. It is equivalent to tobytes | length, but different from length, which counts the number of characters.

For example, "ゼノギアス" | length, utf8bytelength, (tobytes | length) ⟼ 5 15 15.

startswith($s), endswith($s)

The filter startswith($s) yields true if the input string starts with the string $s, else false. Similarly, endswith($s) yields true if the input string ends with $s, else false. For example:

  • "ゼノギアス" | startswith("ゼノ") ⟼ true
  • "ゼノギアス" | endswith("ギアス") ⟼ true

trim, ltrim, rtrim

The filters ltrim and rtrim remove from the input string all leading and trailing whitespace, respectively. Here, whitespace corresponds to the White_Space Unicode property. The filter trim is equivalent to ltrim | rtrim.

For example:

  • " \t\n Bonjour !   \r  " | ltrim ⟼ "Bonjour !   \r  "
  • " \t\n Bonjour !   \r  " | rtrim ⟼ " \t\n Bonjour !"

Note that there are a few quite unusual whitespace characters in this string.

ltrimstr($s), rtrimstr($s)

The filters ltrimstr($s) and rtrimstr($s) remove a single occurrence of $s from the start or the end of the string, respectively. If there is no such occurrence, the original string is returned. For example:

  • "foofoobar" | ltrimstr("foo") ⟼ "foobar"
  • "foobarbar" | rtrimstr("bar") ⟼ "foobar"

explode, implode

The filter explode yields an array containing a non-negative number for each valid Unicode code point of the input string and a negative number for each byte of each invalid Unicode code unit. For example:

"Dear ☀️" + (255 | tobytes | tostring) | explode  ⟼ 
[68,101,97,114,32,9728,65039,-255]

Here, we can see that "☀️" has turned into two code points, namely 9728 and 65039, whereas the invalid FF byte (= 255) has become -255.

The inverse filter of explode is implode:

[68,101,97,114,32,9728,65039, -255] | implode[:-1]  ⟼ 
"Dear ☀️"

(I omitted the FF byte at the end, because it is hard to save in a text editor.)

jq does not permit invalid code units in text strings, so it returns and accepts only natural numbers in explode and implode.
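Python's surrogateescape error handler maps every undecodable byte to a code point of its own, which makes it a convenient way to sketch jaq's explode and implode on raw bytes. The sketch assumes that Python's UTF-8 decoder flags exactly the same bytes as invalid as jaq does; explode and implode here are illustrative Python functions, not jaq's code.

```python
def explode(data):
    """Code points for valid UTF-8, and -b for every invalid byte b."""
    text = data.decode("utf-8", errors="surrogateescape")
    result = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # surrogateescape stores an invalid byte b as the code point U+DC00 + b
            result.append(-(cp - 0xDC00))
        else:
            result.append(cp)
    return result


def implode(numbers):
    """Inverse of explode: UTF-8 for non-negative numbers, raw bytes for negative ones."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            out.append(-n)
        else:
            out += chr(n).encode("utf-8")
    return bytes(out)


data = "Dear \u2600\ufe0f".encode("utf-8") + b"\xff"
print(explode(data))                   # [68, 101, 97, 114, 32, 9728, 65039, -255]
print(implode(explode(data)) == data)  # True
```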

split($s)

This filter yields . / $s if its input . and $s are both strings, else it fails. See the section on division for details.

Note that there is also split($re; $flags) that splits by a regex.

join($s)

The filter join($s) takes as input an array [x1, ..., xn] and yields "" if the array is empty, otherwise "\(x1)" + $s + ... + $s + "\(xn)". That is, it concatenates the string representations of the array values interspersed with $s.

For example, to memorise the hierarchy of values in jq: ["null", "boolean", "number", "string", "array", "object"] | join(" < ") ⟼ "null < boolean < number < string < array < object".

Unlike jq, jaq does not map null values in the array to "", nor does it reject array or object values in the array.
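In Python terms, jaq's join behaves like the sketch below, where tostring renders non-strings as compact JSON. The names are illustrative, and Python's json module stands in for jaq's formatter, which is an assumption for edge cases such as floating-point formatting.

```python
import json


def tostring(value):
    """Strings stay unchanged; any other value becomes compact JSON, like jaq -c."""
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


def join(values, sep):
    """Concatenate the string forms of values, interspersed with sep."""
    return sep.join(tostring(v) for v in values)


print(join(["null", "boolean", "number"], " < "))  # null < boolean < number
print(join([None, 1, [0, 1], {"a": 1}], "; "))     # null; 1; [0,1]; {"a":1}
```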

ascii_downcase, ascii_upcase

The filters ascii_downcase and ascii_upcase convert all ASCII letters in the input string to their lower/upper case variants, respectively. For example:

  • "Der λΠ-Kalkül" | ascii_downcase ⟼ "der λΠ-kalkül"
  • "Der λΠ-Kalkül" | ascii_upcase ⟼ "DER λΠ-KALKüL"

Text string formatting

The filters in this section can be prefixed to strings to influence string interpolation. However, these filters can also be used outside of string interpolation. For example:

  • "1 + 2 * 3" | @uri "https://duckduckgo.com/?q=\(.)" ⟼ "https://duckduckgo.com/?q=1%20%2B%202%20%2A%203"
  • "1 + 2 * 3" | @uri ⟼ "1%20%2B%202%20%2A%203"

If not indicated otherwise, all filters in this section convert their input to a string with tostring and yield a single string output. As a result, byte strings are treated like equivalent text strings; e.g. "Hello world!" | (tobytes | @base64) == @base64 ⟼ true.

Unlike in jq, you can define your own filters that start with @; for example, def @xml: @html;

@text

The filter @text is equivalent to tostring.

@json

The filter @json is equivalent to tojson.

@html, @htmld

The filter @html escapes a string so that it can be embedded in an HTML document. It replaces each of the following characters by the corresponding HTML entity:

Text   <      >      &       '        "
HTML   &lt;   &gt;   &amp;   &apos;   &quot;

The filter @htmld reverses the effect of @html. For example:

"\"1 < 2 & 2 > 1\", that's what he said." | @html | ., @htmld  ⟼ 
"&quot;1 &lt; 2 &amp; 2 &gt; 1&quot;, that&apos;s what he said."
"\"1 < 2 & 2 > 1\", that's what he said."

jq does not support @htmld.
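A Python sketch of the two filters follows. It uses an explicit table because Python's html.escape writes the apostrophe as &#x27; rather than &apos;, and it assumes that @htmld decodes exactly the five entities listed above. Both directions work in a single pass, so that, for instance, &amp;lt; decodes to &lt; and not to <.

```python
import re

ESCAPES = {"<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;"}
UNESCAPES = {entity: char for char, entity in ESCAPES.items()}
ENTITY = re.compile("|".join(map(re.escape, UNESCAPES)))


def html(s):
    # translate() looks at each input character once, so the "&" of a
    # freshly produced entity is never escaped again.
    return s.translate(str.maketrans(ESCAPES))


def htmld(s):
    # A single regex pass over the input: "&amp;lt;" becomes "&lt;", not "<".
    return ENTITY.sub(lambda m: UNESCAPES[m.group(0)], s)


said = "\"1 < 2 & 2 > 1\", that's what he said."
print(html(said))                 # &quot;1 &lt; 2 &amp; 2 &gt; 1&quot;, that&apos;s what he said.
print(htmld(html(said)) == said)  # True
```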

@base64, @base64d

The filter @base64 Base64-encodes its input. The filter @base64d reverses this operation. For example:

"Hello world!" | @base64 | ., @base64d  ⟼ 
"SGVsbG8gd29ybGQh"
"Hello world!"

In jaq, @base64d only succeeds if its whole input is a valid Base64 string. In contrast, jq also accepts strings in which only a part is valid Base64, thus potentially leading to hidden data corruption. See #282 for a detailed discussion.
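The difference between strict and lenient decoding can be reproduced with Python's base64 module, whose validate flag switches between the two. This is only an analogy: it shows the kind of silent data loss that lenient decoding allows, not how jq or jaq implement their decoders; base64d_strict is an illustrative name.

```python
import base64
import binascii

encoded = base64.b64encode(b"Hello world!").decode("ascii")
print(encoded)  # SGVsbG8gd29ybGQh


def base64d_strict(s):
    """Decode s only if all of it is valid Base64; raise binascii.Error otherwise."""
    return base64.b64decode(s, validate=True)


print(base64d_strict(encoded))          # b'Hello world!'
print(base64.b64decode(encoded + "!"))  # b'Hello world!': the "!" is silently dropped
try:
    base64d_strict(encoded + "!")
except binascii.Error as err:
    print("rejected:", err)
```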

@uri, @urid

The filter @uri applies percent-encoding to encode arbitrary data in a uniform resource identifier (URI). The filter @urid reverses this encoding. For example:

  • "Hello, World!" | @uri ⟼ "Hello%2C%20World%21"
  • "Hello, World!" | @uri | @urid ⟼ "Hello, World!"
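Python's urllib.parse.quote performs the same kind of percent-encoding. The sketch below assumes that jaq, like quote with safe="", leaves exactly the unreserved characters of RFC 3986 (letters, digits, -, _, ., ~) unencoded; with that assumption it reproduces the examples above.

```python
from urllib.parse import quote, unquote


def uri(s):
    # safe="" makes quote() encode "/" as well, leaving only letters, digits, and -_.~
    return quote(s, safe="")


def urid(s):
    return unquote(s)


print(uri("Hello, World!"))        # Hello%2C%20World%21
print(uri("1 + 2 * 3"))            # 1%20%2B%202%20%2A%203
print(urid(uri("Hello, World!")))  # Hello, World!
```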

The HTML version of this manual is created with jaq, and @uri is used to encode the examples to create links to the jaq playground.

@sh

The filter @sh escapes data for constructing Unix command lines. Its behaviour depends on the type of its input:

  • null, boolean, number: Convert to string via tostring.
  • String: Replace occurrences of ' by '\'' and surround by '.
  • Array of scalars: Apply @sh to elements and join the results with " " as separator.
  • Fail for any other type of value.

Examples:

  • null, true, 1 | @sh ⟼ "null" "true" "1"
  • "It's green!" | @sh ⟼ "'It'\\''s green!'"
  • ["jaq", "-n", "--arg", "slogan", "It's green!", "$slogan"] | @sh ⟼ "'jaq' '-n' '--arg' 'slogan' 'It'\\''s green!' '$slogan'"

When copy-pasting the output of the previous example to your terminal, be sure to first replace \\ by \. That is, you should end up with 'jaq' '-n' '--arg' 'slogan' 'It'\''s green!' '$slogan', which you can execute in good conscience. (Unescaping can be avoided by running jaq with --raw-output, which does not escape \ with \\ in the first place.)
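The rules above can be sketched in Python as follows (sh is an illustrative name). Python's shlex.quote solves the same problem but closes and reopens quotes differently ('"'"' instead of '\''), so its output would not match jaq's.

```python
import json


def sh(value):
    """Sketch of @sh for scalars, strings, and arrays of scalars."""
    if isinstance(value, list):
        if any(isinstance(x, (list, dict)) for x in value):
            raise TypeError("@sh only accepts arrays of scalars")
        return " ".join(sh(x) for x in value)
    if isinstance(value, str):
        # Close the quote, emit an escaped quote, and reopen it: ' becomes '\''
        return "'" + value.replace("'", "'\\''") + "'"
    if value is None or isinstance(value, (bool, int, float)):
        return json.dumps(value)  # null, true, false, and numbers as tostring prints them
    raise TypeError(f"@sh cannot escape a {type(value).__name__}")


print(sh(["jaq", "-n", "--arg", "slogan", "It's green!", "$slogan"]))
# 'jaq' '-n' '--arg' 'slogan' 'It'\''s green!' '$slogan'
```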

@csv

The filter @csv takes an array of scalars. It transforms each array element depending on its type:

  • null: Yield "".
  • Boolean, number: Transform it via tostring.
  • String: Replace occurrences of " by "" and surround by ".
  • Fail for any other type of value.

Finally, the filter joins the transformed elements with "," as separator. For example:

[true, null, false, 1, "Give me \"quotes\" or die"] | @csv  ⟼ 
"true,,false,1,\"Give me \"\"quotes\"\" or die\""

@tsv

The filter @tsv is similar to @csv, with the following differences:

  • In strings, the characters '\n', '\r', '\t', '\', and '\u0000' are replaced by strings "\\n", "\\r", "\\t", "\\\\", "\\0". (In raw output, these look like \n, \r, \t, \\, and \0.) The result is not surrounded by ".
  • The transformed elements are joined with \t (tabulator).

For example:

[true, null, false, 1, "Newline\nBackslash\\NUL\u0000"] | @tsv  ⟼ 
"true\t\tfalse\t1\tNewline\\nBackslash\\\\NUL\\0"

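Both formats can be sketched with a shared per-field function that differs only in how strings are quoted. The Python below (field, csv, tsv, and TSV_ESCAPES are illustrative names) follows the rules listed above; json.dumps approximates tostring for numbers and is used again for printing, so that the results look like jaq's output for the two examples.

```python
import json

TSV_ESCAPES = str.maketrans({"\n": "\\n", "\r": "\\r", "\t": "\\t", "\\": "\\\\", "\0": "\\0"})


def field(value, quote_string):
    """Render one scalar; strings are handed to the format-specific quote_string."""
    if value is None:
        return ""
    if isinstance(value, (bool, int, float)):
        return json.dumps(value)  # true, false, and numbers
    if isinstance(value, str):
        return quote_string(value)
    raise TypeError(f"only scalars are allowed, got {type(value).__name__}")


def csv(row):
    # Double every " and surround the string by ".
    return ",".join(field(v, lambda s: '"' + s.replace('"', '""') + '"') for v in row)


def tsv(row):
    # Escape newlines, carriage returns, tabs, backslashes, and NULs; no quotes.
    return "\t".join(field(v, lambda s: s.translate(TSV_ESCAPES)) for v in row)


print(json.dumps(csv([True, None, False, 1, 'Give me "quotes" or die'])))
# "true,,false,1,\"Give me \"\"quotes\"\" or die\""
print(json.dumps(tsv([True, None, False, 1, "Newline\nBackslash\\NUL\0"])))
# "true\t\tfalse\t1\tNewline\\nBackslash\\\\NUL\\0"
```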
Byte strings

tobytes

The filter tobytes converts its input to a byte string. Its output depends on the type of input:

  • Natural number in the range 0 to 255 (0xFF): Yields a byte string with a single byte, e.g. 0 | tobytes ⟼ b"\x00".
  • Text string: Yields a byte string containing the underlying bytes of the text string, e.g. "Hi" | tobytes ⟼ b"Hi". This takes constant time.
  • Byte string: Yields the byte string unchanged.
  • Array: Converts each element to a byte string and yields their concatenation, e.g. [0, "Hi", [1, 255]] | tobytes ⟼ b"\x00Hi\x01\xFF". This is equivalent to map(tobytes) | add.
  • Anything else: Yields an error.

This is inspired by Erlang’s iolist_to_binary function.
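In Python terms, tobytes behaves roughly like the function below, where bytes objects stand for byte strings. This is a sketch with assumptions: it does not decide whether jaq accepts floats such as 1.0 as bytes, and Python's str.encode copies the data, whereas jaq's conversion of text strings takes constant time.

```python
def tobytes(value):
    """Sketch of tobytes; Python bytes objects stand for byte strings."""
    if isinstance(value, bool):
        raise TypeError("booleans cannot be converted to bytes")
    if isinstance(value, int) and 0 <= value <= 255:
        return bytes([value])          # a single byte
    if isinstance(value, str):
        return value.encode("utf-8")   # the underlying UTF-8 bytes
    if isinstance(value, bytes):
        return value                   # unchanged
    if isinstance(value, list):
        # Like map(tobytes) | add: convert every element, then concatenate.
        return b"".join(tobytes(x) for x in value)
    raise TypeError(f"cannot convert {value!r} to bytes")


print(tobytes(0))                    # b'\x00'
print(tobytes("Hi"))                 # b'Hi'
print(tobytes([0, "Hi", [1, 255]]))  # b'\x00Hi\x01\xff'
```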

jq does not have byte strings and thus does not have tobytes. fq, which has pioneered the tobytes filter, has both.

Serialisation & Deserialisation

The filters in this section read and write data in all formats supported by jaq. See the formats section for general information about how jaq interprets these formats.

jq supports only JSON, so it only implements the fromjson/tojson filters in this section.

fromjson, tojson

The filter fromjson takes a string as input, parses it to JSON values and yields them. For example:

"null true 0 \"foo\" [1] {\"foo\": 1}" | fromjson  ⟼ 
 null true 0  "foo"  [1] { "foo" : 1}

The filter tojson takes an arbitrary value and outputs a string containing its JSON representation. For example:

 [null,true,0, "foo" ,[1],{ "foo" :1}] | tojson  ⟼ 
"[null,true,0,\"foo\",[1],{\"foo\":1}]"

Note that tojson behaves similarly to tostring, but when its input is a string, it will also encode it to JSON, instead of returning it unchanged; i.e. "Hi" | tojson ⟼ "\"Hi\"".

In jq, fromjson yields an error when its input string contains multiple JSON values. Furthermore, in jaq, tojson | fromjson is equivalent to identity (.), whereas in jq, this is not the case, because nan | tojson | fromjson yields null, not nan.
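jaq's fromjson yields every value in the string. In Python, json.JSONDecoder.raw_decode parses a single value and reports where it stopped, so a loop can produce the whole sequence. This sketch skips the JSON whitespace between values itself; note that Python's decoder also accepts the non-standard literals NaN and Infinity, which the sketch does not reject.

```python
import json


def fromjson(s):
    """Parse every whitespace-separated JSON value in s, in order."""
    decoder = json.JSONDecoder()
    values, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(s) and s[pos] in " \t\n\r":
            pos += 1
        if pos == len(s):
            return values
        value, consumed = decoder.raw_decode(s[pos:])
        values.append(value)
        pos += consumed


print(fromjson('null true 0 "foo" [1] {"foo": 1}'))
# [None, True, 0, 'foo', [1], {'foo': 1}]
```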

fromyaml, toyaml

The filter fromyaml takes a text string and parses it as a sequence of YAML documents. It can yield an arbitrary number of outputs. For example:

  • "---\n 1 \n...\n---\n 2 \n..." | fromyaml ⟼ 1 2
  • "" | fromyaml ⟼ (no output)

The filter toyaml always yields exactly one output, namely a text string containing the current value encoded as YAML.

fromcbor, tocbor

The filter fromcbor takes a byte string and parses it as a sequence of CBOR values. For example:

[0, 1, 32, 64, 96, 128, 160,  244, 245, 246] | tobytes | fromcbor  ⟼ 
 0  1  -1 b""  ""   []   {} false true null
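Each input byte in this example is a complete CBOR data item on its own: per RFC 8949, the top three bits of the initial byte give the major type, and the low five bits give an argument that is stored inline when it is below 24. The following sketch decodes only such one-byte items (decode_one_byte_item is a hypothetical helper; a real CBOR decoder handles much more).

```python
def decode_one_byte_item(byte):
    """Decode a CBOR data item that consists of a single initial byte."""
    major, info = byte >> 5, byte & 0x1F  # top 3 bits, low 5 bits
    if major == 0 and info < 24:
        return info  # unsigned integer
    if major == 1 and info < 24:
        return -1 - info  # negative integer
    if major in (2, 3, 4, 5) and info == 0:
        # empty byte string, text string, array, and map
        return {2: b"", 3: "", 4: [], 5: {}}[major]
    if major == 7 and info in (20, 21, 22):
        return {20: False, 21: True, 22: None}[info]  # simple values
    raise ValueError(f"{byte:#04x} is not a complete one-byte item")


items = [0, 1, 32, 64, 96, 128, 160, 244, 245, 246]
print([decode_one_byte_item(b) for b in items])
# [0, 1, -1, b'', '', [], {}, False, True, None]
```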

The filter tocbor always yields exactly one output, namely a byte string containing the current value encoded as CBOR.

fromtoml, totoml

The filter fromtoml takes a text string and parses it as a single TOML document. It always yields exactly one output, because every TOML document encodes exactly one value. For example:

"
[database]\n
enabled = true\n
ports = [ 8000, 8001, 8002 ]\n
data = [ [\"delta\", \"phi\"], [3.14] ]\n
temp_targets = { cpu = 79.5, case = 72.0 }\n
" | fromtoml  ⟼ 
{"database": {
  "enabled": true,
  "ports": [8000,8001,8002],
  "data": [["delta", "phi"], [3.14]],
  "temp_targets": {"cpu": 79.5, "case": 72.0}
}}

The filter totoml fails if the input is not an object or if the input contains any jaq value not supported by TOML. It converts invalid UTF-8 sequences in the same way as for CBOR.

fromxml, toxml

The filter fromxml takes a text string and parses it as a sequence of XML tags. For example:

"<?xml version='1.0'?>
<html>
<body xmlns='http://www.w3.org/1999/xhtml'>
Hello HTML!
</body>
</html>" | fromxml  ⟼ 
{"xmldecl": {"version": "1.0"}}
{"t": "html","c": [{
  "t": "body",
  "a": {"xmlns": "http://www.w3.org/1999/xhtml"},
  "c": ["Hello HTML!"]
}]}

The filter toxml takes data in the format produced by fromxml and yields a corresponding text string. For example:

[
{"xmldecl": {"version": "1.0"}},
{"t": "html","c": [{
  "t": "body",
  "a": {"xmlns": "http://www.w3.org/1999/xhtml"},
  "c": ["Hello HTML!"]
}]}
] | toxml  ⟼ 
"<?xml version=\"1.0\"?>
<html>
<body xmlns=\"http://www.w3.org/1999/xhtml\">
Hello HTML!
</body>
</html>"

Date & Time

The filters in this section serve to convert between different time formats, such as:

  • Unix epoch: Marks a point in time by the number of seconds passed since January 1, 1970 00:00:00 (UTC). You can obtain the current time as Unix epoch via now.
  • ISO-8601 datetime string: Represents a date, a time, and a time zone as a string, such as "1970-01-01T00:00:00Z" (corresponding to Unix epoch 0).
  • “Broken down time” (BDT) array: Represents a date and a time as an array of the shape [year, month, day, hour, minute, second, weekday, yearday]. All components are integers, except for second, which may be a floating-point number. The month is counted from 0, the weekday is counted from Sunday (which is 0), and the yearday is the day in the year counted from 0. When a BDT array is used as input, only the first six components are considered.

You can convert between these representations via:

  • Unix epoch from/to ISO 8601: fromdate, todate
  • BDT to Unix epoch: mktime
  • Unix epoch to BDT: gmtime, localtime
  • Unix epoch or BDT from/to custom string: strptime, strftime, strflocaltime

As an example, let us consider the moment when the Hill Valley courthouse’s clock tower was struck by lightning, namely Saturday, November 12, 1955, at 10:04 p.m. PST. The corresponding date can be written in ISO 8601 as "1955-11-12T22:04:00-08:00". We can convert that to a Unix epoch and from there to a (UTC) BDT via:

"1955-11-12T22:04:00-08:00" | fromdate | gmtime  ⟼ 
[
  1955,
  10,
  13,
  6,
  4,
  0,
  0,
  316
]

We can infer that at this moment, it was November 13 in UTC (the BDT month is 10 and not 11, because BDT months are counted from 0), at 06:04:00. Furthermore, that day was a Sunday (because the weekday is 0), which was the 316th day of the year (where 0 is the first day).
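The numbers above can be double-checked with Python's datetime module. The helpers below are hypothetical Python counterparts of fromdate, gmtime, and mktime, sketched under two assumptions: the ISO 8601 input carries an explicit offset such as +00:00 (datetime.fromisoformat only accepts a trailing Z since Python 3.11), and leap seconds are ignored, as in Unix time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _seconds(delta):
    """Seconds as int when whole, else as float."""
    s = delta.total_seconds()
    return int(s) if s.is_integer() else s


def fromdate(iso):
    """ISO 8601 string with an explicit UTC offset -> Unix epoch."""
    return _seconds(datetime.fromisoformat(iso) - EPOCH)


def gmtime(epoch):
    """Unix epoch -> BDT array in UTC."""
    t = EPOCH + timedelta(seconds=epoch)
    second = t.second + t.microsecond / 1_000_000 if t.microsecond else t.second
    return [t.year, t.month - 1, t.day, t.hour, t.minute, second,
            t.isoweekday() % 7,         # weekday: Sunday is 0
            t.timetuple().tm_yday - 1]  # yearday: January 1 is 0


def mktime(bdt):
    """BDT array, assumed to be UTC -> Unix epoch; only 6 components are used."""
    year, month, day, hour, minute, second = bdt[:6]
    t = datetime(year, month + 1, day, hour, minute, tzinfo=timezone.utc)
    return _seconds(t + timedelta(seconds=second) - EPOCH)


print(gmtime(fromdate("1955-11-12T22:04:00-08:00")))  # [1955, 10, 13, 6, 4, 0, 0, 316]
print(fromdate("1970-04-14T03:08:00+00:00"))          # 8910480
print(mktime([1970, 0, 1, 0, 0, 0]))                  # 0
```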

jq does not allow time zone information in ISO 8601 datetime strings.

fromdate, todate, fromdateiso8601, todateiso8601

These filters convert between Unix time and ISO-8601 timestamps.

For example, the Apollo 13 accident happened at 03:08 UTC on April 14, 1970. Its corresponding Unix time is "1970-04-14T03:08:00Z" | fromdate ⟼ 8910480. We can get back the ISO-8601 timestamp via 8910480 | todate ⟼ "1970-04-14T03:08:00Z".

These filters can handle floating-point numbers, e.g. 0.123456 | todate ⟼ "1970-01-01T00:00:00.123456Z" and "1970-01-01T00:00:00.123456Z" | fromdate ⟼ 0.123456. In particular, fromdate yields a floating-point number if the time cannot be represented losslessly as an integer.

The filters fromdateiso8601 and todateiso8601 are synonyms of fromdate and todate, respectively.

strftime($fmt), strflocaltime($fmt)

The filters strftime($fmt) and strflocaltime($fmt) take as input either a number that is interpreted as Unix epoch, or a BDT array. The filters yield a string representation of the input time, using the format $fmt.

If the input is a Unix epoch, both strftime and strflocaltime interpret it as a UTC timestamp. If the input is a BDT array, then strftime interprets it as UTC and strflocaltime interprets it as user local time. In both cases, strftime outputs the time as UTC and strflocaltime outputs the time as user local time.

For example, if the user is in the CET zone (+0100):

  • 0 | strftime("%T %z (%Z)") ⟼ "00:00:00 +0000 (UTC)"
  • [1970, 0, 1, 0, 0, 0] | strftime("%T %z (%Z)") ⟼ "00:00:00 +0000 (UTC)"
  • 0 | strflocaltime("%T %z (%Z)") yields "01:00:00 +0100 (CET)"
  • [1970, 0, 1, 0, 0, 0] | strflocaltime("%T %z (%Z)") yields "00:00:00 +0100 (CET)"

jq prints GMT instead of UTC in the examples above; however, GMT is not the same as UTC.

strptime($fmt)

The filter strptime($fmt) takes a string and parses it using the format $fmt, yielding a BDT array. If no time zone is inferred from the input (e.g. via %Z), it is assumed to be UTC. For example:

  • "1970-01-01 00:00:00" | strptime("%F %T") ⟼ [1970, 0, 1, 0, 0, 0, 4, 0]
  • "1970-01-01 00:00:00 Europe/Vienna" | strptime("%F %T %Q") ⟼ [1970, 0, 1, 0, 0, 0, 4, 0]

gmtime, localtime

The filters gmtime and localtime take a Unix epoch as input and yield a corresponding BDT array, containing the time in UTC (gmtime) or in the user local time (localtime).

For example, if the user is in the CET zone (+0100):

  • 0 | gmtime ⟼ [1970, 0, 1, 0, 0, 0, 4, 0]
  • 0 | localtime yields [1970, 0, 1, 1, 0, 0, 4, 0]

mktime

The filter mktime takes a BDT array that is assumed to be in UTC, and yields the corresponding Unix epoch. For example, [1970, 0, 1, 0, 0, 0] | mktime ⟼ 0.

Regular expressions

All the filters in this section, such as test, take a string as input and fail if they receive any other type of value. Furthermore, they all take two string arguments, namely the regular expression $re and the $flags that determine how the regular expression is interpreted. Omitting $flags is equivalent to passing "" as $flags. For example, test($re) is equivalent to test($re; "").

The supported flags are:

  • g: global search
  • n: ignore empty matches
  • i: case-insensitive
  • m: multi-line mode: ^ and $ match begin/end of line
  • s: single-line mode: allow . to match \n
  • l: greedy
  • x: extended mode: ignore whitespace and allow line comments (starting with #)

jaq uses the regex-lite crate to compile and run regular expressions (regexes). See the crate documentation for a description of the supported regex syntax.

test

The filter test yields true if some part of the input matches the regular expression, else false. For example:

  • "jaq v3.0" | test("v[0-9]+\\.[0-9]+") ⟼ true
  • "jaq V3.0" | test("v[0-9]+\\.[0-9]+") ⟼ false
  • "jaq V3.0" | test("v[0-9]+\\.[0-9]+"; "i") ⟼ true

scan

The filter scan yields all parts of the input that match the regular expression. For example:

  • "v2.0, v3.0" | scan("v[0-9]+\\.[0-9]+" ) ⟼ "v2.0"
  • "v2.0, v3.0" | scan("v[0-9]+\\.[0-9]+"; "g") ⟼ "v2.0" "v3.0"
  • "V2.0" | scan("v[0-9]+\\.[0-9]+") ⟼ (no output)

match

The filter match yields an object for every part of the input that matches the regular expression, containing:

  • "offset": the character index of the start of the match
  • "length": the number of characters of the match
  • "string": the contents of the match
  • "captures": an array with an object for every capture group, containing:
    • "offset",
    • "length",
    • "string": as above, but for the capture group instead of the whole match
    • "name": the name of the capture group if it has one, else this key is omitted

Example:

"v2.0, v3.0" | match("v(?<maj>[0-9]+)\\.([0-9]+)"; "g")  ⟼ 
{
  "offset": 0,
  "length": 4,
  "string": "v2.0",
  "captures": [
    {
      "offset": 1,
      "length": 1,
      "string": "2",
      "name": "maj"
    },
    {
      "offset": 3,
      "length": 1,
      "string": "0"
    }
  ]
}
{
  "offset": 6,
  "length": 4,
  "string": "v3.0",
  "captures": [
    {
      "offset": 7,
      "length": 1,
      "string": "3",
      "name": "maj"
    },
    {
      "offset": 9,
      "length": 1,
      "string": "0"
    }
  ]
}

capture

The filter capture yields an object for every part of the input that matches the regular expression, containing for each named capture group an entry with the group name as key and its matched string as value.

Example:

"v2.0, v3.0" | capture("v(?<maj>[0-9]+)\\.(?<min>[0-9]+)"; "g")  ⟼ 
{
  "maj": "2",
  "min": "0"
}
{
  "maj": "3",
  "min": "0"
}
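The structure of these objects maps closely onto Python's re match objects. The following sketch builds them for the g flag only (other flags are not modelled); offsets count code points, as Python string indices do, and match, capture, and _matches are illustrative names. Named groups are written here in Python's (?P<name>...) syntax.

```python
import re


def _matches(regex, s, flags):
    """All matches with the g flag, else at most the first one."""
    if "g" in flags:
        return list(regex.finditer(s))
    first = regex.search(s)
    return [first] if first else []


def match(s, pattern, flags=""):
    """Yield jq-style match objects."""
    regex = re.compile(pattern)
    names = {index: name for name, index in regex.groupindex.items()}
    for m in _matches(regex, s, flags):
        captures = []
        for i in range(1, regex.groups + 1):
            capture = {"offset": m.start(i), "length": m.end(i) - m.start(i), "string": m.group(i)}
            if i in names:
                capture["name"] = names[i]
            captures.append(capture)
        yield {"offset": m.start(), "length": m.end() - m.start(),
               "string": m.group(), "captures": captures}


def capture(s, pattern, flags=""):
    """Yield one dict per match, mapping each named group to its matched string."""
    regex = re.compile(pattern)
    for m in _matches(regex, s, flags):
        yield m.groupdict()


print([m["string"] for m in match("v2.0, v3.0", r"v(?P<maj>[0-9]+)\.([0-9]+)", "g")])
# ['v2.0', 'v3.0']
print(list(capture("v2.0, v3.0", r"v(?P<maj>[0-9]+)\.(?P<min>[0-9]+)", "g")))
# [{'maj': '2', 'min': '0'}, {'maj': '3', 'min': '0'}]
```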

split, splits

The filter split($re; $flags) yields an array of those parts of the input string that do not match the regular expression $re. For example:

  • "Here be\tspaces" | split("\\s" ; "") ⟼ ["Here", "be", "spaces"]
  • " More\n\n" | split("\\s+"; "") ⟼ ["", "More", ""]
  • "" | split("\\s" ; "") ⟼ [""]

Note that split($re; $flags) is equivalent to split($re; "g" + $flags), meaning that the string is split not only by the first match, but by all matches. Furthermore, unlike all other filters in this section, split($s) is not equivalent to split($s; ""), because split($s) splits a string by a separator that is not interpreted as regular expression; see split.

The filter splits($re; $flags) yields the elements of the array yielded by split($re; $flags). For example, "Here be\tspaces" | splits("\\s") ⟼ "Here" "be" "spaces". The filter splits($re) is equivalent to splits($re; "").

sub, gsub

The filter sub($re; f; $flags) replaces the first part of the input string that matches $re by the output of f; if $flags contains g, it replaces all matching parts. Here, f receives an object as returned by capture; that is, for every named capture group, it contains its name as key and its matched string as value.

For example:

"Mr. 高橋 & Mrs. 嵯峨" | sub("(?<title>(Mr|Ms|Mrs)\\.) (?<name>\\S+)"; "\(.name) (\(.title))"; "g")  ⟼ 
"高橋 (Mr.) & 嵯峨 (Mrs.)"

When the filter f yields multiple outputs, then all possible combinations are yielded. For example:

"Thanks, fine." | sub("(?<word>\\w+)"; .word, (.word | ascii_upcase); "g")  ⟼ 
"Thanks, fine."
"Thanks, FINE."
"THANKS, fine."
"THANKS, FINE."

We have the following short forms:

  • The filter gsub($re; f; $flags) is equivalent to sub($re; f; "g" + $flags).
  • The filter gsub($re; f) is equivalent to gsub($re; f; "").
  • The filter sub($re; f) is equivalent to sub($re; f; "").
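The combinations in the "Thanks, fine." example behave like a Cartesian product over the outputs of f for each match. A Python sketch under that reading (sub and word_variants are illustrative names, and f returns a list that stands for the stream of outputs of the jq filter) reproduces the order of the outputs shown there.

```python
import itertools
import re


def sub(s, pattern, f, flags=""):
    """Yield one output per combination of replacements.

    f receives the dict of named groups of a match and returns the list of
    its outputs; with the g flag every match is replaced, else only the first.
    """
    regex = re.compile(pattern)
    if "g" in flags:
        found = list(regex.finditer(s))
    else:
        first = regex.search(s)
        found = [first] if first else []
    options = [f(m.groupdict()) for m in found]
    # itertools.product varies the replacement of the last match fastest.
    for choice in itertools.product(*options):
        pieces, last = [], 0
        for m, replacement in zip(found, choice):
            pieces += [s[last:m.start()], replacement]
            last = m.end()
        pieces.append(s[last:])
        yield "".join(pieces)


def word_variants(groups):
    return [groups["word"], groups["word"].upper()]


for out in sub("Thanks, fine.", r"(?P<word>\w+)", word_variants, "g"):
    print(out)
# Thanks, fine.
# Thanks, FINE.
# THANKS, fine.
# THANKS, FINE.
```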

I/O

This section contains filters that interact with the system. These filters may yield different outputs when given equal inputs.

input, inputs

The filter inputs yields all the inputs in the current input file. For example, jaq -n '[inputs]' <<< '1 2 3' yields [1, 2, 3]. This can be useful to fold over large (potentially infinite) amounts of values; for example, to create a cumulative sum over all input integers, you can use jaq -n 'foreach inputs as $x (0; .+$x)'.

The filter input yields the next input in the current input file.

When there is no more input value left, in jq, input yields an error, whereas in jaq, it yields no output value. That is, in jaq, input is equivalent to first(inputs).

Both input and inputs have a side effect, i.e. they advance the input stream. That means that unlike most jq filters, inputs is not referentially transparent. It is advised to use it sparingly and with caution, lest you are devoured by the evil dragons of evaluation order.

debug, debug(f)

The filter debug(f) prints a debug message for every output of f to the standard error stream (stderr), then yields its input. For example, the filter 0 | debug(1, 2) yields the following output on the command line, where the first two lines go to stderr and the last one to the standard output:

["DEBUG:",1]
["DEBUG:",2]
0

The filter debug is equivalent to debug(.).

stderr

The filter stderr prints its input to the standard error stream in raw and compact mode without newline. It then yields its input.

halt, halt($exit_code)

The filter halt($exit_code) terminates jaq with the given exit code.

The filter halt terminates jaq with exit code 0.

jq does not implement halt($exit_code), only halt.

halt_error, halt_error($exit_code)

The filter halt_error($exit_code) prints its input via stderr. It then quits the jaq process with the given exit code.

# jaq -n '"Hi!\n" | halt_error(42)'
Hi!
# echo $?
42

The filter halt_error is equivalent to halt_error(5).

jq prints a newline after the input if it is not a string.

now

The filter now yields the current time as a Unix epoch, represented as a floating-point number.

$ENV, env

The variable $ENV holds an object that contains an entry for every environment variable, where the key is the name of the variable and the value is its value. For example, {"EDITOR": "vim", "SHELL": "/usr/bin/bash"}.

The filter env is equivalent to $ENV.

Unsupported

This section lists filters present in jq, but not in jaq.

jaq supports none of jq’s SQL-style operators, mostly for aesthetic reasons (uppercase names) and because jq is not SQL:

  • INDEX
  • JOIN
  • IN

jaq does not support jq’s --stream option; therefore, it also does not implement the related filters:

  • truncate_stream
  • fromstream
  • tostream

Advanced features

Assignments

jq allows for assignments of the form p |= f, where p is an arbitrary filter. This makes assignments in jq uniquely powerful compared to other languages. For example, a program from the jq manual that blew my mind was the following:

(.posts[] | select(.author == "stedolan") | .comments) += ["terrible."]

This iterates over all posts, selects those whose author is “stedolan”, takes their comments, and adds a not very flattering comment to them. (This does not reflect my opinion about Stephen Dolan; I think that he did a great job creating jq.)

jaq and jq pursue different approaches to execute assignments:

  • Path-based: In jq, an assignment p |= f constructs paths to all values that match p and applies the filter f to these values.
  • Pathless: In jaq, an assignment p |= f is transformed to a different filter that does not construct any paths.

For example, consider the update [1, 2, 3] | .[] |= .+1 ⟼ [2, 3, 4]: When jq executes this, it calculates [1, 2, 3] | path(.[]) ⟼ [0] [1] [2] and applies .+1 on each value at these paths. On the other hand, jaq transforms the update to [1, 2, 3] | [.[] | .+1] ⟼ [2, 3, 4] — the assignment does not involve any path construction.
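The difference between the two models can be sketched in Python for the special case p = .[] on an array. This is a simplified model, not how either tool is implemented. The hypothetical path_based_update receives the indices that path(.[]) would produce and keeps only the first output of u per path. The hypothetical pathless_iter_update follows jaq's reduction to [.[] | u]. Filters are modelled as Python functions that return a list of outputs.

```python
def path_based_update(value, indices, u):
    """jq-style `.[] |= u` on an array (simplified model).

    `indices` are the paths that path(.[]) would produce; each element is
    replaced by the first output of u (assumes u yields at least one output).
    """
    result = list(value)
    for i in indices:
        result[i] = next(iter(u(result[i])))
    return result


def pathless_iter_update(value, u):
    """jaq-style `.[] |= u` on an array, i.e. [.[] | u]: all outputs are kept."""
    return [out for x in value for out in u(x)]


def inc(x):          # models the jq filter .+1
    return [x + 1]


def inc_or_keep(x):  # models the jq filter (.+1, .)
    return [x + 1, x]


print(path_based_update([1, 2, 3], [0, 1, 2], inc_or_keep))  # [2, 3, 4]
print(pathless_iter_update([1, 2, 3], inc_or_keep))          # [2, 1, 3, 2, 4, 3]
```

With u = .+1, both functions agree. With u = (.+1, .), only the pathless version keeps every output of u.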

Fortunately, as in the example above, the results of both approaches are the same in most cases. The following sections explain the two approaches in more detail, and how to write updates that behave the same in both jq and jaq.

Path-based

The path-based update model used by jq executes p |= u by first collecting the paths corresponding to p, then updating the input at these paths by u. We can approximate this behaviour by getpath(path(p)) |= u; the actual jq update behaviour is much more complex, only sparsely documented, and has changed in backwards-incompatible ways between minor versions.

We have a few equivalences for path(f):

p                          path(p)
.                          []
.[]                        keys_unsorted[] | [.]
.[$i]                      [$i]
.[$i:$j]                   [{start: $i, end: $j}]
f, g                       path(f), path(g)
f | g                      path(f) as $p | $p + (getpath($p) | path(g))
f as $x | g                f as $x | path(g)
if $p then f else g end    if $p then path(f) else path(g) end

Examples:

  • true | path(.) ⟼ []
  • [1, 2, 3] | path(.[]) ⟼ [0] [1] [2]
  • [1, 2, 3] | keys_unsorted[] | [.] ⟼ [0] [1] [2]
  • {a: 1, b: 2} | path(.[]) ⟼ ["a"] ["b"]
  • {a: 1, b: 2} | keys_unsorted[] | [.] ⟼ ["a"] ["b"]
  • [1, 2, 3] | path(.[0]) ⟼ [0]
  • [1, 2, 3] | path(.[1:-1]) ⟼ [{"start": 1, "end": -1}]
  • {a: 1, b: 2} | path(.a, .b) ⟼ ["a"] ["b"]
  • {a: 1, b: 2} | path(.a), path(.b) ⟼ ["a"] ["b"]
  • [[1], [2]] | path(.[][]) ⟼ [0, 0] [1, 0]
  • [[1], [2]] | path(.[]) as $p | $p + (getpath($p) | path(.[])) ⟼ [0, 0] [1, 0]
  • [1, 2, 3] | path(0, 2 as $x | .[$x]) ⟼ [0] [2]
  • [1, 2, 3] | 0, 2 as $x | path(.[$x]) ⟼ [0] [2]
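The equivalences in the table above can be turned into a tiny evaluator for path(f). This Python sketch is a hypothetical model. It covers only ., .[], .[$i], f, g and f | g, with no slices, variables or conditionals. Filters are represented as tuples, for example ("pipe", ("iter",), ("iter",)) for .[][]. Python's dictionary iteration order stands in for keys_unsorted.

```python
def getpath(v, path):
    """Follow `path` into v; missing entries yield None (null), like getpath."""
    for k in path:
        if v is None:
            return None
        if isinstance(v, dict):
            v = v.get(k)
        else:
            v = v[k] if -len(v) <= k < len(v) else None
    return v


def paths(f, v):
    """Yield the outputs of path(f) on input v, following the table above."""
    tag = f[0]
    if tag == "id":                      # path(.) = []
        yield []
    elif tag == "iter":                  # path(.[]) = keys_unsorted[] | [.]
        keys = range(len(v)) if isinstance(v, list) else list(v)
        for k in keys:
            yield [k]
    elif tag == "index":                 # path(.[$i]) = [$i]
        yield [f[1]]
    elif tag == "comma":                 # path(f, g) = path(f), path(g)
        yield from paths(f[1], v)
        yield from paths(f[2], v)
    elif tag == "pipe":                  # path(f | g) = $p + (getpath($p) | path(g))
        for p in paths(f[1], v):
            for q in paths(f[2], getpath(v, p)):
                yield p + q
    else:
        raise ValueError(f"unsupported filter: {f!r}")
```

For example, list(paths(("pipe", ("iter",), ("iter",)), [[1], [2]])) yields [[0, 0], [1, 0]], as in the example list above.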

The filters reduce / foreach are both defined in jaq in terms of simpler filters f that path(f) can evaluate. Therefore, in jaq, you can use reduce / foreach inside path(f), as well as on the left-hand side of updates. jq does not support this.

Pathless

The pathless update model that is used by jaq reduces updates p |= u to simpler expressions, depending on p. It yields the same results as path-based updates in most common cases, while having the following advantages:

  • It does not need to construct paths, resulting in higher performance.
  • It considers multiple outputs by u where possible, whereas path-based updates consider at most one output. For example, 0 | (., .) |= (., .+1) ⟼ 0 1 1 2 in jaq, whereas it yields only 0 in jq. However, {a: 1} | .a |= (2, 3) ⟼ {"a": 2} in both jaq and jq, because an object can only associate a single value with any given key, so we cannot use multiple outputs in a meaningful way here.
  • It avoids iterator invalidation problems that path-based updates are prone to.

However, pathless updates do not support a few filters on the left-hand side of updates that path-based updates support. For example, the following filters all yield an error in jaq:

  • [1, 2, 3] | try .[] -= 1 yields [0, 1, 2] in jq.
  • [1, 2, 3] | first(.[]) -= 1 yields [0, 2, 3] in jq.
  • [1, 2, 3] | limit(2; .[]) -= 1 yields [0, 1, 3] in jq.
  • [1, 2, 3] | skip(1; .[]) -= 1 yields [1, 1, 2] in jq.
  • [1, 2, 3] | last(.[]) -= 1 yields an error in jq.

In such cases, you can fall back to path-based updates in jaq by writing getpath(path(p)) |= u instead of p |= u. For example, the following filters yield the same outputs in jaq and jq:

  • [1, 2, 3] | getpath(path(try .[])) -= 1 ⟼ [0, 1, 2]
  • [1, 2, 3] | getpath(path(first(.[]))) -= 1 ⟼ [0, 2, 3]
  • [1, 2, 3] | getpath(path(limit(2; .[]))) -= 1 ⟼ [0, 1, 3]
  • [1, 2, 3] | getpath(path(skip(1; .[]))) -= 1 ⟼ [1, 1, 2]
  • [1, 2, 3] | getpath(path(last(.[]))) -= 1 ⟼ [1, 2, 2] (this example yields an error in jq, whereas it works in jaq)

The following table shows how jaq executes an update p |= u. In this table, the case for f as $x | g assumes that f yields the outputs f1, …, fn.

p                          p |= u
.                          u
..                         def rec_up: (.[]? | rec_up), .; rec_up |= u
(f | g)                    f |= (g |= u)
(f, g)                     f |= u | g |= u
f as $x | g                (f1 as $x | g) |= u | ... | (fn as $x | g) |= u
f // g                     if first(f // false) then f |= u else g |= u end
if $p then f else g end    if $p then f |= u else g |= u end
.[]                        iter_upd(u; error)
.[$i]                      index_upd($i; u; error)
.[$i:$j]                   slice_upd($i; $j; u; error)
.[]?                       iter_upd(u; .)
.[$i]?                     index_upd($i; u; .)
.[$i:$j]?                  slice_upd($i; $j; u; .)

It follows from the table that empty |= f is equivalent to . (identity). We now give definitions for iter_upd, index_upd, and slice_upd:

# .[] |= u
def iter_upd(u; fail):
    if isarray  then [.[] | u]
  elif isobject then with_entries(.value |= u)
  else fail end;
all(
([1, 2, 3]    | iter_upd(.+1             ; .) == [2, 3, 4]),
([1, 2, 3]    | iter_upd(.+1,.           ; .) == [2, 1, 3, 2, 4, 3]),
([1, 2, 3]    | iter_upd(select(.%2 == 1); .) == [1,    3]),
({a: 1, b: 2} | iter_upd(.+1             ; .) == {"a": 2, "b": 3}),
({a: 1, b: 2} | iter_upd(.+1,.           ; .) == {"a": 2, "b": 3})
; .)  ⟼  true
# .[$i] |= u
def index_upd($i; u; fail):
    if isarray then
        if 0 <= $i and $i < length then .[:$i] + [.[$i] | first(u)] + .[$i+1:]
      elif -length <= $i and $i < 0 then index_upd(length + $i; u; fail)
      else fail end
  elif isobject then
        if has($i) then with_entries(if .key == $i then {key, value: first(.value | u)} end)
      else . + ([{key: $i, value: first(null | u)}] | from_entries) end
  else fail end;
all(
([1, 2, 3]    | index_upd( 0 ; .+1  ; .) == [2, 2, 3]),
([1, 2, 3]    | index_upd( 0 ; .+1,.; .) == [2, 2, 3]),
([1, 2, 3]    | index_upd( 0 ; empty; .) == [   2, 3]),
([1, 2, 3]    | index_upd(-1 ; .+1  ; .) == [1, 2, 4]),
([1, 2, 3]    | index_upd(-3 ; .+1  ; .) == [2, 2, 3]),
({a: 1, b: 2} | index_upd("a"; .+1  ; .) == {"a": 2, "b": 2}),
({a: 1, b: 2} | index_upd("a"; .+1,.; .) == {"a": 2, "b": 2}),
({a: 1, b: 2} | index_upd("a"; empty; .) == {        "b": 2})
; .)  ⟼  true
# .[$i:$j] |= u
def slice_upd($i; $j; u; fail):
  first(.[:$i] + (.[$i:$j] | u) + .[$j:]) // .[:$i] + .[$j:];
all(
([1, 2, 3, 4] | slice_upd(1; -1; map(.+1)    ; .) == [1, 3, 4, 4]),
([1, 2, 3, 4] | slice_upd(1; -1; map(.+1),.  ; .) == [1, 3, 4, 4]),
([1, 2, 3, 4] | slice_upd(1; -1; empty       ; .) == [1,       4]),
("abcd"       | slice_upd(1; -1; ascii_upcase; .) == "aBCd"      )
; .)  ⟼  true
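The reduction table and the helper definitions above can be modelled in Python for experimentation. This sketch is hypothetical and covers only ., (f, g), (f | g), .[] and .[$i]. Here, u is a Python function that returns an iterable of outputs. For objects, iter_upd assumes that u yields at least one output per value and keeps the first one, which sidesteps the corner case of u yielding nothing.

```python
from itertools import islice


def first_output(outputs):
    """[x] for the first output x, or [] if there is none (like jq's first/1)."""
    return list(islice(outputs, 1))


def iter_upd(u, v):
    if isinstance(v, list):                       # [.[] | u]
        return [out for x in v for out in u(x)]
    if isinstance(v, dict):                       # with_entries(.value |= u)
        # assumption: u yields at least one output per value; keep the first
        return {k: first_output(u(x))[0] for k, x in v.items()}
    raise TypeError(f"cannot iterate over {v!r}")


def index_upd(i, u, v):
    if isinstance(v, list):
        n = len(v)
        if -n <= i < 0:                           # negative index counts from the end
            i += n
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of bounds")
        return v[:i] + first_output(u(v[i])) + v[i + 1:]
    if isinstance(v, dict):
        new = first_output(u(v.get(i)))           # [] means: remove the key
        result = {}
        for k, x in v.items():
            if k != i:
                result[k] = x
            elif new:
                result[k] = new[0]
        if i not in v and new:                    # missing key: append it
            result[i] = new[0]
        return result
    raise TypeError(f"cannot index {v!r}")


def update(p, u, v):
    """Yield the outputs of `v | p |= u` for a subset of the table above."""
    tag = p[0]
    if tag == "id":                               # . |= u  ~>  u
        yield from u(v)
    elif tag == "comma":                          # (f, g) |= u  ~>  f |= u | g |= u
        for x in update(p[1], u, v):
            yield from update(p[2], u, x)
    elif tag == "pipe":                           # (f | g) |= u  ~>  f |= (g |= u)
        yield from update(p[1], lambda x: update(p[2], u, x), v)
    elif tag == "iter":                           # .[] |= u  ~>  iter_upd(u; error)
        yield iter_upd(u, v)
    elif tag == "index":                          # .[$i] |= u  ~>  index_upd($i; u; error)
        yield index_upd(p[1], u, v)
    else:
        raise ValueError(f"unsupported path expression: {p!r}")
```

For example, list(update(("comma", ("id",), ("id",)), lambda x: [x, x + 1], 0)) models 0 | (., .) |= (., .+1) and yields [0, 1, 1, 2].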

In jq, . |= empty yields null for any input, whereas jaq yields no output. Similarly, in jq, . |= (., .) yields its input once, whereas jaq yields its input twice.

In jq, [0, 1] | .[3] = 3 yields [0, 1, null, 3]; that is, jq pads the array with nulls if we update beyond its length. In contrast, jaq fails with an out-of-bounds error in such a case.

In jq, null | .a = 1 yields {"a": 1} and null | .[0] = 1 yields [1], meaning that jq treats null as an empty object or array when updating it with a string or integer index. Because jaq supports non-string object keys, this would be ambiguous: it is not clear whether null | .[0] = 1 should yield {0: 1} or [1]. For that reason, jaq yields an error on updates of null with any kind of index.

Patterns

The filter f as $x | g binds the outputs of f to a variable $x. In place of $x, we can use a pattern to destructure each output of f into multiple variables.

Consider the following filter:

[ 1, {a:  2}] |
.[0]   as $x  |
.[1].a as $y  |
$x, $y  ⟼ 
 1   2

We can write this more compactly using a pattern as follows:

[ 1, {a:  2}] as
[$x, {a: $y}] |
 $x,     $y  ⟼ 
  1       2

Here, [$x, {a: $y}] is a pattern that is used to match the value [1, {a: 2}]. It binds 1 to $x and 2 to $y.

As in object construction, {$x} is equivalent to {x: $x} in object patterns. For example, we could have written the previous example equivalently as [1, {a: 2}] as [$x, {$a}] | $x, $a ⟼ 1 2.

When a part of a pattern does not exist in its input, the corresponding variables are bound to null:

  • [1, {b: 2}] as [$x, {$a}] | $x, $a ⟼ 1 null
  • [1 ] as [$x, $y] | $x, $y ⟼ 1 null
  • [1, {a: 2}] as [$x, [$y]] | $x, $y ⟼ 1 null

If the types of a pattern and its input do not match, an error is thrown:

  • try ([1] as {$a} | $a) catch "fail" ⟼ "fail"
  • try ( 1 as [$x] | $x) catch "fail" ⟼ "fail"

Patterns do not have to match their whole input:

  • [ 1, 2] as [$x] | $x ⟼ 1
  • {a: 1, b: 2} as {$a} | $a ⟼ 1

Patterns can be arbitrarily nested:

  • {a: [1, {b: [2]}]} as {a: [$x, {b: [$y]}]} | $x, $y ⟼ 1 2
  • [[[1]]] as [[[$x]]] | $x ⟼ 1
  • {a: {b: {c: 1}}} as {a: {b: {c: $x}}} | $x ⟼ 1

We can use any filter (f) as an object key in a pattern:

{a: 1, b: 2, c: 3, d: 4} as
{("a",  "b"): $x, ("c", "d"): $y} |
[$x, $y]  ⟼ 
[1, 3] [1, 4] [2, 3] [2, 4]

We can also use patterns in reduce and foreach:

[{"a": 1, "b": 2}, {"a": 3, "b": 4}] |
foreach .[] as {("a", "b"): $x} ([]; . + [$x])  ⟼ 
[1]
[1,2]
[1,2,3]
[1,2,3,4]

jaq does not support jq’s destructuring alternative operator ?//.

A pattern p is either:

  • a variable $x,
  • an array pattern [p1, ..., pn] containing n patterns, or
  • an object pattern {e1, ..., en} containing n object entries. An object entry e is either:
    • a variable $x or
    • a key-value pair (k): p (where k is a filter and p is a pattern).

An array pattern [p1, ..., pn] is equivalent to an object pattern {(0): p1, ..., (n-1): pn}. Because of this, you can use object patterns with integer keys to destructure arrays, or array patterns to destructure objects with integer keys. Furthermore, you can also destructure byte strings:

  • [1, 2, 3] as {(0): $x, (2): $y} | $x, $y ⟼ 1 3
  • {(0): 1, (2): 3} as [$x, $_, $y] | $x, $y ⟼ 1 3
  • [1, 2, 3] | tobytes as [$x, $y] | $x, $y ⟼ 1 2

When using a filter (f) as an object key in a pattern, f is run with the input that was matched by its parent object pattern, not by the whole pattern. For example, [{"k": "a", "a": 1}] as [{(.k): $x}] | $x ⟼ 1.

This is equivalent to:

[{"k": "a", "a": 1}] |
.[0]          as $p0 |
$p0[$p0 | .k] as $x  |
$x
 ⟼  1

Here, we can see that (.k) is run with the input $p0, which is the value that the parent object pattern of (.k), namely {(.k): $x}, is trying to match. Compare this with the following wrong transformation, where (.k) would be run with the input matched by the whole pattern:

[{"k": "a", "a": 1}] |
try (
.[0]     as $p0 |
$p0[.k]  as $x  | # fails here because .k is run with whole input
$x
) catch "fail"  ⟼  "fail"
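The matching rules in this section can be summarised in a Python sketch. It is a model under assumptions, not jaq's implementation. Patterns are hypothetical tuples: ("var", name), ("arr", [...]) and ("obj", [(key, pattern), ...]). A key is either a constant or a Python function that receives the input of the parent object pattern and returns the keys to look up. Missing entries become None (null), and indexing a value of the wrong type raises TypeError. Byte strings are not modelled.

```python
def index(v, k):
    """Indexing as used by patterns: missing entries give None (null)."""
    if v is None:
        return None
    if isinstance(v, dict):
        return v.get(k)                  # any hashable key, like jaq's non-string keys
    if isinstance(v, list) and isinstance(k, int):
        return v[k] if 0 <= k < len(v) else None
    raise TypeError(f"cannot index {type(v).__name__} with {k!r}")


def match(pat, v):
    """Yield one dict of variable bindings per way `pat` matches `v`."""
    kind = pat[0]
    if kind == "var":
        yield {pat[1]: v}
    elif kind == "arr":
        # [p1, ..., pn] behaves like {(0): p1, ..., (n-1): pn}
        yield from match_entries(list(enumerate(pat[1])), v)
    elif kind == "obj":
        yield from match_entries(pat[1], v)
    else:
        raise ValueError(f"unknown pattern: {pat!r}")


def match_entries(entries, v):
    """Match object-pattern entries left to right, combining their bindings."""
    if not entries:
        yield {}
        return
    (key, sub), rest = entries[0], entries[1:]
    # A filter key is run on v, the input of its *parent* object pattern.
    keys = key(v) if callable(key) else [key]
    for k in keys:
        for left in match(sub, index(v, k)):
            for right in match_entries(rest, v):
                yield {**left, **right}
```

For example, list(match(("arr", [("obj", [(lambda v: [v["k"]], ("var", "x"))])]), [{"k": "a", "a": 1}])) yields [{"x": 1}]. This mirrors the (.k) example above: the key function is applied to {"k": "a", "a": 1}, not to the whole input.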

Modules

jq allows dividing programs into multiple files that are called modules.

At the beginning of any jq module, there is a module header that consists of a (potentially empty) sequence of instructions listed in this section. The module header is then followed by a sequence of definitions. Finally, the main module that is called from the command-line interface (via --from-file or inline) must contain a single filter at the end, which is the filter that is executed.

All include/import instructions search for files as explained in the search paths section.

Module metadata

The instruction module meta; sets the metadata of the current module to the output of meta, where meta is a filter. This instruction may occur only at the beginning of the module header and only once.

For example, module "My module"; 1 ⟼ 1.

jaq ignores this instruction, whereas jq uses it to provide the output of meta via the modulemeta filter.

Module inclusion

The instructions include "mod"; and include "mod" meta; make all filters defined in the module mod.jq accessible in the current module.

For example, if foo.jq in the current working directory contains def bar: 1;, then jaq -L . -n 'include "foo"; bar' yields 1.

Module import

The instructions import "mod" as name; and import "mod" as name meta; make all definitions in the module mod.jq accessible in the current module with the prefix name::.

For example, if foo.jq in the current working directory contains def bar: 1;, then jaq -L . -n 'import "foo" as myfoo; myfoo::bar' yields 1.

Data import

The instructions import "data" as name; and import "data" as name meta; load all JSON values in data.json into an array, bind it to the variable $name, and make it accessible in the current module.

For example, if foo.json in the current working directory contains 1 2 3, then jaq -L . -n 'import "foo.json" as $myfoo; $myfoo' yields [1, 2, 3].
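The data import above reads a stream of JSON values into one array. The following Python sketch models that step. It assumes plain whitespace-separated JSON, and supports none of the XJON extensions beyond what Python's json module happens to accept. It relies on json.JSONDecoder.raw_decode. The name load_json_stream is hypothetical.

```python
import json


def load_json_stream(text):
    """Parse whitespace-separated JSON values (e.g. "1 2 3") into a list."""
    decoder = json.JSONDecoder()
    values, i = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while i < len(text) and text[i].isspace():
            i += 1
        if i == len(text):
            return values
        value, i = decoder.raw_decode(text, i)
        values.append(value)
```

For the file contents 1 2 3 from the example, load_json_stream returns [1, 2, 3].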

Search paths

An include/import instruction searches for its given file in the following directories, in the following order:

  1. The global search paths given via --library-path. They are interpreted relative to the current working directory.
  2. The local search paths given via metadata: When an include/import instruction has meta of the shape {..., search: ..., ...}, then the value at the key "search" sets the local search paths for that instruction to:
    • If the value is a string: Just that string.
    • If the value is an array: All strings in the array.
    • Otherwise: Nothing.
    If the module containing the include/import instruction is a file, then these paths are interpreted relative to the parent directory of that module. Otherwise, these paths are interpreted relative to the current working directory. (That is the case if the module containing the include/import instruction is given inline on the command-line.)

Every global and local search path is substituted as follows:

  • If it starts with ~, then ~ is substituted with the user’s home directory, given by the environment variable HOME on Linux and USERPROFILE on Windows.
  • If it starts with $ORIGIN, then $ORIGIN is substituted by the directory in which the jaq executable resides.

For example, jaq -L ~/foo -L bar 'include "decode" {search: ["baz", "$ORIGIN/quux"]}; 1' searches for the file decode.jq at the following paths in the given order:

  1. ~/foo/decode.jq (where ~ is substituted by the user’s home directory)
  2. ./bar/decode.jq
  3. ./baz/decode.jq
  4. $ORIGIN/quux/decode.jq (where $ORIGIN is substituted by the parent directory of the jaq executable)

The first path that corresponds to an existing file is taken.

Now, suppose that decode.jq contains an instruction import "binary" as $binary {search: "."}. This searches for binary.json at the following paths:

  1. ~/foo/binary.json
  2. ./bar/binary.json (relative to the current working directory)
  3. ./binary.json (relative to the parent directory of decode.jq)

If a file to load has been given without extension, such as decode and binary above, then jaq adds an extension (.jq for modules or .json for data). jq, on the other hand, adds an extension unconditionally, even if the file name already has one.

jaq’s behaviour is motivated by allowing instructions like import "binary.cbor" as $binary in the future. Here, unconditionally adding the .json extension would be counterproductive.
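The lookup order can be modelled in Python. This is a hypothetical sketch for POSIX-style paths. It makes the following assumptions:

  • home and origin are passed in explicitly, instead of being read from HOME/USERPROFILE and from the location of the executable.
  • module_dir is None for a module given inline on the command line.
  • No file-system access happens: the function only lists the candidates, and jaq would take the first existing one.

```python
import posixpath


def candidate_paths(name, kind, global_paths, local_paths,
                    module_dir, cwd, home, origin):
    """List the files an include/import would try, in order.

    kind is "module" (.jq) or "data" (.json); module_dir is None when the
    importing module was given inline on the command line.
    """
    # jaq adds an extension only if the name does not already have one.
    if not posixpath.splitext(name)[1]:
        name += ".jq" if kind == "module" else ".json"

    def expand(path):
        if path.startswith("~"):
            return home + path[1:]
        if path.startswith("$ORIGIN"):
            return origin + path[len("$ORIGIN"):]
        return path

    local_base = cwd if module_dir is None else module_dir
    candidates = [posixpath.join(cwd, expand(p), name) for p in global_paths]
    candidates += [posixpath.join(local_base, expand(p), name) for p in local_paths]
    # The first candidate that exists on disk would be used.
    return [posixpath.normpath(c) for c in candidates]
```

With cwd = "/work", home = "/home/me" and origin = "/opt/jaq", the decode example yields /home/me/foo/decode.jq, /work/bar/decode.jq, /work/baz/decode.jq and /opt/jaq/quux/decode.jq, in that order.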

Comments

A comment starts with # and ends with the first newline that is not preceded by an odd number of backslashes (\). For example:

[
  1,
  # comment
  2,
  # comment \\
  3,
  # comment \
    comment
  4,
  # comment \\\
    comment \
    comment
  5
]

This is equivalent to [1, 2, 3, 4, 5].
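The termination rule can be captured in a few lines of Python. The following sketch, with the hypothetical name strip_comments, removes comments while keeping the newline that ends each one. It assumes that # never occurs inside a string literal, which a real lexer would have to handle.

```python
def strip_comments(src):
    """Remove jq comments from src, keeping the newline that ends each one.

    A comment runs from # to the first newline preceded by an even number
    (possibly zero) of backslashes. Assumes # never appears inside a string.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        if src[i] != "#":
            out.append(src[i])
            i += 1
            continue
        j = i + 1
        while j < n:
            if src[j] == "\n":
                k = j
                while src[k - 1] == "\\":   # count backslashes before the newline
                    k -= 1
                if (j - k) % 2 == 0:
                    break                    # even count: the comment ends here
            j += 1
        i = j                                # resume at the newline (or the end)
    return "".join(out)
```

Applied to the example above, strip_comments leaves [1, 2, 3, 4, 5], up to whitespace.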

Formats

jaq supports reading and writing several data formats. This section describes these data formats.

You can read and write data in these formats using either the command-line options --from and --to, or the corresponding filters (such as fromyaml and toyaml).

The command-line options --from and --to always perform at least as well as the corresponding filters.

JSON

JSON (JavaScript Object Notation) is specified in RFC 8259.

jaq can read all valid JSON values; however, like jq, it also accepts certain values that are invalid JSON. This set of values is documented in the XJON section.

XJON

The native data format of jaq is a superset of JSON called XJON (eXtended JavaScript Object Notation, pronounced like “action”).

XJON extends JSON with the following constructs:

  • Line comments: everything from # to the end of the line (# ... \n) is interpreted as a comment
  • Special floating-point numbers: NaN, Infinity, -Infinity
  • Numbers starting with +: Every number that may be prefixed with - (minus) may also be prefixed with + (plus), e.g. +7, +Infinity.
  • UTF-8 strings with invalid code units: The JSON standard is slightly ambiguous whether strings may contain invalid UTF-8 code units. XJON explicitly allows for invalid code units in UTF-8 strings, e.g. the output of printf '"\xFF"'. This increases compatibility with tools that output such strings (e.g. file names). Furthermore, it allows for constant-time loading of strings via --rawfile, where jq takes linear time due to UTF-8 validation.
  • Byte strings: A byte string is created via b"...", where ... is a sequence of:
    • bytes in the range 0x20 to 0xFF (inclusive), excluding the ASCII characters '"' and '\', or
    • an escape sequence, starting with a backslash ('\') and followed by b, f, n, r, t, '"', '\', or xHH, where HH is a two-digit hexadecimal number.
    For example: b"Here comes \xFF, dadadada\nHere comes \xFF\nAnd I say: \"It's alright\"\x00". Byte strings of this shape can also be found in other languages, like Rust & Python (with leading b) and JavaScript & C (without leading b).
  • Objects with non-string keys: Where JSON limits object keys to strings, XJON allows arbitrary values as object keys. For example: {null: 0, true: 1, 2: 3, "str": 4, ["arr"]: 5, {}: 6}
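The byte-string grammar above translates directly into a small decoder. This Python sketch (parse_byte_string is a hypothetical name) accepts exactly the raw bytes and escape sequences listed. It assumes that \b, \f, \n, \r and \t denote the same control bytes as in JSON, and that \x takes exactly two hexadecimal digits.

```python
import string

# Escapes allowed after a backslash (besides \xHH), assumed to match JSON.
ESCAPES = {ord("b"): 0x08, ord("f"): 0x0C, ord("n"): 0x0A, ord("r"): 0x0D,
           ord("t"): 0x09, ord('"'): 0x22, ord("\\"): 0x5C}


def parse_byte_string(src: bytes) -> bytes:
    """Decode an XJON byte-string literal such as b"a\\x00" into raw bytes."""
    if len(src) < 3 or not src.startswith(b'b"') or not src.endswith(b'"'):
        raise ValueError("not a byte-string literal")
    body, out, i = src[2:-1], bytearray(), 0
    while i < len(body):
        c = body[i]
        if c == 0x5C:                                  # backslash: escape
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            e = body[i + 1]
            if e == ord("x"):                          # \xHH
                hh = body[i + 2:i + 4]
                if len(hh) != 2 or any(chr(h) not in string.hexdigits for h in hh):
                    raise ValueError("\\x must be followed by two hex digits")
                out.append(int(hh, 16))
                i += 4
            elif e in ESCAPES:
                out.append(ESCAPES[e])
                i += 2
            else:
                raise ValueError(f"unknown escape \\{chr(e)}")
        elif c < 0x20 or c == 0x22:                    # must be escaped
            raise ValueError(f"byte 0x{c:02X} must be escaped")
        else:                                          # raw byte 0x20..0xFF
            out.append(c)
            i += 1
    return bytes(out)
```

For example, parse_byte_string(b'b"\\xFF\\n"') returns b"\xff\n".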

The goal behind XJON was to support a set of values present in YAML and CBOR, namely byte strings and objects with non-string keys, while keeping the format both human-readable and simple & performant to parse, like JSON.

XJON can losslessly encode any jaq value; that is, decoding an XJON-encoded value yields the original value. For example:

  • nan | . | isnan ⟼ true in jaq and jq
  • nan | tojson | fromjson | isnan ⟼ true in jaq, but false in jq

That means that tojson | fromjson is equivalent to . in jaq, whereas it is not equivalent in jq, in particular because of NaN and Infinity.

Currently, wherever jaq accepts JSON, it also accepts XJON. That means that jaq --from json <<< 'NaN b"Bytes" {1: 2} # Over and out' yields NaN b"Bytes" {1: 2}, although the input is XJON, not valid JSON.

YAML

YAML (YAML Ain’t Markup Language™) is “a human-friendly data serialization language for all programming languages”. It is also a JSON and XJON superset. That means that every jaq value can be encoded as a YAML value.

jaq supports reading YAML with anchors (&foo) and aliases (*foo). These allow the creation of shared data structures. For example:

  • "[&a 1, &b 2, *a, *b]" | fromyaml ⟼ [1, 2, 1, 2]
  • "[&b [&a [], *a], *b]" | fromyaml ⟼ [[[], []], [[], []]]

jaq validates tags for scalar YAML values, such as null, booleans, numbers, and strings:

  • "!!bool true" | fromyaml ⟼ true
  • "!!int true" | try fromyaml catch "fail" ⟼ "fail"

On the other hand, jaq ignores tags for arrays and objects:

  • "!!foo []" | fromyaml ⟼ []
  • "!!bar {}" | fromyaml ⟼ {}

jaq produces YAML that is very close to JSON/XJON. It differs from XJON only by writing byte strings as Base64-encoded !!binary strings and special floating-point values as .inf, -.inf, and .nan:

[infinite, -infinite, nan, ("a" | tobytes), {"a": 1}] | tojson, toyaml  ⟼ 
"[Infinity,-Infinity,NaN,b\"a\",{\"a\":1}]"
"[.inf,-.inf,.nan,!!binary YQ==,{\"a\":1}]"
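The special cases in this example can be reproduced with a short Python function. It is a hypothetical sketch, not a general YAML emitter. It only handles the value kinds that appear above (numbers, special floats, byte strings, strings, arrays, and objects), and only in flow style.

```python
import base64
import json
import math


def to_yaml_flow(v):
    """Render v in the flow style of the example above (hypothetical sketch)."""
    if isinstance(v, float) and math.isnan(v):
        return ".nan"
    if isinstance(v, float) and math.isinf(v):
        return ".inf" if v > 0 else "-.inf"
    if isinstance(v, (bytes, bytearray)):
        return "!!binary " + base64.b64encode(v).decode("ascii")
    if isinstance(v, list):
        return "[" + ",".join(to_yaml_flow(x) for x in v) + "]"
    if isinstance(v, dict):
        items = (f"{to_yaml_flow(k)}:{to_yaml_flow(x)}" for k, x in v.items())
        return "{" + ",".join(items) + "}"
    return json.dumps(v, ensure_ascii=False)   # null, booleans, numbers, strings
```

Applied to [math.inf, -math.inf, math.nan, b"a", {"a": 1}], it returns the same text as the toyaml line above.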

jaq preserves invalid UTF-8 sequences in text strings when writing YAML. However, jaq yields an error when trying to parse YAML containing invalid UTF-8 sequences.

When using --to yaml, jaq writes --- before every output value and ... after every output value. This is done to indicate the start/end of YAML documents. For example:

$ jaq --to yaml <<< '1 2'
---
1
...
---
2
...

Both --from yaml and the filter fromyaml load the full input into memory before parsing it.

CBOR

CBOR (Concise Binary Object Representation) is a binary format specified in RFC 8949.

CBOR values are a superset of jaq values. That means that there are CBOR values that have no equivalent jaq value.

jaq fails when trying to decode such CBOR values.

Every jaq value can be encoded losslessly as a CBOR value, except for text strings with invalid UTF-8 code units. Invalid UTF-8 sequences are replaced with U+FFFD, which looks like this: “�”.

jaq writes sequences of CBOR values by concatenating them without any separator. That means that --to cbor is equivalent to --to cbor --join-output. jaq can also read sequences of concatenated CBOR values.
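This Python sketch illustrates what concatenated CBOR sequences look like. It is restricted to unsigned integers (CBOR major type 0, as defined in RFC 8949). The names encode_uint and decode_uints are hypothetical. A real decoder handles all major types.

```python
def encode_uint(n):
    """Encode a non-negative integer as a CBOR major type 0 data item."""
    if n < 0:
        raise ValueError("only unsigned integers are supported")
    if n < 24:
        return bytes([n])                       # value stored in the initial byte
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([info]) + n.to_bytes(size, "big")
    raise OverflowError("integer does not fit into 64 bits")


def decode_uints(data):
    """Decode a concatenation of CBOR unsigned integers (no separators)."""
    values, i = [], 0
    while i < len(data):
        head = data[i]
        if head >> 5 != 0:
            raise ValueError("only major type 0 is supported in this sketch")
        info, i = head & 0x1F, i + 1
        if info < 24:
            values.append(info)
            continue
        size = {24: 1, 25: 2, 26: 4, 27: 8}.get(info)
        if size is None or i + size > len(data):
            raise ValueError("malformed item")
        values.append(int.from_bytes(data[i:i + size], "big"))
        i += size
    return values
```

For example, encode_uint(25) yields the bytes 0x1819, which matches the CBOR examples used in the HTML scraping section.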

TOML

TOML is a configuration file format. Unlike jaq, TOML has date-time values; on the other hand, TOML has no null, no byte strings, and no non-string object keys.

When writing TOML, jaq converts invalid UTF-8 sequences as for CBOR.

XML

jaq reads data adhering to the XML 1.0 standard. However, it only handles XML data encoded as UTF-8.

jaq can read XHTML files, but it cannot directly read HTML files. You can use tools such as html2xhtml to convert HTML to XHTML.

Mappings from XML to JSON generally have to make a compromise between “friendliness” and round-tripping; see “Experiences with JSON and XML Transformations”. Here, “friendliness” means that JSON generated from XML has a flat structure, making it easy to consume. Stefan Goessner gives a nice discussion of different “friendly” mappings in “Converting Between XML and JSON”. The take-away message is: “Friendly” mappings lose information. For that reason, jaq does not use a “friendly” mapping, but rather a mapping that preserves XML information perfectly, making it suitable for round-tripping.

As an example, consider the following input:

<a href="https://www.w3.org">World Wide Web Consortium (<em>W3C</em>)</a>

We can see its internal representation in jaq by:

$ echo '<a href="https://www.w3.org">World Wide Web Consortium (<em>W3C</em>)</a>' | jaq --from xml .

This yields the following JSON:

{
  "t": "a",
  "a": { "href": "https://www.w3.org" },
  "c": [
    "World Wide Web Consortium (",
    { "t": "em", "c": [ "W3C" ] },
    ")"
  ]
}
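A similar mapping can be approximated with Python's xml.etree.ElementTree. This sketch (to_tac is a hypothetical name) loses information that jaq's mapping keeps:

  • ElementTree unescapes entities.
  • It drops comments and processing instructions.
  • It expands namespaced tags to {uri}name.
  • It cannot tell <br/> apart from <br></br>, so this sketch treats every empty element as self-closing.

```python
import xml.etree.ElementTree as ET


def to_tac(elem):
    """Convert an ElementTree element into a TAC-style dict (t, a, c)."""
    tac = {"t": elem.tag}
    if elem.attrib:
        tac["a"] = dict(elem.attrib)
    children = [elem.text] if elem.text else []
    for child in elem:
        children.append(to_tac(child))
        if child.tail:                  # text following the child element
            children.append(child.tail)
    if children:                        # no children: treated as self-closing
        tac["c"] = children
    return tac
```

Applied to ET.fromstring of the <a href=...> input above, to_tac returns a dict equal to the JSON shown.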

TAC objects

Tags are represented by TAC objects. A TAC object may have the following fields:

  • t: Name of the tag, such as h1 for <h1>...</h1>. This field must always be present in a TAC object.
  • a: Attributes of the tag, such as {"id": "foo", "style": "color:blue;"}. If this field is present, it must contain an object with string values.
  • c: Children of the tag. If this field is not present, this tag will be interpreted as self-closing (such as <br/>). When a TAC object produced by jaq (either via --from xml or fromxml) has the c field, it always holds an array of XML values. When writing XML values (either via --to xml or toxml), jaq accepts any XML value at the c field.

An example query to obtain all links in an XHTML file:

.. | select(.t? == "a") | .a.href

We can also transform input XML and yield output XML. For example, to transform all em tags to i tags:

(.. | select(.t? == "em") | .t) = "i"

To yield XML output instead of JSON output, use the option --to xml:

$ echo '<a href="https://www.w3.org">World Wide Web Consortium (<em>W3C</em>)</a>' | jaq --from xml --to xml '(.. | select(.t? == "em") | .t) = "i"'
<a href="https://www.w3.org">World Wide Web Consortium (<i>W3C</i>)</a>

Finally, we can extract all text from an XML file (discarding CDATA blocks):

def xml_text: if isstring then . else .c[]? | xml_text end; [xml_text]
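For comparison, the three queries above can be written as Python functions over TAC values. These are hypothetical equivalents. The function walk mimics jq's .. (pre-order, object values in insertion order). The function rename returns a modified copy instead of updating in place.

```python
def walk(v):
    """Yield v and everything nested inside it, in pre-order, like jq's `..`."""
    yield v
    if isinstance(v, dict):
        children = v.values()
    elif isinstance(v, list):
        children = v
    else:
        children = []
    for x in children:
        yield from walk(x)


def links(doc):
    """.. | select(.t? == "a") | .a.href"""
    return [(v.get("a") or {}).get("href")
            for v in walk(doc) if isinstance(v, dict) and v.get("t") == "a"]


def rename(v, old, new):
    """(.. | select(.t? == old) | .t) = new, returning an updated copy."""
    if isinstance(v, list):
        return [rename(x, old, new) for x in v]
    if isinstance(v, dict):
        out = {k: rename(x, old, new) for k, x in v.items()}
        if out.get("t") == old:
            out["t"] = new
        return out
    return v


def xml_text(v):
    """def xml_text: if isstring then . else .c[]? | xml_text end"""
    if isinstance(v, str):
        yield v
    elif isinstance(v, dict):
        for x in v.get("c") or []:      # CDATA and comment objects have no "c"
            yield from xml_text(x)
```

For the W3C example, "".join(xml_text(doc)) gives "World Wide Web Consortium (W3C)".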

Other values

  • Strings are neither escaped nor unescaped; that is, Tom &amp; Jerry in the source XML becomes "Tom &amp; Jerry" in the target JSON. The @html/@htmld filters can be used for manual (un-)escaping.
  • A comment such as <!-- this comment --> is converted to {"comment": " this comment "}.
  • A CDATA block such as <![CDATA[Tom & Jerry]]> is converted to {"cdata": "Tom & Jerry"}.
  • An XML declaration such as <?xml version="1.0" encoding="UTF-8" standalone="yes"?> is converted to {"xmldecl": {"version": "1.0", "encoding": "UTF-8", "standalone": "yes"}}. (Note that the values given in this declaration, such as the encoding, are ignored by jaq’s XML parser.)
  • A processing instruction such as <?xml-stylesheet href="common.css"?> is converted to {"pi": {"target": "xml-stylesheet", "content": "href=\"common.css\""}}.

To put all of this together, consider the following XML file (examples/test.xhtml):

<?xml version='1.0'?>
<?xml-stylesheet href="common.css"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <!-- CDATA blocks do not require escaping -->
    <![CDATA[Hello & goodbye!]]><br/>
  </body>
</html>

Running jaq . examples/test.xhtml yields the following output:

{
  "xmldecl": {
    "version": "1.0"
  }
}
{
  "pi": {
    "target": "xml-stylesheet",
    "content": "href=\"common.css\""
  }
}
{
  "doctype": {
    "name": "html"
  }
}
{
  "t": "html",
  "a": {
    "xmlns": "http://www.w3.org/1999/xhtml"
  },
  "c": [
    "\n  ",
    {
      "t": "body",
      "c": [
        "\n    ",
        {
          "comment": " CDATA blocks do not require escaping "
        },
        "\n    ",
        {
          "cdata": "Hello & goodbye!"
        },
        {
          "t": "br"
        },
        "\n  "
      ]
    },
    "\n"
  ]
}

The output contains several values consisting only of whitespace, such as "\n ". These are preserved by jaq because XML is a whitespace-sensitive format.

Examples

The following examples should give an impression of what jaq can currently do. You should obtain the same outputs by replacing jaq with jq.

Access a field:

$ echo '{"a": 1, "b": 2}' | jaq '.a'
1

Add values:

$ echo '{"a": 1, "b": 2}' | jaq 'add'
3

Construct an array from an object in two ways and show that they are equal:

$ echo '{"a": 1, "b": 2}' | jaq '[.a, .b] == [.[]]'
true

Apply a filter to all elements of an array and filter the results:

$ echo '[0, 1, 2, 3]' | jaq 'map(.*2) | [.[] | select(. < 5)]'
[0, 2, 4]

Read (slurp) input values into an array and get the average of its elements:

$ echo '1 2 3 4' | jaq -s 'add / length'
2.5

Repeatedly apply a filter to its own output and collect the intermediate results:

$ echo '0' | jaq '[recurse(.+1; . < 3)]'
[0, 1, 2]

Lazily fold over inputs and output intermediate results:

$ seq 1000 | jaq -n 'foreach inputs as $x (0; . + $x)'
1 3 6 10 15 [...]

Lewis’s Puzzle

The following puzzle was communicated to me at a workshop by a certain Mr. Lewis, where I solved it together with him in jq. It goes as follows:

We have a sequence of strings:

X
XYZX
XYZXABXYZX

For example, the 4th letter of the 2nd string (always counting from zero) is ‘A’. What is the 10244th letter of the 30th string?

First, let us understand how this sequence is built. To get the next sequence of letters, we take the previous sequence, concatenate it with the next two letters in the alphabet, then concatenate it with the previous sequence again.

If we take numbers instead of letters, we can write this down as:

X(0  ) = 0
X(N+1) = X(N) (m+1) (m+2) X(N), where m is the largest element in X(N)

We can now write the strings as JSON arrays. The first array is [0], and we can produce each following array by a filter next, which we pass to recurse to get a sequence of all arrays.

def next: . + [max + (1, 2)] + .;
[0] | limit(3; recurse(next))  ⟼ 
[0]
[0,1,2,0]
[0,1,2,0,3,4,0,1,2,0]

However, this does not scale well — getting to the 30th array will take a very long time, because the arrays grow exponentially. Feel free to try it, but watch out not to get your RAM eaten. :) (I recommend monitoring RAM usage when doing this experiment. Otherwise, you may very well crash your computer due to memory exhaustion. Guess how I know?)

To solve this problem, we can exploit the fact that jq values can be shared in memory. Note that each array contains a portion to the left that is equal to a portion on the right; for example, [0, 1, 2, 0] in the 2nd array. We can therefore choose a slightly different array representation that allows us to share all the equal parts of the array, just by inserting the previous arrays into a new array.

def next: [., .[2] + (1,2), .];
[0] | limit(3; recurse(next))  ⟼ 
[0]
[[0],1,2,[0]]
[[[0],1,2,[0]],3,4,[[0],1,2,[0]]]

In all arrays produced by next, the first and the last elements are now shared, meaning that they are stored only a single time in memory. That allows us to store exponentially large data in linear memory, thus cracking the puzzle.

To get the largest number of the previous array, we used the fact that the 2nd element of each array contains the largest number in the array. For example, the 2nd element of the 1st array is 2, and the 2nd element of the 2nd array is 4. (For the 0th array, the 2nd element is null, but the maximum element of that array is 0. However, null is interpreted by addition just like 0, so this difference does not matter to us.) We can therefore get the two next largest numbers very elegantly via .[2] + (1, 2).

We can now get the numbers of any such array with .. | numbers. For example, the numbers of the 2nd array are:

[[[0],1,2,[0]],3,4,[[0],1,2,[0]]] | .. | numbers  ⟼ 
0 1 2 0 3 4 0 1 2 0

Putting all this together, we get our solution via:

def next: [., .[2] + (1,2), .];
[0] | nth(30; recurse(next)) | nth(10244; .. | numbers)  ⟼ 
2

This now runs almost instantaneously, and gives us the answer 2. Going back to the original puzzle, because X = 0, Y = 1, Z = 2, the final answer to the puzzle is Z.
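The same idea carries over to Python, where a nested list can reference the same sublist twice. All names in this sketch are hypothetical:

  • next_level mirrors the jq filter next.
  • numbers mirrors .. | numbers as a lazy pre-order traversal.
  • letter maps numbers back to letters, assuming that the alphabet continues X, Y, Z, A, B, and so on.

```python
from itertools import islice


def next_level(x):
    """def next: [., .[2] + (1,2), .];  (.[2] is null, i.e. 0, for [0])"""
    m = x[2] if len(x) > 2 else 0
    return [x, m + 1, m + 2, x]          # the same list object x appears twice


def numbers(v):
    """.. | numbers: yield the numbers of a nested list lazily, in order."""
    if isinstance(v, list):
        for x in v:
            yield from numbers(x)
    else:
        yield v


def solve(level, index):
    x = [0]
    for _ in range(level):
        x = next_level(x)
    return next(islice(numbers(x), index, None))


def letter(n):
    """Map 0, 1, 2, 3, ... to X, Y, Z, A, ... (assumes the alphabet wraps)."""
    return chr(ord("A") + (n + 23) % 26)


print(solve(30, 10244), letter(solve(30, 10244)))   # 2 Z
```

Because the traversal is lazy, only the first 10245 numbers of the 30th array are ever visited.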

HTML scraping

I wanted to extract the list of examples from the CBOR specification, in order to create a test suite for the CBOR encoder/decoder in jaq. For this, I copied the relevant section from the HTML source code and pasted it into examples/cbor-examples.xhtml. The interesting parts look like this:

<table>
<tbody>
<tr>
  <td class="text-left" rowspan="1" colspan="1">25</td>
  <td class="text-left" rowspan="1" colspan="1">0x1819</td>
</tr>
<tr>
  <td class="text-left" rowspan="1" colspan="1">100</td>
  <td class="text-left" rowspan="1" colspan="1">0x1864</td>
</tr>
</tbody>
</table>

Here, I was interested in the pairs 25 / 0x1819 and 100 / 0x1864, meaning that the number 25 is encoded in CBOR as 0x1819.

Finally, I came up with a jq program to extract this data. It consists of the following tasks:

  1. Select all <tr> elements with .. | select(.t? == "tr").
  2. Get their children with .c.
  3. Iterate over the children with .[] and get their children with .c?[].

(I use .c? here instead of .c because XML is whitespace-sensitive, so in <tr> <td></td> </tr>, the <tr> element actually has three children, namely two space strings " " and the <td> element. Indexing the strings with .c yields an error, whereas indexing them with .c? just yields nothing, allowing us to ignore the space.)

We can see the effects of that on a slightly simplified version of the HTML:

"<table><tr> <td>25</td> <td>0x1819</td> </tr><tr> <td>100</td> <td>0x1864</td> </tr></table>" | fromxml |
.. | select(.t? == "tr").c | [.[].c?[]]  ⟼ 
[ "25", "0x1819"]
["100", "0x1864"]

To run this via the CLI:

jaq '.. | select(.t? == "tr").c | [.[].c?[]]' examples/cbor-examples.xhtml

We can create a series of tests as follows:

jaq '.. | select(.t? == "tr").c | [.[].c?[]] | @json "jc(\(.[0]), \(.[1][2:]));"' examples/cbor-examples.xhtml -r
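The same extraction can be sketched in Python with xml.etree.ElementTree. This hypothetical equivalent of the jq command makes two assumptions. First, the input has no default XML namespace; otherwise the tags would appear as {uri}tr. Second, every row has exactly two <td> cells.

```python
import json
import xml.etree.ElementTree as ET


def cbor_tests(xhtml):
    """Turn rows <tr><td>VALUE</td><td>0xHEX</td></tr> into jc(...); lines."""
    lines = []
    for tr in ET.fromstring(xhtml).iter("tr"):
        # Only <td> elements are considered, so whitespace between cells is skipped.
        value, hexcode = [td.text or "" for td in tr.findall("td")]
        lines.append(f"jc({json.dumps(value)}, {json.dumps(hexcode[2:])});")
    return lines
```

For the simplified table above, cbor_tests returns ['jc("25", "1819");', 'jc("100", "1864");'].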

I used exactly this command to create a draft for jaq’s CBOR parsing test suite.