Difference between revisions of "Json queries with JMEPath and jq"
(→jq) |
m (Pio2pio moved page JMEPath - JSON Queries to Json queries with JMEPath and jq without leaving a redirect) |
||
(17 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
= What is [http://json.org/ JSON]= | |||
[http://json.org/ JSON] (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. | |||
JSON is built on two structures: | |||
* A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array. | |||
* An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence. | |||
;An object: is an unordered set of name/value pairs. An object begins with <code>{</code> (left brace) and ends with <code>}</code> (right brace). Each name is followed by <code>:</code> (colon) and the name/value pairs are separated by <code>,</code> (comma). | |||
;An array: is an ordered collection of values. An array begins with <code>[</code> (left bracket) and ends with <code>]</code> (right bracket). Values are separated by <code>,</code> (comma). | |||
;A value: can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested. | |||
;A string: is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string. | |||
;A number: is very much like a C or Java number, except that the octal and hexadecimal formats are not used. | |||
= Interactive tools = | = Interactive tools = | ||
== [https://jqplay.org/ jqplay] online == | |||
Online jq playground. | |||
[[File:Jqplay.png|700px|none|left|Jq playground]] | |||
== Jiq == | == Jiq == | ||
=== Install Jiq === | === Install Jiq === | ||
Line 14: | Line 29: | ||
<li>Install <code>jiq</code> | <li>Install <code>jiq</code> | ||
<source> | <source> | ||
go get github.com/fiatjaf/jiq/cmd/jiq | go get github.com/fiatjaf/jiq/cmd/jiq #required sudo in Ubuntu 18.04 WSL | ||
export PATH=$PATH:~/go/bin #location where go-golang Ubuntu package installs go-lang binaries | |||
</source></li> | </source></li> | ||
</ol> | </ol> | ||
== jmespath.terminal == | == jmespath.terminal == | ||
This is [https://github.com/jmespath/jmespath.terminal jmespath.terminal] git project without latest activity but still works. | |||
= jq = | = jq = | ||
== | == Turn JSON into CSV == | ||
<source lang="bash"> | |||
aws ec2 describe-snapshots --owner-ids 777781152222 --region us-east-1 --profile prod > prod_snapshots.json | |||
cat prod_snapshots.json \ | |||
| jq -r '.Snapshots[] | [.Description,.Encrypted,.VolumeId,.State,.VolumeSize,.StartTime,.Progress,.OwnerId,.SnapshotId] | @csv' \ | |||
> prod_snapshots.csv | |||
# by using filter '.Snapshots[]' you select "Snapshots" array every object, then | |||
# create an array with [.key1,.key2] as array is required data structure to turn into CSV | |||
# by default double quote will be escaped ' \" ' using '-r' prevents doing it making easier to open in Excel. | |||
</source> | |||
Json blob | |||
<source lang="json"> | |||
{ | |||
"Snapshots": [ | |||
{ | |||
"Description": "Copied for DestinationAmi ami-85921111 from SourceAmi ami-e3045222 for SourceSnapshot snap-0ebc77af111b09222. Task created on 1,487,109,468,219.", | |||
"Encrypted": false, | |||
"VolumeId": "vol-ffffffff", | |||
"State": "completed", | |||
"VolumeSize": 8, | |||
"StartTime": "2017-02-14T21:57:52.000Z", | |||
"Progress": "100%", | |||
"OwnerId": "3335812595337", | |||
"SnapshotId": "snap-58862222" | |||
}, | |||
{ | |||
"Description": "Copied for DestinationAmi ami-ccc875a8 from SourceAmi ami-cfda333 for SourceSnapshot snap-71347333. Task created on 1,485,867,366,917.", | |||
"Encrypted": false, | |||
"VolumeId": "vol-ffffffff", | |||
"State": "completed", | |||
"VolumeSize": 60, | |||
"StartTime": "2017-01-31T12:56:11.000Z", | |||
"Progress": "100%", | |||
"OwnerId": "222557238111", | |||
"SnapshotId": "snap-2314333" | |||
} | |||
] | |||
} | |||
</source> | |||
== Display only Keys that mach value == | |||
Extract AWS volumes | |||
<source lang="bash"> | |||
aws ec2 describe-volumes --region us-east-1 --profile prod > prod_volumes.json | |||
cat prod_volumes.json \ | |||
| cat ftr_volumes_all.json \ | |||
| jq --raw-output '.Volumes[] | [ (.Tags[]? | select(.Key=="Name").Value, select(.Key=="Backup").Value), .VolumeId, .State, .SnapshotId, .Attachments[].InstanceId //"notAttached", .CreateTime, .Size ] | @csv' \ | |||
> ftr_volumes_all.csv \ | |||
> prod_volumes.csv | |||
</source> | |||
* <code>.Tags[]?</code> means that the list/object is optional, prevents ''jq: error - Cannot iterate over null (null)'' error | |||
* <code>(.Tags[]? | select(.Key=="Name").Value, select(.Key=="Backup").Value)</code> taking expression in brackets groups multiple filters into one, so it is not affecting next filter. Remember <code>,</code> coma feeds date through next filter here it's <code>.VolumeId</code>. This produces first 2 items on in the list, see below in results. | |||
* <code>.Attachments[].InstanceId //"notAttached"</code> 2 forward slashes set default value | |||
{| class="wikitable" | |||
|+ Blob and results | |||
!Blob | |||
!Result | |||
|- | |||
|<source lang="json"> | |||
{ | |||
"Volumes": [ | |||
{ | |||
"AvailabilityZone": "us-east-1b", | |||
"Attachments": [], | |||
"Tags": [ | |||
{ | |||
"Value": "Comms Scheduler", | |||
"Key": "Name" | |||
} | |||
], | |||
"Encrypted": false, | |||
"VolumeType": "gp2", | |||
"VolumeId": "vol-1111cd2b12e622222", | |||
"State": "available", | |||
"Iops": 300, | |||
"SnapshotId": "snap-1111810031fa22222", | |||
"CreateTime": "2018-06-20T15:29:32.333Z", | |||
"Size": 100 | |||
}, | |||
{ | |||
"AvailabilityZone": "us-east-1b", | |||
"Attachments": [ | |||
{ | |||
"AttachTime": "2018-06-27T11:13:30.000Z", | |||
"InstanceId": "i-44441a50fc245555", | |||
"VolumeId": "vol-77770ff240e67777", | |||
"State": "attached", | |||
"DeleteOnTermination": true, | |||
"Device": "/dev/sda1" | |||
} | |||
], | |||
"Tags": [ | |||
{ | |||
"Value": "apache", | |||
"Key": "Name" | |||
}, | |||
{ | |||
"Value": "False", | |||
"Key": "Backup" | |||
} | |||
], | |||
"Encrypted": true, | |||
"VolumeType": "gp2", | |||
"VolumeId": "vol-123123ff240e62222", | |||
"State": "in-use", | |||
"KmsKeyId": "arn:aws:kms:us-west-1:0123481158111:key/40222222-4455-7777-8888-c06f27d59999", | |||
"SnapshotId": "snap-999947cb7448d8888", | |||
"Iops": 100, | |||
"CreateTime": "2018-06-27T11:13:30.582Z", | |||
"Size": 8 | |||
} | |||
] | |||
} | |||
</source> | |||
|style="vertical-align:top;" |<source lang="json"> | |||
[ | |||
"Comms Scheduler", | |||
"backupFalse", | |||
"vol-1111cd2b12e622222", | |||
"available", | |||
"snap-1111810031fa22222", | |||
"notAttached", | |||
"2018-06-20T15:29:32.333Z", | |||
100 | |||
] | |||
[ | |||
"apache", | |||
"backupFalse", | |||
"False", | |||
"vol-123123ff240e62222", | |||
"in-use", | |||
"snap-999947cb7448d8888", | |||
"i-44441a50fc245555", | |||
"2018-06-27T11:13:30.582Z", | |||
8 | |||
] | |||
</source> | |||
|} | |||
== Filtered list of instance name, type, virtualization type and instance id == | |||
At the beginning my simple convention for Json is <tt>[]</tt> is an array and <tt>{}</tt> is an object. | At the beginning my simple convention for Json is <tt>[]</tt> is an array and <tt>{}</tt> is an object. | ||
Line 95: | Line 255: | ||
Let's sort out the output by ''_id'' field. Note, only arrays <code>[]</code> aka lists can be sorted using <code>sort_by</function>. That's why we need to wrap our object <code>{}</code> into brackets <code>[]</code>. | Let's sort out the output by ''_id'' field. Note, only arrays <code>[]</code> aka lists can be sorted using <code>sort_by</function>. That's why we need to wrap our object <code>{}</code> into brackets <code>[]</code>. | ||
<source> | <source> | ||
[Filter]> [.Reservations[].Instances[] | {_name: .Tags[] | select(.Key=="Name").Value | match(".*mft.*").string, _id: .InstanceId, _type: .InstanceType, _virt_type: .Hypervisor }] | sort_by(._id) | [Filter]> [.Reservations[].Instances[] | {_name: .Tags[] | select(.Key=="Name").Value | match(".*box.*").string, _id: .InstanceId, _type: .InstanceType, _virt_type: .Hypervisor }] | sort_by(._id) | ||
</source> | |||
Real use: | |||
<source> | |||
$ aws ec2 describe-instances --filters "Name=tag:Name,Values=*mft*" "Name=instance-state-code,Values=16" | jq -r '[.Reservations[].Instances[] | {_name: .Tags[] | select(.Key=="Name").Value | match(".*box.*").string, _id: .InstanceId, _type: .InstanceType, _virt_type: .Hypervisor }] | sort_by(._id)' | |||
</source> | </source> | ||
Line 113: | Line 278: | ||
+----------------+--------------+-------------+ | +----------------+--------------+-------------+ | ||
</source> | </source> | ||
= References = | |||
*[https://programminghistorian.org/en/lessons/json-and-jq jq and json training] | |||
*[https://stedolan.github.io/jq/manual/ Jq manual] Include functions | |||
*[https://jqplay.org/ jqplay] Jq playground online querying tool | |||
*[https://oncletom.io/2016/pipelining-http/ Processing HTTP to HTTP with bash, curl and jq] |
Latest revision as of 17:48, 18 October 2020
What is JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. JSON is built on two structures:
- A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
- An object
- is an unordered set of name/value pairs. An object begins with
{
(left brace) and ends with}
(right brace). Each name is followed by:
(colon) and the name/value pairs are separated by,
(comma). - An array
- is an ordered collection of values. An array begins with
[
(left bracket) and ends with]
(right bracket). Values are separated by,
(comma). - A value
- can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.
- A string
- is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.
- A number
- is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
Interactive tools
jqplay online
Online jq playground.
Jiq
Install Jiq
- Install
jq
- Install Golang <1.7
- Export environment variables eg. to
.bashrc
or.profile
files.export GOPATH=/home/ubuntu/.local/go export PATH="$PATH:$HOME/.local/go"
- Install
jiq
go get github.com/fiatjaf/jiq/cmd/jiq #required sudo in Ubuntu 18.04 WSL export PATH=$PATH:~/go/bin #location where go-golang Ubuntu package installs go-lang binaries
jmespath.terminal
This is jmespath.terminal git project without latest activity but still works.
jq
Turn JSON into CSV
aws ec2 describe-snapshots --owner-ids 777781152222 --region us-east-1 --profile prod > prod_snapshots.json cat prod_snapshots.json \ | jq -r '.Snapshots[] | [.Description,.Encrypted,.VolumeId,.State,.VolumeSize,.StartTime,.Progress,.OwnerId,.SnapshotId] | @csv' \ > prod_snapshots.csv # by using filter '.Snapshots[]' you select "Snapshots" array every object, then # create an array with [.key1,.key2] as array is required data structure to turn into CSV # by default double quote will be escaped ' \" ' using '-r' prevents doing it making easier to open in Excel.
Json blob
{ "Snapshots": [ { "Description": "Copied for DestinationAmi ami-85921111 from SourceAmi ami-e3045222 for SourceSnapshot snap-0ebc77af111b09222. Task created on 1,487,109,468,219.", "Encrypted": false, "VolumeId": "vol-ffffffff", "State": "completed", "VolumeSize": 8, "StartTime": "2017-02-14T21:57:52.000Z", "Progress": "100%", "OwnerId": "3335812595337", "SnapshotId": "snap-58862222" }, { "Description": "Copied for DestinationAmi ami-ccc875a8 from SourceAmi ami-cfda333 for SourceSnapshot snap-71347333. Task created on 1,485,867,366,917.", "Encrypted": false, "VolumeId": "vol-ffffffff", "State": "completed", "VolumeSize": 60, "StartTime": "2017-01-31T12:56:11.000Z", "Progress": "100%", "OwnerId": "222557238111", "SnapshotId": "snap-2314333" } ] }
Display only Keys that mach value
Extract AWS volumes
aws ec2 describe-volumes --region us-east-1 --profile prod > prod_volumes.json cat prod_volumes.json \ | cat ftr_volumes_all.json \ | jq --raw-output '.Volumes[] | [ (.Tags[]? | select(.Key=="Name").Value, select(.Key=="Backup").Value), .VolumeId, .State, .SnapshotId, .Attachments[].InstanceId //"notAttached", .CreateTime, .Size ] | @csv' \ > ftr_volumes_all.csv \ > prod_volumes.csv
.Tags[]?
means that the list/object is optional, prevents jq: error - Cannot iterate over null (null) error(.Tags[]? | select(.Key=="Name").Value, select(.Key=="Backup").Value)
taking expression in brackets groups multiple filters into one, so it is not affecting next filter. Remember,
coma feeds date through next filter here it's.VolumeId
. This produces first 2 items on in the list, see below in results..Attachments[].InstanceId //"notAttached"
2 forward slashes set default value
Blob | Result |
---|---|
{ "Volumes": [ { "AvailabilityZone": "us-east-1b", "Attachments": [], "Tags": [ { "Value": "Comms Scheduler", "Key": "Name" } ], "Encrypted": false, "VolumeType": "gp2", "VolumeId": "vol-1111cd2b12e622222", "State": "available", "Iops": 300, "SnapshotId": "snap-1111810031fa22222", "CreateTime": "2018-06-20T15:29:32.333Z", "Size": 100 }, { "AvailabilityZone": "us-east-1b", "Attachments": [ { "AttachTime": "2018-06-27T11:13:30.000Z", "InstanceId": "i-44441a50fc245555", "VolumeId": "vol-77770ff240e67777", "State": "attached", "DeleteOnTermination": true, "Device": "/dev/sda1" } ], "Tags": [ { "Value": "apache", "Key": "Name" }, { "Value": "False", "Key": "Backup" } ], "Encrypted": true, "VolumeType": "gp2", "VolumeId": "vol-123123ff240e62222", "State": "in-use", "KmsKeyId": "arn:aws:kms:us-west-1:0123481158111:key/40222222-4455-7777-8888-c06f27d59999", "SnapshotId": "snap-999947cb7448d8888", "Iops": 100, "CreateTime": "2018-06-27T11:13:30.582Z", "Size": 8 } ] } |
[ "Comms Scheduler", "backupFalse", "vol-1111cd2b12e622222", "available", "snap-1111810031fa22222", "notAttached", "2018-06-20T15:29:32.333Z", 100 ] [ "apache", "backupFalse", "False", "vol-123123ff240e62222", "in-use", "snap-999947cb7448d8888", "i-44441a50fc245555", "2018-06-27T11:13:30.582Z", 8 ] |
Filtered list of instance name, type, virtualization type and instance id
At the beginning my simple convention for Json is [] is an array and {} is an object.
Get an JSON output of all instances in your AWS accounty
aws ec2 describe-instances --output json | jiq #pipe into jiq
#jiq's expressions line Filter]> .Reservations[].Instances[] | {_name: .Tags[], _id: .InstanceId, _type: .InstanceType, _virtType: .Hypervisor }
.Reservations[].Instances[] returns all data that we need, therefore we pipe into new object { _key1: XXX, _key2: YYY }. Actually, the keys fields we define now, as: _name, _id, _id, _type.
Next, let's select only Tags we are interested in, from a list(object). Therefore, we pipe Tags via select function. Where the Tag name is Name and we want get .Value field.
{name: .Tags[] | select(.Key=="Name").Value
Full filter command:
[Filter]> .Reservations[].Instances[] | {name: .Tags[] | select(.Key=="Name").Value, id: .InstanceId, type: .InstanceType, virt_type: .Hypervisor } { "_name": "box-2.prod1.example.com", "_id": "i-ecccc666", "_type": "c4.2xlarge", "_virt_type": "xen" } { "_name": "db-2.prod1.example.com", "_id": "i-0aaaaae989a68b6e6", "_type": "m4.large", "_virt_type": "xen" }
Now, we want to display only records, that match our instance name using REGEX expression .*box.*
{_name: .Tags[] | select(.Key=="Name").Value | match(".*box.*")
[Filter]> .Reservations[].Instances[] | {_name: .Tags[] | select(.Key=="Name").Value | match(".*box.*") , _id: .InstanceId, _type: .InstanceType, _virt_type: .Hypervisor } { "_name": { "offset": 0, "length": 25, "string": "box-2.prod1.example.com", "captures": [] }, "_id": "i-0c26072b64b6404dc", "_type": "m4.large", "_virt_type": "xen" } { "_name": { "offset": 0, "length": 24, "string": "db-2.prod1.example.com", "captures": [] }, "_id": "i-0a6f8fe989a68b6e6", "_type": "m4.large", "_virt_type": "xen" }
The output contains now extra info. Butm but we are only interested in string field
{_name: .Tags[] | select(.Key=="Name").Value | match(".*box.*").string
Let's sort out the output by _id field. Note, only arrays []
aka lists can be sorted using sort_by</function>. That's why we need to wrap our object
{}
into brackets []
.
[Filter]> [.Reservations[].Instances[] | {_name: .Tags[] | select(.Key=="Name").Value | match(".*box.*").string, _id: .InstanceId, _type: .InstanceType, _virt_type: .Hypervisor }] | sort_by(._id)
Real use:
$ aws ec2 describe-instances --filters "Name=tag:Name,Values=*mft*" "Name=instance-state-code,Values=16" | jq -r '[.Reservations[].Instances[] | {_name: .Tags[] | select(.Key=="Name").Value | match(".*box.*").string, _id: .InstanceId, _type: .InstanceType, _virt_type: .Hypervisor }] | sort_by(._id)'
Awscli queries
$ aws ec2 describe-instances \
--output table \
--query 'Reservations[].Instances[].[Tags[?Key==`Name`] | [0].Value,InstanceId,InstanceType]'
And we get a nicely formatted table:
-----------------------------------------------
| DescribeInstances |
+----------------+--------------+-------------+
| xxxxxxxxxx | i-a0169xxx | r3.large |
| yyyyyyyyyy | i-11a46xxx | m3.large |
| zzzzzzzzzzzzzz| i-07c4axxx | t2.medium |
+----------------+--------------+-------------+
References
- jq and json training
- Jq manual Include functions
- jqplay Jq playground online querying tool
- Processing HTTP to HTTP with bash, curl and jq