Kusto Detective Agency — The rarest book is missing! (Part 1 of 5)

Soumyadeep Basu
5 min readNov 15, 2022

--

While browsing twitter, I came across Kusto Detective Agency — a gamified way of learning Kusto Query Language (KQL).

There are a set of five challenges that participants are required to solve using their KQL skills and earn badges. Challenges are in the form of cases that need to be investigated by a detective agency (the storyline is awesome :P)

To start, participants are required to create a free Kusto cluster and separate data ingestion scripts are provided on a case to case basis.

Challenges

  • Challenge 1: The rarest book is missing! (Current article)
  • Challenge 2: Election fraud? (Here)
  • Challenge 3: Bank robbery (Here)
  • Challenge 4: Ready to play? (Here)
  • Challenge 5: Big heist (Here)

Onboarding Task

— — — — — — — — —

Free Kusto Cluster

We are will create a free Kusto cluster, data ingestion scripts are provided at the bottom and we are given a challenge.

Let’s run the ingestion script and perform some light data exploration

Onboarding
| summarize count()

Pretty simple! Looks like we have a table with only one column (type: long) and 10,000,000 entries. We need a sum of all the entries.

Onboarding
| summarize Score=sum(toint(Score))

And we have our answer 😋

The rarest book is missing!

— — — — — — — — — — — — — —

This challenge features a digital library where -

  • Each book has its parameters recorded: number of pages, weight.
  • Each book has RFID sticker attached (RFID: radio-transmitter with ID).
  • Each shelve in the Museum sends data: what RFIDs appear on the shelve and also measures actual total weight of books on the shelve.

However there is one particular — “De Revolutionibus Magnis Data” whose RFID has been detached from the book. We are required to locate the shelf number where this book resides.

We are provided with an ingestion script, let’s run the same.

So we have two tables — Books and Shelves. Some logical observations maybe

  • Every book entry in the Books table has a weight_gram associated with it
  • Since every shelve has an array of rf_ids, we can map the individual rf_ids using the Books table and find their weights and finally sum all of them up.
  • If the sum of weights of books (corresponding to the rf_ids) on a shelve is significantly lower from the total_weight entry of the shelve — we know that a book is missing from the shelve

Let’s do some light data exploration…

We will need to expand the rf_id dynamic array so that we can join it with the Books table.

Shelves
| mv-expand rf_ids
| project shelf,rf_id = tostring(rf_ids),total_weight

Next let’s inner join it with the Books table based on the rf_id column and grab the individual weights of the books. Then run an aggregation to sum the weight of each book (weight_gram) by every shelve.

let shelves_expanded = (Shelves
| mv-expand rf_ids
| project shelf,rf_id = tostring(rf_ids),total_weight);
shelves_expanded
| join kind=inner(Books) on rf_id
| summarize weight=sum(weight_gram),books_count=count() by shelf
| order by shelf asc

Next we need to compare the resulting sum of the weights (by summing individual book weights) we just obtained with the total_weight entry in the Shelves table. So, we will again inner join the resulting table with the original Shelves table based on the shelve number. Next we will create a new column which calculates the absolute difference between the sum of weight and obtained from the two tables and display those entries where the weight is a significant large value (I have chosen > 100g)

let shelves_expanded = (Shelves
| mv-expand rf_ids
| project shelf,rf_id = tostring(rf_ids),total_weight);
let shelves_weight = (shelves_expanded
| join kind=inner(Books) on rf_id
| summarize weight=sum(weight_gram),books_count=count() by shelf
| order by shelf asc);
shelves_weight
| join kind=inner(Shelves) on shelf
| extend difference = abs(total_weight - weight)
| where weight != total_weight and difference > 100
| project shelf,weight, total_weight, books_count, difference
| sort by shelf asc

And we have the required shelve number 😄

PS: I started learning KQL recently, feel free to reach out to me and correct me, incase you spot some inconsistencies I would highly appreciate that. And finally thank you very much for taking your time to read this post.

--

--

Soumyadeep Basu

CTF 🚩 ● Hack the Box ● CyberSec Enthusiast ● Snooker Addict