Kusto Detective Agency — Bank Robbery (Part 3 of 5)
While browsing twitter, I came across Kusto Detective Agency — a gamified way of learning Kusto Query Language (KQL).
There are a set of five challenges that participants are required to solve using their KQL skills and earn badges. Challenges are in the form of cases that need to be investigated by a detective agency (the storyline is awesome :P)
To start, participants are required to create a free Kusto cluster and separate data ingestion scripts are provided on a case to case basis.
Challenges
Bank robbery
— — — — — — —
This challenge features a bank robbery performed by a gang of three armed men. The bank is located at 157th Ave / 148th Street.
We are provided with a precise order of events:
- 08:17AM: A gang of three armed men enter a bank located at 157th Ave / 148th Street and start collecting the money from the clerks.
- 08:31AM: After collecting a decent loot (est. 1,000,000$ in cash), they pack up and get out.
- 08:40AM: Police arrives at the crime scene, just to find out that it is too late, and the gang is not near the bank. The city is sealed — all vehicles are checked, robbers can’t escape. Witnesses tell about a group of three men splitting into three different cars and driving away.
- 11:10AM: After 2.5 hours of unsuccessful attempts to look around, the police decide to turn to us, so we can help in finding where the gang is hiding.
We know that the gang has meetup at one final destination and we need to locate the place.
After running the ingestion script we find that data provided to us contains camera recording of all spotted vehicles in the form of — Timestamp, VIN (Vehicular Identification Number), Ave (Avenue) and Street details
Now let’s build our query logic —
- Since the gang left the bank at 8:31 AM (in three different cars) and police arrived at 8:40 AM, all three of those cars must have been spotted at 157th Ave / 148th Street. Let all those cars that fit this description i.e. spotted b/w 8:31 AM and 8:40 am at the said location be L1
- Since the gang entered the bank at 8:17 AM and left only at 8:31 AM, their cars should have been parked at the bank and hence all those cars that have been spotted at places other than 157th Ave/ 148th Street do not match our criteria. Let this list of cars be L2.
- Now since the robbers meet up at one common place after the robbery, they must have one common destination. Hence for cars that remain (L1 — L2), we will find their last know location and look for those (Avenue, Street) combination where no of such cars = 3
Traffic
| where Timestamp > datetime("2022-10-16T08:31:00") and Timestamp < datetime("2022-10-16T08:40:00")
| where Ave == "157" and Street == "148"
| distinct VIN
We have 194 distinct cars that were spotted at 157th Avenue/ 148th Street between 8:31 AM to 8:40 AM. We will inner-join the resulting table (L1) with the original Traffic table based on the VIN column and find out all those cars who were spotted at places other than 157th Avenue/ 148th Street between 8:17 AM to 8:31 AM.
let bank_vehicles = (Traffic
| where Timestamp > datetime("2022-10-16T08:31:00") and Timestamp < datetime("2022-10-16T08:40:00")
| where Ave == "157" and Street == "148"
| distinct VIN);
bank_vehicles
| join kind=inner (Traffic) on VIN
| where Timestamp > datetime("2022-10-16T08:17:00") and Timestamp < datetime("2022-10-16T08:31:00")
| where Ave != "157" or Street != "148"
| distinct VIN
Let this new table be L2. These 150 odd cars do not fit our criteria and must be filtered out from the original L1 table. We will perform a leftanti join (shows records from the left table that don’t have matches from the right) between L1 and L2 so that we are left with only those cars who have been spotted at 157th Avenue/ 148th Street between 8:31 AM to 8:40 AM and have not been spotted anywhere else between 8:17 AM to 8:31 AM. This is similar to L1-L2 in set operations.
let bank_vehicles = (Traffic
| where Timestamp > datetime("2022-10-16T08:31:00") and Timestamp < datetime("2022-10-16T08:40:00")
| where Ave == "157" and Street == "148"
| distinct VIN);
let non_suspicious_vehicles = (bank_vehicles
| join kind=inner (Traffic) on VIN
| where Timestamp > datetime("2022-10-16T08:17:00") and Timestamp < datetime("2022-10-16T08:31:00")
| where Ave != "157" or Street != "148"
| distinct VIN);
bank_vehicles
| join kind=leftanti (non_suspicious_vehicles) on VIN
As expected we are left with 44 cars. Let this table be L3. Next we need to find the last known location of these cars. For that, we will perform a right-semi join the original Traffic table and create an aggregation that gives us the maximum value of the Timestamp column for each vehicle (VIN).
let bank_vehicles = (Traffic
| where Timestamp > datetime("2022-10-16T08:31:00") and Timestamp < datetime("2022-10-16T08:40:00")
| where Ave == "157" and Street == "148"
| distinct VIN);
let non_suspicious_vehicles = (bank_vehicles
| join kind=inner (Traffic) on VIN
| where Timestamp > datetime("2022-10-16T08:17:00") and Timestamp < datetime("2022-10-16T08:31:00")
| where Ave != "157" or Street != "148"
| distinct VIN);
let suspicious_vehicles = (bank_vehicles
| join kind=leftanti (non_suspicious_vehicles) on VIN);
suspicious_vehicles
| join kind=rightsemi (Traffic) on VIN
| summarize max(Timestamp) by VIN
Why right-semi join?
Right-semi join returns all the records from the right side that have matches from the left. In this case since we are joining with the table on the right (Traffic) based on the VIN column, it will have more than multiple records for each vehicle out of which we need the one with maximum value of timestamp
Now we have a list of all vehicles (VIN) that match both our criteria and their corresponding max(Timestamp) i.e last date/time it has been recorded at a security camera. Let the resulting table be L4.
We will perform an inner join with L4 and the original Traffic table based on the timestamp column so that we can find their last known location from the recorded max(timestamp). Finally we will create an aggregation which counts no of cars at each such last know location (avenue/street combination). If car count = 3 we have the hiding place for the gang.
let bank_vehicles = (Traffic
| where Timestamp > datetime("2022-10-16T08:31:00") and Timestamp < datetime("2022-10-16T08:40:00")
| where Ave == "157" and Street == "148"
| distinct VIN);
let non_suspicious_vehicles = (bank_vehicles
| join kind=inner (Traffic) on VIN
| where Timestamp > datetime("2022-10-16T08:17:00") and Timestamp < datetime("2022-10-16T08:31:00")
| where Ave != "157" or Street != "148"
| distinct VIN);
let suspicious_vehicles = (bank_vehicles
| join kind=leftanti (non_suspicious_vehicles) on VIN);
let last_locations= (suspicious_vehicles
| join kind=rightsemi (Traffic) on VIN
| summarize max(Timestamp) by VIN);
last_locations
| join kind=inner (Traffic) on VIN, $left.max_Timestamp == $right.Timestamp
| summarize count() by Ave,Street
| where count_ >=3
PS: I started learning KQL recently, feel free to reach out to me and correct me, incase you spot some inconsistencies I would highly appreciate that. And finally thank you very much for taking your time to read this post.