-
Notifications
You must be signed in to change notification settings - Fork 230
Querying Dynamic Tables with SQL
This session discusses how to write SQL queries to process streaming data.
Specifically, you will learn about
- Flink's concept of Dynamic Tables
- Stream-Table conversions
- Running queries on dynamic tables
- Reasoning about and managing query state
These exercises will teach you how to write queries on dynamic tables.
The following exercises process data by continuously updating their results. Queries produce updates and/or deletions.
For this exercise you should compute the number of rides per number of passengers, i.e., determine how many rides happened with 1, 2, 3, ... passengers.
We are only interested in counts for rides that started in New York City.
Click here for hints.
- Each ride is represented by two events. Filter out all end events for accurate counts.
The output should look similar to:
psgCnt cnt
4 6051
2 31063
3 10812
6 5408
5 11031
1 100367
Click to see the solution.
SELECT
psgCnt,
COUNT(*) AS cnt
FROM Rides
WHERE isInNYC(lon, lat) AND isStart
GROUP BY
psgCnt;
In this exercise we want to count the total number of arriving and departing rides per area for every hour of a day, i.e., the total number of rides per area that happen from 12pm to 1am, 1am to 2am, and so on. Hence, we do not want a separate count for every day but the combined counts of all days for every hour.
We are only interested in events that start or end in New York City. To limit the size of the result, return only areas with counts that include more than 60 events.
Click here for hints.
- The query is similar to the query of the Windowed Ride Count exercise but differs in the GROUP BY clause, because it updates previous counts every day.
- Use the built-in
HOUR
function to extract the hour of a timestamp. - Use the provided
toAreaId
to convert coordinates to an area id.
The output should look similar to:
area isStart hourOfDay cnt
49551 false 0 85
49789 true 0 64
48806 true 0 75
50044 false 0 62
52543 true 0 67
49792 false 1 77
48559 true 1 114
48808 true 1 100
Click to see the solution.
SELECT
toAreaId(lon, lat) AS area,
isStart,
HOUR(rideTime) AS hourOfDay,
COUNT(*) AS cnt
FROM Rides
WHERE isInNYC(lon, lat)
GROUP BY
toAreaId(lon, lat),
isStart,
HOUR(rideTime)
HAVING COUNT(*) > 60;
The query is a typical SELECT FROM GROUP BY
query. After filtering by rides in New York City, the result is grouped by area id, isStart
flag, and hour of the day (computed by the HOUR(timestamp)
function). By using the HAVING
clause, we only return counts that are greater than 60.
Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.