Site icon Tour Consultancy

Standard SQL – Work around needed for less than operator not working with blank/NULL date – AthenaS3

Goal: Join multiple attributes using PKs and then add multiple rows with unique Ids; d.activity_date, d.activity_type_id using ON d.lead_id AND d.activity_date between a.effective_start_date, a.effective_end_date.

This will give me a table I can use to pivot and view activity by a specific a.journey_id.

Current Outcome: When I add in the my second AND clause of less than a.effective_end_date the returned results set is missing rows due to NULL values in a.effective_end_date. Which makes sense…but now how do I quickly solve for it without a UNION? (Or maybe a UNION is the best option)

Expected Results: All rows of data where d.activity_date is equal to or between a.effective_start_date, a.effective_end_date and all rows where d.activity_date is greater than a.effective_start_date but a.effective_end_date is null.

This will increase row count and all d.attributes will be unique; Some a., b., c., attributes will duplicate.

SELECT
a.journey_id,
c.description,
a.customer_key_id,
b.lead_id,
a.is_active,
a.effective_start_date,
a.effective_end_date,
d.activity_date,
d.activity_type_id,
d.campaign_id,
d.primary_attribute_value_id,
d.primary_attribute_value,
d.attribute_values

FROM "data_lake"."journey_control" a
LEFT JOIN "data_lake"."mkto_leads" b
ON a.customer_key_id = b.customer_key_id
LEFT JOIN "data_lake"."journey" c
ON a.journey_id = c.journey_id
LEFT JOIN "data_lake"."mkto_activities" d
ON b.lead_id=d.lead_id

WHERE a.customer_key_id = 'XXXX-954b-46c3-92c8-5b47f50af988'
--This is where it breaks

What I’ve Tried:

AND d.activity_date >= a.effective_start_date --used alone returns all rows after start date (as expected) 
AND d.activity_date <= a.effective_end_date --When I add this clause I get no results for all scenarios where a.effective_end_date is blank due to there being no data yet. (As expected)

--If I add AND a.effective_end_date IS NULL I lose my between results (As expected)
--AND d.activity_date BETWEEN a.effective_start_date AND a.effective_end_date - Works but again leaves out rows with no end date

Also Tried: Separate queries with a UNION – WORKS –Looking for something more elegant.

Exit mobile version