NUPANCH
  • Home
  • Services
  • Customers
  • About
  • Contact
  • Blog

Stay informed on the latest in e-commerce analytics

The easy way to read sql

5/29/2018

4 Comments

 
Remember the first time you saw a large SQL query written by somebody else? I do. Meaningless aliases, no comments, all sub-queries and no common-table-expressions. Almost zero readability. What’s worse is that I had to debug this query because it was resulting in a discrepancy. Where on earth do I start?
First - format the query using any one of the million tools available online. I use dpriver.

Second - in the words of my database class professor, “Read the query like the computer reads it.”

When executing a query, the database management system follows a specific order of execution -
  1. FROM (including any joins)
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY

In other words -
  • The system first executes the FROM clause i.e. it creates the data set on which the WHERE and SELECT clauses are to be run. This step includes any joins specified in the FROM clause.
  • Then, it eliminates rows from this data set by using the filters specified in the WHERE clause.
  • It then groups the data set by the columns specified in the GROUP BY clause, and finally runs the SELECT clause. This is why you can never use the aliases specified in the SELECT clause to filter in the WHERE clause - because the aliases haven’t been declared yet.

Trying to understand the SELECT clause before understanding which tables are involved in the query is, to the say the least, difficult. Ignore the SELECT and jump straight to the FROM clause. Once you understand this, half the job is done.

Note: When dealing with sub-queries, start with the innermost query and move gradually outwards.

Once you have a good understanding of the query, and have made the changes you need, be a good samaritan and rewrite parts of the query that you think can be made more readable. Specifically, I’m a huge advocate of using the WITH clause in place of sub-queries. They enforce a sense of “sequence” to the query and make it easier to think through the logic of the query.

Let’s say we wanted to identify the customers who have a lifetime revenue of more than $100, and then use this group as an input to our SQL query. Normally, we’d write it as a sub-query and join it in with the rest of the query. While this does the job, the query is much more readable if a WITH clause is used for this purpose. Before the start of the query, define the WITH table:

WITH cust_group AS
(SELECT customer_id, sum(order_total)
FROM orders
GROUP BY customer_id
HAVING sum(order_total)>100)


This temporary table can be joined in to the query just like any other table. Note that this is computationally as efficient as the sub-query method, just a lot more readable.

Hopefully, these tips help in reducing the time and frustration when reading a query written by someone else, or even one that you wrote yourself a long time ago. I’d love to hear more tips to easily read SQL or to make SQL more readable. Please feel free to leave your thoughts in the comments below.
4 Comments
ANAND KUMAR
4/28/2020 12:57:57 pm

I was really helpful as I am a beginner in SQl.

Reply
Akash Agrawal link
4/28/2020 01:35:03 pm

Glad it was helpful, Anand!

Reply
Alex
8/26/2020 08:46:23 pm

Hi! Thanks for insight. Would it be possible for you to help me translate a SQL query?

SELECT week.type,
weekly_avg,
today_so_far
FROM
(SELECT avg(COUNT) AS weekly_avg,
TYPE
FROM
(SELECT date_trunc('day', created_at) AS DAY,

count(external_id),
json_extract_path_text(original_data, 'loanPurpose') AS TYPE
FROM leads.lead_ingestion
WHERE created_at >= getdate() - INTERVAL '7 days'
AND extract(hour
FROM created_at) - 4 < EXTRACT (hour
FROM getdate()) - 4

AND extract(hour
FROM created_at) - 4 >= 0
AND source_name = 'RateTablePalooza'
GROUP BY 1,
3
ORDER BY 1) LOOP
WHERE DAY != CURRENT_DATE
GROUP BY TYPE) week
INNER JOIN
(SELECT count(external_id) AS today_so_far,
json_extract_path_text(original_data, 'loanPurpose') AS TYPE
FROM leads.lead_ingestion
WHERE source_name = 'RateTablePalooza'
AND extract(hour
FROM created_at) - 4 < EXTRACT (hour
FROM getdate()) - 4

AND date_trunc('day', created_at AT TIME ZONE 'US/Eastern') AS DAY =

CURRENT_DATE
GROUP BY 1,
2
ORDER BY 1) ON week.type = day.type

Reply
Akash Agrawal
8/27/2020 12:37:37 pm

Hey Alex - glad it was helpful, and I can certainly help translate that query! Could you please reach out through the "Contact" page?

Reply



Leave a Reply.

      stay up to date with the happenings in e-commerce. 

      Subscribe today!

    Subscribe to Newsletter

    Archives

    June 2018
    May 2018
    April 2018
    March 2018
    February 2018
    January 2018

    Categories

    All

    RSS Feed

Home

About

Contact









​Data migration icon used on the home page is by "unlimicon" from The Noun Project.
Report building, account cleanup and data structuring icons used on the home page are by "Dinosoft Labs" from The Noun Project.
Getting the most out of icon used on the blog page is by "Luis Prado" from The Noun Project. 

  • Home
  • Services
  • Customers
  • About
  • Contact
  • Blog