by @Nattaphat (grammar and sentences revised by Gemini)
This week, my deep dive into BigQuery covered several critical architectural and cost-management concepts:
- Dry Running & Cost Estimation: I learned the importance of dry running queries to estimate the bytes processed before executing them, which is essential for cost control.
- BigQuery Architecture: I explored how BigQuery separates storage and compute to handle massive datasets efficiently.
- Cost Computation: I gained a clearer understanding of how Google Cloud calculates costs for various transactions, particularly the difference between native and external tables.
- Optimization Best Practices: To maximize resource efficiency and minimize "wasted" spend, I learned to avoid SELECT * and to leverage partitioning and clustering instead.
- Advanced SQL Syntax: I practiced BigQuery-specific syntax, such as SELECT ... AS ..., EXCEPT, and REPLACE, and defining external tables.
- Columnar Storage: I now understand how BigQuery’s columnar storage format enables high-performance analytical queries by only reading the necessary columns.
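The external-table, partitioning, and clustering ideas above can be sketched together. This is a minimal example assuming Parquet files in a GCS bucket; the dataset, table, bucket, and column names are all placeholders, not the course's actual ones:

```sql
-- External table: BigQuery reads the files in place, no native storage.
CREATE OR REPLACE EXTERNAL TABLE my_dataset.trips_external
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/trips/*.parquet']
);

-- Materialize as a native table, partitioned and clustered so that
-- date-filtered queries scan only the relevant partitions.
CREATE OR REPLACE TABLE my_dataset.trips_partitioned
PARTITION BY DATE(pickup_datetime)
CLUSTER BY vendor_id AS
SELECT * FROM my_dataset.trips_external;

-- Name the columns you need and filter on the partition column
-- instead of SELECT *; both cut the bytes scanned.
SELECT vendor_id, fare_amount
FROM my_dataset.trips_partitioned
WHERE DATE(pickup_datetime) = '2021-01-15';
```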
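The EXCEPT and REPLACE modifiers mentioned above let you keep most of a wide table's columns without listing them all. A small illustration with hypothetical column names:

```sql
-- Select every column except one, and transform another in place.
SELECT
  * EXCEPT (store_and_fwd_flag)
    REPLACE (ROUND(fare_amount, 2) AS fare_amount)
FROM my_dataset.trips_partitioned;
```

Note that because of columnar storage, this still scans every remaining column, so it is a readability tool rather than a cost optimization.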
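On the dry-run point: a dry run returns the total bytes the query would scan (for example via `bq query --dry_run`) without running it, and on-demand billing is proportional to bytes scanned. A minimal sketch of turning that number into a dollar estimate; the $6.25/TiB rate is an assumption, so check the current Google Cloud pricing page:

```python
# Rough on-demand cost estimate from a dry run's bytes-processed figure.
# The USD-per-TiB rate is an assumption; verify against current pricing.

def estimate_on_demand_cost(bytes_processed: int, usd_per_tib: float = 6.25) -> float:
    """Convert bytes scanned (e.g. reported by `bq query --dry_run`) to USD."""
    tib = bytes_processed / 2**40  # 1 TiB = 2**40 bytes
    return tib * usd_per_tib

# Example: a dry run reporting 512 GiB scanned (0.5 TiB)
print(estimate_on_demand_cost(512 * 2**30))  # → 3.125
```

This makes it obvious why pruning columns and partitions matters: halving the bytes scanned halves the estimated cost.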
I recommend having some understanding of SQL and data warehouse concepts before reading further; otherwise this week's details will be hard to follow.
Overcoming Technical Hurdles (Again and Always)
In the homework section, I did something that was actually unnecessary: I repeatedly reloaded the data using the instructor's .ipynb template after I had already loaded it successfully via a .py script that used a different environment-variable setup. It took me over a day of troubleshooting to realize the data was already there; I was just fighting my own configuration!
Homework solution (with the solution in a .sql file):
https://github.com/politto/zoomcamp-data-warehouse-ggcloud