Tag: Big Data
Big Data GCP Certification Sept. 16, 2024How I Passed the Google Cloud Professional Data Engineer Certification Exam — August 2024 - A simple and Comprehensive guide to becoming a GCP Data Engineer.
Big Data BigQuery Looker July 1, 2024Utilizing ClickHouse to Reduce Costs from Your BigQuery and Looker Usage Part 1 - Reduce your Looker and BigQuery costs by using ClickHouse to “cache” your data.
Big Data BigQuery Looker July 1, 2024Utilizing ClickHouse to Reduce Costs from Your BigQuery and Looker Usage Part 2 - Reduce your Looker and BigQuery costs by using ClickHouse to “cache” your data.
Big Data BigQuery March 18, 2024Efficient BigQuery Data Modeling: A Storage and Compute Comparison - BigQuery storage and compute comparison for normalized, denormalized, and nested design: an in-depth analysis with actionable optimizations.
Big Data BigQuery Billing dbt Feb. 11, 2024Reducing BigQuery Costs by 100–200x with dbt Incremental Models - Reducing costs for dbt models in BigQuery.
Big Data BigQuery Data Science Jan. 8, 2024How Google BigQuery becomes an even more powerful Data Lakehouse - Recap 2023: What were the major Updates and what can we expect in 2024?
Big Data BigQuery dbt Dec. 11, 2023Reduce DBT Incremental Materialization Compute Cost in BigQuery - Utilizing partitioned tables and partition pruning to reduce BigQuery cost when using dbt.
Big Data BigQuery Oct. 9, 2023Linting BigQuery SQL with sqlfluff - Using sqlfluff to lint BigQuery queries.
Big Data Sept. 4, 2023Staying Up-to-Date with GCP: The Customizable Release Notes Solution - Stay informed about GCP release notes on your own schedule and for your preferred products with this simple deployment.
Big Data BigQuery GIS July 10, 2023Blueprints to BigQuery: A Deep Dive into Large-Scale Spatial Joins for Building Footprints - Improving data processing efficiency for Geo data in BigQuery.
Big Data BigQuery Storage July 10, 2023BigQuery Storage Billing Models - Can you save on your BigQuery Storage costs? Let’s see by exploring the different pricing models and how to use the information available.
Big Data BigQuery June 5, 2023BigQuery — Best Practices - An in-depth overview of BigQuery.
Big Data BigQuery May 15, 2023BigQuery Data Warehousekeeping: Nested, Repeated, Arrays, Structs… - Cookbook: how to organize data in your Data Warehouse.
Big Data BigQuery May 8, 2023BigQuery — keep fresh data while avoiding large-scale mutations - Avoid merge or join and use deduplication and clone in large dataset updates.
Big Data Dataplex May 1, 2023Data Profiling Using Dataplex - It’s your data, but the profiler knows it better. Let’s find out how.
Big Data BigQuery Data Science Python April 17, 2023Simplify Data Science Workflows on BigQuery with Fugue and Python - Speed Up Iteration and Cut Computation Cost.
Airflow Big Data Cloud Dataproc Cloud Storage March 13, 2023Event Driven Data Processing on Google Cloud Platform - An example of an event-driven data pipeline.
Big Data BigQuery Feb. 13, 2023How to Deal with Wildcard Tables in BigQuery - A couple of tricks to speed up your data warehousing.
Big Data BigQuery Billing Storage Feb. 6, 2023How BigQuery Physical Storage works - Calculating which BigQuery billing model for storage to use.
Big Data BigQuery Jan. 16, 2023BigQuery WINDOW Functions | Advanced Techniques for Data Professionals - A complete guide for maximizing the potential of BigQuery WINDOW functions to manipulate and transform data.
Big Data BigQuery Machine Learning Jan. 16, 2023Streamlining Machine Learning with BigQuery ML: A Comprehensive Overview - Unlocking the Power of Big Data with BigQuery ML: A Beginner’s Guide.
Big Data BigQuery Data Science Dec. 26, 2022How I use BigQuery Analytic Functions as a Data Scientist - Practical examples on how to use advanced SQL to do analyses in BigQuery.
Big Data BigQuery Dec. 19, 2022Deduplication in BigQuery Tables: A Comparative Study of 7 Approaches - Analyzing and comparing 7 ways of deduplicating rows in a BigQuery table.
Big Data BigQuery GCP Experience Aug. 29, 2022BigQuery resource management - A custom solution to monitor BigQuery.
Big Data BigQuery Aug. 22, 2022Google gives BigQuery some new UI Updates - How the new feature makes work easier for data scientists and engineers.
Big Data BigQuery Data Science July 11, 2022Awesome new Feature: Change History in Google BigQuery - Using the Append change history TVF in BigQuery.
Big Data Cloud Dataproc June 20, 2022Big Data Processing using Google Dataproc - Google Dataproc is a powerful option for running clusters for Hadoop and Spark applications.
Big Data Python June 13, 2022How to build a DAG based Task Scheduling tool for Multiprocessor systems using python - Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag.
Big Data BigQuery Data Science June 6, 2022A Senior’s Guide to Kickstart your BigQuery Journey - Missing basics you need to know when using BigQuery.
Big Data Cloud Dataproc June 6, 2022Tuning Spark Applications to Efficiently Utilize Dataproc Cluster - Have you recently migrated your Spark application from an on-prem YARN cluster to Dataproc? Then this blog post might help you tune your Spark applications to use GCP Dataproc efficiently and save costs.
Big Data BigQuery Cloud Functions GIS May 30, 2022BigQuery Remote Functions, Cloud Functions 2.0, and Plus Codes Revisited - Using a BigQuery remote Cloud Function to convert geo coordinates to Plus Codes.
Big Data BigQuery May 9, 2022Enhancing BigQuery SEARCH features with SEARCH INDEX - A faster way to find text in unstructured text and semi-structured JSON in BigQuery.
Big Data BigQuery Data Analytics Data Science April 18, 2022Google Data Cloud Summit 2022: Recap - An overview of the many new updates coming to Google Cloud Platform!
Big Data Official Blog April 18, 2022Hands-on learning lab: Stream Google Cloud data into Splunk Cloud - Google Cloud and Splunk’s hands-on lab takes you through core scenarios for data ingestion and data input in Google Cloud in 90 minutes or less.
Big Data Data Analytics Official Blog April 11, 2022Limitless Data. All Workloads. For Everyone - Read about the newest innovations in data cloud announced at Google Cloud’s Data Cloud Summit.
Big Data Data Analytics March 14, 2022Building a Data Lake on Google Cloud Platform - Big Data is gaining a lot of popularity. Here we explain how to build a big data pipeline on Google Cloud Platform using Open Source.
Big Data BigQuery Data Analytics Machine Learning March 7, 2022Predicting the Fare on a Billion Taxi Trips with BigQuery - How long does it take and how much does it cost to analyse and train a model on a billion taxi trips in the cloud?
Apache Beam Big Data Kotlin Feb. 28, 2022Error handling with Apache Beam, Asgarde with Kotlin - In a previous article, we presented a library that allows error handling with Apache Beam with less code.
Big Data Data Analytics Feb. 28, 2022Data Workflow Modernization - Drive transformational improvement in users’ workflows, not an incremental improvement in the tools you use.
Big Data BigQuery Feb. 7, 2022How to properly play Wordle using Dataflow and BigQuery. - This article will show you how to compute the best combination of words for Wordle using Dataflow and BigQuery.
Big Data Cloud Bigtable Feb. 7, 2022Easy CSV importing into Cloud Bigtable - Importing CSV data into Bigtable with the cbt tool.
Big Data BigQuery Monitoring Jan. 31, 2022Automated emails and data quality checks for your data - Formatting error messages in BigQuery email notifications.
Big Data Data Analytics GCP Experience Jan. 31, 2022Journey of Transforming and Architecting Data Platforms using Lambda Architecture - An outline of architecting Data Platforms using Lambda architecture on Google Cloud.
Big Data Data Analytics Jan. 10, 202210 reasons why you are not ready to adopt data mesh - The goal of this article is to encourage constructive conversations around Data Mesh adoption by describing where Data Mesh may not be the right solution.
Big Data Machine Learning Vertex AI Jan. 3, 2022How to set up custom Vertex AI pipelines step by step - MLOps using Vertex AI.
Big Data BigQuery NodeJS Dec. 13, 2021Retrieve your BigQuery query history with NodeJS SDK - Retrieving BigQuery history logs with the BigQuery NodeJS SDK to understand which queries account for most of the billing.
Beginner Big Data BigQuery Dec. 13, 2021Google BigQuery: An Introduction to Big Data Analytics Platform. - An overview of BigQuery.
Big Data Machine Learning Nov. 29, 2021From Zero to Hero with Databricks on Google Cloud - This article will walk you through the main steps to become efficient with Databricks on Google Cloud.
Big Data BigQuery Nov. 22, 2021How to extract real-time intraday data from Google Analytics 4 and Firebase in BigQuery - Bypassing automatic deletion of intraday tables to get real-time data from Firebase in BigQuery.
Big Data BigQuery Data Science Oct. 4, 2021Mathematical Functions you should know in BigQuery - How to Work with Numbers in BigQuery.
Big Data BigQuery Cloud Dataproc GCP Experience Sept. 27, 2021Comparing BigQuery Processing and Spark Dataproc - PayPal's approach to evaluating the migration of processes from on-prem to GCP.
Apache Beam Big Data Dataflow Aug. 16, 2021Entity Resolution using Google Cloud Dataflow - This article illustrates how a data platform was modernized by implementing an entity resolution pipeline using Cloud Dataflow.
Big Data BigQuery Aug. 2, 2021How to Sync data from MySQL to BigQuery - The purpose of this blog is to provide information on how data can be synced/replicated to BigQuery for data warehouse purposes.
Big Data Cloud Dataflow Cloud Pub/Sub July 5, 2021Building a simple Google Cloud Dataflow pipeline: PubSub to Google Cloud Storage - This article examines building a streaming pipeline with Dataflow templates to feed downstream systems.
Big Data BigQuery Data Science Machine Learning June 28, 2021Machine Learning with Google’s BigQuery - How to easily create and deploy ML Models with SQL.
Big Data BigQuery Data Science Public Datasets June 7, 2021Working with OpenStreetMap Data - Analyzing OpenStreetMap data in a BigQuery public dataset.
Big Data BigQuery June 7, 2021Reverse US Geocoding in BigQuery - How to convert GPS coordinates into cities, counties, states and even ZIP codes for free!
Big Data BigQuery GCP Experience May 24, 2021Learnings from Streaming 25 Billion Events to Google BigQuery - Experience of using BigQuery at PayPal.
Big Data BigQuery GCP Experience April 25, 2021Hadoop to BigQuery Migration — New Edition - Process of migrating data from Impala and Hadoop to BigQuery.
Big Data Cloud Dataproc Python April 12, 2021How to migrate your on-premise pyspark jobs to GCP using Dataproc Workflow Templates with Production-Grade Best Practices Standards - Complete pattern example of how to migrate (or create from scratch) pyspark jobs to GCP with Dataproc Workflow Templates.
Big Data BigQuery April 12, 2021How to build efficient and performant Data Structures in BigQuery - Ways of using Denormalization and Nested Data.
Big Data BigQuery GCP Experience Infrastructure April 5, 2021Real-Time data delivery at scale with BigQuery - Using BigQuery Authorized Views to cut storage and processing costs.
Big Data BigQuery March 29, 2021How to process large BigQuery tables/job result in a single memory machine with python - A Python library to load large amounts of data from BigQuery.
Big Data BigQuery Dataform March 22, 2021Saving money with BigQuery and Dataform - An easy way to reduce cost and increase performance in Data Warehouses — find out how to implement partitioning using Dataform!
Big Data BigQuery Data Science March 1, 2021BigQuery Hack: Flexible Queries For Any Number of Columns - How can we use BigQuery to handle tables with many columns? Here’s how, using scripting and table metadata.
Big Data BigQuery Feb. 22, 2021BigQuery repeated fields query optimization. - Optimization techniques for BigQuery queries when a table contains repeated fields.
Big Data BigQuery Feb. 15, 2021USING BigQuery’s LAST_VALUE() function to fill missing data - LAST_VALUE function explained.
Big Data BigQuery Tutorial Feb. 8, 2021A Simple Way to Query Table Metadata in Google BigQuery - Effortless approach to determine what is in the BigQuery dataset and which tables are useful for analysis with INFORMATION_SCHEMA and TABLES.
Big Data BigQuery Data Studio Firebase Feb. 8, 2021How to calculate Real Active Users. What are the numbers? - A complete SQL guide for marketers and machine learning engineers. MAU, DAU and WAU, Firebase and BigQuery example with Data Studio template.
Big Data BigQuery Feb. 1, 2021Generating Unique Keys In BigQuery - The Ideal Primary Key For Data Warehousing.
Big Data BigQuery Data Science Jan. 18, 2021BigQuery Hack: 1000x More Efficient Aggregation Using Materialized View - Learn how to supercharge your aggregation queries using Materialized View.
Big Data Cloud Dataflow Jupyter Notebook Jan. 18, 2021Computing Time Series metrics at scale in Google Cloud - This blog post shows how data scientists and engineers can use GCP Dataflow to compute time-series metrics in real-time or in batch to backfill data at scale, for example, to detect anomalies in market data or IoT devices.
AI Platform Notebooks Big Data Data Science GPU Jan. 18, 2021An Accelerated Big Data Workflow for the Data Analyst - Explore and analyze 1B loan records with RAPIDS & Nvidia A100 GPUs on Cloud AI Platform.
Big Data BigQuery Jan. 4, 2021BigTips: Removing Duplicates while Maintaining Row History - Do you have late arriving facts and have a need to maintain row history while removing duplicates in BigQuery? Come look here!
Big Data BigQuery GCP Experience Dec. 21, 2020Our way of dealing with more than 2 billion records in the SQL database - Improving performance on a big MySQL table with GCP products.
Big Data BigQuery Data Analytics Data Studio Public Datasets Dec. 21, 2020How to compute a growth rate in BigQuery using SQL - Analyzing Google Analytics public dataset with BigQuery to obtain various data.
Big Data BigQuery Dec. 14, 2020BigTips: INFORMATION_SCHEMA Views in BigQuery, Part 2, with extra Scripts and Procedures! - Making the INFORMATION_SCHEMA a little easier to use!
Big Data BigQuery Dec. 14, 2020BigTips: Random Numbers and Random Dates - Generating random numbers in a range, and random dates in BigQuery.
Big Data Cloud Dataproc Data Analytics Official Blog Dec. 7, 2020Best practices to use Apache Ranger on Dataproc - Run managed open source like Apache Hadoop and Spark in the cloud. Get tips on secure deployment with Dataproc and the Apache Ranger authorization OSS.
Big Data BigQuery Nov. 22, 2020How to de-duplicate rows in a BigQuery table - Duplicate data can sometimes cause wrong aggregates or wrong results in joins. You probably need to remove those duplicate rows before doing any…
Big Data BigQuery Nov. 16, 2020BigTips: INFORMATION_SCHEMA Views in BigQuery - Working with INFORMATION_SCHEMA views in BigQuery.
Big Data BigQuery Security Nov. 16, 2020BigQuery Authorised View verification workflow - Verify your Views in a BigQuery dataset, to make sure the Authorised Views are going to work without disrupting your ETL.
Big Data Data Analytics Docker Nov. 9, 2020A step-by-step guide deploying Amundsen on Google Cloud Platform - Amundsen is Lyft’s Data Discovery Platform and metadata engine. It helps the data team to be more productive by saving time spent in the discovery phase: less time searching, more time finding.
Apache Beam Big Data Cloud Dataflow Oct. 26, 2020Basic Streaming Data Enrichment on Google Cloud with Dataflow SQL - Learn the basics of Streaming and Batch Data Enrichment with Dataflow SQL.
Big Data Cloud Dataproc Data Analytics Official Blog Oct. 26, 2020Preparing for serverless big data open source software - Serverless capabilities at Google Cloud continue to develop, and serverless is now meeting open source as tools like Dataproc let you build on your open foundation in the cloud.
Big Data BigQuery Sept. 28, 2020Using BigQuery to Track and Estimate Home Heating Oil Deliveries - Using Google Sheets, BigQuery, and public datasets to calculate Degree Days and K-Factor.
API Big Data BigQuery Machine Learning Sept. 7, 2020How we enabled product and pricing-availability feeds as APIs for external partners - This post demonstrates how to package your training application when it needs to connect to an external (On-Prem / Multi-Cloud) database to fetch the required source dataset.
Big Data BigQuery Data Science Aug. 31, 2020Google Cloud for Genomics - Building a scalable, reproducible, and secure data processing pipeline on the cloud.
Big Data BigQuery Data Studio Firebase Aug. 17, 2020I stopped using Firebase Dashboards. I’ve built my own instead. - Displaying Firebase Crashlytics and Performance data in Data Studio.
Big Data BigQuery Infrastructure Terraform Aug. 17, 2020Data lake on GCP using Terraform - Using Terraform to set up infrastructure-as-code for a Data Lake on Google Cloud Platform.
Big Data BigQuery Billing Aug. 10, 2020Big Data in Google Cloud — Cost Monitoring (part II) - The article explains how to analyze Billing data in BigQuery in order to get insights about the most expensive queries, etc.
Big Data Cloud Data Fusion Tutorial Aug. 10, 2020Building some Data Pipeline with Google Data Fusion - Step-by-step tutorial on getting started with Data Fusion and creating pipelines.
Apache Beam Big Data Cloud Dataflow Cloud Pub/Sub Java July 20, 2020Performing Deduplication in Real Time streaming pipeline with Apache Beam stateful processing - An example of doing PubSub message content deduplication in Apache Beam running on Dataflow.
Big Data BigQuery Cloud Dataflow July 6, 2020Kafka to BigQuery using Dataflow - In this article, two different methods to connect Kafka to BigQuery using Dataflow are evaluated.
Big Data Cloud Storage July 3, 2020Migrating HDFS Data to Google Cloud Storage - Moving data from a Hadoop cluster to Cloud Storage with the Cloud Storage Connector.
Big Data Cloud Data Fusion June 29, 2020I’m your father… Data Lineage with Cloud Data Fusion - How to use data lineage with Cloud Data Fusion, the fully managed, cloud-native enterprise data integration service.
Big Data Cloud Dataproc June 22, 2020Sqoop Data Ingestion on GCP - Using Apache Sqoop (bulk data transfer) in Cloud Dataproc.
Big Data BigQuery GCP Experience June 15, 2020DNC Tech Choices: Why we chose BigQuery - Thoughts about migrating to BigQuery.
Big Data Cloud Dataprep Cloud Functions Serverless June 8, 2020How to Automate a Cloud Dataprep Pipeline When a File Arrives - With a better mastery of Cloud Functions, you can trigger a Dataprep job via API when a file lands in a Cloud Storage bucket.
Airflow Big Data BigQuery June 1, 2020Data Pipelines at PasarPolis using Airflow and BigQuery - Use Airflow for data orchestration on BigQuery to maintain a data warehouse.
AI Platform Notebooks Big Data Data Science Machine Learning June 1, 2020Hands-on Big Data Analysis on GCP Using AI Platform Notebooks - Example of working with AI Platform Notebooks.
Big Data BigQuery Cloud Dataproc Jupyter Notebook May 25, 2020Apache Spark BigQuery Connector — Optimization tips & example Jupyter Notebooks - Learn how to use the BigQuery Storage API with Apache Spark on Cloud Dataproc.
Big Data Data Catalog May 25, 2020Google Cloud Data Catalog — Keep Up With Your On-Prem Hive Server - Code samples with a practical approach on how to ingest metadata from an on-premise Hive server into Google Cloud Data Catalog.
Big Data Data Catalog Data Science May 18, 2020Google Cloud Data Catalog — Integrate Your On-Prem RDBMS Metadata - Code samples with a practical approach on how to ingest metadata from on-premise Relational Databases into Google Cloud Data Catalog.
Big Data BigQuery Cloud Dataproc May 18, 2020Import SQL Server data in BigQuery - A list of four approaches for one-off data dumps from an RDBMS like SQL Server to BigQuery.
Big Data Terraform May 11, 2020Query data in Google Cloud Storage with SQL using Apache Drill - Creating an Apache Drill cluster in GCP and querying data stored in GCS.
Big Data Compute Engine May 11, 2020Cloud-native Bioinformatics: HPC to GCP - Describing a process of migrating genomic analysis workflows on HPC to GCP.
Big Data Cloud Dataproc May 4, 2020Migrating Data Processing Hadoop Workloads to GCP - Intro to Dataproc as well as tips for best usage.
Beginner Big Data BigQuery April 27, 2020Introduction to Arrays in BigQuery - Tutorial on working with arrays in BigQuery.
Big Data Cloud Dataflow Data Analytics Official Blog April 20, 2020Introducing Dataflow template to stream data to Splunk - Learn how to set up a streaming pipeline for Google Cloud data into Splunk Cloud or Enterprise with this new Pub/Sub to Splunk Dataflow template.
Big Data BigQuery April 13, 2020BigQuery Materialized Views and Why You Should be Using Them - TL;DR BigQuery materialized views are great. You should use them!
Big Data BigQuery Data Analytics Python April 13, 2020Ibis: A Python Data Analysis Framework for Development and Production - An example of using Ibis (Python Data Analysis Productivity Framework) with BigQuery.
Big Data Cloud Storage Data Catalog March 28, 2020Google Cloud Data Catalog Filesets: unlock its full potential - Enrich your Google Cloud Storage Filesets with useful statistics about your files.
Big Data BigQuery March 23, 2020Using BigQuery Execution Plans to Improve Query Performance - Explanation of BigQuery's execution plan.
Big Data BigQuery Public Datasets March 16, 2020Processing 10TB of Wikipedia Page Views - Part 1 - Processing and uploading Wikipedia page views into BigQuery.
Big Data BigQuery Cloud Dataflow March 9, 2020Data ingestion Google Big Query without the headaches - Schema conversions on the fly without the headaches with Dataflow and BigQuery.
Big Data BigQuery GCP Experience Go March 9, 2020Loading and transforming data to BigQuery at large scale - Using serverless data loading to BigQuery to reduce daily costs from $8K to $15.
Big Data Cloud Bigtable Cloud Dataflow GCP Experience Feb. 24, 2020How Spotify ran the largest Google Dataflow job ever for Wrapped 2019 - Spotify used Cloud Bigtable with Cloud Dataflow to lower the cost of running one of its biggest jobs.
Big Data Business Feb. 24, 2020Snowflake announces general availability on Google Cloud - Snowflake is now available in the us-central1 (Iowa) and europe-west4 (Netherlands) regions with additional regions coming later this year.
Big Data BigQuery Data Loss Prevention API Data Studio Feb. 3, 2020BigQuery, PII and DLP: The Perfect Match - Analyzing PII data in BigQuery with Data Loss Prevention and viewing results in Data Studio.
Big Data BigQuery Jan. 13, 2020BigQuery Wildcards - The article describes how "*" wildcards can be used in BigQuery.
Big Data BigQuery Jan. 13, 2020Why We Picked Google BigQuery over Snowflake as Our New Data Warehouse Solution - Comparing BigQuery and Snowflake for Data Warehouse selection.
Big Data Cloud AutoML Kaggle Jan. 13, 2020AutoML and Big Data - Or how to use Google AutoML for 40+ GB datasets.
Big Data BigQuery Dec. 23, 2019Partition on any field with BigQuery - BigQuery has introduced an integer partitioning capability. You can now partition on a numeric field and, surprisingly, on more than that!
Big Data BigQuery Dec. 23, 2019BigQuery Integer Partitioning is in Beta - Demonstrating a new BigQuery integer partition feature on New York Taxi dataset.
Big Data Data Analytics Official Blog Dec. 23, 2019Opening doors, embracing change with cloud data warehouses - Cloud data warehouse migrations bring technology changes and new ways of working for data analysts and administrators. Change management is important.
Big Data BigQuery Dec. 16, 2019k-Means Clustering in BigQuery now does better initialization - The Scalable k-Means++ initialization option in BigQuery ML
Big Data BigQuery Cloud Dataflow Data Analytics Official Blog Dec. 16, 2019Using HLL++ to speed up count-distinct in massive datasets - There’s a better way to do the count distinct function using Google’s HyperLogLog++ algorithm in Dataflow and BigQuery.
Big Data GCP Experience Dec. 9, 2019Democratizing Dataproc — dunnhumby’s journey on Google Cloud Platform - Experience of using Cloud Dataproc on Google Cloud Platform.
Big Data BigQuery Serverless Dec. 2, 2019Write efficient queries on BigQuery - A few tips that improve the speed of queries in BigQuery.
Big Data Cloud Dataflow Dec. 2, 2019Trimming down the cost of running Google Cloud Dataflow at scale - Tips and tricks to lower the cost of running Dataflow pipelines
Big Data Business SAP Dec. 2, 2019Google Cloud makes moves to appeal to SAP and Oracle Users - A look at a recent development at Google Cloud.
Big Data Compute Engine Puppet Python Dec. 2, 2019New ground — Automatic increase of Kafka LVM on GCP - Adding more storage to each node of a Kafka cluster on Google Cloud.
Big Data BigQuery Python Nov. 25, 2019Simplify BigQuery ETL jobs using SQLAlchemy - Extract and move data between BigQuery and relational databases using a plugin for SQLAlchemy.
Big Data BigQuery Cloud Dataproc Nov. 25, 2019Querying External Data with BigQuery - Demonstration of BigQuery querying Parquet files from Google Cloud Storage.
Big Data BigQuery Data Science GCP Experience Nov. 18, 2019Batch Processing Pipelines for Better Data Analysis - An overview of how Gojek is using batch processing to generate useful insights from our data warehouse.
Big Data BigQuery Data Science Nov. 18, 2019BigQuery workflow from the Jupyter notebook - In this article, you will get to know how to create and schedule the BigQuery workflow using the Jupyter Lab and the Cloud Composer.
Apache Beam Big Data BigQuery Cloud Dataflow Nov. 4, 2019How to build a cleaning pipeline with BigQuery and DataFlow on GCP - Creating a small transformation pipeline on Dataflow to clean data in BigQuery.
Big Data BigQuery Data Science Nov. 4, 2019Let the kids into the library - An opinionated attempt at building a data driven company in the cloud.
Big Data BigQuery Nov. 4, 2019Return of the Living Data - A story about BigQuery and its underlying data formats.
Big Data BigQuery Data Science Python Oct. 28, 2019How to get into BigQuery analysis on Kaggle with Python? - Exploring ways to use BigQuery in Kaggle.
Big Data BigQuery Data Studio Oct. 28, 2019Unique dashboards for external customers with Google Cloud - Using BigQuery and Data Studio to create dashboards that are shared with different people.
Big Data Data Science Oct. 28, 2019A gentle introduction to Apache Druid in Google Cloud Platform - The article describes how to set up and use Apache Druid on GCP.
Big Data Oct. 28, 2019Deploying a Production Druid Cluster in Google Cloud Platform - The process of setting up an Apache Druid cluster on GCP.
Apache Beam Big Data Java Oct. 28, 2019Testing in Apache Beam Part 1: Batch - A look into how to write unit and end-to-end tests in Beam.
Big Data BigQuery Official Blog Oct. 21, 2019Migrating data warehouses to BigQuery: Introduction and overview - Solution series that helps you transition from an on-premises data warehouse to BigQuery.
Big Data BigQuery Official Blog Teradata Oct. 21, 2019Migrating Teradata to BigQuery - Solution series that helps you transition from a Teradata data warehouse to BigQuery.
Big Data IoT Oct. 14, 2019IoT Data Pipelines in GCP, multiple ways — Part 1 - A three-part series about IoT data pipelines in Google Cloud Platform.
Big Data BigQuery Oct. 14, 2019Plus Codes (Open Location Code) and Scripting in Google BigQuery - A closer look at why Plus Codes are important, and using Google BigQuery scripting to encode them!
Apache Beam Big Data BigQuery Oct. 6, 2019Type safe BigQuery in Apache Beam with Spotify’s Scio - Using Scio, a Scala library for Beam, for type-safe queries in BigQuery.
Big Data BigQuery Sept. 30, 2019BigQuery DeDuplication — Window Function vs Group by For Stitch - Comparing the performance of BigQuery deduplication using a window function vs. GROUP BY. If you are using Stitch, you can also delete in BigQuery.
Big Data Security Sept. 30, 2019Help secure the pipeline from your data lake to your data warehouse - This article discusses the security controls designed to help manage access to, and prevent data exfiltration from, the pipeline between data lake and data warehouse.
Big Data BigQuery Teradata Sept. 23, 2019Teradata to Google BigQuery Migration. Converting the code - This article provides instructions on how to extract the schemas of tables, views and SQL queries from Teradata and convert them for BigQuery.
Big Data BigQuery Sept. 16, 2019End-to-End Crypto Shredding (Part II): Data Deletion/Retention with Crypto Shredding - Crypto-deletion across various storage services in GCP.
Big Data BigQuery Sept. 16, 2019BigQuery Deduplication - Explore some techniques for deduplication in BigQuery both for the whole table and by partition.
Big Data BigQuery Sept. 16, 2019A Journey into BigQuery Fuzzy Matching — 3 of [1, ∞) — NYSIIS - Another article in the ongoing series about fuzzy matching in BigQuery.
Beginner Big Data BigQuery Sept. 9, 2019The Caveat of Loading Data to Partitioned Table on BigQuery - Table Partitioning and Why
Apache Beam Big Data BigQuery Cloud Dataflow Sept. 2, 2019Trimming down over 95% of your BigQuery costs using File Loads - Using BigQuery load jobs in Beam instead of streaming to reduce costs.
Big Data BigQuery Aug. 19, 2019A Journey into BigQuery Fuzzy Matching — 2 of [1, ∞) — More Soundex and Levenshtein Distance - Doing fuzzy matching in BigQuery on first and last names.
Big Data BigQuery Aug. 19, 2019Finding top programming language with BigQuery - Analyzing the GitHub public dataset with BigQuery to find the most popular programming languages based on number of repositories.
Big Data BigQuery Aug. 19, 2019Tips and Tricks to Seamlessly Migrate BigQuery Dataset Across Regions - Description of cross-regional BigQuery data migration.
Big Data BigQuery Data Analytics Official Blog Aug. 12, 2019Migrating Teradata and other data warehouses to BigQuery - Migration framework and architecture for moving a data warehouse, such as Teradata, to Google Cloud BigQuery.
Big Data BigQuery Aug. 5, 2019Efficient Aggregation, Roll-ups with BigQuery HyperLogLog++ functions - Description of incremental count distinct processing using BigQuery’s HyperLogLog++ functions and how they provide fast, scalable, incremental processing properties.
Big Data Data Analytics Machine Learning July 29, 2019Beginners Introduction to Data Lifecycle on Google Cloud Platform - Description of 4 categories of data lifecycle on GCP.
Big Data BigQuery Data Science Java July 15, 2019Beast: Moving Data from Kafka to BigQuery - GOJEK’s open source solution for moving data from Kafka to Google BigQuery.
Big Data Data Analytics Data Catalog Data Science July 8, 2019Google Cloud Data Catalog hands-on guide: templates & tags with Python - This quickstart guide brings a practitioner approach to Data Catalog, covering Templates & Tags management using the Python client library.
Big Data BigQuery July 8, 2019BigQuery for Big Data and AI - A brief intro to start working with BigQuery.
Big Data Data Analytics July 1, 2019Data and Analytics on Google Cloud Platform - Overview of data and analytics services available on Google Cloud Platform.
Big Data BigQuery June 24, 2019Optimising queries in BigQuery for Beginners - Learn what BigQuery contains under the hood and how to run efficient queries from a public session dataset in this step-by-step guide.
Big Data BigQuery Data Analytics GCP Experience June 17, 2019A Song of Data and Fire: Building Bnext Wall (Data Lake) - Process of building data lake on Google Cloud Platform.
Big Data BigQuery Official Blog June 17, 2019Building hybrid blockchain/cloud applications with Ethereum and Google Cloud - This post describes applications for making internet-hosted data available inside an immutable public blockchain by making BigQuery data available on-chain using a Chainlink oracle smart contract.
Big Data Official Blog Storage June 10, 2019Announcing Snowflake on Google Cloud Platform - Snowflake (cloud-based data warehouse) will be available on GCP.
Big Data BigQuery Tutorial June 3, 2019How to easy understand Analytics Functions on BigQuery - An in-depth explanation of analytical BigQuery functions.
Big Data BigQuery June 3, 2019Loading Terabytes of Data From Postgres Into BigQuery - The article describes approaches to exporting data from PostgreSQL and loading it into BigQuery.
Big Data BigQuery Cloud Dataprep Machine Learning June 3, 2019BigQuery GIS + ML on government open data - Analyzing & visualizing housing data using BigQuery.
Big Data Cloud Data Fusion Kubernetes June 3, 2019Journey Continues — Onward and Upwards! - A brief overview of things that are going on around CDAP (Data Fusion).
Apache Beam Big Data Cloud Dataflow Cloud Pub/Sub Machine Learning May 27, 2019Game of Thrones Twitter Sentiment with Keras, Apache Beam, BigQuery and PubSub - End to end solution to analyze Tweets using GCP products.
Big Data Cloud Data Fusion May 27, 2019Building a Data Lake on Google Cloud Platform with CDAP - Using CDAP (Cask Data Application Platform) on GCP.
Big Data Official Blog May 27, 2019Delivering end-to-end data analytics and data management solutions with Informatica - We’re extending our strategic partnership with Informatica to help more enterprises take advantage of hybrid and multi-cloud data management solutions.
Big Data Official Blog Storage May 6, 2019Principles and best practices for data governance in the cloud - A white paper that outlines best practices and guidelines for organizations to establish data governance in a cloud-first world.
Big Data Cloud Data Fusion April 29, 2019Google Data Fusion - Cloud Data Fusion is the brand-new fully-managed data engineering product from GCP. It will help users to efficiently build and manage…
Big Data Docker Tutorial April 15, 2019Deploy Spark on Google Cloud, (Docker+Swarm) - Deploying a Spark cluster on Google Cloud using Docker containers and Docker Compose.
Big Data BigQuery Cloud Dataflow April 15, 2019From data ingestion to insight prediction: Google Cloud smart analytics accelerates your business transformation - A more detailed look at the Cloud Next '19 news related to analytics products.
Big Data BigQuery GCP Experience April 1, 2019Reflections On Designing An Enterprise Data Warehouse - Description of the process of developing a data warehouse on Google Cloud using BigQuery.
Big Data BigQuery Official Blog March 25, 2019Analyzing 3024 rice genomes characterized by DeepVariant - Exploring a rice genome dataset using BigQuery.
Big Data Python March 11, 2019Enlightened DataLab Notebooks - Starting with Data Science on GCP.
Big Data BigQuery Cloud Marketplace R March 4, 2019RStudio and BigQuery in under 30 minutes - The article describes the steps to provision an RStudio instance on Google Compute Engine and use it to do complex analytics on BigQuery.
Big Data March 4, 2019What is Google Snappy? High-speed data compression and decompression - Pros and cons of using Snappy (data compression library from Google) for compression.
Big Data BigQuery Cloud Composer GCP Experience March 4, 2019How did we build a Data Warehouse in six months? - Sharing the experience of creating a data warehouse on Google Cloud Platform.
Apache Beam Big Data Cloud Dataflow Official Blog Feb. 25, 2019Real-time diagnostics from nanopore DNA sequencers on Google Cloud - A scalable, reliable, and cost-effective end-to-end pipeline for fast DNA sequence analysis built on Google Cloud and this new class of nanopore DNA sequencers.
Big Data Cloud Security Command Center Security Feb. 25, 2019Google Cloud Platform Security Operations Center Data Lake - Some thoughts regarding security when building a data lake on Google Cloud Platform.
Big Data Google Cloud Platform Official Blog Jan. 28, 2019Google is named a leader in the 2019 Gartner Magic Quadrant for Data Management Solutions for Analytics - Gartner named Google a Leader in the 2019 Gartner Magic Quadrant for Data Management Solutions for Analytics (DMSA).
Big Data Compute Engine Jan. 7, 2019Deploying PySpark ML Model on Google Compute Engine as a REST API - Step-by-step tutorial on deploying a PySpark ML model on Google Compute Engine.
Big Data Nov. 26, 2018How to capture and store tweets in Real Time with Apache Spark and Apache Kafka. Using cloud Platforms such as Databricks and GCP (Part 1) - Capture and store tweets in real time with Apache Spark and Apache Kafka.
Big Data Cloud Dataflow Cloud Datalab Python Serverless June 18, 2018Analyzing Reddit’s Top Posts & Images With Google Cloud (Part 1) - Analyzing everything from Reddit.
Big Data Business May 21, 2018Cask is joining Google Cloud - Cask is the company behind CDAP, an open source big data integration platform.
Big Data Cloud Datalab Cloud Pub/Sub Cloud Storage May 21, 2018Data Science for Startups: Data Pipelines - Example of creating a data pipeline on Google Cloud Platform.
Apache Beam Big Data May 14, 2018GCP Podcast - #126 Beam and Spark with Holden Karau
Big Data March 26, 2018Public datasets: how nonprofits can drive social impact with planetary-scale data - Public datasets are freely hosted and accessible via Google BigQuery and Cloud Storage.
Big Data Business March 26, 2018Room to Grow on the Big Data Maturity Curve - Report on Big Data ecosystems.
Big Data Business Official Blog March 19, 2018Solutions : Build a Marketing Data Warehouse on Google Cloud Platform - Using a fictional online cosmetics retailer as an example of how to leverage Google Cloud products to get key insights.
Big Data Official Blog March 5, 2018How to handle mutating JSON schemas in a streaming pipeline, with Square Enix - Explore how Square Enix supports handling of mutating JSON schemas in a streaming pipeline.
Big Data Machine Learning TensorFlow Nov. 20, 2017Automating ML and IoT with cloud-based image rendering, training, and device delivery - Architectural solutions for 3D rendering and machine learning.
Big Data Teradata Nov. 20, 2017Transitioning from Data Warehousing in Teradata to GCP Big Data - The article describes how you can transition from on-premises and cloud data warehousing to Google Cloud Platform.
Big Data Sept. 11, 2017Plumbing Big Data Pipelines - Qubit (which provides personalization for companies communicating with customers) describes its experience with different Google Cloud Platform products.
Big Data Cloud Dataproc Aug. 20, 2017Easier integration with Apache Spark and Hadoop via Google Cloud Dataproc Job IDs and Labels - Best practices for using Job IDs and labels.
Big Data Machine Learning July 31, 2017New hands-on labs for scientific data processing on Google Cloud Platform - 7 new labs to try out Google Cloud Platform Big Data and Machine Learning products to solve real-world scientific problems using a variety of public datasets.
Big Data July 24, 2017Moving Thumbtack’s data infrastructure to Google Cloud Platform - Moving data from PostgreSQL and MongoDB to Google Cloud Dataproc and BigQuery
Big Data Cloud Bigtable July 3, 2017How Qubit deduplicates streaming data at scale with Google Cloud Platform - How Qubit solved the problem of duplicated streaming data using Google Cloud Platform products.
Big Data July 3, 2017GCP Podcast - #83 Public Datasets with Mike Hamberg and Will Curran
Big Data Cloud Dataflow July 3, 2017Introducing Cloud Dataflow Shuffle: For up to 5x performance improvement in data analytic pipelines
Big Data BigQuery June 26, 2017The Google Data WareCity - Interesting and unique aspects of BigQuery’s data sharing capability
Big Data BigQuery June 26, 2017GCE BigQuery vs AWS Redshift vs AWS Athena - Basic comparison on data loading and simple queries between Google BigQuery and Amazon Redshift and its cousin Athena.
Big Data Cloud Dataflow June 19, 2017Visualization and large-scale processing of historical weather radar (NEXRAD Level II) data - Processing historical weather data for visualization with Cloud Dataflow
Big Data Business May 8, 2017That giant sucking sound? Hadoop moving into the cloud - Companies are starting to move their Hadoop environments to Google Cloud Platform because of its simplicity, stability, and maturity.
Big Data BigQuery April 10, 2017BI Performance Benchmarks with Google BigQuery
Big Data Cloud Dataflow March 27, 2017Google Cloud Dataflow In the Smart Home Data Pipeline - Handling data from Nest devices via Google Cloud Dataflow
Big Data March 13, 2017Visualizing Big Data with Google Cloud
Big Data BigQuery PubSub March 6, 2017Combining Thomson Reuters data with Google BigQuery and Google Cloud Pub/Sub API - Proof of concept for analyzing data ingested from the Reuters API with BigQuery.
Big Data March 6, 2017Data Science on the Google Cloud Platform: the first book - Interview with Valliappa Lakshmanan, author of the upcoming book Data Science on Google Cloud Platform.
Big Data Building a Data Lake on GCP with CDAP - First look at Google-acquired Cask’s open source platform.