Tag: Streaming

Cloud Dataproc Data Analytics Official Blog Streaming Nov. 18, 2024

Dataproc Serverless: Now faster, easier and smarter - Dataproc Serverless now offers faster performance with native query execution in the Premium tier, improving query performance by ~47% in tests. It also introduces a built-in Spark UI for seamless monitoring and troubleshooting, eliminating the need for setting up and maintaining persistent history servers.

BigQuery Data Analytics Official Blog Streaming Oct. 14, 2024

BigQuery tables for Apache Iceberg: optimized storage for the open lakehouse - BigQuery tables for Apache Iceberg, a fully managed, Apache Iceberg-compatible storage engine from BigQuery, offer optimized storage for the open lakehouse. It provides features like autonomous storage optimizations, clustering, and high-throughput streaming ingestion.

BigQuery Data Analytics Official Blog Streaming Oct. 14, 2024

Using BigQuery Omni to reduce log ingestion and analysis costs in a multi-cloud environment - BigQuery Omni helps reduce the cost of log analytics in multi-cloud environments by eliminating the need for Apache Spark workloads and providing a unified querying process across cloud providers. It offers reduced engineering and compute resources, as well as lower egress costs.

Apache Flink Data Analytics Official Blog Streaming Oct. 14, 2024

Real-time data for real-world AI with support for Apache Flink in BigQuery - BigQuery Engine for Apache Flink, now in preview, provides a serverless real-time intelligence platform. It allows users to easily migrate existing streaming applications relying on Apache Flink to Google Cloud without code rewriting or third-party services.

Cloud Dataflow Data Analytics Official Blog Streaming Oct. 7, 2024

Mastering Dataflow: 5 In-Depth Guides to Real-World Applications - Google Cloud's Dataflow offers a range of solutions for real-time data processing. These include machine learning and generative AI, ETL and integration, log replication and analytics, marketing intelligence, and clickstream analytics. Each solution guide provides an overview, detailed sketch, and link to a comprehensive guide with code samples and best practices. With Dataflow's scalability, flexibility, and reliability, developers can build real-time solutions efficiently.

Billing Cloud Dataflow Data Analytics Official Blog Streaming Sept. 9, 2024

Cut costs and boost efficiency with Dataflow's new custom source reads - Dataflow's new custom source reads feature helps cut costs and boost efficiency in streaming environments by better distributing workloads and proactively relieving overwhelmed workers with load balancing.

Airflow Cloud Composer Data Analytics Official Blog Streaming Aug. 26, 2024

Scalable alerting for Apache Airflow to improve data orchestration reliability and performance - This guide reviews the hierarchy of alerting on Cloud Composer and various alerting options available to Google Cloud engineers using Cloud Composer and Apache Airflow.

Cloud Dataflow Data Analytics GCP Experience Official Blog Streaming Aug. 19, 2024

Yahoo compares Dataflow vs. self-managed Apache Flink for two streaming use-cases - Yahoo compared the cost and performance of self-managed Apache Flink and Google Cloud Dataflow for two streaming use cases: writing Avro to Parquet, and data enrichment and calculation. Dataflow was found to be around 1.5 to 2 times more cost-effective than Flink, primarily due to the Streaming Engine's ability to handle heavy computations, resulting in fewer required vCPUs and more consistent throughput.

Data Analytics Official Blog Streaming Aug. 19, 2024

Try the new Managed Service for Apache Kafka and take cluster management off your todo list - Google Cloud has launched a new Managed Service for Apache Kafka, which simplifies the process of running an Apache Kafka cluster. The service takes care of infrastructure management, security, networking, and scaling, allowing users to focus on building and running their applications. It offers built-in security features, automated network design, and flexible sizing options.

Airflow Cloud Composer Data Analytics Official Blog Streaming Aug. 12, 2024

Announcing Apache Airflow operators for Google generative AI - Apache Airflow now has operators to interact with Vertex AI's generative models. These operators enable the integration of Vertex AI's generative models into data pipelines orchestrated by Apache Airflow and Cloud Composer.

BigQuery Data Analytics Official Blog Streaming Aug. 12, 2024

Real-time in no time: Introducing BigQuery continuous queries for up-to-the-minute insights - BigQuery continuous queries, now available in preview, enable real-time data analysis and event-driven processing using SQL. They simplify real-time pipelines, unlock AI use cases, streamline reverse ETL, and provide scalability and performance. With BigQuery continuous queries, businesses can gain real-time insights, make informed decisions, and deliver exceptional customer experiences.

Cloud Load Balancing Infrastructure Networking Streaming Aug. 5, 2024

Load Balancing Blitz — data pipeline - This blog post explores a near real-time data pipeline to gather metrics for a demo game called Load Balancing Blitz. Pub/Sub, BigQuery, and Looker were used to ingest, process, and visualize data in real-time.

Airflow Cloud Composer Data Analytics Official Blog Streaming July 29, 2024

Understanding Airflow DAG and task concurrency on Google Cloud Composer - Airflow DAG and task concurrency are crucial for optimizing Cloud Composer performance. This guide provides comprehensive insights into concurrency settings across four levels: Composer environment, Airflow installation, DAG, and task. By understanding these settings, you can ensure efficient resource utilization, scalability, and fault tolerance in your data pipelines.

Data Analytics Databases Datastream Official Blog Streaming July 29, 2024

Datastream’s SQL Server source is generally available - Datastream, a serverless change data capture (CDC) and replication service, now supports SQL Server as a source for replicating data to BigQuery, Cloud Storage, and other Google Cloud destinations. Key enhancements include change tables CDC, stream recovery, gcloud API and Terraform support, and server-side SSL/TLS encryption.

Analytics Hub Cloud Pub/Sub Data Analytics Official Blog Streaming July 15, 2024

Share your streaming data with Pub/Sub topics in Analytics Hub - Analytics Hub now supports sharing Pub/Sub topics, enabling organizations to curate, share, and monetize their streaming data assets. By leveraging Analytics Hub Exchanges and Listings, businesses can logically categorize and group sets of Pub/Sub topics and provision access at scale.

Cloud Dataproc Data Analytics Official Blog Streaming July 15, 2024

Deployment patterns for Dataproc Metastore on Google Cloud - This blog post explores four DPMS deployment patterns: a single centralized multi-regional DPMS, centralized metadata federation with per-domain DPMS, decentralized metadata federation with per-domain DPMS, and ephemeral metadata federation. Each pattern has its own advantages and disadvantages, and the best choice for an organization will depend on its specific needs and requirements.

Data Analytics Datastream Official Blog Streaming July 8, 2024

Announcing new stream recovery capabilities for Datastream - Datastream stream recovery enables quick resumption of data replication with minimal to no data loss in scenarios like database failovers or network outages.

Data Analytics Datastream Official Blog Streaming June 24, 2024

Simplify historical data tracking in BigQuery with Datastream's append-only CDC - Datastream's append-only mode simplifies change data capture by preserving every change as a new row in your target BigQuery table. It offers cost efficiency, improved data accuracy, and real-time insights. With append-only mode, businesses can maintain a historical record of changes, track data modifications, and gain deeper insights from their data.
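To make the difference between append-only and merge-style replication concrete, here is a minimal conceptual sketch in Python. It is not Datastream's implementation; the `ChangeEvent` type and both helper functions are hypothetical names invented for illustration. It only shows that append-only keeps every change as its own row, while a merge keeps just the latest state per key.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """A hypothetical change record: one row-level change from the source."""
    key: int
    op: str                # "INSERT", "UPDATE", or "DELETE"
    value: Optional[str]   # None for deletes


def apply_merge(events: List[ChangeEvent]) -> Dict[int, str]:
    """Merge-style target: only the current state of each key survives."""
    table: Dict[int, str] = {}
    for ev in events:
        if ev.op == "DELETE":
            table.pop(ev.key, None)
        else:
            table[ev.key] = ev.value or ""
    return table


def apply_append_only(events: List[ChangeEvent]) -> List[ChangeEvent]:
    """Append-only target: every change is kept as a new row, in order."""
    return list(events)


events = [
    ChangeEvent(1, "INSERT", "a"),
    ChangeEvent(1, "UPDATE", "b"),
    ChangeEvent(1, "DELETE", None),
    ChangeEvent(2, "INSERT", "c"),
]

merged = apply_merge(events)          # history of key 1 is gone
history = apply_append_only(events)   # all four changes are retained
```

In the merge result the full life cycle of key 1 is lost, whereas the append-only result still holds the insert, the update, and the delete, which is what makes historical tracking possible.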

Cloud Dataflow Data Analytics Official Blog Streaming June 10, 2024

Boost developer productivity with new pipeline validation capabilities in Dataflow - Dataflow pipeline validation is now generally available. It performs dozens of checks to ensure that your batch or streaming job is error-free and can run successfully.

BigQuery Cloud Dataflow Official Blog Streaming June 3, 2024

Accelerating CDC insights with Dataflow and BigQuery - This post covers how to use BigQuery’s new CDC capability in Dataflow along with the new Dataflow at-least-once streaming mode to simplify your CDC pipeline and reduce costs.

AWS Cloud Pub/Sub Data Analytics Official Blog Streaming June 3, 2024

Easily stream data from AWS Kinesis to Google Cloud with Pub/Sub import topics - Pub/Sub import topics enable streaming ingestion into BigQuery from external sources, with the first supported external source being Amazon Kinesis Data Streams. Import topics provide a simplified way to ingest data from Amazon Kinesis Data Streams directly into Pub/Sub, reducing the complexity of setting up data pipelines between clouds. Once the connection is established, Amazon Kinesis producers can be gradually migrated to Pub/Sub publishers. Data from Amazon Kinesis Data Streams can be routed to BigQuery using BigQuery subscriptions, and Pub/Sub autoscales to adapt to changes in the Amazon Kinesis data stream.

Cloud Dataflow Data Analytics Official Blog Streaming May 27, 2024

More flexibility for your Dataflow jobs with new controls for latency versus cost - Dataflow Streaming Engine users can now choose between lower peak latency or lower streaming costs for their workloads by adjusting the autoscaling utilization hint value. The autoscaling hint value can be set to a higher or lower value using a Dataflow service option. Dataflow’s autoscaling UI provides insights on when it’s worth adjusting the autoscaling behavior and additional dashboards and metrics to monitor the impact of changes.
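As a rough sketch of how the hint might be passed, the helper below builds the pipeline flag for the `worker_utilization_hint` service option. The function name is hypothetical, and the accepted range of 0.1 to 0.9 is an assumption based on the Dataflow documentation at the time of writing, so check the current docs before relying on it.

```python
def utilization_hint_flag(hint: float) -> str:
    """Build the Dataflow service-option flag for the autoscaling hint.

    A lower hint asks autoscaling to keep workers less busy (more workers,
    lower peak latency); a higher hint favors fewer workers and lower cost.
    The 0.1 to 0.9 bounds are an assumption, not taken from the post.
    """
    if not 0.1 <= hint <= 0.9:
        raise ValueError(f"hint must be between 0.1 and 0.9, got {hint}")
    return f"--dataflow_service_options=worker_utilization_hint={hint}"


# Example: favor lower latency over cost for a latency-sensitive job.
flag = utilization_hint_flag(0.5)
```

The resulting string can be appended to the arguments used to launch a streaming job that runs on Streaming Engine.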

Data Analytics Official Blog Streaming May 27, 2024

Google Data Cloud innovations for continuous real-time intelligence - Google Cloud offers innovations for continuous real-time intelligence, enabling organizations to harness real-time analytics and make informed decisions. With Dataflow, BigQuery, and Apache Kafka for BigQuery, enterprises can leverage streaming infrastructure for visibility, predictions, and activation. Customers like Spotify, Puma, Compass, and Tyson Foods have achieved significant business impact using Google Cloud's data, AI, and real-time solutions.

Cloud Dataflow Official Blog Streaming May 20, 2024

No work items left unturned: How Dataflow mitigates stragglers

 

Contact

Zdenko Hrček
Třebanická 183
Prague, Czech Republic
Phone: +420 777 283 075