Training Course

Data Engineering on Google Cloud Platform

  • 4 days
  • Paid

*This course is currently in preparation. We will announce the details as soon as a date is confirmed.

Training Overview

Data Engineering on Google Cloud Platform is a four-day training course on designing data processing and analysis systems with GCP's big data services. You will learn the characteristics of BigQuery, Cloud Bigtable, Cloud Dataproc, Cloud Dataflow, and Cloud ML Engine, which service fits which use case, and how to combine them.

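Target Audience

Through a combination of presentations, demos, and labs, participants learn how to design data processing systems, build end-to-end data pipelines, analyze data, and apply machine learning. The course covers structured, unstructured, and streaming data.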

Prerequisites

  • Knowledge equivalent to Google Cloud Platform Fundamentals: Big Data and Machine Learning
  • Basic knowledge of SQL
  • Basic knowledge of Python
  • Knowledge of machine learning or statistics

What to Bring

  • A PC with the latest version of Chrome installed
  • A GCP account with a billing account added

Course Content

Day 1: Serverless Data Analysis

  • Module 1: Serverless data analysis with BigQuery
    • What is BigQuery.
    • Queries and Functions.
    • Lab: Writing queries in BigQuery.
    • Loading data into BigQuery.
    • Exporting data from BigQuery.
    • Lab: Loading and exporting data.
    • Nested and repeated fields.
    • Querying multiple tables.
    • Lab: Complex queries.
    • Performance and pricing.
  • Module 2: Serverless, autoscaling data pipelines with Dataflow
    • The Beam programming model.
    • Data pipelines in Beam Python.
    • Data pipelines in Beam Java.
    • Lab: Writing a Dataflow pipeline.
    • Scalable Big Data processing using Beam.
    • Lab: MapReduce in Dataflow.
    • Incorporating additional data.
    • Lab: Side inputs.
    • Handling stream data.
    • GCP Reference architecture.

Day 2: Leveraging unstructured data

  • Module 3: Google Cloud Dataproc Overview
    • Creating and managing clusters.
    • Leveraging custom machine types and preemptible worker nodes.
    • Scaling and deleting Clusters.
    • Lab: Creating Hadoop Clusters with Google Cloud Dataproc.
  • Module 4: Running Dataproc Jobs
    • Running Pig and Hive jobs.
    • Separation of storage and compute.
    • Lab: Running Hadoop and Spark Jobs with Dataproc.
    • Lab: Submit and monitor jobs.
  • Module 5: Integrating Dataproc with Google Cloud Platform
    • Customize cluster with initialization actions.
    • BigQuery Support.
    • Lab: Leveraging Google Cloud Platform Services.
  • Module 6: Making Sense of Unstructured Data with Google’s Machine Learning APIs
    • Google’s Machine Learning APIs.
    • Common ML Use Cases.
    • Invoking ML APIs.
    • Lab: Adding Machine Learning Capabilities to Big Data Analysis.

Day 3: Serverless Machine Learning

  • Module 7: Getting started with Machine Learning
    • What is machine learning (ML).
    • Effective ML: concepts, types.
    • ML datasets: generalization.
    • Lab: Explore and create ML datasets.
  • Module 8: Building ML models with TensorFlow
    • Getting started with TensorFlow.
    • Lab: Using tf.learn.
    • TensorFlow graphs and loops + lab.
    • Lab: Using low-level TensorFlow + early stopping.
    • Monitoring ML training.
    • Lab: Charts and graphs of TensorFlow training.
  • Module 9: Scaling ML models with Cloud ML Engine
    • Why Cloud ML Engine?
    • Packaging up a TensorFlow model.
    • End-to-end training.
    • Lab: Run a ML model locally and on cloud.
  • Module 10: Feature Engineering
    • Creating good features.
    • Transforming inputs.
    • Synthetic features.
    • Preprocessing with Cloud ML Engine.
    • Lab: Feature engineering.

Day 4: Resilient streaming systems

  • Module 11: Architecture of streaming analytics pipelines
    • Stream data processing: Challenges.
    • Handling variable data volumes.
    • Dealing with unordered/late data.
    • Lab: Designing streaming pipeline.
  • Module 12: Ingesting Variable Volumes
    • What is Cloud Pub/Sub?
    • How it works: Topics and Subscriptions.
    • Lab: Simulator.
  • Module 13: Implementing streaming pipelines
    • Challenges in stream processing.
    • Handle late data: watermarks, triggers, accumulation.
    • Lab: Stream data processing pipeline for live traffic data.
  • Module 14: Streaming analytics and dashboards
    • Streaming analytics: from data to decisions.
    • Querying streaming data with BigQuery.
    • What is Google Data Studio?
    • Lab: build a real-time dashboard to visualize processed data.
  • Module 15: High throughput and low-latency with Bigtable
    • What is Cloud Spanner?
    • Designing Bigtable schema.
    • Ingesting into Bigtable.
    • Lab: streaming into Bigtable.

Schedule

Currently being arranged.

TEL. 03-5840-8815 (weekdays, 10:00-19:00)