Spark and Hadoop Developer Training (CCA)
Course Introduction
This four-day course covers the fundamentals of Apache Spark and how it integrates with the broader Hadoop ecosystem. You will review HDFS basics, learn to ingest data with Sqoop and Flume, process distributed data with Spark, model data in Impala and Hive, and apply best practices for data storage. You will learn:
• How data is distributed, stored, and processed in a Hadoop cluster
• How to use Sqoop and Flume to ingest data
• How to process distributed data with Apache Spark
• How to model structured data as tables in Impala and Hive
• How to choose the best data storage format for different data usage patterns
• Best practices for data storage
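As a flavor of the "process distributed data" topic above, the sketch below shows, in plain Python, the map/shuffle/reduce word-count pattern that Spark's RDD API (`map`, `reduceByKey`) generalizes across a cluster. The input lines and all names are illustrative; no Spark installation is assumed.

```python
from collections import defaultdict

# Illustrative input; on a real cluster this would be an HDFS file
# loaded as a distributed dataset.
lines = ["spark processes data", "spark scales data processing"]

# Map phase: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle + reduce phase: group pairs by key and sum the counts,
# which is what Spark's reduceByKey does per partition and across nodes.
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))
```

The same computation in Spark is a two-line chain over an RDD, which is why the course presents its hands-on exercises in Scala and Python.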
Course Objectives
• Upon completing the course, you will be prepared to take the CCA Spark and Hadoop Developer certification exam. The certification validates core developer skills in writing and maintaining Apache Spark and Apache Hadoop projects.
Intended Audience
• This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.