By Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira
Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through the architectural considerations necessary to tie those components together into a complete, tailored application based on your particular use case.
To reinforce those lessons, the book’s second section provides detailed examples of architectures used in some of the most commonly found Hadoop applications. Whether you’re designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process.
This book covers:
- Factors to consider when using Hadoop to store and model data
- Best practices for moving data in and out of the system
- Data processing frameworks, together with MapReduce, Spark, and Hive
- Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics
- Giraph, GraphX, and other tools for large graph processing on Hadoop
- Using workflow orchestration and scheduling tools such as Apache Oozie
- Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
- Architecture examples for clickstream analysis, fraud detection, and data warehousing