The talk will focus on the runtime challenges we experienced while dealing with big data and on the scalable solution we built using Amazon cloud services such as EMR, Redshift, and Amazon S3.
The talk will introduce the audience to Amazon Elastic MapReduce (EMR), a fully managed, hosted Hadoop framework running on top of Amazon Elastic Compute Cloud (EC2), and to Amazon Redshift, a fast, fully managed, cost-effective, petabyte-scale data warehouse service. It will cover how to handle data processed with Hadoop when the processed output is so large that it becomes a bottleneck for traditional relational databases such as MySQL and Oracle. We will analyse the solution to this problem using Amazon Redshift. In addition, we will discuss the cost-effectiveness of Amazon EMR and Redshift when dealing with data sets ranging from a few hundred gigabytes to a petabyte.
Our AdServer clients generate 5 TB of user logs daily (logs of requests, impressions, clicks, etc.). We process these logs using EMR and store the processed output in an Amazon Redshift cluster. Our Redshift cluster currently holds around 10 TB of processed data, which is available for various end-user reports.