The talk will focus on the various runtime challenges we experienced while dealing with Big Data, and the scalable solution we built using Amazon cloud services such as EMR, Redshift, and Amazon S3.

The talk will introduce the audience to Amazon Elastic MapReduce (EMR), a fully managed, hosted Hadoop framework running on top of Amazon Elastic Compute Cloud (EC2), and Amazon Redshift, a fast, fully managed, cost-effective, petabyte-scale data warehouse service. It will cover how to handle data processed with Hadoop when the output is so large that it becomes a bottleneck for traditional relational databases like MySQL and Oracle, and how we solved this problem using Amazon Redshift. In addition, we will discuss the cost-effectiveness of Amazon EMR and Redshift when dealing with Big Data ranging from a few hundred gigabytes to a petabyte in size.
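To give a flavour of how EMR output lands in Redshift, here is a minimal sketch of loading processed files from S3 with Redshift's COPY command; the table name, S3 path, and IAM role ARN are hypothetical placeholders, not our actual setup:

```sql
-- Load tab-separated, gzipped EMR output from S3 into a Redshift table.
-- Table, bucket path, and role ARN below are illustrative placeholders.
COPY impressions_daily
FROM 's3://example-bucket/emr-output/2014-05-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
DELIMITER '\t'
GZIP;
```

COPY loads data in parallel across the cluster's nodes, which is what makes bulk ingestion of EMR output practical at this scale.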

Our AdServer clients generate 5 TB of user logs daily (logs of requests, impressions, clicks, etc.). We process these logs using EMR and store the processed output in an Amazon Redshift cluster. Our Redshift cluster currently holds around 10 TB of processed data, which is available for various end-user reports.
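As an illustration of the kind of per-event aggregation such an EMR job performs, here is a minimal Hadoop Streaming-style mapper sketch in Python; the tab-separated log layout and field positions are assumptions for illustration, not the actual AdServer schema:

```python
import sys


def map_events(lines):
    """Emit (event_type, 1) pairs for each tab-separated log line.

    Assumed (hypothetical) line format: timestamp<TAB>event_type<TAB>user_id.
    """
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines
        pairs.append((fields[1], 1))  # e.g. ("impression", 1)
    return pairs


if __name__ == "__main__":
    # Under Hadoop Streaming, the mapper reads log lines from stdin and
    # writes key<TAB>value pairs to stdout for the reducers to sum.
    for key, value in map_events(sys.stdin):
        print(f"{key}\t{value}")
```

A reducer would then sum the counts per event type, and the aggregated output would be written to S3 for loading into Redshift.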


Sandesh Deshmane is an Application Architect at Talentica Software. For the past three years, he has been solving Big Data problems for a mobile advertising client that generates terabytes of data. He has worked with technologies such as Java, J2EE, Hadoop, and Hive for the past seven years. He is always fascinated by new technologies and emerging trends in software development.

Big Data Analytics using Amazon Elastic MapReduce and Amazon Redshift