Course Duration in Hours: 30

Weekly Topics
Week 1, 01/27/2017: Basic Statistics and R. We will cover basic statistical concepts with a brief review of R, a language widely used by statisticians. This course will use Python much more than R, but we want to acknowledge the importance of R.
Week 2, 02/03/2017: Relationships and Representations, Graph Databases. We will use the Neo4j graph database to represent relationships among objects in the IT space and among concepts and words in spoken languages. Discovering and representing various types of relationships is essential to solving many of our data analysis problems. Neo4j and similar technologies make complex problems much easier to understand.
Week 3, 02/10/2017: Introduction to Spark 2.0. Spark has largely displaced Hadoop MapReduce as the dominant mainstream framework for processing large data volumes on large computational clusters. Initially, we will learn how to formulate our calculations so that they can process big data in batch mode. We will also discuss how to set up Spark clusters.
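To make "formulating a calculation for batch mode" concrete, here is a minimal pure-Python sketch of the map/shuffle/reduce shape that Spark parallelizes across a cluster. It runs locally with no Spark installation; the function name `word_count` is hypothetical and the code only imitates the pattern, not Spark itself.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Count words using the map -> reduce-by-key pattern Spark parallelizes."""
    # "flatMap" step: split every line into lowercase words
    words = chain.from_iterable(line.lower().split() for line in lines)
    # "map" step: emit a (key, 1) pair for every word
    pairs = ((word, 1) for word in words)
    # "reduceByKey" step: sum the values that share the same key
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
```

In PySpark the same computation is usually written as a chain of RDD transformations, `sc.parallelize(lines).flatMap(lambda l: l.lower().split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. Spark can then run each step in parallel on partitions of the data.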
Week 4, 02/17/2017: Language Processing with Spark 2.0. Processing large volumes of text is a very important step in many business analysis applications. We will learn how to combine natural language processing tools with the computational efficiency of Spark 2.0. In this lecture we will also introduce a NoSQL database for fast storage and retrieval of large volumes of text.
Week 5, 02/24/2017: Analysis of Streaming Data with Spark 2.0. While many applications benefit handsomely from batch processing of large data volumes, some applications must process large amounts of data practically in real time. Spark provides its Streaming API as a powerful tool for such scenarios. In this lecture we will introduce a messaging system, Kafka, which commonly serves as a buffer between the actual data sources and the Spark processing engine.
Week 6, 03/03/2017: Applications of the Spark ML Library. Spark comes with a Machine Learning (ML) API that allows us to perform many routine ML tasks at Spark speed. We will learn how to select the use cases or scenarios in which the Spark ML library is the most appropriate tool.
Week 7, 03/10/2017: Basic Neural Networks and TensorFlow. Neural networks and deep learning are emerging as the highest-precision tools for many large-scale classification and pattern recognition problems. We will learn how to use TensorFlow on both GPU and CPU machines.
03/17/2017: Spring Break
Week 8, 03/24/2017: Advanced TensorFlow. We will analyze some more complex neural network configurations and learn how to integrate NN engines into practical systems for large-scale analysis. In particular, we will learn how to integrate NNs with fast NoSQL storage systems such as MongoDB and Cassandra.
Week 9, 03/31/2017: Assessing the Quality of Big Data Analysis. We will learn standard procedures for assessing the quality of ML algorithms. We will also learn how to assess the precision of other large-scale calculations.
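As one example of a standard quality procedure, the sketch below computes precision, recall and F1 for a binary classifier from its true and predicted labels. It assumes a binary task with a single "positive" label. The course may well use other metrics (accuracy, ROC AUC, cross-validation), and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    # true positives: predicted positive and actually positive
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # false positives: predicted positive but actually negative
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # false negatives: predicted negative but actually positive
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In the usage example there are two true positives, one false positive and one false negative, so precision, recall and F1 all equal 2/3.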
Week 10, 04/07/2017: Analysis of Images, OCR Applications. Image analysis and pattern recognition are part and parcel of many applications. We will learn how to use some standard APIs to perform such analysis at big data speed.
Week 11, 04/14/2017: Analysis of Speech Signals. Many intelligent devices can now speak back to us. We will learn how to build large-scale systems that can process speech in real time.
Week 12, 04/21/2017: Question Answering Systems. Such systems are the true test of our ability to build intelligent machines. We will learn how to build them.
Week 13, 04/28/2017: PageRank-like Search Systems. Searching through large volumes of text at very high speed is what made Google.com possible. We will learn how such systems are built and analyze possibilities for searching through large volumes of audio and video data.
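The ranking idea behind PageRank-like systems can be sketched as a power iteration over the link graph. This is the textbook formulation with a damping factor of 0.85, the commonly quoted default, and it is not necessarily the variant the lecture will present. The function name `pagerank` and its parameters are hypothetical. Pages with no outgoing links spread their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    # collect every page, including pages that only appear as link targets
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # every page receives the "random jump" share (1 - d) / n
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # split this page's rank equally among its outgoing links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # dangling page: distribute its rank over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the ranks have converged
            break
    return rank


if __name__ == "__main__":
    print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))
```

In the usage example page `a` receives links from both `b` and `c`, so it ends up with the highest rank. Page `c` has no incoming links and keeps only the random-jump share. The ranks always sum to 1.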
Week 14, 05/05/2017: Analysis of Streaming Data with TensorFlow, VoltDB, Data Flow Engines and In-Memory Databases. We need to understand the comparative advantages of different technologies for processing fast-moving data.
Week 15, 05/12/2017: Final Project Presentations
Engineering, BE, B-TECH
Empower IT, Ambattur (Chennai), Chennai, IN