Data Science and Machine Learning

160 Hours

Through this module, you will build expertise in data management, visualization, statistics, machine learning, neural networks, AI, deep learning, Big Data with Hadoop, and structured and unstructured data analysis. Proficiency in programming and some experience coding in Python and R will be an added advantage.


Data Science is the future of Artificial Intelligence. Machine learning is the science of getting computers to act without being explicitly programmed. This course gives you in-depth knowledge of programming fundamentals using the Python and R languages, Data Structures, Statistics, OOPS, threading and socket programming, DBMS, Linux, AWS, Data Science, Data Analytics, Data Visualization, Matplotlib, Seaborn, NumPy, Pandas, Scikit-learn, Tableau, ELK, MapReduce, HDFS, WebHDFS, YARN, HBase, Cassandra, MongoDB, Deep Learning, Linear and Logistic Regression, Supervised Learning, Unsupervised Learning, Flume, Kafka, Sqoop, Hive, PySpark, API integration, and automation using Oozie and ZooKeeper. Python and R are among the most popular languages for Data Science and Machine Learning, and for Big Data solutions we will cover Hadoop and cloud computing (AWS, GCP, and Azure).

Why take Training in Data Science?

Machine learning and data science are taking over the world, and with that comes an increasing need among companies for professionals who know the ins and outs of machine learning. You can build your career as a Python developer, Big Data Hadoop developer, machine learning engineer, analytics manager, business analyst, or information architect.
  • A Glassdoor report puts the average salary of a data scientist at $118,709.
  • Randstad reports that pay hikes in the analytics industry are 50% higher than in the IT industry.
  • The machine learning market size is estimated to grow from USD 1.02 billion in 2016 to USD 8.82 billion by 2022, at a Compound Annual Growth Rate of 44.1% during the forecast period.


  • Data Types
    • Numbers, Strings, List
  • Operators
    • Arithmetic Operators
    • Comparison (Relational) Operators
    • Assignment Operator
    • Logical Operators
    • Bitwise Operators
    • Membership Operators
    • Identity Operators
  • Conditional Statement
    • if-else, nested if-else
  • Loops
    • While loop
    • for loop
  • Functions
    • Built-in Functions, User-Defined Functions
    • Recursion
    • Closures and Decorators
    • Generators
  • File Handling
    • Basic File Handling Tasks such as reading, writing and appending data into files
    • Advanced File Handling using Serializers and Deserializers
    • JSON, XML, YAML file handling
    • Using File Handling for Personal Database Management System
  • OOPS
    • Paradigm of Object Oriented Programming
    • Encapsulation, Abstraction and Data Hiding
    • Inheritance
    • Polymorphism and Overriding
    • Objects and classes
    • Metaclasses
    • Abstract Classes
    • Slots
  • Exception Handling
    • Built-in Exception meaning, detection and handling
    • Raising Custom Exceptions
    • Creating Custom Exception Classes for Advanced Exception Handling
  • Brief Tour of the Standard Library
    • os and sys modules
    • shutil, glob modules
    • regular expressions using re module
    • math and random modules
    • statistics module
    • urllib and requests modules for Internet Access
    • smtplib module for mails
    • datetime, time modules for time series data
    • zlib module for data compression
    • timeit module for performance Measurement
    • sqlite3 for small database management
  • Debugging
    • doctest module
    • unittest module
    • pdb debugger
  • Threading and Socket Programming
  • Scripting for system Automation tasks
  • Algorithms
    • Searching and Sorting Algorithms
    • Complexity of Algorithms
    • Introduction to Advanced Algorithms
  • Flow Charts
    • Flow charts using UML diagrams
  • Data Structure
    • Linked list, stack, queue, heap, trees and graphs
    • Python Dictionaries and lists as data structure
    • Building Custom Data Structure using Python Classes, List and Dictionary
    • Sets, tuples and frozen sets in Python
    • Array, Matrix and Data Frames
  • Installation
    • R Base Software
    • Exploring CRAN
    • RStudio IDE for the R language
    • Connecting R kernel to Jupyter notebook
    • Setting up Environment Variables for R
  • Data Structures
    • Scalars
    • Vectors
    • Matrix
    • Array
    • Lists
    • Data Frames
    • Tables
  • Operators
    • Arithmetic Operators
    • Comparison (Relational) Operators
    • Assignment Operator
    • Logical Operators
  • Conditional Statements
    • if-else, nested if-else
    • switch statement in R
  • Control Statements
    • while loop
    • for loop
    • repeat loop
  • Functions in R
    • Built-in Functions
    • User Defined Functions
    • Recursion
  • Brief History of Databases
    • Introduction to Database Management System
    • SQL and NoSQL databases
    • Pros and cons of RDBMS
    • Pros and cons of ODBMS
  • Installation of DBMS server
    • Installation on Windows
    • Installation on Linux
    • Creating Databases
    • User & Permissions
  • Structured Query Language
    • DDL statements
    • DML statements
    • DCL statements
    • Creating, Updating and Altering Tables
    • Aggregation Functions
    • WHERE, IN, LIKE, LIMIT, ORDER BY, AS, BETWEEN clauses
    • Multiple table Queries
    • Nested Queries
  • SQL Joins
    • Inner Joins
    • Outer Joins
    • Left Outer Join
    • Right Outer Join
  • Normalization in Databases
    • First normal form
    • Second normal form
    • Third normal form
  • Views in Database
    • Create views
    • Update Views
    • Drop Views
    • Alter Views
  • UDF and Triggers
    • Creating and Using User-Defined Functions
    • Creating Triggers
  • Advanced DBMS
    • File Organisation and Database Indexes
    • Data Warehousing and Mining
    • Database Optimization
    • Database Connectivity with Python and R language
    • Backup and Restore
  • All about Web
    • Introduction to Web and Http Requests
    • MIME types and Headers
  • Web Scraping
    • Scraping data using the requests and urllib modules of Python
    • Advanced Scraping using the BeautifulSoup Library of Python
    • Data Scraping using APIs and making your own APIs
    • Creating REST APIs using JSON
  • Introduction to Web Designing
    • Introduction to HTML, CSS and Bootstrap
    • Introduction to Apache server and CGI scripting
    • Backend Development using Flask web Framework of Python
    • Introduction To Django Web Framework of Python
  • Descriptive Statistics
  • Data Collection Techniques
    • Primary Data Collection
    • Secondary Data Collection
  • Data Classification Techniques
    • Geographical Classification
    • Chronological Classification
    • Qualitative Classification
    • Quantitative Classification
  • Central Tendency
    • Discrete and Continuous Data
    • Number of Classes, Class Intervals, Mid-value, Range
    • Frequency Table
    • Less than and Greater than Cumulative Frequency table
  • Measures of Central Tendency (Mathematics of Statistics)
    • Mathematical Average
      • Arithmetic mean
      • Geometric mean
      • Harmonic mean
    • Average of Position
      • Median
      • Quartile
      • Decile
      • Percentile
      • Mode
      • Variance
      • Standard Deviation
  • Selection of Averages
  • Data Wrangling in R
    • Stats in R
    • Summary commands
    • cbind, rbind, merge, subset, sort, order, group
    • reshape2 package - melt, dcast
    • tidyr package - gather, spread, unite, separate
    • sqldf package - Database management using R, inner join, left join, right join, select, other sql queries
    • dplyr - select, filter, mutate, arrange, group_by, summarise, bind, bind_cols, bind_rows, intersect, union, setdiff, setequal, left_join, inner_join
    • transform , apply, cut
  • Data Wrangling in Python
    • numpy arrays
    • pandas Module for data wrangling
    • scipy module for scientific calculations of central tendency
    • statistics module for stats
    • sympy module for symbolic representation
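
As a rough illustration of the measures of central tendency listed above, the following sketch uses Python's standard statistics module (Python 3.8 or later). The sample values and the summarize helper are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical sample, chosen only for illustration
marks = [2, 4, 4, 5, 10]

def summarize(data):
    """Return common central-tendency and dispersion measures for data."""
    return {
        "arithmetic_mean": statistics.mean(data),
        "geometric_mean": statistics.geometric_mean(data),
        "harmonic_mean": statistics.harmonic_mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "quartiles": statistics.quantiles(data, n=4),  # three cut points
        "variance": statistics.variance(data),         # sample variance
        "std_dev": statistics.stdev(data),             # sample standard deviation
    }

summary = summarize(marks)
for name, value in summary.items():
    print(name, value)
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean. This is one reason the choice of average matters.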



  • Data Visualization in R
    • Scatter plot
    • Bar Plot
    • Histograms
    • Box Plot
    • Stack Plot
    • ggplot library for beautiful and meaningful graphs
    • saving plots
  • Data Visualization in Python
    • Scatter, Bar, Histogram and Box Plot
    • matplotlib module
    • seaborn module
    • opencv module
  • Supervised Machine Learning
    • Linear Regression
    • Polynomial Regression
    • Decision Trees
    • SVM
  • Unsupervised Machine Learning
    • Clustering
    • Anomaly Detection
    • Neural Networks
  • Introduction to Linux
    • History, Installation, a Word about Open Source
    • Accessing the Command line
    • Managing Files from Command Line
    • Basic Commands of Linux
  • Users and Permissions in Linux
    • Creating users and groups in Linux
    • Sudoers and super privileges
    • File Permissions in Linux
    • Changing Ownership and Permissions of files
  • Processes, Daemons and Logs in Linux
  • Networking in Linux
    • IPv4 and IPv6, gateway, subnets
    • DNS
    • DHCP
  • Client-Server Architecture
    • Remote access using SSH, PuTTY
    • Deploying FTP server
    • Deploying Database Server and accessing in client machine
    • Deployment of Apache Server
  • Scripting in Linux
    • Basics of Shell Scripting
    • If-else, loops and regular expressions
    • System Automation using Shell Scripting
    • Automation using Python + Shell Scripts using CGI
  • Introduction to AWS services
    • EC2 instance creation
    • Lambda service
    • EMR service
    • Beanstalk service
  • Introduction to GCP and Azure
  • Introduction to Big Data
    • History of Big Data
    • Introduction to Hadoop Architecture
  • Installation of Hadoop
    • Single node cluster Installation
    • Multi-node cluster Installation
    • Working on Cloudera Distributed Hadoop (CDH)
  • Hadoop Distributed File System in Detail (HDFS)
  • Resource management using Yet Another Resource Negotiator (YARN)
  • MapReduce framework of Hadoop
    • Job Tracker
    • Task Tracker
    • Input, Input split
    • Mapping
    • Shuffling and sorting
    • Combiner and Reducer
  • Data Ingestion Tools
    • Sqoop
    • Flume
    • Kafka
  • Data Warehousing Tools
    • Hive
    • Cassandra
    • HBase
    • Kafka
  • Data Processing Engines
    • MapReduce
    • Hive
    • Pig
    • Tez & Impala
  • Work Flow and Job Scheduler
    • Oozie
    • ZooKeeper
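
The MapReduce phases listed above (input splits, mapping, shuffling and sorting, combining and reducing) can be sketched in plain Python as a word count. This is a single-process illustration only, not Hadoop's API. The function names and the sample input are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def combine(pairs):
    """Combiner: pre-aggregate counts locally, within one mapper's output."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def shuffle_and_sort(mapped_outputs):
    """Group values by key across all mappers and order the keys."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return sorted(grouped.items())

def reduce_phase(grouped):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped}

def word_count(splits):
    """Run map + combine per split, then shuffle/sort and reduce."""
    mapped = [combine(map_phase(split)) for split in splits]
    return reduce_phase(shuffle_and_sort(mapped))

# Hypothetical input, already divided into two splits
splits = ["big data big ideas", "data tools for big data"]
counts = word_count(splits)
print(counts)
```

In a real cluster each split would be processed by a separate mapper task and the shuffle would move data across the network. Here every phase runs in one process so the data flow is easy to follow.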


  • Spark Core
  • Spark SQL
  • PySpark
  • Machine Learning on Spark
  • Spark on Databricks (Azure)
  • Tableau: making data processing and analysis easy
  • Microsoft Power BI: Business Intelligence tool
  • Movie Recommendation System: This is a very interesting project where we have a large dataset of movies and their reviews. Using big-data technologies and the data wrangling tools of R and Python, we will build a Movie Recommendation System that suggests movies to watch on the basis of their content, reviews, and genre.
  • H-1B visa analysis for job assistance: In this project, we will analyze a dataset of 30 lakh (3 million) H-1B visa petitions filed since 2011. First, we will wrangle the data into a form that lets us identify the top 15 hiring companies and the top companies by salary, and analyze petitions filed, certified, withdrawn, and denied year by year. We will also look into the data to find the top 15 worksites people have applied for in the past, as well as the top 15 job titles for which the most petitions were filed.
  • Fraud Detection: Fraud detection can be applied to emails, text messages, transactions, or the spoken word. Knowing that an email is fake or a transaction is shady requires more than human intelligence. This application has potential uses in many domains and is an extremely important part of any service.
  • Market Basket Analysis: You are a data scientist (or becoming one!), and you get a client who runs a retail store. Your client gives you data for all transactions, consisting of the items bought in the store by several customers over a period of time, and asks you to use that data to help boost their business. Your client will use your findings not only to change, update, or add items in inventory but also to change the layout of the physical store or an online store. To find results that will help your client, you will use Market Basket Analysis (MBA), which applies Association Rule Mining to the given transaction data.
  • Case Study 1: Google
    Google constantly develops new products and services that rely on big data algorithms. Google uses big data to refine its core search and ad-serving algorithms. Google describes the self-driving car as a big data application.
  • Case Study 2: LinkedIn
    LinkedIn is a business-oriented social networking service. Founded in December 2002 and launched in 2003, it is mainly used for professional networking. LinkedIn uses big data to develop product offerings such as people you may know, jobs you may be interested in, who has viewed my profile and more.
  • Case Study 3: trivago
    trivago is well known for its global hotel search platform. It offers customers approximately 1.3 million hotels in over 190 countries. The platform is accessed globally via 55 localized websites and apps in 33 languages. Fast response to customers is vital, and trivago thrives on analyzing and extracting performance insights from digital experience data collected globally from its sites and systems in real time.
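
To make the Market Basket Analysis project described above more concrete, here is a minimal sketch of the support, confidence, and lift measures that association rule mining relies on. The transactions, item names, and the pair_rules helper are made up for illustration. A real project would typically use a dedicated implementation of an algorithm such as Apriori rather than this brute-force pair count.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # skip pairs that are bought together too rarely
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            lift = confidence / support({rhs}, transactions)
            rules.append((lhs, rhs, pair_support, confidence, lift))
    return rules

rules = pair_rules(transactions)
for lhs, rhs, sup, conf, lift in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than chance would predict. That is the kind of finding a store owner could use when deciding on shelf layout or bundled offers.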

Course Features

Training certificates, an internship letter, and Red Hat participation certificates are provided.
Daily support and one-to-one support from experienced and certified trainers.
Students will get placement assistance after the summer training is completed.
Interview preparation with mock interview sessions and HR round skills will be complimentary for students.
Digital notes, assignments, soft copies, and PDFs will be provided.
Exam preparation for the respective global certification in the course is included in the training.
A weekly test series will be conducted to foster a competitive environment.
Seminars by professionals and industry experts will be conducted to explain how work is done in a company and the industry-standard tools in use.


Our Summer Internship modules are designed in such a way that you don’t need any specific prior knowledge. Whatever knowledge is required will be delivered during the training itself. Only your enthusiasm and your will to learn are required!
Students (BCA, MCA, B.Tech, M.Tech, MSc-IT, etc.) who want to build a career in any IT field, need to complete a mandatory internship or training prescribed by their university, or simply want to learn and make good use of their summer or lockdown time can attend.
All the mentors are certified industry experts with vast experience in implementing real-time solutions across a range of topics. They will share their personal industry experience with you during the sessions.
Our online training follows the same pattern as our classroom training. Whether it is the curriculum, the way of teaching, the practical exposure, or the assignments and projects given to students, we keep the same teaching pattern in both formats.
Yes, we aim to give individual attention and assistance to every student. You can feel free to ask questions and to request extra time, doubt-solving sessions, and help with projects.
You just need a laptop or desktop and a reliable internet connection with adequate speed, so that the online sessions run without disturbances or technical glitches that would interrupt the flow of learning.
Candidates need not worry about missing a training session. They can make up missed sessions with a mentor in extra time. We also have a technical team to assist candidates with any query.
Before registering, you can attend one FREE WEBINAR to decide whether or not to join the course. After that, if you have enrolled or registered in classes and/or paid fees but want to cancel the registration for any reason, you can do so within 72 hours of initial registration. Please note that refunds will be processed within 30 days of the request.
Yes, we provide placement assistance with our training courses. You will get help with job references for your particular technology and IT stream. If you are an undergraduate, our placement team can also assist you after you graduate.

You can enroll in this program by following the application process described here:

Depending upon the area of interest, a candidate can opt for a course.

We have limited seats; you can make the payment through the payment link sent to your registered email.

You will receive an email describing the whole registration process.

We accept Cash, Card, Paytm, Google Pay, and other payment options.

You can also pay your fees in installments.

Reach out to 9001997178 / 9772165018 if you cannot make an online payment or have any query.


1 Year Diploma Program

Absolutely FREE & 100% JOB GUARANTEE

Get training on Linux, Ansible, DevOps, Python, Networking, AWS and OpenStack Cloud by Certified Trainers at GRRAS. You will get the best training along with interview preparation in this course module.