- What is Data Science?
- Statistics
- Central tendency
- Variability
- Hypothesis testing
- ANOVA
- Correlation
- Regression
- Probability
- Joint probabilities
- Bayes' theorem
- Mathematics
- Linear Algebra
- Calculus
- Integral transforms
- Vector algebra
- Vector calculus
- Matrices and vector spaces
- Information theory
- Databases
- Database Types and Concepts
- SQL vs NoSQL
- Data Modeling and Database Design
- Data Cleaning and Transformation
- Database Optimization and Performance
- Data Integrity and Quality Checks
- Integration with Tools and Languages
- Types of Data
- Structured Data
- Unstructured Data
- Semi-structured Data
- Data Manipulation & Analysis
- Data Extraction
- Data Extraction vs Data Mining
- Role of Extract, Transform, Load (ETL)
- Data Wrangling / Data Cleaning / Data Munging
- Data Visualisation
- Tools
- Tableau
- Google Charts
- Dundas BI
- Power BI
- Jupyter
- Infogram
- ChartBlocks
- D3.js
- FusionCharts
- Grafana
- Data Modeling
- Exploratory Data Analysis (EDA)
- Big Data
- Overview
- Engineering with Hadoop
- Overview
- Ecosystem
- HDFS Architecture
- MapReduce
- YARN
- Hive
- HBase
- Pig
- Engineering with Spark
- Introduction to Spark
- Working with RDDs in Spark
- Aggregating Data with Pair RDDs
- Writing and Deploying Spark Applications
- Parallel Processing
- Spark RDD Persistence
- Spark MLlib
- Integrating Apache Flume and Apache Kafka
- Spark Streaming
- Improving Spark Performance
- Spark SQL and DataFrames
- Scheduling/Partitioning in Spark
- Data Processing & Analysis
- Stream vs Batch Processing
- Apache Flink
- Apache Storm
- Distributed Storage Systems
- HDFS
- Data Warehousing
- Amazon Redshift
- Data Lakes
- Delta Lake
- Data Science with R
- Overview
- R packages
- Sorting data frames
- Matrices and vectors
- Reading data from external files
- Generating plots
- Analysis of Variance (ANOVA)
- K-means clustering
- Association rule mining
- Regression in R
- Analyzing relationships with regression
- Advanced regression
- Logistic Regression
- Advanced Logistic Regression
- Receiver Operating Characteristic (ROC)
- Kolmogorov-Smirnov chart
- Database connectivity with R
- Integrating R with Hadoop
- Data Science with Python
- Overview
- Python packages
- Pandas
- Introduction
- Creating Objects
- Viewing Data
- Selection
- Manipulating Data
- Grouping Data
- Merging, Joining and Concatenating
- Working with Date and Time
- Working with Text Data
- Working with CSV and Excel files
- Operations
- Visualization
- NumPy
- Introduction
- ndarray
- Datatypes
- Arrays
- Matplotlib
- Introduction
- Seaborn
- Introduction
- Scikit-learn
- Introduction
- Statsmodels
- Introduction
- SciPy
- Introduction
- TensorFlow
- Introduction
- PyTorch
- Introduction
- Keras
- Introduction
- NLTK (Natural Language Toolkit)
- Introduction
- Miscellaneous
- Data Engineer vs Data Analyst vs Data Scientist vs Machine Learning Engineer
- Data Science Resources
DATA SCIENCE