What is Data Analysis?
Data analysis is the process of turning raw data into insights. It involves defining the problem, identifying the data sources needed, gathering and cleaning the data, exploring it to identify patterns and trends, and presenting the findings to stakeholders clearly and concisely.
How do you clean the data?
Some of the important techniques that help to clean the data are listed below.
- Remove duplicates
- Handle missing values
- Standardize data formats
- Correct typos and misspellings
- Remove irrelevant or noisy data
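The cleaning steps above can be sketched with pandas. This is a minimal illustration, not a general-purpose pipeline: the table, its column names (`name`, `city`, `signup`, `notes`), the typo map, and the `clean` helper are all hypothetical and chosen only to exercise each step.

```python
import pandas as pd

# Hypothetical raw data, made up for illustration only.
raw = pd.DataFrame({
    "name": ["Alice", "alice ", "Bob", "Carol", "Dave"],
    "city": ["New York", "new york", "Bostn", "Boston", None],
    "signup": ["2023-01-05", "2023-01-05", "2023-02-10", "2023-03-15", "2023-04-01"],
    "notes": ["n/a", "n/a", "n/a", "n/a", "n/a"],
})

def clean(df):
    """Apply the cleaning steps from the list above to a copy of df."""
    out = df.copy()
    # Use standard formats: trim whitespace, normalize case, parse dates.
    out["name"] = out["name"].str.strip().str.title()
    out["city"] = out["city"].str.strip().str.title()
    out["signup"] = pd.to_datetime(out["signup"], format="%Y-%m-%d")
    # Correct known typos (the mapping is an assumption for this example).
    out["city"] = out["city"].replace({"Bostn": "Boston"})
    # Handle missing values with an explicit placeholder.
    out["city"] = out["city"].fillna("Unknown")
    # Remove an irrelevant column that carries no information.
    out = out.drop(columns=["notes"])
    # Remove duplicates last, so rows that differed only in formatting collapse.
    out = out.drop_duplicates().reset_index(drop=True)
    return out

cleaned = clean(raw)
print(cleaned)
```

Duplicates are removed after the formats are standardized because "Alice" and "alice " only become identical once whitespace and case are normalized.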
Languages Used in Data Analysis
Many languages can be used for data analysis. Some of the most popular are listed below.
- SQL: Used for querying databases and cleaning data
- Python: Used for data manipulation
- R: Used for statistical analysis and visualization
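As a small illustration of SQL being used for both querying and cleaning, the sketch below runs a query from Python through the standard-library `sqlite3` module. The `orders` table, its columns, and its rows are hypothetical, and treating a missing amount as 0 is an assumption made for this example only.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " alice", 20.0), (2, "Alice ", 35.0), (3, "BOB", None), (4, "bob", 15.0)],
)

# TRIM and LOWER standardize the customer names, COALESCE replaces a
# missing amount with 0, and GROUP BY aggregates per cleaned name.
query = """
    SELECT LOWER(TRIM(customer))    AS customer_name,
           COUNT(*)                 AS n_orders,
           SUM(COALESCE(amount, 0)) AS total
    FROM orders
    GROUP BY LOWER(TRIM(customer))
    ORDER BY customer_name
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer_name, n_orders, total in rows:
    print(customer_name, n_orders, total)
```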
Visualization Tools Used in Data Analysis
Below are some of the visualization tools used in data analysis.
- Tableau
- Power BI
- Seaborn (Python library)
- Matplotlib (Python library)
- Excel
Techniques for Missing or Incomplete Data
Several approaches can be used to deal with missing data. Some of the important ones are listed below.
- Delete the observations with missing data
- Impute the missing values with the mean or median, or estimate them by interpolation
- Use machine learning algorithms to predict the missing values
- Ask the data source to resend the data
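The first two approaches above can be sketched with pandas. The `temps` series is hypothetical data made up for illustration. Model-based prediction and re-requesting the data are not shown.

```python
import pandas as pd

# Hypothetical readings with two gaps (None becomes NaN).
temps = pd.Series([20.0, None, 24.0, None, 30.0], name="temp_c")

# Deletion: drop the observations that have no value.
dropped = temps.dropna()

# Imputation: fill every gap with one summary statistic of the observed values.
mean_filled = temps.fillna(temps.mean())
median_filled = temps.fillna(temps.median())

# Interpolation: estimate each gap from its neighbours along a straight line.
interpolated = temps.interpolate(method="linear")

print(pd.DataFrame({
    "original": temps,
    "mean": mean_filled,
    "median": median_filled,
    "interpolated": interpolated,
}))
```

Mean and median imputation give every gap the same value, whereas interpolation uses the neighbouring observations, which tends to suit ordered data such as time series better.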
Statistical Models Used in Data Analysis
Some of the common statistical models and related techniques used in data analysis are listed below.
- Linear Regression
- Logistic Regression
- Decision Trees
- Random Forest
- Clustering
- Cross-validation techniques (used to evaluate models)
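To show how two items from this list fit together, the sketch below fits a linear regression and evaluates it with k-fold cross-validation, using only NumPy. The data is synthetic (a known line plus noise), and the `k_fold_mse` helper, the fold count, and the seeds are assumptions made for this example. In practice a library such as scikit-learn would typically be used instead.

```python
import numpy as np

# Synthetic data for illustration: y = 3x + 2 plus small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

def k_fold_mse(x, y, k=5, seed=0):
    """Return the mean squared error of a straight-line fit on each of k folds."""
    # Shuffle the indices once, then split them into k roughly equal folds.
    indices = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only (polyfit returns slope, then intercept).
        slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)
        # Score on the held-out fold.
        pred = slope * x[test_idx] + intercept
        errors.append(float(np.mean((y[test_idx] - pred) ** 2)))
    return errors

# Fit on all the data, then estimate out-of-sample error by cross-validation.
slope, intercept = np.polyfit(x, y, deg=1)
fold_errors = k_fold_mse(x, y)

print("slope:", slope, "intercept:", intercept)
print("per-fold MSE:", fold_errors)
```

Each fold is held out exactly once, so the per-fold errors estimate how the model performs on data it was not fitted to.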