
Anup Thatal

IT enthusiastic | Fu... • 1d

Basics of Statistics for Data Science

Statistics is one of the most important subjects for students of data science. Its methods help solve complex real-world problems, and it shows up almost everywhere. Data scientists and data analysts use it to spot meaningful trends and to derive insight from data. Statistics offers a variety of functions, principles, and algorithms that help analyze raw data, build a statistical model, and infer or predict a result.

Terminologies in Statistics

Before getting started with data science, we need to know the key statistical terms.

Population: The complete set of sources from which data is to be collected. A population can be very large.
Sample: A subset of data drawn from the population.
Variable: A characteristic, number, or quantity that can be measured or counted. In other words, a variable is a data item.
Statistical parameter: Also known as a population parameter, it is a quantity that describes the whole population, such as the population mean.

Types of Analysis

Statistics has two types of analysis.

Quantitative Analysis: Also known as statistical analysis. It is the science, or art, of collecting and interpreting data with numbers and graphs in order to identify patterns and trends.
Qualitative Analysis: Also known as non-statistical analysis. It gives general information and uses text, sound, and other forms of media.

Data Types

Numerical: Data expressed with digits. It is measurable. The two major numerical types are discrete and continuous.
Categorical: Qualitative data classified into categories. The two major categorical types are nominal (no order) and ordinal (ordered).
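To make the population and sample terms concrete, here is a minimal Python sketch. The ages are made-up data and the names `population` and `sample` are only illustrative. It draws a sample from a population and compares the sample mean with the population mean it is meant to estimate.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (made-up data).
# Age in whole years is a numerical, discrete variable.
population = [random.randint(18, 80) for _ in range(10_000)]

# A sample is a subset drawn from the population (here without replacement).
sample = random.sample(population, k=200)

# The population mean is a population parameter;
# the sample mean is a statistic that estimates it.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why the sample size matters when inferring something about the population.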
Measures of Central Tendency

Mean: The average of the dataset.
Median: The middle value of the ordered dataset.
Mode: The most common value in the dataset. It is most useful for discrete or categorical data.

Measures of Variability

Range: The difference between the maximum and minimum values in the dataset.
Variance (σ²): Measures how spread out the data is relative to the mean.
Standard Deviation (σ): Another measure of how spread out the numbers in the dataset are. It is the square root of the variance.
Z-score: The number of standard deviations a data point lies from the mean.
R-squared: A statistical measure of fit. It indicates how much of the variation of a dependent variable is explained by the independent variable(s). It is easiest to interpret in simple linear regression, because in ordinary least squares it never decreases when more predictors are added.
Adjusted R-squared: A modified version of R-squared that has been adjusted for the number of predictors in the model. It increases if a new term improves the model more than would be expected by chance, and decreases otherwise.

Measurements of Relationships between Variables

Covariance: Measures how two variables vary together. If it is positive, they tend to move in the same direction. If it is negative, they tend to move in opposite directions. If it is zero, there is no linear relationship between them.
Correlation: Measures the strength of the linear relationship between two variables. It ranges from -1 to 1 and is the normalized version of covariance. As a rule of thumb, a correlation of about +/- 0.7 or beyond represents a strong relationship between two variables.
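The measures described so far can be computed with Python's standard library. This is a minimal sketch on made-up numbers. The `covariance` and `correlation` helpers are hand-written for illustration and use the population (divide by n) convention, and the `hours` and `score` data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up dataset

# Central tendency
mean = statistics.mean(data)        # 5
median = statistics.median(data)    # 4.5
mode = statistics.mode(data)        # 4

# Variability (population versions, dividing by n)
value_range = max(data) - min(data)     # 7
variance = statistics.pvariance(data)   # 4
std_dev = statistics.pstdev(data)       # 2.0
z_score = (9 - mean) / std_dev          # 2.0: the value 9 is two std devs above the mean


def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))


hours = [1, 2, 3, 4, 5]        # hypothetical study hours
score = [52, 58, 61, 70, 74]   # hypothetical test scores

cov = covariance(hours, score)   # positive: the two move in the same direction
r = correlation(hours, score)    # close to 1: strong linear relationship

# For simple linear regression (one predictor, with intercept),
# R-squared equals the squared correlation.
r_squared = r ** 2
n, p = len(hours), 1  # n observations, p predictors
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
```

With a single predictor the adjustment is small. It matters more as predictors are added, because plain R-squared cannot go down in that case.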
A correlation between -0.3 and 0.3, on the other hand, indicates little or no linear relationship between the variables.

Probability Distribution Functions

Probability Density Function (PDF): Used for continuous data. Its value at any point can be interpreted as the relative likelihood that the random variable takes a value near that point.
Probability Mass Function (PMF): Used for discrete data. It gives the probability that the random variable takes a given value.
Cumulative Distribution Function (CDF): Gives the probability that the random variable is less than or equal to a certain value. For continuous data, it is the integral of the PDF.
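As a small illustration of the three functions, the sketch below uses a fair six-sided die for the discrete case and a standard normal distribution for the continuous case. Both are assumptions chosen for the example. `NormalDist` comes from Python's standard `statistics` module (Python 3.8+), and `die_cdf` is a hypothetical helper.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: PMF of a fair six-sided die.
# Each face has probability 1/6, and the probabilities sum to 1.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def die_cdf(x):
    """CDF of the die: probability that the roll is less than or equal to x."""
    return sum(p for face, p in pmf.items() if face <= x)


prob_at_most_four = die_cdf(4)  # 4 faces out of 6

# Continuous case: standard normal distribution (mean 0, standard deviation 1).
z = NormalDist(mu=0, sigma=1)

density_at_zero = z.pdf(0)      # a density (relative likelihood), not a probability
prob_below_zero = z.cdf(0)      # half of the distribution lies below the mean
prob_within_one_sd = z.cdf(1) - z.cdf(-1)  # area under the PDF between -1 and 1
```

Note that the PDF value at a single point is not a probability. Probabilities for continuous data come from differences of the CDF, which correspond to areas under the PDF.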

