
## Data analysis methods in research

Broadly, there are two methods of data analysis: quantitative data analysis and qualitative data analysis.

Quantitative analysis is the method you use when you are dealing with numbers and want to derive mathematical and statistical analyses from your data, for example through tabulations and cross-tabulations.
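As a minimal sketch of tabulation and cross-tabulation, the Python below counts answers in a small set of survey responses invented for illustration. It uses only the standard library's `collections.Counter`; in practice a data-frame library (for example pandas' `crosstab`) would usually do this, but the counting logic is the same. The function names `tabulate` and `cross_tabulate` are illustrative, not a library API.

```python
from collections import Counter

# Invented survey responses for illustration: (gender, answer) pairs.
responses = [
    ("female", "yes"), ("male", "no"), ("female", "yes"),
    ("male", "yes"), ("female", "no"), ("male", "no"),
]

def tabulate(values):
    """One-way tabulation: how often each value occurs."""
    return Counter(values)

def cross_tabulate(pairs):
    """Two-way cross-tabulation: counts for every (row, column) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Nested dict table[row][col]; Counter returns 0 for combinations never seen.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

answer_counts = tabulate(answer for _, answer in responses)
table = cross_tabulate(responses)
print(answer_counts)
for row, cells in table.items():
    print(row, cells)
```

Each cell of `table` is the number of respondents with that combination of row and column values; dividing each cell by its row total would give row percentages.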

Qualitative analysis is the method for analyzing text, such as interview transcripts or focus-group data.

This is done by finding common themes within the transcripts.
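Finding themes is largely interpretive work done by reading the transcripts, but code can help with the bookkeeping once themes have been defined. The sketch below assumes a hypothetical keyword codebook (`CODEBOOK`) and tallies how many transcripts touch each theme. The transcripts and keywords are invented for illustration, and keyword matching is only a crude stand-in for human coding.

```python
# Hypothetical codebook: each theme is linked to a few indicator keywords.
CODEBOOK = {
    "cost": {"price", "expensive", "afford"},
    "access": {"distance", "transport", "far"},
}

# Invented transcript excerpts.
transcripts = [
    "The clinic is too far and the transport is unreliable.",
    "Honestly the price is the problem; I cannot afford it.",
    "It is expensive, and the nearest pharmacy is far away.",
]

def code_transcript(text, codebook):
    """Return the set of themes whose keywords appear in one transcript."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return {theme for theme, keywords in codebook.items() if words & keywords}

def theme_frequencies(texts, codebook):
    """Count how many transcripts touch each theme."""
    counts = {theme: 0 for theme in codebook}
    for text in texts:
        for theme in code_transcript(text, codebook):
            counts[theme] += 1
    return counts

print(theme_frequencies(transcripts, CODEBOOK))
```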

## Data analysis techniques PDF

By a strong foundation, I assume you mean that you want to understand the principles of statistics, not just how to apply them. I give very different advice to people who want to jump feet-first into data analysis.

Most of the self-taught statisticians I know started by learning about Bayesian inference.

In short, statistics is traditionally taught from the frequentist perspective, but it's usually taught in a way that emphasizes formulas rather than concepts.

The best way to learn the concepts behind statistics is to start by learning the Bayesian approach, which is the most intuitive approach to statistics.

Once you've learned the principles of Bayesian inference, you will be ready to start analyzing data in a principled way; do so as soon as possible! After you've gotten your hands dirty with some practical data analysis, you'll have much better context to understand (and appreciate) many of the commonly used frequentist techniques, such as confidence intervals, p-values, cross-validation, permutation tests, and regularized estimation.
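Of the frequentist techniques listed above, the permutation test is one of the easiest to build from scratch. The sketch below is a two-sided test for a difference in means using only the standard library; the function name `permutation_test`, the number of resamples, and the example data are assumptions for illustration.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Shuffles the group labels many times and returns the fraction of shuffles
    whose absolute mean difference is at least the observed one, which
    estimates the p-value under the null hypothesis of no group difference.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed labelling itself, so the estimate is never 0.
    return (extreme + 1) / (n_resamples + 1)

# Invented measurements for two groups.
group_a = [12.1, 11.8, 12.5, 13.0, 12.7]
group_b = [10.9, 11.2, 10.5, 11.0, 11.4]
p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

Unlike a t-test, this makes no normality assumption: the reference distribution comes from the data themselves.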

(For an extremely detailed and readable account of the differences between the frequentist and Bayesian approaches to statistics, see this amazing answer by Jason Eisner.)

Given this focus on learning the concepts and intuition behind statistical data analysis, I suggest the following roadmap for learning Bayesian inference.

Review the fundamentals of probability.

Measure theory is not needed, but independence, expectation, variance, covariance, correlation, conditional expectation/conditional distribution/conditional independence, the central limit theorem, Markov's inequality, and moments are a must.

Also learn why moment-generating functions and characteristic functions are important (but not necessarily why they work). A fairly standard textbook is Durrett's Probability: Theory and Examples.

Many statistics textbooks (e.g. the first four chapters of Statistical Inference) will also contain a review of probability.
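One way to make expectation, variance, and Markov's inequality concrete is to check them by simulation. The sketch below assumes an exponential distribution with rate 1 (an arbitrary choice whose true mean and variance are both 1) and compares the empirical tail probability P(X ≥ a) with Markov's bound E[X]/a.

```python
import random
from statistics import fmean

rng = random.Random(42)
# Assumed example: 100,000 draws from an exponential distribution with rate 1,
# whose true mean and variance are both 1.
samples = [rng.expovariate(1.0) for _ in range(100_000)]

sample_mean = fmean(samples)
# Variance as the mean squared deviation from the sample mean.
sample_var = fmean((x - sample_mean) ** 2 for x in samples)
print(f"mean ~ {sample_mean:.3f}, variance ~ {sample_var:.3f}")

def markov_check(xs, a):
    """Return (empirical P(X >= a), Markov bound E[X] / a) for nonnegative xs and a > 0."""
    tail = sum(x >= a for x in xs) / len(xs)
    bound = fmean(xs) / a
    return tail, bound

for a in (1, 2, 4):
    tail, bound = markov_check(samples, a)
    print(f"a={a}: P(X >= a) ~ {tail:.3f}, Markov bound {bound:.3f}")
```

The bound always holds but is loose: for this distribution the true tail is e^(-a), far below 1/a.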

While you're reviewing probability, take every chance to get familiar with the following important families of probability distributions as you encounter them: Normal/Gaussian, exponential, uniform, binomial, multinomial, Poisson, chi-squared (or the entire Gamma family), and beta.

It's less likely that you'll run into inverse-Gamma, inverse-Wishart, or Dirichlet distributions in the wild, but those are particularly important for Bayesian inference.

Note also that all of the above distributions (except for the uniform distribution) are examples of exponential families.

The reasons might not be immediately apparent, but as a statistician you'll surely learn to use and love exponential families.
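To see what the distributions above have in common, it may help to write down the general exponential-family form and check one member against it. Below, the Poisson distribution is rewritten in that form; the notation ($h$, $\eta$, $T$, $A$) is a common convention rather than anything fixed by this text.

$$
p(x \mid \theta) = h(x)\,\exp\!\big(\eta(\theta)\,T(x) - A(\theta)\big)
$$

$$
p(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{1}{x!}\,\exp\!\big(x \log\lambda - \lambda\big),
\qquad h(x) = \frac{1}{x!},\quad \eta(\lambda) = \log\lambda,\quad T(x) = x,\quad A(\lambda) = \lambda.
$$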

Brush up on calculus and linear algebra: e.g. integration by parts, very simple multivariate calculus (gradients and Hessians), eigenvalues, and singular-value decomposition.

Taylor expansion will become your best friend if you start learning more classical statistics.

Also, there are some fairly specialized calculus tricks that are used in statistics all the time, e.g. the fact that integrating the unnormalized pdf of a gamma, normal, or exponential density gives you the inverse of its normalizing constant.
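As a worked example of this trick, take the gamma density with shape $\alpha$ and rate $\beta$. Because the density integrates to 1, the integral of its unnormalized part equals the reciprocal of the normalizing constant, and that identity lets you compute moments without doing any new integration (the last step uses $\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)$).

$$
f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}
\quad\Longrightarrow\quad
\int_0^\infty x^{\alpha-1} e^{-\beta x}\,dx = \frac{\Gamma(\alpha)}{\beta^{\alpha}}
$$

$$
\mathbb{E}[X] = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^\infty x^{\alpha} e^{-\beta x}\,dx
= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}}
= \frac{\alpha}{\beta}
$$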

While you're learning math, try implementing every idea you learn in code, and visualizing the results, in Python, Matlab/Octave, or R.

It doesn't matter which of the three you pick: if you're like me, you'll eventually end up learning all three of them.

Learning how to code and produce visualizations will not only prepare your programming skills for data analysis; it's also one of the best ways to build an intuitive understanding of the mathematics.
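As an example of implementing and visualizing an idea, the Python sketch below simulates the central limit theorem: it draws many samples from Uniform(0, 1), takes each sample's mean, and prints a text histogram of those means. It uses only the standard library so it runs anywhere; with Matplotlib installed, a real histogram would be the more usual choice. The sample size (30), the number of repetitions, and the histogram range are arbitrary assumptions.

```python
import random
from collections import Counter
from statistics import fmean, stdev

rng = random.Random(1)

def sample_means(n, reps):
    """Means of `reps` independent samples, each of `n` draws from Uniform(0, 1)."""
    return [fmean(rng.random() for _ in range(n)) for _ in range(reps)]

def text_histogram(values, lo, hi, bins=8, width=40):
    """Histogram rows as strings; the tallest bar has `width` stars.

    Values outside [lo, hi) are clamped into the first or last bin.
    """
    bin_width = (hi - lo) / bins
    counts = Counter(
        min(max(int((v - lo) / bin_width), 0), bins - 1) for v in values
    )
    tallest = max(counts.values())
    return [
        f"{lo + b * bin_width:.2f} | " + "*" * round(counts[b] / tallest * width)
        for b in range(bins)
    ]

means = sample_means(n=30, reps=5_000)
print("\n".join(text_histogram(means, lo=0.3, hi=0.7)))
# The CLT predicts a bell shape centred on 0.5 with standard deviation
# sqrt(1/12) / sqrt(30), roughly 0.053.
print(f"mean of means = {fmean(means):.3f}, sd = {stdev(means):.3f}")
```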

After you've learned a good chunk of probability, it becomes relatively easy to learn Bayesian inference.

You can either go with a fairly standard textbook like Bayesian Data Analysis or start with the most hardcore philosophical Bayesian text, Probability Theory: The Logic of Science, which is especially entertaining to read because the author constantly trash-talks mainstream statisticians (not that all of it is justified!). In any case, know what a prior distribution is, what a posterior distribution is, and how to use Bayes' rule to compute the posterior given data.

Also know how Bayesian inference can be used to minimize your average loss under uncertainty.

Some other useful (but less essential) facts: what conjugate distributions are, and the fact that the posterior mean minimizes mean-squared error.
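A beta prior with binomial data is the textbook example of conjugacy, and it shows Bayes' rule at work without any integration. The sketch below assumes a uniform Beta(1, 1) prior and 7 successes in 10 trials (made-up numbers); the `Beta` class and `update` function are illustrative names, not a library API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over an unknown success probability."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

def update(prior: Beta, successes: int, failures: int) -> Beta:
    """Bayes' rule for a Beta prior and binomial data.

    Because the Beta is conjugate to the binomial likelihood, the posterior is
    again a Beta whose parameters simply add the observed counts.
    """
    return Beta(prior.alpha + successes, prior.beta + failures)

prior = Beta(1, 1)  # uniform prior on the success probability
posterior = update(prior, successes=7, failures=3)
print(posterior, posterior.mean())
```

Under squared-error loss, the posterior mean (here 8/12, about 0.667) is the point estimate that minimizes expected loss, which is the fact mentioned above.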

Most importantly, go out and do some data analysis! Find problems you care about and start thinking up models for your data, check whether you get reasonable results, and if not, think up some more models!

Once you start analyzing real data, you will likely find yourself doing a lot of data cleaning and pre-processing.

It's worth spending the time to figure out the best way to clean data, because like it or not, it's something you'll have to do routinely as an applied statistician or data scientist.
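Data cleaning is mostly simple rules applied consistently. The sketch below shows a few typical ones on invented records: trimming whitespace, standardizing category spelling, converting numbers stored as text, treating placeholder strings such as "n/a" as missing, and dropping exact duplicates. The field names and the set of missing-value markers are assumptions; real data will need its own rules.

```python
# Invented raw records with typical problems: stray whitespace,
# inconsistent category spelling, numbers stored as text, and missing values.
raw = [
    {"id": "1", "country": " USA ", "age": "34"},
    {"id": "2", "country": "usa", "age": "n/a"},
    {"id": "3", "country": "Canada", "age": " 29"},
    {"id": "3", "country": "Canada", "age": " 29"},  # exact duplicate
]

# Assumed placeholder strings that mean "no value recorded".
MISSING = {"", "na", "n/a", "null"}

def clean_record(rec):
    """Normalize one record: trim text, standardize case, parse age or mark it missing."""
    age_text = rec["age"].strip().lower()
    return {
        "id": int(rec["id"]),
        "country": rec["country"].strip().upper(),
        "age": None if age_text in MISSING else int(age_text),
    }

def clean(records):
    """Clean every record and drop exact duplicates, keeping the first occurrence."""
    seen, out = set(), []
    for rec in map(clean_record, records):
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

cleaned = clean(raw)
for rec in cleaned:
    print(rec)
```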

Exploratory data analysis is key for analyzing real datasets.

Learn how to use tools like scatterplots, Q-Q plots, histograms, clustering, and principal component analysis, and techniques like looking at the residuals of your model.
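Looking at residuals is easy to try by hand. The sketch below fits a straight line by ordinary least squares to invented data and computes the residuals; a systematic pattern in them (here, positive at both ends and negative through most of the middle) suggests that a straight line is the wrong model. Plotting `res` against `xs` as a scatterplot would make the pattern visible at a glance.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys, a, b):
    """Observed minus fitted values."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Invented data: roughly linear, except the last point, which curves upward.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0]
a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print(f"fit: y = {a:.3f} + {b:.3f} x")
print("residuals:", [round(r, 3) for r in res])
```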

Decision trees provide yet another way to visualize your data!

After you've gotten the hang of building models, you will run into computational difficulties with more complicated models.

At this point you'll have to learn some Markov chain Monte Carlo (or just learn how to use a probabilistic programming language like Stan). When things get really tough, you might want to start looking at machine learning or the frequentist side of statistics.
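To give a feel for what Markov chain Monte Carlo does under the hood, here is a minimal random-walk Metropolis sampler in Python. It is a sketch, not a replacement for Stan: the target is an unnormalized standard normal chosen for illustration, and the step size, sample count, and burn-in length are arbitrary assumptions.

```python
import math
import random
from statistics import fmean

def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    `log_density` only has to be correct up to an additive constant, which is
    exactly what you have for an unnormalized posterior.
    """
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)  # on rejection the current state is repeated
    return samples

# Assumed target: a standard normal known only up to a constant, log p(x) = -x^2 / 2.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
kept = draws[5_000:]  # discard the first 5,000 draws as burn-in
print(f"mean ~ {fmean(kept):.3f}, E[x^2] ~ {fmean(x * x for x in kept):.3f}")
```

For a real posterior you would swap in its log density; the sampler never needs the normalizing constant, which is why MCMC is so useful for Bayesian models.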

Non-Bayesian techniques are extremely useful for practical data analysis (and often required for publications), but learning the principles behind those techniques is a whole other story.


## What is data analysis

Data analysis refers both to the methods used to analyze data and to the process of analyzing it. There are many different forms of data, but people usually think of quantitative data first.

These are data such as census data or survey data.

There is also scientific data, such as data physicists collect about the cosmos.

These days, most people are interested in analyzing the vast amounts of data collected through transactions with a variety of companies and/or websites.

Often, though, people forget about qualitative data, which is data that has not yet had any form imposed on it by humans.

Big Data is a term for data sets so large and/or complex that they are difficult to process using traditional data processing software.

Difficulties include analysis, capture, metadata, storage, search, sharing, transfer, visualization, and privacy.

Hadoop is a software framework for distributed storage and processing that helps manage big data.

You'll need to develop skills in all aspects of data analysis.

Learn how to manage data sets and how to manipulate them.

Learn to compute new variables, merge data sets, recode data, and so on.
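The three tasks just named (computing new variables, merging data sets, recoding data) can be sketched in plain Python on invented records. In practice these are usually done with a data-frame library or a statistics package, but the underlying operations are the same; the field names, the recoding scheme, and the left-join behavior here are illustrative assumptions.

```python
# Invented data sets keyed by respondent id.
survey = [
    {"id": 1, "height_cm": 170, "weight_kg": 65, "smoker": "Y"},
    {"id": 2, "height_cm": 182, "weight_kg": 90, "smoker": "N"},
    {"id": 3, "height_cm": 160, "weight_kg": 50, "smoker": "N"},
]
regions = {1: "north", 2: "south"}  # respondent 3 has no region on file

SMOKER_CODES = {"Y": 1, "N": 0}  # assumed recoding of text answers to 0/1

def prepare(rows, region_lookup):
    """Return new records with a computed, a recoded, and a merged variable."""
    out = []
    for row in rows:
        new = dict(row)
        # Compute a new variable: body-mass index = kg / m^2, rounded to 1 decimal.
        new["bmi"] = round(row["weight_kg"] / (row["height_cm"] / 100) ** 2, 1)
        # Recode a categorical variable.
        new["smoker"] = SMOKER_CODES[row["smoker"]]
        # Merge (left join): keep every survey row; region is None when missing.
        new["region"] = region_lookup.get(row["id"])
        out.append(new)
    return out

prepared = prepare(survey, regions)
for rec in prepared:
    print(rec)
```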

Learn about flat-file, hierarchical, and relational databases, and how to build software that manages them all.

Learn Hadoop in order to learn to manage big data.

Learn all kinds of statistical techniques, such as OLS, HLM, SEM, missing-data estimation techniques, facet analysis, social network analysis, longitudinal analysis techniques, and on and on: as much as you can stomach.

Learn to use programs that perform statistical analyses, such as SPSS, Stata, SAS and R, just so you can use whatever tool you need when you need it.

Learn qualitative data analysis techniques as well.

Learn Atlas.ti or NVivo, and gain some experience with qualitative studies.

Maybe do a dissertation on methodological techniques and the relationship between qualitative and quantitative methods.

You can learn these skills on your own from textbooks, online classes, or tutorials.

However, it probably makes more sense to learn in college or graduate school in the statistics or computer science departments.

## Data analysis example

Qualitative research data may be case histories or interviews, for example. It will involve describing your research, the results, and your analysis as a narrative (essay, paragraphs, chapters).

Details on what is expected should be provided to you by your professor and thesis advisor.

A style guide will apply (perhaps APA), but I don't know which one, since I don't know what field you are in.

Good luck!

## Types of data analysis

Exploratory data analysis (EDA) is certainly one of them. Beyond EDA, you can apply algorithms such as classification, clustering, regression, and anomaly detection.

Each of these gives its name to a type of data analysis.

## Tools for data analysis in research example

Both Python and R have vast software ecosystems and communities, so either language is suitable for almost any data science task. That said, there are some areas in which one is stronger than the other.

### Where Python Excels

The majority of deep learning research is done in Python, so tools such as Keras and PyTorch have Python-first development.

You can learn about these topics in Introduction to Deep Learning in Keras and Introduction to Deep Learning in PyTorch.

Another area where Python has an edge over R is in deploying models to other pieces of software.

Python is a general purpose programming language, so if you write an application in Python, the process of including your Python-based model is seamless.

We cover deploying models in Designing Machine Learning Workflows in Python and Building Data Engineering Pipelines in Python.

Python is often praised for being a general-purpose language with an easy-to-understand syntax.

### Where R Excels

A lot of statistical modeling research is conducted in R, so there's a wider variety of model types to choose from.

If you regularly have questions about the best way to model data, R is the better option.

DataCamp has a large selection of courses on statistics with R.

The other big trick up R's sleeve is easy dashboard creation using Shiny.

This enables people without much technical experience to create and publish dashboards to share with their colleagues.

Python does have Dash as an alternative, but it's not as mature.

You can learn about Shiny in our course on Building Web Applications with Shiny in R.

R's functionality was developed with statisticians in mind, thereby giving it field-specific advantages such as great features for data visualization.

This list is far from exhaustive, and experts endlessly debate which tasks can be done better in one language or another.

Further, Python programmers and R programmers tend to borrow good ideas from each other.

For example, Python's plotnine data visualization package was inspired by R's ggplot2 package, and R's rvest web scraping package was inspired by Python's BeautifulSoup package.

So eventually, the best ideas from either language find their way into the other, making both languages similarly useful and valuable.

If you're too impatient to wait for a particular feature in your language of choice, it's also worth noting that there is excellent language interoperability between Python and R.

That is, you can run R code from Python using the rpy2 package, and you can run Python code from R using reticulate.

That means features present in one language can generally be accessed from the other.

For example, the R version of the deep learning package Keras actually calls Python.

Likewise, rTorch calls PyTorch.

Beyond features, the languages are sometimes used by different teams or individuals based on their backgrounds.

### Who Uses Python

Python was originally developed as a programming language for software development (the data science tools were added later), so people with a computer science or software development background might feel more comfortable using it.

Accordingly, the transition from other popular programming languages like Java or C++ to Python is easier than the transition from those languages to R.

### Who Uses R

R has a set of packages known as the Tidyverse, which provide powerful yet easy-to-learn tools for importing, manipulating, visualizing, and reporting on data.

Using these tools, people without any programming or data science experience (at least anecdotally) can become productive more quickly than in Python.

If you want to test this for yourself, try taking Introduction to the Tidyverse, which introduces R's dplyr and ggplot2 packages.

It will likely be easier to pick up than Introduction to Data Science in Python, but why not see for yourself what you prefer?

Overall, if you or your employees don't have a data science or programming background, R might make more sense.

Wrapping up, though it may be hard to know whether to use Python or R for data analysis, both are great options.

One language isn't better than the other; it all depends on your use case and the questions you're trying to answer.

Finally, I'll share the first bit of a handy infographic comparing the two languages.

I don't want to include it all, as it's very long and would require too much scrolling, but you can download the full image here.