HyperStat Online Statistics Textbook
© 1993-2005 David M. Lane, Associate Professor of Psychology, Statistics, and Management at Rice University
Online Statistics: An Interactive Multimedia Course of Study
Online Statistics: A Multimedia Course of Study is an introductory-level statistics book. The material is presented both as a standard textbook and as a multimedia presentation. The book features interactive demonstrations and simulations, case studies, and an analysis lab.
Statistical Analyses Explained
Researchers often seek to infer whether variables are related to each other; hypothesis testing permits one to examine such relationships empirically.
Typically, researchers state a hypothesis as a "null hypothesis" (abbreviated H0). That is, the dependent variable does not differ across levels of the independent variable.
One type of t-test is the independent-samples t-test, which compares the means of two unrelated groups.
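A minimal sketch of the independent-samples t statistic, assuming equal variances in the two groups (the pooled-variance form). The function name and the scores are hypothetical and only illustrate the arithmetic.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(sample_a, sample_b):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df

group_a = [5, 6, 7, 8, 9]  # hypothetical scores, group A
group_b = [1, 2, 3, 4, 5]  # hypothetical scores, group B
t, df = independent_t(group_a, group_b)
print(t, df)
```

The t value is then compared against the t distribution with df degrees of freedom to decide whether to reject the null hypothesis.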
Analysis of Variance (ANOVA) is a statistical method to examine whether the mean of an interval dependent variable differs across the levels of one or more categorical independent variables (factors).
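As a rough illustration, the F ratio for the simplest case, a one-way design with a single grouping factor, can be computed as below. The function name and data are hypothetical, and the sketch covers neither factorial designs nor interaction effects.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three groups.
f_ratio, df_between, df_within = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_ratio, df_between, df_within)
```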
The Analysis of Variance (ANOVA) yields main effects and interaction effects.
The chi-square test is a nonparametric statistic that assesses the association between two categorical variables.
The null hypothesis states there is no association between these variables, while the alternative hypothesis states a relationship does exist between the two variables.
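A short sketch of the Pearson chi-square statistic for a contingency table: expected counts are derived from the row and column totals under the null hypothesis of no association. The function name and the counts are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi2, df) for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed_counts = [[10, 20], [30, 40]]  # hypothetical 2-by-2 table
chi2, df = chi_square_statistic(observed_counts)
print(round(chi2, 4), df)
```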
Multiple regression is used to predict a criterion (dependent variable) from a set of predictors (independent variables); R² is the proportion of variance in the criterion accounted for by the predictors.
The predictors can be interval, dichotomous, and/or dummy variables.
Logistic Regression is a regression method used when the dependent variable is dichotomous.
Logistic regression is used to predict the probability of the outcome from the predictor variables (called covariates in logistic regression); the effect of each predictor is commonly reported as an odds ratio.
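The link between a logistic coefficient, the predicted probability, and the odds ratio can be sketched as follows. The intercept and slope are made-up values for a single predictor, not estimates from any data set.

```python
from math import exp

def predicted_probability(intercept, slope, x):
    """Probability of the outcome for predictor value x under a logistic model."""
    logit = intercept + slope * x
    return 1.0 / (1.0 + exp(-logit))

def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)

b0, b1 = -1.0, 0.5  # hypothetical intercept and slope
odds_ratio = exp(b1)  # multiplicative change in the odds per one-unit increase in x
p_at_2 = predicted_probability(b0, b1, 2.0)
p_at_3 = predicted_probability(b0, b1, 3.0)
print(p_at_2, odds(p_at_3) / odds(p_at_2), odds_ratio)
```

Raising x by one unit multiplies the odds by exp(b1), regardless of the starting value of x.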
LISREL is a popular software program designed for structural equation modeling.
The LISREL program may be used to handle standard multivariate methods, such as analysis of variance, regression analyses, and multivariate analysis of variance.
Structural equation modeling (SEM) is a multivariate statistical technique used to examine direct and indirect relationships between one or more independent variables and one or more dependent variables.
Path analysis examines the direct and indirect effects of variables hypothesized as causes of variables treated as effects.
Path analysis is a method applied to causal models already formulated on the basis of knowledge and theoretical considerations.
SPSS is a statistical package widely used by graduate students working on dissertations and theses.
SPSS permits graduate students to conduct descriptive statistics, one-way and multivariate ANOVAs, repeated-measures ANOVA, correlation, regression, discriminant analysis, factor analysis, alpha reliability analysis, and chi-square tests.
Statistical formulas are used to generate statistics necessary to infer relationships between variables and decide whether statistical hypotheses are supported.
Statistics Solutions uses a wide variety of statistical tests to obtain the necessary statistics.
Correlation is a bivariate measure of association (strength) of the relationship between two variables. It varies from 0 (no linear relationship) to 1 (perfect positive linear relationship) or -1 (perfect negative linear relationship).
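A minimal sketch of Pearson's r computed from deviations around the means. The variable names and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

hours = [1, 2, 3, 4, 5]     # hypothetical predictor
score = [2, 4, 6, 8, 10]    # rises in exact proportion to hours
r_positive = pearson_r(hours, score)
r_negative = pearson_r(hours, score[::-1])
print(r_positive, r_negative)
```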
Partial correlation is the correlation of two variables while controlling for a third or more other variables. The technique is commonly used in "causal" modeling of small models (3 - 5 variables).
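For a single control variable, the first-order partial correlation can be obtained directly from the three pairwise correlations. The sketch below assumes those correlations are already known; the values are hypothetical.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y controlling for z (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations among x, y, and the control variable z.
r_partial = partial_correlation(r_xy=0.5, r_xz=0.5, r_yz=0.5)
print(round(r_partial, 4))
```

Controlling for more than one variable applies the same formula recursively to lower-order partials.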
Discriminant function analysis, a.k.a. discriminant analysis or DA, is used to classify cases into the values of a categorical dependent, usually a dichotomy.
Factor analysis is used to uncover the latent structure (dimensions) of a set of variables.
Log-linear, logit, and probit models extend the principles of generalized linear models (ex., regression) to better treat the case of dichotomous and categorical variables.
Type I. Used in hierarchical balanced designs where main effects are specified before first-order interaction effects, and first-order interaction effects are specified before second-order interaction effects, etc.
Reliability is the correlation of an item, scale, or instrument with a hypothetical one which truly measures what it is supposed to.
Association refers to a wide variety of coefficients which measure strength of relationship, defined various ways. In common usage "association" refers to measures of strength of relationship in which at least one of the variables is a dichotomy, nominal, or ordinal.
Association refers to coefficients which gauge the strength of a relationship. Coefficients in this section are designed for use with 2-by-2 tables. Note that measures for larger tables, discussed separately for nominal and ordinal data, may also be used with 2-by-2 tables.
Association refers to coefficients which gauge the strength of a relationship. Coefficients in this section are designed for use with nominal data.
Eta is a coefficient of nonlinear association. For linear relationships, eta equals the correlation coefficient (Pearson's r).
Gamma, also called Goodman and Kruskal's gamma, is a symmetric measure which varies from +1 to -1, based on the difference between concordant pairs (P) and discordant pairs (Q).
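A small sketch of gamma as (P - Q) / (P + Q), counting concordant and discordant pairs over all pairs of cases and ignoring pairs tied on either variable. The function name and the ordinal scores are hypothetical.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Gamma = (P - Q) / (P + Q) for two ordinal variables."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order the pair the same way
        elif direction < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # direction == 0 means a tie on x or y; tied pairs are not counted
    return (concordant - discordant) / (concordant + discordant)

rank_x = [1, 2, 3, 4]  # hypothetical ordinal scores
rank_y = [1, 3, 2, 4]
gamma = goodman_kruskal_gamma(rank_x, rank_y)
print(round(gamma, 4))
```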
Assumptions are covered under each statistical topic. See also the separate section on data levels. This section provides information general to all procedures.
A canonical correlation is the correlation of two canonical (latent) variables, one representing a set of independent variables, the other a set of dependent variables.
Cluster analysis, also called segmentation analysis or taxonomy analysis, seeks to identify homogeneous subgroups of cases in a population.
Correspondence analysis is a method of factoring categorical variables and displaying them in a property space which maps their association in two or more dimensions.
Proper handling of missing values is important in all analyses and is critical in some, such as time series analysis.
Nominal data has no order, and the assignment of numbers to categories is purely arbitrary (ex., 1=East, 2=North, 3=South, etc.).
Partial least squares (PLS) regression (path) analysis is an alternative to OLS regression, canonical correlation, or structural equation modeling (SEM) for analysis of systems of independent and response variables.
Research designs fall into two broad classes: quasi-experimental and experimental.
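Significance is the probability of finding a relationship at least as strong as the one observed in the data when no relationship exists in the population, that is, the chance that the finding is just due to an unlucky sample, such that if we took another sample we might find nothing.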
The binomial test is an exact probability test, based on the rules of probability, and is used to examine the distribution of a single dichotomy when the researcher has a small sample.
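The exact probability behind the binomial test can be sketched with the binomial formula. This example computes a one-sided (upper-tail) p-value; the counts and the null proportion of 0.5 are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# Hypothetical small sample: 8 of 10 cases fall in the first category.
p_value = binomial_upper_tail(8, 10, 0.5)
print(p_value)
```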
Normal curve means tests, commonly called simply "hypothesis tests," are a basic method of exploring possible differences between two samples, or of testing the null hypothesis that an observed sample mean does not differ significantly from zero.
The Fisher exact test of significance is used in place of the chi-square test in small 2-by-2 tables.
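A minimal sketch of a two-sided Fisher exact test for a 2-by-2 table with cells a, b (first row) and c, d (second row). It assumes the common convention of summing the hypergeometric probabilities of every table with the same margins that is no more probable than the observed one. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2-by-2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )

p_two_sided = fisher_exact_two_sided(3, 1, 1, 3)  # hypothetical counts
print(round(p_two_sided, 4))
```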
The one-sample runs test of significance is commonly used as a test of randomness in a sample.
The Kolmogorov-Smirnov D test is a goodness-of-fit test which tests whether a given distribution is not significantly different from one hypothesized (ex., on the basis of the assumption of a normal distribution).
This set of significance coefficients tests whether an ordinal or interval variable measured in each of two independent samples can be assumed to come from the same underlying population.
The tests in this section test whether one can reject the null hypothesis that two or more independent samples come from the same underlying population distribution.
The McNemar test assesses the significance of the difference between two dependent samples when the variable of interest is a dichotomy.
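The McNemar statistic depends only on the discordant pairs, that is, the cases whose value changed between the two dependent measurements. The sketch below uses the uncorrected chi-square form (no continuity correction); the counts are hypothetical.

```python
def mcnemar_chi_square(changed_to_yes, changed_to_no):
    """Uncorrected McNemar chi-square (1 degree of freedom) from the discordant counts."""
    difference = changed_to_yes - changed_to_no
    return difference ** 2 / (changed_to_yes + changed_to_no)

# Hypothetical before/after data: 15 cases switched from no to yes, 5 from yes to no.
mcnemar_chi2 = mcnemar_chi_square(15, 5)
print(mcnemar_chi2)
```

Cases that gave the same answer both times do not enter the statistic.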
Survey research is the method of gathering data from respondents thought to be representative of some population, using an instrument composed of closed structure or open-ended items (questions).
Simple time series design. The usual time series design is simply the collection of quantitative observations at regular intervals through repeated surveys, such as unemployment indexes collected by the Bureau of Labor Statistics.
Event history analysis is an umbrella term for a set of procedures. As such it is a specialized subfield of time series analysis which uses techniques, such as Poisson regression, which are designed to analyze rare events (time series in which most data are non-events).
Two-stage least squares regression (2SLS) is a method of extending regression to cover models which violate the assumptions of ordinary least squares (OLS) regression.
A study is valid if its measures actually measure what they claim to, and if there are no logical errors in drawing conclusions from the data.
We have compiled a list of respected sites that can help with your various dissertation needs.
Multivariate Analysis of covariance (MANCOVA)
Repeated measures analysis
Wilcoxon matched-pairs signed-rank test
Peace and survival of life on Earth as we know it are threatened by human activities that lack a commitment to humanitarian values. Destruction of nature and natural resources results from ignorance, greed, and a lack of respect for the Earth's living things... . It is not difficult to forgive destruction in the past, which resulted from ignorance. Today, however, we have access to more information, and it is essential that we re-examine ethically what we have inherited, what we are responsible for, and what we will pass on to coming generations. Clearly this is a pivotal generation... . Our marvels of science and technology are matched if not outweighed by many current tragedies, including human starvation in some parts of the world, and extinction of other life forms... . We have the capability and responsibility. We must act before it is too late. Tenzin Gyatso the fourteenth Dalai Lama.