Content
Data preparation
Transformations
Fill Series
Measurement scales and data types
Converting between units
Log transform
Logit transformation
Probit transformation
Angular transformation
Cumulative transformation
Ladder of powers transformation
Apply function
Date and time intervals
Ranking
Normal scores
Standardization
Sort data to new column
Sort within worksheet
Pairwise
Pairwise differences
Pairwise means
Pairwise slopes
Text to numbers
Random number generation
Rotate data block
Dummy variables
Search and replace
Clear missing data
Categorise
Extract a subset of data
Split by group identifier
Combine groups and form an identifier
Tabulate
Detabulate
Crosstabs
Arial;Courier New;Symbol;Helvetica;Courier;Tms Rmn;Helv;New York;System;Wingdings;MSMincho ;Batang ;SimSun ;PMingLiU ;MS Gothic ;Dotum ;SimHei ;MingLiU ;Mincho ;Gulim ;Century;AngsanaNew;CordiaNew;Mangal;Latha;Sylfaen;Vrinda;Raavi;Shruti;Sendnya;Gautami;Tunga;Estrangelo Edessa;Kartika;Arial Unicode MS;Tahoma;Verdana;TimesNew Roman CE;TimesNew Roman Cyr;Times New Roman Greek;Times New Roman Tur;Times New Roman (Hebrew);Times New Roman (Arabic);Times New RomanBaltic;Times New Roman (Vietnamese);Arial CE;Arial Cyr;Arial Greek;ArialTur;Arial (Hebrew);Arial (Arabic);Arial Baltic;Arial (Vietnamese);Courier NewCE;Courier New Cyr;Courier New Greek;Courier New Tur;Courier New(Hebrew);Courier New (Arabic);Courier New Baltic;Courier New(Vietnamese);Sylfaen Cyr;Sylfaen Greek;Verdana CE;Verdana Cyr;VerdanaGreek;Verdana Tur;Verdana Baltic;Verdana (Vietnamese);
Data preparation. 3
Transformations. 4
Fill series. 4
Measurement scales and data types. 5
Converting between units. 6
Logarithmic transformation. 8
Logarithmic transformation. 9
Probit transformation. 9
Angular transformation. 10
Cumulative transformation. 10
Ladder of powers transformations. 11
Apply function. 11
Date and time intervals. 13
Ranking.. 14
Normal scores. 15
Standardization. 16
Sort data to new column. 17
Sort within worksheet. 18
Pairwise. 19
Pairwise differences. 20
Pairwise means. 20
Pairwise slopes. 21
Text to numbers. 22
Random number generation. 22
Rotate data block. 26
Dummy variables. 27
Search and replace. 28
Clear missing data.. 31
Categorise. 32
Extract a subset of data.. 33
Split by group identifier.. 34
Combine groups and form an identifier.. 35
Tabulate. 36
Detabulate. 36
Crosstabs. 37
·WORKBOOK
·EXCELLINKS
·MISSINGDATA
·TRANSFORMATIONS
·APPLYFUNCTION
·NORMALSCORES
·STANDARDIZATION
·CONVERTINGBETWEEN UNITS
·RANKING
·SORTINGDATA TO NEW COLUMNS IN WORKSHEETS
·SORTINGEXISTING CELLS WITHIN WORKSHEETS
·DATE& TIME INTERVALS
·PAIRWISE
·RANDOMNUMBERS
·ROTATEDATA BLOCK
·DUMMYVARIABLES
·TEXTTO NUMBERS
·SEARCH& REPLACE
·CLEARMISSING DATA
·CATEGORISE
·EXTRACTVARIABLES
·COMBINEGROUPS AND CREATE A GROUP IDENTDIFIER
·SPLITBY GROUP IDENTIFIER
·TABULATE
·DETABULATE
·CROSSTABS
·FREQUENCIES
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Transformations.
This section deals with somecommon transformations:
·Log(natural)
·Log (base10)
·Logit
·Probit
·Angular
·Cumulative
·Ladderof powers
Please note that othertransformations can be achieved by applyinga function to a column of data as described below.
Transforming a variable changesits measurement scale. Reasons for transformation include stabilizing variance,linearizing relationships, normalizing distributions and making data easier tohandle numerically (Armitage and Berry,1994; Bland and Altman, 1996c). Details of the above transformations arediscussed under their own titles, here we consider how to deal with othertransformations, for example a power transform:
Power
Example, for square root applyfunction v1^.5 or sqr(v1), for standardized version (v1^.5-1)/.5
Limitation, v1 must be ³ 0.
Standardized form: y = (x^c-1) /c
For c = 0.5 (square root):
Variance of x can be stabilizedespecially with Poisson distributed data.
For c = -1 (reciprocal):
Variance of x can be stabilizedwith distant high outliers, e.g. very long survival times c.f. mainly shorterones.
For c = 2 (square):
Variance can be stabilized whenvar(x) decreases with increasing x and with negatively skewed x.
Menu location: Data_FillSeries.
This function fills a series ofrows in a worksheet with numbers according to a formula that you specify.
The default formula is VX+1 witha starting value of 1, which will give a sequence of numbers from 1 to thelength of series that you specify. A formula of VX+10 would give a series ofintegers spaced at 10, i.e. 10, 20 30, 40 etc. Complex non-linear formulae canbe used; see calculatorfor a list of functions that you can use in a formula.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
It is important, in statisticalanalysis, to know about the different scales of measurement, these are:
·INTERVAL
Scale with a fixed and defined interval e.g.temperature or time.
·ORDINAL
Scale for ordering observations from low tohigh with any ties attributed to lack of measurement sensitivity e.g.score from a questionnaire.
·NOMINAL with order
Scale for grouping into categories with order e.g. mild, moderate orsevere. This can be difficult to separate from ordinal.
·NOMINAL without order
Scale for grouping into unique categoriese.g. eye colour.
·DICHOTOMOUS
As for nominal but two categories only e.g.male/female.
In addition to the classificationof measurement scales, other related terms are used to describe types of data:
·CATEGORICAL vs. NUMERICAL (quantitative vs. qualitative)
Data that represent categories, such as dichotomous (two categories) andnominal (more than two categories) observations, are collectively calledcategorical (qualitative). Data that are counted or measured using anumerically defined method are called numerical (quantitative).
·DISCRETE vs. ORDERED CATEGORICAL
Discrete data arise from observations that can only take certain numericalvalues, usually counts such as number of children or number of patientsattending a clinic in a year. Ordered categorical data are sometimes treated asdiscrete data, this is wrong. For example, using the Registrar General'sclassification of social class, it would be wrong to say that class I is fivetimes the socio-economic status as class V, as there is not a strict numericalrelationship between these categories. It follows, therefore, that averagesocial class is a meaningless statistic. Thus, ordered categorical data shouldnot be treated as discrete data for statistical analysis. Discrete data may betreated as ordered categorical data in statistical analysis, but someinformation is lost in doing so.
·CONTINUOUS
Continuous data are numerical data that can theoretically be measured ininfinitely small units. For example, blood pressure is usually measured to thenearest 2mm Hg, but could bemeasured with much greater resolution of difference. The interval measurementscale is intended for continuous data. Sometimes continuous data are givendiscrete values at certain thresholds, for example age a last birthday is adiscrete value but age itself is a continuous quantity; in these situations itis reasonable to treat discrete values as continuous. Remember that informationis lost when continuous data are recorded only in ranges (ordered categories),and the statistical analysis of continuous data is more powerful than that ofcategorical data.
·PERCENTAGES and RATIOS
Percentages or ratios summarise two pieces of information, namely theirconstituent numerator and denominator values. Simple ratios (0 to 1, i.e. thedenominator is the maximum possible value that the numerator can take) can be treatedas continuous data. More difficult to analyse data arise when the ratiorepresents a change, and the value can be negative. Ratios of observationscompared with reference values, e.g. height relative to the mean of a referencepopulation for a given sex and age, are difficult to handle as values may falleither side of 1 (100%).
Many statistical methods areappropriate only for data of certain measurement scales. When selectinga statistical method, it is essential to understand how the data to beanalysed were measured. The best stage of investigation for ponderingmeasurement scales is the design stage, at which the statistical limitationsimposed by certain measurement scales may influence your choice of observationsand methods of measurement.
Note that transforming datachanges their measurement scales.
See also:
Normalscores
Standardization
Ranking
Convertingbetween units
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Converting from 1: | multiply by: | to give 1: |
Atmospheres | 76 | CM of Mercury (0°C) |
Atmospheres | 29.921 | Inches of Mercury (32°F) |
Atmospheres | 2116.32 | Pounds/sq ft |
Atmospheres | 14.696 | Pounds/sq ft |
Barrels, oil | 42 | Gallon (US) |
Barrels (API) | 0.159 | Meter³ |
BTU (15.56°C) | 1055 | Joule |
BTU/Minute | 0.0236 | Horsepower |
Calories (mean) | 4.19 | Joule |
Centimetres | 0.3937 | Inches |
Cubic centimetres | 0.061 | Cubic inches |
Cubic Centimetres | 0.00022 | Gallons (British) |
Cubic Centimetres | 0.00026 | Gallons (US) |
Cubic Feet | 0.037 | Cubic Yards (US) |
Cubic Feet | 6.2288 | Gallons (British) |
Cubic Feet | 7.4805 | Gallons (US) |
Cubic Feet | 28.3162 | Litres |
Cubic Inches | 16.3872 | Cubic Cm |
Cubic Meters | 35.314 | Cubic Feet |
Cubic Meters | 219.969 | Gallons (British) |
Cubic Meters | 264.173 | Gallons (US) |
Degrees (F) | (Fahrenheit - 32) / 1.8 | Degrees (C) |
Degrees (C) | (Celsius * 1.8) + 32 | Degrees (F) |
Feet | 0.3048 | Meters |
Feet of Water (39.2°F) | 0.8826 | Inches of Mercury (32°F) |
Feet of Water (39.2°F) | 62.427 | Pounds/sq ft |
Feet/minute | 0.0183 | Kilometres/hour |
Feet/minute | 0.005 | Meters/second |
Gallons (British) | 0.1605 | Cubic Ft |
Gallons (British) | 1.2009 | Gallons(US) |
Gallons (British) | 4.5459 | Litres |
Gallons (US) | 0.1337 | Cubic Ft |
Gallons (US) | 0.8327 | Gallons(British) |
Gallons (US) | 3.7853 | Litres |
Grams | 0.03527 | Ounces |
Horsepower | 1.014 | Cheval-Vapeur |
Horsepower | 745.7 | Watts |
Inches | 2.54 | Centimetres |
Inches of Mercury (32°F) | 0.0334 | Atmospheres |
Kilograms | 2.2046 | Pounds |
Kilometres | 3280 | Feet |
Kilometres | 0.6213 | Miles |
Kilowatts | 56.884 | BTU/Minute |
Litres | 0.0353 | Cubic Feet |
Litres | 0.2199 | Gallons (British) |
Litres | 0.2641 | Gallons (US) |
Meters | 3.2808 | Feet |
Miles | 5280 | Feet |
Miles | 1.6093 | Kilometres |
Miles | 320 | Rods |
Miles (nautical) | 6080 | Feet |
Millilitres | 0.035 | Ounces (Fluid-British) |
Millilitres | 0.0338 | Ounces (Fluid-US) |
Millimetres | 0.039 | Inches |
Ounces | 28.3495 | Grams |
Ounces (British) | 28.4130 | Cubic Cm |
Ounces (US) | 29.5737 | Cubic Cm |
Pounds | 453.5924 | Grams |
Pounds/sq ft | 0.000472 | Atmospheres |
Pounds/sq in | 0.0680 | Atmospheres |
Pounds/sq in | 2.036 | Inches of Mercury (32°F) |
Quarts (British) | 1136.521 | Cubic Cm |
Quarts (US) | 946.3586 | Cubic Cm |
Tons (long) | 1016.047 | Kilograms |
Tons (long) | 2240 | Pounds |
Tons (long) | 1.016 | Tons (metric) |
Tons (long) | 1.12 | Tons (short) |
Yards | 91.44 | Centimetres |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Transformations_Log(Natural) and Data_Transformations_Log (Base 10).
Logarithmic transformation can beused to stabilise the variance of a sample (Bland, 2000).
The function named Log (natural)calculates the natural (Naperian, log to the base e) logarithm of the data youselect.
The function named Log (base 10)calculates the common (log to the base 10) logarithm of the data you select.
The results are stored in a newcolumn that is marked Log(natural):<name> or Log(base 10):<name>where <name> is the original column label. Indeterminable values (i.e.log of any data < 0) are marked as missing data unless you opt to add asuitable constant to all data (you are prompted if you need to do this). It isbest to consult with a statistician before using constants in logtransformations.
Effects of log transformation:
·Variance stabilisation.
·Increasing slopes in x in relation to another variable arelinearized.
·Positively skewed distributions of x are normalized.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Transformations_Log(Natural) and Data_Transformations_Log (Base 10).
Logarithmic transformation can beused to stabilise the variance of a sample (Bland, 2000).
The function named Log (natural)calculates the natural (Naperian, log to the base e) logarithm of the data youselect.
The function named Log (base 10)calculates the common (log to the base 10) logarithm of the data you select.
The results are stored in a newcolumn that is marked Log(natural):<name> or Log(base 10):<name>where <name> is the original column label. Indeterminable values (i.e.log of any data < 0) are marked as missing data unless you opt to add asuitable constant to all data (you are prompted if you need to do this). It isbest to consult with a statistician before using constants in logtransformations.
Effects of log transformation:
·Variance stabilisation.
·Increasing slopes in x in relation to another variable arelinearized.
·Positively skewed distributions of x are normalized.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Transformations_Probit.
Probit is a common transformationfor linearizing sigmoid distributions of proportions (Armitage and Berry,1994). The probit is defined as 5 + the 1-p quantile from the standard normaldistribution, where p is a proportion. For example, the number of insectskilled by the log dose of an insecticide might describe a sigmoid relationship,which is a rectangular hyperbolic relationship to the non-log transformed dose.This sort of quantal response situation can be treated as a linear problemafter probit transformation. Probitanalysis uses these principles.
You can supply proportions ordiscrete data for logit transformation. If you specify discrete data thenStatsDirect converts these to proportions by taking each value as a proportionof the maximum of the supplied data. The results are stored in a new columnthat is marked Probit:<name> where <name> is the original columnlabel.
StatsDirect marks indeterminablevalues as missing data, i.e. p=0 or p=1. StatsDirect probit analysis, on theother hand, treats p=0 and p=1 ina more complex way that includes them in the overall analysis.
Copyright © 1990-2006 StatsDirect Limited, alquanxiangyun.cn/rencai/l rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Transformations_Angular.
Angular transformation can beused to linearize sigmoid distributions and equalize variances of proportions (Bland, 2000).The angle is defined as arcsin (sqr(p)), where sqr(p) is the square root of aproportion. Logit and probit transformations do not equalize variances butlogistic regression and probit analysis do give special treatment to variancesin the way that they handle expected error distributions. Angular regression isseldom used but angle transformed data can be useful for exploratory dataanalysis by experts.
You can supply proportions ordiscrete data for angular transformation. If you specify discrete data thenStatsDirect converts these to proportions by taking each value as a proportionof the maximum of the supplied data. The results are stored in a new columnthat is marked Angle:<name> where <name> is the original columnlabel.
Indeterminable values are markedas missing data.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Transformations_Cumulative.
This function takes a set of dataand returns the cumulative set in an equal column marked with the labelCumulate:<name> where <name> is the column label of the originaldata. Cumulative data need to have some meaning to the order in which they arederived.
Example:
Data A, become ---> | Cumulate:Data A |
1 | 1 |
3 | 4 |
5 | 9 |
4 | 13 |
7 | 20 |
9 | 29 |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menulocation: Data_Transformations_Ladder of Powers.
This function provides a seriesof transformations that have increasing power to pull in the right hand tail ofa distribution. The ladder consists of:
1/x² | reciprocal of square (power -2) |
1/x | reciprocal (power -1) |
ln(x) | natural logarithm |
x½ | square root (power 0.5) |
x² | square (power 2) |
For more information see "tquanxiangyun.cnransformingdata".
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_ApplyFunction.
This applies an algebraicexpression to any selection of data. The workbook column(s) you select aregiven the variable names V1, V2 ... Vn in the order that they are selected fromleft to right across the workbook. You can then create a new column as afunction of those selected. You do this by entering an algebraic expressionthat can use any of the functions, operators and constants below. For exampleV1/SQR(V2) would produce a column where each row entry was that of the firstselected column divided by the square root of the corresponding row entry inthe second selected column. A maximum of 9 variables/columns can be selected.
The same results can be achievedusing intrinsic functions within a StatsDirect workbook. Large complexfunctions may be handled faster by the intrinsicworkbook functions of StatsDirect.
Constants | |
PI | 3.14159265358979323846 (p) |
EE | 2.71828182845904523536 (e) |
Arithmetic Functions | |
ABS | absolute value |
CLOG | common (base 10) logarithm |
CEXP | anti log (base 10) |
EXP | anti log (base e) |
LOG | natural (base e, Naperian) logarithm |
LOGIT | logit: log(p/[1-p]), p=proportion |
ALOGIT | antilogit: exp(l)/[1+exp(l)], l=logit |
SQR or SQRT | square root |
! | factorial (maximum 170.569) |
LOG! | log factorial |
IZ | normal deviate for a p value |
UZ | upper tail p for a normal deviate |
LZ | lower tail p for a normal deviate |
TRUNC or FIX | integer part of a real number |
CINT | real number rounded to nearest integer |
INT | real number truncated to integer closest to zero |
Please note that the largestfactorial allowed is 170.569398315538748, but you can work with Log factorialsvia the LOG! function, e.g. LOG!(171).
Arithmetic Operators | |
^ | exponentiation (to the power of) |
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
\ | integer division |
Calculations give an order ofpriority to arithmetic operators; this must be considered when enteringexpressions. For example, the result of the expression "6 - 3/2" is4.5 and not 1.5 because division takes priority over subtraction.
Priority of arithmeticoperators in descending order
1. Exponentiation(^)
2. Negation(-X) (Exception = x^-y; i.e. 4^-2 is 0.0625 and not -16)
3. Multiplicationand Division (*, /)
4. IntegerDivision (\)
5. Additionand Subtraction (+, -)
Trigonometric Functions | |
ARCCOS | arc cosine |
ARCCOSH | arc hyperbolic cosine |
ARCCOT | arc cotangent |
ARCCOTH | arc hyperbolic cotangent |
ARCCSC | arc cosecant |
ARCCSCH | arc hyperbolic cosecant |
ARCTANH | arc hyperbolic tangent |
ARCSEC | arc secant |
ARCSECH | arc hyperbolic secant |
ARCSIN | arc sine |
ARCSINH | arc hyperbolic sine |
ATN | arc tangent |
COS | cosine |
COT | cotangent |
COTH | hyperbolic cotangent |
CSC | cosecant |
CSCH | hyperbolic cosecant |
SIN | sine |
SINH | hyperbolic sine |
SECH | hyperbolic secant |
SEC | secant |
TAN | tangent |
TANH | hyperbolic tangent |
To convert degrees to radians,multiply degrees by pi/180. To convert radians to degrees, multiply radians by180/pi.
Logical Functions | |
AND | logical AND |
NOT | logical NOT |
OR | logical OR |
< | less than |
= | equal to |
> | greater than |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Date &Time Intervals.
This function enables you tocreate a column of intervals from a column of dates or times. You must havedate or time data in your workbook to do this. The StatsDirect workbookrecognises input such as 1/3/67 or 12:55 as datesand times.
You can use this function to workout the distance of each date or time entry from a point which you specify,this point can be a date, a time or a date with a time after it. You must also specifythe interval that you want StatsDirect to calculate, this can be years, months,weeks, days, hours, minutes or seconds. Positive intervals represent dates ortimes later than the reference point which you entered and negative intervalsreflect earlier dates/times.
Example
Test workbook (Other worksheet:Event).
To working out the interval indays from a reference point of 1/4/96to the dates in the following list you can use the "Date & TimeInterval" function of the "Data" menu.
Event -----> | Event~Days from 01/04/96 |
01/03/97 | 334 |
02/05/97 | 396 |
12/12/96 | 255 |
01/11/96 | 214 |
03/01/97 | 277 |
05/05/97 | 399 |
07/08/97 | 493 |
08/10/96 | 190 |
See also Dates and Times inStatsDirect.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Rank.
This function ranks the workbookdata you select and saves the rankings into a new workbook column labelledRank: Name where Name is the column label of the original data. You cancalculate a correction factor for ties in the ranking; five formulae areoffered for tie correction:
1.S(t^3 - t /12)
2.S(t * (t-1)/2)
3.S(t * (t-1) *(2t+5))
4.S(t * (t-1) *(t-2))
5.S(t * (t-1) *(t+1))
Here t is the number of data tiedat each tie and upper case sigma (S) is the summation across these ties.
The use of tie corrections is acomplex subject upon which learned statisticians sometimes disagree.
Example
Test workbook (Nonparametricworksheet: First Born).
Ranking the following agressivityscores for a sample of firstborn twins gives.
First Born -----> | Rank: First Born |
86 | 8 |
71 | 3.5 |
77 | 6.5 |
68 | 1 |
91 | 11.5 |
72 | 5 |
77 | 6.5 |
91 | 11.5 |
70 | 2 |
71 | 3.5 |
88 | 10 |
87 | 9 |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_NormalScores.
This function saves the normalscores of workbook data you select into a new workbook column marked Nml Score:Name, where Name is the column label of the original data.
Three different methods forcalculating normal scores are provided:
1. vander Waerden's method (Conover, 1999):
- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size and F(p) is the pth quantile from the standardnormal distribution.
2. Blom's method (Altman, 1991):
- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size and F(p) is the pth quantile from the standardnormal distribution.
3. Expectednormal order scores (David, 1981;Royston, 1982; Harter, 1961)
- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size, f(p) is the standard normal density for p and F(p) is the pthquantile from the standard normal distribution. The solution is found bynumerical integration. Calculation of expected normal order scores is notpractical for very large samples, n of 2500 is the maximum permitted inStatsDirect.
Example
Test workbook (Nonparametricworksheet: First Born).
Scoring the following agressivityscores for a sample of firstborn twins using the first method above gives:
First Born -----> | Nml Score(vdW): First Born |
86 | 0.2934 |
71 | -0.6151 |
77 | 0 |
68 | -1.4261 |
91 | 1.1984 |
72 | -0.2933 |
77 | 0 |
91 | 1.1984 |
70 | -1.0201 |
71 | -0.6151 |
88 | 0.7363 |
87 | 0.5024 |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Standardization
This function saves standardizedvalues of workbook data you select into a new workbook column marked Std.(Method): Name, where Name is the column label of the original data and Methodis one of the methods listed below.
Three different methods forstandardization are provided:
1. (x -mean) / standard deviation
2. x /standard deviation
3. x -mean
Example
Test workbook (Nonparametricworksheet: First Born).
Standardizing the followingagressivity scores for a sample of firstborn twins using the first method abovegives.
First Born ---> | Std ((x-mean)/SD): First Born |
86 | 0.7782 |
71 | -0.9095 |
77 | -0.2344 |
68 | -1.2470 |
91 | 1.3408 |
72 | -0.7970 |
77 | -0.2344 |
91 | 1.3408 |
70 | -1.0220 |
71 | -0.9095 |
88 | 1.0033 |
87 | 0.8907 |
Menu location: Data_Sort_CreateSorted Columns.
StatsDirect enables you to createsorted copies of workbook columns or to manipulate data within a selected datarange of a workbook. This section deals with creating new sorted workbookcolumns based on existing ones. See also workbooksort.
This function sorts workbook datayou select and saves the sorted data into a new workbook column marked Sort:Name, where Name is the column label of the original data. Sorting may beascending or descending. The sort may also be tied to other data, i.e. the datain column b may be sorted in the in the order of sorting the data in column a.Tied sorting can be repeated for any number of columns.
Example 1
Test workbook (Nonparametricworksheet: First Born).
Sorting the following agressivityscores for a sample of firstborn twins in ascending order gives.
First Born -----> | Sort: First Born |
86 | 68 |
71 | 70 |
77 | 71 |
68 | 71 |
91 | 72 |
72 | 77 |
77 | 77 |
91 | 86 |
70 | 87 |
71 | 88 |
88 | 91 |
87 | 91 |
Example 2
Test workbook (Nonparametricworksheet: First Born, Second Born).
Sorting the following agressivityscores for a sample of second born twins by the ascending order of the scoresfor firstborn twins gives.
First Born | Second Born -----> | Sr~Second Born~First Born |
86 | 88 | 64 |
71 | 77 | 65 |
77 | 76 | 80 |
68 | 64 | 77 |
91 | 96 | 72 |
72 | 72 | 76 |
77 | 65 | 65 |
91 | 90 | 88 |
70 | 65 | 72 |
71 | 80 | 81 |
88 | 81 | 96 |
87 | 72 | 90 |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Sort_ManipulateWorksheet.
StatsDirect enables you to createsorted copies of worksheet columns or to manipulate data within a selected datarange of a worksheet. This section deals with sorting data within existingranges of data that you select in a worksheet. See also sortingcolumns.
PLEASE NOTE that for thisfunction you should not select columns by clicking on the headers of thecolumns. Select only the data ranges that you want to sort by dragging themouse across them with the left hand mouse button held down.
This function manipulatesselected workbook data ranges by sorting the data. Sorting may be ascending ordescending and either up/down columns (by row) or from side to side (bycolumn). You select a key which is the cell where you want the sorting to beginand in the row or column upon which your sort order is based. Selecting morethan one key enables you to perform a nested sort, i.e. sort on key one (saythe data in column x) then if there are repeats of the same value in column xlook in the key two column (say column y) to order the repeated valuesaccording to the order of the data in this secondary key and so on.
Example 1
Sorting the following data rangesascending order by column with key as the first data cell (at value 86) gives:
86 ------> | 68 |
71 | 70 |
77 | 71 |
68 | 71 |
91 | 72 |
72 | 77 |
77 | 77 |
91 | 86 |
70 | 87 |
71 | 88 |
88 | 91 |
87 | 91 |
Example 2
Sorting the following data rangesby the ascending order of the data in the first column by row with key as thefirst data cell in the first column (86) gives:
86 | 88 ------> | 68 | 64 |
71 | 77 | 70 | 65 |
77 | 76 | 71 | 80 |
68 | 64 | 71 | 77 |
91 | 96 | 72 | 72 |
72 | 72 | 77 | 76 |
77 | 65 | 77 | 65 |
91 | 90 | 86 | 88 |
70 | 65 | 87 | 72 |
71 | 80 | 88 | 81 |
88 | 81 | 91 | 96 |
87 | 72 | 1 | 90 |
·Pairwisedifferences
·Pairwisemeans
·Pairwiseslopes
Menu location: Data_Pairwise.
These functions providetransformations based on pairwise calculations either between columns or withina column, they often provide intermediate steps in non-parametric methods.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Pairwise_Differences.
This function creates a column ofall possible (n*n) pairwise differences from a pair of columns:
X | Y |
1 | 2 |
2 | 3 |
3 | 6 |
Differences (X, Y)
-1
-2
-5
0
-1
-4
1
0
-3
All possible (n * n) Xi...(i=1 tok) are compared with Yj...(j=1 to k).
These contrasts form intermediatesteps in a number of non-parametric statistical methods.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Pairwise_Means.
This function creates a column ofall possible (n(n+1)/2) pairwise means within a column:
X
2
3
6
Means (Within X)
2
2.5
4
3
4.5
6
All possible (n(n+1)/2) Xi...(i=1to n) are averaged with Xj...(j=i to n).
These values form intermediatesteps in a number of non-parametric statistical methods.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Pairwise_Slopes.
This function creates a column ofall (n(n-1)/2) pairwise slopes from a pair of columns (Y the response variableand X the predictor variable):
X | Y |
1 | 2 |
2 | 3 |
3 | 6 |
Slopes (X, Y)
1
2
3
All possible (n(n-1)/2)XiYi...(i=1 to n) are compared with XjYj...(j=i to n) where i<>j. Slopeis (Yi-Yj)/(Xi-Xj).
These values form intermediatesteps in a number of non-parametric statistical methods. For example, innon-parametric linear regression (Theil type: Conover, 1999)the median slope would be 2 in theusual Y = bX +C, where C is the median intercept (i.e. median[Y - median b *X]).
StatsDirect will calculate themedian slope and its confidence interval for you if you select this option whenprompted. The results are stored in the column label of the relevant list ofpairwise slopes. The confidence interval is calculated using the distributionof Kendall's T as described in Conover (Conover, 1999).Please note that the results may differ slightly from Conover's text becauseStatsDirect calculates the inverse of Kendall'sT more accurately than the routines used to calculate Best's widely quoted 1974table for this statistic.
A fuller non-parametriclinear regression function is given in the non-parametric methods andregression sections.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Text toNumbers.
This function codes text asnumbers to enable analysis. An example of where you might want to do this is aworkbook that has been created from a database where certain categories areentered as text:
Category | Data |
TA | 2.3 |
TA | 3.4 |
TB | 5.6 |
TB | 4.5 |
TB | 4.9 |
TA | 3.9 |
TB | 3.9 |
If you enter these data into twocolumns in a StatsDirect workbook, you can convert the Category column tonumbers using the Text to Numbers function from the Data menu. If you selectthe whole of this column by clicking on the header, the first row will have theword "Category" in it. To avoid counting "Category" as oneof your categories when it is really the column title, click on "No"when you are asked if the top row is to be included in the conversion. Theresults will be as follows:
Category | Data |
1 | 2.3 |
1 | 3.4 |
2 | 5.6 |
2 | 4.5 |
2 | 4.9 |
1 | 3.9 |
2 | 3.9 |
StatsDirect can code textautomatically as in the example above. You may also specify your own numericcode for each different text string by entering any valid number alongside thematching text in a substitution table. The substitution table only appears ifyou click "Yes" when StatsDirect prompts you to specify numbers.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_RandomNumbers.
This function enables you tocreate one or more series of random numbers from given distributions.
A robust generatorof uniform (pseudo)random numbers is used as the basis for generating deviatesfrom the probability distributions described below. You are given theopportunity to enter your own seed number to be used by the random numbergenerator but you should use the default seed (based upon your computer'sclock) in most cases. Please note that each seed generates its own series andthat series is the same if you use the seed again. You have very little chanceof using the same seed twice if you select the seed that StatsDirect suggests.
These functions are intended forsimulation work; they employ widely cited and debated algorithms (Gentle, 2003).
Uniform (continuousuniform, rectangular)
The continuous uniformdistribution has a constant density function of the interval (a, b) and thus arectangular shape from a to b:
- where a and b lie between minusand plus infinity.
The interval most commonly usedis 0 to 1
Normal (Gaussian)
The commonly used standard normaldistribution (mean of 0 and standard deviation of 1) is one of a family ofnormal distributions defined by the density function:
- where mean m lies betweenminus and plus infinity and standard deviation s is greater than zero.
Algorithm: inversion of thecumulative distribution function (Wichura, 1988;Gentle 2003)
Lognormal
The density of a lognormaldistribution is given by:
- where mean m lies betweenminus and plus infinity and standard deviation s is greater than zero.
The mean of the lognormaldistribution, as opposed to the mean of the underlying normal distribution, isequal to exp(m+s*s/2) and the variance is equal to exp(2m+2s*s)-exp(2m+ s*s).
Algorithm: transformed inversionof the cumulative distribution function (Wichura, 1988;Gentle 2003)
Exponential
The (negative) exponentialdistribution is a special case (shaping parameter of 1) of the gammadistribution. Its density is given by:
- where the parameter l must be greaterthan zero.
Algorithm: transformation (Ahrens & Dieter,1972).
Gamma
The density function of the gammadistribution is given by:
- where the parameters l and r (shapingparameter) must be greater than zero.
G(*) is the gamma function:
,x>0
Please note that gamma deviateswith a shaping parameter of 0.5 are half the square of normal deviates andgamma deviates with a shaping parameter of 1 are exponential deviates.
Algorithm: acceptance-rejectionmethods GD and GS (Ahrens& Dieter, 1974, 1982b).
Binomial
The density function of the binomialdistribution is given by:
- where p lies between 0 and 1 in n ranges.
Algorithm: acceptance-rejectionmethod BTPEC (Kachitvichyanukul& Schmeiser, 1988).
Poisson
The density function of the Poissondistribution is given by:
- where parameter l is greater thanzero.
Algorithm: acceptance-rejection (Ahrens & Dieter,1982a).
Chi-square
The density function of the chi-squaredistribution is given by:
The deviates are calculated asgamma deviates with parameters n/2 and 2, where n is degrees of freedom.
Algorithm: Transformed gammadeviates (Ahrens& Dieter, 1974,1982b; Gentle 2003).
F (variance ratio)
The density function of the F distribution is given by:
The deviates are calculated asnx/dz where n is the numerator degrees of freedom, d is the denominator degreesof freedom, x is a gamma deviate with parameters n/2 and 2 (chi-square with ndegrees of freedom), and z is a gamma deviate with parameters d/2 and 2(chi-square with d degrees of freedom).
Algorithm: Transformed gammadeviates (Ahrens& Dieter, 1974, 1982b; Gentle 2003).
Student's t
The density function of Student's tdistribution is given by:
The deviates are calculated as astandard normal deviate multiplied by the square root of the degrees of freedom(n) divided by a gamma deviate with parameters n/2 and 2 (a chi-square deviatewith n degrees of freedom).
Algorithm: Transformed standardnormal and gamma deviates (Ahrens & Dieter,1974, 1982b; Gentle 2003).
Beta
The density function of the beta distributionis given by:
Algorithm: Acceptance-rejectionmethods BB and BC (Cheng1978).
Logistic
The density function of the betadistribution is given by:
Algorithm: Transformed uniformdeviates (Gentle2003).
Cauchy
Algorithm: Transformed uniformdeviates (Gentle2003).
Weibull
Algorithm: Transformed uniformdeviates (Gentle2003).
Geometric
Algorithm: Transformed Poissonand exponential deviates (Devroye, 1986;Gentle 2003; Ahrens & Dieter, 1972, 1982a).
Negative binomial
Algorithm: Transformed Poissonand gamma deviates (Devroye, 1986;Gentle 2003; Ahrens & Dieter, 1974, 1982b, 1982a).
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_RotateData Block.
This function rotates a selectedblock of data from a StatsDirect workbook in the following way:
Example
1.. | 2.. | 3.. | |
1 | 1.1 | 0.7 | 1 |
2 | 1.5 | 0.6 | 2 |
3 | 1.6 | 0.6 | 3 |
4 | 1.8 | 0.5 | 4 |
...this would become:
1.. | 2.. | 3.. | 4.. | |
1 | 1.1 | 1.5 | 1.6 | 1.8 |
2 | 0.7 | 0.6 | 0.6 | 0.5 |
3 | 1 | 2 | 3 | 4 |
... in other words rows becomecolumns and columns become rows.
The resulting rotation is bestsaved to a new workbook.
You are given the option to keepexisting formulae or to translate them into results before rotation. Pleasenote that if you keep formulae and save your rotated data to another worksheet,the references in the formulae are likely to be incorrect.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_DummyVariables
This function creates dummy (ordesign) variables from one categorical variable.
The reference cell coding modelis used (Kleinbaumet al., 1998):
- the source data may benumerical or text, representing categories. The coding scheme shown above isapplied to your data in reverse alphanumeric order for the k categoriesfound, so for three categories, say race equal to black, white or other, white(being the last in an alphabetical sorting) is coded 1,0,0 which reduces todummy variables X (3) = 1, X (2) = 0.
In order to represent acategorical variable with more than two levels in a regression model you maywish to convert it to a series of dummy variables using this function.
Say a linear regression model isspecified with three predictors; the first and third predictors are continuousdata, and the second predictor is a classifier (categorical data) with threelevels. The second predictor should be converted to two dummy dichotomousvariables (e.g. the example below) and put into a multiple linear regression astwo predictors.
The naming scheme for dummyvariables is the original variable name suffixed with (1) if there are only twocategories, or suffixed with (j+1) where there are j+1 categories giving riseto j dummy variables.
In general form, a regressionmodel where the jth predictor variable is a classifier with klevels can be interpreted as follows, provided the jth variable isconverted to dummy variables:
- where Y is the outcomevariable, b is a regression coefficient, D is a dummy variablefor a classifier variable of k levels and x is a non-classifierpredictor variable.
Example
Group ID ---> | Group ID (2) | Group ID (3) |
1 | 0 | 0 |
1 | 0 | 0 |
1 | 0 | 0 |
1 | 0 | 0 |
2 | 1 | 0 |
2 | 1 | 0 |
2 | 1 | 0 |
2 | 1 | 0 |
3 | 0 | 1 |
3 | 0 | 1 |
3 | 0 | 1 |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Search& Replace.
This function searches the activeworksheet for data that match the conditions you specify. When the condition,for example data > 20, is found then you have the option to replace orsimply count the data that match this condition.
Note that you can use the replacefunction to remove cells or rows from a worksheet by leaving the "replacewith" box blank.
Example
The following are data importedfrom an application which uses -99 as the missing data value:
12 |
1 |
2 |
3 |
4 |
-99 |
4 |
2 |
1 |
-99 |
-99 |
To replace -99 with StatsDirect missing data youcan use the search function with the "search for x" condition set toequal to -99 and the "replace with" entry set to *. To run thisexample, enter the above data into a workbook column and selectthat column. Then select search from the data menu. Check the "equalto" option, enter -99 inthe "search for x" box and enter * in the "replace with"box. When you click on the OK button the -99 values will be replaced with * andthese will also be displayed as an asterisk (missing data) in the workbook.
The match expression option ofthe StatsDirect search form enables you to select data for an expression whichevaluates as true. You use x to denote the data to match, i.e. x ³ 10 and x £ 20 matches allselected data between 10 and 20 inclusive, you can count or replace the matchesby clicking the relevant button. You can compose an expression using x and anyof the functions, operators and logical expressions below:
Constants | |
PI | 3.14159265358979323846 (p) |
EE | 2.71828182845904523536 (e) |
Arithmetic Functions | |
ABS | absolute value |
CLOG | common (base 10) logarithm |
CEXP | anti log (base 10) |
EXP | anti log (base e) |
LOG or LN | natural (base e, Naperian) logarithm |
LOGIT | logit: log(p/[1-p]), p=proportion |
ALOGIT | antilogit: exp(l)/[1+exp(l)], l=logit |
SQR or SQRT | square root |
! | factorial (maximum 170.569) |
LOG! | log factorial |
IZ | normal deviate for a p value |
UZ | upper tail p for a normal deviate |
LZ | lower tail p for a normal deviate |
TRUNC or FIX | integer part of a real number |
CINT | real number rounded to nearest integer |
INT | real number truncated to integer closest to zero |
Please note that the largestfactorial allowed is 170.569398315538748, but you can work with Log factorialsvia the LOG! function, e.g. LOG!(171).
Arithmetic Operators | |
^ | exponentiation (to the power of) |
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
\ | integer division |
Calculations give an order ofpriority to arithmetic operators, this must be considered when enteringexpressions. For example, the result of the expression "6 - 3/2" is4.5 and not 1.5 because division takes priority over subtraction.
Priority of arithmetic operators in descending order | |
1. | Exponentiation (^) |
2. | Negation (-X) (Exception = x^-y; i.e. 4^-2 is 0.0625 and not -16) |
3. | Multiplication and Division (*, /) |
4. | Integer Division (\) |
5. | Addition and Subtraction (+, -) |
Trigonometric Functions | |
ARCCOS | arc cosine |
ARCCOSH | arc hyperbolic cosine |
ARCCOT | arc cotangent |
ARCCOTH | arc hyperbolic cotangent |
ARCCSC | arc cosecant |
ARCCSCH | arc hyperbolic cosecant |
ARCTANH | arc hyperbolic tangent |
ARCSEC | arc secant |
ARCSECH | arc hyperbolic secant |
ARCSIN | arc sine |
ARCSINH | arc hyperbolic sine |
ATN | arc tangent |
COS | cosine |
COT | cotangent |
COTH | hyperbolic cotangent |
CSC | cosecant |
CSCH | hyperbolic cosecant |
SIN | sine |
SINH | hyperbolic sine |
SECH | hyperbolic secant |
SEC | secant |
TAN | tangent |
TANH | hyperbolic tangent |
To convert degrees to radians,multiply degrees by pi/180. To convert radians to degrees, multiply radians by180/pi.
Logical Functions | |
AND | logical AND |
NOT | logical NOT |
OR | logical OR |
< | less than |
= | equal to |
> | greater than |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_ClearMissing Data
This function enables you toclear missing data values from a selection of cells, and to optionally preservethe matching of rows in that selection.
For example, you might want toignore incomplete records. Here, each row is a record for one individual, soyou would normally choose to preserve the matching of rows when prompted:
Sex | Age | Score |
1 | 23 | 2.3 |
1 | 33 | 3.1 |
1 | 19 | |
2 | 21 | 1.9 |
2 | 43 | |
2 | 39 | 5.0 |
2 | 26 | 1.1 |
Clearing missing data with therow matching option ON would produce:
Sex | Age | Score |
1 | 23 | 2.3 |
1 | 33 | 3.1 |
2 | 21 | 1.9 |
2 | 39 | 5.0 |
2 | 26 | 1.1 |
…and clearing missing data withthe row matching option OFF would produce:
Sex | Age | Score |
1 | 23 | 2.3 |
1 | 33 | 3.1 |
1 | 19 | 1.9 |
2 | 21 | 5.0 |
2 | 43 | 1.1 |
2 | 39 | |
2 | 26 |
StatsDirect treats the followingnumber, characters and text as missing data:
3E+300
*
.
missing
<just spaces>
You may additionally definemissing numbers or text codes when prompted. This is often useful when you havebeen given data with missing values coded in a specific way, e.g. a score of -1 in the example above.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Grouping_Categorise.
This function enables you tocategorise any set of data into groups that you specify, for example ages intoage groups.
Typically, a continuousvariable might be divided into categories or groups. Take the IgM variable inthe parametric sheet of the test workbook for example; this has 298observations which you might want to summarise in ranges of values. In order todo this, simply select the Data_Grouping_Categorise menu item then select theIgM column of data. You are presented with different ways to group your datainto bins (intervals) of counts:
·Quartiles
4 bins (< lower quartile; lower quartile to median; median to upperquartile; >= upper quartile)
·Quintiles
5 bins (< first quintile…; >= fourth quintile)
·Deciles
10 bins (<first decile…; >=ninth decile)
·Age groups
one of four common groupings:
<15; 15-19… five yearly bands to 85+
<15; 15-24… ten yearly bands to 85+
<1; 1-4 … five yearly bands to 85+
<1; 1-4… ten yearly bands to 75+
·User-defined
from minimum min, in k intervals of equal size = step:
<min+1*step; >= min+1*step to < min+2*step… in k intervals to >=min+ k*step
Using the IgM example inquartiles:
category | count |
< 0.5 | 56 |
>= 0.5; < 0.7 | 67 |
>= 0.7; < 1 | 98 |
>= 1 | 77 |
Using the IgM example in 10 intervals of 0.5 from:
category | count |
< 0.5 | 56 |
>= 0.5; < 1 | 165 |
>= 1; < 1.5 | 54 |
>= 1.5; < 2 | 14 |
>= 2; < 2.5 | 6 |
>= 2.5; < 3 | 2 |
>= 3; < 3.5 | 0 |
>= 3.5; < 4 | 0 |
>= 4; < 4.5 | 0 |
>= 4.5 | 1 |
A quick look at the counts aboveshows a similar picture to that you would see from a histogram,namely that the data are not evenly spreadinto ranges of values, i.e. they are skewed.The text-basedhistogram will give you counts, but note that the bin values in a histogramare the mid-point of the bin and not the cut-off value between bins, i.e. theyare the same as a user-defined bin cut-off values minus half of the step size.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Grouping_Extract.
This function enables you toextract subsets of data from a column of numbers, based upon an expression thatrefers to indicators in related rows of worksheet.
For example, you might want toselect only the rows from a score data column for which sex=2 and age>30;i.e. extract into one new variable, all data for females over 30:
Sex | Age | Score |
1 | 23 | 2.3 |
1 | 33 | 3.1 |
1 | 19 | 3.2 |
2 | 21 | 1.9 |
2 | 43 | 4.2 |
2 | 39 | 5.0 |
2 | 26 | 1.1 |
Use the Data_Grouping_Extractmenu item with Score as the data variable and Sex and Age and the indicatorvariables to create a new variable with the expression "v1=1 andv2>30":
Score [Sex=2 AND Age>30]
4.2
5
You can compose an expressionusing v1, v2 etc. and any of the functions, operators and logical expressionsdescribed under search andreplace. Here are some examples of expressions for the example of the twoindicators mentioned above:
V1: Sex
V2 Age
"v1=1" (not all of theselected/listed indicator variables have to be used)
"v2<>21 andv2>16"
"v1=2 or v2>25"
"(v1=1 and v2<=25) or(v1=2 and v2<=30)"
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Grouping_Split.
This function splits a datacolumn into other data columns according to one or more group identifier columns.The split data are put into new workbook columns that are marked Name~G=m,Name~G=f etc, where Name is the original combined data column label and G=xrepresent the group identifiers that you selected. It is wise to keepidentifier schemes simple.
Example
Gp | Data ---> | Data~Gp=1 | Data~Gp=2 | Data~Gp=3 |
1 | 1.1 | 1.1 | 0.7 | 1.9 |
1 | 1.3 | 1.3 | 1.0 | 2.2 |
1 | 0.9 | 0.9 | 0.6 | 1.7 |
2 | 0.7 | 1.5 | 1.1 | |
2 | 1.0 | 1.3 | ||
3 | 1.9 | |||
1 | 1.5 | |||
1 | 1.3 | |||
2 | 0.6 | |||
3 | 2.2 | |||
3 | 1.7 | |||
2 | 1.1 |
Some other statistical packagesrequire group identifier columns for group comparison tests, where all data areexpected to be in a single column. StatsDirect does not do this for simplegroup comparison methods because separate data columns aid comprehension of theanalysis. If you need to share data between StatsDirect and statisticalpackages that require group identifiers then use this function and the combinegroups function to prepare data prior to import/export.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Grouping_Combine.
This function combines data fromthe workbook columns you select and puts them into a single new data columnwith a group identifier column alongside it.
Example
Data~G1 | Data~G2 | Data~G3 ---> | Group ID | Data |
1.1 | 0.7 | 1.9 | Data~G1 | 1.1 |
1.3 | 1.0 | 2.2 | Data~G1 | 1.3 |
0.9 | 0.6 | 1.7 | Data~G1 | 0.9 |
1.5 | 1.1 | Data~G1 | 1.5 | |
1.3 | Data~G1 | 1.3 | ||
Data~G2 | 0.7 | |||
Data~G2 | 1.0 | |||
Data~G2 | 0.6 | |||
Data~G2 | 1.1 | |||
Data~G1 | 1.9 | |||
Data~G1 | 2.2 | |||
Data~G1 | 1.7 |
Some other statistical packagesrequire group identifier columns for group comparison tests, where all data areexpected to be in a single column. StatsDirect does not do this for simplegroup comparison methods because separate data columns aid comprehension of theanalysis. If you need to share data between StatsDirect and statisticalpackages that require group identifiers then use this function and the groupsplit function to prepare data prior to import/export.
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Grouping_Tabulate.
This a simple two way crosstabulation function. If you have two columns of numbers that correspond todifferent classifications of the same individuals then you can use thisfunction to give a two way frequency table for the cross classification.
If you need to stratify by athird classification variable, or you want to perform analyses straight away onyou cross tabulation, then please see the Analysis_Crosstabsfunction instead.
Example
A database of test scores containstwo fields of interest, sex (M=1, F=0) and grade of skin reaction to an antigen(none = 0, weak + = 1, strong + = 2). Here is a list of those fields for 10patients:
Sex | Reaction |
0 | 0 |
1 | 1 |
1 | 2 |
0 | 2 |
1 | 2 |
0 | 1 |
0 | 0 |
0 | 1 |
1 | 2 |
1 | 0 |
In order to get a crosstabulation of these from StatsDirect you should enter these data in twoworkbook columns. Then choose Tabulate from the Grouping section of the Datamenu.
For this example:
Reaction:0 | reaction:1 | reaction:2 | |
0 | 1 | 2 | |
sex:0 | 2 | 2 | 1 |
sex:1 | 1 | 1 | 3 |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Data_Grouping_Detabulate.
This a function to expand a twoway table of counts into a column indicator variable and a row indicatorvariable, i.e. to deconstruct a table back into individual level data. This isthe opposite of crosstabs or the tabulatefunction.
Example
Given the following table:
reaction:0 | reaction:1 | reaction:2 | |
sex:0 | 2 | 2 | 1 |
sex:1 | 1 | 1 | 3 |
You can use theData_Grouping_Detabulate menu item on the two by three table of counts to getback to individual level data with each row representing the classification ofa subject as follows:
Row Classifier | Column Classifier |
0 | 0 |
1 | 1 |
1 | 2 |
0 | 2 |
1 | 2 |
0 | 1 |
0 | 0 |
0 | 1 |
1 | 2 |
1 | 0 |
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Download a free 10 dayStatsDirect trial
Menu location: Analysis_Crosstabs.
This a two or three way crosstabulation function. If you have two columns of numbers that correspond todifferent classifications of the same individuals then you can use thisfunction to give a two way frequency table for the cross classification. Thiscan be stratified by a third classification variable.
For two way crosstabs,StatsDirect offers a range of analyses appropriate to the dimensions of thecontingency table. For more information see chi-squaretests and exacttests.
For three way crosstabs, StatsDirectoffers either oddsratio (for case-control studies) or relative risk(for cohort studies) meta-analyses for 2 by 2 by k tables, and generalisedCochran-Mantel-Haenszel tests for r by c by k tables.
Example
A database of test scorescontains two fields of interest, sex (M=1, F=0) and grade of skin reaction toan antigen (none = 0, weak + = 1, strong + = 2). Here is a list of those fieldsfor 10 patients:
Sex | Reaction |
0 | 0 |
1 | 1 |
1 | 2 |
0 | 2 |
1 | 2 |
0 | 1 |
0 | 0 |
0 | 1 |
1 | 2 |
1 | 0 |
In order to get a cross tabulationof these from StatsDirect you should enter these data in two workbook columns.Then choose crosstabs from the analysis menu.
For this example:
Reaction | ||||
0 | 1 | 2 | ||
Sex | 0 | 2 | 2 | 1 |
1 | 1 | 1 | 3 |
We could then proceed to an r byc (2 by 3) contingencytable analysis to look for association between sex and reaction to thisantigen:
Contingency table analysis
Observed | 2 | 2 | 1 | 5 |
% of row | 40% | 40% | 20% | |
% of col | 66.67% | 66.67% | 25% | 50% |
Observed | 1 | 1 | 3 | 5 |
% of row | 20% | 20% | 60% | |
% of col | 33.33% | 33.33% | 75% | 50% |
Total | 3 | 3 | 4 | 10 |
% of n | 30% | 30% | 40% |
TOTAL number of cells = 6
WARNING: 6 out of 6 cells haveEXPECTATION < 5
NOMINAL INDEPENDENCE
Chi-square = 1.666667, DF = 2, P= 0.4346
G-square = 1.726092, DF = 2, P =0.4219
Fisher-Freeman-Halton exact P =0.5714
ANOVA
Chi-square for equality of meancolumn scores = 1.5
DF = 2, P = 0.4724
LINEAR TREND
Sample correlation (r) = 0.361158
Chi-square for linear trend (M²)= 1.173913
DF = 1, P = 0.2786
NOMINAL ASSOCIATION
Phi = 0.408248
Pearson's contingency = 0.377964
Cramér's V = 0.408248
ORDINAL
Goodman-Kruskal gamma = 0.555556
Approximate test of gamma = 0: SE= 0.384107, P = 0.1481, 95% CI = -0.197281 to 1.308392
Approximate test of independence:SE = 0.437445, P = 0.2041, 95% CI = -0.301821 to 1.412932
Kendall tau-b = 0.348155
Approximate test of tau-b = 0: SE= 0.275596, P = 0.2065, 95% CI = -0.192002 to 0.888313
Approximate test of independence:SE = 0.274138, P = 0.2041, 95% CI = -0.189145 to 0.885455