parallelplot
Create a parallel coordinates plot from a table of medical patient data.
Load the patients data set, and create a table from a subset of the variables loaded into the workspace. Create a parallel coordinates plot using the table. The lines in the plot correspond to individual patients. Use the plot to observe trends in the data. For example, the plot indicates that smokers tend to have higher blood pressure values (both diastolic and systolic).
load patients
tbl = table(Diastolic,Smoker,Systolic);
p = parallelplot(tbl)
fig2plotly()
p =
  ParallelCoordinatesPlot with properties:
            SourceTable: [100x3 table]
    CoordinateVariables: {'Diastolic'  'Smoker'  'Systolic'}
          GroupVariable: ''
By default, the software randomly jitters plot lines so that they are unlikely to overlap perfectly along coordinate rulers. This jittering is particularly helpful for visualizing categorical data because it enables you to distinguish between plot lines more easily. For example, observe the plot lines along the Smoker coordinate ruler; the plot lines are not flush with either the true or false tick marks.
To disable the default jittering, set the Jitter property to 0.
p.Jitter = 0;
fig2plotly()
Create a parallel coordinates plot from a table of tsunami data. Specify the table variables to display and their order, and group the lines in the plot according to one of the variables.
Read the tsunami data into the workspace as a table.
tsunamis = readtable('tsunamis.xlsx');
Create a parallel coordinates plot using a subset of the variables in the table. First, increase the figure window size to prevent overcrowding in the plot. Then, to specify the variables and their order, use the 'CoordinateVariables' name-value pair argument. To group occurrences according to their validity, set the 'GroupVariable' name-value pair argument to 'Validity'. The lines in the plot correspond to individual tsunami occurrences. The plot indicates that most of the occurrences in the data set that have a Validity value are considered definite tsunamis.
figure('Units','normalized','Position',[0.3 0.3 0.45 0.4])
coordvars = {'Year','Validity','Cause','Country'};
p = parallelplot(tsunamis,'CoordinateVariables',coordvars,'GroupVariable','Validity');
fig2plotly()
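The observation that most occurrences with a Validity value are definite tsunamis can also be checked by counting categories. The following is a minimal sketch on a small hypothetical cell array that stands in for tsunamis.Validity; the labels and counts are illustrative assumptions, not values taken from tsunamis.xlsx.

```matlab
% Hypothetical stand-in for tsunamis.Validity (labels are assumptions).
% The empty entry mimics an occurrence that has no Validity value.
validity = {'definite tsunami'; 'questionable tsunami'; 'definite tsunami'; ...
    ''; 'probable tsunami'; 'definite tsunami'};

% Empty character vectors become <undefined> and are not counted.
validityCat = categorical(validity);
names = categories(validityCat);    % category names, sorted alphabetically
counts = countcats(validityCat);    % number of occurrences per category

% Pick the category with the largest count.
[~,idx] = max(counts);
mostCommon = names{idx};
disp(mostCommon)
```

With the real table, the same steps would apply to categorical(tsunamis.Validity).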
Create a parallel coordinates plot from a matrix containing medical patient data. Bin the values in one of the columns in the matrix, and group the lines in the plot using the binned values.
Load the patients data set, and create a matrix from the Age, Height, and Weight values. Create a parallel coordinates plot using the matrix data. Label the coordinate variables in the plot. The lines in the plot correspond to individual patients.
load patients
X = [Age Height Weight];
p = parallelplot(X)
fig2plotly()
p =
  ParallelCoordinatesPlot with properties:
              Data: [100x3 double]
    CoordinateData: [1 2 3]
         GroupData: []
p.CoordinateTickLabels = {'Age (years)','Height (inches)','Weight (pounds)'};
fig2plotly()
Create a new categorical variable that groups each patient into one of three categories: short, average, or tall. Set the bin edges such that they include the minimum and maximum Height values.
min(Height)
ans = 60
max(Height)
ans = 72
binEdges = [60 64 68 72];
bins = {'short','average','tall'};
groupHeight = discretize(Height,binEdges,'categorical',bins);
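To see how values that fall exactly on a bin edge are assigned, the sketch below applies the same edges and category names to a short hypothetical vector of heights (sampleHeight is an assumption made for illustration). By default, discretize includes the left edge of each bin, and the last bin also includes its right edge, which is why the edges must span the minimum and maximum Height values.

```matlab
% Hypothetical heights (inches) placed on and between the bin edges.
sampleHeight = [60; 63.9; 64; 67; 68; 72];

binEdges = [60 64 68 72];
bins = {'short','average','tall'};

% Bins are [60,64), [64,68), and [68,72]; the last bin is closed on both sides.
sampleGroup = discretize(sampleHeight,binEdges,'categorical',bins);
disp(sampleGroup)

% Number of sample heights that landed in each category.
sampleCounts = countcats(sampleGroup);
```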
Now use the groupHeight values to group the lines in the parallel coordinates plot. The plot indicates that short patients tend to weigh less than tall patients.
p.GroupData = groupHeight;
fig2plotly()
Create parallel coordinates plots from a matrix containing medical patient data. For each plot, specify the columns of the matrix to display, and group the lines in the plot according to a separate variable.
Load the patients data set, and create a matrix from some of the variables loaded into the workspace.
load patients
X = [Age Height Weight];
Create a parallel coordinates plot using a subset of the columns in the matrix X. To specify the columns and their order, use the 'CoordinateData' name-value pair argument. Group patients according to their smoker status by passing the Smoker values to the 'GroupData' name-value pair argument. The lines in the plot correspond to individual patients. The plot indicates that no clear relationship exists between smoker status and either age or weight.
coorddata = [1 3];
p = parallelplot(X,'CoordinateData',coorddata,'GroupData',Smoker)
fig2plotly()
p =
  ParallelCoordinatesPlot with properties:
              Data: [100x3 double]
    CoordinateData: [1 3]
         GroupData: [100x1 logical]
p.CoordinateTickLabels = {'Age','Weight'};
fig2plotly()
Create another parallel coordinates plot using a different subset of the columns in X. Group the patients according to their gender. The plot indicates that the men tend to be taller and to weigh more than the women.
coorddata2 = [2 3];
p2 = parallelplot(X,'CoordinateData',coorddata2,'GroupData',Gender)
fig2plotly()
p2 =
  ParallelCoordinatesPlot with properties:
              Data: [100x3 double]
    CoordinateData: [2 3]
         GroupData: {100x1 cell}
p2.CoordinateTickLabels = {'Height','Weight'};
fig2plotly()
Create a parallel coordinates plot from a table of power outage data. Change the normalization method for the numeric coordinate variables.
Read the power outage data into the workspace as a table. Display the first few rows of the table.
outages = readtable('outages.csv');
head(outages)
ans=8×6 table
       Region          OutageTime        Loss     Customers     RestorationTime            Cause
    _____________    ________________    ______    __________    ________________    ___________________
    {'SouthWest'}    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    {'winter storm'   }
    {'SouthEast'}    2003-01-23 00:49    530.14    2.1204e+05                 NaT    {'winter storm'   }
    {'SouthEast'}    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    {'winter storm'   }
    {'West'     }    2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    {'equipment fault'}
    {'MidWest'  }    2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    {'severe storm'   }
    {'West'     }    2003-06-18 02:49         0             0    2003-06-18 10:54    {'attack'         }
    {'West'     }    2004-06-20 14:39    231.29           NaN    2004-06-20 19:16    {'equipment fault'}
    {'West'     }    2002-06-06 19:28    311.86           NaN    2002-06-07 00:51    {'equipment fault'}
Create a new variable called OutageDuration that indicates how long each power outage lasted. Convert OutageDuration to a number of days, and add the result to the outages table as a new variable called OutageDays.
OutageDuration = outages.RestorationTime - outages.OutageTime;
outages.OutageDays = days(OutageDuration);
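As a quick check of the conversion, the sketch below repeats the same two steps on the OutageTime and RestorationTime values from the first two rows displayed by head(outages). The variable names are local to this sketch. Subtracting datetime values yields a duration, days converts it to a numeric number of days, and a missing RestorationTime (NaT) propagates to NaN.

```matlab
% Times copied from the first two rows shown by head(outages).
outageTime = [datetime(2002,2,1,12,18,0); datetime(2003,1,23,0,49,0)];
restorationTime = [datetime(2002,2,7,16,50,0); NaT];

% Subtracting datetime arrays returns a duration array.
outageDuration = restorationTime - outageTime;

% Convert each duration to a numeric number of days.
outageDays = days(outageDuration);
disp(outageDays)
```

The first outage lasted 6 days, 4 hours, and 32 minutes, so its OutageDays value is slightly more than 6. The second value is NaN because the restoration time is missing.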
Create a parallel coordinates plot using the Loss, Customers, and OutageDays variables. Because the coordinate variables are numeric, display the values in the plot as z-scores, without any jittering, using the 'DataNormalization' and 'Jitter' name-value pair arguments.
coordvars = {'Loss','Customers','OutageDays'};
p = parallelplot(outages,'CoordinateVariables',coordvars,'DataNormalization','zscore','Jitter',0);
fig2plotly()
The OutageDays variable contains one value that is more than 30 standard deviations away from the mean OutageDays value and another value that is more than 10 standard deviations away from the mean. Hover over the values in the plot to display data tips. Each data tip indicates the row in the table corresponding to the line in the plot.
Find the rows in the outages table that have the identified extreme OutageDays values. Notice that the RestorationTime values for these two power outages are suspicious.
outliers = outages([1011 269],:)
outliers=2×7 table
       Region          OutageTime        Loss     Customers     RestorationTime            Cause            OutageDays
    _____________    ________________    ______    __________    ________________    ____________________    __________
    {'NorthEast'}    2009-08-20 02:46       NaN    1.7355e+05    2042-09-18 23:31    {'severe storm'    }       12083
    {'MidWest'  }    2008-02-07 06:18    2378.7             0    2019-08-14 16:16    {'energy emergency'}      4206.4
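The row indices used above come from the data tips. As an alternative, rows with extreme values can be located by computing z-scores directly. The following is a minimal sketch on a hypothetical vector of durations (the data and the threshold of 10 are assumptions for illustration). It computes z-scores by hand while ignoring NaN values, and it is not necessarily identical to the normalization that parallelplot applies internally.

```matlab
% Hypothetical durations in days: 150 ordinary values, one missing value,
% and one extreme value in the last row.
outageDays = [repmat([1; 2; 3],50,1); NaN; 500];

% Mean and standard deviation, ignoring missing values.
mu = mean(outageDays,'omitnan');
sigma = std(outageDays,0,'omitnan');

% z-score: number of standard deviations each value lies from the mean.
z = (outageDays - mu) ./ sigma;

% Rows more than 10 standard deviations from the mean (NaN rows never match).
threshold = 10;
outlierRows = find(abs(z) > threshold);
disp(outlierRows)
```

With the real table, the same steps would apply to outages.OutageDays, and the resulting indices could be used to index into outages.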
Create a parallel coordinates plot. Reorder the categories of one of the coordinate variables.
Read data on power outages into the workspace as a table.
outages = readtable('outages.csv');
Create a parallel coordinates plot using a subset of the columns in the table. Group the lines in the plot according to the event that caused the power outage.
coordvars = [1 3 4 6];
p = parallelplot(outages,'CoordinateVariables',coordvars,'GroupVariable','Cause');
fig2plotly()
Change the order of the events in Cause by updating the source table. First, convert Cause to a categorical variable, specify the new order of the events, and use the reordercats function to create a new variable called orderCause. Then, replace the original Cause variable with the new orderCause variable in the source table of the plot.
categoricalCause = categorical(p.SourceTable.Cause);
newOrder = {'attack','earthquake','energy emergency','equipment fault', ...
    'fire','severe storm','thunder storm','wind','winter storm','unknown'};
orderCause = reordercats(categoricalCause,newOrder);
p.SourceTable.Cause = orderCause;
fig2plotly()
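The effect of reordercats can be seen on a small example. The sketch below uses a hypothetical four-element cell array in place of the Cause variable. It shows that reordercats changes only the order of the categories, not the value stored in each element.

```matlab
% Hypothetical stand-in for the Cause variable.
cause = categorical({'wind'; 'attack'; 'fire'; 'wind'});

% By default, categories are sorted alphabetically.
defaultOrder = categories(cause);

% Reorder the categories; newOrder must list every existing category.
newOrder = {'wind','fire','attack'};
orderCause = reordercats(cause,newOrder);
reordered = categories(orderCause);

disp(defaultOrder')
disp(reordered')
```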
Because the Cause variable contains more than seven categories, some of the groups have the same color in the plot. Assign distinct colors to every group by changing the Color property of p.
p.Color = parula(10);
fig2plotly()
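Rather than hard-coding the number of colors, the color matrix can be sized from the number of groups. The sketch below is one possible approach, using a hypothetical categorical array in place of p.SourceTable.Cause. It does not create a chart, so the final assignment to p.Color appears only as a comment.

```matlab
% Hypothetical group labels standing in for p.SourceTable.Cause.
causeNames = {'attack','earthquake','energy emergency','equipment fault', ...
    'fire','severe storm','thunder storm','wind','winter storm','unknown'};
cause = categorical(causeNames(:));

% Count the groups and request one colormap row per group.
numGroups = numel(categories(cause));
groupColors = parula(numGroups);    % numGroups-by-3 matrix of RGB triplets

% With a chart object p, the colors would then be applied with:
%   p.Color = groupColors;
disp(size(groupColors))
```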