Difference between revisions of "ADSP t-SNE"

From Bradwiki
Jump to navigation Jump to search
(Created page with "The Alzheimer's Disease Sequencing Project ([https://www.niagads.org/adsp/content/home ADSP]) has made available WES data in vcf format. The following is a step-by-step analys...")
 
m
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
The Alzheimer's Disease Sequencing Project ([https://www.niagads.org/adsp/content/home ADSP]) has made available WES data in vcf format. The following is a step-by-step analysis walkthrough of this dataset. Here the ultimate goal is to develop a platform for Alzheimer's Disease diagnosis based on genome sequencing information. The data for this analysis can be downloaded here:
+
t-Distributed Stochastic Neighbor Embedding (tSNE) is a technique like PCA that allows one perform dimensionality reduction for visualization purposes. Supposedly tSNE does better than PCA at revealing clusters in high-dimensional data. Whereas PCA only allows you to visualize two or three components directly against each at the same time -- tSNE uses math magic to coerce a high-dimensional dataset into either a 2D or 3D array.
 +
 
 +
{{SmallBox|float=right|clear=none|margin=0px 0px 8px 18px|width=170px|font-size=13px|Other Analyses|txt-size=11px|
 +
1. [[AD|Intro]]<br>
 +
4. [[AD Neural Nets|More Neural Nets]]<br>
 +
2. [[AD PCA|PCA]]<br>
 +
3. [[AD t-SNE|t-SNE]]<br>
 +
5. [[AD Stats|Descriptive Statistics]]<br>
 +
}}
 +
 
 +
t-SNE models each multi-dim object against a point on a euclidean surface in such a way that similar features are modeled by nearby point functions and dissimilar features are modeled by distant point functions. It then projects these points onto the plane allowing you visualize, what would effectively be, all the interesting principal component combinations - the ones that yield unique clusters - simultaneously.
 +
 
 +
<br> <br> <br>
 +
 
 +
==t-SNE Code==
 +
----
 +
<br><br>
 +
 
 +
<syntaxhighlight lang="matlab" line start="1" highlight="1" enclose="div">
 +
% ######################################################################
 +
%%      tSNE : t-Distributed Stochastic Neighbor Embedding
 +
% ######################################################################
 +
clc; close all; clear; rng('shuffle')
 +
cd(fileparts(which('GENOS.m')));
 +
 
 +
 
 +
MATDATA = 'ADdata.mat';
 +
which(MATDATA)
 +
load(MATDATA)
 +
 
 +
clearvars -except AD
 +
 
 +
 
 +
 
 +
%% CARBON COPY MAIN VARIABLES FROM AD.STRUCT
 +
 
 +
LOCI = AD.LOCI(:,1:17);
 +
CASE = AD.CASE;
 +
CTRL = AD.CTRL;
 +
PHEN = AD.PHEN;
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%###############################################################
 +
%%      DETERMINE WHICH PARTICIPANTS TO KEEP
 +
%###############################################################
 +
 
 +
 
 +
 
 +
PHE = PHEN(PHEN.TOTvars>14000,:);
 +
 
 +
 
 +
PHECASE = PHE(PHE.AD==1,:);
 +
PHECTRL = PHE(PHE.AD==0,:);
 +
 
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%###############################################################
 +
%%          COUNT NUMBER OF VARIANTS PER LOCI
 +
%###############################################################
 +
 
 +
% The varsum() function will go through each known variant loci
 +
% and check whether anyone's SRR ID from your subset of IDs match
 +
% all known SRR IDs for that loci. It will then sum the total
 +
% number of alleles (+1 for hetzy-alt, +2 for homzy-alt) for each
 +
% loci and return the totals.
 +
 
 +
 
 +
[CASEN, CTRLN] = varsum(CASE, PHECASE.SRR, CTRL, PHECTRL.SRR);
 +
 
 +
 
 +
% SAVE COUNTS AS NEW TABLE COLUMNS
 +
LOCI.CASEREFS = numel(PHECASE.SRR)*2-CASEN;
 +
LOCI.CTRLREFS = numel(PHECTRL.SRR)*2-CTRLN;
 +
LOCI.CASEALTS = CASEN;
 +
LOCI.CTRLALTS = CTRLN;
 +
 
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%###############################################################
 +
%%              COMPUTE FISHER'S P-VALUE
 +
%###############################################################
 +
 
 +
 
 +
% COMPUTE FISHERS STATISTICS FOR THE TRAINING GROUP
 +
[FISHP, FISHOR] = fishp_mex(LOCI.CASEREFS,LOCI.CASEALTS,...
 +
                            LOCI.CTRLREFS,LOCI.CTRLALTS);
 +
 
 +
LOCI.FISHPS  = FISHP;
 +
LOCI.FISHORS = FISHOR;
 +
 
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%% MAKE LATEST COUNTS THE MAIN TABLE STATS
 +
 
 +
LOCI.CASEREF = LOCI.CASEREFS;
 +
LOCI.CTRLREF = LOCI.CTRLREFS;
 +
LOCI.CASEALT = LOCI.CASEALTS;
 +
LOCI.CTRLALT = LOCI.CTRLALTS;
 +
LOCI.FISHP  = LOCI.FISHPS;
 +
LOCI.FISHOR  = LOCI.FISHORS;
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%% SORT VARIANT LOCI TABLE BY FISHER P-VALUE
 +
 
 +
[X,i] = sort(LOCI.FISHP);
 +
 
 +
LOCI  = LOCI(i,:);
 +
CASE  = CASE(i);
 +
CTRL  = CTRL(i);
 +
LOCI.VID = (1:size(LOCI,1))';
 +
 
 +
LOCI.GENE = string(LOCI.GENE);
 +
 
 +
 
 +
 
 +
clc; clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
 +
disp(LOCI(1:9,:))
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%% STORE VARIABLES FOR PCA/TSNE AS 'AMX'
 +
 
 +
AMX        = LOCI;
 +
AMXCASE    = CASE;
 +
AMXCTRL    = CTRL;
 +
 
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
 +
AMX AMXCASE AMXCTRL
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%% FILTER VARIANTS BASED ALT > REF
 +
 
 +
PASS = (AMX.CASEREF > AMX.CASEALT./1.5) | (AMX.CTRLREF > AMX.CTRLALT./1.5);
 +
sum(~PASS)
 +
 
 +
AMX      = AMX(PASS,:);
 +
AMXCASE  = AMXCASE(PASS);
 +
AMXCTRL  = AMXCTRL(PASS);
 +
AMX.VID  = (1:size(AMX,1))';
 +
 
 +
 
 +
 
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
 +
AMX AMXCASE AMXCTRL
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%% TAKE THE TOP N NUMBER OF VARIANTS
 +
N = 100;
 +
 
 +
 
 +
AMX      = AMX(1:N,:);
 +
AMXCASE  = AMXCASE(1:N);
 +
AMXCTRL  = AMXCTRL(1:N);
 +
AMX.VID  = (1:size(AMX,1))';
 +
 
 +
fprintf('\n %.0f final loci count \n\n',size(AMX,1))
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
 +
AMX AMXCASE AMXCTRL
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%% MAKE  RECTANGLE  NN VARIANT MATRIX
 +
 
 +
 
 +
[ADNN, caMX, coMX] = varmx(AMX,AMXCASE,AMXCTRL,PHE);
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
 +
AMX AMXCASE AMXCTRL ADNN
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%% RANDOMIZE ADNN AND REORDER PHE TO MATCH ADNN
 +
 
 +
ADL = ADNN(1,:);
 +
ADN = ADNN(2:end,:);
 +
i = randperm(size(ADN,1));
 +
ADN = ADN(i,:);
 +
ADNN = [ADL;ADN];
 +
 
 +
 
 +
[i,j] = ismember(PHE.SRR, ADN(:,1) );
 +
PHE.USED = i;
 +
PHE.ORDER = j;
 +
PHE = PHE(PHE.USED,:);
 +
PHE = sortrows(PHE,'ORDER');
 +
 
 +
 
 +
 
 +
PCAMX = ADNN(2:end,4:end);
 +
 
 +
 
 +
clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
 +
AMX AMXCASE AMXCTRL ADNN PCAMX 
 +
 
 +
 
 +
 
 +
 
 +
%% (OPTIONAL) PRE-PERFORM PCA BEFORE TSNE
 +
 
 +
% ss = statset('pca');
 +
% ss.Display = 'iter';
 +
% ss.MaxIter = 100;
 +
% ss.TolFun = 1e4;
 +
% ss.TolX = 1e4;
 +
% ss.UseParallel = true;
 +
%
 +
% [PCAC,PCAS,~,~,~] = pca(  PCAMX' , 'Options',ss);
 +
% clc; close all; scatter(PCAC(:,1),PCAC(:,2))
 +
%
 +
% % ...,'NumPCAComponents',0,...  means don't use PCA
 +
% tSN = tsne(PCAC(:,1:10),'NumDimensions',2,'Theta',.6,'NumPCAComponents',0);
 +
%
 +
% clearvars -except AD GENB LOCI CASE CTRL PHEN AMX AMXCASE AMXCTRL...
 +
% PHE ADNN PCAMX tSN PCAC PCAS
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
%######################################################################
 +
%%      tSNE : t-Distributed Stochastic Neighbor Embedding
 +
%######################################################################
 +
 
 +
 
 +
 
 +
tSN = tsne(PCAMX,'NumDimensions',2,'Theta',.6,'NumPCAComponents',8);
 +
 
 +
 
 +
disp('done')
 +
clearvars -except AD GENB LOCI CASE CTRL PHEN AMX AMXCASE AMXCTRL...
 +
PHE ADNN PCAMX tSN PCAC PCAS
 +
</syntaxhighlight>
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
==t-SNE Plots==
 +
----
 +
<br><br>
 +
 
 +
 
 +
 
 +
 
 +
 
 +
====ALZHEIMER'S STATUS====
 +
<syntaxhighlight lang="matlab" line start="1" highlight="1" enclose="div">
 +
%% PLOT TSNE --- ALZHEIMER'S STATUS (CASE/CTRL) --------------------------
 +
close all;
 +
fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 +
ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 +
 
 +
ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.AD, [],'.',15);
 +
 
 +
title({'\fontsize{16} t-SNE : CASE vs CTRL',' '})
 +
legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
 +
axis off
 +
</syntaxhighlight>
 +
 
 +
<big>Top 100 variants</big>
 +
[[File: TSNE Case Control.png|800px]]
 +
 
 +
<big>Top 2000 variants</big>
 +
[[File: TSNE Case Control 2kvars.png|800px]]
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
====STUDY COHORT====
 +
<syntaxhighlight lang="matlab" line start="1" highlight="1" enclose="div">
 +
%% PLOT TSNE --- CONSORTIUM STUDY COHORT (1:24) -------------------------
 +
close all;
 +
fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 +
ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 +
 
 +
ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.COHORT, [],'.',15);
 +
 
 +
 
 +
title({'\fontsize{16} t-SNE : STUDY COHORT',' '})
 +
% legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
 +
axis off
 +
</syntaxhighlight>
 +
 
 +
<big>Top 100 variants</big>
 +
[[File: TSNE Study Cohort.png|800px]]
 +
 
 +
<big>Top 2000 variants</big>
 +
[[File: TSNE Study Cohort 2kvars.png|800px]]
 +
 
 +
 
 +
 
 +
 
 +
====SEX====
 +
<syntaxhighlight lang="matlab" line start="1" highlight="1" enclose="div">
 +
%% PLOT TSNE --- SEX (M/F) ----------------------------------------------
 +
close all;
 +
fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 +
ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 +
 
 +
ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.SEX, [],'.',15);
 +
 
 +
 
 +
title({'\fontsize{16} t-SNE : SEX',' '})
 +
legend(ph1,{'Male','Female'},'FontSize',12,'Box','off','Location','NorthWest');
 +
axis off
 +
</syntaxhighlight>
 +
 
 +
<big>Top 100 variants</big>
 +
[[File: TSNE Sex.png|800px]]
 +
 
 +
<big>Top 2000 variants</big>
 +
[[File: TSNE Sex 2kvars.png|800px]]
 +
 
 +
 
 +
 
 +
 
 +
====AGE====
 +
<syntaxhighlight lang="matlab" line start="1" highlight="1" enclose="div">
 +
%% PLOT TSNE --- AGE (BINNED AGE) ---------------------------------------
 +
close all;
 +
fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 +
ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 +
 
 +
AGE = round(PHE.AGE);
 +
ofAGE = AGE>60;
 +
A = AGE(ofAGE);
 +
 
 +
histogram(AGE)
 +
 
 +
[Y,E] = discretize(A,[60 80 90 91]);
 +
% [Y,E] = discretize(A,[60 75 85 90 91]);
 +
for nn = 1:numel(E)
 +
A(Y==nn) = E(nn);
 +
end
 +
 
 +
ph1 = gscatter(tSN(ofAGE,1),tSN(ofAGE,2),  A, [],'.',15);
 +
 
 +
 
 +
title({'\fontsize{16} t-SNE : AGE',' '})
 +
% legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
 +
axis off
 +
</syntaxhighlight>
 +
 
 +
<big>Top 100 variants</big>
 +
[[File: TSNE Age.png|800px]]
 +
 
 +
<big>Top 2000 variants</big>
 +
[[File: TSNE Age 2kvars.png|800px]]
 +
 
 +
 
 +
 
 +
 
 +
 
 +
====APOE STATUS====
 +
<syntaxhighlight lang="matlab" line start="1" highlight="1" enclose="div">
 +
%% PLOT TSNE --- APOE STATUS (22,23,24,33,34,44) ------------------------
 +
close all;
 +
fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 +
ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 +
 
 +
 
 +
ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.APOE, [],'.',15);
 +
 
 +
 
 +
ph1(1).MarkerSize = 35;
 +
ph1(2).MarkerSize = 25;
 +
ph1(2).Color = [.20 .20 .99];
 +
ph1(3).MarkerSize = 35;
 +
ph1(4).Color = [.99 .50 .10];
 +
ph1(5).Color = [.30 .70 .80];
 +
ph1(6).MarkerSize = 25;
 +
 
 +
title({'\fontsize{16} t-SNE : APOE',' '})
 +
% legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
 +
axis off
 +
</syntaxhighlight>
 +
 
 +
<big>Top 100 variants</big>
 +
[[File: TSNE APOE.png|800px]]
 +
 
 +
<big>Top 2000 variants</big>
 +
[[File: TSNE APOE 2kvars.png|800px]]
 +
 
 +
 
 +
 
 +
 
 +
 
 +
====CONSENT GROUP====
 +
<syntaxhighlight lang="matlab" line start="1" highlight="1" enclose="div">
 +
%% PLOT TSNE --- CONSENT GROUP ------------------------------------------
 +
close all;
 +
fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 +
ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 +
 
 +
 
 +
ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.RD, [],'.',15);
 +
 
 +
 
 +
title({'\fontsize{16} t-SNE : CONSENT GROUP',' '})
 +
% legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
 +
axis off
 +
</syntaxhighlight>
 +
 
 +
<big>Top 2000 variants</big>
 +
[[File: TSNE Consent 2kvars.png|800px]]
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
  
: [https://drive.google.com/open?id=1e3tIbQhcDUF1vofAwf4oYVOl8XriC44v ADSP_WES_VCF_LATEST_RELEASE.mat].
 
  
Both the dataset and the following analysis code are in MATLAB format. Note however there is an [https://cran.r-project.org/web/packages/R.matlab/index.html R package] for reading .mat files. All MATLAB code and custom functions used in the demo walkthrough below can be downloaded here:
 
  
: [https://github.com/subroutines/genos GENOS GITHUB CODE REPO]
 
  
{{SmallBox|float=right|clear=none|margin=0px 0px 8px 18px|width=170px|font-size=13px|Other Analyses|txt-size=11px|
 
1. [[ADSP|Intro]]<br>
 
4. [[ADSP Neural Nets|More Neural Nets]]<br>
 
2. [[ADSP PCA|PCA]]<br>
 
3. [[ADSP t-SNE|t-SNE]]<br>
 
5. [[ADSP Stats|Descriptive Statistics]]<br>
 
}}
 
  
  
  
  
 +
<br> <br> <br> <br> <br> <br> <br>
  
 
==Additional Genomics Analyses==
 
==Additional Genomics Analyses==
Line 23: Line 495:
 
<br><br>
 
<br><br>
 
{{SmallBox|float=left|clear=none|margin=0px 0px 8px 18px|width=50%|font-size=18px|Other Analyses|txt-size=14px|
 
{{SmallBox|float=left|clear=none|margin=0px 0px 8px 18px|width=50%|font-size=18px|Other Analyses|txt-size=14px|
1. [[ADSP|Intro]]<br>
+
1. [[AD|Intro]]<br>
4. [[ADSP Neural Nets|More Neural Nets]]<br>
+
4. [[AD Neural Nets|More Neural Nets]]<br>
2. [[ADSP PCA|PCA]]<br>
+
2. [[AD PCA|PCA]]<br>
3. [[ADSP t-SNE|t-SNE]]<br>
+
3. [[AD t-SNE|t-SNE]]<br>
5. [[ADSP Stats|Descriptive Statistics]]<br>
+
5. [[AD Stats|Descriptive Statistics]]<br>
 
}}
 
}}
  
Line 38: Line 510:
 
----
 
----
  
[http://www.bradleymonk.com/Category:ADSP Category:ADSP]
+
[http://www.bradleymonk.com/Category:AD Category:AD]
[[Category:ADSP]]
+
[[Category:AD]]

Latest revision as of 15:08, 11 June 2018

t-Distributed Stochastic Neighbor Embedding (tSNE) is a technique like PCA that allows one perform dimensionality reduction for visualization purposes. Supposedly tSNE does better than PCA at revealing clusters in high-dimensional data. Whereas PCA only allows you to visualize two or three components directly against each at the same time -- tSNE uses math magic to coerce a high-dimensional dataset into either a 2D or 3D array.

Other Analyses


t-SNE models each multi-dim object against a point on a euclidean surface in such a way that similar features are modeled by nearby point functions and dissimilar features are modeled by distant point functions. It then projects these points onto the plane allowing you visualize, what would effectively be, all the interesting principal component combinations - the ones that yield unique clusters - simultaneously.




t-SNE Code




  1 % ######################################################################
  2 %%       tSNE : t-Distributed Stochastic Neighbor Embedding
  3 % ######################################################################
  4 clc; close all; clear; rng('shuffle')
  5 cd(fileparts(which('GENOS.m')));
  6 
  7 
  8 MATDATA = 'ADdata.mat';
  9 which(MATDATA)
 10 load(MATDATA)
 11 
 12 clearvars -except AD
 13 
 14 
 15 
 16 %% CARBON COPY MAIN VARIABLES FROM AD.STRUCT
 17 
 18 LOCI = AD.LOCI(:,1:17);
 19 CASE = AD.CASE;
 20 CTRL = AD.CTRL;
 21 PHEN = AD.PHEN;
 22 
 23 clearvars -except AD LOCI CASE CTRL PHEN
 24 
 25 
 26 
 27 
 28 
 29 %###############################################################
 30 %%       DETERMINE WHICH PARTICIPANTS TO KEEP
 31 %###############################################################
 32 
 33 
 34 
 35 PHE = PHEN(PHEN.TOTvars>14000,:);
 36 
 37 
 38 PHECASE = PHE(PHE.AD==1,:);
 39 PHECTRL = PHE(PHE.AD==0,:);
 40 
 41 
 42 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
 43 
 44 
 45 
 46 
 47 
 48 
 49 
 50 
 51 
 52 %###############################################################
 53 %%          COUNT NUMBER OF VARIANTS PER LOCI
 54 %###############################################################
 55 
 56 % The varsum() function will go through each known variant loci
 57 % and check whether anyone's SRR ID from your subset of IDs match
 58 % all known SRR IDs for that loci. It will then sum the total
 59 % number of alleles (+1 for hetzy-alt, +2 for homzy-alt) for each
 60 % loci and return the totals.
 61 
 62 
 63 [CASEN, CTRLN] = varsum(CASE, PHECASE.SRR, CTRL, PHECTRL.SRR);
 64 
 65 
 66 % SAVE COUNTS AS NEW TABLE COLUMNS
 67 LOCI.CASEREFS = numel(PHECASE.SRR)*2-CASEN;
 68 LOCI.CTRLREFS = numel(PHECTRL.SRR)*2-CTRLN;
 69 LOCI.CASEALTS = CASEN;
 70 LOCI.CTRLALTS = CTRLN;
 71 
 72 
 73 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
 74 
 75 
 76 
 77 
 78 
 79 
 80 
 81 %###############################################################
 82 %%               COMPUTE FISHER'S P-VALUE
 83 %###############################################################
 84 
 85 
 86 % COMPUTE FISHERS STATISTICS FOR THE TRAINING GROUP
 87 [FISHP, FISHOR] = fishp_mex(LOCI.CASEREFS,LOCI.CASEALTS,...
 88                             LOCI.CTRLREFS,LOCI.CTRLALTS);
 89 
 90 LOCI.FISHPS  = FISHP;
 91 LOCI.FISHORS = FISHOR;
 92 
 93 
 94 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
 95 
 96 
 97 
 98 
 99 
100 %% MAKE LATEST COUNTS THE MAIN TABLE STATS
101 
102 LOCI.CASEREF = LOCI.CASEREFS;
103 LOCI.CTRLREF = LOCI.CTRLREFS;
104 LOCI.CASEALT = LOCI.CASEALTS;
105 LOCI.CTRLALT = LOCI.CTRLALTS;
106 LOCI.FISHP   = LOCI.FISHPS;
107 LOCI.FISHOR  = LOCI.FISHORS;
108 
109 
110 
111 
112 
113 
114 %% SORT VARIANT LOCI TABLE BY FISHER P-VALUE
115 
116 [X,i] = sort(LOCI.FISHP);
117 
118 LOCI  = LOCI(i,:);
119 CASE  = CASE(i);
120 CTRL  = CTRL(i);
121 LOCI.VID = (1:size(LOCI,1))';
122 
123 LOCI.GENE = string(LOCI.GENE);
124 
125 
126 
127 clc; clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL
128 disp(LOCI(1:9,:))
129 
130 
131 
132 
133 
134 %% STORE VARIABLES FOR PCA/TSNE AS 'AMX'
135 
136 AMX         = LOCI;
137 AMXCASE     = CASE;
138 AMXCTRL     = CTRL;
139 
140 
141 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
142 AMX AMXCASE AMXCTRL 
143 
144 
145 
146 
147 
148 %% FILTER VARIANTS BASED ALT > REF
149 
150 PASS = (AMX.CASEREF > AMX.CASEALT./1.5) | (AMX.CTRLREF > AMX.CTRLALT./1.5);
151 sum(~PASS)
152 
153 AMX      = AMX(PASS,:);
154 AMXCASE  = AMXCASE(PASS);
155 AMXCTRL  = AMXCTRL(PASS);
156 AMX.VID  = (1:size(AMX,1))';
157 
158 
159 
160 
161 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
162 AMX AMXCASE AMXCTRL 
163 
164 
165 
166 
167 
168 %% TAKE THE TOP N NUMBER OF VARIANTS
169 N = 100;
170 
171 
172 AMX      = AMX(1:N,:);
173 AMXCASE  = AMXCASE(1:N);
174 AMXCTRL  = AMXCTRL(1:N);
175 AMX.VID  = (1:size(AMX,1))';
176 
177 fprintf('\n %.0f final loci count \n\n',size(AMX,1))
178 
179 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
180 AMX AMXCASE AMXCTRL 
181 
182 
183 
184 
185 
186 
187 
188 
189 %% MAKE  RECTANGLE  NN VARIANT MATRIX
190 
191 
192 [ADNN, caMX, coMX] = varmx(AMX,AMXCASE,AMXCTRL,PHE);
193 
194 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
195 AMX AMXCASE AMXCTRL ADNN 
196 
197 
198 
199 
200 
201 
202 
203 
204 
205 %% RANDOMIZE ADNN AND REORDER PHE TO MATCH ADNN
206 
207 ADL = ADNN(1,:);
208 ADN = ADNN(2:end,:);
209 i = randperm(size(ADN,1));
210 ADN = ADN(i,:);
211 ADNN = [ADL;ADN];
212 
213 
214 [i,j] = ismember(PHE.SRR, ADN(:,1) );
215 PHE.USED = i;
216 PHE.ORDER = j;
217 PHE = PHE(PHE.USED,:);
218 PHE = sortrows(PHE,'ORDER');
219 
220 
221 
222 PCAMX = ADNN(2:end,4:end);
223 
224 
225 clearvars -except AD LOCI CASE CTRL PHEN PHE PHECASE PHECTRL...
226 AMX AMXCASE AMXCTRL ADNN PCAMX  
227 
228 
229 
230 
231 %% (OPTIONAL) PRE-PERFORM PCA BEFORE TSNE 
232 
233 % ss = statset('pca');
234 % ss.Display = 'iter';
235 % ss.MaxIter = 100;
236 % ss.TolFun = 1e4;
237 % ss.TolX = 1e4;
238 % ss.UseParallel = true;
239 % 
240 % [PCAC,PCAS,~,~,~] = pca(  PCAMX' , 'Options',ss);
241 % clc; close all; scatter(PCAC(:,1),PCAC(:,2))
242 %
243 % % ...,'NumPCAComponents',0,...  means don't use PCA
244 % tSN = tsne(PCAC(:,1:10),'NumDimensions',2,'Theta',.6,'NumPCAComponents',0);
245 % 
246 % clearvars -except AD GENB LOCI CASE CTRL PHEN AMX AMXCASE AMXCTRL...
247 % PHE ADNN PCAMX tSN PCAC PCAS
248 
249 
250 
251 
252 
253 
254 %######################################################################
255 %%       tSNE : t-Distributed Stochastic Neighbor Embedding
256 %######################################################################
257 
258 
259 
260 tSN = tsne(PCAMX,'NumDimensions',2,'Theta',.6,'NumPCAComponents',8);
261 
262 
263 disp('done')
264 clearvars -except AD GENB LOCI CASE CTRL PHEN AMX AMXCASE AMXCTRL...
265 PHE ADNN PCAMX tSN PCAC PCAS




t-SNE Plots






ALZHEIMER'S STATUS

 1 %% PLOT TSNE --- ALZHEIMER'S STATUS (CASE/CTRL) --------------------------
 2 close all; 
 3 fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 4 ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 5 
 6 ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.AD, [],'.',15);
 7 
 8 title({'\fontsize{16} t-SNE : CASE vs CTRL',' '})
 9 legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
10 axis off

Top 100 variants TSNE Case Control.png

Top 2000 variants TSNE Case Control 2kvars.png




STUDY COHORT

 1 %% PLOT TSNE --- CONSORTIUM STUDY COHORT (1:24) -------------------------
 2 close all; 
 3 fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 4 ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 5 
 6 ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.COHORT, [],'.',15);
 7 
 8 
 9 title({'\fontsize{16} t-SNE : STUDY COHORT',' '})
10 % legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
11 axis off

Top 100 variants TSNE Study Cohort.png

Top 2000 variants TSNE Study Cohort 2kvars.png



SEX

 1 %% PLOT TSNE --- SEX (M/F) ----------------------------------------------
 2 close all; 
 3 fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 4 ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 5 
 6 ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.SEX, [],'.',15);
 7 
 8 
 9 title({'\fontsize{16} t-SNE : SEX',' '})
10 legend(ph1,{'Male','Female'},'FontSize',12,'Box','off','Location','NorthWest');
11 axis off

Top 100 variants TSNE Sex.png

Top 2000 variants TSNE Sex 2kvars.png



AGE

 1 %% PLOT TSNE --- AGE (BINNED AGE) ---------------------------------------
 2 close all; 
 3 fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 4 ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 5 
 6 AGE = round(PHE.AGE);
 7 ofAGE = AGE>60;
 8 A = AGE(ofAGE);
 9 
10 histogram(AGE)
11 
12 [Y,E] = discretize(A,[60 80 90 91]);
13 % [Y,E] = discretize(A,[60 75 85 90 91]);
14 for nn = 1:numel(E)
15 A(Y==nn) = E(nn);
16 end
17 
18 ph1 = gscatter(tSN(ofAGE,1),tSN(ofAGE,2),  A, [],'.',15);
19 
20 
21 title({'\fontsize{16} t-SNE : AGE',' '})
22 % legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
23 axis off

Top 100 variants TSNE Age.png

Top 2000 variants TSNE Age 2kvars.png



APOE STATUS

 1 %% PLOT TSNE --- APOE STATUS (22,23,24,33,34,44) ------------------------
 2 close all; 
 3 fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 4 ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 5 
 6 
 7 ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.APOE, [],'.',15);
 8 
 9 
10 ph1(1).MarkerSize = 35;
11 ph1(2).MarkerSize = 25;
12 ph1(2).Color = [.20 .20 .99];
13 ph1(3).MarkerSize = 35;
14 ph1(4).Color = [.99 .50 .10];
15 ph1(5).Color = [.30 .70 .80];
16 ph1(6).MarkerSize = 25;
17 
18 title({'\fontsize{16} t-SNE : APOE',' '})
19 % legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
20 axis off

Top 100 variants TSNE APOE.png

Top 2000 variants TSNE APOE 2kvars.png



CONSENT GROUP

 1 %% PLOT TSNE --- CONSENT GROUP ------------------------------------------
 2 close all; 
 3 fh1=figure('Units','normalized','Position',[.05 .05 .70 .84],'Color','w');
 4 ax1=axes('Position',[.05 .02 .9 .9],'Color','none');
 5 
 6 
 7 ph1 = gscatter(tSN(:,1),tSN(:,2),  PHE.RD, [],'.',15);
 8 
 9 
10 title({'\fontsize{16} t-SNE : CONSENT GROUP',' '})
11 % legend(ph1,{'CTRL','CASE'},'FontSize',12,'Box','off','Location','NorthWest');
12 axis off

Top 2000 variants TSNE Consent 2kvars.png






















Additional Genomics Analyses




Other Analyses










Notes


Category:AD