{"id":383,"date":"2015-07-17T23:52:59","date_gmt":"2015-07-17T23:52:59","guid":{"rendered":"http:\/\/josephpcohen.com\/teaching\/cs310\/?p=383"},"modified":"2016-07-26T14:17:50","modified_gmt":"2016-07-26T14:17:50","slug":"project3","status":"publish","type":"post","link":"https:\/\/josephpcohen.com\/teaching\/cs310\/project3\/","title":{"rendered":"Project 3 : Studying Proteins"},"content":{"rendered":"<table>\n<tr>\n<td>\n<a href=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/lchen.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-422 size-large\" src=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/lchen-1024x779.png\" alt=\"lchen\" width=\"580\" height=\"441\" srcset=\"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/lchen-1024x779.png 1024w, https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/lchen-300x228.png 300w, https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/lchen-768x584.png 768w, https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/lchen.png 1754w\" sizes=\"(max-width: 580px) 100vw, 580px\" \/><\/a><br \/>\n<center>Luke Chen (CS310 Summer 2015)<\/center>\n<\/td>\n<td>\n<a href=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/hkim.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-426 size-large\" src=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/hkim-1024x725.png\" alt=\"hkim\" width=\"580\" height=\"411\" srcset=\"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/hkim-1024x725.png 1024w, https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/hkim-300x212.png 300w, https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/hkim-768x544.png 768w, 
https:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/10\/hkim.png 1534w\" sizes=\"(max-width: 580px) 100vw, 580px\" \/><\/a><br \/>\n<center>Hae Young Kim (CS310 Summer 2015)<\/center>\n<\/td>\n<\/tr>\n<\/table>\n<p><\/p>\n<p>For this project you will study the p53 protein across many species and build a graph to visualize the relationships between them. This protein is known to be a tumor suppressor and is discussed here: <a href=\"http:\/\/www.uniprot.org\/uniprot\/P04637\" target=\"_blank\">http:\/\/www.uniprot.org\/uniprot\/P04637<\/a><\/p>\n<h2>#0<\/h2>\n<p> Obtain the <a href=\"http:\/\/limbo.switchlab.org\/content\/fasta-format\" target=\"_blank\">FASTA<\/a>-formatted sequences for the p53 protein from at least 20 species. You can find them here: <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/protein\/?term=p53\">http:\/\/www.ncbi.nlm.nih.gov\/protein\/?term=p53<\/a> and here: <a href=\"http:\/\/www.bioinformatics.org\/p53\/protein.html\">http:\/\/www.bioinformatics.org\/p53\/protein.html<\/a>. Example FASTA files for Homo sapiens and Mus musculus are shown below. 
We can ignore the FASTA header and just use the sequence that follows it.<\/p>\n<pre>&gt;gi|4731632|gb|AAD28535.1|AF135121_1 tumor suppressor protein p53 [Homo sapiens]\r\nMEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA\r\nPRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKT\r\nCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRN\r\nTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGR\r\nDRRTEKENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALEL\r\nKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD<\/pre>\n<pre>&gt;gi|200203|gb|AAA39883.1| p53 [Mus musculus]\r\nMTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEFFEGPSEALRVSGAPAAQ\r\nDPVTETPGPVAPAPATPWPLSSFVPSQKTYQGNYGFHLGFLQSGTAKSVMCTYSPPLNKLFFQLAKTCPV\r\nQLWVSATPPAGSRVRAMAIYKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNLYPEYLEDRQTFR\r\nHSVVVPYEPPEAGSEYTTIHYKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRR\r\nTEEENFRKKEVLCPELPPGSAKRALPTCTSASPPQKKKPLDGEYFTLKIRGRKRFEMFRELNEALELKDA\r\nHATEESGDSRAHSSLQPRAFQALIKEESPNC\r\n<\/pre>\n<p>Because some amino acids have similar shape and function, the match and mismatch scores vary from one amino acid pair to another. One family of substitution cost matrices is known as BLOSUM. There are examples here: <a href=\"ftp:\/\/ftp.ncbi.nih.gov\/blast\/matrices\/\">ftp:\/\/ftp.ncbi.nih.gov\/blast\/matrices\/<\/a>. BLOSUM62 is commonly used, but your code must work with all of them. You will need to read in this format and use these match\/mismatch costs when calculating your edit distance. Your code will take in a filename such as &#8220;BLOSUM62&#8221; to read and process it. For a gap you will use the cost of aligning a character with the * column. In the BLOSUM62 file below, the score for aligning an A with a gap is -4.<\/p>\n<p><code>Note:<\/code> The values are inverted relative to how we talked about edit distance. 
You will need to invert the sign of each value so that you can minimize the edit distance.<\/p>\n<pre>\r\n#  Matrix made by matblas from blosum62.iij\r\n#  * column uses minimum score\r\n#  BLOSUM Clustered Scoring Matrix in 1\/2 Bit Units\r\n#  Blocks Database = \/data\/blocks_5.0\/blocks.dat\r\n#  Cluster Percentage: >= 62\r\n#  Entropy =   0.6979, Expected =  -0.5209\r\n   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *\r\nA  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4 \r\nR -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4 \r\nN -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4 \r\nD -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4 \r\nC  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4 \r\nQ -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4 \r\nE -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4 \r\nG  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4 \r\nH -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4 \r\nI -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4 \r\nL -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4 \r\nK -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4 \r\nM -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4 \r\nF -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4 \r\nP -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4 \r\nS  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4 \r\nT  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4 \r\nW -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4 \r\nY -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4 \r\nV  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 
-2 -1 -4 \r\nB -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4 \r\nZ -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4 \r\nX  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4 \r\n* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1 \r\n<\/pre>\n<h2>#1<\/h2>\n<p> Compute the minimum edit distance between each pair of species&#8217; p53 proteins. Use the standard Needleman-Wunsch edit distance algorithm. Loop through the &gt;=20 proteins contained in the specified folder. If a cost matrix file is specified, use that file; otherwise use a mismatch cost and a gap penalty of 1 each.<\/p>\n<p>Print out the alignment scores in a program ProteinCompare.java as follows:<\/p>\n<pre>$java ProteinCompare &lt;FOLDER WITH PROTEINS&gt; &lt;OPTIONAL COST MATRIX&gt;\r\nProtein1\tProtein1\t0\r\nProtein1\tProtein2\tcost\r\nProtein1\tProtein3\tcost\r\nProtein2\tProtein2\t0\r\n..\r\n<\/pre>\n<p>Here are the alignment scores of the two FASTA files shown in section 0 using the BLOSUM62 cost matrix:<\/p>\n<pre>\r\n$java ProteinCompare proteinfolder BLOSUM62\r\nHomosapiens.fa Homosapiens.fa -2119\r\nHomosapiens.fa Musmusculus.fa -1455\r\nMusmusculus.fa Homosapiens.fa -1455\r\nMusmusculus.fa Musmusculus.fa -2039\r\n\r\n$java ProteinCompare proteinfolder\r\nHomosapiens.fa\tHomosapiens.fa\t0\r\nHomosapiens.fa\tMusmusculus.fa\t107\r\nMusmusculus.fa\tHomosapiens.fa\t107\r\nMusmusculus.fa\tMusmusculus.fa\t0\r\n<\/pre>\n<h2>#2<\/h2>\n<p> Show the minimum-cost alignment between two proteins as shown below. Take the two FASTA files in as input. If a cost matrix file is specified, use that file; otherwise use a mismatch cost and a gap penalty of 1 each. 
This example uses an mismatch cost of 1 and a gap penalty of 2.<\/p>\n<pre>$java ShowAlignment p53-AAD28535-homo-sapiens.fa p53-Q95330-rabbit.fa &lt;OPTIONAL COST MATRIX&gt;\r\nCost of 61\r\nM M  0\r\nE E  0\r\nE E  0\r\nP S  1\r\nQ Q  0\r\n...\r\nT N  1\r\nE E  0\r\nD D  0\r\nP P  0\r\nG E  1\r\nP    2\r\nD    2\r\nE E  0\r\nA G  1\r\nP L  1\r\nR R  0\r\nM V  1\r\nP P  0\r\nE A  1\r\nA A  0\r\nA P  1\r\nP A  1\r\nR P  1\r\nV E  1\r\nA A  0\r\nP P  0\r\nA A  0\r\nP P  0\r\nA A  0\r\nA A  0\r\nP P  0\r\nT A  1\r\nP L  1\r\nA A  0\r\nA A  0\r\nP P  0\r\nA A  0\r\nP P  0\r\nA A  0\r\nP T  1\r\nS S  0\r\nW W  0\r\nP P  0\r\n...\r\n<\/pre>\n<p>Here are some small examples to make sure your alignment is working correctly. Here I am also printing the cost matrix and the path where the values came from. This is using the BLOSUM62 cost matrix. You won&#8217;t print these out though.<\/p>\n<pre>\r\nEEPQSDPSV\r\nMEEPQSDPSV\r\nDistance:\t-43\r\n  0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40,\r\n  4,  2, -1,  3,  7, 11, 15, 19, 23, 27, 31,\r\n  8,  6, -3, -6, -2,  2,  6, 10, 14, 18, 22,\r\n 12, 10,  1, -2,-13, -9, -5, -1,  3,  7, 11,\r\n 16, 12,  5, -1, -9,-18,-14,-10, -6, -2,  2,\r\n 20, 16,  9,  3, -5,-14,-22,-18,-14,-10, -6,\r\n 24, 20, 13,  7, -1,-10,-18,-28,-24,-20,-16,\r\n 28, 24, 17, 11,  0, -6,-14,-24,-35,-31,-27,\r\n 32, 28, 21, 15,  4, -2,-10,-20,-31,-39,-35,\r\n 36, 31, 25, 19,  8,  2, -6,-16,-27,-35,-43,\r\n  .,  -,  -,  -,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  \\,  \\,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  \\,  \\,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  |,  \\,  \\,  -,  -,  -,  \\,  -,  -,\r\n  |,  \\,  |,  \\,  |,  \\,  -,  -,  -,  -,  -,\r\n  |,  |,  |,  |,  |,  |,  \\,  -,  -,  \\,  -,\r\n  |,  |,  |,  \\,  |,  |,  |,  \\,  -,  -,  -,\r\n  |,  |,  |,  |,  \\,  |,  |,  |,  \\,  -,  -,\r\n  |,  |,  |,  |,  |,  |,  \\,  |,  |,  \\,  -,\r\n  |,  \\,  |,  |,  |,  |,  |,  |,  |,  |,  \\,\r\n* M  4\r\nE E  -5\r\nE E  -5\r\nP P  -7\r\nQ Q  -5\r\nS S  -4\r\nD D  
-6\r\nP P  -7\r\nS S  -4\r\nV V  -4\r\n\r\n\r\n\r\nEEPQSDPSV\r\nMTAMEESQSD\r\nDistance:\t4\r\n  0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40,\r\n  4,  2,  5,  9, 13, 11, 15, 19, 23, 27, 31,\r\n  8,  6,  3,  6, 10,  8,  6, 10, 14, 18, 22,\r\n 12, 10,  7,  4,  8, 11,  9,  7, 11, 15, 19,\r\n 16, 12, 11,  8,  4,  6,  9,  9,  2,  6, 10,\r\n 20, 16, 11, 10,  8,  4,  6,  5,  6, -2,  2,\r\n 24, 20, 15, 13, 12,  6,  2,  6,  5,  2, -8,\r\n 28, 24, 19, 16, 15, 10,  6,  3,  7,  6, -4,\r\n 32, 28, 23, 18, 17, 14, 10,  2,  3,  3,  0,\r\n 36, 31, 27, 22, 17, 18, 14,  6,  4,  5,  4,\r\n  .,  -,  -,  -,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  \\,  \\,  -,  \\,  \\,  -,  -,  -,  -,\r\n  |,  \\,  \\,  \\,  -,  \\,  \\,  -,  -,  -,  -,\r\n  |,  \\,  \\,  \\,  \\,  \\,  \\,  \\,  \\,  \\,  \\,\r\n  |,  \\,  \\,  \\,  \\,  \\,  \\,  \\,  \\,  -,  -,\r\n  |,  |,  \\,  \\,  |,  \\,  \\,  \\,  |,  \\,  -,\r\n  |,  |,  |,  \\,  |,  \\,  \\,  \\,  \\,  |,  \\,\r\n  |,  |,  |,  \\,  \\,  |,  |,  \\,  \\,  \\,  |,\r\n  |,  |,  \\,  \\,  \\,  |,  \\,  \\,  \\,  \\,  |,\r\n  |,  \\,  |,  |,  \\,  |,  |,  |,  \\,  \\,  |,\r\n* M  4\r\n* T  4\r\n* A  4\r\n* M  4\r\nE E  -5\r\nE E  -5\r\nP S  1\r\nQ Q  -5\r\nS S  -4\r\nD D  -6\r\nP *  4\r\nS *  4\r\nV *  4\r\n\r\n\r\n\r\nTAMEESQSD\r\nMEEPQSDPSV\r\nDistance:\t-9\r\n  0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40,\r\n  4,  1,  5,  9, 13, 17, 19, 23, 27, 31, 35,\r\n  8,  5,  2,  6, 10, 14, 16, 20, 24, 26, 30,\r\n 12,  3,  6,  4,  8, 10, 14, 18, 22, 25, 25,\r\n 16,  7, -2,  1,  5,  6, 10, 12, 16, 20, 24,\r\n 20, 11,  2, -7, -3,  1,  5,  8, 12, 16, 20,\r\n 24, 15,  6, -3, -6, -3, -3,  1,  5,  8, 12,\r\n 28, 19, 10,  1, -2,-11, -7, -3,  1,  5,  9,\r\n 32, 23, 14,  5,  2, -7,-15,-11, -7, -3,  1,\r\n 36, 27, 18,  9,  6, -3,-11,-21,-17,-13, -9,\r\n  .,  -,  -,  -,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  \\,  \\,  \\,  \\,  \\,  -,  -,  \\,  -,\r\n  |,  \\,  \\,  \\,  \\,  \\,  \\,  -,  \\,  \\,  -,\r\n  |,  \\,  |,  \\,  \\,  \\,  -,  -,  \\,  \\,  \\,\r\n  |, 
 |,  \\,  \\,  \\,  \\,  \\,  \\,  -,  -,  -,\r\n  |,  |,  \\,  \\,  -,  -,  -,  \\,  -,  \\,  -,\r\n  |,  |,  |,  |,  \\,  \\,  \\,  -,  -,  \\,  -,\r\n  |,  |,  |,  |,  \\,  \\,  -,  \\,  -,  \\,  -,\r\n  |,  |,  |,  |,  \\,  |,  \\,  -,  -,  \\,  -,\r\n  |,  |,  |,  |,  \\,  |,  |,  \\,  -,  -,  -,\r\nT *  4\r\nA *  4\r\nM M  -5\r\nE E  -5\r\nE E  -5\r\nS P  1\r\nQ Q  -5\r\nS S  -4\r\nD D  -6\r\n* P  4\r\n* S  4\r\n* V  4\r\n\r\n\r\n\r\nTAMEESQSD\r\nMTAMEESQSD\r\nDistance:\t-39\r\n  0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40,\r\n  4,  1, -1,  3,  7, 11, 15, 19, 23, 27, 31,\r\n  8,  5,  1, -5, -1,  3,  7, 11, 15, 19, 23,\r\n 12,  3,  5, -1,-10, -6, -2,  2,  6, 10, 14,\r\n 16,  7,  4,  3, -6,-15,-11, -7, -3,  1,  5,\r\n 20, 11,  8,  5, -2,-11,-20,-16,-12, -8, -4,\r\n 24, 15, 10,  7,  2, -7,-16,-24,-20,-16,-12,\r\n 28, 19, 14, 11,  6, -3,-12,-20,-29,-25,-21,\r\n 32, 23, 18, 13, 10,  1, -8,-16,-25,-33,-29,\r\n 36, 27, 22, 17, 14,  5, -4,-12,-21,-29,-39,\r\n  .,  -,  -,  -,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  \\,  -,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  \\,  \\,  -,  -,  -,  -,  -,  -,  -,\r\n  |,  \\,  |,  |,  \\,  -,  -,  -,  -,  -,  -,\r\n  |,  |,  \\,  |,  |,  \\,  \\,  -,  -,  -,  -,\r\n  |,  |,  \\,  \\,  |,  \\,  \\,  -,  -,  -,  -,\r\n  |,  |,  \\,  \\,  |,  |,  |,  \\,  -,  \\,  -,\r\n  |,  |,  |,  \\,  |,  |,  |,  |,  \\,  -,  -,\r\n  |,  |,  \\,  \\,  |,  |,  |,  \\,  |,  \\,  -,\r\n  |,  |,  |,  |,  |,  |,  |,  |,  |,  |,  \\,\r\n* M  4\r\nT T  -5\r\nA A  -4\r\nM M  -5\r\nE E  -5\r\nE E  -5\r\nS S  -4\r\nQ Q  -5\r\nS S  -4\r\nD D  -6\r\n<\/pre>\n<h2>#3<\/h2>\n<p>We are limited to comparing proteins because comparing the toplevel DNA sequences would take too much memory. Implement the Hirschberg sequence alignment algorithm which only requires O(n+m) space. You do not need to trace back the alignment. You only need to calculate the edit distance. It will output the same results as ProteinCompare. 
<\/p>\n<pre>\r\n$java ProteinCompareHirschberg &lt;FOLDER WITH PROTEINS&gt; &lt;OPTIONAL COST MATRIX&gt;\r\n...\r\n<\/pre>\n<h2>#4<\/h2>\n<p> Write a program Visualize.java to visualize the edit distances between proteins using the GraphStream library. Make every species a node and add an edge between two nodes when they meet some threshold of similarity (for example, the mean minimum edit distance). Be sure to have the species name or file name visible for each node.<\/p>\n<p>Here are some GraphStream methods you may need.<\/p>\n<pre>\r\nedge.setAttribute(\"layout.weight\", 30); \/\/ set the visible length of the edge\r\nedge.setAttribute(\"ui.label\", \"Label of edge\"); \/\/ set the text to print on the edge\r\nnode.setAttribute(\"ui.label\", \"Label of node\"); \/\/ set the text to print on the node\r\n<\/pre>\n<p>Your code will run like this:<\/p>\n<pre>\r\n$java Visualize &lt;FOLDER WITH PROTEINS&gt; &lt;OPTIONAL COST MATRIX&gt;\r\n...\r\n<\/pre>\n<h2>Deliverables.<\/h2>\n<p>Submit your code (ProteinCompare.java, ShowAlignment.java, ProteinCompareHirschberg.java, Visualize.java). In a memo.txt file, discuss how varying the gap and mismatch penalties affects the alignments. Also discuss what relationships you observed in the similarity graph.<\/p>\n<p>Visualize should run as follows using the compile.sh, run.sh, and getclasspath.sh scripts from project 1. 
You need to include the jars you use in the lib folder and also modify run.sh to call Visualize.<\/p>\n<pre>$sh compile.sh # compiles files in src into classes folder\r\n$sh run.sh # sets the classpath to classes and all the jars in lib then calls Visualize\r\n<\/pre>\n<h2>Grading (total 25 points):<\/h2>\n<p>Due: 7\/28 @11:59pm.<\/p>\n<p>6 points: Part 1: ProteinCompare<br \/>\n6 points: Part 2: ShowAlignment<br \/>\n6 points: Part 3: ProteinCompareHirschberg<br \/>\n6 points: Part 4: Visualize<br \/>\n1 point: memo.txt and how easy the assignment is to grade<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Luke Chen (CS310 Summer 2015) Hae Young Kim (CS310 Summer 2015) For this project you will be studying the p53 protein between many species and building a graph to visualize the relationship between them. This protein is known to be a tumor suppressor and is discussed here: http:\/\/www.uniprot.org\/uniprot\/P04637 #0 Obtain the FASTA formatted sequences for&#8230;  <a href=\"https:\/\/josephpcohen.com\/teaching\/cs310\/project3\/\" class=\"more-link\" title=\"Read Project 3 : Studying Proteins\">Read more 
&raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,1],"tags":[],"_links":{"self":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/posts\/383"}],"collection":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/comments?post=383"}],"version-history":[{"count":28,"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/posts\/383\/revisions"}],"predecessor-version":[{"id":788,"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/posts\/383\/revisions\/788"}],"wp:attachment":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/media?parent=383"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/categories?post=383"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310\/wp-json\/wp\/v2\/tags?post=383"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}