{"id":383,"date":"2015-07-17T23:52:59","date_gmt":"2015-07-17T23:52:59","guid":{"rendered":"http:\/\/josephpcohen.com\/teaching\/cs310\/?p=383"},"modified":"2015-07-30T16:38:41","modified_gmt":"2015-07-30T16:38:41","slug":"project3","status":"publish","type":"post","link":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/project3\/","title":{"rendered":"Project 3 : Studying Proteins"},"content":{"rendered":"<p>For this project you will be studying the p53 protein across many species and building a graph to visualize the relationships among them. This protein is known to be a tumor suppressor and is discussed here: <a href=\"http:\/\/www.uniprot.org\/uniprot\/P04637\" target=\"_blank\">http:\/\/www.uniprot.org\/uniprot\/P04637<\/a><\/p>\n<p><strong>0.<\/strong> Obtain the <a href=\"http:\/\/limbo.switchlab.org\/content\/fasta-format\" target=\"_blank\">FASTA<\/a>-formatted sequences for the p53 protein from at least 20 species. You can find them here: <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/protein\/?term=p53\">http:\/\/www.ncbi.nlm.nih.gov\/protein\/?term=p53<\/a> and here: <a href=\"http:\/\/www.bioinformatics.org\/p53\/protein.html\">http:\/\/www.bioinformatics.org\/p53\/protein.html<\/a>. An example of the FASTA file for Homo sapiens is shown below. We can ignore the FASTA header and just use the sequence that follows it.<\/p>\n<pre>&gt;gi|4731632|gb|AAD28535.1|AF135121_1 tumor suppressor protein p53 [Homo sapiens]\r\nMEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA\r\nPRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKT\r\nCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRN\r\nTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGR\r\nDRRTEKENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALEL\r\nKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD<\/pre>\n<p><strong>1.<\/strong> Compute the minimum edit distance between every pair of species&#8217; p53 proteins. 
Use the Needleman-Wunsch implementation that you wrote for hw4. Loop through the &gt;=20 proteins you found. You can put them all in a folder and then loop through all files in that folder. Print the alignment cost for each pair from a program ProteinCompare.java as follows:<\/p>\n<pre>$java ProteinCompare\r\nProtein1\tProtein1\t0\r\nProtein1\tProtein2\tcost\r\nProtein1\tProtein3\tcost\r\nProtein2\tProtein2\t0\r\n..\r\n<\/pre>\n<p><strong>2.<\/strong> Show the minimum-cost alignment between two proteins as shown below. Take the two FASTA files in as input.<\/p>\n<pre>$java ShowAlignment p53-AAD28535-homo-sapiens.fa p53-Q95330-rabbit.fa\r\nCost of 61\r\nM M  0\r\nE E  0\r\nE E  0\r\nP S  1\r\nQ Q  0\r\n...\r\nT N  1\r\nE E  0\r\nD D  0\r\nP P  0\r\nG E  1\r\nP    2\r\nD    2\r\nE E  0\r\nA G  1\r\nP L  1\r\nR R  0\r\nM V  1\r\nP P  0\r\nE A  1\r\nA A  0\r\nA P  1\r\nP A  1\r\nR P  1\r\nV E  1\r\nA A  0\r\nP P  0\r\nA A  0\r\nP P  0\r\nA A  0\r\nA A  0\r\nP P  0\r\nT A  1\r\nP L  1\r\nA A  0\r\nA A  0\r\nP P  0\r\nA A  0\r\nP P  0\r\nA A  0\r\nP T  1\r\nS S  0\r\nW W  0\r\nP P  0\r\n...\r\n<\/pre>\n<p><strong>3.<\/strong> Write a program Visualize.java to visualize the results using the GraphStream library. Make every species a node and add an edge between two nodes when their edit distance is within some similarity threshold (for example, the mean minimum edit distance). Be sure to have the species name visible.<\/p>\n<h2>Deliverables.<\/h2>\n<p>Submit your code (ProteinCompare.java, ShowAlignment.java, Visualize.java). In a memo.txt file, discuss how varying the gap and mismatch penalties affects the alignments. Also discuss what relationships you observed in the similarity graph.<\/p>\n<p>Visualize should run as follows using the compile.sh, run.sh, and getclasspath.sh scripts from project 1. 
Include the jars you use in the lib folder and modify run.sh to call Visualize.<\/p>\n<pre>$sh compile.sh # compiles files in src into classes folder\r\n$sh run.sh # sets the classpath to classes and all the jars in lib then calls Visualize\r\n<\/pre>\n<h2>Grading (total 25 points):<\/h2>\n<p>Due: 7\/27 @11pm.<\/p>\n<p>5 points: Part 1: ProteinCompare<br \/>\n8 points: Part 2: ShowAlignment<br \/>\n8 points: Part 3: Visualize<br \/>\n4 points: memo.txt and how easy the assignment is to grade<\/p>\n<h2>Sample Graphs (made by students)<\/h2>\n<a href=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/ncadiz.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-419 size-large\" src=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/ncadiz-1024x797.png\" alt=\"ncadiz\" width=\"580\" height=\"451\" srcset=\"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/ncadiz-1024x797.png 1024w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/ncadiz-300x234.png 300w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/ncadiz.png 1598w\" sizes=\"(max-width: 580px) 100vw, 580px\" \/><\/a>\n<pre>ncadiz (CS310 Summer 2015)<\/pre>\n<a href=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/okhan.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-420 size-large\" src=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/okhan-1024x748.png\" alt=\"okhan\" width=\"580\" height=\"424\" srcset=\"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/okhan-1024x748.png 1024w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/okhan-300x219.png 300w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/okhan.png 1516w\" 
sizes=\"(max-width: 580px) 100vw, 580px\" \/><\/a>\n<pre> okhan (CS310 Summer 2015)\r\nKey:\r\n(Mean Percentile goes by the Colors of the Rainbow! Think ROYGBIV)\r\n  -  00-01%  of Mean  =  Red\r\n  -  02-20%  of Mean  =  Orange\r\n  -  21-45%  of Mean  =  Yellow\r\n  -  46-60%  of Mean  =  Green\r\n  -  61-100% of Mean  =  Blue\r\n  -  Outside of Mean  =  White (Omitted)\r\n\r\nMean: 229<\/pre>\n<a href=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/lchen.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-422 size-large\" src=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/lchen-1024x779.png\" alt=\"lchen\" width=\"580\" height=\"441\" srcset=\"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/lchen-1024x779.png 1024w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/lchen-300x228.png 300w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/lchen.png 1754w\" sizes=\"(max-width: 580px) 100vw, 580px\" \/><\/a>\n<pre>lchen (CS310 Summer 2015)\r\n\r\n[Xenopus laevis] african-clawed-frog.fasta\r\n[Delphinapterus leucas] beluga-whale.fasta\r\n[Bos primigenius] bos-primigenius.fasta\r\n[Bos taurus] cattle.fasta\r\n[Cricetulus griseus] chinese-hamster.fasta\r\n[Macaca fascicularis] crab-eating-macaque.fasta\r\n[Felis catus] domestic-cat.fasta\r\n[Platichthys flesus] european-flounder.fasta\r\n[Mus musculus] house-mouse.fasta\r\n[Homo sapiens] human.fasta\r\n[Macaca fuscata] Japanese-macaque.fasta\r\n[Oryzias latipes] japanese-medaka.fasta\r\n[Meriones unguiculatus] Mongolian-gerbil.fasta\r\n[Cynops orientalis] oriental-fire-bellied-newt.fasta\r\n[Eospalax baileyi] plateau-zokor.fasta\r\n[P53_RABIT] rabbit.fasta\r\n[Strongylocentrotus purpuratus] purple-sea-urchin.fasta\r\n[Macaca mulatta] rhesus-monkey.fasta\r\n[Microtus oeconomus] Root-vole.fasta\r\n[Ovis aries] 
sheep.fasta\r\n[Bubalus bubalis] water-buffalo.fasta<\/pre>\n<a href=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/ble002.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-424 size-large\" src=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/ble002-1024x898.png\" alt=\"ble002\" width=\"580\" height=\"509\" srcset=\"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/ble002-1024x898.png 1024w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/ble002-300x263.png 300w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/ble002.png 1576w\" sizes=\"(max-width: 580px) 100vw, 580px\" \/><\/a>\n<pre>ble002 (CS310 Summer 2015)<\/pre>\n<a href=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/hkim.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-426 size-large\" src=\"http:\/\/josephpcohen.com\/teaching\/cs310\/wp-content\/uploads\/sites\/4\/hkim-1024x725.png\" alt=\"hkim\" width=\"580\" height=\"411\" srcset=\"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/hkim-1024x725.png 1024w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/hkim-300x212.png 300w, https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-content\/uploads\/sites\/4\/hkim.png 1534w\" sizes=\"(max-width: 580px) 100vw, 580px\" \/><\/a>\n<pre>hkim (CS310 Summer 2015)<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>For this project you will be studying the p53 protein between many species and building a graph to visualize the relationship between them. This protein is known to be a tumor suppressor and is discussed here: http:\/\/www.uniprot.org\/uniprot\/P04637 0. Obtain the FASTA formatted sequences for the p53 protein from at least 20 species. 
You can find&#8230;  <a href=\"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/project3\/\" class=\"more-link\" title=\"Read Project 3 : Studying Proteins\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/posts\/383"}],"collection":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/comments?post=383"}],"version-history":[{"count":27,"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/posts\/383\/revisions"}],"predecessor-version":[{"id":430,"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/posts\/383\/revisions\/430"}],"wp:attachment":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/media?parent=383"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/categories?post=383"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs310-summer2015\/wp-json\/wp\/v2\/tags?post=383"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}