Friday, October 19, 2007

Bioinformatics Workshop 2

Sequence Database Workshop

Downloading files from the Internet using your Web browser
Start your web browser and go to this URL, where you can download files using FTP (File Transfer Protocol).

ftp://ftp.ncbi.nih.gov
Find the "gbrel.txt" file and look at it.
Do not click on any other file. These are multi-gigabyte database files, and you don't want to download them.
This gbrel.txt file contains the release information for the GenBank database.
Pay attention to the following:
Size (number of sequences, number of nucleotides, number of species)
Divisions (the database is not a single file, but a collection of files)
In the next part of the workshop, we will be downloading data from sequence databases.
Data Conversion
1. Go to the NCBI Web site
http://www.ncbi.nlm.nih.gov
2. In the Nucleotide database, find the accession number for the following sequence and download it:
Homo sapiens hemoglobin beta chain mRNA complete cds.
There are many hemoglobin sequences in the database. You need to find the specific one that has this description line.
Examine the sequence. Does anything look strange for an mRNA sequence?
3. Convert sequences to FASTA format. Why do we need to do this?
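To make the conversion in step 3 concrete, here is a minimal Python sketch that turns one GenBank flat-file record into FASTA text. It assumes the usual flat-file layout (keywords such as DEFINITION and ACCESSION starting in column 1, sequence lines after ORIGIN, and "//" closing the record). The function name genbank_to_fasta and the EXAMPLE_RECORD below are made up for illustration; the record is not a real database entry.

```python
def genbank_to_fasta(record, width=60):
    """Convert one GenBank flat-file record (a string) to FASTA text."""
    accession = ""
    definition = []   # DEFINITION may continue over several lines
    sequence = []     # letters collected from the ORIGIN section
    section = None    # keyword of the section currently being read

    for line in record.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        if line.startswith("//"):
            break                         # end-of-record marker
        if section == "ORIGIN":
            # Sequence lines look like "        1 acatttgctt ctgacacaac";
            # keep only the letters, dropping position numbers and spaces.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
            continue
        if not line[0].isspace():         # top-level keywords start in column 1
            fields = line.split(None, 1)
            section = fields[0]
            rest = fields[1].strip() if len(fields) > 1 else ""
            if section == "ACCESSION" and rest:
                accession = rest.split()[0]
            elif section == "DEFINITION":
                definition.append(rest)
        elif section == "DEFINITION":     # indented continuation line
            definition.append(line.strip())

    seq = "".join(sequence)
    header = ">" + " ".join([accession] + definition).strip()
    # FASTA: one ">" description line, then the sequence wrapped to `width`.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"


# Made-up record for illustration only (not a real GenBank entry).
EXAMPLE_RECORD = """\
LOCUS       TEST0001                  24 bp    mRNA    linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example record
            for illustration.
ACCESSION   TEST0001
ORIGIN
        1 acatttgctt ctgacacaac tgtg
//
"""

print(genbank_to_fasta(EXAMPLE_RECORD), end="")
```

The output keeps only a one-line description and the bare sequence, which is the form many sequence analysis programs expect as input.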
4. Translate the RNA into protein
translate
At what nucleotide should you start the translation?
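As a rough illustration of step 4, the Python sketch below translates a nucleotide sequence with the standard genetic code. It starts at the first ATG it finds and stops at the first in-frame stop codon. Starting at the first ATG is a simplifying assumption made here; in a GenBank record the CDS feature states where the coding region actually begins. The names translate, CODON_TABLE, and the example fragment are chosen for illustration and are not the workshop's translate tool.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order (TTT, TTC, TTA, TTG, TCT, ...) and paired with the
# one-letter amino acid codes; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, start=None):
    """Translate seq from `start` (0-based) up to the first stop codon.

    If `start` is None, translation begins at the first ATG.
    Both RNA (U) and DNA (T) letters are accepted.
    """
    seq = seq.upper().replace("U", "T")
    if start is None:
        start = seq.find("ATG")
    if start < 0:
        return ""                         # no start codon found
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")   # "X" for unknown codons
        if aa == "*":
            break                         # stop codon ends the protein
        protein.append(aa)
    return "".join(protein)


# Short illustrative fragment: five untranslated bases, a start codon,
# six more codons, a stop codon, and two trailing bases.
fragment = "ccaccATGGTGCATCTGACTCCTGAGTAAgg"
print(fragment.upper().find("ATG") + 1)   # 1-based position of the first ATG
print(translate(fragment))
```

Passing an explicit start, for example translate(fragment, start=5), lets you try other starting nucleotides and see how the reading frame changes the protein.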

5. Convert the protein back to RNA (reverse translation or back translation)
backtranslate
What Codon Preference Table should you use? Why do you even need a Codon Preference Table?
Did you get the same nucleotide sequence you started with?
We have software that can answer this question.
Using LALIGN, compare these two nucleotide sequences. We will discuss this program more in an upcoming lecture.
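The round trip in step 5 can be sketched in Python as follows. Two things here are assumptions made for illustration. First, the PREFERRED table simply takes the first codon listed for each amino acid; it is an arbitrary stand-in, not a real codon preference table for any organism. Second, percent_identity compares the two sequences position by position, which only works because they have the same length and reading frame; it is a much simpler stand-in for an alignment program such as LALIGN, which computes local alignments.

```python
from itertools import product

# Standard genetic code in T, C, A, G codon order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical "preference" table: one codon per amino acid, here just the
# first codon that appears for it. A real table would be built from codon
# usage counts for the organism of interest.
PREFERRED = {}
for codon, aa in CODON_TABLE.items():
    PREFERRED.setdefault(aa, codon)


def backtranslate(protein):
    """Return one possible nucleotide sequence encoding `protein`."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


original = "ATGGTGCATCTGACTCCTGAG"        # illustrative coding fragment
protein = "MVHLTPE"                       # its translation
recovered = backtranslate(protein)

# The recovered sequence encodes the same protein...
same_protein = "".join(
    CODON_TABLE[recovered[i:i + 3]] for i in range(0, len(recovered), 3)
) == protein
print(recovered, same_protein)
# ...but need not match the original nucleotides, because most amino acids
# are encoded by more than one codon.
print(round(percent_identity(original, recovered), 1))
```

Swapping in a different PREFERRED table changes the recovered nucleotide sequence without changing the protein it encodes.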
