Saturday, October 20, 2007

Sample exam in Molecular Biology

  1. You have just isolated 0.03ng of DNA from a single human hair. You now want to amplify a single-copy nuclear gene. Do you have enough DNA for this? Explain your answer.


 

  2. You are doing PCR using a Drosophila template. If you want 1000 template molecules, how much DNA will you need? Note: The Drosophila genome is about 150 Mb and your product is derived from a single-copy sequence.


 

  3. You want to make up 200 ul of a PCR master mix that contains the following:

10 U AmpliTaq Gold      stock = 5 U/ul
1X buffer               stock = 10X
2.0 mM Mg               stock = 25 mM
200 uM dNTPs            stock = 10 mM of each dNTP
500 uM each primer      stock = 0.1 nmol/ul (20-mer)
10 ng template          stock = 5 ug/50 ul
Water to 200 ul


 

How much of each ingredient do you add?


 

  4. To save money, you want to order the least amount of primer possible. The company says the smallest amount you can order is 1 nmol. How much master mix could you make from this? Note: Assume that your mix will contain 200 uM primer and that the primer is 20 bases long (a 20-mer).


 

  5. You want to make up 10 ml of a stock solution of 1 M Tris (MW=121) and 0.1 M EDTA (MW=360). The Tris is a solid and the EDTA comes from a 0.5 M stock. How much of each ingredient would you need?


 

  6. a. What is 'in situ' PCR?
     b. Why would you want to do it?
     c. How do you keep the in situ PCR product from floating away?


 

  7. Explain what would happen to your sequencing traces if you did the following:
     a. Forgot to add the ExoSAP in the PCR product clean-up
     b. Forgot to add the EtOH to the sequencing reaction clean-up
     c. Forgot to denature the sample before loading on the gel


 

  8. What would happen to your PCR product (as seen on an agarose gel) if you did the following:
     a. Misprogrammed the thermocycler to do 60 rather than 30 cycles
     b. Forgot to include the 10 min initial 94C denaturing step when using AmpliTaq Gold
     c. Added only one primer to the master mix rather than two


 

  9. What would happen to your DNA prep if you did the following:
     a. Forgot to add the EtOH to the wash buffer
     b. Centrifuged at full speed rather than 1/2 speed during the wash steps
     c. Used 10 mM Tris/1 mM EDTA rather than water for the elution step


     

  10. Your RT-PCR reaction didn't work! That is, when you run a gel you see no band of the expected size. You do see a smear below the 50 bp marker and you see a very faint band 500 bp larger than your expected RT-PCR product. Outline the troubleshooting steps you would take to get the RT-PCR reaction to work.


     

  11. a. Most thermocyclers have heated lids. Why?
      b. Would you ever not want a thermocycler with a heated lid? Explain your answer.


 

  12. Explain how dUTP and UNG can help prevent contamination in a PCR reaction. What types of contamination are not prevented by this procedure?


 

  13. Explain the process of cycle sequencing with dRhodamine-labeled dye terminators.


 

  14. When is it a good idea to sequence bulk PCR product and when is it a good idea to sequence cloned PCR products?


 

15. Describe two different types of probes used for real time PCR.
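
For the quantity questions on this exam (DNA mass versus template copies, master mix volumes, stock solutions), the arithmetic can be sketched in a few lines of Python. This is only a study aid, not an answer key. The average base-pair mass of 650 g/mol and the haploid human genome size of about 3.2 billion bp are assumptions that are not stated in the exam, and all function names are made up for this sketch.

```python
AVOGADRO = 6.022e23            # molecules per mole
BP_MASS_G_PER_MOL = 650.0      # ASSUMPTION: average mass of one base pair of dsDNA
HUMAN_GENOME_BP = 3.2e9        # ASSUMPTION: approximate haploid human genome size
DROSOPHILA_GENOME_BP = 150e6   # from the exam question (about 150 Mb)


def genome_copies(mass_ng, genome_bp):
    """Haploid genome copies contained in mass_ng nanograms of DNA."""
    grams = mass_ng * 1e-9
    moles = grams / (genome_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


def mass_ng_for_copies(copies, genome_bp):
    """Nanograms of DNA needed to supply the given number of genome copies."""
    grams = copies / AVOGADRO * genome_bp * BP_MASS_G_PER_MOL
    return grams * 1e9


def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """C1*V1 = C2*V2: volume of stock to add (both concentrations in the same unit)."""
    return final_conc * final_volume_ul / stock_conc


def solid_mass_g(molarity, volume_ml, mol_weight):
    """Grams of a solid needed for a solution of the given molarity and volume."""
    return molarity * (volume_ml / 1000.0) * mol_weight


if __name__ == "__main__":
    print("copies in 0.03 ng human DNA:", genome_copies(0.03, HUMAN_GENOME_BP))
    print("ng for 1000 Drosophila copies:", mass_ng_for_copies(1000, DROSOPHILA_GENOME_BP))
    print("ul of 10X buffer for 200 ul at 1X:", stock_volume_ul(1, 10, 200))
    print("g of Tris for 10 ml at 1 M:", solid_mass_g(1.0, 10, 121))
```

The copy numbers are haploid genome equivalents; whether a given number is "enough" still depends on the biological reasoning the question asks for.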

Friday, October 19, 2007

My solutions to Homework 2

Question:
Somewhere out on the Internet is a database of restriction enzymes.
a. Where is it located? What is the URL for the database file that could be used with the GCG software?
Answer: The database of restriction enzymes is REBASE.
The URL for the site is http://rebase.neb.com/rebase/rebase.html
The URL for the database file that could be used with the GCG software is http://rebase.neb.com/rebase/link_gcg

b. What does a typical entry look like for the restriction enzyme file that is formatted for use with the MacVector program?
Answer: Rebase Format #19 is used with the MacVector program.
Each entry is composed of lines, and different line types have their own formats. Each line begins with a two-character line code that indicates the type of information on that line. “//” acts as the delimiter between individual entries.

Each entry in the database contains the following fields:
ID enzyme name
ET enzyme type
OS microorganism name
PT prototype
RS recognition sequence, cut site
MS methylation site (type)
CR commercial sources for the restriction enzyme
CM commercial sources for the methylase
RN [count]
RA authors
RL jour, vol, pages, year, etc.
//
Example of a typical entry:
ID M.BamHII
ET M
OS Bacillus amyloliquefaciens H
PT BamHII
RS GGATCC, ?;
MS 5;
RN [1]
RA Connaughton J.E., Vanek P.G., Chirikjian J.G.;
RL J. Cell Biol. 107:535a-535a(1988).
//

c. How is the database (formatted for MacVector) organized?
Answer: The database is organized as a flatfile: a text-only database with no graphics, in Bairoch format. It contains an alphabetical listing of type I, II and III restriction enzymes as well as methylases, in a format that is compatible with a wide range of data banks (PROSITE, ENZYME, SwissProt, EMBL, ECD, EPD, HAEMB). Each entry is composed of lines, and different line types have their own formats. Each line begins with a two-character line code that indicates the type of information on that line. “//” acts as the delimiter between individual entries.

1. What is the delimiter between individual restriction enzyme entries? How does the computer (or you) know when the information from one restriction enzyme stops and another one starts?
Answer: The delimiter between individual restriction enzyme entries is “//”.
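
To make the role of the “//” delimiter and the two-character line codes concrete, here is a minimal Python sketch of how a program could split such a flatfile into entries. It is my own illustration, not code from REBASE or MacVector; the function name parse_entries is made up, and the only data used is the M.BamHII entry shown above.

```python
def parse_entries(text):
    """Split a Bairoch-style flatfile into a list of {line_code: [values]} dicts."""
    entries = []
    current = {}
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith("//"):         # "//" closes the current entry
            if current:
                entries.append(current)
            current = {}
            continue
        # The first token is the line code (ID, ET, OS, ...); the rest is the value.
        code, _, value = line.partition(" ")
        current.setdefault(code, []).append(value.strip())
    if current:                           # tolerate a missing final "//"
        entries.append(current)
    return entries


SAMPLE = """ID M.BamHII
ET M
OS Bacillus amyloliquefaciens H
PT BamHII
RS GGATCC, ?;
MS 5;
RN [1]
RA Connaughton J.E., Vanek P.G., Chirikjian J.G.;
RL J. Cell Biol. 107:535a-535a(1988).
//
"""

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["ID"][0], "recognizes", entry["RS"][0])
```

Values are kept in lists because a line code can appear more than once in an entry (for example, several reference lines).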

2. Is this format similar to the format used by any other database? Which one?
Answer: I compared the MacVector format with the DNA Strider format in REBASE. The two are similar in that both provide information about restriction enzymes and both are plain-text flatfiles, but they differ in the number of fields and in how fields and entries are separated.

Descriptive fields
MacVector: Each entry has many more descriptive fields than DNA Strider: enzyme name, enzyme type, organism name, prototype, recognition sequence, cut site, methylation site and commercial sources.
DNA Strider: Each entry has only two descriptive fields: enzyme name and recognition sequence with cleavage site. Individual fields are separated by a comma (,).

Entry separation
MacVector: Individual entries are separated by “//”.
DNA Strider: Individual entries start on a new line.

File type
MacVector: Flatfile format.
DNA Strider: Flatfile format.

Then I compared the MacVector format in REBASE with the GenBank database. Although the formats are similar in that both use the delimiter “//”, most of the other attributes are very different.
First field
MacVector: Each entry starts with the ID field.
GenBank: Each entry starts with the LOCUS field.

Field labels
MacVector: Each field is identified by a two-character line code such as ID, OS etc.
GenBank: Each field is identified by one or more descriptive words such as DEFINITION, LOCUS etc.

Available formats
MacVector: Information is available only as a plain-text flatfile.
GenBank: Information is available in a wide range of formats such as FASTA, XML, Graphics etc.

Entry separation
MacVector: Individual entries are separated by “//”.
GenBank: Individual entries are separated by “//”.

Sequence content
MacVector: It contains information about the restriction site of the enzyme and does not contain any information about the amino acid or nucleotide sequence.
GenBank: This database contains information about the nucleotide sequence. If the sequence codes for an expressed protein, it also contains the translation.

Literature Search Questions
1) Select a protein and find the entries for this protein in the GenBank DNA database, the SwissProt database, and the PDB Protein database. List the attributes or features that are common to the databases and those which are unique to each.
Answer: I searched the three databases for the β subunit of human follicle-stimulating hormone.
PDB results:
URL: http://www.pdb.org/pdb/explore.do?structureId=1FL7
SwissProt results:
URL: http://au.expasy.org/uniprot/P01225
GenBank results:
URL: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=EF198021

Identifier
GenBank: The identifying code is the locus or accession number, which is usually an eight-character alphanumeric code.
SwissProt: The identifying code is called the primary accession number. It is usually a six-character alphanumeric code.
PDB: The unique identifying code is short, usually a four-character alphanumeric code.

Entry contents
GenBank: Each entry contains the following information: locus, definition, accession #, version, keywords, source, organism, references, gene, mRNA and CDS.
SwissProt: Each entry contains the following information: entry name, primary accession number, information about the name and origin of the protein, references, links to cross-references and the amino acid sequence.
PDB: Each entry contains the following information: title, references, history, experimental information, molecular description and information about the structure of the protein.

Sequence information
GenBank: Gives the DNA sequence that encodes the protein. It also contains the translated amino acid sequence if the sequence is expressed.
SwissProt: Gives the amino acid sequence of the protein. Other sequence information can be found by following the links.
PDB: Gives detailed structural (3D) information about the protein, with images and figures to help visualize the molecule. It also contains the amino acid sequence.

Cross-references
GenBank: Does not provide links for cross-referencing.
SwissProt: Has many links which allow for easy cross-referencing.
PDB: Also allows cross-referencing via “external links”.

Database size
GenBank: A much larger database than SwissProt or PDB.
SwissProt: Not as big a database as GenBank, but bigger than PDB.
PDB: A relatively small database; many proteins that were available at SwissProt and GenBank could not be found here.

Display formats
GenBank: Information can be displayed in a wide array of formats such as FASTA, GenBank, XML, Graphical etc.
SwissProt: Sequence information is available in only two formats: SwissProt and FASTA. There is no graphical representation of data.
PDB: Sequence information is available in FASTA format. However, there is also ample graphical representation of the data.



2) How many secreted proteins have been discovered in humans? Explain what database you used, and what keywords you used to do the search.
Answer: I performed this search in SRS. I initially searched multiple databases, but the results were redundant, so I repeated the search using only one dataset, the patent proteins dataset. I chose it because results would not be duplicated there, and because most secreted proteins would be entered there.
URL: http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz
Database searched: Patent Proteins
Search Field 1: All text - Secreted
Search Field 2: Organism name - human
Result: 10,319

Homework 2 Questions

Homework

Curious about how to answer these questions? Want to see an example of a homework answer? See this page for an example homework answer.
Database Questions
Somewhere out on the Internet is a database of restriction enzymes.
a. Where is it located? What is the URL for the database file that could be used with the GCG software?
b. What does a typical entry look like for the restriction enzyme file that is formatted for use with the MacVector program?
c. How is the database (formatted for MacVector) organized?
1. What is the delimiter between individual restriction enzyme entries? How does the computer (or you) know when the information from one restriction enzyme stops and another one starts?
2. Is this format similar to the format used by any other database? Which one?
Literature Search Questions
1) Select a protein and find the entries for this protein in the GenBank DNA database, the SwissProt database, and the PDB Protein database. List the attributes or features that are common to the databases and those which are unique to each.
2) How many secreted proteins have been discovered in humans? Explain what database you used, and what keywords you used to do the search.

Bioinformatics Workshop 3

Literature Workshop

In this workshop, we will be exploring how to search for sequence information using various web sites.
Web Resources
NCBI
Visit the NCBI website at http://www.ncbi.nlm.nih.gov
You have several options for searching for sequence information by querying the sequence annotation:
ENTREZ
PUBMED
OMIM
and several other databases
They are all linked together, with links into the sequence databases.
When you do a search, first ask yourself these questions:
What information are you looking for?
What database would have that information?
Can you restrict your search to certain fields?
Try searching for literature about human growth hormone.
What MeSH term should you be using for that molecule?
What Database should you be searching?

Often, the best way to find something is to run a few searches, apply some limits, then view the "history" and combine queries. You can also do this using the Preview/Index option.

PDB
What is the unique identifying code for a protein structure of Lysozyme? You will find lots of lysozymes. Just pick one.
http://www.rcsb.org

SRS at EBI
http://srs.ebi.ac.uk
Search for Human Growth Hormone using the SRS search program.
How does the SRS search program differ from the NCBI search program that you used today?

Bioinformatics Workshop 2

Sequence Database Workshop

Downloading files from the Internet using your Web browser
Start up your web browser and go to this URL where you can download files using ftp (File Transfer Protocol).

ftp://ftp.ncbi.nih.gov
Find the "gbrel.txt" file and look at it
Do not click on any other file. These are multi-Gigabyte database files and you don't want to download them.
This gbrel.txt file contains the release information for the GenBank database.
Pay attention to the
Size (number of sequences, number of nucleotides, number of species)
Divisions (The database is not a single file, but a collection of files)
In the next part of the workshop, we will be downloading data from sequence databases.
Data Conversion
1. Go to the NCBI Web site
http://www.ncbi.nlm.nih.gov
2. In the Nucleotide database, find the accession number, and download this sequence
Homo sapiens hemoglobin beta chain mRNA complete cds.
There are many hemoglobin sequences in the database. You need to find the specific one that has this description line.
Examine the sequence. Does anything look strange for an mRNA sequence?
3. Convert sequences to FASTA format. Why do we need to do this?
4. Translate the RNA into Protein
translate
At what nucleotide should you start the translation?

5. Convert the protein back to RNA (reverse translation or back translation)
backtranslate
What Codon Preference Table should you use? Why do you even need a Codon Preference Table?
Did you get the same nucleotide sequence you started with?
We have software that can answer this question.
Using LALIGN, compare these two nucleotide sequences. We will discuss this program more in an upcoming lecture.
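
Steps 4 and 5 can be sketched in plain Python to show why a back-translated sequence usually differs from the original. This is my own minimal illustration, not the translate or backtranslate programs used in the workshop. It assumes the standard genetic code, uses a short made-up test sequence rather than the hemoglobin mRNA, and stands in for a real codon preference table by simply taking the first codon listed for each amino acid.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start=0):
    """Translate from position `start` until the first stop codon or end of sequence."""
    seq = seq.upper().replace("U", "T")   # accept RNA (U) or DNA (T) letters
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]    # raises KeyError on ambiguous bases such as N
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_codon_per_aa():
    """ASSUMPTION: arbitrary stand-in for a real codon preference table."""
    preferred = {}
    for codon, aa in CODON_TABLE.items():
        preferred.setdefault(aa, codon)   # keep the first codon seen for each amino acid
    return preferred


def back_translate(protein, preferred):
    """Pick one codon per amino acid; the choice is where information is lost."""
    return "".join(preferred[aa] for aa in protein)


if __name__ == "__main__":
    original = "ATGCTGTCTTAA"             # made-up test sequence: Met-Leu-Ser-stop
    protein = translate(original)
    recovered = back_translate(protein, first_codon_per_aa())
    print(protein, original[:9], recovered, recovered == original[:9])
```

Because most amino acids are encoded by more than one codon, the recovered nucleotides depend entirely on which codon the table prefers, even though both sequences translate to the same protein.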

Bio-informatics Workshop 1

This series of workshops was given to us in our bioinformatics class by Prof Lee Kozar, who is also the director of CMGM at Stanford!

Downloading files from the Internet using your Web browser
Start up your web browser and go to this URL

ftp://ftp.ncbi.nih.gov

Find the "gbrel.txt" file and look at it
This file contains the release information for the GenBank database.
Pay attention to the
Size (number of sequences, number of nucleotides, number of species)
Divisions (The database is not a single file, but a collection of files)
In the next class, we will analyze it more fully and learn how to download specific sequences.

Free software to read sequencing data

These links were given to us by Dr. B after completion of our sequencing experiments.

Chromas sequence viewing software:

chromas11-32.exe (118.426 Kb)

Chromas Lite:

chromaslite201.exe (215.945 Kb)

Link to get sequence scanner:

http://www.appliedbiosystems.com/support/software_community/free_ab_software.cfm