Saturday, October 20, 2007

Sample exam in Molecular Biology

  1. You have just isolated 0.03ng of DNA from a single human hair. You now want to amplify a single-copy nuclear gene. Do you have enough DNA for this? Explain your answer.


 

  2. You are doing PCR using Drosophila template. If you wanted 1000 template molecules, how much DNA would you need? Note: The Drosophila genome is about 150Mb and your product is derived from a single copy sequence.


 

  3. You want to make up 200ul of a PCR master mix that contains the following:

     10U amplitaq gold      stock=5U/ul
     1X buffer              stock=10X
     2.0 mM Mg              stock=25mM
     200uM dNTPs            stock=10mM of each dNTP
     500uM each primer      stock=0.1nMoles/ul (20 mer)
     10ng template          stock=5ug/50uls
     Water to 200ul

     How much of each ingredient do you add?


 

  4. To save money, you want to order the least amount of primer possible. The company says the smallest amount you can order is 1 nM. How much master mix could you make from this? Note: Assume that your mix will contain 200uM primer and that the primer is 20 bases long (a 20 mer).


 

  5. You want to make up 10 ml of a stock solution of 1M Tris (MW=121) and 0.1M EDTA (MW=360). The Tris is a solid and the EDTA comes from a 0.5M stock. How much of each ingredient would you need?


 

  6. a. What is 'in situ' PCR?
     b. Why would you want to do it?
     c. How do you keep the in situ PCR product from floating away?


 

  7. Explain what would happen to your sequencing traces if you did the following:
     a. Forgot to add the Exosap in the PCR product clean-up
     b. Forgot to add the EtOH to the sequencing reaction clean-up
     c. Forgot to denature the sample before loading on the gel


 

  8. What would happen to your PCR product (as seen on an agarose gel) if you did the following:
     a. Misprogrammed the thermocycler to do 60 rather than 30 cycles
     b. Forgot to include the 10 min initial 94C denaturing step when using amplitaq gold
     c. Added only one primer to the master mix rather than two


 

  9. What would happen to your DNA prep if you did the following:
     a. Forgot to add the EtOH to the wash buffer
     b. Centrifuged at full speed rather than 1/2 speed during the wash steps
     c. Used 10mM Tris/1mM EDTA rather than water for the elution step


     

  10. Your RT-PCR reaction didn't work! That is, when you run a gel you see no band of the expected size. You do see a smear below the 50bp marker and you see a very faint band 500 bp larger than your expected RT-PCR product. Outline the troubleshooting steps you would take to get the RT-PCR reaction to work.


     

  11. a. Most thermocyclers have heated lids. Why?
      b. Would you ever not want a thermocycler with a heated lid? Explain your answer.


 

  12. Explain how dUTP and UNG can help prevent contamination in a PCR reaction. What types of contamination are not prevented by this procedure?


 

  13. Explain the process of cycle sequencing with dRhodamine-labeled dye terminators.


 

  14. When is it a good idea to sequence bulk PCR product and when is it a good idea to sequence cloned PCR products?


 

15. Describe two different types of probes used for real time PCR.
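
The quantitative questions above (1 to 5) mostly come down to two pieces of arithmetic: converting between a DNA mass and a number of genome copies, and working out how much stock to add with C1V1 = C2V2. The Python sketch below is one way to set that up. It is not an answer key. The 650 g/mol average mass per base pair and the 3.2e9 bp haploid human genome size are assumptions that are not stated in the exam, and all function names are made up for this illustration.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # ASSUMPTION: average mass of one base pair of dsDNA


def genome_mass_g(genome_bp):
    """Mass in grams of a single copy of a genome of the given size in bp."""
    return genome_bp * BP_MASS_G_PER_MOL / AVOGADRO


def copies_in_mass(mass_g, genome_bp):
    """Number of genome copies contained in a given mass of DNA."""
    return mass_g / genome_mass_g(genome_bp)


def mass_for_copies(n_copies, genome_bp):
    """Mass of DNA in grams needed to supply n_copies genome copies."""
    return n_copies * genome_mass_g(genome_bp)


def stock_volume(final_conc, stock_conc, final_volume):
    """C1V1 = C2V2: volume of stock to add, in the units of final_volume.

    final_conc and stock_conc must be expressed in the same units.
    """
    if stock_conc < final_conc:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc * final_volume / stock_conc


def solid_mass_g(molarity, volume_l, mol_weight):
    """Grams of a solid needed for a given molarity and volume in litres."""
    return molarity * volume_l * mol_weight


HUMAN_HAPLOID_BP = 3.2e9  # ASSUMPTION: not given in the exam
DROSOPHILA_BP = 150e6     # from question 2

print("haploid human genomes in 0.03 ng:", copies_in_mass(0.03e-9, HUMAN_HAPLOID_BP))
print("grams of fly DNA for 1000 copies:", mass_for_copies(1000, DROSOPHILA_BP))
print("ul of 25 mM Mg for 2.0 mM in 200 ul:", stock_volume(2.0, 25.0, 200.0))
print("g of Tris for 10 ml of 1 M:", solid_mass_g(1.0, 0.010, 121.0))
```

Under these assumptions, 0.03 ng of human DNA is roughly nine haploid genome copies, 1000 Drosophila genomes weigh roughly 0.16 ng, and the Mg line needs 16 ul of stock. `stock_volume` deliberately refuses a target concentration above the stock concentration. That check is worth running on the primer line, since 0.1 nMoles/ul is the same as 100 uM.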

Friday, October 19, 2007

My solutions to Homework 2

Question:
Somewhere out on the Internet is a database of restriction enzymes.
a. Where is it located? What is the URL for the database file that could be used with the GCG software?
Answer: The database of restriction enzymes is REBASE.
The URL for the site is http://rebase.neb.com/rebase/rebase.html
The URL for the database file that could be used with the GCG software is http://rebase.neb.com/rebase/link_gcg

b. What does a typical entry look like for the restriction enzyme file that is formatted for use with the MacVector program?
Answer: Rebase Format #19 is used with the MacVector program.
Each entry is composed of lines. Different types of lines with their own formats are used to represent data. Each line begins with a two character line code which indicates the type of information provided in the line. “//” acts as the delimiter between individual entries.

Each entry in the database contains the following fields:
ID enzyme name
ET enzyme type
OS microorganism name
PT prototype
RS recognition sequence, cut site
MS methylation site (type)
CR commercial sources for the restriction enzyme
CM commercial sources for the methylase
RN [count]
RA authors
RL jour, vol, pages, year, etc.
//
Example of a typical entry:
ID M.BamHII
ET M
OS Bacillus amyloliquefaciens H
PT BamHII
RS GGATCC, ?;
MS 5;
RN [1]
RA Connaughton J.E., Vanek P.G., Chirikjian J.G.;
RL J. Cell Biol. 107:535a-535a(1988).
//
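
Because every line starts with a two-character code and every entry ends with “//”, a file in this layout can be read with a few lines of code. The Python sketch below parses the sample entry shown above into a dictionary. It assumes only the layout described in this answer (code, space, value, with “//” on its own line), and the function name `parse_entries` is made up for this illustration. It is not part of REBASE or MacVector.

```python
from collections import defaultdict

SAMPLE = """\
ID M.BamHII
ET M
OS Bacillus amyloliquefaciens H
PT BamHII
RS GGATCC, ?;
MS 5;
RN [1]
RA Connaughton J.E., Vanek P.G., Chirikjian J.G.;
RL J. Cell Biol. 107:535a-535a(1988).
//
"""


def parse_entries(text):
    """Split flat-file text into entries; each entry maps line code -> list of values."""
    entries = []
    current = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line:
            continue  # ignore blank lines
        if line == "//":
            # The delimiter closes the current entry.
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        # The text before the first space is the line code; the rest is the value.
        code, _, value = line.partition(" ")
        current[code].append(value.strip())
    if current:
        # Tolerate a final entry that is missing its closing delimiter.
        entries.append(dict(current))
    return entries


entries = parse_entries(SAMPLE)
print(len(entries), "entry;", "enzyme:", entries[0]["ID"][0], "site:", entries[0]["RS"][0])
```

Values are stored as lists because a line code such as RA or RL can appear more than once within a single entry.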

c. How is the database (formatted for MacVector) organized?
Answer: The database is organized as a flatfile. It is a text-only database with no graphics, in Bairoch format. It contains an alphabetical listing of type I, II and III restriction enzymes as well as methylases, in a format that is compatible with a wide range of data banks (PROSITE, ENZYME, SwissProt, EMBL, ECD, EPD, HAEMB). Each entry is composed of lines. Different types of lines, each with its own format, are used to represent data. Each line begins with a two-character line code which indicates the type of information provided in the line. “//” acts as the delimiter between individual entries.

1. What is the delimiter between individual restriction enzyme entries? How does the computer (or you) know when the information from one restriction enzyme stops and another one starts?
Answer: The delimiter between individual restriction enzyme entries is “//”.

2. Is this format similar to the format used by any other database? Which one?
Answer: I compared the MacVector format with the DNA Strider format in REBASE. Both formats provide information about restriction enzymes and both are flat files, but they differ in how fields are separated and in how many fields each entry has.

MacVector vs. DNA Strider

MacVector: Each entry has many more descriptive fields than DNA Strider: enzyme name, enzyme type, organism name, prototype, recognition sequence, cut site, methylation site and commercial sources.
DNA Strider: Each entry has only two descriptive fields: enzyme name, and recognition sequence with cleavage site. Individual fields are separated by a comma (,).

MacVector: Individual entries are separated by “//”.
DNA Strider: Individual entries start on a new line.

MacVector: Flatfile format.
DNA Strider: Flatfile format.

Then I compared the MacVector format in the REBASE database with the GenBank database. Although both use the same delimiter, “//”, most of the other attributes are very different.

MacVector vs. GenBank

MacVector: Each entry starts with the ID field.
GenBank: Each entry starts with the locus field.

MacVector: Each field is represented by a two-character line code such as ID, OS etc.
GenBank: Each field is represented by one or more descriptive words such as definition, locus etc.

MacVector: Information is only available in FASTA format.
GenBank: Information is available in a wide range of formats such as FASTA, XML, Graphics etc.

MacVector: Individual entries are separated by “//”.
GenBank: Individual entries are separated by “//”.

MacVector: It contains information about the restriction site of the enzyme and does not contain any information about the amino acid or nucleotide sequence.
GenBank: This database contains information about the nucleotide sequence. If the sequence codes for an expressed protein, it also contains the translated sequence.

Literature Search Questions
1) Select a protein and find the entries for this protein in the GenBank DNA database, the SwissProt database, and the PDB Protein database. List the attributes or features that are common to the databases and those which are unique to each.
Answer: I looked up the β subunit of human follicle-stimulating hormone in each database.
PDB results:
URL: http://www.pdb.org/pdb/explore.do?structureId=1FL7
SwissProt results:
URL: http://au.expasy.org/uniprot/P01225
GenBank results:
URL: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=EF198021

GenBank vs. SwissProt vs. PDB

GenBank: The identifying code for the protein is the locus or the accession number, which is usually an eight-character alphanumeric code.
SwissProt: The identifying code is called the primary accession number. It is usually a six-character alphanumeric code.
PDB: The unique identifying code for the protein is short, usually a four-character alphanumeric code.

GenBank: Each entry contains the following information: locus, definition, accession #, version, keywords, source, organism, references, gene, mRNA and CDS.
SwissProt: Each entry contains the following information: entry name, primary accession number, information about the name and origin of the protein, references, links to cross-references and the amino acid sequence.
PDB: Each entry contains the following information: title, references, history, experimental information, molecular description and information about the structure of the protein.

GenBank: Gives the DNA sequence encoding the protein. It also contains the translated sequence, if it is expressed.
SwissProt: Gives the amino acid sequence of the protein. Other sequence information can be found by following the links.
PDB: Gives detailed structural (3D) information about the protein with images and figures to help visualize the molecule. It also contains the amino acid sequence.

GenBank: Does not provide links for cross-referencing.
SwissProt: Has many links which allow for easy cross-referencing.
PDB: Also allows cross-referencing via “external links”.

GenBank: A much larger database than SwissProt or PDB.
SwissProt: Not as big a database as GenBank, but bigger than PDB.
PDB: A relatively small database, as many proteins that were available at SwissProt and GenBank could not be found here.

GenBank: Information can be displayed in a wide array of formats such as FASTA, GenBank, XML, Graphical etc.
SwissProt: Sequence information is available in only two formats: SwissProt and FASTA. There is no graphical representation of data.
PDB: Sequence information is available in FASTA format. However, there is also ample graphical representation of the data.



2) How many secreted proteins have been discovered in humans? Explain what database you used, and what keywords you used to do the search.
Answer: I performed this search in SRS. I initially searched multiple databases, but the results were redundant, so I repeated the search using only one dataset, the patent proteins dataset, since results would not be duplicated here, and also because most secreted proteins would be entered here.
URL: http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz
Database searched: Patent Proteins
Search Field 1: All text - Secreted
Search Field 2: Organism name - human
Result: 10,319

Homework 2 Questions

Homework

Curious about how to answer these questions? Want to see an example of a homework answer? See this page for an example homework answer.
Database Questions
Somewhere out on the Internet is a database of restriction enzymes.
a. Where is it located? What is the URL for the database file that could be used with the GCG software?
b. What does a typical entry look like for the restriction enzyme file that is formatted for use with the MacVector program?
c. How is the database (formatted for MacVector) organized?
1. What is the delimiter between individual restriction enzyme entries? How does the computer (or you) know when the information from one restriction enzyme stops and another one starts?
2. Is this format similar to the format used by any other database? Which one?
Literature Search Questions
1) Select a protein and find the entries for this protein in the GenBank DNA database, the SwissProt database, and the PDB Protein database. List the attributes or features that are common to the databases and those which are unique to each.
2) How many secreted proteins have been discovered in humans? Explain what database you used, and what keywords you used to do the search.

Bioinformatics Workshop 3

Literature Workshop

In this workshop, we will be exploring how to search for sequence information using various web sites.
Web Resources
NCBI
Visit the NCBI website at http://www.ncbi.nlm.nih.gov
You have different options for searching for sequence information by querying the sequence annotation.
ENTREZ
PUBMED
OMIM
and several other databases
They are all linked together with links into the sequence databases.
When you do a search, you have to first ask yourself these questions
What information are you looking for?
What database would have that information?
Can you restrict your search to certain fields?
Try searching for literature about human growth hormone.
What MeSH term should you be using for that molecule?
What Database should you be searching?

Often, the best way to find something is to first do some searches and assign some limits, then view the "history" and combine some queries. You can also do this using the Preview/Index option.

PDB
What is the unique identifying code for a protein structure of Lysozyme? You will find lots of lysozymes. Just pick one.
http://www.rcsb.org

SRS at EBI
http://srs.ebi.ac.uk
Search for Human Growth Hormone using the SRS search program.
How does the SRS search program differ from the NCBI search program that you used today?

Bioinformatics Workshop 2

Sequence Database Workshop

Downloading files from the Internet using your Web browser
Start up your web browser and go to this URL where you can download files using ftp (File Transfer Protocol).

ftp://ftp.ncbi.nih.gov
Find the "gbrel.txt" file and look at it
Do not click on any other file. These are multi-Gigabyte database files and you don't want to download them.
This gbrel.txt file contains the release information for the GenBank database.
Pay attention to the
Size (number of sequences, number of nucleotides, number of species)
Divisions (The database is not a single file, but a collection of files)
In the next part of the workshop, we will be downloading data from sequence databases.
Data Conversion
1. Go to the NCBI Web site
http://www.ncbi.nlm.nih.gov
2. In the Nucleotide database, find the accession number, and download this sequence
Homo sapiens hemoglobin beta chain mRNA complete cds.
There are many hemoglobin sequences in the database. You need to find the specific one that has this description line.
Examine the sequence. Does anything look strange for an mRNA sequence?
3. Convert sequences to FASTA format. Why do we need to do this?
4. Translate the RNA into Protein
translate
At what nucleotide should you start the translation?

5. Convert the protein back to RNA (reverse translation or back translation)
backtranslate
What Codon Preference Table should you use? Why do you even need a Codon Preference Table?
Did you get the same nucleotide sequence you started with?
We have software that can answer this question.
Using LALIGN, compare these two nucleotide sequences. We will discuss this program more in an upcoming lecture.
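
Steps 4 and 5 can be tried out on a small scale before running the workshop programs. The Python sketch below is only an illustration and is not the workshop's `translate` or `backtranslate` tools. It translates from the first ATG using the standard genetic code, then counts how many different nucleotide sequences could encode the resulting peptide. The input is a short toy sequence written with T instead of U, not the database record, and both function names are made up for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from the first ATG up to, but not including, the first stop codon."""
    seq = seq.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks a codon with an unknown base
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def count_back_translations(protein):
    """Number of distinct coding sequences that translate to this peptide."""
    codons_per_aa = Counter(CODON_TABLE.values())
    total = 1
    for aa in protein:
        total *= codons_per_aa[aa]
    return total


toy_mrna = "ACACCATGGTGCACCTGACTCCTGAGTAA"  # toy example, NOT the database record
peptide = translate(toy_mrna)
print(peptide, count_back_translations(peptide))
```

Even this seven-residue peptide has more than a thousand possible coding sequences. That is why back translation needs a codon preference table to choose one codon per amino acid, and why the result usually differs from the starting sequence.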

Bio-informatics Workshop 1

This series of workshops was given to us in our bioinformatics class by Prof. Lee Kozar, who is also the director of CMGM at Stanford!

Downloading files from the Internet using your Web browser
Start up your web browser and go to this URL

ftp://ftp.ncbi.nih.gov

Find the "gbrel.txt" file and look at it
This file contains the release information for the GenBank database.
Pay attention to the
Size (number of sequences, number of nucleotides, number of species)
Divisions (The database is not a single file, but a collection of files)
In the next class, we will analyze it more fully and learn how to download specific sequences.

Free software to read sequencing data

These links were given to us by Dr. B after completion of our sequencing experiments.

Chromas sequence viewing software:

chromas11-32.exe (118.426 Kb)

Chromas Lite:

chromaslite201.exe (215.945 Kb)

Link to get sequence scanner:

http://www.appliedbiosystems.com/support/software_community/free_ab_software.cfm

An example of a good summary and critical analysis

This was posted by Dr Claudia Stone as a template for us to follow.

Lammich et al. (2004) Expression of the Alzheimer protease BACE1 is suppressed via its 5’-untranslated region.

1) Summary:
· Specific questions
    o The entire study focused on whether the 5’-untranslated region (5’UTR) of BACE1 mRNA is responsible for translational repression of the BACE1 protein, and specifically which characteristics of the 5’UTR are responsible for the observed repression.
· Theoretical context
    o It is assumed that the cause of Alzheimer’s disease is linked to the accumulation of the amyloid β-peptide (Aβ).
    o Aβ is created by the actions of two proteases on the membrane amyloid precursor protein:
        - γ-secretase
        - β-secretase, also known as BACE1
    o Previous studies
        - Vassar, 2002
            - Mice with targeted deletion of BACE1 do not produce any Aβ.
            - These mice show no overt phenotype, making BACE1 an ideal drug target.
        - Fukumoto et al, 2002; Holsinger et al, 2002; Yang et al, 2003
            - BACE1 protein levels are significantly upregulated in the brains of AD patients compared with non-AD controls.
        - Yasojima et al, 2001; Holsinger et al, 2002; Preece et al, 2003
            - These increased BACE1 levels corresponded to unchanged mRNA levels.
            - This suggests that post-transcriptional mechanisms are at play.
    o Why pinpoint the 5’UTR as the key?
        - The structure of this segment is:
            - 446 nucleotides long
            - GC content of 77%
            - Three uORFs
        - All of these characteristics are assumed to be important for the inhibition of translation.
· Importance of these questions
    o Determining mechanisms for the regulation of the BACE1 protein, especially at the translational level, would give ideal targets for future therapy in the progression and prevention of Alzheimer’s disease.
· Key experiments with results
    o Determined whether the 5’UTR may affect the expression of BACE1.
        - Expression vectors encoding the ORF of BACE1 alone, with the 5’UTR, or with the 3’UTR were transiently transfected into human embryonic kidney HEK293 cells.
        - Detection was done via immunoblotting of the cell lysate.
        - Showed that the presence of the 5’UTR greatly reduced BACE1 protein levels.
    o Determined whether the BACE1 5’UTR could inhibit the expression of a downstream open reading frame other than BACE1.
        - Vectors encoding luciferase with or without the 5’UTR, or an empty control vector, were expressed in HEK293 cells.
        - Luciferase activity was measured in cell lysates.
        - Luciferase activity was greatly reduced in cells containing the 5’UTR of BACE1.
    o Showed that the 5’UTR lowered BACE1 protein levels by selectively reducing the translation of BACE1.
        - Vectors encoding BACE1, with or without the 5’UTR, or an empty control vector, were expressed in HEK293 cells.
        - BACE1 protein was measured by immunoblotting cell lysates.
        - mRNA levels were measured via northern blotting.
        - The presence of the 5’UTR had no significant effect on mRNA levels, while BACE1 protein levels were lowered.
    o Additionally demonstrated that the 5’UTR represses the expression of BACE1 at the translational level.
        - In vitro-transcribed BACE1 mRNA, with and without the 5’UTR, was translated in a nuclease-treated rabbit reticulocyte lysate.
        - BACE1 protein was detected without the 5’UTR, but was not observed with the 5’UTR.
    o Determined whether the high GC content of the long 5’UTR is sufficient for repressing BACE1 expression or whether the uORFs and their encoded short peptides are required.
        - Mutated the start codons of the three uORFs from ATG to ATA.
        - Mutated BACE1 plasmids were transfected into HEK293-APP cells.
        - BACE1 protein levels were measured in the cell lysate by immunoblotting.
        - Combined mutations of upstream ATGs showed a slight but significant increase in BACE1 expression compared with single mutations.
        - This reveals that the uORFs account for only partial repression of BACE1 expression.
    o Investigated the effect of several deletion mutants of the 5’UTR of BACE1 with lowered GC content.
        - Mutations covered nucleotides 1-223, 224-446, or 1-390, and the expression of BACE1 protein was measured.
        - Showed that both the 5’ and the 3’ half of the UTR have a strong inhibitory effect, with the effect being more pronounced at the 5’ end.
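
The sequence features discussed in this summary (GC content, upstream ATGs, and the ATG-to-ATA start-codon mutations) are simple to compute for any sequence. The Python sketch below shows the idea on a made-up toy string. It does not use the BACE1 5’UTR sequence, it is not the authors' method, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def atg_positions(seq):
    """0-based positions of every ATG in seq, in any reading frame."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def mutate_atg_to_ata(seq, positions):
    """Return a copy of seq with the ATG at each given position changed to ATA."""
    bases = list(seq.upper())
    for pos in positions:
        if "".join(bases[pos:pos + 3]) != "ATG":
            raise ValueError(f"no ATG at position {pos}")
        bases[pos + 2] = "A"  # change the third base: G -> A
    return "".join(bases)


toy_utr = "CCATGGCATGAA"  # toy sequence for illustration only
print(gc_fraction(toy_utr))
print(atg_positions(toy_utr))
print(mutate_atg_to_ata(toy_utr, atg_positions(toy_utr)))
```

Scanning for ATGs in every frame only finds candidate upstream start codons. Deciding which of them begin real uORFs would also require finding an in-frame stop codon, which this sketch leaves out.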

2) Critical Analysis

The article is a systematic progression of the author’s ideas and logic in the development of the study. The reader is given a clear background discussion to serve as the introduction to the topic and to explain why the author has chosen to formulate such a study. Each experiment is explained in the results and discussion section, and reasoning is given as to why the next experiment should be carried out. Additionally, each experiment is explained concisely, with the necessary specifics laid out in the methods section, allowing the reader to follow the thought process of the author and constantly anticipate the next direction of the study.

There is more than enough evidence supporting the author’s claim that the 5’UTR is responsible for repressing the translation of BACE1 without the need to repress transcription. Most of the experiments were designed to test this hypothesis explicitly, and even when it had been proven with an experiment, the study goes one step further by confirming it with an additional experiment.

The study loses the flow of direct evidence during the discussion that the “GC-rich region of the 5’UTR forms a constitutive translation barrier, which may prevent the ribosome from efficiently translating the BACE1 mRNA.” The author states that the 5’UTR repression is functioning because of either the high GC content or the uORFs. An experiment is conducted that refutes the idea of the uORFs; however, the author immediately states that the repression must be because of the GC content creating a tightly folded secondary structure. Although computer modeling (MFOLD program) of the 5’UTR shows that its free energy is sufficient for inhibiting translation, no subsequent testing of this ribosome-blocking theory is carried out. An additional experiment is carried out which shows that substituting certain regions of the GC-rich sections of the 5’UTR does indeed increase expression of the BACE1 protein. My belief is that the author is using the previous studies of Wood et al, 1996 and Clemens & Bommer, 1999 as support for the ribosome assumption, but without direct reference.

I agree with the author that these studies are important. The mere fact that there are over 24 million cases of dementia worldwide, with around 60% due to AD, shows that identifying a specific mechanism and target for therapy could benefit many individuals. Because the specific, and probably varied, cause of the disease remains undiscovered, the ability to block a mechanism this far downstream would negate many factors that reside earlier, such as at the chromosomal level. Thus a treatment designed at this point could be applied to many patients regardless of disease origin.