500 Women Scientists Eugene would like to thank the organizations that helped make this event possible. First and foremost, we thank First National Taphouse in Eugene, which shared its wonderful space with us (and where we will be putting on future Salons) and donated a keg to the event! We are also extremely grateful to the organizations that contributed raffle items to help us raise additional funds, including Broadway Metro, Sizzle Pie, and the Eugene Science Center. Our beautiful logo was crafted by Cassie Cook, our amazing event posters were designed by Serena Lim, and photographer Danielle Cosme took some incredible event photos. Fertilab generously lent us a sound system, the Biology and the Built Environment Center donated the bacterial culture supplies, and both Theresa Cheng and Jessica Flannery provided materials and support for the interactive portion of the event. And of course, we want to acknowledge the national leadership of 500 Women Scientists, who brought us together, gave us a voice, and suggested these Science Salons as a way to help CienciaPR, an organization that similarly supports science education and infrastructure.
I’d also like to acknowledge the powerhouse team of women who came together to organize this event, and who turned my silly event title into a reality: Karen Yook, Theresa Cheng, Leslie Dietz, and Hannah Tavalire. 500 Women Scientists was formed in the spirit of cooperation and support, and this team truly took that to heart. I can’t wait to organize the next one with you ladies, and the next one, and the next one, and the next one…
Last night, I gave my first “science stand-up” as part of the Oregon Museum of Science and Industry (OMSI) Science Pub series at Whirled Pies in Eugene, OR. I really enjoy giving public presentations of my work, and while I’ve been on stage with a microphone before, it was the first time I got a stool to put my drink on.
I gave a talk that encompassed much of my previous work on host-associated microbiomes in moose and other ruminants, as well as more current research from others on the human gut. It’s difficult enough to fit the field of host-associated microbiomes into a semester-long class, never mind an hour (I digress), so I kept it to the highlights: “A crash course on the microbiome of the digestive tract”. You can find the slides here: Ishaq OMSI SciPub 20180208, although there is no video presentation at this time. I was honored to have such a well-attended lecture (about 120 people!) with an engaged audience, who had some really on-track questions about the intersection of microbial diversity and health.
As I’ve discussed here before, academic outreach is a sometimes overlooked, yet extremely important, aspect of science. Members of the general public are a large portion of our stakeholder audience, and outreach helps disseminate research knowledge, facilitate transparency in the research process, and engage people who might benefit from or be interested in our work. As I told the audience last night, scientists do like when people ask us about our work, but “we’re more scared of you than you are of us”. I encourage everyone to add science to their life by getting informed, getting involved, and getting out to vote.
Thanks again to OMSI for inviting me to participate, and to Whirled Pies for hosting!
As a thank you, I received this awesome pint glass!
As the 2016 growing season comes to a close in Montana, here in the lab we aren’t preparing to overwinter just yet. In the last few weeks, I have been setting up my first greenhouse trial to expand upon the work we were doing in the field. My ongoing project is to look at changes in microbial diversity in response to climate change. The greenhouse trial will expand on that by looking at the potential legacy effects of soil diversity following climate change, as well as other agricultural factors.
First, though, we had to prep all of our materials, and since we are looking at microbial diversity, we wanted to minimize the potential for microbial influences. This meant that the entire greenhouse bay needed to be cleaned and decontaminated. To mitigate the environmental impact of our research, we washed and reused nearly 700 plant pots and tags in order to reduce the amount of plastic that will end up in the Bozeman landfill.
Each pot needed to be scrubbed with disinfectant soap and then soaked in bleach.
Lines of pots drying on the rack.
I scrubbed 700 labels clean in order to reuse them.
We also needed to autoclave all our soil before we could use it, to make sure we are starting with only the microorganisms we are intentionally putting in. These microorganisms came directly from my plots in the field study, and are being used as an inoculum (a sort of probiotic) for the sterilized soil as we grow a new crop of wheat.
This is trial one of three, each of which has three phases, so by the end of 2016 I’ll have cleaned and put soil into 648 pots with 648 tags; planted, harvested, dried and weighed 11,664 plants; and sampled, extracted DNA from, sequenced, and analyzed 330 soil and environmental samples!
Each pot gets six tiny winter wheat seeds planted.
Trial 1: 216 pots ready to grow!
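For anyone checking the math, the totals above come from a few multiplications. This quick sketch assumes that each of the three trials uses 216 pots (as in Trial 1) and that each of the three phases is a fresh planting of six seeds per pot; the variable names are just for illustration.

```python
# Back-of-the-envelope totals for the greenhouse experiment.
# Assumption: every trial uses 216 pots, and each phase replants six seeds per pot.
POTS_PER_TRIAL = 216
TRIALS = 3
PHASES_PER_TRIAL = 3
SEEDS_PER_POT = 6

total_pots = POTS_PER_TRIAL * TRIALS                               # pots cleaned and filled
total_plants = total_pots * PHASES_PER_TRIAL * SEEDS_PER_POT       # plants grown overall

print(total_pots)    # 648
print(total_plants)  # 11664
```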
Stay tuned for more updates and results (eventually) from this and my field study!
Bioinformatics brings statistics, mathematics, and computer programming to biology and other sciences. In my area, it allows for the analysis of massive amounts of genomic (DNA), transcriptomic (RNA), proteomic (proteins), or metabolomic (metabolites) data.
In recent years, advances in sequencing have allowed for the large-scale investigation of a variety of microbiomes. Microbiome refers to the collective genetic material, or genomes, of all the microorganisms in a specific environment, such as the digestive tract or the elbow. The term is often casually thrown around: some people mistakenly use it interchangeably with “microbiota”, or use it to describe only the genetic material of a specific type of microorganism (e.g. “microbiome” instead of “bacterial microbiome”). Not only have targeted (amplicon) sequencing techniques improved, but methods that sequence single or multiple whole genomes have become much more efficient. In both cases, each piece of DNA now gets sequenced more times. This creates “sequencing depth”, a.k.a. better “coverage”: if you can sequence one piece of DNA 10 times instead of just once or twice, then you can determine whether changes in the sequence are random errors or really there. Unfortunately, faster sequencing techniques usually have more spontaneous errors, so your data are “messy” and harder to deal with. More data, and messier data, make handling the data a challenge in itself.
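To see why depth helps, here is a minimal sketch of a majority-vote consensus. It assumes the reads are already aligned to each other and have equal length, which real pipelines have to work for. With ten reads, a random error in one read is simply outvoted; with one or two reads, there would be no way to tell an error from a real variant. Real base callers also weigh quality scores, which this toy ignores.

```python
from collections import Counter

def consensus(reads):
    """Majority-vote consensus for pre-aligned reads of equal length.

    Returns the consensus string and, for each position, the fraction of
    reads that agree with the winning base.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    bases, support = [], []
    for column in zip(*reads):  # walk position by position across all reads
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        support.append(count / len(reads))
    return "".join(bases), support

# Ten hypothetical reads of the same fragment; two of them carry a random error.
reads = ["ACGTACGT"] * 8 + ["ACGAACGT", "ACGTACCT"]
seq, support = consensus(reads)
print(seq)           # ACGTACGT
print(min(support))  # 0.9
```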
DNA analysis requires complex mathematical equations in order to have a standardized way to quantitatively and statistically compare two, or two million, DNA sequences. For example, you can use equations for estimating entropy (disorder, used here as a measure of diversity), and estimate how many sequences you might be missing due to sequencing shortcomings based on how homogeneous (similar) or varied your dataset is. If you look at your data in chunks of 100 sequences, and 90 of them are different from each other, then sequencing your dataset again will probably turn up something new. But if 90 are the same, you have likely found nearly all the species in that sample.
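The chunk-of-100 example can be made concrete with two common formulas. This sketch uses the Shannon index for entropy and Good's coverage (1 minus the fraction of sequences seen only once) for “how much are we missing”; these are my choice of standard estimators, not necessarily the ones used in any particular pipeline, and the counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over the observed sequence types."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def goods_coverage(counts):
    """Good's coverage: 1 - (sequences seen exactly once / total sequences)."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / total

# Hypothetical chunks of 100 sequences, summarized as counts per distinct sequence.
varied = [11] + [1] * 89    # 90 distinct sequences, 89 of them seen only once
similar = [90, 5, 3, 1, 1]  # one sequence makes up 90 of the 100

print(round(goods_coverage(varied), 2))   # 0.11: sequencing again will likely find new types
print(round(goods_coverage(similar), 2))  # 0.98: most types have probably been found
print(shannon_index(varied) > shannon_index(similar))  # True
```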
Bioinformatics takes these complex equations and uses computer programs to break them down into many simple pieces and automate them. However, the more data you have, the more equations the computer will need to do, and the larger your files will be. Thus, many researchers are limited by how much data they can process.
There are several challenges to analyzing any dataset. The first is assembly.
Sequencing technology can only add so many nucleotide bases to a synthesized sequence before it starts introducing more and more errors, or just stops adding altogether. To limit these errors, DNA or RNA is cut into small fragments, or primers are used to amplify only certain small regions. These pieces can be sequenced from one end to the other, or can be sequenced starting at both ends and working towards the middle to create a region of overlap. In that case, to assemble them, the computer needs to match up both ends and create one contiguous segment (“contig”). With some platforms, like Illumina, each sequence is tagged with where on the flow cell it was, so the computer knows which forward read matches which reverse read.
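Joining a forward read with its reverse mate can be sketched in a few lines. The reverse read comes off the opposite strand, so it is reverse-complemented first; then we look for the longest stretch where the end of the forward read matches the start of the flipped reverse read. This toy version demands an exact overlap of at least `min_overlap` bases (a hypothetical default); real read-merging tools tolerate mismatches and use quality scores.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(forward, reverse, min_overlap=10):
    """Merge a forward read and its reverse mate into one contig, or return None."""
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, down to the minimum allowed.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None

# A hypothetical 30 bp fragment, read 20 bp from each end (10 bp overlap).
fragment = "ACGTTGCATGCCAGTACGGATTACAGGCTA"
forward = fragment[:20]
reverse = reverse_complement(fragment[10:])

merged = merge_pair(forward, reverse)
print(merged == fragment)  # True
```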
When sequencing an entire genome (or many), the DNA is cut enzymatically, or sheared by vibrating it at a certain frequency, and all the pieces are sequenced multiple times. The computer then needs to match the ends up using short pieces of overlap. This can be very resource-intensive for the computer, depending on how many pieces you need to put back together, and whether you have a reference genome for it to use (like the picture on a puzzle box), or are assembling de novo, from scratch (putting together a puzzle without a picture, by trial and error, two pieces at a time).
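The “two pieces at a time” puzzle strategy corresponds roughly to greedy assembly: find the pair of pieces with the biggest overlap, glue them together, and repeat. This is a toy sketch under simplifying assumptions (error-free fragments, no repeats, no fragment sitting entirely inside another); real assemblers use graph-based methods that scale far better, and the fragments here are hypothetical.

```python
def overlap_length(a, b, min_overlap):
    """Longest suffix of a that equals a prefix of b (0 if shorter than min_overlap)."""
    for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def greedy_assemble(fragments, min_overlap=3):
    """Toy de novo assembly: repeatedly merge the pair with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # nothing overlaps enough; leave the pieces as separate contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs

# Hypothetical overlapping pieces of one sequence, in scrambled order.
fragments = ["GCCGGAATAC", "ATTAGACCTG", "CCTGCCGGA"]
print(greedy_assemble(fragments))  # ['ATTAGACCTGCCGGAATAC']
```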
Once the pieces are assembled into their respective consensus sequences, you need to quality-check the data.
This can take a significant amount of time, depending on how you go about it. It also requires good judgement, and a willingness to re-run the steps with different parameters to see what happens. An easy and quick way is to have the computer throw out any sequences that fail a threshold: sequences longer or shorter than your target length, sequences containing ambiguous bases (N) that the computer couldn’t call as a primary nucleotide (A, T, C, or G), or sequences where the confidence level (quality score) of the base calls is low. These scores are generated by the sequencing machine as a relative measure of how “confident” each base call is. They are on a logarithmic (Phred) scale: a score of 20 means roughly a 1 in 100 chance that the base was miscalled (ex. marked as an A instead of a T), and a score of 30 means roughly 1 in 1,000. You can also cut off low-quality pieces, like the very beginnings or ends of sequences, which tend to sequence poorly.
This is a great example of where judgement is needed: if you quality-check and trim off low-quality bases first, and then assemble, you are likely to have cut off the overlapping ends that would end up in the middle of a contig, and you won’t be able to put the two halves together. If you assemble first, you might end up with a sequence that is low-quality in the middle, or very short if you trim off the low-quality portions. If your run did not sequence well and you have a lot of spontaneous errors, you will have to decide whether to work with a lot of poor-quality data, work with the small amount of good-quality data left over after you trim out the rest, or spend the money to try to re-sequence.
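A simple threshold filter might look like the sketch below. It assumes quality strings use the common Phred+33 FASTQ encoding, where each character’s ASCII code minus 33 is the score Q, and Q maps to an error probability of 10^(-Q/10). The cutoffs (`min_q`, `min_len`, `max_len`) are hypothetical defaults, not recommendations. Note that it trims before anything else, which is exactly the kind of ordering choice you would have to weigh against read merging.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]

def error_probability(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)

def trim_and_filter(seq, qual, min_q=20, min_len=8, max_len=300):
    """Trim low-quality bases off the 3' end, then apply pass/fail rules.

    Returns the trimmed sequence, or None if the read should be discarded.
    """
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:  # chop poor bases off the end
        end -= 1
    seq, scores = seq[:end], scores[:end]
    if not (min_len <= len(seq) <= max_len):
        return None  # too short or too long after trimming
    if "N" in seq:
        return None  # contains an ambiguous base call
    if sum(scores) / len(scores) < min_q:
        return None  # average confidence too low
    return seq

print(trim_and_filter("ACGTACGTAC", "IIIIIIII##"))  # ACGTACGT ('I' = Q40, '#' = Q2)
print(trim_and_filter("ACGTNCGTAC", "IIIIIIIIII"))  # None (ambiguous base)
```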
There are several steps that I like to add, some of which are necessary and some of which are technically optional. One of them is to look for chimeras, which are two sequence pieces that mistakenly got joined together. This happens during the PCR amplification step, most often when a strand that was only partially copied in one cycle binds to a different but similar template in the next cycle and gets extended, producing a hybrid of two organisms’ sequences. While time- and processor-consuming, chimera checking can remove these fake sequences before you accidentally think you’ve discovered a new species. Your screen might end up looking something like this…
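The core idea behind chimera checking can be shown with a toy test: does a sequence match a split of two “parents” (left part from one, right part from the other) much better than it matches either parent alone? This sketch assumes equal-length, pre-aligned sequences and two already-known candidate parents; `min_improvement` is a hypothetical margin so that a single random error doesn’t trigger a false alarm. Dedicated tools such as UCHIME search whole databases or the more abundant sequences in your dataset and use more careful scoring.

```python
def mismatches(a, b):
    """Count positions where two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def chimera_check(query, parent_a, parent_b, min_improvement=2):
    """Return (is_chimera, breakpoint) for a query and two candidate parents."""
    single = min(mismatches(query, parent_a), mismatches(query, parent_b))
    best_score, best_bp = None, None
    for bp in range(1, len(query)):
        # Try both orders: A on the left and B on the right, and vice versa.
        for left, right in ((parent_a, parent_b), (parent_b, parent_a)):
            score = mismatches(query[:bp], left[:bp]) + mismatches(query[bp:], right[bp:])
            if best_score is None or score < best_score:
                best_score, best_bp = score, bp
    is_chimera = best_score is not None and best_score <= single - min_improvement
    return is_chimera, (best_bp if is_chimera else None)

# Hypothetical parent sequences that differ at every position.
parent_a = "ACGT" * 5
parent_b = "TGCA" * 5
chimera = parent_a[:12] + parent_b[12:]  # left from A, right from B
mutant = "ACGG" + "ACGT" * 4             # parent A with one random error

print(chimera_check(chimera, parent_a, parent_b))  # (True, 12)
print(chimera_check(mutant, parent_a, parent_b))   # (False, None)
```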
Eventually, you can taxonomically and statistically assess your data.
In order to assign taxonomic identification (ex. genus or species) to a sequence, you need to have a reference database. This is a list of sequences labelled with their taxonomy (ex. Bacillus licheniformis), so that you can match your sequences to the reference and identify what you have. There are several pre-made ones publicly available, but in many cases you need to add to or edit these, and several times I have made my own using available data in online databases.
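Matching a sequence to a labelled reference can be sketched by comparing short “words” (k-mers) shared between the query and each reference entry, loosely in the spirit of k-mer based classifiers such as the RDP Classifier. This toy version simply picks the reference with the highest Jaccard similarity of k-mer sets; the two-entry database, its labels, and the `min_similarity` cutoff are all hypothetical.

```python
def kmers(seq, k):
    """All overlapping substrings of length k ("k-mers") in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(query, reference, k=8, min_similarity=0.5):
    """Label the query with the reference entry sharing the most k-mers.

    Similarity is the Jaccard index of the two k-mer sets (shared / combined).
    Returns ("unclassified", score) if nothing is similar enough.
    """
    query_kmers = kmers(query, k)
    best_label, best_score = None, 0.0
    for label, ref_seq in reference.items():
        ref_kmers = kmers(ref_seq, k)
        union = query_kmers | ref_kmers
        score = len(query_kmers & ref_kmers) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return "unclassified", best_score
    return best_label, best_score

# A hypothetical two-entry reference database; real ones hold many thousands
# of curated sequences labelled with their full taxonomy.
reference = {
    "Genus_A": "ACGTTGCATGCCAGTACGGATTACAGGCTA",
    "Genus_B": "TTGACCGTAAGCTTGCAATCGGCATTAGCC",
}
query = "ACGTTGCATGCCAGTACGGATTACAGGCTT"  # Genus_A with the last base changed

print(classify(query, reference)[0])  # Genus_A
print(classify("G" * 20, reference))  # ('unclassified', 0.0)
```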
You can also statistically compare your samples. This can get complicated, but in essence it means mathematically comparing datasets to determine whether they are actually different, and whether that difference could have happened by chance. For example, you can determine whether organically-farmed soil contains more diversity than conventionally-farmed soil, or whether you have enough sequencing coverage or need to go back and do another run. You can also see trends across the data, for example, whether moose from different geographic locations have similar bacterial diversity to each other (left), or whether certain species or environmental factors have a positive, negative, or no correlation (below).
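The question “could that difference have happened by chance?” can be answered with a permutation test, which is one of several options and is my choice here for illustration. We shuffle which samples belong to which group many times, and count how often a difference at least as large as the real one appears. The diversity values below are hypothetical, not results from any study.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=1):
    """Two-sided permutation test on the difference between two group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical Shannon diversity values for five plots per farming system.
organic = [4.1, 4.3, 4.0, 4.4, 4.2]
conventional = [3.6, 3.8, 3.5, 3.9, 3.7]

diff, p_value = permutation_test(organic, conventional)
print(round(diff, 2))  # 0.5
print(p_value < 0.05)  # True
```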
Bioinformatics can be complicated and frustrating, especially because computers are very literal machines and need to have things written in very specific ways to get them to accomplish tasks. They also aren’t very good at telling you what you are doing wrong; sometimes it’s as simple as having a space where it’s not supposed to be. It takes dedication and patience to go back through code to look for minute errors, or to backtrack in an analysis and figure out at which step several thousand sequences disappeared and why. Like any skill, computer science and bioinformatics take time and practice to master. In the end, the interpretation of the data and identifying trends can be really interesting, and it’s really rewarding when you finally manage to get your statistical program to create a particularly complicated graph!
Stay tuned for an in-depth look at my current post-doctoral work with weed management in agriculture and soil microbial diversity!
Microbiome studies do not usually employ culturing techniques, and many microorganisms are too recalcitrant to grow in the laboratory. Instead, presumptive identification is made by comparing gene sequences to those of known species. The ribosome is an organelle found in all living cells (it is ubiquitous), and it is responsible for translating messenger RNA into amino acid chains. The genes in DNA that encode the RNA parts of the ribosome (rRNA genes) are great targets for identification-based sequencing. In particular, the gene for the small subunit of the ribosome (SSU rRNA) provides a good platform for current molecular methods, although the gene itself does not provide any information about the phenotypic functionality of the organism.
Prokaryotes, such as bacteria and archaea, have a 16S rRNA gene that is approximately 1,500 nucleotide base pairs in length. Eukaryotes, such as protozoa, fungi, plants, animals, etc., have an 18S rRNA gene that is up to 2,300 base pairs in length, depending on the kingdom. In both cases, the 16 or 18 refers to the sedimentation rate, measured in Svedberg units (S), which is a relative measure of mass and shape. Thus, the 18S rRNA is larger than the 16S, and would sediment faster when spun in a centrifuge. In both genes, there are regions that are conserved (identical or near-identical) across taxa, and nine variable regions (V1-V9). The variable regions are generally found on the exterior of the ribosome, where they are more exposed and prone to higher evolutionary rates. Since the outside of the ribosome is less integral to maintaining its structure, the variable regions are under less functional constraint and can evolve without destroying the ribosome. They provide a means for identification and classification through sequence analysis [2-6]. The conserved regions are targets for primers, as a single primer can bind universally (to all or nearly all members of its target taxa). The conserved regions lie largely in the internal structure of the ribosome, and too much change in their sequence will alter its 3D (tertiary) structure, so that it can no longer interact with the many components in the cell. Mutations in the conserved regions often produce a non-functional ribosome, which kills the cell.
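One way to see conserved versus variable regions is to line up the same gene from several organisms and measure, position by position, how mixed the bases are. This sketch scores each alignment column with Shannon entropy in bits: 0 means every sequence has the same base (conserved), and 2 means all four bases appear equally (highly variable). The tiny alignment and the `threshold` value are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the bases in one alignment column; gaps ignored."""
    bases = [b for b in column if b != "-"]
    total = len(bases)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(bases).values())

def variable_positions(alignment, threshold=0.5):
    """Indices of alignment columns whose entropy exceeds the threshold."""
    return [i for i, col in enumerate(zip(*alignment)) if column_entropy(col) > threshold]

# A hypothetical alignment of the same short region from four organisms.
alignment = [
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTGCGTAC",
    "ACGTCCGTAC",
]
print(variable_positions(alignment))  # [4]
```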
In addition to a small subunit, ribosomes also possess a large subunit (LSU rRNA): the 23S rRNA in prokaryotes, and the 28S rRNA in eukaryotes. The eukaryotic large subunit also contains an additional 5.8S rRNA (like all rRNA, it is non-coding), and all small and large subunits have associated proteins which aid in structure and function. Taken together, this gives a combined 70S ribosome in prokaryotes, and a combined 80S ribosome in eukaryotes. Svedberg values are not additive, which is why the prokaryotic 30S and 50S subunits combine into a 70S ribosome, and the eukaryotic 40S and 60S subunits into an 80S ribosome.
One way to study the rRNA gene is to sequence it. First, you need to extract the DNA from cells, and then you need to make millions of copies of the gene you want using the Polymerase Chain Reaction (PCR). PCR and sequencing technology work more or less the same way a cell copies its DNA for cell processes or division (mitosis). You take template DNA, primers that mark the start and end of the region you want, building-block nucleotides, and a polymerase enzyme that reads the DNA sequence and builds a complementary copy, and with hours of troubleshooting you get a billion copies! Many sequencing machines use nucleotides that have colored dyes attached: when a nucleotide is added, a camera records the color of its dye, and the dye is then cut (cleaved) off so the next nucleotide can be added. The machine records each nucleotide being added to each separate DNA strand, and outputs the sequences for the microorganisms that were in your original sample!
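The “billion copies” comes from repeated doubling: each PCR cycle can at most double the number of copies, so after n cycles you have N = N0 × (1 + E)^n, where E is the per-cycle efficiency (1.0 for perfect doubling). This sketch assumes constant efficiency; in real reactions amplification slows and plateaus as reagents run out.

```python
import math

def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Expected copies after PCR: N = N0 * (1 + E)^n (E = 1.0 is perfect doubling)."""
    return start_copies * (1 + efficiency) ** cycles

def cycles_needed(start_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles needed to reach target_copies."""
    return math.ceil(math.log(target_copies / start_copies) / math.log(1 + efficiency))

print(cycles_needed(1, 1e9))                  # 30 cycles from a single template
print(pcr_copies(1, 30))                      # 1073741824.0
print(cycles_needed(1, 1e9, efficiency=0.9))  # 33 if each cycle is only 90% efficient
```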
The two main challenges facing high-throughput sequencing are in choosing a target for amplification, and being able to integrate the generated data into an increased understanding of the microbiome of the environment being studied. High-throughput sequencing can currently generate thousands to millions of reads of up to 600-1000 bases in length, depending on the platform. This has forced studies to choose which variable regions of the rRNA gene to amplify and sequence, and has opened up an arena for debate on which variable region to choose. And of course, the DNA analysis of all this data you’ve now created is quickly being recognized as the most difficult part, which is what I focused on during my post-doc in the Yeoman Lab. Stay tuned for a blog post on the wonderful world of bioinformatics!
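Choosing a target region in practice means choosing a primer pair that sits in conserved regions on either side of the variable region(s) you want. This sketch does a simple “in silico PCR”: it finds a forward primer site, then the reverse primer’s binding site (its reverse complement) further along, and returns what would be amplified. Primers often contain degenerate IUPAC codes (e.g. R = A or G) so that they bind across more taxa; here those are expanded into regular-expression character classes. The short template and primers are hypothetical, not real 16S primers, which are typically around 20 bases long.

```python
import re

# IUPAC nucleotide codes used in degenerate primers, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complements, including the degenerate codes (e.g. R=A/G pairs with Y=C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def primer_regex(primer):
    return "".join(IUPAC[base] for base in primer)

def in_silico_amplicon(template, forward_primer, reverse_primer):
    """Return the region spanning both primer sites, or None if either is missing."""
    fwd = re.search(primer_regex(forward_primer), template)
    if fwd is None:
        return None
    # The reverse primer binds the other strand, so search for its reverse complement.
    rev = re.search(primer_regex(reverse_complement(reverse_primer)), template[fwd.end():])
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]

# Hypothetical template: flanking bases, forward site, "variable" middle, reverse site.
template = "GGGG" + "ACGGTC" + "TTAACCGG" + "CATGGA" + "GGGG"
print(in_silico_amplicon(template, "ACGRTC", "TCCRTG"))  # ACGGTCTTAACCGGCATGGA
```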
Neefs J-M, Van de Peer Y, Hendriks L, De Wachter R: Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res 1990, 18:2237–2318.
Kim M, Morrison M, Yu Z: Evaluation of different partial 16S rRNA gene sequence regions for phylogenetic analysis of microbiomes. J Microbiol Methods 2010, 84:81–87.
Doud MS, Light M, Gonzalez G, Narasimhan G, Mathee K: Combination of 16S rRNA variable regions provides a detailed analysis of bacterial community dynamics in the lungs of cystic fibrosis patients. Hum Genomics 2010, 4:147–169.
Yu Z, Morrison M: Comparisons of different hypervariable regions of rrs genes for use in fingerprinting of microbial communities by PCR-denaturing gradient gel electrophoresis. Appl Environ Microbiol 2004, 70:4800–4806.
Lane DJ, Pace B, Olsen GJ, Stahl DA, Sogin ML, Pace NR: Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc Natl Acad Sci USA 1985, 82:6955–6959.
Yu Z, García-González R, Schanbacher FL, Morrison M: Evaluations of different hypervariable regions of archaeal 16S rRNA genes in profiling of methanogens by archaea-specific PCR and denaturing gradient gel electrophoresis. Appl Environ Microbiol 2007, 74:889–893.