This piece also debuts the special series we are curating in partnership with the scientific journal mSystems: “Special Series: Social Equity as a Means of Resolving Disparities in Microbial Exposure”. Over the next few months to a year, we will be adding peer-reviewed, cutting-edge research, review, concept, and perspective pieces from researchers around the globe on a myriad of topics centering on social inequity and microbial exposures.
Beginning in early 2019, I participated as one of the guest editors for the Microbiomes Across Biological Systems special call hosted by three PLoS Journals. The journal collection was officially released in early 2020, but due to the global upheaval this year, the overview piece planned by the guest editors could not be completed. Here is a partial overview, written by myself and Dr. Noelle Noyes, Assistant Professor at the University of Minnesota.
Diet and gut systems
Ecosystem dynamics are important at any scale
As humans, animals, and plants are key members of their environmental ecosystems, so too are microorganisms key members of the host-associated ecosystems in which they reside. Throughout eons of interactions between microorganisms and macroorganism hosts, specialized and reproducible host-microbial interactions developed, leading to inherent differences in the microbial communities residing within even closely related hosts (Bennett et al. 2020; Loo et al. 2020; Sun et al. 2020). The strength and outcome of each of these host-microbial interactions can sway the trajectory of that host’s life, and decades of research have only barely uncovered the mechanisms behind the exorbitantly complicated relationships between host and microbial community. In part, this is because the host microbiome does not develop in isolation; it depends on the environment (e.g. Bennett et al. 2020), on diet (e.g. Taylor et al. 2020), on host signalment (e.g. Jacobs et al. 2020), and on all the minute details of that host’s life which inform the “who”, when, why, and how of host-microbial interactions. To better understand biological systems, we must evaluate them at different scales, from microbial ecosystems to environmental ones, and examine how microbial selection and transfer connect these nested ecosystems.
Your gut microbiota are what you eat
Diet is the most consistent and striking aspect of a host’s lifestyle which can select for different microbial communities in the gastrointestinal tract (Bloodgood et al. 2020; Loo et al. 2020; Lü et al. 2020; Ogato et al. 2020; Sun et al. 2020; Taylor et al. 2020), especially at different locations along the GI tract, depending on localized anatomy and organ-specific environmental conditions (Subotic et al. 2020; Lourenco et al. 2020). The amounts of different macronutrients in a diet, such as proteins, fats, or carbohydrates, selectively encourage different biochemical capabilities in the gut microbiome and the microbial members which can thrive under those conditions (Lourenco et al. 2020). At a finer resolution, the specific types of each nutrient, and their availability for catabolism, will also affect the gut microbiome (Taylor et al. 2020).
Yet, diet may affect the microbiome of different host species in nuanced ways, based on dissimilar anatomy of the gastrointestinal tract, the relative stability of host-microbial interactions and host reliance on their gut microbiota, and the relative stability of the host’s diet. For example, ruminants, specialized herbivores which possess a four-chambered stomach, are dependent on the presence of fibrolytic microbiota; yet, due to the overwhelming microbial diversity present in their GI tract, they have functional redundancy which allows for a great deal of latitude in the specific microbial species present in their communities. Microbiota in the rumen of cattle are easily swayed by changes in diet composition (Lourenco et al. 2020; Ogato et al. 2020), as are microbiota in sea turtles (Bloodgood et al. 2020) and potato ladybird beetles (Lü et al. 2020), whereas diet composition seems to affect only the less abundant community members in the honeybee gut (Taylor et al. 2020).
The impact of diet on the gut microbiome and host health is an active and long-standing research field, yet the depth and breadth of dietary effects leave many questions unanswered, particularly in cases where feeding the animal host is prioritized over feeding the gut microbiota specifically. Animal production and weight gain are primary goals of feeding strategies in agriculture, often with detrimental effects on the functionality of the gut microbiome, which can lead to systemic health problems in the animal if the perturbation to the microbiome is extensive or protracted. An understanding of how host-microbial ecosystems can be altered over time to prevent such health problems is important (Ogato et al. 2020). Similarly, wild animal recovery programs opt for diets to support weight gain in malnourished animals, even when the diet composition is contrary to their natural diet. In recovering juvenile sea turtles, feeding an omnivorous diet to promote weight gain, rather than the herbivorous diet these turtles consume at this stage of life, causes changes in gut microbiota profiles, and it is unknown how this may affect long-term digestive function and health (Bloodgood et al. 2020).
An interesting and understudied aspect of the effects of diet on the gut microbiome is the potential for knock-on effects across microbial ecosystems. For example, changing the diet may impact gut bacterial profiles based on “who” is directly catabolizing those nutrients, but it may also impact other microorganisms which are supported by the byproducts of that microbial digestion. Similarly, therapeutics targeting some microbial community members may inadvertently alter other community members. A deeper understanding of how diet and medication affect the entire microbial community, and not just selected members, can reveal insight into community dynamics and the relative risk of medications causing disruptions. For example, anthelmintics in beagles were shown not to alter fecal microbial communities (Fujishiro et al. 2020).
Environment to host to host: microbial transfer highlights connections between systems
Yet, what constitutes a beneficial microbiome for one animal species may be detrimental to another. A dramatic example of this is vector-borne infectious disease, in which symbiotic or neutral members of an insect microbiome are highly pathogenic in other animals which have not learned to tolerate or control those particular microorganisms. Bacteria carried by arthropods, such as mosquitos, flies, or ticks, may provide nutrition or disease-mitigation benefits to their arthropod hosts yet cause widespread disease and mortality in humans and other animals (Bennett et al. 2020). Interaction with the ecosystem can recruit microbial members to a host-associated microbial community. Habitat destruction alters the quality of the environment, and thus microbial transfer from environment to insect, and this can make arthropod microbial communities more variable (Bennett et al. 2020). It is yet unknown whether these knock-on changes to the arthropod microbiota will have positive or negative impacts on vector-borne diseases.
Studies such as Bennett et al. (2020), which put host-associated gut microbial community assessment into the context of habitat quality and environmental microbial transfer, remind us that microbial communities do not exist in isolation. Understanding how the environment shapes the microbial communities which shape the host is a critical aspect of understanding the connectedness between biological systems. Further, it better illuminates the dynamics of microbial transmission, and when microorganisms are and are not transferred. Maternal transfer is a well-demonstrated mechanism of vertical transmission of microorganisms, and transfer between social pairs is a method of horizontal transmission, both often demonstrated via microbial community similarity analysis. However, when pair-bonded tree swallows are sampled asynchronously, there is no significant level of similarity in their gut microbiota (Hernandez et al. 2020).
The need to put host-associated gut microbial community assessment into the context of environment is also highlighted in Loo et al. (2020), in which habitat and geographic location impacted the gut microbiome of island finches independently of foraging diet data. Environmental conditions, localized plant diversity, and localized niche competition can also impact the type, nutritional content, and life stage of plant life, which can in turn impact the gut microbiota recruited in those host animals consuming plants. As discussed in Jacobs et al. (2020), when animals are removed from their natural environments and held in captivity, where local macro-biodiversity is dramatically reduced, there is often a corresponding decline in host gut microbial diversity which can impact animal health. In semi-captive situations, such as beehives, animals may still freely encounter diverse environmental microorganisms, but the habitat or housing design may impact host behavior and/or stress response due to interactions with humans. Chronic stress has been demonstrated to negatively impact the diversity and functionality of host-associated microbial communities in the gut by altering the host immune system and its latitude for microbial tolerance. Thus, even at the very localized scale, environmental conditions and habitat play a role in host-microbial interactions (Subotic et al. 2020).
Bennett et al. Habitat disturbance and the organization of bacterial communities in Neotropical hematophagous arthropods
Bloodgood et al. The effect of diet on the gastrointestinal microbiome of juvenile rehabilitating green turtles (Chelonia mydas)
Fujishiro et al. Evaluation of the effects of anthelmintic administration on the fecal microbiome of healthy dogs with and without subclinical Giardia spp. and Cryptosporidium canis infections
Hernandez et al. Cloacal bacterial communities of tree swallows (Tachycineta bicolor): Similarity within a population, but not between pair-bonded social partners
Jacobs et al. California condor microbiomes: Bacterial variety and functional properties in captive-bred individuals
Loo et al. An inter-island comparison of Darwin’s finches reveals the impact of habitat, host phylogeny, and island on the gut microbiome
Lourenco et al. Comparison of the ruminal and fecal microbiotas in beef calves supplemented or not with concentrate
Lü et al. Host plants influence the composition of the gut bacteria in Henosepilachna vigintioctopunctata
Ogato et al. Long-term high-grain diet altered the ruminal pH, fermentation, and composition and functions of the rumen bacterial community, leading to enhanced lactic acid production in Japanese Black beef cattle during fattening
Subotic et al. Honey bee microbiome associated with different hive and sample types over a honey production season
Taylor et al. The effect of carbohydrate sources: Sucrose, invert sugar and components of mānuka honey, on core bacteria in the digestive tract of adult honey bees (Apis mellifera)
Environmental and physicochemical factors structure water-associated microbiomes
Water bodies are highly diverse ecosystems, and this is reflected in the articles of this Special Edition, which investigate the microbiomes of Indian mangroves, Icelandic cold springs, Antarctic lakes, urban lakes in Beijing, and Pacific seawater. A common theme emerging from this diverse collection is that water-associated microbiomes are highly influenced by the nutrient and physicochemical properties of the water body itself, and that these properties, in turn, are influenced by the surrounding atmospheric and environmental inputs. For example, nitrogen levels were correlated with microbial composition in mangrove-associated and Beijing urban lake water samples (Dhal et al. 2020; Wang et al. 2020), and the nitrogen fixation capacity of the microbiome was found to vary significantly by water depth within Antarctic benthic mats (Dillon et al. 2020). pH levels were found to influence the microbiomes of Icelandic cold springs (Guðmundsdóttir et al. 2019) and mangrove-associated waters (Dhal et al. 2020), as well as the archaeal composition of Beijing urban lake sediment (Wang et al. 2020).
Water-associated microbiomes are dynamic across gradients, geography, and time
While water bodies exhibit heterogeneous physicochemical and nutrient properties depending on their environmental and geographical circumstances, the articles in this collection demonstrate that many water-associated microbiomes fluctuate predictably and periodically. For example, the diversity of seawater microbiomes exhibited diel fluctuation, which itself was characterized by rhythmic changes in temperature and concentrations of nitrate, ammonium, phosphate and silicate (Weber et al. 2020). Microbiome structure and composition also correlated with gradients established by water depth (Dillon et al. 2020; Weber et al. 2020), eutrophication (Dhal et al. 2020), and distance from coral reefs (Weber et al. 2020). These findings emphasize the importance of the physical environment from which water samples are collected, and the fact that water and water-associated samples are inherently connected to — and impacted by — features that may be located far away from the actual sampling site. This highlights the importance of contextualizing sampling sites both temporally and spatially.
I was invited to present in a session on the “Utility of Microbiomes for Population Management”, which presented research from scientists working on clams, fish, frogs, salamanders, koalas, and moose, all focused on understanding the microbiome in order to better understand wildlife. I had a great time talking wildlife microbes with this group!
Unfortunately, I won’t be staying longer in Reno, either. In a few hours I’m heading to Bozeman, Montana, to meet with collaborators and teach bioinformatics to a grad student.
The picture is just one instant in an event involving hundreds or thousands of organisms that were all doing a lot of different things, sometimes for just a few seconds. How would you describe it?
Maybe using the number of members present in this community? Or a list of names of attendees? The 16S rRNA gene for prokaryotes, or the 18S rRNA gene or ITS region for eukaryotes, for example, would tell us that. Those genes are found in all organisms of those types, and they are a pretty effective means of basic identification. But it’s only as good as how consistently that gene is found in the organisms you are looking for. There is no one gene that’s found exactly the same in all organisms, so you might need to target multiple different identification genes to look at all the different types of microorganisms, such as bacteria, fungi, protozoa, or archaea. Viruses don’t share a common gene across types; to look at viruses you’d need something else.
From our identification genes we could identify all the organisms wearing yellow; ex. phylogenetic Family = Ducks. That wouldn’t tell us if they were always found in this ecosystem (native Eugene population) or just passing through (transient population), but we could figure that out if we looked at every home game of the season and found certain community members there time and again.
But knowing they are Ducks doesn’t tell us anything else about that community member. What will they do if it starts raining? Are they able to go mountain biking? Perhaps we could identify their potential for activity by looking at the objects they are carrying? That would be akin to metagenomics, identifying all the DNA present from all the organisms, which tells us what genes are present, but not if they are currently or ever used. It can be challenging to interpret: think of sequencing data from one organism’s genome as one 1,000,000-piece puzzle and all the genomes in a community as 1,000 1,000,000-piece puzzles all dumped in a pile. In the crowd, metagenomics would tell us who had a credit card that was specifically used to buy umbrellas, but not whether they’d actually use the umbrella if it rains (ex. Eugeneans would not).
We could describe what everyone is doing at this moment. That would be transcriptomics, identifying all the RNA to determine which genes were actively being transcribed into proteins for use in some cellular function. If we see someone in the crowd using that credit card for an umbrella (DNA), the receipt would be the RNA. RNA is a working copy you make of the DNA to take to another part of the cell and use as a blueprint to make a protein. You don’t want your entire genome moving around, nor do you need all of it to make one protein, so you make a small piece of RNA that will only hang around for a short period before degrading (i.e. you crumpling that RNA receipt and throwing it away, because who keeps receipts anymore).
Using transcriptomics, we’d see you were activating your money to get that umbrella, but we wouldn’t see the umbrella itself. For that, we’d need metabolomics, which uses chemistry and physics instead of genomics to identify chemicals (most often small-molecule metabolites). Think of metabolomics as describing this crowd by all the trash and crumbs and miscellaneous items they left behind. It’s one way to know what biological processes occurred (popcorn consumption and digestion).
From a technical standpoint, researching a microbiome might mean looking at all the DNA from all the organisms present to know who they are and of what they are capable. It might also mean looking at all the RNA present, which would tell you what genes were being used by “everyone” for whatever they were doing at a particular moment. Or you might add metabolomics to identify all the chemical metabolites, which are the end products of what those cells were doing, and which are more stable than RNA so they can give you data about a longer frame of time. Collectively, -omics are technologies that look at all of a certain biological substance to help you understand a dynamic community. However, it’s important to remember that each technology gives a particular view of the community and comes with its own limitations.
To study DNA or RNA, there are a number of “wet-lab” (laboratory) and “dry-lab” (analysis) steps which are required to access the genetic code from inside cells, polish it to a high-sheen such that the delicate technology we rely on can use it, and then make sense of it all. Destructive enzymes must be removed, one strand of DNA must be turned into millions of strands so that collectively they create a measurable signal for sequencing, and contamination must be removed. Yet, what constitutes contamination, and when or how to deal with it, remains an actively debated topic in science. Major contamination sources include human handlers, non-sterile laboratory materials, other samples during processing, and artificial generation due to technological quirks.
Contamination from human handlers
This one is easiest to understand; we constantly shed microorganisms and our own cells, and these aerosolized cells may fall into samples during collection or processing. This might be of minimal concern when working with feces, where the sheer number of microbial cells in a single teaspoon swamps the number that you might have shed into it, or it may be of vital concern when investigating house dust, which not only has comparatively few cells and little diversity, but is also expected to have a large amount of human-associated microorganisms present. To combat this, researchers wear personal protective equipment (PPE), which protects you from your samples and your samples from you, and work in biosafety cabinets, which use laminar air flow to prevent your microbial cloud from floating onto your workstation and samples.
Fun fact, many photos in laboratories are staged, including this one, of me as a grad student. I’m just pretending to work. Reflective surfaces, lighting, cramped spaces, busy scenes, and difficulty in positioning oneself makes “action shots” difficult. That’s why many lab photos are staged, and often lack PPE.
Photo Credit: Kristina Drobny
Contamination from laboratory materials
Microbiology or molecular biology laboratory materials are sterilized before and between uses, perhaps using chemicals (ex. 70% ethanol), an ultraviolet lamp, or autoclaving, which combines heat and pressure to destroy microorganisms and can be used to sterilize liquids, biological material, clothing, metal, some plastics, etc. However, microorganisms can be tough – really tough – and can sometimes survive the harsh cleaning protocols we use. Or, their DNA can survive and get picked up by sequencing techniques that don’t discriminate between DNA from live and dead cells.
In addition to careful adherence to protocols, some of this biologically-sourced contamination can be handled in analysis. A survey of human cell RNA sequence libraries found widespread contamination by bacterial RNA, which was attributed to environmental contamination. The paper includes an interesting discussion on how to correct this bioinformatically, as well as a perspective on contamination. Likewise, you can simply remove sequences belonging to certain taxa during quality control steps in sequence processing. There are a number of hardy bacteria that have been commonly found in laboratory reagents and are considered contaminants; the trouble is that many of these are also found in the environment, and in certain cases may be real community members. Should one throw the Bradyrhizobium out with the laboratory water bath?
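For illustration only, here is a minimal sketch of what taxon-based filtering during quality control might look like. The taxonomy strings and the genus list are hypothetical; real pipelines do this with dedicated filtering functions and curated contaminant lists.

```python
# Hypothetical sketch of removing suspected reagent contaminants by taxon
# during quality control. Taxonomy strings and the genus list are invented
# for illustration, not a curated reference.

# Genera frequently reported as reagent contaminants (illustrative, not exhaustive)
CONTAMINANT_GENERA = {"Bradyrhizobium", "Ralstonia", "Burkholderia"}

def filter_contaminants(assignments):
    """Split (seq_id, taxonomy) pairs into kept and flagged lists.

    `taxonomy` is a semicolon-delimited lineage ending in the genus name.
    """
    kept, flagged = [], []
    for seq_id, taxonomy in assignments:
        genus = taxonomy.split(";")[-1]
        if genus in CONTAMINANT_GENERA:
            flagged.append((seq_id, genus))
        else:
            kept.append((seq_id, taxonomy))
    return kept, flagged

kept, flagged = filter_contaminants([
    ("seq1", "Bacteria;Proteobacteria;Bradyrhizobium"),
    ("seq2", "Bacteria;Firmicutes;Lactobacillus"),
])
# seq1 is flagged as a possible reagent contaminant; seq2 is kept
```

Whether to actually discard the flagged sequences is exactly the judgment call raised above; in a soil or water sample, that Bradyrhizobium might be a real community member.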
Sequence chimeras
Like the mythical creatures they are named for, sequence chimeras are DNA (or cDNA) strands which are accidentally created when two other DNA strands merge. Chimeric sequences can be made up of more than two parent strands, but the probability of that is much lower. Chimeras occur during PCR, which takes one strand of genetic code and makes thousands to millions of copies, and which is used in nearly all sequencing workflows at some point. If the amplification process hiccups, for example when an extension step ends prematurely, it can produce partial DNA strands which concatenate into a new strand, one which might be confused for a new species. These can be removed during analysis by comparing the first and second half of each of your sequences to a reference database of sequences. If each half matches to a different “parent”, the sequence is deemed chimeric and removed.
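The split-and-match idea can be sketched in a few lines. This toy version uses exact substring matching against two made-up “parent” references, whereas real chimera detectors score alignments against a curated reference database.

```python
# Toy illustration of the split-and-match idea behind chimera removal.
# Exact substring matching stands in for alignment scoring, and the
# reference sequences are invented.

def looks_chimeric(sequence, references):
    """Flag a sequence whose two halves match different reference parents."""
    half = len(sequence) // 2
    first, second = sequence[:half], sequence[half:]
    first_parent = next((name for name, ref in references.items() if first in ref), None)
    second_parent = next((name for name, ref in references.items() if second in ref), None)
    return (first_parent is not None and second_parent is not None
            and first_parent != second_parent)

refs = {"parentA": "AAAACCCCGGGG", "parentB": "TTTTGGGGAAAA"}
print(looks_chimeric("AAAACCGGGGAA", refs))  # True: halves match different parents
print(looks_chimeric("AAAACCCCGGGG", refs))  # False: both halves match parentA
```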
Cross-sample contamination
During DNA or RNA extraction, genetic code can be flicked from one sample to another during any number of wash or shaking steps, or if droplets are flicked from fast-moving pipettes. This can be mitigated by properly sealing all sample containers or plates, moving slowly and carefully controlling your technique, or using precision robots which have been programmed in exacting detail — down to the curvature of the tube used, the amount and viscosity of the liquid, and how fast you want the pipette to move, so that the computer can calculate the pressure needed to perform each task. Sequencing machines are extremely expensive, and many labs are moving towards shared facilities or third-party service providers, both of which may use proprietary protocols. This makes it more difficult to track possible contamination, as was the case in a recent study using RNA; the researchers found that much of the sample-sample contamination occurred at the facility or in shipping, and that this negatively affected their ability to properly analyze trends in the data.
Sample-sample contamination during sequencing
Controlling sample-sample contamination during sequencing, however, is much more difficult. Each sequencing technology was designed with a different research goal in mind; for example, some generate an immense number of short reads to get high resolution on specific areas, while others aim to sequence the longest continuous piece of DNA possible before the reaction fails or becomes unreliable. They each come with their own quirks and potential for quality control failures.
Due to the high cost of sequencing, and the practicality that most microbiome studies don’t require more than 10,000 reads per sample, it is very common to pool samples during a run. During wet-lab processing to prepare your biological samples into a “sequencing library”, a unique piece of artificial “DNA” called a barcode, tag, or index, is added to all the pieces of genetic code in a single sample (in reality, this is not DNA but a single strand of nucleotides without any of DNA’s bells and whistles). Each of your samples gets a different barcode, and then all your samples can be mixed together in a “pool”. After sequencing the pool, your computer program can sort the sequences back into their respective samples using those barcodes.
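The sorting step described above, called demultiplexing, can be sketched like so. The barcodes, sample names, and reads are invented for illustration; real demultiplexers work from FASTQ files and tolerate sequencing errors.

```python
# Minimal demultiplexing sketch: sort pooled reads back into samples by an
# exact match on the barcode at the start of each read. Barcodes, sample
# names, and reads are made up for illustration.

BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}
BARCODE_LEN = 4

def demultiplex(reads):
    bins = {name: [] for name in BARCODES.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = BARCODES.get(tag)
        if sample is None:
            unassigned.append(read)  # unknown barcode, possibly a sequencing error
        else:
            bins[sample].append(insert)
    return bins, unassigned

bins, unassigned = demultiplex(["ACGTTTTTT", "TGCAGGGGG", "AAAACCCCC"])
# bins -> {'sample_1': ['TTTTT'], 'sample_2': ['GGGGG']}; one read unassigned
```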
While this technique has made sequencing significantly cheaper, it adds other complications. For example, an Illumina MiSeq machine generates a fixed number of sequence reads per run (on the order of tens of millions), which are divided up among the samples in that run (like a pie). The samples are added to a sequencing plate or flow cell (for things like Illumina MiSeq), and flow cells on larger machines have multiple lanes where samples can be added. If you add a smaller number of samples to each lane, the machine will generate more sequences per sample, and if you add a larger number of samples, each one has fewer sequences at the end of the run.
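The pie-splitting arithmetic is simple: each sample’s share is roughly the run output divided by the number of pooled samples. The run output below is a made-up number purely for illustration.

```python
# Back-of-the-envelope "pie" arithmetic for pooled sequencing runs.
# total_reads is a hypothetical run output, not a spec for any machine.
total_reads = 20_000_000  # invented reads-per-run figure
for n_samples in (40, 96, 384):
    print(n_samples, "samples ->", total_reads // n_samples, "reads each")
```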
Cross-contamination can happen on a flow cell when the sample pool wasn’t thoroughly cleaned of adapters or primers, and there are great explanations of this here and here. To generate many copies of genetic code from a single strand, you mimic DNA replication in the lab by providing all the basic ingredients (process described here). To do that, you need to add a primer (just like with painting) which can attach to your sample DNA at a specific site and act as scaffolding for your enzyme to attach to the sample DNA and start adding bases to form a complementary strand. Adapters are just primers with barcodes and the sequencing primer already attached. Primers and adapters are small strands, roughly 10 to 50 nucleotides long, and are much shorter than your DNA of interest, which is generally 100 to 1,000 nucleotides long. There are a number of methods to remove them, but if they hang around and make it to the sequencing run, they can be incorporated incorrectly and make it seem like a sequence belongs to a different sample.
This may sound easy to fix, but sequencing library preparation already goes through a lot of stringent cleaning procedures to remove everything but the DNA (or RNA) strands you want to work with. It’s so stringent that the problem of barcode swapping, also known as tag switching or index hopping, was not immediately apparent. Even when it is noted, it typically affects a small fraction of the total sequences. This may not be an issue if you are working with rumen samples and are only interested in sequences which represent >1% of your total abundance. But it can really be an issue in low-biomass samples, such as air or dust, particularly in hospitals or clean rooms. If you were trying to determine whether healthy adults were carrying, but not infected by, the pathogen C. difficile in their GI tract, you would be very interested in the presence of even one C. difficile sequence and would want to be extremely sure of which sample it came from. Tag switching can be made worse by combining samples from very different sample types or genetic code targets on the same run.
There are a number of articles proposing methods of dealing with tag switching using double tags to reduce confusion or other primer design techniques, computational correction or variance stabilization of the sequence data, identification and removal of contaminant sequences, or utilizing synthetic mock controls. Mock controls are microbial communities which have been created in the lab by mixing a few dozen microbial cultures together, and are used as a positive control to ensure your procedures are working. Because you are adding the cells to the sample yourself, you can control the relative concentrations of each species, which can act as a standard to estimate the number of cells that might be in your biological samples. Synthetic mock controls don’t use real organisms; instead, they use synthetically created DNA to act as artificial “organisms”. If you find these in a biological sample, you know you have contamination. One drawback is that positive controls always sequence really well, much better than your low-biomass biological samples, which can mean that your samples do not generate many sequences during a run, or that tag switching is encouraged from your high-biomass samples to your low-biomass samples.
Incorrect base calls
Cross-contamination during sequencing can also be a solely bioinformatic problem: since many barcodes are only a few nucleotides long (10 or 12 being the most commonly used), if the computer misinterprets the bases it thinks were just added, it can interpret the barcode as a different one and attribute that sequence to a different sample than the one it came from. This may not be a problem if there aren’t many incorrect sequences generated and they fall below the threshold of what is “important because it is abundant”, but again, it can be a problem if you are looking for the presence of perhaps just a few hundred cells.
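A sketch of mismatch-tolerant barcode matching shows both the fix and the risk: allowing one mismatch rescues reads with a single miscalled base, but only if no two barcodes in the design sit within that distance of each other. The barcodes here are invented.

```python
# Sketch of mismatch-tolerant barcode matching. Barcodes are invented;
# real barcode sets are designed so that members differ at several
# positions, so one miscalled base cannot turn one barcode into another.

def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return the unique barcode within max_mismatches of `observed`, else None."""
    hits = [bc for bc in barcodes if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None  # ambiguous or no match: discard

barcodes = ["ACGTACGTAC", "TTTTGGGGCC"]
print(assign_barcode("ACGTACGTAA", barcodes))  # ACGTACGTAC: one error, recovered
print(assign_barcode("AAAAAAAAAA", barcodes))  # None: nothing close enough
```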
When researching environments that have very low biomass, such as air, dust, and hospital or cleanroom surfaces, there are very few microbial cells to begin with. Adding even a few dozen or several hundred cells can make a dramatic impact on what that microbial community looks like, and can confound findings.
Collectively, contamination issues can lead to batch effects, where all the samples that were processed together have similar contamination. This can be confused with an actual treatment effect if you aren’t careful in how you process your samples. For example, if all your samples from timepoint 1 were extracted, amplified, and sequenced together, and all your samples from timepoint 2 were extracted, amplified, and sequenced together later, you might find that timepoint 1 and 2 have significantly different bacterial communities. If this was because a large number of low-abundance species were responsible for that change, you wouldn’t really know if that was because the community had changed subtly or if it was because of the collective effect of low-level contamination.
Stay tuned for a piece on batch effects in sequencing!
500 Women Scientists Eugene would like to thank the organizations that helped make this event possible. First and foremost, First National Taphouse in Eugene, who shared their wonderful space with us, where we will be putting on future Salons, and who donated a keg to the event! We are also extremely grateful to several organizations which contributed raffle items for us to raise additional funds, including Broadway Metro, Sizzle Pie, and the Eugene Science Center. Our beautiful logo was crafted by Cassie Cook, our amazing event posters were designed by Serena Lim, and photographer Danielle Cosme took some incredible event photos. Fertilab generously lent us a sound system, the Biology and the Built Environment Center donated the bacterial culture supplies, and both Theresa Cheng and Jessica Flannery provided materials and support for the interactive portion of the event. And of course, we want to acknowledge the national leadership of 500 Women Scientists, who brought us together, gave us a voice, and who suggested these Science Salons as a way to help CienciaPR, an organization which similarly supports science education and infrastructure.
I’d also like to acknowledge the powerhouse team of women who came together to organize this event, and who turned my silly event title into a reality: Karen Yook, Theresa Cheng, Leslie Dietz, and Hannah Tavalire. 500 Women Scientists was formed in the spirit of cooperation and support, and this team truly took that to heart. I can’t wait to organize the next one with you ladies, and the next one, and the next one, and the next one…
Last night, I gave my first “science stand-up” as part of the Oregon Museum of Science and Industry (OMSI) Science Pub series at Whirled Pies in Eugene, OR. I really enjoy giving public presentations of my work, and while I’ve been on stage with a microphone before, it was the first time I got a stool to put my drink on.
I gave a talk which encompassed much of my previous work on host-associated microbiomes in moose and other ruminants, as well as more current research from others on the human gut. It’s difficult enough to fit the field of host-associated microbiomes into a semester-long class, never mind an hour (I digress), so I kept it to the highlights: “A crash course on the microbiome of the digestive tract“. You can find the slides here: Ishaq OMSI SciPub 20180208, although there is no video presentation at this time. I was honored to have such a well-attended lecture (about 120 people!) with an engaged audience, who had some really on-track questions about the intersection of microbial diversity and health.
As I’ve discussed here before, academic outreach is a sometimes overlooked, yet nevertheless extremely important, aspect of science. The members of the general public are a large portion of our stakeholder audience, and outreach helps disseminate that research knowledge, facilitate transparency of the research process, and engage people who might benefit from or be interested in our work. As I told the audience last night, scientists do like when people ask us about our work, but “we’re more scared of you than you are of us”. I encourage everyone to add science to their life by getting informed, getting involved, and getting out to vote.
Thanks again to OMSI for inviting me to participate, and to Whirled Pies for hosting!
As a thank you, I received this awesome pint glass!
As the 2016 growing season comes to a close in Montana, here in the lab we aren’t preparing to overwinter just yet. In the last few weeks, I have been setting up my first greenhouse trial to expand upon the work we were doing in the field. My ongoing project is to look at changes in microbial diversity in response to climate change. The greenhouse trial will expand on that by looking at the potential legacy effects of soil diversity following climate change, as well as other agricultural factors.
First, though, we had to prep all of our materials, and since we are looking at microbial diversity, we wanted to minimize the potential for microbial influences. This meant that the entire greenhouse bay needed to be cleaned and decontaminated. To mitigate the environmental impact of our research, we washed and reused nearly 700 plant pots and tags in order to reduce the amount of plastic that will end up in the Bozeman landfill.
Each pot needed to be scrubbed with disinfectant soap and then soaked in bleach.
Lines of pots drying on the rack.
I scrubbed 700 labels clean in order to reuse them.
We also needed to autoclave all our soil before we could use it, to make sure we started with only the microorganisms we intentionally put in. The inocula came directly from my plots in the field study: small amounts of that field soil are being added, like a probiotic, to the sterilized soil as we grow a new crop of wheat.
This is trial one of three, each of which has three phases, so by the end of 2016 I’ll have cleaned and put soil into 648 pots with 648 tags; planted, harvested, dried and weighed 11,664 plants; and sampled, extracted DNA from, sequenced, and analyzed 330 soil and environmental samples!
Each pot gets six tiny winter wheat seeds planted.
Trial 1: 216 pots ready to grow!
Stay tuned for more updates and results (eventually) from this and my field study!
Bioinformatics brings statistics, mathematics, and computer programming to biology and other sciences. In my area, it allows for the analysis of massive amounts of genomic (DNA), transcriptomic (RNA), proteomic (proteins), or metabolomic (metabolites) data.
In recent years, advances in sequencing have allowed for the large-scale investigation of a variety of microbiomes. Microbiome refers to the collective genetic material, or genomes, of all the microorganisms in a specific environment, such as the digestive tract or the elbow. The term is often casually thrown around: some people mistakenly use it interchangeably with “microbiota”, or use it to describe only the genetic material of a specific type of microorganism (i.e. “microbiome” when they mean “bacterial microbiome”). Not only have targeted (amplicon) sequencing techniques improved, but methods that sequence single or multiple whole genomes have become much more efficient. In both cases, this has resulted in more sequences being amplified more times. This creates “sequencing depth”, a.k.a. better “coverage”: if you can sequence one piece of DNA 10 times instead of just once or twice, then you can determine whether changes in the sequence are random errors or really there. Unfortunately, faster sequencing techniques usually introduce more spontaneous errors, so your data are “messy” and harder to deal with. More, and messier, data creates a data-handling problem.
DNA analysis requires very complex mathematical equations in order to have a standardized way to quantitatively and statistically compare two (or two million) DNA sequences. For example, you can use equations for estimating entropy (chaos) to estimate how many sequences you might be missing due to sequencing shortcomings, based on how homogeneous (similar) or varied your dataset is. If you look at your data in chunks of 100 sequences, and 90 of them are different from each other, then sequencing your dataset again will probably turn up something new. But if 90 are the same, you have likely found nearly all the species in that sample.
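The intuition in that “chunks of 100” example is captured by Good’s coverage estimator: 1 − (singletons ÷ total reads). Here is a minimal Python sketch with toy reads:

```python
from collections import Counter

def goods_coverage(sequences):
    """Good's coverage estimator: 1 - (singletons / total reads).
    A value near 1 suggests further sequencing will find little new;
    a value near 0 suggests many species remain undetected."""
    counts = Counter(sequences)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1 - singletons / len(sequences)

# Mostly-identical reads: high coverage, resequencing adds little.
print(round(goods_coverage(["A"] * 90 + [f"rare{i}" for i in range(10)]), 2))  # 0.9
# Mostly-unique reads: low coverage, another run will turn up new species.
print(round(goods_coverage([f"seq{i}" for i in range(90)] + ["A"] * 10), 2))   # 0.1
```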
Bioinformatics takes these complex equations and uses computer programs to break them down into many simple pieces and automate them. However, the more data you have, the more equations the computer will need to do, and the larger your files will be. Thus, many researchers are limited by how much data they can process.
There are several challenges to analyzing any dataset. The first is assembly.
Sequencing technology can only add so many nucleotide bases to a synthesized sequence before it starts introducing more and more errors, or just stops adding altogether. To combat this increase in errors, DNA or RNA is cut into small fragments, or primers are used to amplify only certain small regions. These pieces can be sequenced from one end to the other, or can be sequenced starting at both ends and working towards the middle to create a region of overlap. In that case, to assemble, the computer needs to match up both ends and create one contiguous segment (“contig”). With some platforms, like Illumina, the computer tags each sequence by where on the flow cell it was, so it knows which forward piece matches which reverse.
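Here is a toy Python sketch of that overlap-matching step; real merging tools also weigh quality scores and tolerate mismatches, which this deliberately ignores.

```python
def revcomp(seq):
    """Reverse complement, since the reverse read comes off the other strand."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[b] for b in reversed(seq))

def merge_pair(forward, reverse, min_overlap=5):
    """Merge a forward and reverse read into one contig via their
    region of overlap, trying the longest possible overlap first."""
    rev = revcomp(reverse)
    for n in range(min(len(forward), len(rev)), min_overlap - 1, -1):
        if forward[-n:] == rev[:n]:
            return forward + rev[n:]
    return None  # no confident overlap found

# Toy reads (far shorter than real ones) that overlap on "GGGCC":
fwd = "AATTCGGGCC"
rev_read = "CTTAAGGCCC"  # as the machine reports it, from the opposite strand
print(merge_pair(fwd, rev_read))  # AATTCGGGCCTTAAG
```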
When sequencing an entire genome (or many), the DNA is enzymatically cut, or sheared by vibrating it at a certain frequency, and all the pieces are sequenced multiple times. The computer then needs to match the ends up using short pieces of overlap. This can be very resource-intensive for the computer, depending on how many pieces you need to put back together, and whether you have a reference genome for it to use (like the picture on a puzzle box), or whether you are doing it de novo, from scratch (putting together a puzzle without a picture, by trial and error, two pieces at a time).
Once the reads are assembled into their respective consensus sequences, you need to quality-check the data.
This can take a significant amount of time, depending on how you go about it. It also requires good judgement, and a willingness to re-run the steps with different parameters to see what will happen. An easy and quick way is to have the computer throw out any data below a certain threshold: sequences longer or shorter than your target length, sequences with ambiguous bases (N) which the computer couldn’t call as a primary nucleotide (A, T, C, or G), or sequences where the confidence level (quality score) of the base calls was low. These scores are generated by the sequencing machine as a relative measure of how “confident” each base call is; a Phred score of 30, for example, corresponds to roughly one expected base call error (ex. marking it an A instead of a T) per 1,000 bases. You can also cut off low-quality pieces, like the very beginnings or ends of sequences, which tend to sequence poorly and have low quality. This is a great example of where judgement is needed: if you quality-check and trim off low-quality bases first, and then assemble, you are likely to have cut off the overlapping ends which end up in the middle of a contig, and you won’t be able to put the two halves together. If you assemble first, you might end up with a sequence that is low-quality in the middle, or very short if you trim off the low-quality portions. If your run did not sequence well and you have a lot of spontaneous errors, you will have to decide whether to work with a lot of poor-quality data, or the small amount of good-quality data left over after you trim out the rest, or spend the money to try and re-sequence.
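As a minimal illustration, here is a toy Python filter applying those three thresholds; the cutoff values and read lengths are arbitrary placeholders, not recommendations.

```python
def passes_qc(seq, quals, min_len=8, max_len=12, min_mean_q=25):
    """Crude quality filter: drop reads of the wrong length, reads with
    ambiguous bases (N), and reads whose mean Phred score is too low.
    Phred Q = -10 * log10(error probability), so Q30 is roughly one
    expected error per 1,000 base calls."""
    if not (min_len <= len(seq) <= max_len):
        return False          # longer or shorter than the target length
    if "N" in seq:
        return False          # ambiguous base the machine couldn't call
    return sum(quals) / len(quals) >= min_mean_q

print(passes_qc("ACGTACGTAC", [35] * 10))           # True
print(passes_qc("ACGTNCGTAC", [35] * 10))           # False (ambiguous base)
print(passes_qc("ACGTACGTAC", [35] * 5 + [5] * 5))  # False (low mean quality)
```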
There are several steps that I like to add, some of which are necessary and some of which are technically optional. One of them is to look for chimeras, which are two sequence pieces that mistakenly got joined together. This happens during the PCR amplification step, often when a partially-extended fragment anneals to a different template and gets extended into a hybrid sequence. While time- and processor-consuming, chimera checking can remove these artificial sequences before you accidentally think you’ve discovered a new species.
Eventually, you can taxonomically and statistically assess your data.
In order to assign taxonomic identification (ex. genus or species) to a sequence, you need to have a reference database. This is a list of sequences labelled with their taxonomy (ex. Bacillus licheniformis), so that you can match your sequences to the reference and identify what you have. There are several pre-made ones publicly available, but in many cases you need to add to or edit these, and several times I have made my own using available data in online databases.
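As a toy illustration of matching queries against a reference database, here is a naive nearest-match classifier in Python; the reference sequences are made up and far shorter than real marker genes, and real classifiers use alignment or k-mer models rather than position-by-position mismatches.

```python
def hamming(a, b):
    """Count position-by-position mismatches between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def classify(query, reference_db, max_mismatches=2):
    """Assign taxonomy by finding the closest reference sequence;
    return None if nothing is within the mismatch threshold."""
    name, ref = min(reference_db.items(), key=lambda kv: hamming(query, kv[1]))
    return name if hamming(query, ref) <= max_mismatches else None

# Hypothetical mini reference database (sequences are invented):
reference_db = {
    "Bacillus licheniformis": "ACGTACGTAC",
    "Escherichia coli":       "TTGGCCAATT",
}
print(classify("ACGTACGTAA", reference_db))  # Bacillus licheniformis (1 mismatch)
print(classify("GGGGGGGGGG", reference_db))  # None (nothing close enough)
```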
You can also statistically compare your samples. This can get complicated, but in essence it mathematically compares datasets to determine if they are actually different, and whether that difference could have happened by chance. You can determine if organically-farmed soil contains more diversity than conventionally-farmed soil, or whether you have enough sequencing coverage or need to go back and do another run. You can also see trends across the data: for example, whether moose from different geographic locations have similar bacterial diversity to each other, or whether certain species or environmental factors have a positive, negative, or no correlation.
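One common diversity measure behind comparisons like these is the Shannon index, H = −Σ pᵢ ln(pᵢ). Here is a minimal Python sketch with hypothetical taxon counts (the numbers are invented for illustration, not real soil data):

```python
import math

def shannon(counts):
    """Shannon diversity index: H = -sum(p_i * ln(p_i)).
    Higher values mean more taxa and/or a more even spread among them."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical per-sample taxon counts:
organic      = [30, 25, 20, 15, 10]  # evenly spread: higher diversity
conventional = [85, 5, 5, 3, 2]      # dominated by one taxon: lower diversity

print(round(shannon(organic), 2))       # 1.54
print(round(shannon(conventional), 2))  # 0.62
```

Whether a gap like that is statistically meaningful, rather than chance, then takes a hypothesis test across replicate samples.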
Bioinformatics can be complicated and frustrating, especially because computers are very literal machines and need to have things written in very specific ways to get them to accomplish tasks. They also aren’t very good at telling you what you are doing wrong; sometimes it’s as simple as having a space where it’s not supposed to be. It takes dedication and patience to go back through code to look for minute errors, or to backtrack in an analysis and figure out at which step several thousand sequences disappeared and why. Like any skill, computer science and bioinformatics take time and practice to master. In the end, the interpretation of the data and identifying trends can be really interesting, and it’s really rewarding when you finally manage to get your statistical program to create a particularly complicated graph!
Stay tuned for an in-depth look at my current post-doctoral work with weed management in agriculture and soil microbial diversity!