As a new assistant professor at the University of Maine, 50% of my appointment is research. To establish my research, I started with curating a space to fulfill the needs of my work: “professional nesting”, if you will. I was allotted two adjacent rooms for my lab work, one as a microbial culturing space, and one for genomics work. I asked for and was granted separate spaces to reduce the likelihood of contamination sourced from my culturing space.
Prior to my arrival at the University of Maine, both lab spaces were set up to perform different research from what I do. This may not seem like it would interfere with my work, but the type of research you do will influence the machinery you need, each of which may have space or utilities requirements, as well as the flow of traffic through the room. To reduce the amount of time you spend moving around the room in search of elusive supplies, it’s best to curate work stations within the room. To that end, the Ishaq lab team spent several days re-arranging the large machinery and the table-top equipment, and then moving the supplies to the cabinets in corresponding locations. This change was most evident in the genomics room, which was previously used for human cell culture and biochemistry, shown below. At this time, I’m still working on updating the microbial culture room, which is larger and contained many more bits and pieces to organize.
Most research labs use extremely specialized equipment and machinery. Some of this was made available to me immediately; when research labs are discontinued, ownership of equipment and consumable materials reverts back to the researcher’s home department. I needed to purchase some of the more research-specific equipment, using some of the funds allotted to me for this purpose. Buying equipment can be stressful, because it can be incredibly expensive, and you want to be sure you selected the machine brand and range of capabilities for what you might want to do over the next 5 – 10 years, at least.
Finally, you need to stock your lab with reagents and researchers, but both of these have been temporarily put on hold as of March 2020, as we do our part to reduce the transmission of the Covid-19 virus. Whenever it is safe to do so, I look forward to completing the updates to my spaces and opening them up for collaborative work.
Now that I’m an assistant professor, a significant amount of my time is spent writing grant proposals to fund projects I’d like to do in the future.
Many large federal or foundation grants take up to a year from submission to funds distribution, and the success rate, especially for newly-established researchers, can be quite low. It’s prudent to start writing well in advance of the due date, and to start small, with “pilot projects”.
To that end, I’m pleased to announce that Dr. Lily Calderwood and I just received word that the Wild Blueberry Commission of Maine is funding a pilot project of ours: “Exploration of Soil Microbiota in Wild Blueberry Soils”. We’ll be recruiting 1 – 2 UMaine students for summer/fall 2020 to participate in the research for their Capstone senior research projects.
Dr. Calderwood is an Extension Wild Blueberry Specialist, and Assistant Professor of Horticulture in the School of Food and Agriculture at UMaine. She and I developed this project when meeting for the first time, over coffee. We realized we’d both been at the University of Vermont doing our PhDs concurrently, and in neighboring buildings! We got to chatting about my work in wheat soil microbial communities, and her work on blueberry production, and the untapped research potential between the two.
This pilot will generate some preliminary data to help us get a first look at the soil microbiota associated with blueberries, and in response to management practices and environmental conditions. From this seed funding, Lily and I hope to cultivate fruitful research projects for years to come!
In summer 2019, I developed and taught a course on ‘Microbes and Social Equity‘ to the Clark Honors College at the University of Oregon. The course assignments were literature review essays on various topics, which were compiled into a single manuscript as the group-based final project for the course. This large version is available as a preprint; however, the published version is more focused.
Suzanne L. Ishaq1,2*, Maurisa Rapp2,3, Risa Byerly2,3, Loretta S. McClellan2, Maya R. O’Boyle2, Anika Nykanen2, Patrick J. Fuller2,4, Calvin Aas2, Jude M. Stone2, Sean Killpatrick2,4, Manami M. Uptegrove2, Alex Vischer2, Hannah Wolf2, Fiona Smallman2, Houston Eymann2,5, Simon Narode2, Ellee Stapleton6, Camille C. Cioffi7, Hannah Tavalire8
1 Biology and the Built Environment Center, University of Oregon
2 Robert D. Clark Honors College, University of Oregon
3 Department of Human Physiology, University of Oregon
4 Charles H. Lundquist College of Business, University of Oregon
5 School of Journalism and Communication, University of Oregon
6 Department of Landscape Architecture, University of Oregon
7 Counseling Psychology and Human Services, College of Education, University of Oregon
8 Institute of Ecology and Evolution, University of Oregon
What do ‘microbes’ have to do with social equity? On the surface, very little. But these little organisms are integral to our health, the health of our natural environment, and even impact the ‘health’ of the environments we have built. Early life and the maturation of the immune system, our diet and lifestyle, and the quality of our surrounding environment can all impact our health. Similarly, the loss, gain, and retention of microorganisms, namely their flow from humans to the environment and back, can greatly impact our health and well-being. It is well-known that inequalities in access to perinatal care, healthy foods and fiber, a safe and clean home, and to the natural environment can create and arise from social inequality. Here, we focus on the argument for access to microorganisms as a facet of public health, and argue that health inequality may be compounded by inequitable microbial exposure.
After several years of bouncing through internal and external review, I’m pleased to announce that the first microbes paper out of the Montana State University Fort Ellis project has been published in Geoderma! The Fort Ellis research has encompassed multiple labs, projects, and many personnel, as it was a large collaboration looking at the effect of different farming systems on biodiversity at the macro (plant), mini (insect), and micro (-be) levels. Spanning multiple years, this project has been a massive undertaking that I briefly participated in but anticipate getting four publications out of (two more are in preparation).
Despite knowledge that management practices, seasonality, and plant phenology impact soil microbiota, farming system effects on soil microbiota are not often evaluated across the growing season. We assessed the bacterial diversity in soil around wheat roots through the spring and summer of 2016 in winter wheat (Triticum aestivum L.) in Montana, USA, from three contrasting farming systems: a chemically-managed no-tillage system, and two USDA-certified organic systems in their fourth year, one including tillage and one where sheep grazing partially offsets tillage frequency. Bacterial richness (range 605 – 1174 OTUs) and evenness (range 0.80 – 0.92) peaked in early June and dropped by late July (range 92 – 1190, 0.62-0.92, respectively), but were not different by farming systems. Organic tilled plots contained more putative nitrogen-fixing bacterial genera than the other two systems. Bacterial community similarities were significantly altered by sampling date, minimum and maximum temperature at sampling, bacterial abundance at date of sampling, total weed richness, and coverage of Taraxacum officinale, Lamium amplexicaule, and Thlaspi arvense. This study highlights that weed diversity, season, and farming management system all influence soil microbial communities. Local environmental conditions will strongly condition any practical applications aimed at improving soil diversity, especially in semi-arid regions where abiotic stress and seasonal variability in temperature and water availability drive primary production. Thus, it is critical to incorporate or address seasonality in soil sampling for microbial diversity.
The picture is just one instant in an event involving hundreds or thousands of organisms that were all doing a lot of different things, sometimes for just a few seconds. How would you describe it?
Maybe using the number of members present in this community? Or a list of names of attendees? The 16S rRNA gene for prokaryotes, or the 18S rRNA or ITS genes for eukaryotes, for example, would tell us that. Those genes are found in all types of those organisms, and are a pretty effective means of basic identification. But, it’s only as good as how often that gene is found in the organisms you are looking for. There is no one gene that’s found exactly the same in all organisms, so you might need to target multiple different identification genes to look at all the different types of microorganisms, such as bacteria, fungi, protozoa, or archaea. Viruses don’t share a common gene across types, so to look at viruses you’d need something else.
From our identification genes we could identify all the organisms wearing yellow; ex. phylogenetic Family = Ducks. That wouldn’t tell us if they were always found in this ecosystem (native Eugene population) or just passing through (transient population), but we could figure that out if we looked at every home game of the season and found certain community members there time and again.
But knowing they are Ducks doesn’t tell us anything else about that community member. What will they do if it starts raining? Are they able to go mountain biking? Perhaps we could identify their potential for activity by looking at the objects they are carrying? That would be akin to metagenomics, identifying all the DNA present from all the organisms, which tells us what genes are present, but not if they are currently or ever used. It can be challenging to interpret: think of sequencing data from one organism’s genome as one 1,000,000-piece puzzle and all the genomes in a community as 1,000 1,000,000-piece puzzles all dumped in a pile. In the crowd, metagenomics would tell us who had a credit card that was specifically used to buy umbrellas, but not whether they’d actually use the umbrella if it rains (ex. Eugeneans would not).
We could describe what everyone is doing at this moment. That would be transcriptomics, identifying all the RNA to determine which genes were actively being transcribed into RNA, to be translated into proteins for use in some cellular function. If we see someone in the crowd using that credit card for an umbrella (DNA), the receipt would be the RNA. RNA is a working copy you make of the DNA to take to another part of the cell and use as a blueprint to make a protein. You don’t want your entire genome moving around, or need it to make one protein, so you make a small piece of RNA that will only hang around for a short period before degrading (i.e. you crumpling that RNA receipt and throwing it away because who keeps receipts anymore).
Using transcriptomics, we’d see you were activating your money to get that umbrella, but we wouldn’t see the umbrella itself. For that, we’d need metabolomics, which uses chemistry and physics instead of genomics, in order to identify chemicals (most often proteins). Think of metabolomics as describing this crowd by all the trash and crumbs and miscellaneous items they left behind. It’s one way to know what biological processes occurred (popcorn consumption and digestion).
From a technical standpoint, researching a microbiome might mean looking at all the DNA from all the organisms present to know who they are and of what they are capable. It might also mean looking at all the RNA present, which would tell you what genes were being used by “everyone” for whatever they were doing at a particular moment. Or you might also add metabolomics to identify all the chemical metabolites, which would be all the end products of what those cells were doing, and which are more stable than RNA so they could give you data about a longer frame of time. Collectively, -omics are technologies that look at all of a certain biological substance to help you understand a dynamic community. However, it’s important to remember that each technology gives a particular view of the community and comes with its own limitations.
Last year, one of my former research groups at Montana State University was awarded a USDA NIFA Foundational program grant, and I am a sub-award PI on that grant. We’ll be working together to investigate the effect of diversified farming systems – such as those that use cover crops, rotations, or integrate livestock grazing into field management – on crop production and soil bacterial communities: “Diversifying cropping systems through cover crops and targeted grazing: impacts on plant-microbe-insect interactions, yield and economic returns.”
The first soil samples were collected in Montana this summer, and I have been processing them for the past few weeks. I am using the opportunity to train a master’s student on microbiology and molecular genetics lab work.
Tindall Ouverson started this fall as a master’s student at MSU, working with Fabian Menalled and Tim Seipel in Bozeman, MT. She’s an environmental and soil scientist, and this is her first time working with microbes. She was here in Eugene for just a few days to learn everything needed for sequencing: DNA extraction, polymerase chain reaction, gel electrophoresis and visualization, DNA cleanup using magnetic beads, quantification, and pooling. Despite not having experience in microbiology or molecular biology, Tindall showed a real aptitude and picked up the techniques faster than I expected!
Once the sequences are generated, I’ll be (remotely) training Tindall on DNA sequence analysis. I’ll also be serving as one of her thesis committee members! Tindall will be the first of (hopefully) many cross-trained graduate students between myself and collaborators at MSU.
Sequence data contamination from biological or digital sources can obscure true results and falsely raise one’s hopes. Contamination is a persistent issue in microbial ecology, and each experiment faces unique challenges from a myriad of sources, which I have previously discussed. In microbiology, those microscopic stowaways and spurious sequencing errors can be difficult to identify as non-sample contaminants, and collectively they can create large-scale changes to what you think a microbial community looks like.
Samples from large studies are often processed in batches based on how many samples can be processed by certain laboratory equipment, and if these span multiple bottles of reagents, or water-filtration systems, each batch might end up with a unique contamination profile. If your samples are not randomized between batches, and each batch ends up representing a specific time point or a treatment from your experiment, these batch effects can be mistaken for a treatment effect (a.k.a. a false positive).
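One way to guard against this is to shuffle samples before assigning them to processing batches. The sketch below is only an illustration of that idea in Python; the function name, sample labels, and batch size are made up for the example and are not taken from any workflow described here.

```python
import random


def assign_batches(sample_ids, batch_size, seed=0):
    """Shuffle samples, then split them into consecutive batches.

    Shuffling first means no batch lines up neatly with one time point
    or treatment. The seed is fixed only so the layout is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]


# Hypothetical example: 12 homes, each sampled before and after treatment.
samples = [f"home{h:02d}_{visit}"
           for h in range(1, 13)
           for visit in ("before", "after")]
batches = assign_batches(samples, batch_size=8)

for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

In practice, samples often have to be processed soon after collection, so full randomization may not be possible; recording the batch for every sample at least makes the batch effect testable later.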
“The times were statistically greater than prior time periods, while simultaneously being statistically lesser to prior times, according to longitudinal analysis.”
Over the past year, I analyzed a particularly complex bacterial 16S rRNA gene sequence data set, comprising nearly 600 home dust samples, and about 90 controls. Samples were collected from three climate regions in Oregon, over a span of one year, in which homes were sampled before and approximately six weeks after a home-specific weatherization improvement (treatment homes) or simply six weeks later in (comparison) homes which were eligible for weatherization but did not receive it. As these samples were collected over a span of a year, they were extracted with two different sequencing kits and multiple DNA extraction batches, although all within a short time after collection. The extracted DNA was spread across two sequence runs to allow for data processing to begin on cohort 1, while we waited for cohort 2 homes to be weatherized. Thus, there were a lot of opportunities to introduce technical error or biological contamination that could be conflated with treatment effects.
On top of this, each home was unique, with its own human and animal occupants, architectural and interior design, plants, compost, and quirks, and we didn’t ask homeowners to modify their behavior in any way. This was important, as it meant each of the homes – and their microbiomes – was somewhat unique. Therefore, I didn’t want to remove sequences which might be contaminants on the basis of low abundance and risk removing microbial community members which were specific to that home. After the typical quality assurance steps to curate and process the data, which can be found on GitHub as an R script of a DADA2 package workflow, I needed to decide what to do with the negative controls.
Because sequencing is expensive, most of the time there is only one negative control included in sequencing library preparation, if that. The negative control is a blank sample – just water, or an unused swab – which does not intentionally contain cells or nucleic acids. Thus anything you find there will have come from contamination. The negative control can be used to normalize the relative abundance numbers – if you find 1,000 sequences in the negative control, which is supposed to have no DNA in it, then you might only continue looking at samples with a certain amount higher than 1,000 sequences. This risks throwing out valid sequences that happen to be rare. Alternatively, you can try to identify the contaminants and remove whole taxa from your data set, risking the complete removal of valid taxa.
I had three types of negative controls: sterile DNA swabs which were processed to check for biological contamination in collection materials, kit controls where a blank extraction was run for each batch of extractions to test for biological contamination in extraction reagents, and PCR negative controls to check for DNA contamination of PCR reagents. In total, 90 control samples were sequenced, giving me unprecedented resolution to deal with contamination. Looking at the total number of sequences before and after my quality-analysis processing, I can see that the number of sequences in my negative controls reduces dramatically; they were low-quality in some way and might be sequencing artifacts. But, an unsatisfactory number remain after QA filtering; these are high-quality and likely come from microbial contamination.
I wasn’t sure how I wanted to deal with each type of control. I came up with three approaches, and then looked at unweighted, non-rarefied ordination plots (PCoA) to watch how my axes changed based on important components (factors). What follows is a narrative summary of what I did, but I included the R script of my phyloseq package workflow and workaround on GitHub.
“In microbial ecology, preprints are posted on late November nights. The foreboding atmosphere of conflated factors makes everyone uneasy.”
Ordination plots visualize lots of complex communities together. In both ordination figures below, each point on the graph represents a dust sample from one house. They are clustered by community distance: those closer together on the plot have a more similar community than points which are further away from each other. The points are shaped by the location of the samples, including Bend, Eugene, Portland, along with a few pilot samples labeled “Out”, and negative controls which have no location (not pictured but listed as NA). The points are colored by DNA extraction batch.
In Figure 1, the primary axis (axis 1) shows a clear clustering of samples by DNA extraction batch, but this is also mixed with geographic location, and as it turns out – date of collection and sequencing run. We know from other studies that geographic location, date of collection, and sequencing batch can all affect the microbial community.
Approach 1: Subtraction + outright removal
This approach subsets my data into DNA extraction batches, and then uses the number of sequences found in the negative controls to subtract out sequences from my dust samples. This assumes that if a particular sequence showed up 10 times in my negative control, but 50 times in my dust samples, that only 40 of those in my dust sample were real. For each of my DNA extraction batch negative control samples, I obtained the sum of each potential contaminant that I found there, and then subtracted those sums from the same sequence columns in my dust samples.
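The arithmetic of the subtraction approach can be sketched in a few lines. This is a plain-Python illustration, not the R/phyloseq workaround mentioned in this post; the data layout (nested dictionaries of counts), the function name, and the choice to floor results at zero are all assumptions made for the example.

```python
def subtract_controls(sample_counts, control_counts):
    """Subtract summed negative-control counts from each sample.

    sample_counts:  {sample_id: {sequence_id: count}} for one extraction batch
    control_counts: {control_id: {sequence_id: count}} for the same batch
    Counts are floored at zero (an assumption; negative counts are meaningless).
    """
    # Sum each potential contaminant across the batch's negative controls.
    contaminant_totals = {}
    for counts in control_counts.values():
        for seq, n in counts.items():
            contaminant_totals[seq] = contaminant_totals.get(seq, 0) + n

    # Subtract those sums from the matching sequence in every sample.
    corrected = {}
    for sample, counts in sample_counts.items():
        corrected[sample] = {
            seq: max(n - contaminant_totals.get(seq, 0), 0)
            for seq, n in counts.items()
        }
    return corrected


# Hypothetical batch: seq1 appears 10 times in the blank and 50 times in dust.
batch1_samples = {"dust_A": {"seq1": 50, "seq2": 5, "seq3": 12}}
batch1_controls = {"kit_blank_1": {"seq1": 10, "seq2": 8}}
corrected = subtract_controls(batch1_samples, batch1_controls)
print(corrected)
```

Here seq1 drops from 50 to 40, matching the example in the text, while seq2 bottoms out at zero because the control held more copies than the sample.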
Approach 1 was alright, but there was still an effect of DNA extraction batch (indicated by color scale) that was stronger than location or treatment (not included on this graph). This approach is also more pertinent for working with OTUs, or situations where you wouldn’t want to remove the whole OTU, just subtract out a certain number of sequences from specific columns. There is currently no way to do that just from phyloseq, so I made a work-around (see the GitHub page). However, using DADA2 gives you Sequence Variants, which are more precise, and I found it’s better to remove them with approach 3.
Approach 2: Total Removal
This approach removes any contaminant sequence that is found in ANY of the negative controls from ALL the house samples, regardless of which negative control was for which extraction batch. This approach assumes that if a sequence was found as a contaminant in a negative control somewhere, it is a contaminant everywhere.
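As a rough Python illustration of total removal (again, not the R workflow referenced in this post), the contaminant list is simply the union of everything seen in any control. The data layout and names are assumptions for the example.

```python
def remove_all_control_sequences(sample_counts, control_counts):
    """Drop every sequence seen in ANY negative control from ALL samples.

    sample_counts:  {sample_id: {sequence_id: count}}
    control_counts: {control_id: {sequence_id: count}}
    """
    # Union of all sequences with a non-zero count in any control.
    contaminants = {
        seq
        for counts in control_counts.values()
        for seq, n in counts.items()
        if n > 0
    }
    return {
        sample: {seq: n for seq, n in counts.items() if seq not in contaminants}
        for sample, counts in sample_counts.items()
    }


# Hypothetical data: seq1 only showed up in the blank for batch 1,
# but it is removed from the batch 2 sample as well.
house_samples = {
    "dust_batch1": {"seq1": 50, "seq3": 12},
    "dust_batch2": {"seq1": 7, "seq4": 30},
}
all_controls = {"kit_blank_1": {"seq1": 10}, "kit_blank_2": {"seq5": 3}}
cleaned = remove_all_control_sequences(house_samples, all_controls)
print(cleaned)
```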
Once again, approach 2 was alright, and the primary axis (axis 1) of potential batch effect is now my secondary axis; so there is still an effect of DNA extraction batch (indicated by color scale), but it is weaker. When I recolor by different variables, there is much more clustering by Treatment than by any batch effects. However, that second axis is also one of my time variables, so I don’t want to get rid of all of the variation on that axis. But, since my negative kit controls showed a lot of variation in number and types of taxa, I don’t want to remove everything found there from all samples indiscriminately.
Additionally, I don’t favor throwing sequences out just because they were a contaminant somewhere, particularly for dust samples. Contamination can be situational, particularly if a microbe is found in the local air or water supply and would be legitimately found in house dust but would have also accidentally gotten into the extraction process.
Approach 3: “To each its own”
This approach removes all the sequences from PCR and swab contaminant SVs fully from each cohort, respectively, and removes extraction kit contaminants fully from each DNA extraction batch, respectively. I took all the sequences of the SVs found in my dust samples and made them into a vector (list), and then I took all the sequences of the SVs found in my controls and made them into a different vector. I effectively subtracted out the contaminant SVs by name, by asking to find the sequences which were different between my two lists (thus returning the sequences which were in my dust samples but not in my control samples). I did this respective to each sequencing cohort and batch, so that I only remove the pertinent sequences (ex. using kit control 1 to subtract from DNA extraction batch 1).
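The "difference between two lists" step maps onto a set difference applied batch by batch. The Python sketch below illustrates only that logic; it is not the phyloseq script referenced in this post, and the dictionaries, batch labels, and function name are assumptions for the example.

```python
def remove_by_batch(sample_svs, sample_batch, control_svs_by_batch):
    """Remove contaminant SVs from each sample using only its own batch's controls.

    sample_svs:           {sample_id: set of SV sequences found in that sample}
    sample_batch:         {sample_id: batch_id}
    control_svs_by_batch: {batch_id: set of SV sequences found in that batch's controls}
    """
    cleaned = {}
    for sample, svs in sample_svs.items():
        contaminants = control_svs_by_batch.get(sample_batch[sample], set())
        # Keep the SVs that are in the sample but not in the matching controls.
        cleaned[sample] = svs - contaminants
    return cleaned


# Hypothetical data: "ACGT" is a contaminant in kit control 1 only,
# so it is kept in the sample from extraction batch 2.
sample_svs = {"dust_A": {"ACGT", "GGCC"}, "dust_B": {"ACGT", "TTAA"}}
sample_batch = {"dust_A": "batch1", "dust_B": "batch2"}
control_svs_by_batch = {"batch1": {"ACGT"}, "batch2": {"CCCC"}}
cleaned_svs = remove_by_batch(sample_svs, sample_batch, control_svs_by_batch)
print(cleaned_svs)
```

The same function could be run a second time with cohort labels in place of batch labels to handle the PCR and swab controls.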
In Figure 4, potential batch effect is solidly my secondary axis and not the primary driving force behind clustering. The primary axis (axis 1) shows a clear separation by climate zone, or location of homes, once the batch contamination has been removed. When I recolor by different variables, there is much more clustering by Treatment and almost none by batch effects. I say almost none, because some of my DNA extraction batches also happen to be Treatment batches, as they represent a subset of samples from a different location. Thus, I can’t tell if those samples cluster separately solely because of location or also because of batch effect. However, I am satisfied with the results and ready to move on.
Unlike its namesake, this tale has a happier ending.
To study DNA or RNA, there are a number of “wet-lab” (laboratory) and “dry-lab” (analysis) steps which are required to access the genetic code from inside cells, polish it to a high-sheen such that the delicate technology we rely on can use it, and then make sense of it all. Destructive enzymes must be removed, one strand of DNA must be turned into millions of strands so that collectively they create a measurable signal for sequencing, and contamination must be removed. Yet, what constitutes contamination, and when or how to deal with it, remains an actively debated topic in science. Major contamination sources include human handlers, non-sterile laboratory materials, other samples during processing, and artificial generation due to technological quirks.
Contamination from human handlers
This one is easiest to understand; we constantly shed microorganisms and our own cells and these aerosolized cells may fall into samples during collection or processing. This might be of minimal concern working with feces, where the sheer number of microbial cells in a single teaspoon swamps the number that you might have shed into it, or it may be of vital concern when investigating house dust which not only has comparatively few cells and little diversity, but is also expected to have a large amount of human-associated microorganisms present. To combat this, researchers wear personal protective equipment (PPE) which protects you from your samples and your samples from you, and work in biosafety cabinets which use laminar air flow to prevent your microbial cloud from floating onto your workstation and samples.
Fun fact, many photos in laboratories are staged, including this one, of me as a grad student. I’m just pretending to work. Reflective surfaces, lighting, cramped spaces, busy scenes, and difficulty in positioning oneself make “action shots” difficult. That’s why many lab photos are staged, and often lack PPE.
Photo Credit: Kristina Drobny
Contamination from laboratory materials
Microbiology or molecular biology laboratory materials are sterilized before and between uses, perhaps using chemicals (ex. 70% ethanol), an ultraviolet lamp, or autoclaving which combines heat and pressure to destroy, and which can be used to sterilize liquids, biological material, clothing, metal, some plastics, etc. However, microorganisms can be tough – really tough, and can sometimes survive the harsh cleaning protocols we use. Or, their DNA can survive, and get picked up by sequencing techniques that don’t discriminate between live and dead cellular DNA.
In addition to careful adherence to protocols, some of this biologically-sourced contamination can be handled in analysis. A survey of human cell RNA sequence libraries found widespread contamination by bacterial RNA, which was attributed to environmental contamination. The paper includes an interesting discussion on how to correct this bioinformatically, as well as a perspective on contamination. Likewise, you can simply remove sequences belonging to certain taxa during quality control steps in sequence processing. There are a number of hardy bacteria that have been commonly found in laboratory reagents and are considered contaminants; the trouble is that many of these are also found in the environment, and in certain cases may be real community members. Should one throw the Bradyrhizobium out with the laboratory water bath?
Like the mythical creatures these are named for, sequence chimeras are DNA (or cDNA) strands which are accidentally created when two other DNA strands merge. Chimeric sequences can be made up of more than two DNA strand parents, but the probability of that is much lower. Chimeras occur during PCR, which takes one strand of genetic code and makes thousands to millions of copies, and is a process used in nearly all sequencing workflows at some point. If there is an uneven voltage supplied to the machine, the amplification process can hiccup, producing partial DNA strands which can concatenate and produce a new strand, which might be confused for a new species. These can be removed during analysis by comparing the first and second half of each of your sequences to a reference database of sequences. If each half matches to a different “parent”, it is deemed chimeric and removed.
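The half-by-half comparison can be shown with a toy example. The Python below is a deliberately simplified sketch: it assumes the query and references are already aligned and the same length, and it scores matches position by position, whereas real chimera checkers use proper alignment and scoring. All names and sequences are invented for illustration.

```python
def best_parent(fragment, references, offset):
    """Return the name of the reference that best matches `fragment`
    when compared against the reference bases starting at `offset`."""
    def matches(name):
        window = references[name][offset:offset + len(fragment)]
        return sum(a == b for a, b in zip(fragment, window))
    return max(references, key=matches)


def is_chimeric(query, references):
    """Flag the query if its two halves best match different references."""
    mid = len(query) // 2
    left_parent = best_parent(query[:mid], references, 0)
    right_parent = best_parent(query[mid:], references, mid)
    return left_parent != right_parent


# Toy reference "database" with two very different parents.
references = {
    "parent_A": "AAAAAAAAAACCCCCCCCCC",
    "parent_B": "GGGGGGGGGGTTTTTTTTTT",
}
chimera = "AAAAAAAAAATTTTTTTTTT"  # first half of A joined to second half of B

print(is_chimeric(chimera, references))                 # True
print(is_chimeric(references["parent_A"], references))  # False
```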
Cross-sample contamination
During DNA or RNA extraction, genetic code can be flicked from one sample to another during any number of wash or shaking steps, or if droplets are flicked from fast moving pipettes. This can be mitigated by properly sealing all sample containers or plates, moving slowly and carefully controlling your technique, or using precision robots which have been programmed with exacting detail, down to the curvature of the tube used, the amount and viscosity of the liquid, and how fast you want the pipette to move, so that the computer can calculate the pressure needed to perform each task. Sequencing machines are extremely expensive, and many labs are moving towards shared facilities or third-party service providers, both of which may use proprietary protocols. This makes it more difficult to track possible contamination, as was the case in a recent study using RNA; the researchers found that much of the sample-sample contamination occurred at the facility or in shipping, and that this negatively affected their ability to properly analyze trends in the data.
Sample-sample contamination during sequencing
Sample-sample contamination during sequencing, however, is much more difficult to control. Each sequencing technology was designed with a different research goal in mind; for example, some generate an immense amount of short reads to get high resolution on specific areas, while others aim to get the longest continuous piece of DNA sequenced as possible before the reaction fails or becomes unreliable. They each come with their own quirks and potential for quality control failures.
Due to the high cost of sequencing, and the practicality that most microbiome studies don’t require more than 10,000 reads per sample, it is very common to pool samples during a run. During wet-lab processing to prepare your biological samples into a “sequencing library”, a unique piece of artificial “DNA” called a barcode, tag, or index, is added to all the pieces of genetic code in a single sample (in reality, this is not DNA but a single strand of nucleotides without any of DNA’s bells and whistles). Each of your samples gets a different barcode, and then all your samples can be mixed together in a “pool”. After sequencing the pool, your computer program can sort the sequences back into their respective samples using those barcodes.
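The sorting step, often called demultiplexing, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the barcode sits at the start of each read, all barcodes are the same length, and only exact matches are accepted. Real pipelines usually read the index separately and may tolerate mismatches. The barcodes, sample names, and function name are invented.

```python
def demultiplex(reads, barcode_to_sample):
    """Sort pooled reads back into samples using a leading barcode.

    reads:             list of read strings, barcode first
    barcode_to_sample: {barcode: sample_id}, all barcodes the same length
    Returns (reads per sample with the barcode trimmed off, unassigned reads).
    """
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # barcode not recognized
        else:
            by_sample[sample].append(insert)
    return by_sample, unassigned


# Hypothetical pool of two samples plus one read with an unreadable barcode.
barcodes = {"ACGT": "dust_A", "TGCA": "dust_B"}
pool = ["ACGTGGGCCC", "TGCAAAATTT", "ACGTTTTAAA", "NNNNCCCGGG"]
by_sample, unassigned = demultiplex(pool, barcodes)
print(by_sample)
print(unassigned)
```

A read whose barcode was swapped for another valid barcode would be sorted confidently into the wrong sample, which is why tag switching cannot be caught at this step.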
While this technique has made sequencing significantly cheaper, it adds other complications. For example, Illumina MiSeq machines generate a certain number of sequence reads (about 200 million right now) which are divided up among the samples in that run (like a pie). The samples are added to a sequencing plate or flow cell (for things like Illumina MiSeq). The flow cells have multiple lanes where samples can be added; if you add a smaller number of samples to each lane, the machine will generate more sequences per sample, and if you add a larger number of samples, each one has fewer sequences at the end of the run.
Cross-contamination can happen on a flow cell when the sample pool wasn’t thoroughly cleaned of adapters or primers, and there are great explanations of this here and here. To generate many copies of genetic code from a single strand, you mimic DNA replication in the lab by providing all the basic ingredients (process described here). To do that, you need to add a primer (just like with painting) which can attach to your sample DNA at a specific site and act as scaffolding for your enzyme to attach to the sample DNA and start adding bases to form a complementary strand. Adapters are just primers with barcodes and the sequencing primer already attached. Primers and adapters are small strands, roughly 10 to 50 nucleotides long, and are much shorter than your DNA of interest, which is generally 100 to 1000 nucleotides long. There are a number of methods to remove them, but if they hang around and make it to the sequencing run, they can be incorporated incorrectly and make it seem like a sequence belongs to a different sample.
This may sound easy to fix, but sequencing library preparation already goes through a lot of stringent cleaning procedures to remove everything but the DNA (or RNA) strands you want to work with. It’s so stringent that the problem of barcode swapping, also known as tag switching or index hopping, was not immediately apparent. Even when it is noted, it typically affects a small number of the total sequences. This may not be an issue if you are working with rumen samples and are only interested in sequences which represent >1% of your total abundance. But it can really be an issue in low-biomass samples, such as air or dust, particularly in hospitals or clean rooms. If you were trying to determine whether healthy adults were carrying but not infected by the pathogen C. difficile in their GI tract, you would be very interested in the presence of even one C. difficile sequence and would want to be extremely sure of which sample it came from. Tag switching can be made worse by combining samples from very different sample types or genetic code targets on the same run.
There are a number of articles proposing methods of dealing with tag switching using double tags to reduce confusion or other primer design techniques, computational correction or variance stabilization of the sequence data, identification and removal of contaminant sequences, or utilizing synthetic mock controls. Mock controls are microbial communities which have been created in the lab by mixing a few dozen microbial cultures together, and are used as a positive control to ensure your procedures are working. Because you are adding the cells to the sample yourself, you can control the relative concentrations of each species, which can act as a standard to estimate the number of cells that might be in your biological samples. Synthetic mock controls don’t use real organisms; instead, they use synthetically created DNA to act as artificial “organisms”. If you find these in a biological sample, you know you have contamination. One drawback to this is that positive controls always sequence really well, much better than your low-biomass biological samples, which can mean that your samples do not generate many sequences during a run, or that tag switching is encouraged from your high-biomass samples to your low-biomass samples.
Incorrect base calls
Cross-contamination during sequencing can also be a solely bioinformatic problem: since many of the barcodes are only a few nucleotides long (10 or 12 being the most commonly used), if the computer misreads the bases it thinks were just added, it can interpret the barcode as a different one and attribute that sequence to the wrong sample. This may not be a problem if there aren’t many incorrect sequences generated and it falls below the threshold of what is “important because it is abundant”, but again, it can be a problem if you are looking for the presence of perhaps just a few hundred cells.
When researching environments that have very low biomass, such as air, dust, and hospital or cleanroom surfaces, there are very few microbial cells to begin with. Adding even a few dozen or several hundred cells can make a dramatic impact on what that microbial community looks like, and can confound findings.
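One way to see why a single misread base matters is to count how many positions differ between two barcodes (the Hamming distance). The sketch below is illustrative only: the barcodes are invented, and the `assign` rule (accept a read only if exactly one barcode is within the allowed number of mismatches) is a simple assumption, not how any particular pipeline behaves.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must be the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign(observed, barcodes, max_mismatches=1):
    """Return the one barcode within max_mismatches, or None if none or ambiguous."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcode set: the first two differ at only the last base.
barcodes = ["ACGTACGTAC", "ACGTACGTAG", "TGCATGCATG"]

closest = min_pairwise_distance(barcodes)
ambiguous = assign("ACGTACGTAC", barcodes)  # within 1 mismatch of two barcodes
rescued = assign("TGCATGCATC", barcodes)    # one miscalled base, one candidate
```

When two barcodes differ by only one base, a single miscall turns one into the other. Barcode sets designed with larger minimum distances leave room to detect or correct such errors.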
Collectively, contamination issues can lead to batch effects, where all the samples that were processed together have similar contamination. This can be confused with an actual treatment effect if you aren’t careful in how you process your samples. For example, if all your samples from timepoint 1 were extracted, amplified, and sequenced together, and all your samples from timepoint 2 were extracted, amplified, and sequenced together later, you might find that timepoint 1 and 2 have significantly different bacterial communities. If this was because a large number of low-abundance species were responsible for that change, you wouldn’t really know if that was because the community had changed subtly or if it was because of the collective effect of low-level contamination.
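One common way to keep batch and treatment from being confounded is to spread each treatment group across the processing batches. The sketch below is a minimal illustration with made-up sample names. The `assign_batches` helper is hypothetical: it shuffles within each group and deals samples out to batches in turn, so every batch gets a mix of timepoints.

```python
import random


def assign_batches(samples, n_batches, seed=0):
    """Deal samples into batches so each group is spread across all batches.

    samples: list of (sample_name, group) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    by_group = {}
    for name, group in samples:
        by_group.setdefault(group, []).append(name)

    batches = [[] for _ in range(n_batches)]
    i = 0
    for group in sorted(by_group):
        names = by_group[group]
        rng.shuffle(names)  # random order within each group
        for name in names:
            batches[i % n_batches].append((name, group))
            i += 1
    return batches


samples = [(f"t1_s{k}", "timepoint1") for k in range(4)]
samples += [(f"t2_s{k}", "timepoint2") for k in range(4)]
batches = assign_batches(samples, n_batches=2)
```

With four samples per timepoint and two batches, each batch ends up with two samples from each timepoint. Any contamination specific to a batch then affects both timepoints equally, so it cannot pass for a time effect.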
Stay tuned for a piece on batch effects in sequencing!
The Menalled lab has MS and PhD opportunities in agroecology, “Diversifying cropping systems through cover crops and targeted grazing: impacts on plant-microbe-insect interactions, yield, and economic returns”.
Ruminal acidosis is a condition in which the pH of the rumen is considerably lower than normal, and if severe enough can cause damage to the stomach and localized symptoms, or systemic illness in cows. Often, these symptoms result from the low pH reducing the ability of microorganisms to ferment fiber, or by killing them outright. Since the cow can’t break down most of its plant-based diet without these microorganisms, this disruption can cause all sorts of downstream health problems. Negative health effects can also occur when the pH is somewhat lowered, or is lowered briefly but repeatedly, even if the cow isn’t showing outward clinical symptoms. This is known as sub-acute ruminal acidosis (SARA), and can also cause serious side effects for cows and an economic loss for producers.
In livestock, acidosis usually occurs when ruminants are abruptly switched to a highly fermentable diet, something with a lot of grain/starch that causes a dramatic increase in bacterial fermentation and a buildup of lactate in the rumen. To prevent this, animals are transitioned incrementally from one diet to the next over a period of days or weeks. Another strategy is to add something to the diet to help buffer rumen pH, such as a probiotic. One of the most common species used to help treat or prevent acidosis is a yeast, Saccharomyces cerevisiae.
This paper was part of a larger study on S. cerevisiae use in cattle to treat SARA, the effects of which on animal production as well as bacterial diversity and functionality have already been published by an old friend and colleague of mine, Dr. Ousama AlZahal, and several others. In total, very little work has been done on the effect of SARA or S. cerevisiae treatment on the fungal or protozoal diversity in the rumen, which is what I added to this study. I was very pleased to be invited to analyze and interpret some of the data, as well as to present the results at a conference in Chicago earlier this year. The article itself has just been published in Frontiers in Microbiology!
An investigation into rumen fungal and protozoal diversity in three rumen fractions, during high-fiber or grain-induced sub-acute ruminal acidosis conditions, with or without active dry yeast supplementation.
Sub-acute ruminal acidosis (SARA) is a gastrointestinal functional disorder in livestock characterized by low rumen pH, which reduces rumen function, microbial diversity, host performance, and host immune function. Dietary management is used to prevent SARA, often with yeast supplementation as a pH buffer. Almost nothing is known about the effect of SARA or yeast supplementation on ruminal protozoal and fungal diversity, despite their roles in fiber degradation. Dairy cows were switched from a high-fiber to high-grain diet abruptly to induce SARA, with and without active dry yeast (ADY, Saccharomyces cerevisiae) supplementation, and sampled from the rumen fluid, solids, and epimural fractions to determine microbial diversity using the protozoal 18S rRNA and the fungal ITS1 genes via Illumina MiSeq sequencing. Diet-induced SARA dramatically increased the number and abundance of rare fungal taxa, even in fluid fractions where total reads were very low, and reduced protozoal diversity. SARA selected for more lactic-acid utilizing taxa, and fewer fiber-degrading taxa. ADY treatment increased fungal richness (OTUs) but not diversity (Inverse Simpson, Shannon), but increased protozoal richness and diversity in some fractions. ADY treatment itself significantly (P < 0.05) affected the abundance of numerous fungal genera as seen in the high-fiber diet: Lewia, Neocallimastix, and Phoma were increased, while Alternaria, Candida, Orpinomyces, and Piromyces spp. were decreased. Likewise, for protozoa, ADY itself increased Isotricha intestinalis but decreased Entodinium furca spp. Multivariate analyses showed diet type was most significant in driving diversity, followed by yeast treatment, for AMOVA, ANOSIM, and weighted UniFrac. Diet, ADY, and location were all significant factors for fungi (PERMANOVA, P = 0.0001, P = 0.0452, P = 0.0068, Monte Carlo correction, respectively), and location was a significant factor (P = 0.001, Monte Carlo correction) for protozoa.
Diet-induced SARA shifts diversity of rumen fungi and protozoa and selects against fiber-degrading species. Supplementation with ADY mitigated this reduction in protozoa, presumptively by triggering microbial diversity shifts (as seen even in the high-fiber diet) that resulted in pH stabilization. ADY did not recover the initial community structure that was seen in pre-SARA conditions.