Finding the write words

Recently, a colleague recommended using Voyant Tools to analyze texts, so I thought I would give it a try. Language metrics can give a fascinating look into a text and, in this example, into what my most commonly used words are, how verbose I can be, and how diverse my written vocabulary is. It’s important to note that these metrics are sensitive to citation style, text in legends or tables, and other bits of a manuscript or webpage that get swept in without being part of the text, strictly speaking. When possible, I uploaded just the written portion of the manuscript.

My first publication

Insight into the bacterial gut microbiome of the North American moose (Alces alces) was written in 2012 and published in BMC Microbiology, which does not have a word limit. According to Voyant, the document contains 5,904 total words and 1,489 unique word forms. Vocabulary Density, the ratio of the number of unique words in the document to the total number of words in the document, is 0.252 (1,489 / 5,904). A higher vocabulary density indicates more varied text with lots of unique words, and a lower ratio indicates text in which words are reused more often. Longer documents tend to score lower simply because common words keep recurring. Average Words Per Sentence is 27.1, and the most frequent words are: rumen (80); otus (68); samples (67); moose (54); colon (50).
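
For anyone who wants to check numbers like these outside of Voyant, here is a rough Python sketch of the same three metrics. The function name text_metrics and the simple regex tokenizer are my own assumptions for illustration; Voyant's tokenization, sentence splitting, and stopword handling are likely to differ, so the results will not match its output exactly.

```python
import re
from collections import Counter

def text_metrics(text, top_n=5):
    """Rough text metrics (hypothetical helper, not Voyant's implementation)."""
    # Words: lowercase runs of ASCII letters, allowing internal apostrophes.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    # Sentences: naive split on ., !, or ? (abbreviations will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    return {
        "total_words": len(words),
        "unique_words": len(counts),
        # Vocabulary density = unique word forms / total words.
        "vocabulary_density": len(counts) / len(words) if words else 0.0,
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        # No stopword filtering here, so "the" and "of" will rank highly.
        "most_frequent": counts.most_common(top_n),
    }

if __name__ == "__main__":
    sample = "The moose ate. The moose slept well."
    print(text_metrics(sample, top_n=2))
```

As a sanity check on the definition, 1,489 unique word forms divided by 5,904 total words rounds to the 0.252 reported above.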

My latest first-authored publication

An investigation into rumen fungal and protozoal diversity in three rumen fractions, during high-fiber or grain-induced sub-acute ruminal acidosis conditions, with or without active dry yeast supplementation, was written in 2017 and published in Frontiers in Microbiology, which also doesn’t have a word limit. For this one, I altered the citation style first. Because Frontiers uses an author-year citation style (Author et al., year), my top words in the published version of the paper were “et” and “al”. In the modified version, there are 7,580 total words and 2,067 unique word forms. Vocabulary Density: 0.273, Average Words Per Sentence: 12.7, Most frequent words: rumen (111); diversity (69); diet (56); fungal (47); protozoa (46).
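
If changing the citation style in the manuscript isn't an option, one alternative is to strip author-year citations from the text before uploading it. This is only a sketch and not what I did for this paper: the function name strip_author_year_citations is hypothetical, and the pattern simply drops any parenthetical that contains a four-digit year, so it would also remove something like "(sampled in 2012)".

```python
import re

# Any parenthetical containing a year such as 2015 or 2012a, plus the
# whitespace in front of it. Nested parentheses are not handled.
CITATION = re.compile(r"\s*\([^()]*\b(?:19|20)\d{2}[a-z]?\b[^()]*\)")

def strip_author_year_citations(text):
    """Remove parenthetical author-year citations (hypothetical helper)."""
    return CITATION.sub("", text)

if __name__ == "__main__":
    sample = (
        "Moose were sampled (Ishaq et al., 2015). "
        "Rumen pH fell (Smith and Jones, 2010; Doe et al., 2012a) quickly (n = 8)."
    )
    print(strip_author_year_citations(sample))
```

Parentheticals without a year, such as "(n = 8)", are left alone, which keeps sample sizes and species names in the text.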

My dissertation

My dissertation, written in 2015, contains 75,859 total words and 8,958 unique word forms. Vocabulary Density: 0.118, Average Words Per Sentence: 12.9, Most frequent words are: rumen (632); moose (411); sequences (323); using (304); samples (284).

Summary

To look at all my first-authored research publications to date, I put all the text from the Word documents together, excluding figure and table legends, as well as reference lists. Across these 8 documents, there were 40,860 total words and 5,059 unique word forms. Vocabulary Density: 0.124, Average Words Per Sentence: 26.6, Most frequent words: rumen (304); samples (301); sequences (275); using (265); moose (226).

Word cloud from 8 publications.

 

 
