thuringiensis) was no more cohesive than that of randomly selected sets of isolates from the same genus, indicating that the current taxonomy of those species may need to be revisited. The differing pan-genomic properties of the various genera reported in this paper reflect the fact that different groups of bacteria are subject to diverse evolutionary pressures and unequal rates of genomic evolution, and provide a starting point for a general, genome-based understanding of such differences in a broad range of bacteria. We also note that the analyses described in this paper could be applied to any group of interest, whether or not
the bacteria included in each group have a common taxonomic classification. The commonalities in each group could instead be related to phenotype; for example, the ability to live in a particular environment, physiological properties, metabolic capabilities, or even disease pathogenesis. As such, the methods described in
this paper have broad applicability and should be useful for further pan-genomic comparisons in the future.

There are a number of opportunities to build upon the work performed in this study. For instance, it would be interesting to further characterize proteins that are found in only a single isolate of a given genus (singlets). Our research revealed that the isolates of most genera contain, on average, hundreds of singlets. This phenomenon could be further described by answering questions like: how much variation is there in the number of singlets in isolates of the same genus? Do isolates inhabiting certain environments possess more singlets than other isolates? Do singlets tend to be biased toward any particular functional category
of protein?

Another avenue for future work would be to enhance our study of the relationship between protein content similarity and 16S rRNA gene similarity. Despite the existence of usually consistent lower bounds for 16S rRNA gene similarity for isolates of the same genus, in this study we were unable to determine corresponding bounds for protein content similarity. However, we considered only absolute measures of protein content (i.e., absolute numbers of shared proteins or average unique proteins), and it would also be worthwhile to devise biologically meaningful bounds using a relative measure that could take into account factors like the proteome sizes of the individual isolates, the number of individual isolates, and so on. Finally, perhaps the most obvious opportunity for future work is simply to repeat the analyses described in this paper when more genome sequences become available.
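
As an illustration of two of the follow-up analyses suggested above, the sketch below counts singlets per isolate and computes one possible relative measure of protein content similarity. It is a minimal sketch under stated assumptions, not the method used in this study: each isolate is assumed to be represented as a set of protein family identifiers, the function names `singlet_counts` and `jaccard_similarity` are hypothetical, and the Jaccard index is only one candidate for a size-aware measure.

```python
from collections import Counter
from typing import Dict, Set


def singlet_counts(proteomes: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each isolate, the protein families found in no other isolate.

    `proteomes` maps an isolate name to its set of protein family
    identifiers (an assumed representation, chosen for illustration).
    """
    # Number of isolates in which each protein family occurs.
    occurrences = Counter(fam for fams in proteomes.values() for fam in fams)
    return {
        isolate: sum(1 for fam in fams if occurrences[fam] == 1)
        for isolate, fams in proteomes.items()
    }


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Shared families divided by all families present in either isolate.

    Dividing by the size of the union normalizes for proteome size,
    unlike an absolute count of shared proteins.
    """
    union = a | b
    # Two empty proteomes are assigned 0.0 here by convention.
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = {
        "isolate_1": {"A", "B", "C"},
        "isolate_2": {"A", "B", "D", "E"},
        "isolate_3": {"A", "F"},
    }
    print(singlet_counts(toy))
    print(jaccard_similarity(toy["isolate_1"], toy["isolate_2"]))
```

With per-isolate singlet counts in hand, the variation within a genus or differences between environments could be examined with standard summary statistics. A relative measure such as this one would still need to be checked against 16S rRNA gene similarity before any bounds could be proposed.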