
Galaxy: the first 5,000 pubs

The Galaxy Publication Library hits a milestone

We reached 5,000 publications in the Galaxy Publication Library last week. The library tracks publications that use, extend, implement or reference Galaxy or a Galaxy server. It includes journal articles, theses, and a couple of odds and ends. This milestone is a good opportunity to look at what the library tells us about where the Galaxy project has been, and maybe where it's going too.

The library was started in December 2011, when the first 168 Galaxy-related publications were added and classified using 8 tags. This included all project publications plus every pub that ad hoc literature searches could find at the time. The library started on CiteULike and stayed there until September 2017, when we moved it to Zotero. The library grew to 4,500 papers during that time.

The library uses tags to indicate how the publication relates to Galaxy. See below for an explanation and history of the tags.


Publications and Tags Over Time

Year # Pubs Methods Workbench UsePublic UseMain Tools RefPublic UseLocal IsGalaxy Cloud Reproducibility Other Shared Unknown HowTo Project Visualization UseCloud
2005 2 1 1
2006 4 3 1
2007 12 2 7 1 2 2
2008 32 15 12 1 2 2 1 1
2009 52 26 18 3 2 1 1 4 1
2010 107 50 36 1 5 1 1 7 3 5
2011 205 93 70 1 8 16 3 6 8 3 4 6 1
2012 396 196 128 1 3 29 3 15 13 7 9 12 10 12 10 2
2013 499 262 150 16 91 37 10 28 27 22 8 9 22 13 7 6 3 2
2014 733 329 224 59 97 67 28 42 47 40 25 40 23 7 11 7 8 1
2015 920 468 232 137 116 67 51 57 48 48 23 34 23 14 8 11 7 2
2016 1104 567 243 206 113 70 109 72 46 35 43 49 20 14 19 7 9 3
2017 929 541 156 207 107 57 75 71 53 19 53 22 16 18 5 5 4 1
2018 5 4 3 1 1
Total 5000 2553 1280 629 529 339 274 274 261 181 165 165 126 92 73 61 35 9

Trends

Trends in the publication library reflect the trajectory of the Galaxy Project over the last 6 years.

Methods

The most obvious "trend" is that there are a lot of pubs using Galaxy in their methods. Just over half of all the publications mention Galaxy in their methods section. This trend doesn't show any sign of slowing down.

UseMain, UsePublic, UseLocal, and UseCloud

Not all Methods papers say which Galaxy instance(s) they used. But starting in 2013, papers that do mention this are also tagged with UseMain, UsePublic, UseLocal, and/or UseCloud tags (see Tags below for an explanation of all tags).

The relative number of UseMain and UsePublic pubs highlights the increasing availability of publicly accessible Galaxy servers. In 2013-2014, there were 2 1/2 times as many UseMain pubs as UsePublic pubs. In 2015 they were about the same, and in 2016-2017 there were nearly twice as many UsePublic pubs as UseMain pubs. This rise reflects the increase in available public servers from 21 servers at the start of 2012 to over 90 servers and 6 services today.
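The ratios quoted above can be rechecked from the table. Here is a minimal sketch in Python, assuming the UseMain and UsePublic counts are copied by hand from the 2013-2017 rows. The `ratio` helper and the variable names are illustrative only and not part of any Galaxy tooling:

```python
# UseMain and UsePublic counts, copied from the 2013-2017 table rows.
use_main = {2013: 91, 2014: 97, 2015: 116, 2016: 113, 2017: 107}
use_public = {2013: 16, 2014: 59, 2015: 137, 2016: 206, 2017: 207}


def ratio(numerator, denominator, years):
    """Return summed numerator counts divided by summed denominator counts."""
    return sum(numerator[y] for y in years) / sum(denominator[y] for y in years)


# UseMain pubs per UsePublic pub in the early years and in 2015.
main_per_public_early = ratio(use_main, use_public, [2013, 2014])
main_per_public_2015 = ratio(use_main, use_public, [2015])

# UsePublic pubs per UseMain pub in the most recent years.
public_per_main_late = ratio(use_public, use_main, [2016, 2017])

print(f"2013-2014 UseMain per UsePublic: {main_per_public_early:.2f}")  # 2.51
print(f"2015 UseMain per UsePublic:      {main_per_public_2015:.2f}")  # 0.85
print(f"2016-2017 UsePublic per UseMain: {public_per_main_late:.2f}")  # 1.88
```

Summing counts over each period before dividing, rather than averaging per-year ratios, keeps a low-count year like 2013 from dominating the result.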

Reproducibility Rising

The last trend I want to highlight is about Reproducibility. Reproducibility has been a core value of Galaxy since at least 2011. The Reproducibility topic has seen a nearly 3-fold increase since then. There were 21 pubs in all of 2011-2013, compared to 53 pubs in 2017 thus far, and Reproducibility has gone from 2.1% of papers to 5.7% of papers in the same time.

Publications per year

Publications published in each year, as of 2017/10

The number of publications that reference Galaxy each year has increased every year since the project started. It took over three and a half years to reach 2,500 publications but only a little over two more years to add the next 2,500 publications.

Journals

The library can also tell us which journals are most popular. Here's the top 20:

Rank Journal #
1 PLOS ONE 277
2 BMC Genomics 186
3 Nucleic Acids Research 181
4 Bioinformatics 159
5 BMC Bioinformatics 117
6 Scientific Reports 96
7 PLOS Genetics 63
8 Genome Announcements 56
8 Genome Biology 56
8 Genome Research 56
11 Nature Communications 51
12 Briefings in Bioinformatics 46
13 Proceedings of the National Academy of Sciences 44
14 Molecular Ecology 39
15 Cell 36
16 Future Generation Computer Systems 34
17 PLOS Computational Biology 32
18 Cell Reports 31
18 Nature 31
20 Concurrency and Computation: Practice and Experience 29

Eight of the top ten journals are open access. If you are curious about the remaining 1,045 journals they can be found here.

The 5,000th Pub?

The 5,000th pub is

Variations in oral microbiota associated with oral cancer, Hongsen Zhao, Min Chu, Zhengwei Huang, Xi Yang, Shujun Ran, Bin Hu, Chenping Zhang & Jingping Liang. Scientific Reports 7, Article number: 11773 (2017) doi:10.1038/s41598-017-11779-9

It is an exemplary 5,000th publication: it's a Methods paper, by far the most popular topic tag; a UsePublic paper, an ascendant topic tag; and a >Huttenhower paper, the most frequently referenced public server tag. And it's open access too. See the paper's Zotero entry for more.

The Future

If current trends continue we'll hit 10,000 publications sometime in 2021. Look for the update.

Thanks for using Galaxy,

Dave Clements



More on Tags

We've used Topic Tags since the beginning of the library to track how publications relate to Galaxy. Since the move to Zotero, we've also added Galaxy Featured Tags, Public Server Tags, and Publisher Tags. They are all explained here.

Topic Tags

Topic tags indicate how the publication relates to Galaxy. Here's the current set and when each tag was added:

Tag Explanation Year
+HowTo Papers about how to use Galaxy for specific analyses. These are tutorials. 2011
+IsGalaxy Publications about Galaxy itself or installations of Galaxy. 2011
+Methods Uses Galaxy in their methods. 2011
+Other Publications that don't fit well under any other tag. 2011
+Project Publications with a Galaxy team member as an author. 2011
+Reproducibility Reproducibility and persistence in science. 2011
+Shared Publications that have published workflows, histories, datasets, pages, or visualizations in a Galaxy instance. 2011
+Workbench Publication mentions Galaxy as a platform. 2011
+Tools Tools that run in, have been ported to, or interact with Galaxy. 2012
+Cloud Publications referencing / extending / discussing Galaxy in a cloud context. 2013
+RefPublic References a publicly accessible Galaxy instance or a Galaxy service. This is distinct from the +UsePublic tag. 2013
+Unknown Publications that we know refer to Galaxy, but we aren't sure how because they are behind a paywall we don't have access to. These are revisited periodically. 2013
+UseCloud Uses a custom built cloud based instance of Galaxy in its methods. 2013
+UseLocal Uses a local installation of Galaxy in its methods. 2013
+UseMain Uses the project's public server, usegalaxy.org (a.k.a. Main), in its methods. 2013
+UsePublic Uses a publicly accessible Galaxy instance or a Galaxy service in its methods. 2013
+Visualization Publications referencing Galaxy in a visualization and/or visual analytics context. 2013

With the move to Zotero we added two new sets of tags. The first set is used to highlight publications that feature Galaxy prominently:

Tag Explanation
+Galactic Publication is about Galaxy.
+Stellar Publication features Galaxy prominently.

Public server and services tags

The second set of new tags shows which public Galaxy server or service is used or discussed in publications. These are tagged with the server's name, preceded by a ">". For example, the >RepeatExplorer tag lists all papers that use or reference the RepeatExplorer public server.

Publisher Tags

Zotero is configured to also add any keywords it can detect automatically when the publication is added. These tags are not rationalized in any way, and tend to describe the research topic or domain. Prosapip1 and Genome evolution are examples.

Retroactive Tagging?

These tags were added over a 6-year period. Are older papers back-tagged when new tags are added? Mostly not, but there are some exceptions:

  • Galaxy Featured Tags exist back to the beginning of time. (These were converted from CiteULike's priority feature.)
  • Topic and Public Server/Service tags have been applied to older publications on a selective basis.

Therefore, don't look for a lot of +UseMain or +Cloud tagged papers from before 2013.