The April 2015 Galactic News!
Welcome to the April 2015 Galactic News, a summary of what is going on in the Galaxy community. These newsletters complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.
New Papers
68 new papers referencing, using, extending, and implementing Galaxy were added to the Galaxy CiteULike Group in March, bringing the total to 2200 publications. Some highlights:
- Transcriptionally Active Regions Are the Preferred Targets for Chromosomal HPV Integration in Cervical Carcinogenesis, by Irene Kraus K. Christiansen, Geir Kjetil K. Sandve, Martina Schmitz, Matthias Dürst, Eivind Hovig; PloS One, Vol. 10, No. 3. (2015)
- Reproducible Analysis of Sequencing-Based RNA Structure Probing Data with User-Friendly Tools, by Lukasz J. Kielpinski, Nikolaos Sidiropoulos, Jeppe Vinther, Methods in Enzymology (2015), doi:10.1016/bs.mie.2015.01.014
- The Globus Galaxies Platform: Delivering Science Gateways as a Service, by Madduri, R, Chard, K, Chard, R, Lacinski, L, Rodriguez, A, Sulakhe, D, Kelly, D, Dave, U, Foster, I, Argonne National Laboratory Report, 2015
The new papers were tagged with:
# | Tag | # | Tag | # | Tag | # | Tag |
---|---|---|---|---|---|---|---|
2 | Cloud | - | Project | 6 | Tools | 12 | UsePublic |
1 | HowTo | 5 | RefPublic | - | UseCloud | 1 | Visualization |
1 | IsGalaxy | 1 | Reproducibility | 6 | UseLocal | 19 | Workbench |
33 | Methods | 2 | Shared | 7 | UseMain | | |
Events
April GalaxyAdmins Meetup
The next GalaxyAdmins online meetup will be 16 April. Carrie Ganote from the National Center for Genome Analysis Support (NCGAS) and Pervasive Technology Institute at Indiana University will talk about her Galaxy work with Trinity, IU Galaxy, and the Open Science Grid.
See the meetup page for more.
GalaxyAdmins is a special interest group for Galaxy community members who are responsible for Galaxy installations.
Galaxy Workshop Tokyo, April 28
The Galaxy Workshop Tokyo 2015 is a full day of hands-on training, keynotes, lightning talks and discussions all about ways of using Galaxy for high-throughput biology, especially for human genome sequencing. The workshop consists of two parts. The morning session is hands-on training on how to run Pitagora-Galaxy, our preconfigured Galaxy virtual machine, on your laptop or on the AWS cloud. The afternoon session includes keynotes and lightning sessions that explain actual workflows, which you can try immediately on Pitagora-Galaxy.
GCC2015: 6-8 July, Norwich UK
The 2015 Galaxy Community Conference (GCC2015) is the Galaxy community's annual gathering of users, developers, and administrators. Previous GCCs have drawn over 200 participants, and we expect that to happen again in 2015. GCC2015 is being hosted by The Sainsbury Lab in Norwich, UK, immediately before BOSC and ISMB/ECCB in Dublin.
There are a lot of events going on at GCC2015, including:
- Code Hackathon, 4-5 July
- Data Wrangling Hackathon, 4-5 July (new)
- Training SunDay, 5 July (new)
- Training Day, 6 July
- GCC Meeting, 7-8 July
Code Hackathon
An intense two-day hands-on collaboration to develop working code that is useful to the Galaxy community. If you know how to code, and want to contribute to one of the most successful open source projects in the life sciences, then please consider attending. See the Code Hackathon home page for more.
Data Wrangling Hackathon
An intense two-day hands-on collaboration to develop cutting edge analysis pipelines that are useful to the Galaxy community. If you know data analysis, we would love to have you here to help us beat back those seemingly insurmountable analysis challenges. See the Data Wrangling Hackathon home page for more.
Training SunDay
Something new for GCC2015 is Training SunDay, an additional day of training offered the day before its older sibling, Training Day. It features a single track covering the three most in-demand topics.
These three topics are also offered on Monday. You can register for one or both Training Days.
Training (Mon)Day
The schedule for Training Day, Monday, 6 July is available. Training Day features five parallel tracks, each with three two-and-a-half-hour workshops. There are topics on using Galaxy, interacting with it programmatically, and deploying, administering, and extending Galaxy. No matter what you do with Galaxy, there are workshops for you.
Early Registration opens ...
Early registration (save heaps) will open in April, we promise. Early registration is very affordable and starts at less than £40 per day for students and postdocs. If you work in data-intensive life science research, then it is hard to find a meeting more relevant than GCC2015. We look forward to seeing you there.
Paper Abstract Submission Extended to April 20
Abstract submission for Oral and Poster Presentations at the 2015 Galaxy Community Conference (GCC2015) is now open.
Abstract submission for oral presentations closes 20 April, while poster submission closes 1 May. Poster authors will be notified of acceptance status within two weeks of submission, while oral presentation authors will be notified no later than 4 May. Please consider presenting your work. If you are dealing with big biological data, then this meeting wants to hear about it.
GCC2015 Sponsorships
We are pleased to announce a joint GCC2015 Platinum Sponsorship from SGI, Intel, and Kelway. Please welcome Intel and Kelway to the community, and welcome SGI back!
Call for Sponsors
The 2015 Galaxy Community Conference (GCC2015) is still accepting Sponsorships. Your organisation can play a prominent part in the Galaxy community by sponsoring GCC2015. Sponsorship is an excellent way to raise your organization’s visibility.
Several sponsorship levels are available, including two levels of premier sponsorships that include presentations. Premium sponsorships are limited, however, so you are encouraged to act soon.
Please let the organisers (gcc2015-org AT lists DOT galaxyproject DOT org) know if you are interested in helping make this event a success.
Other Events
There are upcoming events in 8 countries on 4 continents. See the Galaxy Events Google Calendar for details on other events of interest to the community.
Who's Hiring
The Galaxy is expanding! Please help it grow.
- Graph-based Genomes: PhDs in Oslo
- Computational Metabolomics Professor, Penn State University, Pennsylvania, United States
- Software Developer for the Refinery Team, Boston, Massachusetts, United States
- The Galaxy Project is hiring software engineers and post-docs
Got a Galaxy-related opening? Send it to outreach@galaxyproject.org and we'll put it in the Galaxy News feed and include it in next month's update.
New Public Servers
One new public Galaxy server was added in March:
Vinther Lab
- Links:
  - Vinther Lab server
  - Reproducible Analysis of Sequencing-Based RNA Structure Probing Data with User-Friendly Tools by Lukasz Jan Kielpinski, Nikolaos Sidiropoulos, Jeppe Vinther, *Methods in Enzymology*, DOI: 10.1016/bs.mie.2015.01.014
- Domain/Purpose:
  - RNA structure-probing data analysis to "improve the prediction of RNA secondary and tertiary structure and allow structural changes to be identified and investigated."
- Comments:
  - From Kielpinski, et al.: a collection of tools, which allow raw sequencing reads to be converted to normalized probing values using different published strategies. In addition, we also provide tools for visualization of the probing data in the UCSC Genome Browser and for converting RNA coordinates to genomic coordinates and vice versa. The collection is implemented as functions in the R statistical environment and as tools in the Galaxy platform, making them easily accessible for the scientific community.
- User Support:
- Quotas:
  - Must have a login to use the server; anyone can create a login.
- Sponsor(s):
Galaxy Community Hubs
One new Community Log Board entry was added in March:
Releases
March 2015 Galaxy Release (v 15.03)
Complete Release Notes
Highlights
Starting with this distribution, an updated Galaxy release versioning system has been implemented. The versioning scheme is Ubuntu-style.
Galaxy development has moved to GitHub, but stable/release changes are mirrored to Bitbucket. Deployers can continue to use Bitbucket as they have done in the past. Release branches are discussed in the full release notes.
Much of Galaxy’s core tool set has been redesigned. Several contain new functionality. These tools are included in the Tool Shed and many are ready for use on Galaxy Main.
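As a rough illustration of the Ubuntu-style scheme mentioned above, a release label such as 15.03 can be read as a two-digit year followed by a two-digit month (here, March 2015). The sketch below assumes that reading. The helper name `parse_release` is hypothetical and is not part of Galaxy.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    """A release identified by year and month (assumed YY.MM scheme)."""
    year: int
    month: int


def parse_release(label: str) -> Release:
    """Parse a label ending in YY.MM, such as '15.03' or 'latest_15.03'.

    Hypothetical helper for illustration only; it assumes a two-digit
    year in the 2000s and a two-digit month.
    """
    match = re.search(r"(\d{2})\.(\d{2})$", label)
    if match is None:
        raise ValueError(f"not a YY.MM release label: {label!r}")
    year = 2000 + int(match.group(1))
    month = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {label!r}")
    return Release(year, month)


print(parse_release("15.03"))
```

Because the result is a (year, month) tuple, releases parsed this way compare in chronological order.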
Get the Galaxy Release
- getgalaxy.org
- galaxy-dist.readthedocs.org
- bitbucket.org/galaxy/galaxy-dist
- new: `$ hg clone https://bitbucket.org/galaxy/galaxy-dist#stable`
- upgrade: `$ hg pull` then `$ hg update latest_15.03`
Thanks for using Galaxy!
The Galaxy Team
BioBlend v0.5.3 Released
BioBlend v0.5.3 has been released. BioBlend is a Python library for interacting with CloudMan and the Galaxy API. (CloudMan offers an easy way to get a personal and completely functional instance of Galaxy in the cloud in just a few minutes, without any manual configuration.)
This is mostly an incremental bug fix release with the following summary of changes:
- Project source moved to new URL - https://github.com/galaxyproject/bioblend
- Huge improvements to automated testing: tests now run against Galaxy release_14.02 and all later versions to ensure backward compatibility (see travis.yml for details).
- Many documentation improvements (thanks to Helena Rasche).
- Add Galaxy clients for the tool data tables, the roles, and library folders (thanks to Anthony Bretaudeau).
- Add method to get the standard error and standard output for the job corresponding to a Galaxy dataset (thanks to Anthony Bretaudeau).
- Add `get_state()` method to `JobsClient`.
- Add `copy_from_dataset()` method to `LibraryClient`.
- Add `create_repository()` method to `ToolShedClient` (thanks to Helena Rasche).
- Fix `DatasetClient.download_dataset()` for certain proxied Galaxy deployments.
- Make `LibraryClient._get_root_folder_id()` method safer and faster for Galaxy release_13.06 and later.
- Deprecate and ignore invalid deleted parameter to `WorkflowClient.get_workflows()`.
- CloudMan: Add method to fetch instance types.
- CloudMan: Update cluster options to reflect change to SLURM.
- BioBlend.objects: Deprecate and ignore invalid deleted parameter to `ObjWorkflowClient.list()`.
- BioBlend.objects: Add `paste_content()` method to `History` objects.
- BioBlend.objects: Add `copy_from_dataset()` method and `root_folder` property to `Library` objects.
- BioBlend.objects: Add `container` and `deleted` attributes to `Folder` objects.
- BioBlend.objects: Set the `parent` attribute of a `Folder` object to its parent folder object (thanks to John M. Eppley).
- BioBlend.objects: Add `deleted` parameter to `list()` method of libraries and histories.
- BioBlend.objects: Add `state` and `state_details` attributes to `History` objects (thanks to Gianmauro Cuccuru).
- BioBlend.objects: Rename `upload_dataset()` method to `upload_file()` for History objects.
- BioBlend.objects: Rename `input_ids` and `output_ids` attributes of `Workflow` objects to `source_ids` and `sink_ids` respectively.
- Add `run_bioblend_tests.sh` script (useful for Continuous Integration testing).
Enjoy and please let us know what you think,
Enis & John & Nicola Soranzo & Simone Leo & Helena Rasche
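Because this release renames `upload_dataset()` to `upload_file()` on History objects, scripts that need to run against both older and newer BioBlend versions may want a small compatibility shim. The sketch below is one possible approach and is not part of BioBlend. The helper `upload_to_history` and the two stand-in classes are hypothetical and exist only so that the example runs without BioBlend or a Galaxy server.

```python
def upload_to_history(history, path):
    """Upload a file using whichever method name the history object offers.

    Prefers upload_file() (the new name) and falls back to upload_dataset()
    (the old name). `history` is assumed to behave like a BioBlend.objects
    History wrapper; this helper itself is hypothetical.
    """
    for name in ("upload_file", "upload_dataset"):
        method = getattr(history, name, None)
        if callable(method):
            return method(path)
    raise AttributeError("history has neither upload_file nor upload_dataset")


# Stand-in classes (not BioBlend classes) so the sketch is self-contained.
class OldStyleHistory:
    def upload_dataset(self, path):
        return f"upload_dataset({path})"


class NewStyleHistory:
    def upload_file(self, path):
        return f"upload_file({path})"


print(upload_to_history(OldStyleHistory(), "reads.fastq"))
print(upload_to_history(NewStyleHistory(), "reads.fastq"))
```

The same pattern could be applied to the renamed `source_ids` and `sink_ids` attributes, though pinning a minimum BioBlend version is usually simpler when that is an option.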
Planemo 0.6.0
Planemo 0.6.0 was released in March. The Release Notes:
- Many enhancements to the tool building documentation - descriptions of macros, collections, simple and conditional parameters, etc…
- Fix tool_init to quote file names (thanks to Peter Cock). Pull Request 98.
- Allow ignoring file patterns in .shed.yml (thanks to Björn Grüning). Pull Request 99
- Add --macros flag to tool_init command to generate a macro file as part of tool generation. ec6e30f
- Add linting of tag order for tool XML files. 4823c5e
- Add linting of stdio tags in tool XML files. 8207026
- More tests, much higher test coverage. 0bd4ff0
Planemo is a set of command-line utilities to assist in building tools for the Galaxy project.
CloudMan and blend4j
New versions of CloudMan and blend4j were released in August.
Other News
- From Björn Grüning: New release of our Galaxy Docker container with a lot of new features
- You can browse our tutorials section by visiting biostar.usegalaxy.org/t/Tutorial/ (Note: Deprecated. Now see training.galaxyproject.org)
- Should Galaxy use Trello or Github for issue tracking?
- Every contribution and every pull request to the Galaxy repo is now being publicly built on Travis-CI.
- Galaxy Training Network (GTN) Joins GOBLET
ToolShed Contributions
A best practices guide for creating Galaxy Tools is now available on this wiki. Thanks to the many contributors who created it.
Galaxy Project ToolShed Repos
Note: Starting with the May news, this list will be placed on a separate page and linked to from here. This section is just getting too long (which is the kind of problem we want to have :-).
Suites
- From biomonika:
  - suite_linkyx_1_0: Metapackage for the installation of the linkyx suite of tools.
- From devteam:
  - suite_vcflib_14_08: A collection of Galaxy wrappers for tools for manipulation of VCF files from 8/2014
Tools
- From galaxyp:
  - ms_wiff_loader: Loads AB Sciex wiff files from URLs into a Galaxy Wiff Composite dataset
  - ms_data_converter: Converts WIFF format MS data to mzML or MGF using AB SCIEX MS Data Converter
- From rnateam:
  - kinwalker: The Kinwalker algorithm performs cotranscriptional folding of RNAs, starting at a user-specified structure (default: open chain) and ending at the minimum free energy structure. Folding events are performed between transcription of additional bases and are regulated by barrier heights between the source and target structure.
  - vienna_rna: The ViennaRNA Package consists of several stand-alone programs for the prediction and comparison of RNA secondary structures. RNA secondary structure prediction through energy minimization is the most used function in the package. We provide three kinds of dynamic programming algorithms for structure prediction: the minimum free energy algorithm of (Zuker & Stiegler 1981) which yields a single optimal structure, the partition function algorithm of (McCaskill 1990) which calculates base pair probabilities in the thermodynamic ensemble, and the suboptimal folding algorithm of (Wuchty et al. 1999) which generates all suboptimal structures within a given energy range of the optimal energy. For secondary structure comparison, the package contains several measures of distance (dissimilarities) using either string alignment or tree-editing (Shapiro & Zhang 1990). Finally, we provide an algorithm to design sequences with a predefined structure (inverse folding). In case you are using our software for your publications you may want to cite:
    Lorenz, Ronny and Bernhart, Stephan H. and Höner zu Siederdissen, Christian and Tafer, Hakim and Flamm, Christoph and Stadler, Peter F. and Hofacker, Ivo L.
    ViennaRNA Package 2.0, Algorithms for Molecular Biology, 6(1):26, 2011, doi:10.1186/1748-7188-6-26
- From izsam:
  - phylogeny_converter: Converts between file formats (FASTA, GenBank, phylip, nexus) to allow data exchange between different phylogeny tools.
- From biomonika:
  - trinityrnaseq: De novo assembly of RNA-Seq data using Trinity. Contains Trinity only. See trinityrnaseq in Test Tool Shed for additional tools.
- From jjkoehorst:
  - sapp: GBK2RDF Semantic Annotation Platform for Prokaryotes. It might take a while, but I will try to make a Galaxy Tool Shed module for each module in the SAPP paper.
- From fastaptamer:
  - fastaptamer_cluster: Cluster closely-related sequences using Levenshtein edit distance. FASTAptamer-Cluster uses the Levenshtein algorithm to cluster together closely-related sequences based on a user-defined edit distance (the minimum number of insertions, deletions, or substitutions required to transform one string into another).
  - fastaptamer_count: Count, rank, sort and normalize sequence reads in a selection population. FASTAptamer-Count serves as the gateway to the FASTAptamer suite of bioinformatics tools for combinatorial selections (aptamers, (deoxy)ribozymes, phage display, direct mutagenesis, etc.). For a given FASTQ input file, FASTAptamer-Count will determine the number of times each sequence was read, normalize sequence frequency to reads per million, and rank and sort sequences by decreasing total reads.
  - fastaptamer_compare: Compare sequence distribution between two populations. FASTAptamer-Compare facilitates statistical analysis of two populations by rapidly generating a tab-delimited output file that lists each unique sequence along with RPM (reads per million) in each population file (if available) and log(2) of the ratio of their RPM values in each population. RPM data for both populations can be utilized to generate an XY-scatter plot of sequence distribution across two populations. FASTAptamer-Compare also facilitates the generation of a histogram of the sequence distribution by creating 102 bins for the log(2) values. This histogram can provide a quick visual comparison of the two populations: distributions centered around 0 indicate similar populations, while distributions shifted to the left or right indicate overall enrichment or depletion.
  - fastaptamer_search: Degenerate nucleotide motif searching. FASTAptamer-Search searches for degenerate nucleotide motifs within a FASTA file.
  - fastaptamer_enrich: Calculate fold-enrichment of each sequence across populations. FASTAptamer-Enrich rapidly calculates "fold-enrichment" values for each sequence across two or three input files. Output is provided as a tab-delimited file and is formatted to include sequence composition, length, rank, reads, reads per million (RPM), cluster information (if available) and enrichment values for each sequence.
- From bgruening:
  - diamond: DIAMOND is a new high-throughput program for aligning a file of short reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity.
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools
  - text_processing: High performance text processing tools using the GNU coreutils, sed, awk and friends. This repository contains all kinds of text processing tools:
    - awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
    - sed - Stream Editor ( http://sed.sf.net )
    - grep - Search files ( http://www.gnu.org/software/grep/ )
    - sort_columns - Sorting every line according to its columns
    - GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
      - sort - sort files
      - join - join two files, based on common key field.
      - cut - keep/discard fields from a file
      - unsorted_uniq - keep unique/duplicated lines in a file
      - sorted_uniq - keep unique/duplicated lines in a file
      - head - keep the first X lines in a file.
      - tail - keep the last X lines in a file.

    Originally known as "Unix Tools" and developed by Assaf Gordon at Greg Hannon's lab ( http://hannonlab.cshl.edu ) at Cold Spring Harbor Laboratory, it is now hosted under https://github.com/bgruening/galaxytools/tree/master/unix_tools and open for contributions. It will also replace several smaller sed, sort and uniq wrappers developed over time.
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools
  - data_manager_diamond_database_builder: Diamond data manager
  - find_genes_located_nearby_workflow: Galaxy workflow for the identification of candidate gene clusters. This approach screens two proteins against all nucleotide sequences from the NCBI nt database within hours on our cluster, leading to all organisms with an interesting gene structure for further investigation. As usual in Galaxy workflows, every parameter, including the proximity distance, can be changed and additional steps can be easily added, for example additional filtering to refine the initial BLAST hits, or inclusion of a third query sequence.
  - find_three_genes_located_nearby_workflow: Galaxy workflow for the identification of candidate gene clusters with three known genes. This approach screens three proteins against a given genome sequence, leading to a genome position where all three genes are located nearby. As usual in Galaxy workflows, every parameter, including the proximity distance, can be changed and additional steps can be easily added, for example additional filtering to refine the initial BLAST hits, or inclusion of a third query sequence.
  - find_subsequences: Searches for a subsequence in a larger sequence. For example, to get all restriction enzymes for BamH1.
    This tool is based on biopython: 10.1093/bioinformatics/btp163
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools
- From iuc:
  - macs2: MACS - Model-based Analysis of ChIP-Seq. With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq) is getting popular to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis method, we present a novel algorithm, named Model-based Analysis of ChIP-Seq (MACS), for identifying transcript factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and MACS improves the spatial resolution of binding sites through combining the information of both sequencing tag position and orientation. MACS can be easily used for ChIP-Seq data alone, or with control sample with the increase of specificity.
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/iuc/galaxytools
  - seqtk: Toolkit for processing FASTA and FASTQ files. Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files, which can also be optionally compressed by gzip.
    https://github.com/lh3/seqtk/
    Repository-Maintainer: Helena Rasche
    Repository-Development: https://github.com/galaxy-iuc/tool_shed
- From hogart:
  - unafold: Galaxy Tool wrapper for UNAFold. This is the Galaxy wrapper for UNAFold (http://mfold.rna.albany.edu/?q=DINAMelt/software). UNAFold software was developed for nucleic acid folding and hybridization prediction (doi: 10.1007/978-1-60327-429-6_1, doi: 10.1093/nar/gki591).
    Note: UNAFold requires a large amount of RAM; for example, folding a 43 kb RNA uses nearly 30 GB of memory. After installing this wrapper you will therefore need to modify the job_conf.xml of your Galaxy instance accordingly. Also, please be sure that the zip datatype is registered as a binary datatype in your Galaxy instance.
- From steffen:
  - covenntree: CoVennTree compares up to three rooted trees at the same time. CoVennTree (Comparative weighted Venn Tree) is software to analyze and compare up to three datasets. Unlike other methods, CoVennTree correlates data on the leaf level and transfers this information to the root node. CoVennTree works with numbers to compute weighted Venn diagrams for each node in the graph (rooted tree). Therefore any kind of input data can be processed as long as the data structure is taken into account.
    http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00043/abstract
- From ngsplot:
  - ngsplot: ngs.plot is a program that allows you to easily visualize your next-generation sequencing (NGS) samples at functional genomic regions. This Galaxy implementation of ngs.plot has been tested to work with ngs.plot v2.47.1. For instructions on the system installation of ngs.plot, please see https://github.com/shenlab-sinai/ngsplot.
- From devteam:
  - vcfhethom: Count the number of heterozygotes and alleles, compute het/hom ratio. This tool performs three basic calculations:
    - Computes the number of heterozygotes
    - Computes the ratio between heterozygotes and homozygotes
    - Computes the total number of alleles in the input dataset
  - vcfselectsamples: Select samples from a VCF file. Allows selecting samples from a VCF dataset. This tool combines vcfkeepsamples and vcfremovesamples from the VCFlib package into a single utility.
  - vcfleftalign: Left-align indels and complex variants in a VCF dataset. Window size is determined dynamically according to the entropy of the regions flanking the indel. These must have entropy > 1 bit/bp, or be shorter than ~5kb.
  - vcfannotategenotypes: Annotate genotypes in a VCF dataset using genotypes from another VCF dataset. Annotates genotypes in the first file with genotypes in the second, adding the genotype as another flag to each sample field in the first file. Annotation-tag is the name of the sample flag which is added to store the annotation. Also adds a 'has_variant' flag for sites where the second file has a variant.
  - vcfbreakcreatemulti: Break multiple alleles into multiple records, or combine overlapping alleles into a single record. This tool breaks or creates multiallelic VCF records based on user selection.
    - Breaking = If multiple alleles are specified in a single record, break the record into multiple lines, preserving allele-specific INFO fields.
    - Creation = If overlapping alleles are represented across multiple records, merge them into a single record.
  - vcfgenotypes: Convert numerical representation of genotypes to allelic. Converts numerical representation of genotypes (standard in GT field) to the alleles provided in the call's ALT/REF fields.
  - vcfaddinfo: Adds info fields from the second dataset which are not present in the first dataset.
  - vcfsort: Sort VCF dataset by coordinate. This tool uses the native UNIX sort command to order a VCF dataset in coordinate order.
  - vcfbedintersect: Intersect VCF and BED datasets. Computes intersection between a VCF dataset and a set of genomic intervals defined as either a BED dataset (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) or a manually typed interval (in the form of chr:start-end).
  - vcf2tsv: Converts VCF files into tab-delimited format. Converts stdin or given VCF file to tab-delimited format, using null string to replace empty values in the table. Specifying -g will output one line per sample with genotype information. A part of the vcflib utilities developed by Erik Garrison (https://github.com/ekg/vcflib).
  - vcfcheck: Verify that the reference allele matches the reference genome. Verifies that the VCF REF field matches the reference as described
  - vcffixup: Count the allele frequencies across alleles present in each record in the VCF file. Uses genotypes from the VCF file to correct AC (alternate allele count), AF (alternate allele frequency), NS (number of called), in the VCF records.
  - data_manager_bwa_mem_index_builder: Data Manager for building BWA (0.6+) indexes.
  - vcfrandomsample: Randomly sample sites from an input VCF file. Scale the sampling probability by the field specified by --scale-by (see advanced controls). This may be used to provide uniform sampling across allele frequencies, for instance (AF field in this case).
  - vcfgeno2haplo: Convert genotype-based phased alleles into haplotype alleles. Converts genotype-based phased alleles within a window size specified by the -w option into haplotype alleles. Will break haplotype construction when encountering non-phased genotypes on input.
  - data_manager_fetch_genome_dbkeys_all_fasta: Optionally defines a new DBKEY, retrieves a FASTA file, and populates the all_fasta.loc data table.
  - samtools_bedcov: Calculate read depth on BAM files. This tool uses the SAMTools toolkit to produce read depth per BED region.
  - vcfcommonsamples: Output records belonging to samples common between two datasets. Outputs each record in the first file, removing samples not present in the second.
  - vcfvcfintersect: Intersect two VCF datasets. Computes intersections and unions for two VCF datasets. Unifies equivalent alleles within window-size bp.
  - vcfcombine: Combine multiple VCF datasets. Combines VCF files positionally, combining samples when sites and alleles are identical. Any number of VCF files may be combined. The INFO field and other columns are taken from one of the files which are combined when records in multiple files match. Alleles must have identical ordering to be combined into one record. If they do not, multiple records will be emitted.
  - vcfdistance: Calculate distance to the nearest variant. Adds a value to each VCF record indicating the distance to the nearest variant in the file. The dataset used as input to this tool must be coordinate sorted. This can be achieved by either using the VCFsort utility or Galaxy's general purpose sort tool (in this case sort on the first and the second column in ascending order).
  - vcfflatten: Removes multi-allelic sites by picking the most common alternate. Requires allele frequency specification 'AF' and use of 'G' and 'A' to specify the fields which vary according to the Allele or Genotype.
  - vcfprimers: Extract flanking sequences for each VCF record. For each VCF record, extract the flanking sequences, and write them to stdout as FASTA records suitable for alignment. This tool is intended for use in designing validation experiments. The primers extracted would flank all of the alleles at multi-allelic sites.
  - vcffilter: Tool for filtering VCF files. A vcflib-based tool for flexible filtering of VCF datasets on a variety of tags. This is a Galaxy wrapper for the vcffilter utility from the vcflib package.
  - vcfallelicprimitives: Splits allelic primitives (gaps or mismatches) into multiple VCF lines. If multiple allelic primitives (gaps or mismatches) are specified in a single VCF record, this tool splits the record into multiple lines, but drops all INFO fields. "Pure" MNPs are split into multiple SNPs unless the -m flag is provided. Genotypes are phased where complex alleles have been decomposed, provided genotypes in the input.
  - vcfannotate: Intersect VCF records with BED annotations. Intersects the records in the VCF file with targets provided in a BED file. Intersections are done on the reference sequences in the VCF file.
- From damion:
  - blast_reporting: Provides filtered, sorted HTML and tabular reports of Blast XML format search results. NCBI BLAST+ searches can output in a range of formats, but in the past only the XML format included fields like sequence description. This tool converts the NCBI BLAST XML report into 12, 24, 26 or custom column tabular and HTML reports. It is used as a command-line tool or via its Galaxy tool.
    The tool allows almost complete control over which fields are displayed and filtered, how columns are named, and how the HTML report on each query is sectioned. Search result records can be filtered out based on values in numeric or textual fields. Matches (by accession id) to a selection of reference databases can be shown, and this can include a description of the matched sequence.
  - ffp_phylogeny: Calculates Feature Frequency Profiles (FFP) from fasta sequence and text data. FFP (Feature Frequency Profile) is an alignment-free comparison tool for phylogenetic analysis and text comparison. It can be applied to nucleotide sequences, complete genomes, proteomes and even used for text comparison. This tool calculates FFP on one or more fasta sequence or text datasets. It prepares a mini pipeline consisting of [ffpry | ffpaa | ffptxt] > [ ffpfilt | ffpcol > ffprwn] > ffpjsd > ffptree
    The original command line ffp-phylogeny code is at http://ffp-phylogeny.sourceforge.net/ . This tool uses Aaron Petkau's modified version: https://github.com/apetkau/ffp-3.19-custom.
-
From yokofakun:
- jvarkit: Tools from jvarkit. Java utilities for bioinformatics by Pierre Lindenbaum (@yokofakun). https://github.com/lindenb/jvarkit
-
From kosrou:
- ngs_plot: ngs plot. A tool to visualise next-generation sequencing data around TSSs, gene bodies, etc.
-
From dereeper:
- admixture: Fast ancestry estimation. ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
- snpeff_from_gff_vcf: snpeff v4.0 from VCF, fasta reference and GFF files
- sniplay: SNiPlay3: a package for exploration and large scale analyses of SNP polymorphisms (filtering, density, vcftools, diversity, linkage disequilibrium, GWAS)
- tassel5: Software to evaluate trait associations, evolutionary patterns, and linkage disequilibrium.
-
From okorol:
- itsx: ITSx identifies ITS sequences and extracts the ITS region. ITSx is an open source software utility to extract the highly variable ITS1 and ITS2 subregions from ITS sequences, which are commonly used as molecular barcodes, e.g. for fungi.
-
From iracooke:
- protk_proteogenomics: Docker support and update for protk 1.4 Tools for mapping peptides and proteins to genomic coordinates
-
From wolma:
- mimodd_workflows: Some example workflows for use with MiModD. The workflows defined here let you automate many of the tutorial analyses from the MiModD documentation (see http://mimodd.readthedocs.org/en/latest/tutorial.html). These example workflows should be easy to customize for your own needs.
- mimodd: MiModD - Identify Mutations from Whole-Genome Sequencing Data. Installs the MiModD suite of tools for the analysis of genome-wide sequencing data from model organisms, along with their Galaxy tool wrappers.
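To give a feel for the kind of conversion the blast_reporting entry above describes (BLAST XML in, filtered and sorted tabular rows out), here is a minimal sketch. It is not blast_reporting's own code. The element names follow the NCBI BLAST XML output format, while the sample record, the function names and the five-column layout are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written BLAST XML fragment (made-up data, for illustration only).
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1 example sequence</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical protein B</Hit_def>
          <Hit_accession>NP_000002</Hit_accession>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>30.0</Hsp_bit-score>
              <Hsp_evalue>0.5</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical protein A</Hit_def>
          <Hit_accession>NP_000001</Hit_accession>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>250.0</Hsp_bit-score>
              <Hsp_evalue>1e-50</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def blast_xml_to_rows(xml_text, max_evalue=1e-5):
    """Return (query, accession, description, evalue, bit score) tuples.

    HSPs with an e-value above max_evalue are dropped; the remaining rows
    are sorted by query and then by descending bit score.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):
            accession = hit.findtext("Hit_accession", default="")
            description = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                evalue = float(hsp.findtext("Hsp_evalue"))
                bits = float(hsp.findtext("Hsp_bit-score"))
                if evalue <= max_evalue:
                    rows.append((query, accession, description, evalue, bits))
    rows.sort(key=lambda row: (row[0], -row[4]))
    return rows


def rows_to_tsv(rows):
    """Join rows into tab-separated text, one HSP per line."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)


if __name__ == "__main__":
    print(rows_to_tsv(blast_xml_to_rows(SAMPLE_XML)))
```

The point of going through the XML is that the hit description (Hit_def) travels with each row, which is the field the entry notes was historically missing from other BLAST output formats.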
Tool Dependency Definitions
-
From dereeper:
- package_admixture_1_23
- package_tassel_5_0: Package to evaluate trait associations, evolutionary patterns, and linkage disequilibrium.
- package_plink_1_07
-
From iuc:
-
package_cofold_0_0_1: Contains a tool dependency definition that downloads and compiles CoFold, a tool for the prediction of RNA secondary structure that takes co-transcriptional folding into account.
http://www.e-rna.org/cofold/
Repository-Maintainer: Bjoern Gruening
Repository-Development: https://github.com/bgruening/galaxytools/
-
-
package_samtools_1_2: Contains a tool dependency definition that downloads and installs version 1.2 of the SAMtools package. samtools: Utilities for the Sequence Alignment/Map (SAM) format
Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from and exports to the SAM (Sequence Alignment/Map) format, does sorting, merging and indexing, and allows reads in any region to be retrieved swiftly.
Samtools is designed to work on a stream. It regards an input file '-' as the standard input (stdin) and an output file '-' as the standard output (stdout). Several commands can thus be combined with Unix pipes. Samtools always outputs warning and error messages to the standard error output (stderr).
Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server if the BAM file name starts with 'ftp://' or 'http://'. Samtools checks the current working directory for the index file and will download the index if it is absent. Samtools does not retrieve the entire alignment file unless it is asked to do so.
Repository-Maintainer: Bjoern Gruening
-
package_gengetopt_2_22_6: Contains a tool dependency definition that downloads and compiles version 2.22.6 of GNU gengetopt. Gengetopt is a tool to write command line option parsing code for C programs.
http://www.gnu.org/software/gengetopt/gengetopt.html
Repository-Maintainer: Bjoern Gruening
Repository-Development: https://github.com/galaxyproject/tools-iuc
-
package_rnastructure_5_7: Contains a tool dependency definition that downloads and compiles version 5.7 of RNAstructure RNAstructure is a complete package for RNA and DNA secondary structure prediction and analysis. It includes algorithms for secondary structure prediction, including facility to predict base pairing probabilities. It also can be used to predict bimolecular structures and can predict the equilibrium binding affinity of an oligonucleotide to a structured RNA target. This is useful for siRNA design. It can also predict secondary structures common to two, unaligned sequences, which is much more accurate than single sequence secondary structure prediction. Finally, RNAstructure can take a number of different types of experiment mapping data to constrain or restrain structure prediction. These include chemical mapping, enzymatic mapping, NMR, and SHAPE data.
-
package_vcflib_8a5602bf07: Compiled vcflib binaries for x86_64. Binary files in this package are compiled from source code with SHA: 8a5602bf07.
This is a package dependency for tools relying on the vcflib toolkit developed by Erik Garrison (https://github.com/ekg/vcflib). This package is distributed as x86_64 binaries only, as it is difficult to compile on other platforms. These binaries should work on any of the supported Linux platforms other than RHEL/CentOS 5.
-
package_vienna_rna_2_1: Contains a tool dependency definition that downloads and compiles version 2.1 of the Vienna RNA package. The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
http://www.tbi.univie.ac.at/RNA/
Repository-Maintainer: Bjoern Gruening
Repository-Development: https://github.com/bgruening/galaxytools
-
package_diamond_0_6_13: Contains a tool dependency definition that downloads and compiles version 0.6.13 of DIAMOND. DIAMOND is a new high-throughput program for aligning a file of short reads against a protein reference database such as NR, at up to 20,000 times the speed of BLASTX, with high sensitivity.
Repository-Maintainer: Bjoern Gruening
Repository-Development: https://github.com/bgruening/galaxytools
-
package_stringtie_1_0_1: Contains a tool dependency definition that downloads and installs version 1.0.1 of the StringTie RNA-seq assembler. StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed by either the Cuffdiff or Ballgown programs.
-
From biomonika:
- package_samtools_0_1_19_custom: custom compilation of samtools 0.1.19 using -fPIC flag
-
From devteam:
-
package_freebayes_0_9_20_b040236: Contains a tool dependency definition that downloads and compiles version 0.9.20 of FreeBayes. Program: freebayes (Bayesian haplotype-based polymorphism discovery and genotyping.)
Version: 0.9.20 (b040236)
-
-
From jjohnson:
-
package_trinityrnaseq_2013_08_14: Contains a tool dependency definition that downloads and compiles version 2013_08_14 of Trinity (RNA-Seq de novo assembly using Trinity). Adds Trinity.pl to the PATH environment variable and sets TRINITY_HOME to the installation directory. http://trinityrnaseq.sourceforge.net/index.html
Requires Perl compiled to use threads, BioPerl, and the Perl modules PerlIO::gzip and Bio::DB::Sam.
-
Select Updates
Tools
-
From iracooke:
- proteomics_datatypes: Add trafo and qcml, improved readme
- sixframe_translate: Docker support and update for protk 1.4
- xtandem: Docker support and update for protk 1.4
- make_protein_decoys: Docker support and update for protk 1.4
- omssa: Docker support and update for protk 1.4
- mascot: Docker support and update for protk 1.4
- mascot: Make Trypsin default
- tpp_prophets: Docker support and update for protk 1.4
- msgfplus: Docker support and update for protk 1.4
-
From geert-vandeweyer:
- coverage_report: Added (default) option to collapse repetitive BED files, and added BED format check before collapsing regions.
- vcf_to_variantdb: Added support for 23andMe VCF files generated by ArrogantRobot
-
From peterjc:
- blast2go: v0.0.9, embed citation, updated README
- sample_seqs: v0.2.0, adds desired count mode
- sample_seqs: v0.2.1, fixed missing test file, more tests.
-
From crs4:
- edge_pro: Add support for paired collection of FASTQ (thanks to Inge Alexander Raknes).
-
From george-weingart:
- micropita: Updated version that suppresses the future warnings option that was causing a problem
-
From iuc:
- stringtie: updated tool wrapper for stringtie 1.0.1