Georgia Tech inventors have developed software called KAnalyze that counts k-mers in genomic data and sorts the results for use in a range of genomic applications. It currently accepts FASTA, FASTQ, and RAW input formats and is designed to be easily extended to other formats and applications. It includes an API for integration into existing programs as well as a command-line interface for use in data analysis scripts. The software is designed for ease of maintenance, so that programmers who build on it do not have to maintain their own k-mer counting code, and for speed: it outperforms existing tools such as Jellyfish and DSK. It writes sorted k-mer counts to tab-delimited files. This output can then be used to find and reassemble fragments of genomic data for purposes such as genome assembly, mutation finding, and detection of genomic repeats and protein binding sites.
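To make the idea of counting k-mers and writing sorted, tab-delimited output concrete, here is a minimal Python sketch. It is not KAnalyze's API or its algorithm: the function names (read_fasta, count_kmers, write_sorted_tsv) are hypothetical, and the in-memory counting approach and the choice to skip k-mers containing characters other than A, C, G, and T are assumptions made only for illustration.

```python
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_kmers(sequences, k):
    """Count every length-k substring across the given sequences.

    Assumption for this sketch: k-mers containing anything other than
    A, C, G, or T (for example N) are skipped.
    """
    valid = set("ACGT")
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


def write_sorted_tsv(counts, out):
    """Write one 'kmer<TAB>count' line per k-mer, sorted by k-mer."""
    for kmer in sorted(counts):
        out.write(f"{kmer}\t{counts[kmer]}\n")
```

A caller could pass an open FASTA file to read_fasta, feed the sequences to count_kmers with a chosen k, and write the result to a text file with write_sorted_tsv. Holding every count in memory, as this sketch does, only suits small inputs; a production k-mer counter needs a more scalable design.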
- Quick, extensible counting of k-mers in genomic data
- Easy integration into a variety of applications and processing scripts
- Accelerates the development of new tools for analyzing genomic data
- Processes k-mers faster than existing software tools in the marketplace
- Integration into different applications that handle genomic data
- Software foundation for research and clinical analytical tools
- Software backbone for the development of applications reliant on the processing of k-mers
As genomics applications grow in popularity, there is demand for new algorithms to handle the data they generate. In particular, many analyses need to keep track of k-mers, the short subsequences of length k found in sequence data, so better algorithms are needed to support applications that analyze genomic data for research and clinical diagnostic purposes.