Wednesday, 13 June 2012

Getting GTF from UCSC with proper gene_id

When you download a GTF file (knownGenes or ensemblGenes) from the UCSC Table Browser, the output has one serious issue: gene_id is set to the same value as transcript_id, so there is in fact no gene-level identifier. A simple solution to this problem is presented below (for the mm9 genome):

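# prerequisites: mysql client, wget
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/genePredToGtf
chmod +x genePredToGtf
# put the binary on PATH (assumes /usr/local/bin is on your PATH; adjust if not)
sudo ln -s "$(pwd)/genePredToGtf" /usr/local/bin/genePredToGtf
# dump the ensGene table, drop the leading bin column, and convert genePred to GTF
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select * from ensGene;" mm9 | cut -f2- | genePredToGtf file stdin ensGene.gtf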
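
To check that the fix worked, one option is to count the lines in the resulting GTF where gene_id still equals transcript_id. The sketch below assumes the standard GTF layout (nine tab-separated columns, with attributes such as gene_id "..."; in the ninth). The helper name count_same_ids is made up for this example and is not part of any UCSC tool.

```shell
# Count GTF lines whose gene_id is identical to their transcript_id.
# count_same_ids is a hypothetical helper name used only for illustration.
count_same_ids() {
    awk -F'\t' '
        {
            gene = ""; tx = ""
            # pull the quoted value out of each attribute in column 9
            if (match($9, /gene_id "[^"]*"/))
                gene = substr($9, RSTART + 9, RLENGTH - 10)
            if (match($9, /transcript_id "[^"]*"/))
                tx = substr($9, RSTART + 15, RLENGTH - 16)
            if (gene != "" && gene == tx) same++
        }
        END { print same + 0 }
    ' "$1"
}
```

Running count_same_ids ensGene.gtf should print a count at or near zero if the gene IDs came through properly. On a GTF exported directly from the Table Browser, the count would be expected to match the number of lines.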
