\name{genomicLocsToProteinSequence}
\alias{genomicLocsToProteinSequence}
\title{Obtain the protein and DNA sequences of the coding regions
within a list of genomic loci}
\description{
\code{genomicLocsToProteinSequence} takes a list of genomic loci and finds the 
protein and DNA sequences of the coding regions of the genome that fall within 
each of those loci.
}
\usage{
genomicLocsToProteinSequence(inputLoci, CDSaaFile)
}
\arguments{
    \item{inputLoci}{
A data frame of genomic loci, one locus per row. The first column gives the 
chromosome, the second and third columns give the start and end coordinates of 
the locus on that chromosome, and the fourth column gives the strand ("+" for 
the forward strand, "-" for the reverse strand). Any further columns are 
optional and are not used by the function.
Chromosome names can follow either the ENSEMBL style (1, 2, 3, \dots, X, Y and 
MT) or the other common style (chr1, chr2, chr3, \dots, chrX, chrY and chrM), 
but the two styles cannot be mixed within the input of a single function call.
}
    \item{CDSaaFile}{
The path to the data file generated by the package function 
\code{generatingCDSaaFile}, which contains the genomic locations, DNA sequences 
and protein sequences of all coding regions in the genome used in your analysis.
}
}
\value{
A data frame containing the genomic loci specified in the input, followed by 
five added columns that give the DNA and protein sequences of the coding 
regions within each locus:
    \itemize{
\item Column "transId" lists the ENSEMBL IDs of the transcripts whose coding 
regions overlap the specified locus; the overlapping coding regions are 
identical in all of the listed transcripts.
\item Column "dnaSeq" contains the DNA sequence of the overlapping coding 
regions.
\item Column "dnaBefore" contains the DNA letters that precede the first letter 
of "dnaSeq" within the same codon.
\item Column "dnaAfter" contains the DNA letters that follow the last letter 
of "dnaSeq" within the same codon.
\item Column "pepSeq" contains the protein sequence translated from the DNA 
sequences in the three preceding columns, "dnaBefore", "dnaSeq" and 
"dnaAfter", joined in that order.
}
}

\author{
Yaoyong Li
}

\examples{

    ## Example files shipped with the package: a table of genomic loci on
    ## chromosome 16 and the matching CDSaaFile.
    dataFolder = system.file("extdata", package="geno2proteo")
    inputFile_loci = file.path(dataFolder, 
        "transId_pfamDomainStartEnd_chr16_Zdomains_22examples_genomicPos.txt")
    CDSaaFile = file.path(dataFolder, 
        "Homo_sapiens.GRCh37.74_chromosome16_35Mlong.gtf.gz_AAseq.txt.gz")

    inputLoci = read.table(inputFile_loci, sep="\t", stringsAsFactors=FALSE)
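
    ## Illustration only: sameChromStyle() is a hypothetical helper, not
    ## part of geno2proteo. It sketches a check that a vector of chromosome
    ## names uses a single style, either ENSEMBL ("1", "X", "MT") or
    ## "chr"-prefixed ("chr1", "chrX", "chrM"), since the two styles must
    ## not be mixed within one call. The same check could be applied to the
    ## first column of inputLoci before calling the function.
    sameChromStyle = function(chroms) {
        hasPrefix = grepl("^chr", as.character(chroms))
        all(hasPrefix) || !any(hasPrefix)
    }
    sameChromStyle(c("16", "X", "MT"))            # TRUE
    sameChromStyle(c("chr16", "chrX", "chrM"))    # TRUE
    sameChromStyle(c("chr16", "16"))              # FALSE: mixed styles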

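    ## The result holds the input loci followed by the columns "transId",
    ## "dnaSeq", "dnaBefore", "dnaAfter" and "pepSeq".
    proteinSeq = genomicLocsToProteinSequence(inputLoci=inputLoci, 
                                            CDSaaFile=CDSaaFile)
    head(proteinSeq)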
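
    ## Illustration only: codonContext() is a hypothetical helper, not part
    ## of geno2proteo and not necessarily how the package computes these
    ## columns. It sketches the codon logic behind "dnaBefore" and
    ## "dnaAfter", assuming that position 1 of cds (the coding sequence of
    ## one transcript) is the first letter of a codon and that start and
    ## end are 1-based positions of the overlapping region within cds.
    codonContext = function(cds, start, end) {
        ## letters before 'start' that lie in the same codon (0, 1 or 2)
        nBefore = (start - 1) - 3 * floor((start - 1) / 3)
        ## letters after 'end' that lie in the same codon (0, 1 or 2)
        nAfter = 3 * ceiling(end / 3) - end
        list(dnaBefore = substr(cds, start - nBefore, start - 1),
             dnaSeq = substr(cds, start, end),
             dnaAfter = substr(cds, end + 1, end + nAfter))
    }
    ## Codons ATG GCC AAA TTT: positions 5 to 7 cover "CCA", so the "G" at
    ## position 4 and the "AA" at positions 8 and 9 complete GCC and AAA.
    ctx = codonContext("ATGGCCAAATTT", 5, 7)
    paste0(ctx$dnaBefore, ctx$dnaSeq, ctx$dnaAfter)   # "GCCAAA"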

}

