Perhaps a more optimistic slant would be to classify ourselves as biologists, but ones who specialize in computational assays. By that metric, we get higher salaries than other academics, have less trouble finding jobs, etc. :-)
- Chris Miller
1) Regarding the mutation lists, if you call the following mutations:
Tumor 1:
1  111  10  10  50.0
2  222   9   9  50.0

Tumor 2:
1  999   8   8  50.0
2  888   7   7  50.0
You then need to merge the calls and pull readcounts for every site in both samples, so your files will look like:
Tumor 1:
1  111  10  10  50.0
1  999   6   6  50.0
2  222   9   9  50.0
2  888   5   5  50.0

Tumor 2:
1  111   4   4  50.0
1  999   8   8  50.0
2  222   3   3  50.0
2  888   7   7  50.0
2) Read the sciClone paper for a full explanation of why we only consider copy-number neutral sites for clustering: http://www.ploscompbiol.org/article...
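The merge step in point 1 can be sketched in Python as below. This is a minimal illustration, not part of sciClone: the function names `union_sites` and `missing_sites` are hypothetical, and the column meanings (chromosome, position, reference reads, variant reads, VAF) are assumed from the example rows. It only works out which sites each sample still needs readcounts for; the counts themselves have to be pulled from your own alignment data.

```python
# Each call is (chrom, pos, ref_reads, var_reads, vaf); the column meaning is assumed.
tumor1_calls = [("1", 111, 10, 10, 50.0), ("2", 222, 9, 9, 50.0)]
tumor2_calls = [("1", 999, 8, 8, 50.0), ("2", 888, 7, 7, 50.0)]


def union_sites(*call_lists):
    """Return the sorted union of (chrom, pos) sites across all samples."""
    sites = {(c[0], c[1]) for calls in call_lists for c in calls}
    # Chromosomes are strings here, so they sort lexically ("10" before "2").
    return sorted(sites)


def missing_sites(calls, all_sites):
    """Return sites in the union that this sample has no readcounts for yet."""
    have = {(c[0], c[1]) for c in calls}
    return [s for s in all_sites if s not in have]


all_sites = union_sites(tumor1_calls, tumor2_calls)
need_t1 = missing_sites(tumor1_calls, all_sites)  # sites called only in tumor 2
need_t2 = missing_sites(tumor2_calls, all_sites)  # sites called only in tumor 1

print(all_sites)
print(need_t1)
print(need_t2)
```

Once readcounts are pulled for `need_t1` and `need_t2`, both files list the same four sites, which is the layout shown in the merged example.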
- Chris Miller
Possible problems:
- Do you have readcounts for each site in both tumors? The first two columns of the lists of variants that you feed in should be identical.
- Is your tumor polyploid, or mostly CN altered? If so, no points will be usable.
- Is your CN data formatted incorrectly? By default, sciClone expects absolute copy number values (CN 2 = neutral, CN 3 = 1 copy amplified, etc). You can feed it log2 values by setting cnCallsAreLog2=TRUE when calling the sciClone() function.
- Is your data low-coverage, such that no points are exceeding the minimum depth threshold?
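
Some of these checks can be run on the input tables before calling sciClone. The Python sketch below is an illustration under stated assumptions, not something sciClone provides: the helper names are hypothetical, the row layout (chromosome, position, reference reads, variant reads, VAF) is assumed, depth is taken to be reference plus variant reads, and the log2 conversion assumes a diploid baseline (absolute CN = 2 × 2^log2 ratio). The depth threshold of 100 is assumed to be the default.

```python
def same_sites(sample_a, sample_b):
    """True if both samples list the same (chrom, pos) sites in the same order."""
    return [(r[0], r[1]) for r in sample_a] == [(r[0], r[1]) for r in sample_b]


def deep_enough(rows, minimum_depth=100):
    """Keep rows whose depth (ref + var reads, an assumption) meets the threshold."""
    return [r for r in rows if r[2] + r[3] >= minimum_depth]


def log2_to_absolute_cn(log2_ratio):
    """Convert a log2 ratio to absolute copy number, assuming a diploid baseline."""
    return 2.0 * 2.0 ** log2_ratio


# Rows are (chrom, pos, ref_reads, var_reads, vaf); the layout is assumed.
tumor1 = [("1", 111, 10, 10, 50.0), ("1", 999, 6, 6, 50.0)]
tumor2 = [("1", 111, 4, 4, 50.0), ("1", 999, 8, 8, 50.0)]

print(same_sites(tumor1, tumor2))   # do the first two columns match?
print(len(deep_enough(tumor1)))     # how many sites pass the depth threshold?
print(log2_to_absolute_cn(0.0))     # a log2 ratio of 0 maps to neutral copy number
```

With toy counts this shallow, no site reaches a depth of 100, which is the low-coverage situation described in the last bullet.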
- Chris Miller
Hi Lisa. I'd need more info to help you figure out why you're getting this error. I'd suggest opening up a new Biostar question, and posting at least the header of your input files there so that I have some information to use for troubleshooting. Feel free to shoot me an email when that question is up ([email protected]) and I'll take a look.
- Chris Miller
I have an opening for a position in my group at the Genome Institute at Washington University in St Louis that mixes research into cancer genomics and applying that knowledge to genomic medicine. Our group is a collection of mostly PhD-holding bioinformatics experts, who apply computational tools to help understand the origins and progression of cancer. We specialize in translating large-scale sequence data (genomic DNA, RNA-seq, and bisulfite-seq, among others) into insight about how to better detect cancer, predict its response to therapy, and treat patients. You will spend approximately 50% of your time analyzing and interpreting cases from our clinical sequencing group, in an environment where your work may directly influence patient care. The other half of your time will be spent on research projects, pairing with some fantastic oncologists to probe the genomics of tumorigenesis, relapse, and clonal evolution in response to therapy. Ideally, you'll have a strong biological...
- Chris Miller
If you look at the help (with ?sciClone), you'll find many of the answers you're looking for.

1. "My copy number data used for input is log-ratio data. It seems though as sciClone interpreted it as absolute copy number data, since only cn=2 were used (and according to the paper, sciClone only looks at VAFs in regions of normal cn). Is it possible to make the program use the log-ratio data?"

When calling the sciClone() function, set cnCallsAreLog2=TRUE

2. "The program seems to have filtered away mutations with depth<100. Is it possible to decrease this limit?"

SciClone works best with deep readcounts, as they decrease the uncertainty associated with each VAF. That said, if you want to decrease it, when calling the sciClone() function, change the minimumDepth parameter.

3. "Finally, the output contains one cluster. However, visually there are 3 subclusters in that cluster. I understand they are too close to each other to be considered separate clusters. Is it possible to make the program...
- Chris Miller
RT @neilfws: Bioinformaticians have been writing for years about how data preparation is >= 80% of the job; good to see "big data science" catching up :)