Document Type

Journal Article

Department/Unit

Department of Biology

Language

English

Abstract

Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but their relatively short read length limits their use in genome assembly and finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce unusually long reads, predominantly around 10 kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiencies of the long reads in these respects using an isogenic, gap-free C. elegans genome. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic regions. Second, the reads reliably detect missing but not extra sequences in the C. elegans genome. Third, shorter reads are more capable of recovering repetitive sequences than longer ones. Fourth, at least 40 kb of missing genomic sequence is recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 kb can be achieved with 24× read coverage, but with substantial mis-assembly errors, highlighting the need for novel assembly algorithms for the long reads.

Publication Date

6-2015

Source Publication Title

Scientific Reports

Volume

5

Start Page

10814

Publisher

Nature Publishing Group

Peer Reviewed

Yes

Copyright

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.

Funder

This work was supported by the Early Career Scheme (ECS) Fund (HKBU263512) and the Collaborative Research Fund (CRF) (HKBU5/CRF/11G) of the Hong Kong Research Grants Council (RGC) to Z Zhao.

DOI

10.1038/srep10814

Link to Publisher's Edition

http://dx.doi.org/10.1038/srep10814

ISSN (print)

2045-2322

ISSN (electronic)

2045-2322

Additional Files

JA-5189-28051_suppl1.pdf (1417 kB)
JA-5189-28051_suppl1.xls (5125 kB)

Included in

Biology Commons
