Sequence Analysis and Assembly Using the Fast Data Finder3

Timothy Burcham
James Candlin
Alan Roter

Applied Biosystems, Inc.


Abstract

INHERIT is a sequence analysis and assembly package that uses the Fast Data Finder (FDF) in several distinctive ways. The FDF is a linear systolic array designed to search flat text files very quickly. INHERIT employs the FDF in database searching, resulting in search times limited only by disk-transfer rates. INHERIT also has a flexible pattern specification language that allows the expression of very complex biological patterns, such as genetic motifs, for use in database searches. Because of its design, the FDF can search the database for these patterns in a time that is independent of pattern complexity. In sequence assembly, the FDF initially screens pairs of fragments for similarity; fragments that fall below the similarity threshold are then excluded from further consideration in the assembly algorithm.
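The assembly prescreen described above can be illustrated with a minimal software sketch. It is not the INHERIT or FDF implementation: the similarity measure (Jaccard overlap of k-mer sets), the function names, the threshold value, and the example fragments are all assumptions chosen only to show the idea of discarding dissimilar fragment pairs before the more expensive assembly step.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0).

    This measure is an assumption; the source does not say how the FDF
    scores fragment similarity.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def prescreen(fragments, threshold=0.2, k=4):
    """Return the name pairs whose similarity meets the (hypothetical) threshold."""
    return [
        (n1, n2)
        for (n1, s1), (n2, s2) in combinations(sorted(fragments.items()), 2)
        if similarity(s1, s2, k) >= threshold
    ]


# Hypothetical fragments: f1 and f2 overlap, f3 is unrelated.
fragments = {
    "f1": "ACGTACGTTAGC",
    "f2": "CGTTAGCCATGA",
    "f3": "TTTTTTTTTTTT",
}
candidate_pairs = prescreen(fragments)
print(candidate_pairs)
```

Only the pairs returned by `prescreen` would be passed on to a full alignment or assembly step; a fragment that appears in no surviving pair drops out of further consideration.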

We have recently upgraded the INHERIT package to take advantage of the next generation of the FDF, the FDF-3. The FDF-3 hardware has approximately three times the number of cells per board, with a form factor one-fourth the size of the previous FDF system. The FDF-3 is designed as a SCSI device, which facilitates connecting it to the host computer, and it can search either dedicated SCSI disks or the system disk. The FDF-3 also uses a new file system that speeds up searches, is easy to maintain, and supports file protection. By taking advantage of these new FDF-3 features, the new release of the INHERIT system is easier to maintain and the search time is nominally faster. The new release of the INHERIT Assembler includes more rigorous removal of vector and ambiguous sequence, integrated editing, and sequence chromatogram display.