The Usability of Lisp Family Programming Languages in Bioinformatics and Computational Biology

Introduction and Background

Lisp, a pioneer in computer science, has influenced the development of nearly every modern programming language. It introduced concepts such as tree data structures, conditionals, higher-order functions, and metaprogramming, which underpin much of modern software engineering.

Furthermore, empirical comparisons suggest that Lisp can be a more productive and faster general-purpose programming language than its contemporaries. When programmers tackled the same problems in Lisp, C/C++, and Java, the Lisp programs were smaller, took less time to develop, and ran faster.

Lisp has been successfully applied to various research areas in bioinformatics and computational biology, including systems biology, high-performance computing, database curation, drug discovery, network analysis, and RNA structure prediction.

Lisp Applications and Dialects

Among the Lisp-family languages (LFLs), Common Lisp is considered the most powerful and accessible modern language for advanced biomedical concept representation and manipulation.

Scheme is a compact, minimalist Lisp dialect built around a small core language, while Clojure runs on the Java Virtual Machine and specializes in the parallel processing of big data.

Rewards and Challenges

Early adopters of programming frameworks often reap significant scientific benefits, as they establish critical libraries and attract a growing community of researchers and developers.

However, the adoption of a new language also faces the "chicken-and-egg" problem: without significant user support, it's difficult to create and maintain large-scale tools and libraries. Conversely, without these tools and libraries, there will never be a substantial user base.

Currently, library support for bioinformatics tasks in the Lisp family is still in its early stages, and there is no official bioinformatics Lisp community. However, this presents an excellent opportunity to establish a community and contribute to this growing field.

Macros and Domain-Specific Languages

Lisp is a homoiconic language where code is represented as a data structure of the language itself, preserving its structural syntax.

This property empowers Lisp with a unique macro system that can transform the program structure itself. Macros are not confined to simple text substitutions; they can apply extensive structural changes to programs, including the introduction of new control structures and pattern matching capabilities.
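As a minimal sketch of a macro that adds a control structure, the following defines a `while` loop, which standard Common Lisp does not provide. The names `while` and `count-gc` are illustrative assumptions and do not come from any library discussed here.

```lisp
;; Hypothetical example: WHILE is not part of standard Common Lisp.
(defmacro while (test &body body)
  "Repeat BODY for as long as TEST evaluates to true."
  `(loop (unless ,test (return))
         ,@body))

(defun count-gc (sequence)
  "Count the G and C bases in the string SEQUENCE using the WHILE macro."
  (let ((i 0)
        (count 0))
    (while (< i (length sequence))
      (when (find (char-upcase (char sequence i)) "GC")
        (incf count))
      (incf i))
    count))

;; The macro receives its arguments as ordinary list structure and
;; returns new list structure, which the compiler then treats as code.
(macroexpand-1 '(while (< i 3) (incf i)))
;; => (LOOP (UNLESS (< I 3) (RETURN)) (INCF I))

(count-gc "ATGCGC")
;; => 4
```

Because the expansion is built from lists rather than from text, the macro cannot produce unbalanced or syntactically malformed code, which is the practical difference from text-substitution macros.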

Furthermore, Common Lisp offers access to its "reader," the component that turns source text into Lisp data, so incoming code can be manipulated as it is read. This enables Lisp programs to modify their own syntax if necessary.
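A small sketch of reader customization, using only standard functions (`copy-readtable`, `set-macro-character`, `read-delimited-list`): it teaches a private copy of the readtable to read `[a b c]` as `(list a b c)`. The bracket syntax and the names `read-bracketed-list` and `*bracket-readtable*` are assumptions made for illustration.

```lisp
(defun read-bracketed-list (stream char)
  "Reader macro function: read forms up to the closing ] and wrap them in LIST."
  (declare (ignore char))
  `(list ,@(read-delimited-list #\] stream t)))

;; Work on a copy of the standard readtable so the global syntax is untouched.
(defparameter *bracket-readtable* (copy-readtable nil))

(set-macro-character #\[ #'read-bracketed-list nil *bracket-readtable*)
;; Make ] behave like ) so that it terminates tokens such as "2]".
(set-macro-character #\] (get-macro-character #\)) nil *bracket-readtable*)

(let ((*readtable* *bracket-readtable*))
  (read-from-string "[1 2 (+ 1 2)]"))
;; => (LIST 1 2 (+ 1 2))

(let ((*readtable* *bracket-readtable*))
  (eval (read-from-string "[1 2 (+ 1 2)]")))
;; => (1 2 3)
```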

These features make Lisp well-suited for creating domain-specific languages (DSLs), tailored to specific problem domains yet seamlessly integrated with Lisp. A notable example is Common Prolog, a professional Prolog system implemented and embedded in Common Lisp.
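To make the idea of an embedded DSL concrete, here is a deliberately tiny sketch of a notation for biochemical reactions. The macro `defreaction`, the registry `*reactions*`, and the `->` separator are all hypothetical and are not drawn from Common Prolog or any existing bioinformatics library.

```lisp
(defvar *reactions* (make-hash-table)
  "Hypothetical registry mapping reaction names to property lists.")

(defmacro defreaction (name substrates arrow products)
  "Register reaction NAME in a notation close to how it is written on paper.
ARROW is purely decorative and is ignored."
  (declare (ignore arrow))
  `(setf (gethash ',name *reactions*)
         (list :substrates ',substrates :products ',products)))

;; DSL code sits next to ordinary Lisp and is compiled with it.
(defreaction hexokinase (glucose atp) -> (glucose-6-phosphate adp))

(getf (gethash 'hexokinase *reactions*) :products)
;; => (GLUCOSE-6-PHOSPHATE ADP)
```

The registered reactions are plain Lisp data, so the rest of a program could query or transform them with ordinary list and hash-table functions.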

Other Unique Strengths

Lisp possesses several other distinctive features that set it apart:

Interactive Programming Environment: Lisp offers an incremental interactive programming environment called the read-eval-print loop (REPL). This allows programmers to continually interact with their program as it evolves, similar to Smalltalk's "image" concept.
Software Hot Swapping: Common Lisp supports "hot swapping," where the code of a running program can be modified without interruption. This includes dynamically modifying object classes, a feature not found in mainstream compiled languages.
Robust Error Handling: Lisp pioneered exception handling, and Common Lisp has a comprehensive error-handling mechanism called the "condition system." It does not necessarily unwind the stack when an error is signaled; instead, it can offer "restarts" that let execution continue from the point where the error occurred.
Foreign Function Interface: Common Lisp implementations often come with a sophisticated foreign function interface (FFI). This enables Lisp programmers to utilize libraries written in other languages like C++ and Java.
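The restart mechanism described in the list above can be sketched with the standard operators `restart-case`, `handler-bind`, and `invoke-restart`. The function names `parse-base-count` and `total-count` and the input data are assumptions chosen for illustration.

```lisp
(defun parse-base-count (string)
  "Parse STRING as an integer, offering a USE-VALUE restart on failure."
  (restart-case (parse-integer string)
    (use-value (value)
      :report "Supply a replacement value."
      value)))

(defun total-count (strings)
  "Sum the integers in STRINGS, substituting 0 for unparseable entries."
  ;; The handler runs while the failing call is still on the stack and
  ;; chooses how to recover; it does not have to abandon the computation.
  (handler-bind ((parse-error
                   (lambda (condition)
                     (declare (ignore condition))
                     (invoke-restart 'use-value 0))))
    (reduce #'+ (mapcar #'parse-base-count strings))))

(total-count '("12" "n/a" "30"))
;; => 42
```

The low-level function only states which recoveries are possible, while the caller decides which one to use. At the REPL, with no handler installed, the same `use-value` restart would instead be offered interactively in the debugger.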

Speed Considerations

Lisp is often perceived as an interpreted and therefore slow language, but this perception is not accurate. Compilers for Lisp have existed since 1959, and all major Common Lisp implementations can compile directly to machine code whose performance is often on par with C.

While interpreted languages often resort to C/C++ for time-critical portions of their code, Lisp programmers can often avoid this. This was demonstrated by Ross Ihaka, one of the creators of R, who showed that Lisp's optional type declarations and machine-code compiler enable code that is significantly faster than R and Python.
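As a sketch of what optional type declarations look like in practice, the function below sums a vector of double-precision floats. It is an illustrative example (the name `sum-doubles` is an assumption) and is not the benchmark code from Ihaka's comparison; how much the declarations help depends on the implementation and compiler settings.

```lisp
(defun sum-doubles (vector)
  "Sum the elements of VECTOR, a simple one-dimensional array of double-floats."
  ;; The declarations are optional: removing them leaves a correct but
  ;; generic function. With them, a native-code compiler may use
  ;; unboxed floating-point arithmetic.
  (declare (type (simple-array double-float (*)) vector)
           (optimize (speed 3) (safety 1)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length vector) sum)
      (incf sum (aref vector i)))))

(sum-doubles (make-array 3 :element-type 'double-float
                           :initial-contents '(1d0 2d0 3.5d0)))
;; => 6.5d0
```

In most implementations, calling `(disassemble 'sum-doubles)` shows the generated machine code, which is a convenient way to check what the declarations actually changed.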

Case Study: Pathway Tools

Pathway Tools, a large bioinformatics software system written in Common Lisp, demonstrates the language's capabilities.

Pathway Tools has among the most extensive functionality within bioinformatics software, including genome informatics, metabolic pathway informatics, and omics data analysis. It can infer metabolic reconstructions, compute optimal routes within metabolic networks, and execute quantitative metabolic flux models.

The same Pathway Tools binary executable runs as both a desktop window application and a web server. In web server mode, Pathway Tools powers the BioCyc.org website, which has a significant user base and high page views.

Case Study: BioBike

BioBike exemplifies the power of Lisp's homoiconicity.

BioBike is a web-based, programmable, integrated biological knowledge base. Its programmers can write extensible macros that make it easier to build modularized extensions for bioinformaticians.

BioBike's core web listener is nearly 15,000 lines of Common Lisp code, while the entire system comprises about 400,000 lines of code. It includes a Scratch-like visual programming language, a specialized bioinformatics-oriented frame system, and many other modules.

Perspectives and Outlook

Lisp has laid the foundation for many programming languages and influenced their constructs. It is noteworthy that R's design is based on Scheme and that R borrows from Lisp to create embedded DSLs.

The large spectrum of domains and subdomains in bioinformatics and computational biology suggests the potential for developing similar DSLs for genomics, proteomics, and other fields within Lisp.

Furthermore, the future of statistical computing and artificial intelligence will rely heavily on big data, a domain that Lisp-family languages such as Clojure specifically target. This makes Lisp well-suited for future bioinformatics applications.

Conclusions

New programming language adoption in a scientific community is a challenging yet rewarding process. We advocate for a greater inclusion of LFLs into large-scale bioinformatics research, outlining the benefits of using them.

We emphasize Lisp's unparalleled support for homoiconicity, DSLs, error handling, and its importance to future bioinformatics research. We predict that the current state of Lisp research in bioinformatics and computational biology is auspicious for establishing robust community standards.

Key Points

Lisp empowers programmers to write faster programs faster. Empirical studies show that Lisp programs are smaller, take less time to develop, and run faster than their C/C++ and Java counterparts.
LFLs allow for easy creation of extensible macros, facilitating the development of modularized extensions and robust DSLs for bioinformaticians.
Lisp's unique features, including homoiconicity, comprehensive error handling, and hot swapping, make it an excellent choice for the development of enterprise-level, fault-tolerant DSLs for various research areas.
The current state of Lisp research in bioinformatics and computational biology is highly conducive to a timely establishment of a strong community focused on developing robust bioinformatic libraries and highly customizable machine learning and AI applications written in languages like Common Lisp, Clojure, and Scheme.

Acknowledgements

B.B.K. dedicates this work to the memory of his uncle, Taras Khomchuk. B.B.K. wishes to acknowledge the financial support of the United States Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program.

C.W. thanks Jeff Shrager for critical review and helpful comments on the manuscript.

Funding

This research was conducted with Government support under and awarded by DoD, Army Research Office (ARO), National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.

Authors' Information and Affiliations

Bohdan B. Khomtchouk is an NDSEG Fellow and PhD candidate in the Human Genetics and Genomics Graduate Program at the University of Miami Miller School of Medicine. His research interests include bioinformatics and computational biology applications in HPC, integrative multi-omics, artificial intelligence, and machine learning.

Edmund Weitz is a full professor at the University of Applied Sciences in Hamburg, Germany. He is a mathematician specializing in set theory, logic, and combinatorics.

Peter D. Karp is the director of the Bioinformatics

How the strengths of Lisp facilitate complex and flexible applications (2016)