Programming in C, Cpp, and Ada
Published on 2007-12-25.
One day back in 2007 I was dribbling down some notes about programming languages and compilers with a special focus on the use of C, C++ and Ada. These are some of my dribbles.
Table of contents
- Compiled versus interpreted languages
- GNU Compiler Collection
- Portable C Compiler
- Clang compiler
- Programming languages
- Gaining experience
- Comparing languages
Compiled versus interpreted languages
A program written in a compiled language is run through a compiler, which checks its syntax and translates it into a binary executable. A program written in an interpreted language is not compiled ahead of time. Instead, an interpreter checks and executes it at run-time, which usually makes it slower than compiled code.
A compiler is a computer program (or set of programs) that translates text written in a computer programming language (the source code) into another computer language (the target code).
The most common reason for wanting to translate source code is to create a binary executable program.
The name "compiler" is primarily used for programs that translate source code from a high-level programming language such as C, C++ or Ada into assembly language and then into binary code.
Software for early computers was exclusively written in assembly language for many years. Higher level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler.
Towards the end of the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper in 1952 for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler in 1957. COBOL was an early language to be compiled on multiple architectures in 1960.
Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex.
Early compilers were written in assembly language. The first self-hosting compiler - capable of compiling its own source code in a high-level language - was created for Lisp by Hart and Levin at MIT in 1962. Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation languages. Building a self-hosting compiler is a bootstrapping problem: the first such compiler must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) by running the compiler in an interpreter.
GNU Compiler Collection
The GNU Compiler Collection (GCC) is a set of compilers produced for various programming languages by the GNU Project. As well as being the official compiler of the GNU system, GCC has been adopted as the standard compiler by most other modern Unix-like computer operating systems.
Originally named the GNU C Compiler, because it only handled the C programming language, GCC 1.0 was released in 1987, and the compiler was extended to compile C++ in December of that year. Front ends were later developed for Fortran, Pascal, Objective-C, Java, and Ada, among others.
Portable C Compiler
In September 2007 the lighter, faster, and most importantly BSD-licensed compiler PCC was imported into OpenBSD's CVS and NetBSD's pkgsrc.
The Portable C Compiler (also known as pcc or sometimes pccm - portable C compiler machine) was an early compiler for the C programming language, written by Stephen C. Johnson of Bell Labs in the mid-1970s and based in part on ideas from earlier work by Alan Snyder in 1973.
It was one of the first compilers that could easily be adapted to output code for different computer architectures, and it had a long life span. It shipped with BSD Unix until the release of 4.4BSD in 1994, after which it was replaced with the GNU C Compiler. It was very influential in its day, so much so that at the beginning of the 1980s the majority of C compilers were based on it.
Clang compiler
Clang is a compiler front end for the programming languages C, C++, Objective-C, Objective-C++, OpenMP, OpenCL, and CUDA. It uses LLVM as its back end and has been part of the LLVM release cycle since LLVM 2.6. It is designed to be able to replace the full GNU Compiler Collection (GCC). Its contributors include Apple, Microsoft, Google, ARM, Sony, Intel and Advanced Micro Devices (AMD). It is open-source software, with source code released under the University of Illinois/NCSA License, a permissive free software licence.
The Clang project includes the Clang front end, the Clang static analyzer, and several code analysis tools.
Starting in 2005, Apple made extensive use of LLVM in a number of commercial systems, including the iPhone software development kit (SDK) and integrated development environment (IDE) Xcode 3.1.
One of the first uses of LLVM was an OpenGL code compiler for OS X that converts OpenGL calls into more fundamental calls for graphics processing units (GPU) that do not support certain features. This allowed Apple to support the entire OpenGL application programming interface (API) on computers using Intel Graphics Media Accelerator (GMA) chipsets, increasing performance on those machines. For GPUs that support it, the code is compiled to exploit fully the underlying hardware, but on GMA machines, LLVM compiles the same OpenGL code into subroutines to ensure continued proper function.
LLVM was intended originally to use GCC's front end, but GCC turned out to cause some problems for developers of LLVM and at Apple. The GCC source code is a large and somewhat cumbersome system for developers to work with; as one long-time GCC developer put it, "Trying to make the hippo dance is not really a lot of fun".
Apple software makes heavy use of Objective-C, but the Objective-C front-end in GCC is a low priority for GCC developers. Also, GCC does not fit smoothly into Apple's IDE. Finally, GCC is licensed under GNU General Public License (GPL) version 3, which requires developers who distribute extensions for, or modified versions of, GCC to make their source code available, whereas LLVM has a BSD-like license which does not force users to release their source code changes when publishing compiled binaries of those changes.
Apple chose to develop a new compiler front end from scratch, supporting C, Objective-C and C++. The "clang" project was open-sourced in July 2007.
OpenBSD is the latest BSD to switch from GCC to LLVM's Clang as its default C/C++ compiler, and Clang is now the default on the i386 and x86_64 architectures. On other architectures, where Clang support is less mature, GCC 4 remains the default. This follows FreeBSD and others that have switched to Clang by default for quicker build times and a more liberally licensed code base, recent GCC versions being under the GPLv3.
Programming languages
No matter what programming language you use, the most important aspect of programming is to be skillful enough to write correct (according to a standard, if one exists), secure, and well-performing code. Another important aspect is to understand that there is no single programming language suitable for all kinds of work.
Some languages handle low-level programming better than others. Other languages handle lots of text better than others. Still other languages are better suited to mathematical work. Choosing the right programming language for the job is important; otherwise time may be wasted trying to solve a problem in a language that isn't suitable.
Languages like C, C++ and Ada can be used to do anything, but a lot of different questions are raised when one has to choose a programming language for a specific task.
- How well does the compiler perform and does there exist a working Open Source compiler?
- Does the program demand long-term maintenance and development?
- How productive do you want to be?
- How experienced are you?
- How "unsafe" is the language? Do you have experience with security?
- Is the program going to run on more than one architecture?
If the compiler doesn't perform well, you might end up with errors and bugs in your software even though the source code is error free. It is therefore important to understand that much power lies in the compiler.
If the software you develop demands long-term maintenance, a language with a good human-readable syntax is important because it makes it a lot easier to re-read the code when time has passed. One of the major problems with the C language is that it can be quite difficult to read, even illogical at times.
Productivity in programming is about how fast you can get the job done, so that your program does what it is supposed to do. Spending a lot of time hunting bugs because the language syntax is hard to read is a waste of time.
Depending on the job, some languages result in a very high level of productivity while other languages can be extremely counterproductive. Many people think that this is about the language, but in reality it is about choosing the language best suited to the problem at hand.
Ada was specifically designed to avoid errors, and it is therefore used in many safety-critical applications in industry. However, it never gained wide usage. A post on Hacker News discusses Why Ada isn't popular.
Some argue that a language is only as safe as the programmer, and basically that is true, but adding safety features to the language and compiler, in order to avoid errors, can save a lot of time, improve productivity, and even save lives.
The strlcpy function, developed by Todd C. Miller and Theo de Raadt of OpenBSD for use in the C programming language, is intended to replace the function strcpy and provide a simpler and more robust interface than strcpy. It is designed to copy the contents of a string from a source string to a destination string. strlcpy offers two features that are designed to help software developers avoid problems: it takes the size of the destination buffer as a parameter, and, as long as that size is greater than zero, it always NUL-terminates the result.
One could say that the car is only as safe as the driver, but in my opinion adding airbags, traction control systems, ABS, and other safety features helps a lot.
The many different programming languages have all been designed with specific usage in mind. Some were originally designed only as a solution to a single problem. Others were designed as a way of improving a former language. Still others were designed to be general-purpose languages.
Most languages grow and change over time. Some become very popular, like C. Others become more or less extinct, like BASIC.
No matter what programming language is being used, it is always the responsibility of the programmer to really know and understand the language he is using. With some languages, like C, it takes skill to be really good.
As long as one understands that the programmer has to be skillful, no matter what language he is using, it doesn't hurt when the language itself has some features to prevent errors.
But what about when the compiler itself has errors?
When the compiler is buggy it doesn't matter how well written your code is or how skillful you are; the result will also be buggy.
This can greatly affect the "success" of a programming language, because it doesn't matter how good, secure, powerful and feature-rich the language is: if the compilers on the market are bad, the result will be bad, and this seems to be the case with Ada today.
There is only one Open Source compiler for Ada: GNAT. GNAT accepts Ada source code and generates executable (machine) code. GNAT is a compiler and does not translate into C code (as some mistakenly believe). It is based on the Free Software Foundation's (FSF) GCC compiler. GNAT generates relatively good code, and is expected to improve further as its developers transition from developing initial functionality to optimizing it.
Gaining experience
The best way to learn a new human language is to speak it right from the outset, listening and repeating, leaving the difficulty of the grammar for later. The same applies to computer programming languages. To learn a new programming language effectively, you must start writing programs as quickly as possible.
Experimenting with code is one of the best ways to learn a programming language. After gaining some general knowledge it is a good idea to read from the language reference, if one exists, and pay attention to the details of how different problems should be addressed.
Another good way is to look at other people's code, especially code from well-written programs. If you are learning to program in C, get your hands on some of the code from the OpenBSD operating system; it is some of the best C code out there.
Most people don't program with security in mind. It is a good idea to do so from the beginning to avoid building up bad habits, but learning about the security-related aspects of a particular programming language can be difficult. To do so you have to make sure that you truly understand how a specific function or feature works and how it will affect the operating system.
The initial development of C occurred at AT&T Bell Labs between 1969 and 1972. It was named "C" because many of its features were derived from an earlier language called "B", which according to Ken Thompson was a stripped down version of the BCPL programming language. C wasn't "designed" to be widely used. It was "designed" by top hackers for their own personal use, but eventually "got out". It was designed for writing operating systems, compilers and other system tools, and in this role it has become almost totally dominant.
It can provide excellent performance (assuming good choice of algorithm and good C skills), and allows low-level hardware access, but these are not normally things required by the beginner. C's use of pointers is a source of frustration and confusion for many, but pointers are essential in even fairly trivial C programs.
Further, C's string handling is very weak compared to many other modern languages (the scanf function is notoriously problematic).
The C programming language is a general-purpose, block structured, procedural, imperative computer programming language. Although C was designed as a system implementation language, it is also widely used for applications. C has also greatly influenced many other popular languages, especially C++, which was originally designed as an extension to C. C was very useful for many applications that had formerly been coded in assembly language.
The Linux kernel is written in C and so are the operating systems of all the BSD flavors. Many other applications, such as the game Quake 3, are also written in C.
Despite its low-level capabilities, the language was designed to encourage machine-independent programming. A standards-compliant and portably written C program can be compiled for a very wide variety of computer platforms and operating systems with minimal change to its source code. The language has become available on a very wide range of platforms, from embedded microcontrollers to supercomputers.
Despite its popularity, C has been widely criticized. Such criticisms fall into two broad classes:
- Desirable operations that are too hard to achieve using unadorned C, because no native function exists.
- Undesirable operations that are too easy to accidentally invoke while using C - and this is the main reason for programming bugs.
The safe and effective use of C requires more programmer skill, experience, effort, and attention to detail than is required for most other programming languages.
From a productivity point of view C can cause a lot of wasted time if used in the wrong areas of work, but this isn't because C is a bad language; it's because C was meant to be a tool mainly for low-level programming.
Standard C doesn't provide that many functions and features, and that makes C a relatively small programming language. For example, dealing with dates and times demands very creative programming skills.
C also has an impressive number of subtle pitfalls, and many of these can be leveraged by a skilled attacker to execute code on a computer on which these vulnerable programs run. But while almost everybody seems to understand the significance of these programming mistakes, few actually sit down and evaluate their code from a security perspective.
C++ is regarded as a mid-level language, as it comprises a combination of both high-level and low-level language features. It is a statically typed, free-form, multi-paradigm, usually compiled language supporting procedural programming, data abstraction, object-oriented programming, and generic programming.
Dr. Bjarne Stroustrup developed C++ in 1979 at Bell Labs as an enhancement to the C programming language and named it "C with Classes". In 1983 it was renamed to C++. Enhancements started with the addition of classes, followed by, among other features, virtual functions, operator overloading, multiple inheritance, templates, and exception handling.
The C++ programming language standard was ratified in 1998 as ISO/IEC 14882:1998. The current version (at the time of writing this document) is the 2003 version, ISO/IEC 14882:2003. A new version of the standard (known informally as C++0x) is being developed.
The idea of creating a new language originated from Stroustrup's experience in programming for his Ph.D. thesis. Stroustrup found that Simula had features that were very helpful for large software development, but the language was too slow for practical use, while BCPL was fast but too low-level and unsuitable for large software development. When Stroustrup started working in AT&T Bell Labs, he had the problem of analyzing the Unix kernel with respect to distributed computing. Remembering his Ph.D. experience, Stroustrup set out to enhance the C language with Simula-like features. C was chosen because it is general-purpose, fast, portable and widely used. Besides C and Simula, some other languages that inspired him were ALGOL 68, Ada, CLU and ML. At first, the class, derived class, strong type checking, inlining, and default argument features were added to C via Cfront. The first commercial release occurred in October 1985.
The name of the language was changed from "C with Classes" to C++, and new features were added including virtual functions, function name and operator overloading, references, constants, user-controlled free-store memory control, improved type checking, and BCPL style single-line comments with two forward slashes.
In 1985, the first edition of The C++ Programming Language was released, providing an important reference to the language, as there was not yet an official standard. In 1989, Release 2.0 of C++ was released. New features included multiple inheritance, abstract classes, static member functions, const member functions, and protected members. In 1990, The Annotated C++ Reference Manual was published. This work became the basis for the future standard. Late addition of features included templates, exceptions, namespaces, new casts, and a Boolean type.
As the C++ language evolved, a standard library also evolved with it. The first addition to the C++ standard library was the stream I/O library which provided facilities to replace the traditional C functions such as printf and scanf. Later, among the most significant additions to the standard library, was the Standard Template Library.
After years of work, a joint ANSI–ISO committee standardized C++ in 1998 (ISO/IEC 14882:1998). For some years after the official release of the standard, the committee processed defect reports, and published a corrected version of the C++ standard in 2003. In 2005, a technical report, called the "Library Technical Report 1" (often known as TR1 for short) was released. While not an official part of the standard, it gives a number of extensions to the standard library which are expected to be included in the next version of C++. Support for TR1 is growing in almost all currently maintained C++ compilers.
While the C++ language is royalty-free, the standard document itself is not freely available.
In The Design and Evolution of C++, Bjarne Stroustrup describes some rules that he uses for the design of C++. The following is a summary of the rules. Much more detail can be found in the book itself.
- C++ is designed to be a statically typed, general-purpose language that is as efficient and portable as C.
- C++ is designed to directly and comprehensively support multiple programming styles (procedural programming, data abstraction, object-oriented programming, and generic programming).
- C++ is designed to give the programmer choice, even if this makes it possible for the programmer to choose incorrectly.
- C++ is designed to be as compatible with C as possible, therefore providing a smooth transition from C.
- C++ avoids features that are platform specific or not general purpose.
- C++ does not incur overhead for features that are not used (the "zero-overhead principle").
- C++ is designed to function without a sophisticated programming environment.
Ada is an advanced, modern programming language, designed and standardized to support and strongly encourage widely recognized software engineering principles: reliability, portability, modularity, re-usability, programming as a human activity, efficiency, maintainability, information hiding, abstract data types, genericity, concurrent programming, object-oriented programming, etc.
All validated Ada compilers have passed a controlled validation process using an extensive validation suite. Ada is not a superset or extension of any other language. Ada does not allow the dangerous practices or effects of old languages such as C, although it does provide standardized mechanisms to interface with other languages such as Fortran, Cobol, and C.
Ada is defined by an international standard (the language reference manual, or LRM). Ada is taught and used all around the world. Ada is used in a very wide range of applications: banking, medical devices, telecommunications, air traffic control, airplanes, railroad signaling, satellites, rockets, etc.
Ada is a structured, statically typed, imperative, and object-oriented high-level computer programming language. It was originally designed by a team led by Jean Ichbiah of CII Honeywell Bull under contract to the United States Department of Defense during 1977 - 1983 to supersede the hundreds of programming languages then used by the US Department of Defense (DoD).
Ada addresses some of the same tasks as C or C++, but Ada is strongly typed (even for integer ranges), and compilers are validated for reliability in mission-critical applications, such as avionics software. Ada is an international standard. The current version at the time of writing this document, known as Ada 2005, is defined by the joint ISO/ANSI standard (ISO-8652:1995) combined with the major Amendment ISO/IEC 8652:1995/Amd 1:2007.
Ada supports run-time checks in order to protect against access to unallocated memory, buffer overflow errors, off-by-one errors, array access errors, and other avoidable bugs. These checks can be disabled in the interest of run-time efficiency, but they can often be compiled efficiently. Ada also includes facilities to help program verification. For these reasons, Ada is widely used in critical systems, where any anomaly might lead to very serious consequences, e.g., accidental death or injury. Examples of systems where Ada is used include avionics, weapon systems (including thermonuclear weapons), thermonuclear reactors and spacecraft.
The syntax of Ada is simple, consistent and human readable. It minimizes choices of ways to perform basic operations, and prefers English keywords to symbols.
Ada is as powerful as C, but the language - as a language - is much safer. Comparing C with Ada, if you absolutely must, is like comparing two race cars: both are really fast and effective, but one of them has airbags and traction control (Ada) and the other doesn't (C).
Comparing languages
Comparing programming languages is necessary when trying to figure out what language will best solve a particular problem, but comparing languages just for the fun of it is always a bad idea.
C vs. C++
Modern critics of the C++ language raise several points. First, since C++ is based on and largely compatible with C, it inherits most of the criticisms leveled at that language. Taken as a whole, C++ has a large feature set, including all of C, plus a large set of its own additions, in part leading to criticisms of being a "bloated" and complicated language. Bjarne Stroustrup points out that resultant executables don't support these claims of bloat, but that's not the point. The resultant executables aren't the problem; the problem lies in getting all the bloat to work correctly in the first place!
As an example, C++ was designed to handle strings very well because C is so bad at doing it, but C++ reverts back to chars when working with streams. In C++ (as of the 2003 standard) "ifstream" and "ofstream" can't work with string variables, so in order to work with a pathname contained in a string variable, you have to convert the string into a pointer to an array of characters - i.e. you're back writing C code. Then you miss the whole point and might as well do it in C!
Linus Torvalds made a popular comment on C++ saying:
In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C.
And he is right!
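As a small and simple example, consider this piece of C code:
    char buff[64];                  /* fixed-size buffer (size chosen for illustration) */
    strcpy(buff, getenv("HOME"));   /* no bounds check: overruns buff if HOME is too long */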
The C code above could result in a buffer overrun and in C++ this is avoided by using a string instead:
    string myhomepath;
    myhomepath = getenv("HOME");
Since the string grows dynamically, the risk of a buffer overrun is eliminated, but if you actually need to use the contents of that string for something useful, such as writing a configuration file into the home directory of the user, you have to convert it into a pointer to an array of characters.
    string myhomepath;
    myhomepath = getenv("HOME");
    myhomepath += "/.myprog/myprog.conf";        // the leading "/" separates the home directory from the rest of the path
    const char *filename = myhomepath.c_str();   // back to a C-style string
    ofstream mydir;
    mydir.open(filename);
    ...
    mydir.close();
To use this string you first have to revert the programming back into C. Then the outcome has to be converted back into C++ before you can finally get some work done.
When you work with C++ you often tend to get stuck using some old C function anyway.