Paul Brittain

Software and how it's made - A brief introduction from end to end.

March 16, 2019 • ☕️ 5 min read

Computer software, also known as computer programs or simply software, is a generic term for a collection of computer instructions made up of 1s and 0s, commonly known as binary, that tell a computer how to work, in contrast to the physical hardware from which the system is built and which actually performs the work. This binary is the only language a computer understands. Each binary digit, which can be either a 1 or a 0, is called a bit. Binary information is stored at the byte level: a byte consists of 8 bits, which can be written as two hexadecimal digits. Hexadecimal, also known as base 16 or simply hex, is a positional numeral system with a radix, or base, of 16. Hexadecimal numerals are widely used by computer programmers because they provide a more human-friendly representation of binary-coded values. With the earliest computers, all information, including the program itself, had to be entered by hand in this raw form. To ease this, some of the first programming languages, known as assembly languages, were introduced. These languages let programmers write instructions that map almost directly onto the machine's binary instructions, but using mnemonic statements that made sense once learned and that could be read with practice.
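
To make the relationship between bits, bytes, and hex concrete, here is a small sketch in Python (chosen purely for illustration; any language would do) that prints the same byte in binary, hexadecimal, and decimal form.

```python
# A single byte holds 8 bits; its value ranges from 0 to 255.
value = 0b01001010          # the byte written out as binary digits

print(bin(value))           # '0b1001010' -> the binary digits
print(hex(value))           # '0x4a'      -> the same byte as two hex digits
print(value)                # 74          -> the same byte in decimal

# Two hexadecimal digits (each with 16 possible values) cover exactly
# 16 * 16 = 256 combinations, which is why one byte is commonly
# written as a pair of hex digits.
```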

This introduces a fundamental concept in computer science and software development called abstraction: a simple interface covering a more complex system. Take as an example a button that launches a missile; the interface you have with the button is very simple, but the chain of events behind it, involving hundreds of important variables associated with launching, guiding, and hitting a target, is complex. Abstractions allow increasingly complex mechanisms to be wielded in a human-friendly way. The closer you work to the binary instructions and the bare metal of a computer's central processing unit (CPU), the "lower-level" you are said to be; the further abstracted away from this you are, the "higher-level" you are said to be.
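
As a rough sketch of the same idea in code, the hypothetical Python below (every name here is invented for the example) hides several lower-level steps behind one simple, button-like function call.

```python
# Hypothetical lower-level steps hidden behind the abstraction.
def _arm_warhead():        ...
def _compute_trajectory(): ...
def _ignite_engines():     ...

def launch_missile(target: str) -> None:
    """The 'button': one simple interface over a complex chain of events."""
    _arm_warhead()
    _compute_trajectory()
    _ignite_engines()
    print(f"Missile launched toward {target}")

# The caller only ever sees one simple, high-level operation.
launch_missile("test range")
```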

Computer software is written using instructions to the computer that are human readable; this is called source code. Source code can be written in many different programming languages, and each language varies in its level of abstraction from the aforementioned data formats the computer ultimately understands. Programming languages are also generally built for different purposes, operations, industries, and problem sets. Common examples of programming languages are Java, C#, C++, Python, and JavaScript. The computer programmer writes these instructions line by line and organizes the code into logical blocks, files, and folders. In order for a computer to read and run the program, the source code must be turned into binary in a process called compilation. This process is undertaken by another computer program called a compiler, which transforms source code written in one programming language (the source language) into another language (the target language). The target language is generally either machine code or an intermediate representation known as byte code. Machine code is immediately readable by the computer's CPU; compiling directly from the source language to machine code is a compilation strategy known as "ahead-of-time", meaning the whole program is translated before it is run. Byte code, by contrast, is a very low-level intermediate representation of computer instructions that is translated into machine code as it is needed while the program runs, a strategy called "just-in-time" compilation. Compilers also check the source code for errors such as misspellings or misuses of the language. When they find one, they fail the compilation process and return the location of the error, along with varying levels of explanation of its cause, depending on the error in question and the compiler being used.
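
As a small illustration of source code being translated into byte code, the sketch below uses Python (one of the languages named above): the interpreter compiles the function into its own byte code, which the standard `dis` module can then display. The exact output varies between Python versions.

```python
import dis

# A tiny piece of source code: human-readable instructions.
def add(a, b):
    return a + b

# The interpreter has already compiled this function into byte code,
# a low-level intermediate representation; dis prints it out.
dis.dis(add)
# Typical output (varies by Python version):
#   LOAD_FAST     a
#   LOAD_FAST     b
#   BINARY_ADD            (or BINARY_OP on newer versions)
#   RETURN_VALUE
```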

Software can be written by an individual programmer or by many. The more complex a program and the wider the scope of its functionality, the more programmers are generally required to write the necessary software. Professionally written and commercialized software is typically produced by large numbers of collaborating programmers and can take years to complete. One concept that allows the simultaneous editing of source code by more than one programmer is version control: the management and collection of the editing history of files. The complete collection of source code files is known as the codebase. The version control software keeps track of all revisions to the codebase; each change is usually recorded with a number or letter code termed the "revision number", a timestamp, the person who made the change, which lines were removed, and which lines were added. Another concept that builds upon this is distributed version control. This is a form of version control where the complete codebase, including its version history, is mirrored on every developer's computer. It is a peer-to-peer approach that does not rely on a centralized master copy of the codebase. An individual programmer can make changes to one or more files in the codebase and submit these changes to a shared master version of the codebase (known as the master branch). In order to do this, the programmer must possess an up-to-date version of the master branch, and their changes are submitted as a new revision of it. This new revision is added to the master branch, and all other programmers must update their own version of the master branch to pull in the change before they can submit their own new changes. If a bad revision has been incorporated into the master branch, the revision in question can easily be reverted, bringing the codebase back to a healthy historical version of itself and undoing the bad changes.
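
To make the bookkeeping concrete, here is a minimal, purely illustrative Python sketch (not how any real version control tool is implemented) of the metadata a revision might carry and of reverting the latest revision.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Revision:
    number: int                      # the "revision number"
    author: str                      # who made the change
    timestamp: datetime              # when the change was made
    lines_added: list[str] = field(default_factory=list)
    lines_removed: list[str] = field(default_factory=list)

# The shared history of the master branch, newest revision last.
master_branch: list[Revision] = []

def commit(author, added, removed):
    """Record a new revision on top of the current history."""
    master_branch.append(Revision(
        number=len(master_branch) + 1,
        author=author,
        timestamp=datetime.now(),
        lines_added=added,
        lines_removed=removed,
    ))

def revert_latest():
    """Undo the most recent revision by committing a new one that swaps
    its added and removed lines, restoring the previous state."""
    bad = master_branch[-1]
    commit("revert", added=bad.lines_removed, removed=bad.lines_added)

commit("alice", added=["print('hello')"], removed=[])
commit("bob",   added=["print('bug')"],   removed=[])
revert_latest()   # the codebase is back to how it was after alice's change
```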

Computer software can be distributed for free or sold for a price. The source code can also be released to the public (open-source) or kept secret from the public (closed-source). Generally, when a user downloads or otherwise receives the software, they are in possession of the compiled, computer-readable code, owned by a person or company, in the form of an executable file. This is how software is distributed while maintaining its closed-source nature. It is effectively impossible to fully reverse this process and recover source code that resembles the original; however, it is possible, though extremely difficult, to decompile executable files and machine code back into a higher-level representation of the instructions, so that a skilled individual or group may make their own changes to the program. Most proprietary software follows this model to preserve secrecy and maintain control over the intellectual property of the software solution; it ensures that nobody can copy the source code and immediately compete with its owner. The opposing model, open-source, makes the source code publicly available, downloadable, readable, and in most cases editable and re-distributable. The source files are released under one of many available licenses that grant different degrees and combinations of these freedoms. Open-source software is essential to enabling community development of free software, and many programs you use are likely open-source. Common examples are the Mozilla Firefox browser, Audacity, VLC, 7-Zip, and the Linux operating system. These open-source projects are developed and maintained by passionate unpaid volunteers, sometimes alongside paid developers.