On the 7th of December, we presented an overview of some of the major developments over the last half year in the NSX R&D team. This post accompanies the presentation and summarizes the notable topics that we touched on. We now also include an overview of the development planned over the course of the coming half year.
The technique of inheritance was introduced as an essential part of the object-oriented programming paradigm, aiming to
deliver more anthropomorphic code by mimicking the concept of ontological refinement. Just as a jet fighter is a special type of airplane, a bird is a special type of animal, and a sparrow is a special type of bird, inheritance
was created to define classes as refinements of other classes. Such a subclass would inherit the member variables and
methods of a superclass, and extend it with more specific variables and methods for that particular refinement. The
notion of ontological refinement is strongly related to taxonomy, i.e., the practice and science of the
classification of things or concepts, including the underlying principles. Originally, taxonomy referred only to
classifying – or classifications of – organisms. This probably explains why the technique of inheritance in software
programming languages is often illustrated with examples of animals or plants.
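As a reminder of the basic mechanism, here is a minimal sketch of such a textbook hierarchy (class and member names chosen purely for illustration):

```java
// Classic textbook illustration: each subclass inherits the members
// of its superclass and extends them with more specific ones.
class Animal {
    protected double weight;
    void eat() { /* ... */ }
}

class Bird extends Animal {
    double wingSpan;         // more specific attribute
    void fly() { /* ... */ } // more specific behavior
}

class Sparrow extends Bird {
    void chirp() { /* ... */ }
}
```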
However, the use of ontological refinement to define classes entails several issues, both on a conceptual and technical
level. First, taxonomies are essentially multi-dimensional, i.e., they can be based on multiple criteria. For example,
one could refine or categorize a human person based on gender, nationality, or age. Constructing a taxonomy tree based
on various criteria could lead to a combinatorial explosion, as nearly all combinations would be possible, e.g.,
BelgianMaleChild and GermanFemaleAdult. Another option would be to ignore the multi-dimensional nature of the taxonomy,
and to use a single primary criterion at a certain level. This approach could limit the anthropomorphic meaning of the
classes, as can be illustrated by the distinction between mammals and fish based on the procreation mechanism, e.g.,
whales and dolphins are not fish, although their appearance and behavior are typical of fish. Therefore, it often seems preferable to use references to external classes, such as gender, nationality, and age, instead of using inheritance to create subclasses based on taxonomy trees.
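A minimal sketch of this approach, with hypothetical class and attribute names, keeps every taxonomy dimension as a separate entity referenced by the person:

```java
// Hypothetical sketch: taxonomy dimensions as separate entities
// referenced by Person, instead of subclasses such as
// BelgianMaleChild or GermanFemaleAdult.
enum Gender { MALE, FEMALE, OTHER }

enum AgeCategory { CHILD, ADULT, SENIOR }

class Nationality {
    private final String isoCode; // e.g. "BE" or "DE"
    Nationality(String isoCode) { this.isoCode = isoCode; }
    String isoCode() { return isoCode; }
}

class Person {
    private Gender gender;           // one taxonomy dimension
    private Nationality nationality; // an independent dimension
    private AgeCategory ageCategory; // a third dimension
    // getters and setters omitted for brevity
}
```

Adding a new dimension, such as profession, then amounts to adding one reference, rather than multiplying the number of subclasses.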
Second, the introduction of additional attributes does not always correspond with ontological refinement. Though it is true that a more specific concept generally exhibits more specific attributes, this is not the only mechanism.
For instance, a point in 3-dimensional space has an additional attribute with respect to a 2-dimensional point, but it
is a generalization rather than a refinement. It is not uncommon for attributes of a general concept to take default values for more specific concepts, so that they can be omitted. In such cases, programmers are tempted to define the more general concept as a subclass, which is once again detrimental to the anthropomorphic meaning of the classes.
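This temptation can be sketched as follows (hypothetical classes):

```java
// Point3D reuses x and y by extending Point2D, although ontologically
// a 3-dimensional point is the more general concept, not a special
// kind of 2-dimensional point.
class Point2D {
    double x, y;
}

class Point3D extends Point2D {
    double z; // an additional attribute, yet a generalization
}
// Ontologically, a 2-dimensional point is rather a 3-dimensional
// point whose z attribute takes the default value 0.
```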
Besides the underlying conceptual issues, the use of inheritance in a traditional object-oriented language creates
several technical issues in the context of software development. These issues are mainly related to the introduction of
technical coupling between the various classes. Every subclass in the inheritance tree has unrestricted access to the
variables and methods of the superclass(es). This coupling is essentially hidden, i.e., the inherited attributes and
methods cannot be seen in the class source code, and may be spread out over many classes across many inheritance levels.
The fact that modern IDE tools visualize inherited attributes and methods does not really change that. A simple typo can
remain invisible during compilation as the compiler may find an actual attribute or method with that name somewhere in
the inheritance hierarchy, resulting in the use of the wrong attribute or method. The fact that some languages do not allow multiple inheritance, and that authors often advise limiting the number of levels in multilevel inheritance, is clear testimony to the dangers of this technical coupling.
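The typo issue can be illustrated with a hypothetical hierarchy, where the erroneous identifier still resolves to an inherited member and the compiler raises no error:

```java
class Animal {
    protected double weight; // weight in kilograms
}

class Bird extends Animal { }

class Sparrow extends Bird {
    private double weightInGrams;

    double weightReport() {
        // Typo: the programmer meant weightInGrams, but "weight" also
        // compiles, silently binding to the field inherited from
        // Animal two levels up, expressed in different units.
        return weight;
    }
}
```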
We have often argued that the principle of separation of concerns implies that object-oriented classes cannot contain
both data and behavior from a functional point of view. A class needs to either represent data with some auxiliary
utility methods, or to represent a behavioral action with some auxiliary utility data attributes. Therefore, we need to
distinguish between data and behavior to further elaborate the preferred way to deal with inheritance or ontological
refinement.
For behavior functions or methods, we believe that the use of polymorphism is both appropriate and sufficient to provide
the required functionality. The mechanism enables the software engineer to provide a variety of implementations that may
represent ontological refinement, while realizing at the same time action version transparency for different versions
and variants. For instance, an Encrypter interface can encapsulate various specific ways of encrypting a message, while
a Publisher interface may be used as a common concept for different ways of message delivery. Every implementation class
can be passed in a way that makes the presence of the behavior explicit and allows the programmer to invoke that
behavior.
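A minimal sketch of the Encrypter and Publisher examples, with hypothetical implementation classes and signatures:

```java
interface Encrypter {
    byte[] encrypt(byte[] message);
}

interface Publisher {
    void publish(byte[] message);
}

// One possible variant: a trivial XOR cipher (illustration only).
class XorEncrypter implements Encrypter {
    private final byte key;
    XorEncrypter(byte key) { this.key = key; }
    public byte[] encrypt(byte[] message) {
        byte[] out = new byte[message.length];
        for (int i = 0; i < message.length; i++) {
            out[i] = (byte) (message[i] ^ key);
        }
        return out;
    }
}

// One possible delivery variant: write to standard output.
class ConsolePublisher implements Publisher {
    public void publish(byte[] message) {
        System.out.println(new String(message));
    }
}

class Demo {
    public static void main(String[] args) {
        // The caller invokes the behavior through the interfaces,
        // independent of which refinement is passed in.
        Encrypter encrypter = new XorEncrypter((byte) 42);
        Publisher publisher = new ConsolePublisher();
        publisher.publish(encrypter.encrypt("hello".getBytes()));
    }
}
```

The calling code is written against the interfaces only, so new versions or variants of the behavior can be introduced without touching it.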
Concerning data attributes or references, we want to stress again that it is often preferable to use external taxonomy
entities to avoid the combinatorial explosion of inheritance trees due to the multi-dimensional nature of the
taxonomies. A person could for instance have separate taxonomy entities for gender, nationality, and age, or a car could
have separate taxonomy entities for the car brand, vehicle type, and propulsion type. Nevertheless, in many cases
ontological refinement could be preferable as a number of common attributes may reflect an important common notion or
concept that needs to be made explicit. For instance, a home could be an important concept with common attributes that
needs to be refined for a house or an apartment; or a legal person, encompassing both natural persons and legal entities. We argue here once
again that the use of polymorphism is both appropriate and sufficient to support such a concept. By providing an
interface to represent the common concept, such as a home, the programmer is able to identify, handle, pass, and access
the various attributes of this common concept through all ontological refinements.
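The home example could be sketched as follows (all names and attributes hypothetical):

```java
// The common concept is made explicit as an interface, while each
// refinement remains an ordinary class.
interface Home {
    String getAddress();
    double getSurfaceInSquareMeters();
}

class House implements Home {
    private String address;
    private double surface;
    private double gardenSurface; // attribute specific to houses

    public String getAddress() { return address; }
    public double getSurfaceInSquareMeters() { return surface; }
}

class Apartment implements Home {
    private String address;
    private double surface;
    private int floor; // attribute specific to apartments

    public String getAddress() { return address; }
    public double getSurfaceInSquareMeters() { return surface; }
}
```

Code that needs the common concept is written against the Home interface, regardless of which refinement it receives.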
Information systems still rely to a large extent on relational databases. To support ontological refinement, we traditionally distinguish three possible options for organizing the database tables, as listed below.
a. One super-table containing the union of all subclass attributes.
b. Dedicated tables for every subclass, duplicating the common attributes.
c. A main table with the common attributes, and dedicated tables with the specific attributes of every subclass.
Option a does not have an explicit representation for the various subclasses, and would lead to lots of empty entries in the various instance rows. Moreover, the introduction of new subclasses – and even new inheritance levels – would lead to an ever-expanding structure of the single table.
Option c seems to be the most efficient in terms of database columns or attributes. However, in terms of instances of those columns or attributes, it is not: compared with option b, it requires an additional type attribute and link key for every subclass instance. In addition, it would not be evident to assign an anthropomorphic name to the various dedicated tables. And finally, the software dealing with the persisted data would have to perform some dereferencing logic to retrieve all the attributes, which would not scale well with the introduction of new subclasses or inheritance levels.
Option b seems to have redundant columns for the common attributes, but exhibits less redundancy and more efficiency in terms of actual database entries. It does not need dereferencing logic, and integrates easily with a polymorphism-based implementation, where the common concept and attributes are represented as an interface. The fact that the refinement relation is not explicit seems reasonable, as the concept is only present in the object-oriented classes and not in the relational database.
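As a point of reference, and not as a description of the NS implementation, the three options roughly correspond to the standard inheritance strategies of the JPA specification, where option b is the table-per-class strategy (sketch assuming the jakarta.persistence API):

```java
import jakarta.persistence.*;

// Option b in JPA terms: TABLE_PER_CLASS creates one table per
// concrete subclass, each duplicating the common columns.
// (Option a roughly corresponds to SINGLE_TABLE, option c to JOINED.)
@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
abstract class Home {
    @Id @GeneratedValue
    Long id;
    String address; // common attribute, repeated in every subclass table
}

@Entity
class House extends Home {
    double gardenSurface; // house-specific attribute
}

@Entity
class Apartment extends Home {
    int floor; // apartment-specific attribute
}
```

Note that JPA expresses the refinement through class inheritance; the approach described here combines option b tables with interface-based polymorphism instead.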
Based on this reasoning, we have initiated support in Normalized Systems tooling for inheritance or ontological
refinement using an implementation that is based on polymorphism for the software classes, and utilizes option b for the
organization of the database tables.
2022 was a fruitful year for R&D at NSX. Adapting to life during the prior two years marked by the COVID-19
pandemic was certainly something the software industry excelled at. Nevertheless, being back in the office and having
face-to-face interactions and the open exchange of ideas has also been greatly beneficial to our work.
With the passing of 2022, we want to look back at many of the notable changes coming out of R&D. Several milestones
in the development of our modelling and code generation frameworks have opened up many new possibilities for innovation
and have accelerated development even further.
The use of transactions in software systems is quite a fundamental and non-trivial issue. Everyone knows the obvious example of retrieving money from a cash machine: either your account is debited and you receive the money, or neither happens. In case the machine fails to dispense the money, the transaction requires a rollback of the debiting. However, it is not always that straightforward. Though the use of transactions providing automated rollbacks is often desirable to end-users and system analysts, it may well be unnecessarily complicated or even plainly impossible to implement for the software developer.
In order to enable scaling and to avoid deadlocks, transaction processing systems typically deal with large numbers of small things, and strive to come in, do the work, and get out. This means that it is in general not feasible to use technical transactions to provide end-to-end transactional integrity for customer transactions. For instance, guaranteeing the customer transaction of a money transfer is in general not implemented through a technical transaction encompassing the debiting of the source bank account and the crediting of the destination bank account, which may reside in another bank across the globe. It is typically guaranteed by taking the transfer request as fast as possible into a secure database, followed by a number of sequential processing steps or individual transactions. This flow may even require human intervention to make sure that the customer transaction, i.e., the transfer request stored in the secure database, is executed properly.
The use of end-to-end technical transactions would often cause lots of problems in the real world. For instance, defining a technical transaction around the reservation of a flight ticket, a rental car, and a hotel room, could seem desirable to indulge the customer, but would soon lead to the simultaneous locking of all reservation systems around the world. Avoiding such deadlocks is closely related to the NS theorem Separation of States. This theorem states that after every task or action, the result state should be stored in a corresponding and appropriate data entity. This will enable the independent execution, and therefore evolution, of the implementation of the processing task, and the error handling of the task. Besides better evolvability, this strategy improves the tracing and handling of errors, and reduces the chances of having locked resources.
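A minimal sketch of this strategy for the money transfer example (all names hypothetical): every task persists the state it reaches, so that error handling and follow-up act on stored states instead of on one long technical transaction.

```java
// After every task, the reached state is stored on the transfer
// request before the next task starts (Separation of States).
enum TransferState { REQUESTED, DEBITED, CREDITED, FAILED }

class TransferRequest {
    long id;
    TransferState state;
}

interface TransferStore {
    void save(TransferRequest request); // secure database write
}

class TransferFlow {
    private final TransferStore store;
    TransferFlow(TransferStore store) { this.store = store; }

    void debitTask(TransferRequest request) {
        // ... debit the source account ...
        request.state = TransferState.DEBITED; // store the result state
        store.save(request);
    }

    void creditTask(TransferRequest request) {
        // ... credit the destination account ...
        request.state = TransferState.CREDITED;
        store.save(request);
    }
}
```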
Another often overlooked issue in transaction management is the complexity of the rollback. While a previous write operation in the same database can be compensated by a relatively simple rollback, developers have to implement dedicated compensating transactions in case the various steps imply changes across different systems. These compensating actions are not always straightforward, or even possible. For instance, in case one of the steps has triggered a physical action, like shipping a product or opening a switch or water tap, it is simply impossible to roll back, unless mopping the floor is part of the compensating transaction.
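Where compensation is possible, a common pattern is to register an explicit compensating action for every completed step and to execute the registered actions in reverse order upon failure, as in this hypothetical sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Every completed step registers how to undo itself; on failure the
// already performed steps are compensated in reverse order.
class CompensatingFlow {
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    void step(Runnable action, Runnable compensation) {
        action.run();
        compensations.push(compensation);
    }

    void failAndCompensate() {
        while (!compensations.isEmpty()) {
            compensations.pop().run(); // undo in reverse order
        }
    }
}
```

For steps without a meaningful compensation, such as the physical actions above, no registered action can restore the previous state.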
These issues are also quite relevant when client systems invoke an API (Application Programming Interface) of a target system. Examples of such client systems are business process management systems invoking services from underlying transactional systems, or user interface front-ends developed on top of the service APIs provided by back-end systems. The client system often desires a customer transaction encompassing several service calls on the target system. This requires the client and target system to agree on one or more scenarios that define the responsibilities and behavior at both ends. We discuss the various possibilities below.
1. The client system defines the transaction, and the target system is expected to honor it. By invoking some start-transaction interface, the client system expects the target system to be able to roll back everything that is being processed until the stop-transaction interface is called. This is a very tough, cumbersome, and nearly impossible requirement to fulfil for the target system.
2. The client system attempts to both manage and honor the transaction. Within the scope of a desired transaction, the client system keeps track of the various changes it incurs in the target system, and will roll back and/or compensate the changes that have already been performed in case something fails.
3. The target system provides an explicit aggregated service API for certain customer transactions required by the client system. In this case, the target system attempts to manage and honor the transactional integrity for a specific set of customer transactions, thereby providing appropriate rollbacks and/or compensating actions for these specified transactions.
4. The target system provides an asynchronous request API for the aggregated customer transaction required by the client system (sketched below). The submitted request is stored immediately in the database, after which the target system can start the processing flow performing the various tasks, compensating already performed tasks if required, and even supporting manual interventions. The client system can choose to perform other tasks while checking regularly for an answer, or opt to wait or block until the aggregated service is fully executed. The latter option would emulate a synchronous service call at the client side.
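A minimal sketch of the fourth scenario (all names hypothetical): the target system persists the request before any processing starts, and the client polls for its status.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

enum RequestStatus { ACCEPTED, IN_PROGRESS, COMPLETED, FAILED }

class BookingRequestApi {
    private final Map<UUID, RequestStatus> store = new ConcurrentHashMap<>();

    // Synchronous part: only persist the request and return its id.
    UUID submit(String flight, String car, String hotel) {
        UUID id = UUID.randomUUID();
        store.put(id, RequestStatus.ACCEPTED); // stored before processing
        // ... trigger the asynchronous processing flow for this id ...
        return id;
    }

    // The client checks regularly, or blocks on repeated calls to
    // emulate a synchronous service.
    RequestStatus status(UUID id) {
        return store.getOrDefault(id, RequestStatus.FAILED);
    }
}
```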
When decomposing business process models into state machines, it is quite common to encounter so-called hierarchical workflows, i.e., the creation within a workflow of an instance of another data element, which triggers a secondary workflow. We distinguish two possible scenarios.
Fire and forget: a workflow task triggers a secondary flow, which can be processed in an independent way.
A workflow task triggers a secondary flow, but the task in the primary flow can only be considered complete when processing is completed in the secondary flow.
In the second scenario, one should be cautious about possible dependencies and the corresponding coupling. More specifically, coupling may surface in two ways.
In case the primary workflow and/or task is polling the secondary flow that it has triggered, this could lead to lots of threads and coupling in time between software modules.
In case the secondary workflow has to report back to the primary flow upon completion, this could lead to a bidirectional dependency between the two workflows. This could result in a technical issue in case the two workflows belong to different software components.
In order to avoid coupling and even possible locking in time, NS theory suggests that the secondary workflow should report back upon completion. However, to avoid bidirectional coupling between the code of both workflows, this should be done through a loosely coupled mechanism, such as a service interface on the primary workflow. This allows the primary flow to pass the URL of that service interface as a callback to the secondary flow, avoiding tight coupling at the source code level.
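Such a callback can be sketched as follows (hypothetical names), with only a URL as the contract between the two flows:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class SecondaryFlow {
    private final String callbackUrl; // received from the primary flow

    SecondaryFlow(String callbackUrl) { this.callbackUrl = callbackUrl; }

    void onCompleted(String resultId) throws Exception {
        // Report back over HTTP; the secondary flow only knows a URL,
        // not the classes or states of the primary flow.
        HttpRequest request = HttpRequest.newBuilder(URI.create(callbackUrl))
                .POST(HttpRequest.BodyPublishers.ofString(resultId))
                .build();
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```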
Based on the confirmation or completion message of the secondary workflow, the primary flow will set the corresponding status on the target data element of the primary flow. In this way, the secondary workflow does not need any knowledge of the possible states of the primary flow. In case multiple subflows need to be processed and completed before the primary flow is able to continue, it is again (a task in) the primary flow that needs to have the appropriate knowledge to combine the information provided by the various subflows.