My name is Russell Duhon, and I’m a software developer with ModusBox, where much of our work is helping organizations send, receive, and combine data across many different systems in managed, reliable ways. We mostly do this in the context of Digital Transformation: helping organizations carry out business processes routinely and quickly, and enabling their customers to accomplish tasks and receive instant updates anywhere, from the web to their phones.
One of the most critical steps along the way is data transformation. Core business systems are where orders are kept and processed, invoices are managed, customer service records are stored, and everything else organizations do on computers takes place. To move data in and out of those systems, it needs to be transformed from or to the format some other system needs, perhaps to comply with an interoperable technical standard, to match what analytics software expects, or to mimic a payload an app was already written to receive.
This transformation can be very complicated, involving extensive business rules, enrichment with other data, and so on. What’s more, it often needs to be updated regularly, as the business rules and the details of the data change over time.
A common, powerful approach is to manage data transformation code separately, and have it updated by business analysts who are experts in the data itself.
But how do people create the data transformations?
There are a variety of available choices, from visual designers to dynamically loaded programming languages. But many common choices, such as XSLT and JavaScript, have a lot of sharp edges for people who are not full-time software developers. Others are proprietary, and often cannot be used across multiple platforms.
And that’s why we’ve created DataSonnet.
DataSonnet is an open source data transformation platform built on Jsonnet, an open source programming language that is already in wide use for configuration management. It runs on the Java Virtual Machine, which is the most common runtime for Digital Transformation, and can also be used as an external tool on other platforms. That makes it easy to port into any system, and it is already being ported into several of the most common.
DataSonnet, via Jsonnet, has features that make writing data transformations easier and safer.
⁃ Data structures in code look like JSON data structures
⁃ Unlike JavaScript and many other languages, missing variables and incompatible type combinations give clear, immediate errors
⁃ Functional, immutable data makes spaghetti code with global effects impossible
⁃ Comprehensions lead to much more readable processing over multiple items and are similar to those in Python
⁃ Strong syntax for making derived and updated values in data structures
⁃ Clear stack traces for errors that involve multiple pieces of code
⁃ Formally defined and deeply tested semantics for the core language, avoiding unexpected and inconsistent behavior
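To give a feel for the comprehension and immutable-data points above, here is a minimal sketch in Python, chosen only because the comprehensions are similar. It is not DataSonnet code, and the payload, field names, and the transform function are all invented for illustration.

```python
# Hypothetical input payload; every field name here is an assumption.
order = {
    "id": "A-100",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "lines": [
        {"sku": "X1", "qty": 2, "unit_price": 5.0},
        {"sku": "Y2", "qty": 1, "unit_price": 12.5},
    ],
}


def transform(order):
    """Build a new invoice structure; the input is read but never mutated."""
    # A comprehension derives one output item per input line.
    items = [
        {"sku": line["sku"], "total": line["qty"] * line["unit_price"]}
        for line in order["lines"]
    ]
    # The result is written as a literal that looks like the data it produces.
    return {
        "invoiceFor": order["id"],
        "billTo": order["customer"]["name"],  # KeyError if the field is missing
        "items": items,
        "grandTotal": sum(item["total"] for item in items),
    }


invoice = transform(order)
```

Here a missing field raises an error immediately instead of quietly producing an undefined value, which is the kind of behavior the list above describes.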
On top of that, DataSonnet adds functionality aimed specifically at data transformation.
⁃ Pluggable data formats for input and output, including for Java objects
⁃ Powerful regular expressions that are guaranteed to run quickly
⁃ Extensive date & time manipulation capabilities
⁃ JSONPath lookups of values in data structures
⁃ Utility functions for common data structure transformations
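As a rough illustration of the kinds of operations listed above, the sketch below uses only the Python standard library. It does not show DataSonnet's functions: the lookup helper is a hypothetical, much-simplified stand-in for a JSONPath lookup, the record and its field names are invented, and Python's regular expression engine does not offer a guarantee of running quickly.

```python
import re
from datetime import datetime


def lookup(data, path):
    """Follow a dotted path such as 'customer.address.city' through nested dicts.

    Hypothetical helper: a simplified stand-in for a JSONPath lookup.
    """
    current = data
    for key in path.split("."):
        current = current[key]  # raises KeyError if a step is missing
    return current


# Hypothetical record; field names are assumptions for illustration.
record = {
    "customer": {"address": {"city": "Seattle", "zip": "98101"}},
    "ordered": "03/15/2020",
}

# Path lookup of a nested value.
city = lookup(record, "customer.address.city")

# Regular expression check: five digits, optionally followed by -dddd.
zip_code = lookup(record, "customer.address.zip")
zip_ok = re.fullmatch(r"\d{5}(-\d{4})?", zip_code) is not None

# Date manipulation: parse a US-style date and emit it in ISO 8601 form.
iso_date = datetime.strptime(record["ordered"], "%m/%d/%Y").date().isoformat()
```

In a real transformation these three steps would usually feed a single output structure, so having them available in one place keeps the mapping readable.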
We’re also providing additional tooling, including an IntelliJ Plugin.
If you currently do or need data transformation in your organization, we hope you’ll look at DataSonnet. We’re using DataSonnet in production already, and are happy to help with integrating it in your projects. We’ve already had our first community contribution, and are excited for more.
I’ll write more about our roadmap later, but some of the things on it include automatic discovery of DataSonnet libraries written in Java, full autocomplete in IntelliJ, schema validation, and external lookup tables. If you aren’t seeing everything you need, reach out. We want to make DataSonnet better for everyone. I can be contacted at russell.duhon@modusbox.com.
If you’d like to try things out, the easiest way is via the Getting Started tutorial in our documentation, at https://datasonnet.s3-us-west-2.amazonaws.com/docs-ci/primary/master/index.html, which guides you through installing the IntelliJ Plugin and using it to see how the language works. We’re hoping to rapidly expand our documentation to include everything you need to know, including a long list of cookbook examples ranging from simple to complex.