Converting a Python Backend to Clojure

I inherited an application written entirely in Python 2 that only ran on Windows. It has been a pain because the code is inflexible and insecure. Python 2 has been end-of-lifed since January of 2020! I'll leave it to the reader to look up what year it actually is (and what year it will be in 2 weeks). This necessitated a rewrite of most of the code, at the very least to Python 3, to improve security.

I did not choose Python 3, for several reasons:

  1. It would require distributing a Python runtime to customers, and some of them did not want to install one.
  2. It would increase our operations burden: keeping the Python runtime updated and keeping Python dependencies patched.
  3. Python is slow, and this code runs on customer computers that are slow and outdated.

Client-side

The application was made up of different components. Some ran on the client computer, while others could be offloaded to the server. The existing monolithic, single-file Python app, though, ran entirely client-side. I made the decision to rewrite the client components in C#, replacing Python components on the customer computer with .NET AOT binaries. This reduced the operational burden placed on the customer, simplified our deployment model, and made the application much faster and more efficient with customer resources. That's not what I came here to talk about, though.

Server-side

There was a number crunching component to this app that would do well on the server. Clients really only needed to gather data.

For this server-side component, as you could guess, I chose Clojure. Clojure made development of the backend fast and easy. Why do I think that is? Because Clojure is expressive, interactive, and has wonderful interop with its host ecosystem, Java.

Clojure is Expressive

This is all going to be from my nooby perspective here, but I love the expressive power of Clojure, and what you can get done in smaller amounts of code than in other languages. The power comes with the macro system, and the destructuring that one can do with their data. The issue you may have coming from other languages though, is getting used to a data-first mentality.

(defn- average-point
  "Given a vector of points return the average point"
  [pts]
  (let [c (count pts)
        key-summer (fn [k] (->> pts (map #(get % k)) (reduce +)))
        os (key-summer "os")
        xs (key-summer "x")
        ys (key-summer "y")]
    {:os (double (/ os c))
     :x (double (/ xs c))
     :y (double (/ ys c))}))

One of the tasks that I had to complete in development of this backend was to implement a relative moving average of points gathered by the client. Already a lot of things are happening in the function above that would take more ceremony in many other languages. Let's walk through them one-by-one from the top down.

  1. let allows us to create local variable bindings for the scope defined by the let expression.
    1. key-summer defines a function that takes a key k and, closing over the pts argument of average-point, gets the value for k from every point in the pts vector and sums the results using reduce
    2. os, xs, and ys are the results of calling key-summer for input string keys.
  2. Where's the return? The last expression evaluated in Clojure is the return value of a function. In our case, it's the map with :os, :x, and :y
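
To make the walkthrough concrete, here is a small sketch that calls the function on made-up data. The definition is repeated (as a public defn) only so the block runs on its own, and the sample points are hypothetical, not real client data.

```clojure
;; Same logic as average-point above, repeated so this block is self-contained.
(defn average-point
  "Given a vector of points return the average point"
  [pts]
  (let [c (count pts)
        key-summer (fn [k] (->> pts (map #(get % k)) (reduce +)))
        os (key-summer "os")
        xs (key-summer "x")
        ys (key-summer "y")]
    {:os (double (/ os c))
     :x (double (/ xs c))
     :y (double (/ ys c))}))

;; Hypothetical sample points, keyed by strings like the input the function expects.
(def sample-points
  [{"os" 1 "x" 10 "y" 20}
   {"os" 3 "x" 30 "y" 40}])

(average-point sample-points)
;; => {:os 2.0, :x 20.0, :y 30.0}
```

Note that the input keys are strings while the output keys are keywords, and that the map literal at the end is the return value.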

So what does this example actually demonstrate? In my opinion, it demonstrates two key concepts.

First is that it's data-first. We define this function to take in a vector of maps, and return a map as output. I have found this to be very common in Clojure code: most functions aim to take in input, transform it in some way, and return output. When a function does this without side effects, it is called pure. Functional purity means that the same input will always lead to the same output.

Second, the expressive power of the language. The key-summer function is simple, yet it does a lot of work for us in the later bindings, where each value we need becomes a single call. Functions are just like any other value in Clojure, so we can define key-summer like an ordinary variable and use it to define other variables as we go. This works because functions are first-class citizens in Clojure, as you would expect from a functional programming language.
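
Here is a minimal sketch of what "functions are values" looks like in practice. The names square and key-getter are made up for illustration and are not part of my codebase.

```clojure
;; A function bound to a local name, exactly like a number or a map would be.
(def squares
  (let [square (fn [n] (* n n))]
    (map square [1 2 3])))
;; squares => (1 4 9)

;; A function that builds and returns another function.
(defn key-getter
  [k]
  (fn [m] (get m k)))

(map (key-getter "x") [{"x" 1} {"x" 2}])
;; => (1 2)
```

key-getter returns a function that remembers k, which is the same trick key-summer plays with pts.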

Here's another example I like, using de-structuring binds and a threading macro:

(defn map-slides
  [{:keys [slides] :as e} slide-map-fn]
  (->> slides
       (map slide-map-fn)
       vec
       (assoc e :slides)))

So what's going on here? What's that funky syntax within the function declaration? What's that argument list? What's the ->> mean?

This is using de-structuring and a threading macro. Here is an equivalent Python function. I'll write this Python in an imperative style to show what the Clojure code is doing.

def my_function(e, map_fn):
  slides = e['slides']
  for i, slide in enumerate(slides):
    slides[i] = map_fn(slide)
  e['slides'] = slides

Now, there are a few things wrong with this Python code that we could point out. For one, it has a side effect. When we access slides with e['slides'], we get a reference, not a copy. When we loop over the slides on the next line, we are modifying the data inside e instead of a copy of that data. This is a problem because the function is impure. Call my_function twice with the same e and the second call transforms slides that were already transformed, so you can get different results each time.

Clojure doesn't have these issues. You'll get the same result every time you call map-slides with the same input (given a pure slide-map-fn), because Clojure data structures are immutable by default. The last expression in the threading macro, (assoc e :slides), returns a new map based on e with the :slides key updated, and leaves e itself untouched.
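
A quick way to convince yourself is to call the function and then look at the original value again. This is a sketch with a made-up deck map; map-slides is repeated so the block runs on its own.

```clojure
;; Same function as above, repeated so this block is self-contained.
(defn map-slides
  [{:keys [slides] :as e} slide-map-fn]
  (->> slides
       (map slide-map-fn)
       vec
       (assoc e :slides)))

;; Hypothetical input: numbers stand in for real slide data.
(def deck {:title "demo" :slides [1 2 3]})

(map-slides deck inc)
;; => {:title "demo", :slides [2 3 4]}

;; The original map was not modified by the call.
deck
;; => {:title "demo", :slides [1 2 3]}
```

Calling (map-slides deck inc) a second time gives the same answer again, which is exactly what the Python version could not promise.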

Now, to explain de-structuring and threading macros. In map-slides, the function argument {:keys [slides] :as e} may look confusing. This still expects one argument to be passed in this slot. What makes this unique, though, is that two different bindings are happening. The name slides is bound to the equivalent of (get e :slides), and e is bound to the entire map being passed as input. Clojure provides these niceties as part of the language, because immutability and data are so central to the language design. You'll be pulling apart maps all day, so it may as well take fewer characters to do so.
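
To show that the de-structuring form is only shorthand, here is a sketch of the same function written both ways. Both function names and the sample map are made up for illustration.

```clojure
;; With de-structuring in the argument list.
(defn slide-summary-a
  [{:keys [slides] :as e}]
  [(count slides) (count e)])

;; The same thing written out by hand with let and get.
(defn slide-summary-b
  [e]
  (let [slides (get e :slides)]
    [(count slides) (count e)]))

(slide-summary-a {:title "demo" :slides [:a :b :c]})
;; => [3 2]
(slide-summary-b {:title "demo" :slides [:a :b :c]})
;; => [3 2]
```

The first number is the count of slides, the second is the number of keys in the whole map, so both slides and e are available inside the function.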

->> is a threading macro. It will thread together function calls passing the result of the previous expression to the next one. ->> is the "thread-last" macro, and means that the result of evaluating the previous function will be passed as the last input to the next function.
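
As a small illustration (the numbers are arbitrary), here is a threaded expression next to the nested form it stands for. Both evaluate to the same value.

```clojure
;; Thread-last: each result is passed as the LAST argument of the next form.
(def threaded
  (->> [1 2 3 4]
       (map inc)        ; (2 3 4 5)
       (filter even?)   ; (2 4)
       (reduce +)))     ; 6

;; The equivalent nested calls, which have to be read inside-out.
(def nested
  (reduce + (filter even? (map inc [1 2 3 4]))))

[threaded nested]
;; => [6 6]
```

The threaded version reads top to bottom in the order the work happens, which is the main reason I reach for it.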

Clojure is Interactive

The language is unlike anything else I've ever used in terms of interactivity. Every time I fire up my editor I enter the flow state, due to how fast I can get feedback on the code that I'm writing. This has ended up being one of my favorite features of the language.

[Screenshot: typical workflow]

Interactivity allows you to interact with the program as you build it. I'll admit as a noob I had no idea what that meant. I was firmly in the "Python has a REPL too" camp until very recently. The power that this affords you is incredible. This has led to a workflow like this for me:

  1. Start with a "scratch" file for creating functions that are loosely what I'm thinking the program should look like.
    1. Evaluate those functions by sending their definitions to the REPL
    2. Re-evaluate those functions by sending new function definitions to the REPL
  2. Define "scratch" data for those functions to act on.
    1. Evaluate and re-evaluate my data as I develop.
  3. As the functions mature and grow, move them to their new home, a shiny new namespace if necessary.
    1. Evaluate those namespaces in the REPL
  4. Start pulling in "real" data. In my example workflow above, that was from Firebase.
  5. Further create and refine functions using "real" data.

The real power is in the re-definition of functions and data. Usually this is hot-keyed in your editor (mine is Emacs) so that you can re-define functions by having your cursor on them and pressing a magic key combination. Why is this so important? Because there is no separate compile-and-restart cycle here. If you don't like the way a function is running, you change it while developing. And if you don't know how something actually works, you can just create a test function and test data while reasoning through the problem. Once you've clarified the problem that you are trying to solve, you can create your real function and put it into its proper namespace, moving on to the next issue.
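
For a rough idea of what a scratch file can look like, here is a sketch. The function total-x and the scratch data are made up, and wrapping the scratch forms in a comment block is one common convention I'm assuming here, not a requirement of the workflow.

```clojure
;; A function under development. Re-evaluating this form in the editor
;; replaces the old definition in the running REPL.
(defn total-x
  [pts]
  (reduce + (map #(get % "x") pts)))

;; Forms inside (comment ...) are skipped when the file is loaded,
;; but each one can still be sent to the REPL by hand from the editor.
(comment
  (def scratch-pts [{"x" 1} {"x" 2}])
  (total-x scratch-pts) ;; => 3
  )
```

When total-x settles down, it moves to a real namespace and the scratch forms stay behind as a record of how I tested it.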

No time is wasted in the above approach in boilerplate, like defining API endpoints. I don't waste that time because I don't have to. What I want to work on is the actual business logic that I will be replacing with Clojure code. The scratch file can get unwieldy, but every time it does I start thinking about how to re-organize things.

When I noticed a group of functions forming that dealt primarily with the shape of the data, I'd create a new file/namespace for those functions and move them there. As the programmer you are free to see the patterns emerging and start shuffling around your functions as you see fit. This is freeing for someone like me who spends a lot of time getting lost in languages like Java where boilerplate is king. I have all the time in the world after the business logic is developed to pull in libraries that turn this code into a web service, but I don't need them to get started.

Clojure has great Interop with Java

Because Clojure is hosted on the JVM, it calls Java seamlessly. As I progress in my development career, I've become more enamored with established platforms for their sustainability and their performance. Legacy code can be awful, but these platforms have staying power and that is undeniable. The final reason I chose Clojure was the Java interop, since I needed some Java dependencies.

Here is a snippet from my codebase:

(ns notebooks.hit-exploration
  (:import (com.google.firebase FirebaseApp FirebaseOptions FirebaseOptions$Builder)
           (com.google.cloud.firestore Firestore)
           (com.google.firebase.cloud FirestoreClient)
           (com.google.auth.oauth2 GoogleCredentials)
           (java.net URL)))

;; Firebase initialization code
(def creds (GoogleCredentials/getApplicationDefault))
(def options (-> (new FirebaseOptions$Builder)
                 (.setCredentials creds)
                 (.setProjectId "myproject")
                 .build))
(defonce app (FirebaseApp/initializeApp options))
(def db (FirestoreClient/getFirestore))
(def test-collection (-> db
                         (.collection "mycollection")
                         .get))

This snippet shows how I interact with the Firebase SDK through Java interop, starting with the :import directive in the namespace. Java classes are imported through :import, and you specify them as lists: a package name followed by the classes to import from that package.

com.google.firebase is the package, and FirebaseApp, FirebaseOptions, and FirebaseOptions$Builder are the classes I'm importing (from the first list in my ns declaration). The $ in FirebaseOptions$Builder is how the JVM names a nested class, here Builder inside FirebaseOptions.

You can then use these classes like any other value in Clojure. I use new to instantiate them, but then pass them through threading macros like any other Clojure value. The difference here is what functions I'm calling. In Clojure, you call methods on Java objects using the syntax (.methodName <java object> <arguments>). This works wonderfully with the threading macro, even for methods that take additional arguments, because -> ensures that your Java object is always the first argument. -> is the "thread-first" macro.
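
If you don't have the Firebase SDK handy, the same interop forms can be tried against classes that ship with the JDK. This is a sketch using StringBuilder and ArrayList as stand-ins; the names greeting and letters are made up.

```clojure
;; In a real file this would live in the ns form as (:import (java.util ArrayList)).
(import '(java.util ArrayList))

;; (new StringBuilder) builds the object, then -> passes it along as the
;; first argument of each method call, just like the FirebaseOptions builder.
(def greeting
  (-> (new StringBuilder)
      (.append "Hello, ")
      (.append "interop")
      .toString))
;; greeting => "Hello, interop"

;; Without the threading macro, the same calls nest inside-out.
(def greeting-nested
  (.toString (.append (.append (new StringBuilder) "Hello, ") "interop")))

;; Static methods use ClassName/methodName, like FirebaseApp/initializeApp.
(Math/max 3 7)
;; => 7

;; doto calls each method on the same object and returns that object,
;; which helps when the methods do not return the object themselves.
(def letters
  (doto (ArrayList.)
    (.add "a")
    (.add "b")))

(.size letters)
;; => 2
```

(ArrayList.) with the trailing dot is shorthand for (new ArrayList), so either spelling works for constructors.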

Finally, defonce is used because Firebase doesn't like to initialize an application more than once in code. I was running into this issue when re-evaluating my buffer in Emacs (interacting with the program as I develop it). defonce ensures that the binding happens only once: if the var already has a value, evaluating the form again does nothing.
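
Here is a small sketch of that behavior without any Firebase involved. The init-service! function and the counter are hypothetical stand-ins for an expensive, run-once initialization.

```clojure
;; Counts how many times the "initialization" actually runs.
(defonce init-count (atom 0))

(defn init-service!
  "Stand-in for something that must only happen once, like FirebaseApp/initializeApp."
  []
  (swap! init-count inc)
  {:status :initialized})

;; The first evaluation runs init-service! and binds the result.
(defonce service (init-service!))

;; Evaluating the form again (as happens when re-evaluating a buffer)
;; skips the body because service already has a value.
(defonce service (init-service!))

@init-count
;; => 1
```

With a plain def in place of defonce, the second evaluation would run init-service! again and the counter would read 2.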

Conclusion

I'll be writing more about Clojure as I continue my journey with the language. I've been in software development for 10 years now and had been feeling burnout from previous jobs, but languages like Clojure are giving me hope again. It's been a long time since I've looked at the clock and seen hours go by that felt like minutes. I hope to one day work in Clojure, and to contribute to the world of open source software with the Clojure programming language!