Aggregate deeply nested JSONField data in Django

October 24, 2018, 6:19 am

I ran across this problem the other day and it seems there aren't useful answers on this particular issue.

A combination of Django and Postgres makes it very easy to persist JSON data in your models. I think generally when building an application data schema storing JSON data is usually a bad idea. In fact this post and its partial solution is an example of issues you can have expecting JSON fields to behave predictably (in ORM terms).

In this case I wanted to store responses from a third party API and didn't think I would need to query data in those responses (hence no formal schema for the values within). Of course a requirement eventuated that required querying and then aggregation on values in the JSON I was storing and that is where I hit the problem described below.

(N.B. As I write this, with some hindsight, I think the correct solution is actually to migrate the values out of the JSON and store them in dedicated fields in the model, but anyway).

So if you want to filter by deeply nested values in a JSONField that part is fine. It actually works nicely with query_set syntax that matches doing joins across reverse relationships or foreign keys.

class ThingResponse(TimeStampedModel):
user = models.ForeignKey(User, on_delete=models.PROTECT, related_name="thing_responses")
    data = JSONField(_('The actual response from thing'))

If an example data looks like:

{
  "house": {
    "name": "Lannister",
    "seat": "Casterly Rock",
    "scion": "Tywin",
    "heir": "Jaime",
    "chapter_introduced": 5
  },
  "name" : "Tyrion Lannister",
  "age" : 24
}

and you want to find all the Lannisters, you could write a queryset like this:

ThingResponse.objects.filter(data__house__name="Lannister")

Things get tricky when you want to aggregate data in the JSONField. Aggregating on a value at the top level is fine (as described in this SO answer):

from django.db.models.functions import Cast
from django.db.models import Min, IntegerField
from django.contrib.postgres.fields.jsonb import KeyTextTransform

ThingResponse.objects.annotate(age_value=Cast(KeyTextTransform('age', 'data'), IntegerField())).aggregate(Min('age_value'))

You can annotate the queryset with nested values by nesting the KeyTextTransform like this:

ThingResponse.objects.annotate(chapter=Cast(KeyTextTransform('chapter_introduced', KeyTextTransform('house', 'data')), IntegerField()))

So this works fine and will annotate your result with a 'chapter' property with the nested value in it. Problem happens when you try and aggregate on that nested value:

ThingResponse.objects.annotate(chapter=Cast(KeyTextTransform('chapter_introduced', KeyTextTransform('house', 'data')), IntegerField())).aggregate(Min('chapter_introduced'))

You'll get an error like this:

File "[env path]/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 128, in get_group_by if (sql, tuple(params)) not in seen: TypeError: unhashable type: 'list'

I think what is happening here is the KeyTextTransform hasn't been resolved by this time so instead of a value it is getting a list of arguments.

So this is as far as I got with this and ended up implementing the aggregation in memory. If you do have a similar issue and find a solution to the issue with aggregating on nested JSON values I would love to hear from you.

Permalink - Comments - Tags: Development

Aggregate functions are not allowed in this query

September 22, 2017, 11:19 am

Got burnt today by an odd error thrown by the Django ORM when writing some tests. It was an easy mistake to make and the exception "Aggregate functions are not allowed in this query" has lots of unrelated Google juice and was not helpful in getting to the bottom of the problem. Hence this post.

Start with a pretty simple model that we are going to be updating via an API.

1
2
3
class Foo(models.Model):
    name = models.CharField(max_length=512)
    value = models.CharField(max_length=512)

Then write some code to hit an API via requests and update the database with a new value.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import requests
from .models import Foo


def make_request():

    # go get foo from your api
    resp = requests.get("http://your.thing/api/foo/name")

    # the name and value of the foo?
    value = resp.json()["value"]

    # Grab the instance of foo corresponding to name
    # and update its value
    foo = Foo.objects.get(name="name")
    foo.value = value
    foo.save()

Then a test using a patch on yourthing.helpers.requests so we don't hit the real API when we run our test. Let's pretend that make_requets has some side effect that we are testing for rather than the value stored in the Foo table. Given that assumption we might not mock the actual values returned by the API (see THIS --->).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
from mock import Mock, patch

from django.test import TestCase

from .models import Foo
from .helpers import make_request


class MockTestCase(TestCase):

    @patch('yourthing.helpers.requests')
    def test_an_api(self, mock_requests):

        # Create the instance we are going to update
        Foo.objects.create(name="name", value=0)

        # THIS ---> mock_requests.get.return_value.json.return_value = { "value" : 1}
        make_request()

        # Test for something unrelated to the contents of the Foo table. Perhaps some other
        # side effect of the make_requests() call.

Run this test and you will get this error.

======================================================================
ERROR: test_an_api (yourthing.test_helpers.MockTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/mock/mock.py", line 1305, in patched
return func(*args, **keywargs)
File "/Users/tomhorn/dev/sandbox/test_mock/yourthing/test_helpers.py", line 18, in test_an_api
make_request()
File "/Users/tomhorn/dev/sandbox/test_mock/yourthing/helpers.py", line 17, in make_request
foo.save()
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/base.py", line 807, in save
force_update=force_update, update_fields=update_fields)
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/base.py", line 837, in save_base
updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/base.py", line 904, in _save_table
forced_update)
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/base.py", line 954, in _do_update
return filtered._update(values) > 0
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/query.py", line 664, in _update
return query.get_compiler(self.db).execute_sql(CURSOR)
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1191, in execute_sql
cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 863, in execute_sql
sql, params = self.as_sql()
File "/Users/tomhorn/environments/test_mock/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1143, in as_sql
raise FieldError("Aggregate functions are not allowed in this query")
django.core.exceptions.FieldError: Aggregate functions are not allowed in this query
----------------------------------------------------------------------

This unhelpful error might lead us to examine queries & relationship looking for an aggregate function. In fact there is no aggregate function. Instead this is happening because without an explict mock the resp.json()["value"] call in make_request will return a MagicMock instead of a string. When the ORM tries to save the instance it gets into trouble dealing with a field that will mock any property it tries to access.

1
<MagicMock name='requests.get().json().__getitem__()' id='4385055128'>

So I guess the moral of the story is to make sure you mock every response that ends up being passed to a Django ORM model instance. If you don't you might spent a chunk of time investigating rabbit holes as I did.

Permalink - Comments - Tags: Development,Django

clojure.pprint/cl-format is slow

June 5, 2016, 12:51 pm

Had a serious performance problem last week. After some digging we narrowed it down to a call to cl-format.

Some test code to demonstrate the discrepancy:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
;; Format a string, an integer, a conditional and fancy plural thing
(defn cl-format-test [] (clojure.pprint/cl-format nil "
    ~@[(Id: ~D)~] ~:[foo~;bar~] [~D second~:P ago]>"
                                      (java.util.UUID/randomUUID)
                                      (rand 1)
                                      (> (rand 2))
                                      (rand 100)))
;; Same inputs, just concat them and return the string
(defn str-test [] (str (java.util.UUID/randomUUID) " "
                            (rand 1) " "
                            (> (rand 2)) " "
                            (rand 100)))
;; Just format a string
(defn cl-format-simple-test [] (clojure.pprint/cl-format nil "~S" (java.util.UUID/randomUUID)))
;; Just return the string
(defn str-simple-test [] (str (java.util.UUID/randomUUID)))

(defn test-func [f tests] (time (doseq [_ (range 0 tests)] (f))))

(test-func cl-format-test 1000)
(test-func str-test 1000)
(test-func cl-format-simple-test 1000)
(test-func str-simple-test 1000)

With the following results. For the simple case, cl-format is 10 times slower than a simple string return. For more complicated format strings, closer to 100 times slower:

(test-func cl-format-test 1000)
"Elapsed time: 328.033183 msecs"
(test-func str-test 1000)
"Elapsed time: 5.939625 msecs"
(test-func cl-format-simple-test 1000)
"Elapsed time: 34.2998 msecs"
(test-func str-simple-test 1000)
"Elapsed time: 3.515013 msecs"

Having a quick look at the implementation of cl-format I immediately noticed a compile-format function which seems like something that might help out with performance for many calls to a cl-format with the same format string:

Compiles format-str into a compiled format which can be used as an argument to cl-format just like a plain format string. Use this function for improved performance when you're using the same format string repeatedly

It also seems like cl-format will check if it's format string is already compiled and skip compilation if that is the case:

(if (string? format-in) (compile-format format-in) format-in)

Oddly however, compile-format is not public, so I can't use it. So I am left a little confused. I am going to do some more digging tomorrow to solve this mystery as I am sure I am missing something here.

Not sure if pre-compilation will help solve the horrible performance you get from cl-format, but I am guessing it might help.

Update

After some discussion with my colleagues, it seems I haven't missed anything and we think this is a bug.

Permalink - Comments - Tags: Development,Clojure

Decoding Polylines from Google Maps Direction API with Clojure

January 17, 2016, 12:56 pm

My first crack at porting some imperative code to Clojure goodness.

I needed some code to turn the polyline points encoding you get back from Google Directions API. This stuff:

"overview_polyline" : {
            "points" : "_rtwFpgubMtBuGVNFPfAr@TNIVkB|FsBpG_C_BqA}@mGeEwH_FsDcCaGwDkNiJsJmG_SoMyByAzA}EPi@nA{Dp@qBtDdCLh@@VCT_@nAKLMBOAIG}@{Aq@eAWWOMIEWEY?UDYJi@]^w@|AcC|Ew@bBy@hCg@dBQ`AU~A_ChH_D|J_AzCc@rAwXl|@yLt_@k@hCSrBAtABp@LlALp@FPZ~@Tf@Z^X^jAx@PXRTrCxB~@`@lA^`@HVLv@j@n@h@`@Rb@Dl@Gl@WVQTU^w@PaA@gAKcAMa@Q]c@i@}@g@iDoAeDaAeDq@qC]i@Cg@@eAT[L[Vk@h@SZQ^_@bAeJ`XoAbEmB~GqAvDkDzIsFvNcAvCcFrRkBnHcE`Pq@~Bi@rAk@hA}@hA_@f@cAfAoDjDiAv@iB|@a@XoAf@iBj@yA`@aB{Dz@yCl@yAmE`A}Aj@qCjAuC~AkBnA_@ZcBxAoApA{BhCoBtCuJrOkAxAkBzCeH~KkAlBeMrR{I|M}CtEyBjD}EpHmLbQ{BdDiBvCUBMJy@l@g@VeAZs@HiABuKT_BEiBUaEkAcA]oCeAmAi@iCsAyCkBgEaDcCuBa@]kHgGkDyCQQOOG_@Fk@h@YfI_DxAi@vAg@^`@h@l@h@dL@v@Ef@Od@m@fAy@nBmAxCc@dAALAHQf@Qv@G^Et@@n@Fr@Z`BHlABnAp@Cj@Q"

I found Jeffrey Sambells' Java code here that does the job. I am just getting my teeth into writing Clojure for my day job at GoCatch, so I need a Clojure version. This is my first attempt. It's midnight and I haven't had a chance to check the line ends up on a map correctly, but it looks pretty good to me:

;; Port of the Java code to decode google polylines that I found here -> http://jeffreysambells.com/2010/05/27/decoding-polylines-from-google-maps-direction-api-with-java
(defn decode-next-result [encoded]
  ;; keep scanning through encoded till b>=0x20
  ;; returns the next latitude/longitude increment
  (loop [[current & rest] encoded shift 0 result 0]
    (let [b       (- (int current) 63)
          result  (bit-or result (bit-shift-left (bit-and b 0x1f) shift))
          shift   (+ shift 5)] 
      (if (>= b 0x20) 
        (do 
          ;; if we are encoding the next result then we 
          ;; must have more characters to scan
          (assert rest)
          ;; keep looking for our next result
          (recur rest shift result))
        ;; we found our next result 
        (let [return-value (if (not= (bit-and result 1) 0)
                             (bit-not (bit-shift-right result 1))
                             (bit-shift-right result 1))]
          [return-value rest])))))

(defn lat-lng-double [lat-lng-int]
  (/ lat-lng-int 1E5))

(defn path-for-encoded-polyline [encoded] 
  (loop [rest encoded lat 0 lng 0 results []]
    (if rest
      ;; if there is anthing in the encoded array
      ;; we should have two more results at least 
      (let [next-result       (decode-next-result rest)
            new-rest          (second next-result)
            next-lat-result   (+ lat (first next-result))
            next-result       (decode-next-result new-rest )
            new-rest          (second next-result)
            next-lng-result   (+ lng (first next-result))]
          ;; add our lat lng result to the results
          (recur new-rest next-lat-result next-lng-result (conj 
                                                            results 
                                                            {:latitude (lat-lng-double next-lat-result) 
                                                             :longitude (lat-lng-double next-lng-result)})))
      ;; we are done, return our results
      results)))

#_
(def example-polyline "pdymEssfy[rJhAlANNeALyBD{@XqFBW^_AZi@NMp@sAPu@Bk@Ce@Qs@uA_Di@{@u@oAi@kAYm@_@mASgBOgBAcC?sBKyBYkFCyD@q@XyDLu@pAsDzDyJ`AaCJa@He@Bq@FoCw@[aBcAMQm@a@eAw@k@]yBaAwBaAqAaA_@S}DmBwD{BuLeHQKu@y@}BiCIOoEaLaJmUSQKSa@sAe@wDK}@@[g@}Am@uAkBkEkCmGg@gAk@{AQk@e@wBo@yAcAyAs@aAgDqEYWSUYa@o@{AoDeKcAwCkBqGg@eCQyAGcBC{AHsCmCf@iD^iAx@}AdAcA|@w@rAmAlBaBt@a@t@Y|Bo@xAs@z@y@n@iAf@wANq@J}@FyAEgAIkBImB?s@DkAHu@eCVeDRiGF}FGiB[sBc@{Am@qBMu@MiBe@gNIwAUmAk@uCMcAU_FKcAe@aCw@uByA}BaAwAMUw@{AoAoCk@mAYi@eAuBmBuD_AuBMICCiB_EeCsFYu@Uu@c@yCo@}CAe@i@sBIc@K_ACaADyAr@iFvBgMbB_Kz@eFj@kD^}A`@uA`CiHPcABYB[Dq@@]Dg@eBZeAr@oAl@w@VUdBsAr@c@~CuBbC_Bn@]d@_@V]NUPa@Pq@Hs@@{@Cm@Ii@Ug@UcA_@kAo@uCqAeHg@kCSi@GK]uAUs@OWEGSYQQk@e@}@a@c@Kc@AiBQiAA[CiC?GAwBAUAu@USI[UQ[GMMk@EqBG_DKsCCcBJcDcCIuBK")
#_
(path-for-encoded-polyline (seq (char-array example-polyline)))


Permalink - Comments - Tags: Development,Clojure

Hundreds Chart Dev Diary

June 28, 2015, 12:32 am

I've had this idea for a game to teach kids times tables since I finished Wordflight two years go. I got bogged down in my initial implementation using OpenGL ES and a few weeks ago I decided to re-boot the project with flat UI, vivid colours and UIDynamics to make things interesting.

This time the project took two solid weekends (and a few late nights), but I am pretty happy with how it turned out


If you are reading this and you live in Australia, make sure you check out goCatch next time you need to catch a taxi.

Permalink - Comments - Tags: Development,iPhone,App