Python - Custom defaultdict (ENG)
I want to show you how to implement a general custom class built on defaultdict, which you can nest to create very useful structures.
Task
- Make a counter that stores the number of occurrences of a certain event within a certain time window
- After a specified amount of time, save the counter values to the database (not covered on this page)
- An event is described by a datetime (of the logging window), id, zone_id and type
At first, I was thinking about a series of nested defaultdicts:
self._counter = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int))))
...
self._counter[datetime][item_id][zone_id][type] += count
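As a quick check, the nested version can be run on its own. This is only a minimal sketch: the variable names and the sample keys (`window`, 327, 874, 875, 'click', 'impress') are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Four levels of keys: datetime -> item_id -> zone_id -> type -> int
counter = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int))))

window = datetime(2018, 7, 25, 11, 0)  # hypothetical logging window
counter[window][327][874]['click'] += 11
counter[window][327][874]['click'] += 11
counter[window][327][875]['impress'] += 1

clicks = counter[window][327][874]['click']
print(clicks)
```

Keep in mind that merely reading a missing key on a defaultdict creates it, which is one more reason to hide the structure behind a class.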
That solution works, but is there a “better, more object-oriented” way? Of course: a slightly more complicated custom class that takes the defaultdict factory as a parameter:
from collections import defaultdict


class BaseCounter(object):
    """Leaf of the structure: holds a single integer count."""

    def __init__(self):
        self._counter = 0

    def increment(self, count=1):
        self._counter += count

    def items(self):
        return self._counter


class DictCounter(object):
    """One level of keys; dict_class is a factory for the next level down."""

    def __init__(self, dict_class):
        # dict_class must be callable: defaultdict calls it for each new key
        self._counter = defaultdict(dict_class)

    def increment(self, key, value, *args, **kwargs):
        # Consume one key and pass the remaining arguments to the next level
        self._counter[key].increment(value, *args, **kwargs)

    def items(self):
        result = []
        for key, counter in self._counter.items():
            result.append((key, counter.items()))
        return result
Then if you want to do the same as above:
y = DictCounter(lambda: DictCounter(lambda: DictCounter(lambda: BaseCounter())))
y.increment(10,1,2,3)
y.increment(10,1,2,3)
y.increment(10,1,3,3)
y.increment(10,2,2,3)
to get:
10 1 2 6
10 1 3 3
10 2 2 3
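Note that items() returns nested lists rather than flat rows like the ones above. One way to get from one to the other is a small recursive helper. The function `flatten` below is hypothetical (it is not part of the classes above), and the literal `nested` is written out by hand to mirror what y.items() returns after the four increment calls.

```python
def flatten(items, prefix=()):
    """Yield one tuple per leaf: (key1, key2, ..., count)."""
    for key, value in items:
        if isinstance(value, list):
            # Inner DictCounter level: descend with the key appended
            yield from flatten(value, prefix + (key,))
        else:
            # BaseCounter level: value is the plain integer count
            yield prefix + (key, value)


# Shape of y.items() after the four increment calls above
nested = [(10, [(1, [(2, 6), (3, 3)]), (2, [(2, 3)])])]

rows = list(flatten(nested))
for row in rows:
    print(*row)
```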
Granularity
That works nicely, but what about time granularity (the logging time window)? Granularity means that I want to group events into fixed time windows (for example 5 minutes). So we can just make a subclass of DictCounter with that functionality:
from datetime import datetime, timedelta


class TimeGranularCounter(DictCounter):
    def __init__(self, dict_class, granularity=10):
        super(TimeGranularCounter, self).__init__(dict_class)
        self._granularity = granularity  # window length in minutes

    def increment(self, item_id, zone_id, type, dt, count=1):  # pylint: disable=arguments-differ
        # Group by the end of the time window that dt falls into
        key = self.granular_datetime(dt)
        self._counter[key].increment(item_id, zone_id, type, count)

    def items(self):
        result = []
        for dt, counter in self._counter.items():
            for k, v in counter.items():
                result.append((dt, k, v))
        return result

    def granular_datetime(self, dt):
        assert isinstance(dt, datetime)
        # Round down to the window start, then add one window to get its end
        minute = dt.minute - (dt.minute % self._granularity)
        return dt.replace(minute=minute, second=0, microsecond=0) + timedelta(minutes=self._granularity)
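To see what the rounding does, here is the same arithmetic as a standalone function. The name `window_end` and the sample timestamps are only for illustration; the 5-minute granularity is the example value mentioned above.

```python
from datetime import datetime, timedelta


def window_end(dt, granularity):
    """Return the end of the granularity-minute window that contains dt."""
    minute = dt.minute - (dt.minute % granularity)
    start = dt.replace(minute=minute, second=0, microsecond=0)
    return start + timedelta(minutes=granularity)


inside = window_end(datetime(2018, 7, 25, 10, 48, 30), 5)   # falls in 10:45-10:50
boundary = window_end(datetime(2018, 7, 25, 10, 45), 5)     # exactly on a window start
midnight = window_end(datetime(2018, 7, 25, 23, 58), 5)     # window ends on the next day
print(inside, boundary, midnight)
```

An event that lands exactly on a window start is counted in the window that begins there. The windows are evenly spaced only when the granularity divides 60, because the rounding restarts at minute 0 of every hour.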
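Usage is as simple as with DictCounter. Each level has to be passed as a factory (here a lambda), because defaultdict needs a callable to build a fresh counter for every new key:
counter = TimeGranularCounter(lambda: DictCounter(lambda: DictCounter(lambda: DictCounter(BaseCounter))), 12)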
dt = datetime(2018, 7, 25, 10, 48)
counter.increment(327, 874, 'click', dt, 11)
counter.increment(327, 874, 'click', dt, 11)
dt = datetime(2018, 7, 25, 10, 50)
counter.increment(327, 874, 'click', dt, 11)
dt = datetime(2018, 7, 25, 23, 48)
counter.increment(327, 874, 'click', dt, 11)
counter.increment(327, 874, 'impress', dt, 11)
to get
2018-07-25 11:00:00 327 [(874, [('click', 33)])]
2018-07-26 00:00:00 327 [(874, [('click', 11), ('impress', 11)])]
Pretty nice, huh?
Watch my mistake
Don’t make the same mistake I did. If you define DictCounter like this
class DictCounter(object):
    def __init__(self, dict_class):
        self._counter = defaultdict(lambda: dict_class)
...
y = DictCounter(DictCounter(DictCounter(BaseCounter())))
it won’t work. Can you guess why?
Explanation
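Because that way the lambda returns an object that was already constructed, so every new key in the counter gets that very same object instead of a fresh one. All keys then share state through a single reference, and incrementing under one key changes the value seen under every other key. For more detail see this.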
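The effect is easy to reproduce with plain lists instead of counters. This is a minimal sketch; the names `shared`, `wrong` and `right` are made up for the demonstration.

```python
from collections import defaultdict

shared = []                           # one object, created up front
wrong = defaultdict(lambda: shared)   # the factory hands out that same object every time
right = defaultdict(list)             # the factory builds a fresh list for each new key

wrong['a'].append(1)
wrong['b'].append(2)
right['a'].append(1)
right['b'].append(2)

print(wrong['a'] is wrong['b'])  # both keys point at the same list
print(dict(wrong))
print(dict(right))
```

In the `wrong` dictionary both keys end up holding [1, 2], while in `right` each key keeps its own list.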