Machine Learning Prague 2019

Last weekend I was lucky enough to attend Machine Learning Prague 2019 for free, thanks to CzechAi. The conference ran for two days and featured about 45 speakers, but the main part for me was meeting old friends and making new ones. In the end, I was surprised that there were so many great people in one place. I liked this conference mainly because of them.


I will try to summarize my notes from the lectures that I found interesting:

  • Data-driven System health determination in Monitoring Software for Operational Intelligence (by Vitezslav Vlcek from Broadcom)
    • What is an anomaly?
      • The 3-sigma rule, also known as the 68–95–99.7 rule or the “three-sigma rule of thumb”
      • Detection based on the Wave Function Collapse algorithm, which tries to match different patterns against time series data and looks for anomalies. The advantage is that it adapts to new patterns (a new pattern is an anomaly at first, but then it becomes normal)
      • Needs far less data than neural networks
3 Sigma rule
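To make the 3-sigma rule concrete, here is a minimal sketch of how it could be used to flag anomalies in a metric. This is my own illustration, not the code from the talk: the function names and the sample readings are hypothetical, and the rule assumes the data is roughly normally distributed.

```python
import math
import statistics


def coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def three_sigma_anomalies(values, k=3.0):
    """Return indices of points lying more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


# Hypothetical metric: 30 steady readings followed by a single spike
readings = [10.0, 10.2, 9.9, 10.1] * 7 + [10.0, 10.1] + [25.0]

for k in (1, 2, 3):
    print(k, round(coverage(k), 4))  # the 68-95-99.7 percentages
print(three_sigma_anomalies(readings))
```

Note that a single large outlier also inflates the standard deviation, so on short windows the threshold `k` may need to be lower or the statistics computed on a clean reference period.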
  • Parameter Server Suck, All Hail Horovod (by Ruksi Laine from Valohai)
    • I have seen distributed computing in use (by the way, it was also one of the hardest subjects of my master's studies)
    • For Asynchronous SGD
    • Ring/Butterfly Allreduce
    • Framework Horovod (Open MPI or another MPI implementation)
      • Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. The goal of Horovod is to make distributed Deep Learning fast and easy to use.
Data vs Performance by Andrew Ng
Butterfly Allreduce, a.k.a. Motylek (Czech for “little butterfly”)
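The ring variant of allreduce mentioned in the Horovod notes can be sketched as a single-process simulation. This is only my illustration of the general idea (a scatter-reduce phase followed by an allgather phase), not Horovod's actual implementation; the function name `ring_allreduce` and the sample gradients are hypothetical, and real workers would exchange the chunks over MPI instead of Python lists.

```python
def ring_allreduce(worker_vectors):
    """Simulate ring allreduce: every worker ends up with the element-wise sum."""
    n = len(worker_vectors)
    length = len(worker_vectors[0])
    # Split each vector into n chunks; chunk c covers bounds[c]..bounds[c + 1]
    bounds = [c * length // n for c in range(n + 1)]
    data = [list(v) for v in worker_vectors]  # work on copies

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1, scatter-reduce: each worker passes one chunk to its right
    # neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        messages = []
        for rank in range(n):
            c = (rank - step) % n
            messages.append((c, [data[rank][i] for i in chunk(c)]))
        for rank, (c, payload) in enumerate(messages):
            dest = (rank + 1) % n
            for i, value in zip(chunk(c), payload):
                data[dest][i] += value

    # Now worker r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, allgather: pass the finished chunks around the ring.
    for step in range(n - 1):
        messages = []
        for rank in range(n):
            c = (rank + 1 - step) % n
            messages.append((c, [data[rank][i] for i in chunk(c)]))
        for rank, (c, payload) in enumerate(messages):
            dest = (rank + 1) % n
            for i, value in zip(chunk(c), payload):
                data[dest][i] = value

    return data


# Hypothetical gradients held by three workers
gradients = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
for vector in ring_allreduce(gradients):
    print(vector)
```

Each worker sends and receives only one chunk per step, so the amount of data a single worker transfers stays roughly constant as the number of workers grows, which is the main appeal compared to a central parameter server.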
  • Solving the Text Labelling challenge with EnsembleLDA and Active Learning (by Alexander Loosley from Data Reply)
    • ELMo -> Word Mover Distance -> tSNE -> kNN
    • Looking forward to videos/material

  • The Labels are Out There (by Lotem Peled)
    • Using datasets which are not datasets (opensubtitles.org)
    • Using crowdsourcing which is not crowdsourcing (Fiverr - professionals to annotate your data)

  • Deep Neural Networks for Optical, Multispectral and Radar Satellite Imagery. Can GANs help us? (by Jan Zikes from Spaceknow)
    • Solving problems with clouds in the pictures
      • SAR (radar) data
    • Pix2Pix - solution to image-to-image translation problems
    • CycleGAN - image-to-image translation (i.e. pix2pix) without input-output pairs
    • Luigi pipeline - helps you build complex pipelines of batch jobs

  • Machine Learning for recommender system (by Marc Romeyn from Spotify)
    • Discovering new demands (playlist Peaceful Piano)
    • Way to create new playlists
      • By Editors
      • By Algorithm
      • By Editors + Algorithm
    • Word2Vec for songs
      • Sentence is a history of songs
      • Word is a song
How Word2Vec is used in Spotify
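The "sentence is a history of songs, word is a song" idea can be sketched as follows. Word2Vec in its skip-gram form learns from (target, context) pairs taken from a sliding window, so the sketch below only builds those pairs from listening histories. It is my own illustration, not Spotify's pipeline: the function name `skipgram_pairs`, the window size and the song IDs are hypothetical.

```python
def skipgram_pairs(histories, window=2):
    """Return (target_song, context_song) pairs, treating each history as a sentence."""
    pairs = []
    for history in histories:
        for i, target in enumerate(history):
            lo = max(0, i - window)
            hi = min(len(history), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a song is not its own context
                    pairs.append((target, history[j]))
    return pairs


# Hypothetical listening histories: each list is one user's session
histories = [
    ["song_a", "song_b", "song_c"],
    ["song_b", "song_d"],
]

pairs = skipgram_pairs(histories, window=1)
for target, context in pairs:
    print(target, "->", context)
```

Songs that often appear in the same sessions end up sharing many context songs, which is what pushes their learned vectors close together once the pairs are fed to a Word2Vec trainer.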

Extra notes

  • A black-box approach is not enough to win on Kaggle; you also need data and business understanding
  • Massive usage of XGBoost at Seznam for correcting typos in queries
    • XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way
  • Genuine “last letters” (suicide notes) usually contain specific vocabulary, longer sentences and neutral sentiment

Links

Videos:

My pictures

There is a typo in CezchAi
Unfortunately, I found a solution for the typo only after a few drinks
Lemmatization in Czech
Spotify in numbers
One of the most technical presentations of MLPrague
EnsembleLDA

I am looking forward to working through all these new ideas. Unfortunately, a day has only 24 hours.

Written on February 25, 2019