Machine Learning Prague 2019
Last weekend I was lucky enough to attend Machine Learning Prague 2019 for free, thanks to CzechAi. The conference ran for two days and featured about 45 speakers, but the main part for me was meeting old friends and making new ones. In the end, I was surprised that so many great people were in one place, and they are the main reason I liked this conference.
I will try to summarize my notes from the lectures I found interesting:
- Data-driven System health determination in Monitoring Software for Operational Intelligence (by Vitezslav Vlcek from Broadcom)
- What is the anomaly?
- The empirical rule, also known as the 68–95–99.7 rule or the “three-sigma rule of thumb”: for normally distributed data, about 68%, 95% and 99.7% of values lie within one, two and three standard deviations of the mean
- Detection based on the Wave Function Collapse algorithm, where you try to match different patterns in time series data and look for anomalies - the advantage is the adaptation to new patterns (a new pattern is an anomaly at the beginning, but then it becomes normal)
- Needs far less data than neural networks
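The three-sigma rule from the notes above can be turned into a very simple detector. This is only a minimal sketch and not the method from the talk (which was based on Wave Function Collapse); the function name `three_sigma_anomalies`, the threshold parameter `k` and the sample readings are my own assumptions for illustration.

```python
import statistics


def three_sigma_anomalies(values, k=3.0):
    """Return indices of values lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


# Hypothetical metric readings: 20 ordinary samples followed by one spike.
readings = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [25.0]
print(three_sigma_anomalies(readings))  # [20]
```

Note that the outlier itself inflates the mean and the standard deviation, so on short series a fixed threshold like this can miss anomalies. It also never adapts to new patterns, which is exactly the weakness the pattern-matching approach from the talk tries to address.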
- Parameter Server Suck, All Hail Horovod (by Ruksi Laine from Valohai)
- I saw distributed computing used in practice (by the way, also one of the hardest subjects of my master's studies)
- For Asynchronous SGD
- Ring/Butterfly Allreduce
- Framework Horovod (Open MPI or another MPI implementation)
- Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. The goal of Horovod is to make distributed Deep Learning fast and easy to use.
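To remind myself how the ring allreduce mentioned above works, here is a minimal single-process simulation. It is a sketch under my own assumptions and not Horovod's implementation: the function name `ring_allreduce` is hypothetical, each "worker" is just a Python list, and each chunk is a single number instead of a slice of a gradient tensor.

```python
def ring_allreduce(worker_chunks):
    """Simulate ring allreduce (sum) for n workers holding n chunks each.

    worker_chunks[i][c] is chunk c held by worker i.
    Returns the per-worker data after the reduction.
    """
    n = len(worker_chunks)
    data = [list(chunks) for chunks in worker_chunks]

    # Phase 1 (scatter-reduce): in every step each worker sends one chunk to
    # its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot the outgoing values so that all sends happen "at once".
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] += value
    # Now worker i holds the fully summed chunk (i + 1) % n.

    # Phase 2 (allgather): the finished chunks travel around the ring and
    # overwrite the partial sums on the other workers.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] = value
    return data


# Three simulated workers, each with a three-chunk "gradient".
print(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
# [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

Each worker sends and receives only 2 * (n - 1) chunks in total, independent of how many workers there are, so there is no central parameter server that everything has to pass through.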
- Solving the Text Labelling challenge with EnsembleLDA and Active Learning (by Alexander Loosley from Data Reply)
- ELMo -> Word Mover Distance -> tSNE -> kNN
- Looking forward to videos/material
- The Labels are Out There (by Lotem Peled)
- Using datasets which are not datasets (opensubtitles.org)
- Using crowdsourcing which is not crowdsourcing (Fiverr - professionals to annotate your data)
- Deep Neural Networks for Optical, Multispectral and Radar Satellite Imagery. Can GANs help us? (by Jan Zikes from Spaceknow)
- Solving problems with clouds in the pictures
- SAR (radar) data
- Pix2Pix - solution to image-to-image translation problems
- CycleGAN - image-to-image translation (i.e. pix2pix) without input-output pairs
- Luigi pipeline - helps you build complex pipelines of batch jobs
- Machine Learning for recommender system (by Marc Romeyn from Spotify)
- Discovering new demands (playlist Peaceful Piano)
- Way to create new playlists
- By Editors
- By Algorithm
- By Editors + Algorithm
- Word2Vec for songs
- Sentence is a history of songs
- Word is a song
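The "sentence is a history of songs, word is a song" idea above can be illustrated by generating the skip-gram training pairs that a Word2Vec-style model would learn from. This is a minimal sketch, not Spotify's pipeline: the function name `skipgram_pairs`, the `window` parameter and the song names are hypothetical, and the actual embedding training is left out.

```python
def skipgram_pairs(history, window=2):
    """Return (target_song, context_song) pairs from one listening history.

    The history plays the role of a sentence and each song the role of a word.
    Songs within `window` positions of each other are treated as context.
    """
    pairs = []
    for i, target in enumerate(history):
        lo = max(0, i - window)
        hi = min(len(history), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, history[j]))
    return pairs


# One hypothetical listening session.
session = ["song_a", "song_b", "song_c"]
print(skipgram_pairs(session, window=1))
# [('song_a', 'song_b'), ('song_b', 'song_a'), ('song_b', 'song_c'), ('song_c', 'song_b')]
```

A model trained on such pairs places songs that are often played close together near each other in the embedding space, which is what makes the vectors useful for recommendations.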
Extra notes
- A black-box approach is not enough to win on Kaggle; you also need data/business understanding
- Massive usage of XGBoost at Seznam for correcting typos in queries
- XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way
- Actual “last letters” (suicide notes) usually contain specific vocabulary, longer sentences and neutral sentiment
Links
- Datalore - an intelligent web application (your data are in the cloud) for data analysis
- Tesseract OCR
- Cloud AutoML from Google - finding topology and hyperparameters of the models
- 2017 Yellow Taxi Trip Data
- MLflow (currently in beta) - an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment
- AI art Online - a collection of art, music and design using machine learning
- Magnitude - a fast, simple vector embedding utility library
Videos:
My pictures
I am looking forward to going through all these new ideas. Unfortunately, a day has only 24 hours …
Written on February 25, 2019