Dr. Jennifer Prendki
Instructor for the training day on the 10th
VP of Machine Learning, Figure Eight
Introduction to active learning and its practices
The greatest challenge when building a high-performance model isn't choosing the right algorithm or tuning hyperparameters: it is getting high-quality labeled data. Without good data, no algorithm, even the most sophisticated one, will deliver the results needed for real-life applications. And with most modern algorithms (such as Deep Learning models) requiring huge amounts of data to train, things aren't going to get better any time soon. Active Learning is one possible solution to this dilemma, yet it is, quite surprisingly, left out of most Data Science conferences and Computer Science curricula. This workshop aims to address the Machine Learning community's lack of awareness of Active Learning.
Workshop objectives:
The workshop is intended for practitioners interested in optimizing the amount of data they use to train a model (typically because of a limited data labeling budget), or simply interested in learning about Active Learning or Semi-Supervised Learning in general.
Outline:
Traditionally, data scientists have been trained to collect data before fully understanding the task at hand or its implications, because gathering enough historical data took a long time. With the advent of Big Data this has changed, even though old habits die hard. As an attendee, you will learn how to build your training set and your model concurrently using Active Learning, and how to reduce the number of labels required while still reaching the same accuracy. Most data scientists have no working knowledge of the field, and there is very little literature on how to build an Active Learning strategy in the real world. Most papers report research performed within academic circles; they never address a general methodology and instead offer case-by-case reviews of different scenarios, which don’t really help in real life. In this interactive training, you will design an Active Learning approach to improve the model of your choice, and will gain real-life experience in building querying strategies.
9:00-10:00: Introduction to Active Learning
10:00-11:00: Deep dive on Active Learning
11:00-12:00: Code labs
12:00-13:30: Lunch break
13:30-14:30: Case studies and discussion
14:30-16:00: Project
16:00-16:30: Wrap-up
What attendees will learn:
1. How Active Learning works, and why it is attractive
2. The challenges and limitations of Active Learning
3. How to identify whether a specific use case and dataset are appropriate for Active Learning
4. How to design a querying strategy appropriate for their own task (if relevant)
5. The typical limitations related to Active Learning
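As a rough illustration of what a querying strategy can look like, the sketch below implements pool-based least-confidence sampling: at each round, a model is fit on the labeled samples, and the unlabeled samples it is least sure about are sent for labeling. The nearest-centroid classifier, the `oracle` callback (standing in for a human annotator), and all function names are assumptions made for this example; they are not taken from the workshop material.

```python
import numpy as np


def least_confidence_query(probs, batch_size):
    """Pick the `batch_size` rows of `probs` whose top class probability is lowest."""
    confidence = probs.max(axis=1)  # model's confidence in its own prediction
    return np.argsort(confidence, kind="stable")[:batch_size]


def fit_centroids(X, y, n_classes):
    """Toy model (an assumption for this sketch): one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def active_learning(X_pool, oracle, seed_idx, n_classes, rounds, batch_size):
    """Grow the labeled set and the model together.

    `oracle(i)` is a hypothetical callback returning the label of pool sample i.
    `seed_idx` must contain at least one sample of every class.
    """
    labels = {int(i): oracle(int(i)) for i in seed_idx}

    def fit_on_labeled():
        idx = list(labels)
        y = np.array([labels[i] for i in idx])
        return fit_centroids(X_pool[idx], y, n_classes)

    for _ in range(rounds):
        centroids = fit_on_labeled()
        unlabeled = np.array(
            [i for i in range(len(X_pool)) if i not in labels], dtype=int
        )
        if unlabeled.size == 0:
            break  # nothing left to query
        probs = predict_proba(centroids, X_pool[unlabeled])
        # Ask the oracle only about the samples the model is least sure of.
        for i in unlabeled[least_confidence_query(probs, batch_size)]:
            labels[int(i)] = oracle(int(i))

    # Refit once more so the returned model includes the last queried batch.
    return fit_on_labeled(), list(labels)
```

Least confidence is only one of several possible uncertainty measures; margin- or entropy-based scores could be swapped into `least_confidence_query` without changing the loop.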
Difficulty level:
Intermediate. The techniques associated with Active Learning are fairly orthogonal to Machine Learning, but a grasp of the fundamentals of Machine Learning and Deep Learning is still needed to understand how Active Learning operates.
Prerequisites:
A solid understanding of Machine Learning, in particular supervised learning.
A working knowledge of Python.
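Title: Active Learning: Practical Applications and Hands-On Practice
Overview: The greatest challenge in building a high-performance model is not choosing the right algorithm or tuning hyperparameters, but obtaining high-quality labeled data. Without high-quality data, even the most sophisticated algorithm cannot deliver the results that real-world applications require. Because most modern algorithms (such as Deep Learning models) need large amounts of training data, this situation will not improve in the short term. Active Learning is one feasible way out of this dilemma. This training covers the practical application of Active Learning through a technical introduction, detailed analysis, case studies, and hands-on exercises.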
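Outline:
In order to fully understand the task at hand or its implications, data scientists have traditionally collected data first. They were specifically trained to do so, because gathering enough historical data takes a long time. With the arrival of the Big Data era the situation has changed, even though old habits die hard. By attending this training, you will learn how to build your training set and your model concurrently using Active Learning, and how to reduce the number of labels required while maintaining accuracy. Most data scientists have no working knowledge of this area, and there is very little literature on how to create an Active Learning strategy in practice. Most papers explain research carried out in academia but never address a general methodology; they only analyze different scenarios case by case, which is of little help in real practice. In this interactive training, you will be able to design an Active Learning approach that specifically improves the model of your choice, and you will gain hands-on experience in creating querying strategies.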
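9:00-10:00: Introduction to Active Learning
10:00-11:00: In-depth analysis of the key techniques and design
11:00-12:00: Hands-on exercises
12:00-13:30: Lunch
13:30-14:30: Case studies and discussion
14:30-16:00: Project work
16:00-16:30: Summary and discussion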
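What you will gain:
• How Active Learning works, and why it is especially attractive
• The challenges and limitations of Active Learning
• How to identify whether a specific use case and dataset are suitable for Active Learning
• How to design a querying strategy suited to your own task (if relevant)
• The common limitations related to Active Learning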
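Who should attend:
This training is aimed mainly at practitioners interested in optimizing the amount of data used to train a model (typically because of a limited data labeling budget), and at practitioners who simply want to learn about Active Learning or Semi-Supervised Learning.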
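Difficulty:
Beginner to intermediate. The skills involved in Active Learning overlap with those of Machine Learning. To fully understand how Active Learning operates, you first need to master the fundamentals of Machine Learning and Deep Learning.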
Prerequisite knowledge:
• A foundation in Machine Learning, in particular supervised learning
• A working knowledge of Python
Notes:
1. Bring a laptop for the exercises
2. Presented in English, with Chinese translation support
3. Small class, limited to 50 attendees