30 Things I learned Organizing South East Asia’s Largest Datathon
Organizing Data Unchained Malaysia was one of the most rewarding things I have ever done in my career. In November 2018, we invited 100 brilliant data enthusiasts, out of 300 candidates, to a resort in Kuala Lumpur to solve a data problem and to create a suitable business model for it.
We gave participants anonymized sample internet and phone call data, connected car records and some points of interest. We asked them to predict the destinations towards which cars are moving, to create a safe driving index and to build a business plan that uses these two models commercially — all this in just 24 hours. It was crucial for participants to possess a good blend of technical and business skills to make it through the competition.
Participants and judges in the competition
The datathon, or hackathon for data enthusiasts, was a collaboration between:
● A Malaysian digital and telecom conglomerate operating in 11 countries in South Asia and South East Asia, where I oversee Analytics / A.I.
● The Asia School of Business, a Joint Venture between the Malaysian Central Bank and Massachusetts Institute of Technology (MIT).
To set up the context, we had three main objectives when we launched this datathon:
● Energizing our organization and accelerating our data-driven cultural journey.
● Strengthening our employer brand and hiring the best available talent.
● Giving back to society and developing a talent pool of data professionals in Malaysia.
I learned as many things organizing this datathon as the participants themselves learned while solving the problem. These are my learnings from the perspective of a Chief Data Analytics Officer organizing a corporate datathon.
1) Participants must be both internal and external to your organization. Inviting external participants (up to 70%) will not only contribute to your employer brand and hiring pipeline, it will also energize your internal teams (who should be up to 30%) even more, since external participants are not subject to the biases and groupthink of your organization.
2) Participants must be deeply cross-functional. Around 60% of them should be technical (data scientists, data engineers, full-stack developers, data architects and IT professionals) and 40% should have a background in different areas of business (most importantly marketing, operations, design and customer experience).
Judges in Data Unchained 2018
WORKING WITH PARTNERS:
3) You need a business school as a partner, not a technical school. Technical schools or universities might help you attract technical participants, but analytics is not about math, it is about solving real business problems through math. Find the most reputed business school in your city or country and partner up with them. The business school should provide the venue, co-lead marketing activities and provide a significant pool of their own students to participate.
4) You need a TV station or a media company as a partner, if branding is one of your objectives. Getting on traditional media will be difficult unless you get a mainstream local politician or a celebrity as a guest speaker in your datathon. You will be able to publish your press release in some specialized websites, but nobody reads press releases anyway.
5) You need industry or start-up partners, which are relevant to your industry and to the analytics / A.I. community. Industry partners can market the event through their own channels, which were extremely useful in our case. Partners can also contribute with excellent judges or keynote speakers. Additionally, cloud vendors might provide infrastructure services or interesting external datasets for the competition.
6) You might need sponsors. Organizing a datathon is not expensive though. Venue, food, sizable prizes to attract talent and a few travel expenses are all you need. If your budget is tight, finding sponsors among your industry partners is a very realistic option.
24 teams working on their data and business models
MARKETING YOUR EVENT:
7) Paid Facebook advertising did not work; LinkedIn worked and it was free. Business-oriented channels are more suitable for the kind of profiles you need. Anyway, if they are not on LinkedIn, you are not going to hire them, are you? (Yes… I know your CEO is not on LinkedIn, but that is another story.)
8) You need to bring 3 to 5 heavyweight judges (including guest speakers) who have an interesting story to tell, ideally from abroad. Reputed analytics professors and analytically oriented C-level executives from your organization or from leading industry players will make excellent judges.
CHOOSING THE PROBLEM:
9) You need to work on a real business problem which must be open-ended and challenging. It does not matter if participants cannot finish it. What matters is to make people think big and out of the box. Narrowly defined questions such as those in Kaggle.com competitions, where participants are asked to maximize a few technical parameters, are not that useful in business, because business is open-ended and ever-changing. It will add a lot of value to the competition if you can bring a person from your organization or a partner to explain to participants which of her real business problems she thinks data analytics might help solve. In our case, we were lucky enough to partner with GoCar, a Malaysian online car rental start-up.
10) The problem should not be too specific to your industry (like churn prediction in telecom). Choosing a topic from a new adjacent area where your business might play in the future is more energizing for internal teams and creates better synergies between internal and external participants.
11) Everything can and should be challenged. There are no rules about problem solving to be respected. Participants need to assume that everything can start from scratch.
12) Teams need to have a working prototype which generates actionable insights out of data. This is not an idea pitching competition, but a demo followed by a business case presentation with a clear development path including all technical and business steps required.
GETTING THE DATA:
13) You do not need a massive dataset, unless you are focusing on deep learning. Keep your data relatively manageable (up to a few GB) since you are looking for creativity rather than for algorithm fine-tuning. A smaller dataset can be processed faster (even on a laptop) and transferred faster.
14) Do not limit yourself to internal data. Internal data does not have the potential to change your business. Only external data has that potential. You can get data from industry partners or public sources. You can also scrape the Web or buy it.
15) You need data from dissimilar and unrelated sources. This is obvious for a data scientist, but the power of data increases dramatically when you connect information that was not connected before. How to connect information is often difficult. You can ask participants to solve it. Also please encourage participants to use any external data they can find on their own.
16) It is far easier to manage the whole data life-cycle for the datathon in the cloud than to provide copies to participants. However, participants do prefer to have their own copy so that they can analyze it with their own tools.
17) It is mandatory to correctly anonymize all information used in the datathon to ensure data confidentiality. Introducing a small amount of white noise (random error) into the data might also help.
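To make point 17 more concrete, here is a minimal sketch of one common approach, which is not necessarily what we used at Data Unchained: replace direct identifiers with a keyed hash and perturb numeric fields with zero-mean Gaussian noise. The field names (`msisdn`, `speed_kmh`), the secret key and the noise level are hypothetical. Keep in mind that keyed hashing is pseudonymization only; quasi-identifiers such as locations or timestamps usually need further treatment before a dataset can be called anonymous.

```python
import hashlib
import hmac
import random


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def add_white_noise(value: float, sigma: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return value + rng.gauss(0.0, sigma)


def anonymize_record(record: dict, secret_key: bytes, sigma: float, rng: random.Random) -> dict:
    """Return a copy of the record that is safer to share (hypothetical schema)."""
    return {
        "subscriber": pseudonymize(record["msisdn"], secret_key),
        "speed_kmh": add_white_noise(record["speed_kmh"], sigma, rng),
    }


if __name__ == "__main__":
    rng = random.Random(42)                       # seeded only to make the demo repeatable
    key = b"hypothetical-key-discard-after-event"  # assumption: kept secret by the organizers
    raw = {"msisdn": "60123456789", "speed_kmh": 87.0}
    print(anonymize_record(raw, key, sigma=1.5, rng=rng))
```

Using the same key for every file keeps pseudonyms consistent across datasets, so participants can still join records; destroying the key after the event makes the mapping much harder to reverse.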
SETTING THE FORMAT:
18) The competition should be by teams, but it should be possible to identify individual contributions. In the real world we work in teams, but if you are planning to extend an employment offer to some of the participants, you must identify them first. As a result, you need individual and team prizes. The best way to identify individual contributions is to mingle with participants and see how they work and think. Tell your team to sit with them, chat with them without directly helping them, and consider whether they would be excited to work with them on your team. A datathon is a 24-hour undercover interview.
19) Participants should be allowed to choose their own teams. You could argue that in the real world you cannot choose your colleagues, but the reality is that not being able to choose their team is a big drawback for participants. In our case we had 24 teams of 4 or 5 participants each.
20) The competition must be face-to-face. There can be online sections, or some participants might join only online. However, face-to-face interactions are key if your objectives include branding, hiring or corporate social responsibility.
21) The 24-hour format works very well because it allows you to introduce the problem, host a few keynote speakers, have teams solve the problem in 24 hours, select the winners and give the awards, all in one weekend. However, it requires good facilities that allow participants to stay overnight if they wish to do so.
Judging technical models at 3:00 am
22) Your team needs to have solved the problem first. That solution is the benchmark against which you measure participants on model performance and creativity.
23) Judging needs to run in two stages: a technical pre-selection phase followed by a business-oriented phase. You and your analytics team, who defined the problem and prepared the data, should also evaluate the submissions. After the technical pre-evaluations, only teams who have a good working model will be allowed to pitch their business model to the judges in the second phase.
24) Judging will be extremely exhausting: in our case, technical deliberations to judge 24 teams took 6 hours after all participants went back home. If the datathon is exhausting for participants, imagine how it is for your team, who is observing and mingling with all participants, helping with logistics and performing the technical pre-selection.
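The two-stage judging described in points 22 to 24 can be sketched as follows. This is an illustration under stated assumptions, not the exact scoring we applied: the `Submission` fields, the "higher is better" model metric, the 80% threshold relative to the in-house benchmark and the use of a simple mean of judge scores are all hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Submission:
    team: str
    model_score: float            # technical metric, higher is better (assumption)
    business_scores: List[float]  # one score per judge, collected in stage two


def technical_preselection(submissions, benchmark_score, min_ratio=0.8):
    """Stage one: keep teams whose model reaches a share of the in-house benchmark."""
    return [s for s in submissions if s.model_score >= min_ratio * benchmark_score]


def rank_by_business_pitch(shortlist):
    """Stage two: order shortlisted teams by mean judge score, best first."""
    return sorted(shortlist, key=lambda s: mean(s.business_scores), reverse=True)


if __name__ == "__main__":
    submissions = [
        Submission("A", 0.71, [7, 8, 6]),
        Submission("B", 0.40, [9, 9, 9]),  # strong pitch, but the model misses the bar
        Submission("C", 0.66, [8, 9, 9]),
    ]
    shortlist = technical_preselection(submissions, benchmark_score=0.75)
    for s in rank_by_business_pitch(shortlist):
        print(s.team, round(mean(s.business_scores), 2))
```

In this toy run, team B never reaches the pitch stage despite the best business scores, which mirrors the rule that only teams with a good working model present to the judges.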
SUSTAINING CULTURAL CHANGE:
25) Develop the content within your team to energize your team. You can use an event organizer and outsource almost anything to them. But there are three things you should not outsource: format definition, problem definition and data preparation. Doing these in-house is also an excellent opportunity for team-building. It is indeed a lot of fun to organize a datathon.
26) Continue the datathon vibe in your company to keep the mindset change going. A datathon is a spark that can temporarily energize your company around data analytics. But if you do not put in place the right processes to sustain the momentum, that surge of enthusiasm will dissipate. Analytics training programs, internal mini-datathons in different parts of your company or a “data-first” imperative can help to sustain the energy.
27) It is the best thing we have done for our employer brand equity. A datathon provides a very targeted spotlight for your organization among data enthusiasts, who are the people you want to attract as employees or partners. It also gives participants the opportunity to interact with you and your analytics team during the whole event and get to know more about your company. As I said above, at the end of the day, a datathon is a 24-hour interview.
28) External participants will always do better than internal participants. While external participants join because they want to, internal participants are generally told to join by their boss, which is less motivating.
29) The winners will have the right combination of technical and business skills. Technical-only teams will fail.
30) You will never be able to hire the one or two top winners. Top winners can go anywhere. But there is so much other talent in the competition to build lasting relations with.
Disclaimer: Opinions in the article do not represent the ones endorsed by the author’s employer.
ABOUT THE AUTHOR
Pedro URIA RECIO is a thought leader in artificial intelligence, data analytics and digital marketing. His career has encompassed building, leading and mentoring diverse high-performing teams, developing marketing and analytics strategy, commercial leadership with P&L ownership, leadership of transformational programs and management consulting.