[29th Anniversary Special Feature] 2014 Big Data Overview (Part 3)
November 1, 2014, Reporter Go Suyeon
The Unicorn of the Data Realm: Nurturing Data Scientists, the Triumvirate of Data Handling
If "Big Data" is the buzzword penetrating 21st-century IT, then those who possess the technology to handle Big Data are
referred to as "Big Data Analysts," the alchemists of the 21st century. According to Wikipedia, a Big Data Analyst, also
known as a "Digital Scientist" or "Data Scientist," is defined as an expert in Big Data analysis.
In 2012, the IT industry was grappling with how to define the relatively unfamiliar term "Big Data." However, by 2014,
nobody was discussing the definition of Big Data anymore. Instead, the focus shifted to the discussion of those who
directly handle Big Data: data scientists.
Data scientists have garnered significant attention, being labeled as "the sexiest job of the 21st century," "wizards,"
"unicorns," and more. Who are data scientists, and what are the methods for nurturing them?
Who are Data Scientists?
Data scientists extract valuable insights from vast amounts of data by deciphering patterns or
trends. They can discern individuals' inclinations or predict trends by analyzing posts on social networking services like
Twitter and Facebook. Furthermore, they analyze large volumes of Big Data to predict behavioral patterns or economic
conditions.
While traditional computer experts focused on collecting and storing data, data scientists go a step further by analyzing
and processing the content of data.
To become a data scientist, one needs expertise in statistics, an understanding of business consulting, and proficiency in
techniques for data analysis and design. In other words, while data processing skills are crucial for collecting and
processing data scattered across various sources, data scientists also need to be able to formulate hypotheses and
models for analysis and understand the results of their analysis.
Professor Heo Myeong-hoe of Korea University, speaking as the keynote speaker at the first conference related to data
scientists in Korea, addressed the qualities and conditions required to become a data scientist under the theme of
"Becoming a Data Scientist."
Professor Heo stated, "Data science extracts information from observed and collected data and creates knowledge from
extracted information. To achieve this process, both scientific and engineering approaches are necessary." He argued, "It
should be called not just data science but data science and engineering."
Professor Heo named an active attitude, problem-solving skills, creativity, and communication skills as the qualities of
a data scientist. He also listed the skills data scientists require: statistics (regression models, multivariate EDA,
machine learning), computer science (databases, web technologies), mathematics (calculus, linear algebra), domain
knowledge (history, economics, social sciences, engineering), and a pragmatic philosophy. Proficiency in languages such
as R, Python, and Java is also necessary.
Do Data Scientists Really Exist?
Examining the conditions required to become a data scientist reveals why terms like
"wizard" and "unicorn" are used. Is there really someone who meets the conditions to be a data scientist?
Professor Jo Seong-jun of Seoul National University expressed skepticism about the existence of data scientists at the
2014 BI conference. He said, "When you talk about data scientists, you would typically refer to someone who is proficient
in handling large volumes of data, adept at handling Hadoop or NoSQL, and skilled in parallel processing programs. Next,
they should be able to perform traditional statistics and machine learning-based analysis and modeling, visualize the
results to enable decision-making by practitioners or CEOs, and, most importantly, have an understanding of the
essence." He questioned, "Can such a person really exist? Even in the United States, such a person does not exist."
Professor Jo continued, "While it is imaginable that one data scientist does everything, such a person does not exist in
reality." He suggested, "Instead, a team composed of specialists may serve as an alternative to a data scientist. However,
even these teams require the basic ability to communicate with others."
Jong-seok Lee, Director of the Shinhan Card Big Data Center, echoed Professor Jo's sentiments. He said, "While data
scientists exist, they are already highly valued by their companies, which makes hiring them difficult.
The optimal alternative is a data scientist organization rather than individual data scientists. This means gathering
individual competencies to form an organization and secure data scientist capabilities."
The Shinhan Card Big Data Center is constructing a data scientist organization by gathering competencies in three areas:
business expertise, analysis expertise, and data architecture expertise.
Director Lee said, "The competencies needed for data scientists at Shinhan Card include consulting skills to clearly convey
the value obtained through big data and specialized knowledge of data preprocessing and analysis techniques. However,
since Shinhan Card pursues a data scientist organization, anyone who possesses one of these two qualifications is
considered a data scientist."
Can Data Scientists be Nurtured?
According to a May 2011 McKinsey report, retail companies that analyze large amounts of data can expect to raise
operating margins by up to 60%, and the healthcare industry could save up to 8%, or $200 billion, annually in the era
of Big Data.
As global companies with powerful analytical capabilities achieve results, the demand for data scientists is increasing. In
the industry, data scientists are already considered key personnel who will determine the outcome of the Big Data battle,
leading to fierce competition for talent acquisition.
According to a survey by SAS, by 2017, the demand for Big Data specialists is expected to more than double from the
current level, to 69,000. The report also stated that three out of five major UK companies are struggling to recruit people
with specialized Big Data skills. Additionally, the Korea Database Agency predicted a surge in demand for Big Data
experts, with the United States requiring 490,000 experts, the United Kingdom needing 58,000, and Korea needing 10,000.
However, because data scientists require a wider range of skills and technologies than traditional experts, they are
not easy to find. Currently, in Korea, university professors teach theory and modeling and analysis techniques from a
statistical perspective to foster data scientists. Practical training, and the use of various Big Data technologies to
uncover problems or needed information in data, remain difficult.
Data scientists need to be nurtured gradually, obtaining the required competencies as their roles expand. The Shinhan
Card Center plans to secure consulting capabilities to explain the analyzed results from the customer's perspective and
develop storytelling skills, in addition to analytical expertise.
The director emphasized, "The competencies required for data scientists at Shinhan Card include consulting skills
Curriculum of the Department of Business Data Fusion at Chungbuk University
SAS Korea offers SAS Data Scientist training courses targeting analysis experts or technical professionals considering
the adoption of Big Data and Big Data analytics systems, as well as field personnel performing Big Data analysis tasks
and those responsible for building Big Data systems.
The curriculum is designed to be practical, focusing on understanding analytical techniques through SAS solution
practice, deriving data insights, and formulating business strategies. It involves not only SAS's expert consultants
or education team instructors but also external analysis experts or professors who contribute diverse insights.
It provides various information ranging from simple concept clarification about Big Data and analytics to hands-on
practice using solutions like SAS Visual Analytics.
This five-day training covers why Big Data analysis is necessary, what it requires, and how to perform it, along with
enterprise case studies and hands-on practice in building Big Data and applying analysis.
SAS Data Scientist Training Course Curriculum
The Health Insurance Review & Assessment Service (HIRA) collaborates with SAS Korea to develop programs such as the
"Healthcare Data Scientist Certification Program," which aims to build and certify the ability to use healthcare big
data, and the "SAS Education Program Tailored to HIRA Business," which aims to train internal big data experts for
HIRA. Additionally, they jointly organize the "Healthcare Data Mining Contest" annually to discover new talent.
On September 30, the 12th SAS Mining Championship, jointly hosted by SAS Korea and HIRA, was successfully held.
The championship saw participation from 250 teams consisting of 750 university students nationwide aspiring to become
future data scientists, engaging in fierce competition. The participants were tasked with developing a "clinic location
prediction model" by analyzing the risk and predicting future sales to support the establishment of medical institutions
in specific regions.
Each participant selected the external public data they judged valuable, such as data released by Statistics Korea, the
National Health Insurance Service, and the Ministry of Land, Infrastructure and Transport, and incorporated it into
their models. As a result, each team submitted different modeling results. Participants were evaluated not only on their
ability to collect external public information that could affect clinic demand and supply but also on the overall
judgment they showed in their data analysis.
Mr. Shin Yong-won, Executive Director of the Professional Services Division at SAS Korea, stated, "As the cases of
implementing big data solutions increase rapidly, the demand for data scientists is also growing. Next year, the national
certification system for data scientists will be implemented, encouraging their cultivation at the national level."
He further added, "SAS Korea has been hosting the SAS Mining Championship every year since long before data scientists
garnered attention, and we will continue to build a systematic framework and curriculum to supply competent data
scientists across various fields in Korea."
Hwang Ui-dong, Director of the Treatment Information Analysis Division at HIRA, stated, "Following last year, we held the
SAS Mining Championship again this year, obtaining many innovative ideas that can utilize the vast accumulated medical
and health care data. HIRA will actively pursue measures based on the excellent results of the winners to mitigate
regional disparities in medical services."
To train practical experts capable of utilizing big data in their work, the Korea Database Agency opened the Big Data
Academy in June 2012. It runs two courses, Big Data Technology Specialist and Big Data Analysis Specialist, for
professionals with at least three years of experience. The curriculum for each course is developed and operated on the
basis of job standards and prerequisites.
The Big Data Academy, which produced over 200 trainees in its first year of operation, selects trainees with at least three
years of experience in data technology and analysis to undergo a three-month education program. The average age of
graduates is 37.9 years old, with an average of 10.1 years of experience, highlighting the participation of industry
professionals.
In particular, the Big Data Academy curriculum includes practical projects such as predicting delisted companies using
corporate financial and disclosure information in the "Delisted Company Prediction" project and analyzing audience
preferences for newly released movies to classify marketing targets and predict future box office revenues in the
"Latest Movie Analysis" project.
This year, the Big Data Academy has incorporated the results of a big data job analysis into the curriculum, reflecting
five core competencies (planning, processing, analysis, visualization, and operational management), 21 competency
units, and 67 competency unit elements. Through online and offline education, projects, field training, workshops, and
other activities, it aims to train a total of 200 big data technology specialists and big data analysis experts.
Furthermore, to make the training more applicable in practice, the academy plans to provide practice servers and
establish a professional mentorship team to supply the information needed to run projects.
Additionally, the Korea Database Agency develops and operates the Advanced Data Analytics certification, also known as
the Big Data Analysis certification. The certification has two levels: Advanced Data Analytics Professional (ADP) and
Advanced Data Analytics Semi-Professional (ADsP).
ADP holders perform tasks such as data analysis planning, data analysis, and data visualization to support
scientific decision-making processes for process innovation and marketing strategy determination. Eligibility criteria
include holding a doctoral degree, having a master's degree with over one year of practical experience in the field,
holding a bachelor's degree with over three years of practical experience, graduating from a vocational college with over
six years of practical experience, or having obtained the ADsP qualification. Candidates who meet one or more of these
criteria are eligible to take the exam, and a score of 75 or higher in both written and practical exams is required to pass.
The ADsP exam has no eligibility restrictions and consists of a written exam only. A score of 60 or higher is required
to pass.
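The eligibility and pass rules described above can be sketched in a few lines of Python. This is an illustration only, not an official tool of the Korea Database Agency: every name in it (Candidate, may_sit_adp, passes_adp, passes_adsp) is hypothetical, and it assumes that "over N years" of experience means at least N years.

```python
from dataclasses import dataclass

# Minimum years of practical experience per education level, taken from the
# article. Assumption: "over N years" is read as "at least N years".
MIN_EXPERIENCE_YEARS = {
    "doctorate": 0,
    "master": 1,
    "bachelor": 3,
    "vocational_college": 6,
}


@dataclass
class Candidate:
    """Hypothetical record of one applicant."""
    education: str              # a key of MIN_EXPERIENCE_YEARS, or anything else
    experience_years: float
    holds_adsp: bool = False


def may_sit_adp(candidate: Candidate) -> bool:
    """A candidate is eligible if any one of the listed criteria is met."""
    if candidate.holds_adsp:
        return True
    required = MIN_EXPERIENCE_YEARS.get(candidate.education)
    return required is not None and candidate.experience_years >= required


def passes_adp(written: float, practical: float) -> bool:
    """ADP requires 75 or higher on both the written and practical exams."""
    return written >= 75 and practical >= 75


def passes_adsp(written: float) -> bool:
    """ADsP has a written exam only, with a pass mark of 60."""
    return written >= 60
```

For example, may_sit_adp(Candidate("bachelor", 3)) is true under the stated assumption, while a candidate with an unlisted education level and no ADsP qualification is not eligible regardless of experience.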
To foster data scientists, several challenges need to be addressed. Firstly, data scientists are not properly recognized in
Korea despite being advanced professionals. Secondly, it is challenging to cultivate interdisciplinary talents due to the
lack of educational systems. Lastly, the corporate culture that undervalues data scientists as mere departmental or
dedicated organizational resources poses a significant obstacle. Therefore, efforts should focus on integrating analytics
into corporate culture through initiatives such as the establishment of Business Analytics Competency Centers.
It is widely acknowledged that big data is emerging as a new paradigm and a driver of economic and social development
in the ICT sector. Without data scientists, deriving value from big data analysis is virtually impossible. However, the
prevailing misconception among companies that simply hiring data scientists will facilitate big data analysis poses a
significant challenge. Moreover, the failure to recognize the value of data scientists exacerbates the difficulty of
cultivating them.
Nevertheless, universities are adapting their curricula to produce practical big data experts, and the government is
providing strategic support to promote the utilization and development of big data at the national level. This positive
trend suggests progress in the right direction.
Go Soo-yeon | 2014.11.01 09:00
http://www.itdaily.kr/news/articleView.html?idxno=56973