Saturday 29 July 2017

Big Data Specialization E-Learning


The 3 areas that support the Internet of Things (IoT) are Big Data Analytics, Artificial Intelligence (AI) and Cyber Security. Being enthusiastic about IoT, I took the first step of acquiring some foundation knowledge in these 3 areas. Since I had prior knowledge and experience in data extraction and manipulation using relational databases (e.g. Oracle, MS SQL, Postgres), Big Data Analytics naturally became the first item on my learning roadmap.

As IoT involves linking multiple devices together via a network so that they can communicate with one another, a huge volume of data is generated in the process. This data must be analyzed so that further decisions can be made with little or no human intervention, which is the intended purpose. Another objective of Big Data Analytics is to discover hidden patterns or trends in huge volumes of raw data in order to make decisions that improve business processes or generate more revenue.

Due to limited time, the best way for me to acquire some foundation knowledge in these 3 areas is through Massive Open Online Courses (MOOCs). The advantages of MOOCs include a flexible schedule (study at your own convenience and pace), low cost (most courses are below USD 100) and a pay-as-you-learn model which allows learners to pay to attend courses and stop whenever they wish without spending a large sum of money. The best part is that everything the learner has accomplished is saved, and he/she can pay to resume at any time from where he/she stopped. These are major plus points for busy working professionals with full-time jobs and family commitments. With MOOCs, we can enroll in a course to gauge our interest and suitability before committing a huge amount of time and money to more in-depth courses in the same area. The major disadvantage of MOOCs is the lack of recognition by employers and educational institutions, owing to the difficulty of ensuring that learners submit their own work and attempt the online assessments without any assistance. Nevertheless, MOOC certificates show initiative to learn, which is something that employers look out for. If you expect to be qualified for and eventually employed in a particular profession after completing a course, you should enroll in a non-MOOC course.

Since I had already completed an online course through Coursera (the one mandated by my institution for all staff), I used the same MOOC provider to search for and pursue a data analytics course. After 3 months of intensive online study, I have completed the Big Data Specialization offered by Coursera. This is the first Coursera specialization I have completed.

Here are some key points which I wish to share about this specialization.

About this Specialization

  • The objective is to gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers.
  • Consists of 6 courses created by the University of California San Diego and offered through Coursera.
  • No prior programming or big data experience is required, but it is advantageous to understand SQL and how to work with relational database management systems.
  • Subscription to this specialization costs approximately USD 50 per month. Learners are expected to take an average of 7 months to complete all 6 courses.
  • Modes of learning include video lectures, quizzes (theory & hands-on), peer-graded assignments and discussion forums.
  • As this specialization requires the Cloudera Virtual Machine (VM) and some open-source tools to be installed, there are hardware and OS requirements to be met.


List of Courses
 

  1. Introduction to Big Data 
  2. Big Data Modeling and Management Systems 
  3. Big Data Integration and Processing 
  4. Machine Learning With Big Data 
  5. Graph Analytics for Big Data 
  6. Big Data - Capstone Project


About the Capstone Project


The Capstone project is about analysing the data set for a game and making recommendations to improve the game or generate more revenue from it.

The name of the game is “Catch the Pink Flamingo” and its key details are as follows.
  • Online game created by Eglence Inc. (an imaginary company).
  • Multi-user and multi-level game where players can choose to join or form a team.
  • The objective of the game is to catch as many Pink Flamingos as possible. These Pink Flamingos randomly pop up on a gridded world map based on missions that change in real time. The levels get more complicated in mission speed and map complexity as the users or teams move from level to level.
  • Provides chat boards for the teams to keep in touch.
  • Users are allowed to purchase items to be used in the game. This is a major source of revenue for the company.
  • Another form of revenue is advertisements shown in the game. Users’ clicks on advertisements are recorded.


Tools used in the Capstone Project

  1. Splunk - Tool for analyzing machine-generated big data.
  2. KNIME - Open source data analytics, reporting and integration platform.
  3. Apache Spark - Open-source distributed computing framework.
  4. Neo4J - Graph database management system.


Processes used in the Capstone Project


Part 1 (Aggregation & Filtering using Splunk)
  1. Review the data sets (in CSV format) and the Entity Relationship Diagram provided.
  2. Perform aggregation on the items purchased and revenue generated.
  3. Perform filtering on the total amount of money spent by the top ten users (ranked by how much money they spent).
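
The aggregation and filtering steps above can be sketched outside Splunk in a few lines of Python. This is only an illustration of the idea, not the course's Splunk queries: the purchase records, the column layout (user, item, price) and the function names are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical purchase records: (user_id, item_id, price).
# These values are invented for illustration, not taken from the course data.
purchases = [
    (1, "A", 1.0),
    (2, "B", 5.0),
    (1, "B", 5.0),
    (3, "C", 20.0),
    (2, "C", 20.0),
    (3, "A", 1.0),
]

def aggregate_by_item(records):
    """Aggregate: return {item_id: (times_purchased, revenue)}."""
    counts = defaultdict(int)
    revenue = defaultdict(float)
    for _user, item, price in records:
        counts[item] += 1
        revenue[item] += price
    return {item: (counts[item], revenue[item]) for item in counts}

def top_spenders(records, n=10):
    """Filter: return the n users who spent the most, highest first."""
    totals = defaultdict(float)
    for user, _item, price in records:
        totals[user] += price
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

item_summary = aggregate_by_item(purchases)
top_two = top_spenders(purchases, n=2)
print(item_summary)
print(top_two)
```

With the real data set, `n=10` would give the top ten spenders described in step 3.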

Part 2 (Classification using KNIME)
  1. Perform classification to predict which users are likely to purchase big-ticket items (i.e. items costing more than $5).
  2. Generate the decision tree and confusion matrix. The decision tree shows the predicted number of users based on categories and the confusion matrix shows the number of correct and incorrect predictions.
  3. Conclude the analysis and make recommendations.
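
To make the confusion matrix in step 2 more concrete, here is a minimal Python sketch of how one can be counted from actual and predicted labels. KNIME produces this automatically; the label names ("big", "small") and the sample predictions below are assumptions invented for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(matrix):
    """Share of predictions that fall on the diagonal (correct ones)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels: "big" = buyer of big-ticket items, "small" = otherwise.
labels = ["big", "small"]
actual = ["big", "big", "small", "small", "big", "small", "small", "big"]
predicted = ["big", "small", "small", "small", "big", "big", "small", "big"]

matrix = confusion_matrix(actual, predicted, labels)
print(matrix)            # [[3, 1], [1, 3]]
print(accuracy(matrix))  # 0.75
```

The diagonal cells count the correct predictions and the off-diagonal cells count the incorrect ones, which is what the analysis in step 3 is based on.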
     

Part 3 (Clustering using Spark)
  1. Select attributes from the CSV files provided and aggregate them.
  2. Perform clustering. This may need to be repeated, as the attributes selected may not reveal any significant differences between the clusters. The results also vary according to the number of clusters generated.
  3. Recommend actions to help improve the company’s business.
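
The clustering step can be illustrated with a small k-means sketch in plain Python. This is an assumption on my part: the steps above do not name the algorithm, and the capstone runs on Spark rather than on code like this. The two attributes (ad clicks, total spend), the sample points and the starting centroids are all invented for illustration.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments

# Hypothetical user attributes: (ad clicks, total spend).
points = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5),
          (20.0, 30.0), (22.0, 28.0), (21.0, 29.0)]

centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(assignments)
```

Changing the number of starting centroids changes the number of clusters, which is why the results vary and the step often has to be repeated with different attributes.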

Part 4 (Analyzing graphs using Neo4J)
  1. Load all the CSV files containing chat data for the game to create the graph database.
  2. Query the graph database created to find useful information.
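
As a rough idea of what loading CSV chat data into a graph and querying it involves, here is a sketch that uses an in-memory Python dictionary instead of Neo4J. The CSV columns (sender, receiver), the user names and the query itself are assumptions invented for illustration; the actual capstone files and queries are different.

```python
import csv
import io
from collections import defaultdict

# Hypothetical chat export: each row means "sender replied to receiver".
CHAT_CSV = """sender,receiver
alice,bob
alice,carol
bob,carol
carol,alice
alice,bob
"""

def load_graph(csv_text):
    """Build a directed graph as {sender: set of receivers}."""
    graph = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        graph[row["sender"]].add(row["receiver"])
    return graph

def most_connected(graph, n=1):
    """Return the n users who chatted with the most distinct users."""
    ranked = sorted(graph.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(user, len(neighbours)) for user, neighbours in ranked[:n]]

graph = load_graph(CHAT_CSV)
top = most_connected(graph)
print(top)  # [('alice', 2)]
```

A graph database answers the same kind of question with a query language and scales to far more nodes and relationships than a dictionary can.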

My Opinions on this Specialization


Pros
  • Suitable for beginners who have no experience in data analytics.
  • Flexible schedule that allows learners to learn at their own pace and availability.
  • Subscription fees are inexpensive, which allows learners to get an idea of Big Data Analytics and decide whether they are interested in committing more time and money to more in-depth courses in this area.

Cons
  • Without proper assessments, the certifications obtained may not be recognized by employers or academic institutions.
  • Need to install the VM and tools for hands-on work. There are hardware and OS requirements to be met. The VM and tools consume a huge amount of memory and disk space, which slows down the computer.
  • When difficulties with tool installation and/or the course syllabus are encountered, the only source of help is the discussion forums. After submitting a post in the forum, the learner can only wait for a reply from the instructors or fellow learners.
  • Certain chapters contain complex mathematical formulas which are intimidating to some learners.
  • Course materials are not up to standard (e.g. the theory quizzes are too easy, and the instructions are unclear and contain mistakes).

A week after completing this specialization, I received an email from Coursera Community inviting me to be part of their Beta Tester team. This is indeed a great opportunity for me to preview new courses in areas I am interested in before their launch. I certainly hope to give constructive feedback to the course instructors to ensure high standards in the learning materials.


Sunday 23 July 2017

Adapting & Growing With Technology

It has been a hectic 3 months since I last updated my blog. Besides the increasing workload due to the academic cycle in my institution, I have been aggressively gaining knowledge and skills for the sake of my future. Some people are perhaps wondering why I chose to torture myself by taking up courses when I have no intention of switching jobs and the courses have no direct benefit to my work. The real reason is that I simply have no choice. Especially for professionals in technology-related fields, constant upgrading of skills is necessary to stay relevant in our profession. Although some people find it silly to be in technology-related fields such as IT or engineering, I beg to differ. Technology is advancing at such a supersonic pace that half the jobs that exist today may be phased out in a decade or earlier. One good example of such a job is the vehicle driver. Driverless cars and buses are already in the testing phase and are expected to roll out commercially in a couple of years' time. Many jobs are in the process of being replaced by technology. However, it does not necessarily mean that human intervention is no longer needed. When the North East MRT line started operating in Singapore more than a decade ago, the trains were driverless. Although no staff member was needed to operate each train, at least one staff member still had to be stationed in each train to maintain order and deal with technical glitches or emergency situations. In this case, there was no reduction in headcount. Instead, additional headcount was needed to ensure the well-being and safety of commuters. As the train system became more complex through the use of driverless technology, more technical expertise was required to maintain it. In a way, the use of more advanced technology created more job opportunities for the workforce. As such, there are job opportunities for everyone as long as you are willing to adapt and grow.

There are some people, especially those in middle or upper management, who gladly assume that their jobs are as stable as a mountain and refuse to upgrade their skills. This assumption is certainly not true according to a saying commonly attributed to Charles Darwin, the famous naturalist, geologist and biologist: "It is not the strongest species that will survive but those that are adaptable to changes". A good example that illustrates this saying is the acquisition of NK, once the mightiest and most renowned mobile phone manufacturer in the world. Its products were so wonderful that in its heyday, out of 10 persons walking on the streets, at least 7 owned one of its phones. Since its products were great and the company did not make any mistakes or wrong decisions, why did it end up in such a pathetic state? The reason was simply that the company did not adapt well to change. Although the company launched smart phones during the era when the iPhone and smart phones from other technology firms became popular, its products were disappointing. The company failed to incorporate innovative features into its 3G smart phones in the way it had included famous games in its 2G mobile phones, which took the world by storm. Worst of all, the operating system and software installed on its smart phones lacked user friendliness and were full of bugs. Obviously, the company failed to maintain innovation and quality as mobile phone technology transitioned from 2G to 3G. This is clearly an example of the strongest species being unable to adapt to change and eventually being eliminated.

As Singapore aspires to transform herself into a smart nation, many of her citizens and residents look forward to the benefits which Internet of Things technology can offer. The government has been aggressively injecting funds into such projects and promoting awareness. However, the biggest obstacle in this transformation journey is probably the lack of expertise. First of all, Engineering and IT have always been the least popular choices of study at the Institutes of Higher Learning in Singapore. A large portion of the students enrolled in courses for these two technology-related fields either achieved examination grades that did not qualify them for courses in more sought-after fields (such as Business or Finance) or failed to secure a place in those more popular courses. Few students choose technology-related courses out of interest in this area. Due to the ongoing smart nation campaign, many children, some as young as primary school pupils, have started learning basic programming, but not many of them will eventually choose IT or Engineering related studies by the time they reach tertiary education. Secondly, although technology-related students graduate every year, a large portion of them do not remain in the profession for long. The long working hours, unattractive salary and, worst of all, little recognition force most of them to switch to other professions. These factors have resulted in such an acute shortage of technology-related professionals that the government has to accept foreigners to make up the shortfall. The smart nation dream would most likely remain a fantasy without people with the right skills to implement and support it. Under such circumstances, development work is outsourced to technology professionals based overseas, and quality control and communication become a challenge. Even after a project has been completed and the product successfully launched, it has to be supported by professionals based locally for faster response. The route to a smart nation and beyond certainly creates more technology-related jobs for interested job seekers to fill. In fact, Singapore already has an oversupply of management professionals, and what is lacking most is people to fill the technical positions.