Big Data Analytics
UEL-CN-7031
Summative assessment Final Project 100%
Submission instructions
• Cover sheet to be attached to the front of the assignment when
submitted
• Question paper to be attached to assignment when submitted
• All pages to be numbered sequentially

Module code: UEL-CN-7031
Module title: Big Data Analytics
Assignment title: Big Data Analytics: Coursework
Assignment number: 1
Weighting: 100%
Submission date: Week 12
Additional information:

This coursework (CRWK) must be completed individually. It is
divided into two sections: (1) Big Data analytics on a real case study and (2) a presentation.
The overall mark for the CRWK comes from two main activities, as follows:
1. Big Data Analytics report (around 5,000 words, with a tolerance of ± 10%) (60%)
2. Presentation (40%)
Marking Scheme

Topic                            Total mark   Remarks (breakdown of marks for each sub-task)
Big Data Analytics using HIVE    30           (10) Providing big data queries using HIVE.
                                              (10) Using built-in (Date, Math, Conditional, and String) functions in HIVE.
                                              (10) Visualizing the results of queries as graphical representations and being able to interpret them.
Big Data Analytics using Spark   50           (15) Analyzing the dataset through statistical analysis methods.
                                              (35) Designing single- and multi-class classifiers, and evaluating and visualizing the accuracy/performance.
Individual assessment            10           (10) (1) Find alternative solutions for high-level languages and analytics approaches (use references), and express findings from big data analytics with the relevant theories.
Documentation                    10           (10) Write a scientific report.
Total: 100
Good Luck!
Big Data Analytics using Hadoop and Spark
Tasks:
(1) Understanding Dataset: UNSW-NB15
The raw network packets of the UNSW-NB15 [1] dataset were created by the IXIA PerfectStorm
tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate
a hybrid of real modern normal activities and synthetic contemporary attack behaviours.
The tcpdump tool was used to capture 100 GB of raw traffic (e.g., Pcap files). The dataset has nine
types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic,
Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve
algorithms were developed, to generate a total of 49 features with the class label.
a) The features are described here.
b) The number of attacks and their sub-categories is described here.
c) In this coursework, we use a total of 10 million records stored in
a CSV file (download). The total size is about 600MB, which is big enough to
warrant big data methodologies for analytics. As a big data specialist, you should first
read and understand the features, then apply modeling techniques. If you want
to see a few records of this dataset, you can import it into Hadoop HDFS, then write
a Hive query that prints the first 5-10 records.
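As a quick local alternative to the Hive route described in c), the first few records can also be inspected with plain Python before loading anything into HDFS. The sketch below is only an illustration: the helper name peek_records and the sample rows are hypothetical, and it does not replace the Hive query the task asks for.

```python
import csv
import io
from itertools import islice


def peek_records(lines, n=5):
    """Return the first n parsed CSV rows from an iterable of text lines."""
    reader = csv.reader(lines)
    # islice stops after n rows, so a large file is never read in full.
    return list(islice(reader, n))


# Stand-in for the real file: three made-up rows, not actual dataset values.
sample = io.StringIO(
    "10.0.0.1,80,tcp,0\n"
    "10.0.0.2,53,udp,0\n"
    "10.0.0.3,22,tcp,1\n"
)
for row in peek_records(sample, n=2):
    print(row)
```

With the downloaded file, the same helper could be called on an open file handle, for example open("UNSW-NB15.csv", newline=""), where the file name is an assumption.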
(2) Big Data Query & Analysis by Apache Hive [30 marks]
This task uses Apache Hive to convert big raw data into useful information for end
users. To do so, first understand the dataset carefully. Then write at least 4 Hive queries
(refer to the marking scheme). Apply appropriate visualization tools to present your
findings numerically and graphically, and interpret your findings briefly.
Finally, include screenshots of your outcomes (e.g., tables and plots) together with the
scripts/queries in the report.
Tip: The mark for this section depends on the complexity of your HIVE queries; for
instance, a simple select query on its own will not earn full marks.
[1] Source: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
(3) Advanced Analytics using PySpark [50 marks]
In this section, you will conduct advanced analytics using PySpark.
3.1. Analyze and Interpret Big Data (15 marks)
You need to learn and understand the data through at least 4 analytical methods
(descriptive statistics, correlation, hypothesis testing, density estimation, etc.). Present
your work numerically and graphically. Apply tooltip text, legends, titles, X-Y labels, etc.
as appropriate to help end users gain insights.
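To make two of the methods named above concrete (descriptive statistics and correlation), here is a minimal plain-Python sketch of the underlying arithmetic. It does not use PySpark, which the task requires for the actual analysis; the function names and the two sample columns are hypothetical and the values are made up.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length >= 2")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


# Made-up values standing in for two numeric features.
duration = [1.0, 2.0, 3.0, 4.0]
bytes_sent = [10.0, 20.0, 30.0, 40.0]
print(describe(duration))
print(pearson(duration, bytes_sent))
```

On the full dataset the same quantities would normally be computed in a distributed way in Spark; the sketch only shows what the reported numbers mean.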
3.2. Design and Build a Classifier (35 marks)
a) Design and build a binary classifier over the dataset. Explain your algorithm and its
configuration. Present your findings in both numerical and graphical
form. Evaluate the performance of the model and verify its accuracy and
effectiveness. [15 marks]
b) Apply a multi-class classifier to classify the data into ten classes (categories): one normal
and nine attacks (namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic,
Reconnaissance, Shellcode and Worms). Briefly explain your model, with supporting
statements on its parameters, accuracy and effectiveness. [20 marks]
Tip: you can use this link (https://spark.apache.org/docs/2.2.0/ml-classification-regression.html) for more information on modelling.
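Items a) and b) both ask for an evaluation of accuracy and effectiveness. The following plain-Python sketch shows one common way to derive accuracy, per-class precision/recall/F1 and macro-averaged F1 from true and predicted labels; it works for the binary case and for the ten-class case alike. It is an illustration under assumptions, not the required PySpark implementation: the function names are hypothetical and the labels are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 for every label seen in either list."""
    counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(c for (t, p), c in counts.items() if p == label and t != label)
        fn = sum(c for (t, p), c in counts.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_metrics(y_true, y_pred)
    return sum(m["f1"] for m in scores.values()) / len(scores)


# Made-up labels, not real model output.
y_true = ["Normal", "Normal", "DoS", "DoS", "Worms", "Normal"]
y_pred = ["Normal", "DoS", "DoS", "DoS", "Normal", "Normal"]
print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro-averaging is shown because rare attack classes such as Worms would otherwise be hidden by the overall accuracy figure; whether that is the right headline metric for your model is a design choice to justify in the report.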
(4) Individual Assessment [10 marks]
Discuss (1) what alternative technologies are available for tasks 2 and 3 and how they
differ (use academic references), and (2) what surprisingly new thinking was evoked
and/or neglected on your part.
Tip: add the individual assessment of each member in the same report.
(5) Documentation [10 marks]
Document all your work. Your final report must follow 5 sections detailed in the “format of
final submission” section (refer to the next page). Your work must demonstrate appropriate
understanding of academic writing and integrity.
FORMAT OF FINAL SUBMISSION
You need to prepare a single file in PDF format as your coursework, with the
following sections:
1. Use ONLY one Cover Page
2. Table of Contents
3. Report of the tasks (use sub-sections for tasks where appropriate)
4. References (if any)
SUBMISSION
Submit a single PDF to Turnitin in Moodle by the end of Week 12.
PLAGIARISM
The University defines an assessment offence as any action(s) or behaviour likely to confer
an unfair advantage in assessment, whether by advantaging the alleged offender or
disadvantaging (deliberately or unconsciously) another or others. A number of examples are
set out in the Regulations and these include:
“D.5.7.1 (e) the submission of material (written, visual or oral), originally produced by another
person or persons, without due acknowledgement, so that the work could be assumed the
student’s own. For the purposes of these Regulations, this includes incorporation of
significant extracts or elements taken from the work of (an) other(s), without
acknowledgement or reference, and the submission of work produced in collaboration for
an assignment based on the assessment of individual work. (Such offences are typically
described as plagiarism and collusion.)”. The University’s Assessment Offences Regulations
can be found on our web site. Information about plagiarism can also be found in the
programme handbook.
FEEDBACK TO STUDENTS
Feedback is central to learning and is provided to students to develop their knowledge,
understanding, skills and to help promote learning and facilitate improvement.
• Feedback will be provided as soon as possible after the student has
completed the assessment task.
• Feedback will be in relation to the learning outcomes and assessment criteria.
As the feedback (including marks) is provided before Award & Field Board, marks are:
• provisional
• available for External Examiner scrutiny
• subject to change and approval by the Assessment Board
Assessment Criteria:

Criteria                                             Given mark
Demonstrate/interpret the HIVE analysis/queries      10
Understand Hadoop and Spark engines                  5
Demonstrate/interpret the PySpark analysis/coding    15
Ability to answer questions                          10
Overall mark                                         40
