CZ4034 Project

In this project, our group crawled the latest customer reviews from the Amazon Customer Reviews webpages, using the various filters and conditions available there, via the Google Chrome extension Amazon Web Scraper. We then created a web interface consisting of HTML and PHP files and integrated it with the indexing software Solr, which carries out the required data manipulation, indexing and selection. After that, we pre-processed the data: merging and formatting the multiple CSV files of crawled data, removing duplicated data, stemming, lemmatization, stop-word removal and count vectorization. Following that, we performed sentiment prediction using several classification models, namely the Naïve Bayes classifier, K-Nearest Neighbour and Support Vector Machine. The models were evaluated using precision, recall and F-measure, and the results were compiled and analyzed. The LDAvis library was used to visualize the text data. Lastly, we explored some innovations to enhance our classification, such as the use of GridSearch with k-fold Cross Validation, and the use of an ensemble classification model, Random Forest, whose results we compared with those of the other models.
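As an illustration of three of the steps named above (duplicate removal, stop-word removal with count vectorization, and precision/recall/F-measure evaluation), here is a minimal sketch in plain Python. It is not the project's actual code: the stop-word list, the sample reviews, the labels and all function names are hypothetical, and stemming, lemmatization and the classifiers themselves are left out.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "it", "and", "this", "was"}


def deduplicate(reviews):
    """Remove exact duplicate reviews while keeping first-seen order."""
    return list(dict.fromkeys(reviews))


def tokenize(text):
    """Split on whitespace, strip punctuation, lowercase, drop stop words."""
    tokens = (word.strip(".,!?;:\"'()").lower() for word in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def count_vectorize(reviews):
    """Return (vocabulary, matrix) with one row of term counts per review."""
    tokenized = [tokenize(r) for r in reviews]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up sample reviews; the third is an exact duplicate of the first.
reviews = deduplicate([
    "Great product, works great!",
    "Terrible battery.",
    "Great product, works great!",
])
vocabulary, matrix = count_vectorize(reviews)

# Made-up true and predicted sentiment labels (1 = positive, 0 = negative).
precision, recall, f1 = precision_recall_f1([1, 0, 1, 1], [1, 1, 0, 1])
```

In practice the count matrix would be passed to a classifier such as Naïve Bayes, and the metrics would be computed on held-out predictions; the labels here only exercise the formulas.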
35 Pages
Essays / Projects
Year Uploaded: 2022