Developing a Web Dashboard for analyzing Amazon's Laptop sales data
- Try the App at: http://dashboard.sinfronteras.ws
- An example of the data (JSON file) we have scraped from Amazon: Media:AmazonLaptopReviews.json
- Open it with Firefox if you don't have a suitable Google Chrome plugin for viewing JSON files.
1 Introduction
1.1 What do we want to build?
1.1.1 Goals
This project aims to develop a Data Dashboard for analyzing Laptop Customer Reviews from Amazon.
In a general sense, «a Data Dashboard is an information management tool that visually tracks, analyzes and displays key performance indicators (KPI), metrics and key data points to monitor the health of a business, department or specific process» [1]. That is not a bad definition to describe the application we want to build. We just need to highlight that our Dashboard is going to display information about Laptop sales data from Amazon (customer reviews, in particular).
We want to mention that the initial proposal was to develop a Dashboard for analyzing a wider range of products from Amazon (not only Laptops). However, because we have limited human resources and time to complete this project, we were forced to reduce the scope of the application. This way, we are trying to make sure that we have enough time to develop a functional Dashboard with visual appeal and a decent web design.
Some examples of the kind of Application we want to build:
- Example 1: (Figure 1)
- Sentiment Analysis Dashboard: https://powerbi.microsoft.com/en-us/partner-showcase/faction-a-sentiment-analysis-dashboard/
- Demo video: https://www.youtube.com/watch?v=R5HkXyAUUII
- This is the application most similar to the one we want to build that we have found in our research. Our initial requirements and prototype will be based on some of the components shown in this application.
- Example 2: In Figure 7 and Figure 3 we show two other Dashboards. They do not serve a similar function to the Dashboard we want to build; they are dashboards for other purposes, but they are good examples of the design and visual appeal we are aiming for.
- https://designrevision.com/demo/shards-dashboard-lite-react/blog-overview
- https://themesdesign.in/upcube/layouts/horizontal/index.html
- Example 3: You can also try some Dashboards that have been developed with the same technologies that we are going to use (Python - Dash). We will talk a little more about these technologies later:
1.2 Project rationale and business value
In marketing and business intelligence, it is crucial to know what customers think about target products. Customer opinions, reviews, or any data that reflects the customer's experience represent important information that companies can use to enhance their marketing and business strategies.
Marketing and business intelligence professionals also need a practical way to display and summarize the most important aspects and key indicators of the dynamics of the target market. But what do we mean by the dynamics of the market? We use this term to refer to the most important information that business professionals require to understand the market and thus make decisions that seek to improve the company's revenue.
Now, let's explain with a practical example what kind of information business professionals need to know about a target product or market. Suppose that you are a Business Intelligence Manager working for an IT company (Lenovo, for example). The company is looking to improve its Laptop sales strategy. What kind of information do you need in order to make key decisions about the tech specs and features that the new generation of Laptops should have to become more attractive in the market? You would need to analyze, for instance:
- Which are the best-selling Laptops?
- Are Lenovo Laptops in a good position in the market?
- Who are Lenovo's top competitors in the industry?
- What are the key features that customers take into consideration when buying a Laptop?
- What are the key tech specs that customers like (and don't like) about Lenovo's and its competitors' Laptops?
- How much are customers willing to pay for a Laptop?
Those are just some examples of the information a business intelligence professional needs to know when looking for the best strategy. Let's say that, after analyzing the data, you find that the top-selling Laptops are actually the most expensive ones: Laptops with high-quality tech specs and performance. You also find that Lenovo Laptops are, in general, below the price range and tech specs of the top-selling Laptops.
With the above information, a logical strategy could be to invest in an action plan to improve the tech specs and general quality of Lenovo Laptops. If, on the contrary, the information shows that very expensive Laptops have very low demand, an intelligent approach could be a strategy to reduce the cost of the new generation of Laptops.
So, we have already seen the importance of analyzing relevant data to understand the dynamics of the market when looking to enhance the business strategy. Now, where and how can we get the data needed to perform a market analysis for a business plan?
This kind of data can be collected by requesting information directly from retailers. For example, if you have access to a detailed Annual Report for Sales & Marketing of a computer retailer, you will have the kind of information that can be valuable for understanding the dynamics of the market. The report could contain details about the best-selling computers, prices, tech specs, revenues, etc. However, a Sales Annual Report would lack detailed information about what customers think of the products they bought. Traditionally, this kind of data has been collected using methods such as face-to-face or telephone surveys.
Recently, the huge amount of data generated every day on social networks and online retailers has been used to perform analyses that provide a better understanding of the market and, in particular, of customer opinions. This method is becoming a more effective, practical, and cheaper way of gathering this kind of information compared to traditional methods.
1.2.1 Intended target market
A data analysis dashboard for sales strategy has a very wide target market. Any company that sells something would benefit from analyzing data that allows it to understand the dynamics of the market it is targeting.
As we have already mentioned, we were forced to reduce the scope of this project because of the time we have to complete it. So the final result will be a Web Dashboard for analyzing «Laptops» sales data from «Amazon» (customer online reviews, in particular). However, we would like to invite you to think about this project as a methodology for building a Dashboard for analyzing online retailers' sales data, regardless of the target products or where the data is scraped from.
Now, let's be more specific about who can use our app. We will address two examples:
- Example 1: When explaining the business value of the project, we talked about the example of a computer manufacturer that is looking to enhance its sales strategy. That is an example at a very high business level. However, as we will see in the next example, this kind of analysis can also be beneficial for small businesses.
- Example 2: Think about a small computer retailer that is looking to increase its sales. It also needs information about the dynamics of the Laptop market: which are the top sellers, how much customers are willing to pay, which features customers are looking for in a laptop, etc. This information will allow it to invest in the best marketing and sales strategy.
2 Requirements
Depending on the nature of the project, the requirements of a software development project can be gathered using different methods:
- Questionnaires and Interviews: If there is a client or if we are in contact with the final users, methods like questionnaires and interviews with the client/users are usually carried out to determine requirements. When there is a client, initial requirements are usually provided by the client at the beginning of the project and redefined at every stage of the development process as the client and developers identify new ones. [2]
- Assessment of the current computer system: If there is a current system, it must be tested and evaluated to determine the requirements of the new one. [2]
- Scenarios: «A scenario is a sequence of interactions between a user and the system carried out in order to satisfy a specified goal» [2]. This is a very popular method to determine requirements since it can be used when there is no client, final user, or current computer system.
In our case, we started building a list of requirements by analyzing similar existing Dashboards and using Scenarios. The first prototype of our Dashboard will therefore be based on some of the components of this application: https://www.youtube.com/watch?v=R5HkXyAUUII
3 Stages of development and technologies
Python is the main programming language that will be used in all stages of development. Some of the libraries:
- Pandas, NumPy, NLTK, Plotly, Cufflinks.
- Scrapy.
- Dash - Plotly.
- Why Python?
- Backend:
- Data Analytics:
- Natural language processing:
- Text pre-processing: Removing punctuation, Removing stopwords, Tokenization, etc.
- Sentiment Analysis.
- Text filtering.
- Technologies: Python. Some of the libraries: Pandas, NumPy, NLTK, Plotly, Cufflinks.
- Frontend:
- Layout design and development.
- Data visualization. We will use several kinds of charts to visualize the data. E.g.:
- Word cloud: It will be used to visualize Word frequency in reviews.
- Bar chart: We will use this kind of chart to compare data (price comparison, average customer review comparison, word frequency in reviews, etc.)
- Histogram.
- Pie Charts.
- Bubble plot.
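As a rough illustration of the text pre-processing, word-frequency, and sentiment steps listed above, the following sketch uses only the Python standard library. It is a simplified stand-in, not the project's NLTK pipeline: the stopword list, the sentiment lexicon, the function names, and the sample reviews are all hypothetical placeholders.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list (a real list, such as NLTK's, is much larger).
STOPWORDS = {"the", "a", "an", "is", "it", "and", "but", "this", "to", "of", "i", "for", "very"}

# Hypothetical sentiment lexicon, used only for illustration.
POSITIVE = {"great", "fast", "good", "love", "excellent"}
NEGATIVE = {"bad", "slow", "poor", "broken", "terrible"}


def preprocess(text):
    """Lowercase, remove punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]


def word_frequency(reviews):
    """Count tokens across all reviews (the input for a word cloud or bar chart)."""
    counts = Counter()
    for review in reviews:
        counts.update(preprocess(review))
    return counts


def sentiment_score(text):
    """Return (#positive tokens - #negative tokens); > 0 suggests a positive review."""
    tokens = preprocess(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


# Made-up sample reviews.
reviews = [
    "Great laptop, the screen is excellent!",
    "The battery is bad and the keyboard is broken.",
    "Fast, good value. Great battery.",
]
freq = word_frequency(reviews)
scores = [sentiment_score(r) for r in reviews]
print(freq.most_common(3))
print(scores)
```

Counting lexicon hits is about the simplest possible scoring rule. It ignores negation ("not good") and word order, which is why a dedicated NLP library is the more realistic choice for the actual Dashboard.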
3.1 Scrape data from Amazon
We first need to get the data we want to analyze and display in the Web App.
As we have already said, we want to extract data about laptops for sale on www.amazon.com. The goal is to collect the details of about 100 Laptops of different brands and models and save this data as a JSON text file.
To better understand what information we need, Figure 4 shows a Laptop sale page from https://www.amazon.com/A…4-53N-77AJ/dp/B07QXL8YCX. From this page, we need to extract the following information:
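To make the target format more concrete, here is a minimal sketch of what one laptop record could look like and how a list of such records can be saved as a JSON text file. The top-level field names follow the values that the spider shown later in this section passes along for each laptop; every value, the structure of each entry in `reviews`, and the output file name are assumptions made up for illustration.

```python
import json
import os
import tempfile

# Placeholder record: the keys follow the spider's fields, the values are invented.
laptop = {
    "url": "https://www.amazon.com/dp/EXAMPLE",
    "ASIN": "B000000000",
    "price": "$499.99",
    "average_customer_reviews": "4.5 out of 5 stars",
    "number_reviews": 12,
    "number_ratings": 30,
    "tech_details": {"Screen Size": "15.6 inches", "RAM": "8 GB"},
    "reviews_link": "https://www.amazon.com/product-reviews/EXAMPLE",
    # The shape of a single review is an assumption, not taken from the real data.
    "reviews": [
        {"title": "Good value", "rating": 4, "text": "Fast and light."},
    ],
}

laptops = [laptop]

# Write the list to a JSON text file (hypothetical file name in the temp directory).
path = os.path.join(tempfile.gettempdir(), "laptops_example.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(laptops, f, indent=2)

# Read it back to confirm the file round-trips to the same structure.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded[0]["tech_details"])
```

Keeping `tech_details` as a nested object rather than flattening it into the record lets different laptops carry different sets of specs without changing the top-level schema.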
Main | Tech details | Reviews |
---|---|---|
|
|
|
The process of extracting data from websites is commonly known as Web scraping [1]. Web scraping software can be used to automatically extract large amounts of data from a website and save the data for later processing. [4]
The Web scraping solution used in this project is Scrapy. This is one of the most popular Python Web Scraping frameworks. «Scrapy, is a fast high-level web crawling and web scraping framework used to crawl websites and extract structured data from their pages» [3]
The following code shows an example of a Scrapy spider written in Python. It is a portion of the program we built to extract Laptop data from www.amazon.com.
Web scraping can become a complex task when the information we want is spread across more than one page. For example, in our project, the most important data we need is the customer reviews. Because a Laptop on Amazon can have a very large number of reviews, this information can become so extensive that it has to be displayed on a set of similar pages. For instance, consider the following links:
- This is the link to the base Amazon laptops page. This page displays the first group of Laptops available at Amazon.com, including all brands and features. If you want to keep browsing laptops you need to ...
```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "amazon_links"

    # Build the list of search-result pages to crawl (pages 1 and 2).
    start_urls = []
    myBaseUrl = 'https://www.amazon.com/Laptops-Computers-Tablets/s?rh=n%3A565108&page='
    for i in range(1, 3):
        start_urls.append(myBaseUrl + str(i))

    def parse(self, response):
        # Collect every product link found on the results page.
        data = response.css("a.a-text-normal::attr(href)").getall()

        # Keep only the links whose URL names one of the target brands.
        links = [s for s in data if "Lenovo-" in s
                 or "LENOVO-" in s
                 or "Hp-" in s
                 or "HP-" in s
                 or "Acer-" in s
                 or "ACER-" in s
                 or "Dell-" in s
                 or "DELL-" in s
                 or "Samsung-" in s
                 or "SAMSUNG-" in s
                 or "Asus-" in s
                 or "ASUS-" in s
                 or "Toshiba-" in s
                 or "TOSHIBA-" in s
                 ]

        # Remove duplicates while preserving the original order.
        links = list(dict.fromkeys(links))

        # Drop links that only point to a section of a product page.
        links = [s for s in links if "#customerReviews" not in s]
        links = [s for s in links if "#productPromotions" not in s]

        # Follow each product link and parse the product page.
        for i in range(len(links)):
            links[i] = response.urljoin(links[i])
            yield response.follow(links[i], self.parse_compDetails)

    def parse_compDetails(self, response):
        def extract_with_css(query):
            return response.css(query).get(default='').strip()

        price = response.css("#priceblock_ourprice::text").get()

        # Product details table: strip whitespace from every value.
        product_details_table = response.css("#productDetails_detailBullets_sections1")
        product_details_values = product_details_table.css("td.a-size-base::text").getall()
        k = []
        for i in product_details_values:
            i = i.strip()
            k.append(i)
        product_details_values = k

        # The values are read by position in the table.
        ASIN = product_details_values[0]
        average_customer_reviews = product_details_values[4]

        # Extract the number of reviews and ratings from the reviews footer text.
        number_reviews_div = response.css("#reviews-medley-footer")
        number_reviews_ratings_str = number_reviews_div.css("div.a-box-inner::text").get()
        number_reviews_ratings_str = number_reviews_ratings_str.replace(',', '')
        number_reviews_ratings_str = number_reviews_ratings_str.replace('.', '')
        number_reviews_ratings_list = [int(s) for s in number_reviews_ratings_str.split() if s.isdigit()]
        number_reviews = number_reviews_ratings_list[0]
        number_ratings = number_reviews_ratings_list[1]

        # Absolute link to the page that lists all reviews.
        reviews_link = number_reviews_div.css("a.a-text-bold::attr(href)").get()
        reviews_link = response.urljoin(reviews_link)

        # First tech-specs table: map each header cell to its value cell.
        tech_details1_table = response.css("#productDetails_techSpec_section_1")
        tech_details1_keys = tech_details1_table.css("th.prodDetSectionEntry")
        tech_details1_values = tech_details1_table.css("td.a-size-base")
        tech_details1 = {}
        for i in range(len(tech_details1_keys)):
            text_keys = tech_details1_keys[i].css("::text").get()
            text_values = tech_details1_values[i].css("::text").get()
            text_keys = text_keys.strip()
            text_values = text_values.strip()
            tech_details1[text_keys] = text_values

        # Second tech-specs table, processed the same way.
        tech_details2_table = response.css("#productDetails_techSpec_section_2")
        tech_details2_keys = tech_details2_table.css("th.prodDetSectionEntry")
        tech_details2_values = tech_details2_table.css("td.a-size-base")
        tech_details2 = {}
        for i in range(len(tech_details2_keys)):
            text_keys = tech_details2_keys[i].css("::text").get()
            text_values = tech_details2_values[i].css("::text").get()
            text_keys = text_keys.strip()
            text_values = text_values.strip()
            tech_details2[text_keys] = text_values

        # Merge both tables into a single dictionary.
        tech_details = {**tech_details1, **tech_details2}

        reviews = []

        # Follow the reviews link, passing along the data collected so far.
        yield response.follow(reviews_link,
                              self.parse_reviews,
                              meta={
                                  'url': response.request.url,
                                  'ASIN': ASIN,
                                  'price': price,
                                  'average_customer_reviews': average_customer_reviews,
                                  'number_reviews': number_reviews,
                                  'number_ratings': number_ratings,
                                  'tech_details': tech_details,
                                  'reviews_link': reviews_link,
                                  'reviews': reviews,
                              })
```
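Two steps of the spider above, the link filtering and the extraction of the review and rating counts, are plain string processing and can be tried without Scrapy. The sketch below restates them as standalone functions. The function names, the `BRANDS` tuple, and the sample inputs are hypothetical. The brand match here is also case-insensitive, which is an assumption that matches more links than the explicit list of spellings in the spider.

```python
# Hypothetical brand list, taken from the brands named in the spider's filter.
BRANDS = ("Lenovo", "HP", "Acer", "Dell", "Samsung", "Asus", "Toshiba")


def filter_product_links(hrefs, brands=BRANDS):
    """Keep brand links, skip section anchors, and remove duplicates in order."""
    kept = []
    for href in hrefs:
        # Skip links that only point to a section of a product page.
        if "#customerReviews" in href or "#productPromotions" in href:
            continue
        # Case-insensitive match on "<brand>-" (broader than the original filter).
        if any(f"{brand.lower()}-" in href.lower() for brand in brands):
            kept.append(href)
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(kept))


def parse_counts(text):
    """Return all integers in the text, ignoring ',' and '.' separators."""
    cleaned = text.replace(",", "").replace(".", "")
    return [int(tok) for tok in cleaned.split() if tok.isdigit()]


# Made-up inputs; the real Amazon markup and wording may differ.
sample_links = [
    "/Lenovo-IdeaPad/dp/X1",
    "/Lenovo-IdeaPad/dp/X1",
    "/Lenovo-IdeaPad/dp/X1#customerReviews",
    "/Apple-MacBook/dp/X2",
    "/ASUS-VivoBook/dp/X3",
]
links = filter_product_links(sample_links)
counts = parse_counts("Showing 1,234 reviews of 5,678 ratings.")
print(links)
print(counts)
```

Because `parse_counts` returns a list, the caller can check its length before indexing. In the spider, indexing the first two elements directly raises an `IndexError` whenever the footer text contains fewer than two numbers.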
3.2 Data Analytics
3.3 Front-end
4 Wireframe
5 The data
6 References
- [1] klipfolio.com, What is a data dashboard?, https://www.klipfolio.com/resources/articles/what-is-data-dashboard
- [2] Carol Britton and Jill Doake, A Student Guide to Object-Oriented Development, Elsevier, 2005.
- [3] Scrapy.org, Official Scrapy website, https://scrapy.org/
- [4] Wikipedia.org, Scrapy, https://en.wikipedia.org/wiki/Scrapy
- [5] datacamp.com, Dash for Beginners, https://www.datacamp.com/community/tutorials/learn-build-dash-python
- [6] Create Charts & Diagrams Online, https://www.lucidchart.com