
Discussion and Conclusion:

To sum up, the restaurant data was collected along three major dimensions: restaurant facilities, nearby location information, and demographic information for the neighborhood. In addition, business review data was collected for text analytics. The analysis covered clustering, association rule mining, PCA, and several classification algorithms: SVM, Logistic Regression, Naïve Bayes, Decision Tree, and Random Forest. Sentiment analysis and topic modeling were also performed on the review data. The following insights are drawn from these analyses:

  1. Of the three dimensions, restaurant facilities are the most important, followed by nearby facilities; demographic information contributes little to the success of a restaurant.

  2. Among the internal features of restaurants, price and noise level are the most significant. The higher the price, the higher the rating tends to be, which is consistent with the assumption that the data is biased towards positive reviews. In addition, customers are sensitive to the noise level, which suggests they pay attention to privacy and atmosphere while dining.

  3. Among the nearby facilities data, the most important factor is the number of gyms, and the association is positive. This finding might relate to increasing awareness of healthy food.

  4. Restaurant ratings are predictable, but the decision boundary is only clear between good restaurants and the rest. No model achieves good separation between moderate and poor restaurants.

  5. Of all the predictive models, Random Forest achieved the best results: its AUC for predicting good restaurants is 0.90. ‘Price’, ‘Noise_level’ and ‘Number of gyms’ are the most important factors for good restaurant ratings.

  6. Sentiment analysis achieved 92% precision; however, the model is biased towards positive reviews, as there are many more false-positive predictions than false-negative predictions.

  7. Topic modeling results reveal that the top topics generally contain positive adjectives, which suggests that the business review data is biased towards the positive.
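
The bias described in insight 6 can be checked by counting false positives and false negatives directly rather than relying on a single score. The following is a minimal sketch in plain Python, not the code used in this project: the function names (`confusion_counts`, `precision`, `recall`) and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="pos"):
    """Count true/false positives and negatives for a binary sentiment task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def precision(counts):
    """Share of predicted positives that are truly positive."""
    predicted_pos = counts["tp"] + counts["fp"]
    return counts["tp"] / predicted_pos if predicted_pos else 0.0


def recall(counts):
    """Share of true positives that the model finds."""
    actual_pos = counts["tp"] + counts["fn"]
    return counts["tp"] / actual_pos if actual_pos else 0.0


# Hypothetical toy labels (NOT the project's data): 10 positive and 5 negative reviews.
y_true = ["pos"] * 10 + ["neg"] * 5
# The model labels most reviews as positive, including 3 of the 5 negative ones.
y_pred = ["pos"] * 9 + ["neg"] * 1 + ["pos"] * 3 + ["neg"] * 2

counts = confusion_counts(y_true, y_pred)
print("precision:", precision(counts))
print("recall:", recall(counts))
print("false positives:", counts["fp"], "false negatives:", counts["fn"])
```

When false positives clearly outnumber false negatives, as in this toy case, the model leans towards predicting the positive class, which is the pattern reported above for the review data.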

Based on the analyses above, no single factor determines the success of a restaurant, and the collected data is somewhat biased towards the positive side.

Future analysis will focus more on extracting meaningful information from the business review data for each restaurant using text mining and natural language processing techniques. More specific feedback for each business can be derived by analyzing the positive and negative reviews from customers.
