
Property Investment Assessment


In this project, we process the Property Assessment Roll data for the city of Buffalo, NY. Using the cleaned dataset, we plan to analyze the distribution of properties in Buffalo and build a model that recommends properties for investment based on the features available in the data.

Data source

City of Buffalo 2021-2022 Property Assessment Roll

The raw data has about 94,000 data points. Missing values were handled in three steps:

  • For columns with missing values in more than 80% of the rows: Dropped the column as a whole.
  • For columns with missing values in less than 20% of the rows: Dropped the rows corresponding to those values.
  • For other columns:
    • Imputed using median for numerical features.
    • Imputed using mode for categorical features.
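
The three steps above could be sketched in pandas roughly as follows. This is a minimal illustration rather than the project's actual code: the function name `handle_missing` and the threshold parameters `high` and `low` are hypothetical, and columns sitting exactly at 20% or 80% missing are assumed to fall under "other columns".

```python
import pandas as pd


def handle_missing(df: pd.DataFrame, high: float = 0.8, low: float = 0.2) -> pd.DataFrame:
    """Apply the three-step missing-value policy (hypothetical helper)."""
    # Fraction of missing values per column, measured on the raw data.
    frac = df.isna().mean()

    # Step 1: drop columns that are missing in more than `high` of the rows.
    df = df.drop(columns=frac[frac > high].index)

    # Step 2: for columns missing in fewer than `low` of the rows,
    # drop the rows that contain those missing values.
    sparse_cols = list(frac[(frac > 0) & (frac < low)].index)
    if sparse_cols:
        df = df.dropna(subset=sparse_cols)

    # Step 3: impute the remaining columns (median for numeric, mode otherwise).
    still_missing = df.isna().any()
    fills = {}
    for col in still_missing[still_missing].index:
        if pd.api.types.is_numeric_dtype(df[col]):
            fills[col] = df[col].median()
        else:
            fills[col] = df[col].mode().iloc[0]
    return df.fillna(fills)
```

Computing the missing fractions once, before any rows are dropped, keeps the three steps independent of the order in which columns are visited.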

[Figure: Average Value of Property in the Neighborhood]

Feature Selection

  • After imputing missing values, the dataset has 73 features, enough for the curse of dimensionality to become a concern.
  • Feature selection was performed using Pearson correlation.
  • The 18 features with the most impact were selected.
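
The selection step might look something like the sketch below. The post does not say how "impact" was measured, so this assumes features were ranked by the absolute value of their Pearson correlation with the target column. The function name `select_by_pearson` and the `target` argument are hypothetical.

```python
import pandas as pd


def select_by_pearson(df: pd.DataFrame, target: str, k: int = 18) -> list[str]:
    """Return the k numeric features with the largest |Pearson r| against `target`.

    Assumption: ranking by absolute correlation with the target; the
    original selection criterion is not described in detail.
    """
    # Pearson correlation is only defined for numeric columns.
    numeric = df.select_dtypes(include="number")
    corr_with_target = numeric.corr(method="pearson")[target].drop(target)
    ranked = corr_with_target.abs().sort_values(ascending=False)
    return ranked.head(k).index.tolist()
```

Ranking on the absolute value keeps strongly negative predictors, which would otherwise be discarded alongside the weak ones.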


Four regression models were developed:

  • Multiple Linear Regression Model (MLR model).
  • LASSO Regression Model.
  • Random Forest Regression Model.
  • Ridge Regression Model.
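
A comparison of these four models could be set up with scikit-learn along the following lines. Everything specific here is an assumption for illustration: the 80/20 split, the hyperparameters (`alpha=1.0`, `n_estimators=100`), the choice to compute MSE and maximum error on the test split, and the synthetic data standing in for the assessment roll.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import max_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, seed: int = 0) -> dict:
    """Fit four regressors and report train/test R^2, test MSE and test max error."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    # Hyperparameters are illustrative defaults, not the project's settings.
    models = {
        "Multiple Linear Regression": LinearRegression(),
        "LASSO Regression": Lasso(alpha=1.0),
        "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=seed),
        "Ridge Regression": Ridge(alpha=1.0),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[name] = {
            "train_r2": r2_score(y_tr, model.predict(X_tr)),
            "test_r2": r2_score(y_te, pred),
            "mse": mean_squared_error(y_te, pred),
            "max_error": max_error(y_te, pred),
        }
    return results


# Synthetic stand-in data (NOT the Buffalo dataset): a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=300)
results = compare_models(X, y)
for name, scores in results.items():
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

Reporting R-squared on both splits makes it possible to spot a model that fits the training data much better than unseen data.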

| Model | Train R-squared value | Test R-squared value | Mean Squared Error | Maximum Error |
|---|---|---|---|---|
| Multiple Linear Regression | 0.70918 | 0.68964 | 3881123589.51 | 689658.64 |
| LASSO Regression | 0.70831 | 0.68854 | 3894879623.56 | 690620.07 |
| Random Forest Regression | 0.98474 | 0.89058 | 1368317257.65 | 596768.00 |
| Ridge Regression | 0.70917 | 0.68960 | 3881670650.45 | 689261.19 |
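
The mean squared errors in the table are in squared units of the target, which makes them hard to read directly. As a small worked example, the snippet below takes the reported values as they appear in the table and derives two easier quantities: the root mean squared error (in the target's own units) and the gap between train and test R-squared. The dictionary names `reported` and `summary` are illustrative only.

```python
import math

# Values copied from the results table: (train R^2, test R^2, MSE).
reported = {
    "Multiple Linear Regression": (0.70918, 0.68964, 3881123589.51),
    "LASSO Regression": (0.70831, 0.68854, 3894879623.56),
    "Random Forest Regression": (0.98474, 0.89058, 1368317257.65),
    "Ridge Regression": (0.70917, 0.68960, 3881670650.45),
}

# RMSE = sqrt(MSE); the R^2 gap is train minus test.
summary = {
    name: {"rmse": math.sqrt(mse), "r2_gap": train_r2 - test_r2}
    for name, (train_r2, test_r2, mse) in reported.items()
}

for name, s in summary.items():
    print(f"{name:28s} RMSE={s['rmse']:>10.0f}  train-test R^2 gap={s['r2_gap']:.3f}")
```

On these numbers, Random Forest has both the lowest RMSE and the largest train-test gap (about 0.09, against about 0.02 for the linear models). A gap of that size is often read as a sign of some overfitting, even though its test R-squared is still the highest of the four.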