Predicting Singapore private property prices

Chew Lin Kiat
4 min read · Sep 22, 2020

I shared my dream of owning a private property in Singapore with my teammates. "How about doing a data science project on the property market?" one of them challenged me. Hmm… I wondered.

Property is all about location, location and location, as the saying goes. Singapore has three main regions. The core central region (CCR) covers the central business district, Orchard Road, districts 9, 10 and 11, and Sentosa. The rest of central region (RCR) lies just outside the CCR, in areas such as Queenstown, Telok Blangah, Little India and Toa Payoh, which are just minutes' drive from the central region. Everything beyond the RCR is the outside central region (OCR), at the outskirts: Punggol, Sengkang, Jurong, Tampines and East Coast.

Is it really all about location? After all, Singapore has one of the best transport networks in the world. It takes just minutes to travel anywhere in Singapore, whether you drive or take the train.

My curiosity got the better of me as I pored over data that I had web-scraped from one of Singapore’s property websites. I looked through several hundred listings and put them into a database. Hmm… which private property can I buy?

I used Python to ‘clean’ the data. Much of it had to be modified, such as converting numbers to integers and labelling the district codes by region (CCR, RCR or OCR, as mentioned earlier). After that, I used Python libraries: pandas to build the dataframe, matplotlib and seaborn to visualise the data, and scikit-learn to do the modelling.
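To give a flavour of this cleaning step, here is a minimal sketch in pandas. The column names (price, size_sqft, district), the sample rows and the small district-to-region lookup are all made up for illustration; they are not the scraped data or the exact mapping used in the project.

```python
import pandas as pd

# Hypothetical raw listings; column names and values are illustrative only.
raw = pd.DataFrame({
    "price": ["$1,250,000", "$3,800,000", "$980,000"],
    "size_sqft": ["1,023", "2,150", "753"],
    "district": ["D09", "D03", "D19"],
})

# Illustrative, partial district-to-region lookup (an assumption, not an official table).
DISTRICT_TO_REGION = {"D09": "CCR", "D10": "CCR", "D11": "CCR", "D03": "RCR", "D19": "OCR"}


def to_int(series: pd.Series) -> pd.Series:
    """Strip currency symbols and thousands separators, then cast to int."""
    return series.str.replace(r"[^\d]", "", regex=True).astype(int)


clean = raw.assign(
    price=to_int(raw["price"]),
    size_sqft=to_int(raw["size_sqft"]),
    region=raw["district"].map(DISTRICT_TO_REGION),  # unknown districts become NaN
)

print(clean)
```

Any district missing from the lookup shows up as NaN in the region column, which makes gaps in the mapping easy to spot before modelling.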

After much sweat and hard work, I managed to produce a heatmap of some of the factors affecting property prices. See below. I have labelled the selling price as target1.

Heatmap of factors affecting property prices (i.e. target1)

The heatmap is a snapshot of the correlations between the factors affecting property prices. We can see that property size, labelled size1, has a strong correlation of 0.85 with property price: the larger the property, the higher the selling price tends to be. Bedrooms and bathrooms are also highly correlated with each other (0.78), so we have to exclude one of them from our pricing model later.
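The numbers behind a heatmap like this come from a correlation matrix. The sketch below builds one with pandas on synthetic data and flags feature pairs that are strongly correlated with each other. The data is randomly generated, the helper correlated_feature_pairs is a hypothetical name, and the 0.75 threshold is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic listings for illustration only; these are not the scraped data.
rng = np.random.default_rng(0)
n = 500
bedroom = rng.integers(1, 6, size=n)
# Bathroom count tracks bedroom count closely, with a floor of one bathroom.
bathroom = np.clip(bedroom + rng.integers(-1, 2, size=n), 1, None)
size1 = 200 * bedroom + rng.normal(0, 400, size=n)
target1 = 1_500 * size1 + rng.normal(0, 200_000, size=n)

df = pd.DataFrame(
    {"size1": size1, "bedroom": bedroom, "bathroom": bathroom, "target1": target1}
)
corr = df.corr()  # pairwise Pearson correlations


def correlated_feature_pairs(corr: pd.DataFrame, target: str, threshold: float = 0.75):
    """Return (feature_a, feature_b, r) for feature pairs with |r| >= threshold."""
    features = [c for c in corr.columns if c != target]
    pairs = []
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 2)))
    return pairs


pairs = correlated_feature_pairs(corr, target="target1")
print(corr.round(2))
print(pairs)
```

Passing corr to seaborn's heatmap function would draw the chart itself; the pair list is simply a quick way to see which features should not go into the same model.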

Wait a minute, how about location as a factor? My teammates asked when they saw my heatmap.

I put location and tenure into an OLS (ordinary least squares) model to evaluate their influence on private property prices.

OLS model on factors affecting private property prices

Feature selection, as it is commonly called in data science, lets us choose which factors to use in explaining the target, i.e. the selling price of the private property. Factors whose p-value exceeds 0.05 are discarded, because for them we cannot reject the null hypothesis that the factor has no effect on price. For non data science readers, just know that the p-value helps data scientists decide which factors to consider. For comparison, I have shown both model 1, which takes in all the features, and model 2, which uses only the selected key features.

In model 2, we selected size, bedrooms, region-OCR and region-RCR. We excluded bathrooms, as mentioned earlier, because it is highly correlated with bedrooms, and we don’t want strongly correlated factors in the same model. Model 2 has an R squared and an adjusted R squared of 0.81, which means the selected factors explain about 81% of the variation in selling price.
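For readers who want to see how the region dummies and the two R squared figures fit together, here is a rough sketch. It uses randomly generated data and a plain least-squares solve from NumPy rather than the OLS routine used in the project, so the column names, the made-up price formula and the resulting numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic data with a made-up price formula; illustration only.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "size1": rng.uniform(400, 3000, size=n),
    "bedroom": rng.integers(1, 6, size=n),
    "region": rng.choice(["CCR", "RCR", "OCR"], size=n),
})
discount = df["region"].map({"CCR": 0, "RCR": -400_000, "OCR": -700_000})
df["target1"] = (
    1_000_000 + 1_500 * df["size1"] + 50_000 * df["bedroom"] + discount
    + rng.normal(0, 300_000, size=n)
)

# One-hot encode region; drop_first drops CCR, so it becomes the baseline
# and the remaining columns are region_OCR and region_RCR.
X = pd.get_dummies(
    df[["size1", "bedroom", "region"]], columns=["region"], drop_first=True
).astype(float)
X.insert(0, "const", 1.0)  # intercept column
y = df["target1"].to_numpy()

# Ordinary least squares via a least-squares solve.
coef, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)
resid = y - X.to_numpy() @ coef

ss_res = float(resid @ resid)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2 = 1 - ss_res / ss_tot  # share of price variance explained
k = X.shape[1] - 1  # number of predictors, excluding the intercept
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors

print(dict(zip(X.columns, coef.round(0))))
print(round(r2, 3), round(adj_r2, 3))
```

The coefficients on region_OCR and region_RCR are read relative to the dropped CCR baseline, which is why only two of the three regions appear in the model.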

Surely there are outliers in your study? My teammates asked.

It is a common problem with much of the data we analyse, I told my teammates as I shared my findings.

I plotted a violin chart of property selling price by region, as seen below.

Properties in the CCR have the most variance. The maximum selling price can be 1123%, or $26 million, above the mean price in the region. Outliers like this can reduce the accuracy of our model's predictions.
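A figure like "how far the maximum sits above the mean" can be computed per region with a pandas groupby. The sketch below uses a handful of invented prices, so its output will not match the 1123% figure above; the column names are assumptions.

```python
import pandas as pd

# Invented prices for illustration; not the scraped listings.
prices = pd.DataFrame({
    "region": ["CCR", "CCR", "CCR", "CCR", "RCR", "RCR", "RCR", "OCR", "OCR", "OCR"],
    "target1": [
        2_000_000, 3_000_000, 4_000_000, 15_000_000,
        1_500_000, 2_000_000, 2_500_000,
        900_000, 1_000_000, 1_100_000,
    ],
})

# Mean, maximum and spread of selling price within each region.
summary = prices.groupby("region")["target1"].agg(["mean", "max", "std"])
summary["max_above_mean"] = summary["max"] - summary["mean"]
summary["max_above_mean_pct"] = 100 * summary["max_above_mean"] / summary["mean"]

print(summary.round(1))
```

A region where the maximum is several times the mean is a sign that a few very expensive listings are stretching the distribution, which is what the long upper tail of a violin chart shows.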

After I shared my findings with my teammates, many asked about affordability, mortgage loans and cooling measures, and some even asked whether property prices are going to go up.

I told them that my model is based on existing property prices and the factors affecting prices today. Perhaps I can help answer their queries by incorporating their feedback into my model for my next data science project.

As for my own private property dream, I would need to do more research to make my findings more robust. Location is a key determinant of property prices. As my data science project shows, CCR properties command a price premium over the rest. Other factors such as freehold versus leasehold tenure are not as strong, especially if most of the leasehold properties were newly launched in the past 5 years.

Feel free to contact me on LinkedIn to discuss any data science project.
Click here to connect with me on LinkedIn.
If you want to view the code for this project, click here.
