
Why are vacant and abandoned properties an issue?
Buildings that are left abandoned or vacant for extended periods of time place a severe financial strain on cities. In addition to being a nuisance and safety hazard for neighboring residences and businesses, these properties demand more resources from police, fire, code enforcement and sometimes legal departments.
Working directly with cities across the country, we have seen a number of strategies to target known vacancies and deploy city resources in an attempt to mitigate these issues as quickly as possible. From vacant registries to special taxes and legal frameworks, many cities are looking for new ways to combat the problem. But limited capacity and a never-ending strain on city budgets restrict how much any one city can do.
In this article I will discuss how Tolemi engaged with one of our partners to use their existing data to predict not only which properties in the city were likely to be vacant, but also which leading factors signal this event. By using this “vacancy prediction model”, cities can direct resources where they are needed most, instead of playing “whack-a-mole” as issues present themselves. Lastly, by understanding the factors that “predict” vacancy, cities can proactively monitor properties that exhibit these symptoms and deploy the correct resources before the problem gets out of hand.
But we have a list of vacant properties!
In most Tolemi partner cities, the building department, code enforcement office or fire department compiles a list of known vacant buildings in the city. This list is often driven by drive-by inspections, citizen complaints or prior interactions with a city official. In our experience, this list represents only a fraction of the true set of vacant or abandoned properties, and that gap is magnified with the size of the city.
Since confirming the vacant status of a property typically requires a time-consuming site inspection, sending inspectors only to the most likely vacant candidates saves a lot of wasted effort. Just as important, many vacant properties are not on the radar of inspectors, so a data-driven approach will both increase the rate at which vacant properties are confirmed or discovered, and provide a more comprehensive view of the vacant properties in the city.
Plus, predicting when a property is at risk of becoming vacant requires data about existing vacancies. And since every city is different, and has different sources of data, a vital step in the process is using this list to “train” our prediction model.
The Tolemi approach to predicting vacancy
Step 1. Find Existing Vacant Properties
In the case of finding existing vacant properties, we will rely on an inspection event or some other method that the city determines to be definitive to indicate that the property is vacant. These are most likely properties tracked in some type of vacancy list mentioned above or that are part of an official city Vacant Registry. In cities without a definitive list, we have used BuildingBlocks to find a combination of events, such as five 311 calls, five visits by the fire department and no water usage over a certain period of time, to substitute for a vacancy event.
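To make the idea concrete, here is a minimal sketch of how such a proxy vacancy label might be derived. The column and file names (calls_311, fire_visits, water_usage) are illustrative assumptions, not Tolemi's actual schema or code:

```python
import pandas as pd

# Hypothetical parcel-level event table: one row per parcel, with counts and
# usage already aggregated over the chosen time window.
parcels = pd.read_csv("parcel_events.csv")

# Flag a parcel as a proxy "vacancy event" when it has at least five 311 calls,
# at least five fire department visits, and no recorded water usage.
parcels["proxy_vacant"] = (
    (parcels["calls_311"] >= 5)
    & (parcels["fire_visits"] >= 5)
    & (parcels["water_usage"] == 0)
)

training_labels = parcels[["parcel_id", "proxy_vacant"]]
```

The specific thresholds are something each city would tune to its own data; the point is that several weak signals, combined, can stand in for a definitive vacancy record.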
Whichever way that list is constructed, having a set of properties that we believe to be vacant fits well with predictive modeling. In addition, having a unified set of data from multiple sources tied to all these properties simplifies the data processing requirements for the model.
Step 2. Gather Available Data Sets
The only way to determine the factors that lead to vacancy, and eventually to find ways to eliminate or mitigate those factors, is to have multiple sets of data to work with. This starts by pulling data from multiple city systems, cleaning it, geo-coding it and tying it to the individual parcel level. Luckily this is what Tolemi does best, so we are easily able to collect and aggregate this data for use in our model.
While every city has different systems and tracks different data, some of the most used data sets for this process are:
- Code Violations: The result of complaint driven inspections
- Building Permits: For planned or past work that is permitted
- Fire Incidents: Basic NFIRS data from CAD systems
- Police Incidents: Either tied to a specific address or, in some cases, a neighborhood
- 311 Calls: Citizen complaints or calls for service
- Utility: Either the status of, or a history of, water or electricity bills being paid
- Taxes Owed: Status of taxes owed and how long they have been outstanding
- Assessed Value: The value of the property
- ACS: Block-group-level figures such as vacancy rate or owner-occupancy rate can be helpful.
Again, this is not an exhaustive list, nor a required list of data, but it serves to prove the point that having multiple sources of data, all tied to an individual parcel, is the only way to accurately provide such an analysis.
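As a rough illustration of what that parcel-level aggregation looks like, here is a minimal sketch assuming each source has already been cleaned and geocoded to a shared parcel_id; all file and column names are assumptions for the example:

```python
import pandas as pd

# Each source has been cleaned and geocoded to a shared parcel_id upstream.
parcels    = pd.read_csv("parcels.csv")           # parcel_id, assessed_value, taxes_owed
violations = pd.read_csv("code_violations.csv")   # parcel_id, violation_date, ...
permits    = pd.read_csv("building_permits.csv")  # parcel_id, permit_date, ...
calls_311  = pd.read_csv("calls_311.csv")         # parcel_id, call_date, ...

def count_per_parcel(events, name):
    # Collapse an event table to one count per parcel.
    return events.groupby("parcel_id").size().rename(name).reset_index()

# Join everything into one row per parcel. A real feature table would also
# capture recency, severity, utility status, ACS block-group rates, etc.
features = (
    parcels
    .merge(count_per_parcel(violations, "n_violations"), on="parcel_id", how="left")
    .merge(count_per_parcel(permits, "n_permits"), on="parcel_id", how="left")
    .merge(count_per_parcel(calls_311, "n_311_calls"), on="parcel_id", how="left")
    .fillna(0)
)
```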

Step 3. Find Properties That Are Likely Vacant (Create the Model)
With a good accounting of the vacancy problem, we can then begin to predict which properties are likely to become vacant in the near future. While the type of data we use to predict future vacancies will be similar to that used to find existing vacants, there are some important differences in how we approach the problem and process the data. In short, this problem fits well within a traditional predictive model.
I will delve into the specifics of this, the model used and the tuning of that model, in my next post for those interested in the technical details.
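In the meantime, here is a minimal sketch of the general shape of such a model, using a scikit-learn gradient boosting classifier on the feature table and labels from the earlier steps. This is an assumption for illustration; the actual model choice and tuning may well differ:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Join the parcel features with the known/proxy vacancy labels from Step 1.
data = features.merge(training_labels, on="parcel_id")
X = data.drop(columns=["parcel_id", "proxy_vacant"])
y = data["proxy_vacant"]

# Hold out a test set so the model can be evaluated on parcels it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Score every parcel with a probability of being vacant, for ranking later.
data["vacancy_score"] = model.predict_proba(X)[:, 1]
```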
Step 4. Validate the Model
Once the model has been run, an essential component of any data analysis is “tuning” that model. While I will delve into those specifics as part of the technical details mentioned above, we also have another, more manual way to see if our predictions are correct. As we have been blessed to have multiple partners work with us on implementing this approach to predicting vacancy, there is a non-technical way to see if we are correct: go check.
One large city working with Tolemi started with only 150 known vacants, a tiny fraction of what we felt was the true number. Of the top 50 properties that the model predicted as most likely to be vacant, 46 were confirmed vacant through inspection by the city.
In another city, we were lucky enough to spend two days tracking the top 20 properties that the model had predicted were vacant. Of those 20, 18 were confirmed vacant. The most interesting development was that the 2 properties that were not vacant had pulled building permits within the last month. This showed us how important it was to have that data set in the model.
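In model terms, these field checks amount to measuring precision at the top of the ranked list. A sketch of that check, assuming the scored table from Step 3 and a hypothetical confirmed_vacant column recording the inspection outcomes:

```python
# Rank parcels by the model's vacancy score and take the top k for inspection.
k = 50
top_k = data.sort_values("vacancy_score", ascending=False).head(k)

# confirmed_vacant is a hypothetical 0/1 column filled in after field visits.
precision_at_k = top_k["confirmed_vacant"].mean()
print(f"Precision at top {k}: {precision_at_k:.0%}")  # e.g. 46/50 = 92%
```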
Step 5. Prioritize and Proactively Deploy Resources
At the end of the day, predicting where vacant properties are and determining which factors are most indicative of vacancy does not actively prevent anything. But it does provide cities with a real set of data and a real set of properties to help start restoring these neighborhoods and properties. With limited resources and budgets, having inspectors or fire marshals drive around the city looking for problems, or waiting for citizens to call the council or complain to the newspapers, cannot be the best solution.
By having a list of where “the worst” properties are, our partners are directing their resources to where they are needed most. By using this predictive model, cities are utilizing resources most effectively by targeting properties they have confidence are a problem. And just as important, knowing which factors contribute most to properties becoming vacant, and being alerted to these conditions before anyone calls or the property falls into disrepair, allows the city to deploy the right strategy or resources based on that data.
What’s next?
As mentioned, in my next post I will dive into the model itself and the decisions and challenges on the technical side of things. I have been tweaking and changing the model based on results and inputs from multiple cities across the country, and I look forward to optimizing it with help from other partners in the future.
One thing is sure though: the future of running advanced algorithms on data that cities already have, in multiple, traditionally siloed systems, is very bright. Perhaps there are other insights we as a company or community can derive from this massive amount of data. I look forward to those discussions and challenges and hope to tell you about them in this space as well.