This project cleans and prepares the Nashville Housing dataset for analysis. The raw dataset contains several issues that need to be resolved, including:
-- Inconsistent date formats
-- Null property address values
-- Address columns that combine several components in a single field
-- SoldAsVacant values stored as 'Y'/'N' instead of 'Yes'/'No'
-- Duplicate rows
Tools used:
-- MySQL
-- DataGrip
The main data cleaning steps include:
-- Standardizing the SaleDate format
-- Populating null PropertyAddress values using a self-join
-- Splitting addresses into individual columns using SUBSTRING_INDEX, the MySQL counterpart to SQL Server's PARSENAME
-- Converting 'Y'/'N' to 'Yes'/'No' for SoldAsVacant using a CASE expression
-- Removing duplicates using the ROW_NUMBER() window function
-- Dropping columns not needed for analysis
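
The steps above can be sketched in MySQL 8 roughly as follows. This is an illustration under stated assumptions, not the project's actual script. The table name nashville_housing, the columns UniqueID, ParcelID and OwnerAddress, the new split-column names, the raw date format, the address layouts, the duplicate key, and the column chosen for dropping are all hypothetical. Only SaleDate, PropertyAddress and SoldAsVacant are named in this description.

```sql
-- Sketch only: nashville_housing, UniqueID, ParcelID, OwnerAddress and the
-- new split-column names are assumed; adjust them to the actual schema.

-- 1. Standardize SaleDate. Assumes a text column holding values such as
--    'April 9, 2013'; change the format string if the raw data differs.
UPDATE nashville_housing
SET SaleDate = STR_TO_DATE(SaleDate, '%M %e, %Y');

ALTER TABLE nashville_housing
    MODIFY COLUMN SaleDate DATE;

-- 2. Fill null PropertyAddress values from another row with the same parcel.
--    Assumes rows sharing a ParcelID share a property address.
UPDATE nashville_housing AS a
JOIN nashville_housing AS b
    ON  a.ParcelID = b.ParcelID
    AND a.UniqueID <> b.UniqueID
SET a.PropertyAddress = b.PropertyAddress
WHERE a.PropertyAddress IS NULL
  AND b.PropertyAddress IS NOT NULL;

-- 3. Split addresses on commas. Assumes 'street, city' for the property
--    address and 'street, city, state' for the owner address.
ALTER TABLE nashville_housing
    ADD COLUMN PropertyStreet VARCHAR(255),
    ADD COLUMN PropertyCity   VARCHAR(255),
    ADD COLUMN OwnerStreet    VARCHAR(255),
    ADD COLUMN OwnerCity      VARCHAR(255),
    ADD COLUMN OwnerState     VARCHAR(255);

UPDATE nashville_housing
SET PropertyStreet = TRIM(SUBSTRING_INDEX(PropertyAddress, ',', 1)),
    PropertyCity   = TRIM(SUBSTRING_INDEX(PropertyAddress, ',', -1)),
    OwnerStreet    = TRIM(SUBSTRING_INDEX(OwnerAddress, ',', 1)),
    -- Keep the first two parts, then take the last of those (the middle part).
    OwnerCity      = TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(OwnerAddress, ',', 2), ',', -1)),
    OwnerState     = TRIM(SUBSTRING_INDEX(OwnerAddress, ',', -1));

-- 4. Expand 'Y'/'N' to 'Yes'/'No'; other values are left unchanged.
--    The column must be wide enough to hold 'Yes'.
UPDATE nashville_housing
SET SoldAsVacant = CASE SoldAsVacant
        WHEN 'Y' THEN 'Yes'
        WHEN 'N' THEN 'No'
        ELSE SoldAsVacant
    END;

-- 5. Remove duplicates, keeping the row with the lowest UniqueID per group.
--    The PARTITION BY list is an assumed definition of "duplicate".
DELETE t
FROM nashville_housing AS t
JOIN (
    SELECT UniqueID,
           ROW_NUMBER() OVER (
               PARTITION BY ParcelID, PropertyAddress, SaleDate
               ORDER BY UniqueID
           ) AS row_num
    FROM nashville_housing
) AS ranked
    ON t.UniqueID = ranked.UniqueID
WHERE ranked.row_num > 1;

-- 6. Drop a column that is no longer needed (hypothetical choice).
ALTER TABLE nashville_housing
    DROP COLUMN OwnerAddress;
```

Running the SELECT inside step 5 on its own first is a cheap way to inspect which rows would be deleted. Working on a copy of the raw table keeps the destructive steps (5 and 6) reversible.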
The cleaned Nashville Housing dataset contains:
-- Standardized date formats
-- No null PropertyAddress values
-- Split address columns for property and owner
-- Easy-to-understand SoldAsVacant values
-- No duplicate rows
-- Only columns that will be used for analysis
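
A few sanity checks along these lines could confirm the properties listed above. They are a sketch, assuming a table named nashville_housing with a ParcelID column (both hypothetical) and the same assumed definition of a duplicate row as a shared ParcelID, PropertyAddress and SaleDate.

```sql
-- Should return 0 if every PropertyAddress was populated.
SELECT COUNT(*) AS null_addresses
FROM nashville_housing
WHERE PropertyAddress IS NULL;

-- Should list no 'Y' or 'N' values if the conversion worked.
SELECT SoldAsVacant, COUNT(*) AS row_count
FROM nashville_housing
GROUP BY SoldAsVacant;

-- Should return no rows if duplicates were removed
-- (the grouping columns are an assumed duplicate key).
SELECT ParcelID, PropertyAddress, SaleDate, COUNT(*) AS copies
FROM nashville_housing
GROUP BY ParcelID, PropertyAddress, SaleDate
HAVING COUNT(*) > 1;
```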