LTBoost

Boosted hybrids of ensemble gradient algorithms for long-term time series forecasting (LTSF)

Datasets

The LTBoost framework is evaluated on nine well-established open-source benchmarks for long-term time series forecasting. The datasets cover traffic, weather, disease spread, and industrial use cases, and all of them are multivariate. They are representative of forecasting problems encountered in practice, which makes them well suited to evaluating models that aim to be lightweight, efficient, and fast at multivariate long-term forecasting.

Dataset Descriptions

Below is a summary of the datasets used:

Electricity 🔗 ¹: Hourly electricity consumption data of 321 clients from 2012 to 2014 with a total of 26,304 time steps. The last client is labeled as the target value "OT".
Exchange Rate ²: Daily exchange rates of eight countries' currencies against the US dollar from 1990 to 2010, totaling 7,588 timesteps.
Traffic 🔗: Hourly road occupancy rates of 862 sensors on San Francisco highways during 2015-2016, comprising 17,544 timesteps. The last sensor is labeled as the target "OT".
Weather 🔗: Data recorded every 10 minutes throughout 2020, comprising 52,696 timesteps of 21 weather indicators. The target value is the $CO_2$ concentration, labeled as "OT".
ILI 🔗: Weekly ratios of patients with influenza-like illness from seven Centers for Disease Control and Prevention in the US, spanning from 2002 to 2021 with 966 time steps.
ETT 🔗 ¹: Datasets ETTh1, ETTh2 (hourly) and ETTm1, ETTm2 (15-minute-level) detailing oil and load features of electricity transformers from July 2016 to July 2018.

Dataset Characteristics Table

The following table summarizes each dataset's number of variates, number of timesteps, granularity, and stationarity test result. More detailed descriptions and additional tables are given under Detailed Dataset Descriptions.

Dataset	#Variates	#Timesteps	Granularity	ADF p-value
Electricity	321	26,304	hourly	0.0000
Exchange Rate	8	7,588	daily	0.4166
Traffic	862	17,544	hourly	0.0000
Weather	21	52,696	10 min	0.0000
ILI	7	966	weekly	0.7598
ETTh1	7	17,420	hourly	0.0246
ETTh2	7	17,420	hourly	0.0401
ETTm1	7	69,680	15 min	0.0028
ETTm2	7	69,680	15 min	0.0014

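The p-values come from augmented Dickey-Fuller (ADF) unit root tests run on each dataset. The null hypothesis of the ADF test is that the series has a unit root, so a small p-value rejects the null and indicates stationarity, while a large p-value gives no evidence against non-stationarity. Stationarity matters here because it affects how reliably long-term behavior can be modeled.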
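As a rough illustration of how the p-values in the table can be read, the sketch below flags the datasets for which the ADF test does not reject a unit root. The 0.05 significance level and the helper name `rejects_unit_root` are assumptions made for this example; the source does not state which threshold was used.

```python
# ADF p-values copied from the dataset characteristics table.
ADF_P_VALUES = {
    "Electricity": 0.0000,
    "Exchange Rate": 0.4166,
    "Traffic": 0.0000,
    "Weather": 0.0000,
    "ILI": 0.7598,
    "ETTh1": 0.0246,
    "ETTh2": 0.0401,
    "ETTm1": 0.0028,
    "ETTm2": 0.0014,
}

ALPHA = 0.05  # assumed significance level, not stated in the source


def rejects_unit_root(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True when the ADF null hypothesis (unit root) is rejected."""
    return p_value < alpha


# Datasets where the test gives no evidence of stationarity at ALPHA.
non_stationary_candidates = [
    name for name, p in ADF_P_VALUES.items() if not rejects_unit_root(p)
]
print(non_stationary_candidates)  # ['Exchange Rate', 'ILI']
```

Under this assumed threshold, only Exchange Rate and ILI fail to reject the unit root; a stricter level such as 0.01 would also flag ETTh1 and ETTh2.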

Detailed Dataset Descriptions

This appendix provides a more detailed description of the nine real-life datasets used in the evaluation of the LTBoost framework. These datasets, selected for their complexity and real-world applicability, span a wide range of domains: electricity consumption, currency exchange rates, traffic occupancy rates, weather conditions, influenza-like illness occurrences, and electricity transformer temperatures.

Electricity

The Electricity dataset, available from the well-established UC Irvine Machine Learning Repository, contains hourly electricity consumption data (in kWh) for 321 clients from 2012 to 2014. A cleaned version of this dataset is used, with 26,304 timesteps. The final client is designated as target value OT.

Exchange Rate

The Exchange Rate dataset captures the daily exchange rates of eight countries' currencies against the U.S. dollar, encompassing Australia, Great Britain, Canada, Switzerland, China, Japan, New Zealand, and Singapore. Spanning from January 1, 1990, to October 10, 2010, this dataset offers a detailed view through 7,588 timesteps. For a succinct summary of the currencies involved, refer to the table below.

Variate	Country	Currency
0	Australia	Australian dollar (AUD)
1	Great Britain	Sterling (GBP)
2	Canada	Canadian dollar (CAD)
3	Switzerland	Swiss franc (CHF)
4	China	Renminbi (CNY)
5	Japan	Japanese yen (JPY)
6	New Zealand	New Zealand dollar (NZD)
OT	Singapore	Singapore dollar (SGD)

Traffic

The Traffic dataset, derived from the California Department of Transportation, features road occupancy rates measured on a scale from 0 to 1 across 17,544 hourly timesteps, spanning the years 2015 to 2016. Data were collected using 862 sensors deployed along the freeways of San Francisco, with the final sensor designated as target value OT.
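The hourly timestep counts quoted for Traffic (and for Electricity above) can be sanity-checked with simple date arithmetic. The sketch below assumes each dataset covers whole calendar years with no gaps, which the source does not state explicitly; `expected_steps` is a hypothetical helper written for this check.

```python
from datetime import datetime, timedelta


def expected_steps(start: datetime, end: datetime, step: timedelta) -> int:
    """Number of regular timesteps in the half-open interval [start, end)."""
    return (end - start) // step


hourly = timedelta(hours=1)

# Assumption: full calendar years without missing hours.
traffic_steps = expected_steps(datetime(2015, 1, 1), datetime(2017, 1, 1), hourly)
electricity_steps = expected_steps(datetime(2012, 1, 1), datetime(2015, 1, 1), hourly)

print(traffic_steps)      # 17544 (2016 is a leap year)
print(electricity_steps)  # 26304 (2012 is a leap year)
```

Both results match the documented counts of 17,544 and 26,304 timesteps.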

Weather

The Weather dataset, recorded in 2020, has a 10-minute granularity and contains 21 weather indicators across 52,696 timesteps. The final indicator, the $CO_2$ concentration, is designated as target value OT. For an introductory summary of these indicators, see the table below.

Symbol	Unit	Variable
p	mbar	air pressure
T	°C	air temperature
T_pot	K	potential temperature
T_dew	°C	dew point temperature
rh	%	relative humidity
VP_max	mbar	saturation water vapor pressure
VP_act	mbar	actual water vapor pressure
VP_def	mbar	water vapor pressure deficit
sh	g kg^-1	specific humidity
H2OC	mmol mol^-1	water vapor concentration
rho	g m^-3	air density
wv	m s^-1	wind velocity
max. wv	m s^-1	maximum wind velocity
wd	degrees	wind direction
rain	mm	precipitation
raining	s	duration of precipitation
SWDR	W m^-2	short wave downward radiation
PAR	μmol m^-2 s^-1	photosynthetically active radiation
max. PAR	μmol m^-2 s^-1	maximum photosynthetically active radiation
Tlog	°C	internal logger temperature
CO2	ppm	CO$_2$-concentration of ambient air

Influenza-like Illness (ILI)

The Influenza-like Illness (ILI) dataset encapsulates the weekly incidence rates of influenza-like illness reported by seven Centers for Disease Control and Prevention (CDC) across the United States. Spanning from 2002 to 2021, this dataset aggregates a total of 966 weekly observations, offering a comprehensive view of ILI trends over nearly two decades.

The dataset is particularly notable for its detailed breakdown of ILI cases, categorizing data according to patient age groups and the reporting healthcare providers, among other variates. Such granularity enables nuanced analyses of ILI spread patterns, potentially aiding in the development of targeted public health responses.

Variable	Description
% WEIGHTED ILI	Percentage of ILI patients weighted by population size
% UNWEIGHTED ILI	Unweighted percentage of ILI patients
AGE 0-4	Number of 0-4 year old ILI patients
AGE 5-24	Number of 5-24 year old ILI patients
ILITOTAL	Total number of ILI patients
NUM. OF PROVIDERS	Number of healthcare providers
OT	Total number of patients

Electricity Transformer Temperature (ETT)

The ETT datasets include ETTh1 and ETTh2 (hourly granularity) and ETTm1 and ETTm2 (15-minute granularity). They record seven oil and load features for two electricity transformers from July 2016 to July 2018. The hourly datasets contain 17,420 timesteps each, and the 15-minute datasets contain 69,680 timesteps each.

Variable	Description
HUFL	High Useful Load
HULL	High Useless Load
MUFL	Middle Useful Load
MULL	Middle Useless Load
LUFL	Low Useful Load
LULL	Low Useless Load
OT	Oil Temperature
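The column layout in the table above can be used to separate the six load features from the OT target. The following is a minimal sketch using only the standard library; the `date` column name and all sample values are made up for illustration and are not taken from the actual ETT files.

```python
import csv
import io

# Made-up sample rows for illustration only. The "date" column name and every
# value below are assumptions, not data from the real ETT files.
SAMPLE_CSV = """date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2016-07-01 00:00:00,1.0,2.0,3.0,4.0,5.0,6.0,30.0
2016-07-01 01:00:00,1.5,2.5,3.5,4.5,5.5,6.5,29.0
"""

TARGET = "OT"
TIME_COLUMN = "date"  # hypothetical column name


def split_features_target(text: str):
    """Split CSV text into feature names, feature rows, and the OT target."""
    reader = csv.DictReader(io.StringIO(text))
    feature_names = [
        name for name in reader.fieldnames if name not in (TIME_COLUMN, TARGET)
    ]
    features, target = [], []
    for row in reader:
        features.append([float(row[name]) for name in feature_names])
        target.append(float(row[TARGET]))
    return feature_names, features, target


feature_names, features, target = split_features_target(SAMPLE_CSV)
print(feature_names)  # ['HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL', 'LULL']
print(target)         # [30.0, 29.0]
```

The same pattern would apply to the other datasets, since each labels its target variate OT.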
