Time Series Data Analysis-2

Time Series Data Analysis-2#

Month-to-Month Analysis#

In this section, we explore how values change from one month to the next, allowing us to detect seasonality and short-term patterns.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import numpy as np

!git submodule update --init --recursive

Monthly data Aggregation#

We prompt the user to specify a year (16–18) and a month (e.g., 1–12 for January to December).
Construct a file path dynamically to load the corresponding CSV file containing the time-series data for all the months of given year.
The selected data is then read into a pandas dataframe (df), which becomes the basis for further analysis.

import os
current_dir = os.getcwd()
data_dir = "solar-data/pv-data"

file_path18 = []
file_path17 = []
file_path16 = []
for j in range(12):
    file_path18.append(os.path.join(current_dir, data_dir, f"f 2018 {j+1}.csv"))
    file_path17.append(os.path.join(current_dir, data_dir, f"f 2017 {j+1}.csv"))
    file_path16.append(os.path.join(current_dir, data_dir, f"f 2016 {j+1}.csv"))

import pandas as pd

dtype_dict = {
    'f': 'float32'  # Convert numeric columns to float32 to reduce memory usage
}

df18 = [pd.read_csv(file, dtype=dtype_dict) for file in file_path18]

df17 = [pd.read_csv(file, dtype=dtype_dict) for file in file_path17]

df16 = [pd.read_csv(file, dtype=dtype_dict) for file in file_path16]

df = [df16,df17,df18]

#For which year would you like a month to month analysis?(16 - 18)
var=16

dff = df[var - 16]

months = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

Metrics Calculation#

Mean: The average value of the frequency variable (f) for all the months in the selected year.
Standard Deviation: The variability of the frequency data around the mean.
Skewness: The asymmetry of the frequency distribution.

mean = []
for i in range(12):
    alt = dff[i]
    mean.append(alt['f'].mean())
print(f'The MEAN of the months in year 20{var} from January to December is:')
mean

The MEAN of the months in year 2016 from January to December is:

[50.000126,
999954,
999332,
999886,
999596,
99984,
99971,
99976,
999607,
00025,
99959,
999355]

stdev = []
for i in range(12):
    alt = dff[i]
    stdev.append(alt['f'].std())
print(f'The STANDARD DEVIATION of the months in year 20{var} from January to December is:')
stdev

The STANDARD DEVIATION of the months in year 2016 from January to December is:

[0.055868156,
054133136,
056034606,
059400223,
05720208,
05436331,
055509906,
052781273,
0532314,
05495323,
056555677,
05530752]

skew = []
for i in range(12):
    alt = dff[i]
    skew.append(alt['f'].skew())
print(f'The SKEWNESS of the months in year 20{var} from January to December is:')
skew

The SKEWNESS of the months in year 2016 from January to December is:

[0.18413661,
17962934,
16257153,
21715383,
1926185,
17057034,
18004876,
24801944,
30347863,
18840456,
15077758,
13207896]

Visualization#

Clear month to month changes of mean, standard deviation and skewness for the given year to visualize changes within the year.

plt.figure(figsize=(12, 5))
plt.plot(months[0:], mean[0:], marker='o', color='b', linestyle='-', label=f'Month to Month Change in Mean in 20{var}')
plt.xlabel(f'Year 20{var}')
plt.ylabel('Mean')
plt.title(f'Month to Month Change in Mean in 20{var}')
plt.grid(True)
plt.legend()
plt.show()

../../_images/242935c669a37dc7823b7791dfe45b24cca8bd60f35f41d44fb1bd52ba2059ca.png

plt.figure(figsize=(12, 5))
plt.plot(months[0:], stdev[0:], marker='o', color='b', linestyle='-', label=f'Month to Month Change in Standard Deviation in 20{var}')
plt.xlabel(f'Year 20{var}')
plt.ylabel('Standard Deviation')
plt.title(f'Month to Month Change in Standard Deviation in 20{var}')
plt.grid(True)
plt.legend()
plt.show()

../../_images/d36199fc0e2941cff86e735cc05bd5d7120b0153e9f933f30e56b18b0530dc66.png

plt.figure(figsize=(12, 5))
plt.plot(months[0:], skew[0:], marker='o', color='b', linestyle='-', label=f'Month to Month Change in Skewness in 20{var}')
plt.xlabel(f'Year 20{var}')
plt.ylabel('Skewness')
plt.title(f'Month to Month Change in Skewness in 20{var}')
plt.grid(True)
plt.legend()
plt.show()

../../_images/19384510e13890a6715f7d7189421a1bbbbd018c037019b37590c8e13abcbe5d.png

Crosstables#

Crosstables for metrics such as mean, standard deviation, skewness and CPS1 score for all the months and years are created.
They provide an overview of how specific metrics change over months within a year and allow easy year-to-year comparisons.

mean_v = []
std_dev = []
sk_ew =[]
for j in range(3):
    for i in range(12):
        alt = df[j][i]
        mean_v.append(alt['f'].mean())
        std_dev.append(alt['f'].std())
        sk_ew.append(alt['f'].skew())

df_10min = []
for j in range(3):
    print(f"Processing year {2016+1}")
    for i in range(12):
        print(f"- Processing month {i+1}")
        df[j][i]['dtm'] = pd.to_datetime(df[j][i]['dtm'], utc=True)
        df_10min.append(df[j][i].resample('10min', on='dtm').mean())

Processing year 2017
- Processing month 1
- Processing month 2
- Processing month 3
- Processing month 4
- Processing month 5
- Processing month 6
- Processing month 7
- Processing month 8
- Processing month 9
- Processing month 10
- Processing month 11
- Processing month 12
Processing year 2017
- Processing month 1
- Processing month 2
- Processing month 3
- Processing month 4
- Processing month 5
- Processing month 6
- Processing month 7
- Processing month 8
- Processing month 9
- Processing month 10
- Processing month 11
- Processing month 12
Processing year 2017
- Processing month 1
- Processing month 2
- Processing month 3
- Processing month 4
- Processing month 5
- Processing month 6
- Processing month 7
- Processing month 8
- Processing month 9
- Processing month 10
- Processing month 11
- Processing month 12

for i in range(len(df_10min)):
    df_10min[i]['del_f'] = df_10min[i]['f'] - 50

cps1 = []
df1_10min = df_10min
e1 = 0.12
for i in range(len(df1_10min)):
    cps_ratio = df1_10min[i]['del_f'] * df1_10min[i]['del_f'] / (e1 ** 2)
    cps = (2 - cps_ratio)*100
    cps1.append(cps.to_frame(name='cps'))

mean_cps1 = []
meancps1 = 0
i = 0
for i in range(len(df1_10min)):
    mean_cps1.append(cps1[i]['cps'].mean())

start_date = datetime(2016, 1, 1)
end_date = datetime(2018, 12, 1)
date_list = []
while start_date <= end_date:
    date_str = start_date.strftime('%B %Y')
    date_list.append(date_str)
    start_date += timedelta(days=31)
    start_date = start_date.replace(day=1)

years = list(range(2016, 2019))
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

data = {
    'Year': np.repeat(years, len(months)),
    'Month': months * len(years),
    'Mean': mean_v,
    'Standard Deviation': std_dev,
    'Skewness': sk_ew,
    'CPS1 score': mean_cps1
}

dff = pd.DataFrame(data)

mean_crosstable = dff.pivot(index='Month', columns='Year', values='Mean')
std_crosstable = dff.pivot(index='Month', columns='Year', values='Standard Deviation')
skew_crosstable = dff.pivot(index='Month', columns='Year', values='Skewness')
cps1_crosstable = dff.pivot(index='Month', columns='Year', values='CPS1 score')
mean_crosstable_str = mean_crosstable.to_string()
std_crosstable_str = std_crosstable.to_string()
skew_crosstable_str = skew_crosstable.to_string()
cps1_crosstable_str = cps1_crosstable.to_string()

pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Mean Crosstable:")
print(mean_crosstable_str)

Mean Crosstable:
Year        2016       2017       2018
Month                                 
Apr    49.999886  49.999542  50.000671
Aug    49.999760  49.999725  49.999870
Dec    49.999355  49.999798  50.000088
Feb    49.999954  49.999676  49.999046
Jan    50.000126  49.999775  49.998268
Jul    49.999710  49.999832  49.999409
Jun    49.999840  49.999714  50.000587
Mar    49.999332  49.999691  49.998421
May    49.999596  49.999458  49.998764
Nov    49.999592  49.999886  49.999401
Oct    50.000252  49.999844  50.000671
Sep    49.999607  49.999912  49.999363

print("Standard Deviation Crosstable:")
print(std_crosstable_str)

Standard Deviation Crosstable:
Year       2016      2017      2018
Month                              
Apr    0.059400  0.057603  0.065470
Aug    0.052781  0.059625  0.066557
Dec    0.055308  0.060521  0.065476
Feb    0.054133  0.062952  0.066638
Jan    0.055868  0.057272  0.063057
Jul    0.055510  0.058447  0.066160
Jun    0.054363  0.058637  0.065116
Mar    0.056035  0.061541  0.066643
May    0.057202  0.057244  0.061135
Nov    0.056556  0.062230  0.070056
Oct    0.054953  0.065040  0.069178
Sep    0.053231  0.060592  0.068534

print("\nSkewness Crosstable:")
print(skew_crosstable_str)

Skewness Crosstable:
Year       2016      2017      2018
Month                              
Apr    0.217154  0.135294  0.173611
Aug    0.248019  0.160047  0.080792
Dec    0.132079  0.143847  0.027416
Feb    0.179629  0.175180  0.210985
Jan    0.184137  0.141553  0.192764
Jul    0.180049  0.167786  0.132517
Jun    0.170570  0.242695  0.225854
Mar    0.162572  0.155781  0.258984
May    0.192619  0.162235  0.274101
Nov    0.150778  0.084164  0.065809
Oct    0.188405  0.214123  0.031424
Sep    0.303479  0.220359  0.103189

print("\nCPS1 score Crosstable:")
print(cps1_crosstable_str)

CPS1 score Crosstable:
Year         2016        2017        2018
Month                                    
Apr    183.996201  184.516281  181.385391
Aug    186.952255  183.442032  181.345306
Dec    185.348511  182.996033  180.037445
Feb    185.683640  181.742096  180.694016
Jan    184.723755  184.840958  182.318665
Jul    186.201004  184.388245  182.062576
Jun    187.295776  183.947220  181.632950
Mar    185.523132  182.429382  179.868927
May    185.692612  184.486923  182.894455
Nov    185.200119  183.108994  177.509521
Oct    186.462555  181.964569  178.959198
Sep    185.969696  183.100433  179.720093

Purpose of month-to-month analysis#

Comparative Analysis:
- Allow quick comparisons between months within the same year.
- Enable cross-year comparisons for the same month (e.g., January 2016 vs. January 2018).
Trend Identification:

Help identify long-term trends, such as:
- Increasing variability (standard deviation) over years.
- Changes in distribution symmetry (skewness) over time.
- Seasonal or cyclical patterns in frequency (e.g., high means in summer months).
Summarized Insights:
- Provide a clear, high-level summary of key metrics without requiring extensive manual exploration of raw data.

Yearly Data Aggegration#

Data for each year (2016–2018) is stored as separate dataframes (df_2016, df_2017, etc.).
Monthly data within each year is concatenated to create a complete yearly dataset.
All yearly datasets are appended to a list yearsData for easier iteration and analysis.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_2018 = pd.concat(df18).reset_index(drop=True)

df_2017 = pd.concat(df17).reset_index(drop=True)

df_2016 = pd.concat(df16).reset_index(drop=True)

yearsData =[]
yearsData.append(df_2016)
yearsData.append(df_2017)
yearsData.append(df_2018)

Computation of Metrics:#

For each year, the following statistics are calculated:

Mean
Standard Deviation
Skewness

The yearly mean, standard deviation, and skewness are displayed in ascending order for quick inspection.

mean = []
for i in range(3):
    var = yearsData[i]
    mean.append(var['f'].mean())
print('The MEAN of the yearly data from 2016 to 2018 in ascending order is:')
mean

The MEAN of the yearly data from 2016 to 2018 in ascending order is:

[49.999775, 49.9998, 49.999474]

stdev = []
for i in range(3):
    var1 = yearsData[i]
    stdev.append(var1['f'].std())
print('The STANDARD DEVIATION of the yearly data from 2016 to 2018 in ascending order is:')
stdev

The STANDARD DEVIATION of the yearly data from 2016 to 2018 in ascending order is:

[0.05547456, 0.060169745, 0.0661998]

skew = []
for i in range(3):
    var2 = yearsData[i]
    skew.append(var2['f'].skew())
print('The SKEWNESS of the yearly data from 2016 to 2018 in ascending order is:')
skew

The SKEWNESS of the yearly data from 2016 to 2018 in ascending order is:

[0.1913346, 0.16806537, 0.14315064]

years = ['2016','2017','2018']

Visualization of Yearly Trends#

Three separate line plots are created to illustrate:

Changes in mean over the years.
Trends in standard deviation across years.
Variations in skewness across the dataset for each year.

These plots help visualize how the frequency variable (f) evolves from 2016 to 2018.

plt.figure(figsize=(8, 5))
plt.plot(years[0:], mean[0:], marker='o', color='b', linestyle='-', label='Year-to-Year Change in Mean')
plt.xlabel('Year')
plt.ylabel('Mean')
plt.title('Year-to-Year Change in Mean')
plt.grid(True)
plt.legend()
plt.show()

../../_images/e727eb4fd34a488f87fa8451bf903c294b153aaa15e5bc4a0f43684341ce4506.png

plt.figure(figsize=(8, 5))
plt.plot(years[0:], stdev[0:], marker='o', color='b', linestyle='-', label='Year-to-Year Change in Standard Deviation')
plt.xlabel('Year')
plt.ylabel('Standard Deviation')
plt.title('Year-to-Year Change in Standard Deviation')
plt.grid(True)
plt.legend()
plt.show()

../../_images/d3c56ed004f6d5b66d7c937ed6e2750f5ab852cd850144bbed1a87febe471b5a.png

plt.figure(figsize=(8, 5))
plt.plot(years[0:], skew[0:], marker='o', color='b', linestyle='-', label='Year-to-Year Change in Skewness')
plt.xlabel('Year')
plt.ylabel('Skewness')
plt.title('Year-to-Year Change in Skewness')
plt.grid(True)
plt.legend()
plt.show()

../../_images/9fb33642becb2ef750750d51bd12d71e368f7dd69af44f935781bbfa2115bb14.png

Conclusion#

We have built a powerful tool for exploring, analyzing, and visualizing frequency data. By combining user interactivity, detailed analysis, and aggregated insights, it provides a flexible and reusable framework for understanding time-series data across multiple timeframes.