NumPy-Pandas-LP-Basics#

1.1 Objective#

This module provides a comprehensive introduction to numerical computing and optimization using NumPy, Pandas, and Linear Programming. These three foundational tools are widely used in data science, engineering, and operations research, where they form the computational backbone for large-scale numerical analysis and decision-making. The primary goal is to equip learners to manipulate structured data efficiently, perform fast vectorized numerical computations, and model real-world optimization problems in Python.

  1. NumPy: Introduces array-based computing for efficient numerical operations.

  2. Pandas: Focuses on structured data manipulation and analysis.

  3. Linear Programming (LP): Explores mathematical optimization techniques using GurobiPy.

This module integrates theoretical concepts with hands-on programming using Python, Jupyter Notebook, NumPy, Pandas, and GurobiPy, allowing learners to transition from fundamental numerical analysis to real-world optimization applications.


1.2 Key Components#

1. NumPy Fundamentals#

  • Creating NumPy Arrays

    • Creating a 1D Array

    • Creating a 2D Array

    • Creating Arrays with Specific Values

  • Array Attributes

    • Array Shape

    • Number of Dimensions

    • Number of Elements

    • Data Type of Elements

  • Indexing and Slicing

    • Accessing Elements

    • Slicing

  • Mathematical Operations

    • Element-wise Operations

  • Reshaping and Transposing Arrays

    • Reshaping Arrays

    • Transposing Arrays

  • Statistical Functions

    • Basic Statistical Functions

  • Stacking and Concatenation

    • Horizontal Stacking

    • Vertical Stacking

    • Concatenation

2. Pandas for Data Analysis#

  • Creating Data Structures

    • Creating a Pandas Series

    • Creating a Pandas DataFrame

  • Reading and Writing Data

    • Reading from a CSV File

    • Writing to a CSV File

  • Data Inspection and Manipulation

    • Using head(), tail(), info(), and describe()

    • Selecting specific columns and rows, and filtering

  • Data Cleaning

    • Using dropna() to handle missing data

    • Changing Data Types

  • Modifying Data

    • Adding New Columns

    • Renaming Columns

    • Sorting Data

  • Merging and Joining Data

    • Merging DataFrames

    • Concatenating DataFrames

  • Pivot Tables and Crosstabs

    • Creating a Pivot Table

    • Creating a Crosstab

3. Linear Programming basics#

  • Introduction to LP Optimization Models

    • Understanding the objective function in LP

    • Understanding the constraints in LP

  • Using Gurobi for Linear Programming

    • Installing Gurobi

    • Defining decision variables

    • Setting the objective function

    • Adding constraints

  • Solving Optimization Models

    • Implementing an LP solver using GurobiPy

    • Extracting and interpreting optimal solutions


1.3 Module Impact#

This module provides a structured approach to mastering numerical computing, data handling, and optimization modeling, making it highly relevant for engineers.

  1. Efficient Data Processing: Participants develop proficiency in handling structured datasets with NumPy and Pandas, leveraging optimized numerical operations.

  2. Analytical Thinking: By working with structured data manipulation, learners enhance their ability to extract insights and analyze trends.

  3. Optimization Modeling: The integration of Linear Programming equips learners with tools to tackle decision-making problems in industry applications.

  4. Hands-on Coding: The practical implementations in Jupyter Notebook reinforce learning through interactive problem-solving.

2. NumPy basics#

NumPy (Numerical Python) is a fundamental library for numerical computing in Python. It is widely used in scientific computing, machine learning, and data analysis.

If you haven’t installed NumPy, you can do so using:

pip install numpy

Then, import it in your Python script:

import numpy as np

2.1 Creating NumPy Arrays#

Creating a 1D Array#

A NumPy array (an object of type ndarray) can be created from a Python list using np.array().

# Creating a one-dimensional array
a = np.array([1, 2, 3, 4, 5])
print("One-dimensional array:\n", a)
One-dimensional array:
 [1 2 3 4 5]

Creating a 2D Array#

A two-dimensional array (matrix) can be created by passing a list of lists.

# Creating a two-dimensional array
b = np.array([[1, 2, 3], [4, 5, 6]])
print("Two-dimensional array:\n", b)
Two-dimensional array:
 [[1 2 3]
 [4 5 6]]

Creating Arrays with Specific Values#

NumPy provides functions to create special arrays.

A = np.zeros((2, 3))  # 2x3 array of zeros
print("Zeros:\n", A)
Zeros:
 [[0. 0. 0.]
 [0. 0. 0.]]
B = np.ones((3, 3))  # 3x3 array of ones
print("Ones:\n", B)
Ones:
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
C = np.eye(3)  # 3x3 identity matrix
print("Identity Matrix:\n", C)
Identity Matrix:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
D = np.random.rand(2, 2)  # 2x2 array of uniform random numbers in [0, 1); values change on every run
print("Random Array:\n", D)
Random Array:
 [[0.24884241 0.33598379]
 [0.41051721 0.3514549 ]]
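Because np.random.rand draws new values on every run, the numbers printed above will differ on your machine. A minimal sketch of reproducible random arrays, assuming NumPy 1.17 or later (where the np.random.default_rng Generator API is available); the seed value 42 is arbitrary:

```python
import numpy as np

# Two generators created with the same seed produce the same stream of numbers
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

R1 = rng1.random((2, 2))  # 2x2 array of floats drawn uniformly from [0, 1)
R2 = rng2.random((2, 2))

print("Reproducible random array:\n", R1)
print("Identical across generators:", np.array_equal(R1, R2))
```

Seeding is useful whenever a result must be repeatable, for example when comparing two versions of an analysis on the same random test data.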

2.2 Array Attributes#

NumPy arrays have some important attributes that can help you understand their structure.

print("Array Shape:", D.shape) 
Array Shape: (2, 2)
print("Number of Dimensions:", D.ndim)
Number of Dimensions: 2
print("Number of Elements:", D.size) 
Number of Elements: 4
print("Data Type of Elements:", D.dtype)
Data Type of Elements: float64
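The dtype also determines how much memory each element occupies. A short sketch using the standard itemsize and nbytes attributes (the 8-byte figure assumes the default float64 dtype shown above):

```python
import numpy as np

D = np.random.rand(2, 2)                  # float64 by default
print("Bytes per element:", D.itemsize)   # 8 for float64
print("Total bytes:", D.nbytes)           # size * itemsize = 4 * 8

# Converting to a smaller dtype halves the memory footprint
D32 = D.astype(np.float32)
print("float32 total bytes:", D32.nbytes)  # 4 * 4
```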

2.3 Indexing and Slicing#

NumPy allows you to access elements and slices of arrays easily.

### Accessing Elements
E = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Element at (2, 2):", E[2, 2])  # Access the third row, third column (indices start at 0)
Element at (2, 2): 9
### Slicing
print("Second column:", E[:, 1])  # Select second column
Second column: [2 5 8]
print("Sub-matrix:\n", E[0:2, 2:3])  # Rows 0-1 of the third column, kept as a 2D array
Sub-matrix:
 [[3]
 [6]]
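Basic slices such as E[:, 1] return views that share memory with the original array, so writing through a slice also changes E. Call .copy() when an independent array is needed. Boolean masks, by contrast, always return a copy. A minimal sketch:

```python
import numpy as np

E = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col = E[:, 1]          # a view into E, not a new array
col[0] = 100           # this also modifies E[0, 1]
print("E after writing through the view:\n", E)

safe = E[:, 1].copy()  # an independent copy
safe[0] = -1           # E is unchanged by this assignment

# Boolean indexing selects the elements that satisfy a condition (returns a copy)
evens = E[E % 2 == 0]
print("Even elements:", evens)
```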

2.4 Mathematical Operations#

NumPy supports element-wise operations.

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Addition:", a + b)
Addition: [5 7 9]
print("Multiplication:", a * b)
Multiplication: [ 4 10 18]
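Note that * multiplies element by element; it is not a matrix product. NumPy also broadcasts a scalar or a smaller array across a larger one. A short sketch contrasting the two, using the @ operator for the dot and matrix-vector product:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("Scalar broadcast:", a * 2)   # [2 4 6]
print("Dot product:", a @ b)        # 1*4 + 2*5 + 3*6 = 32

M = np.array([[1, 2, 3], [4, 5, 6]])
print("Row broadcast:\n", M + a)        # a is added to each row of M
print("Matrix-vector product:", M @ a)  # [1+4+9, 4+10+18] = [14 32]
```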

2.5 Reshaping and Transposing Arrays#

NumPy can change the shape of an array without modifying its data, as long as the total number of elements stays the same. Transposing swaps rows and columns.

### Reshaping Arrays
F = np.arange(1, 10)
G = np.arange(1, 10).reshape(3, 3)
print("Original Array:\n", F)
print("Reshaped Array:\n", G)
Original Array:
 [1 2 3 4 5 6 7 8 9]
Reshaped Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
### Transposing Arrays
print("Transposed Array:\n", G.T)
Transposed Array:
 [[1 4 7]
 [2 5 8]
 [3 6 9]]
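When reshaping, one dimension can be given as -1 and NumPy infers it from the total size. Because reshape generally returns a view, later writes are visible through both arrays. A minimal sketch:

```python
import numpy as np

F = np.arange(1, 13)             # 12 elements: 1, 2, ..., 12
G = F.reshape(3, -1)             # -1 lets NumPy infer the column count (12 / 3 = 4)
print("Shape with inferred dimension:", G.shape)

# For a contiguous array like F, reshape returns a view that shares F's data
G[0, 0] = 99
print("First elements of F after modifying G:", F[:3])

# The total number of elements must stay the same: 12 cannot fill a 5x5 array
reshape_failed = False
try:
    F.reshape(5, 5)
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)

print("Flattened back to 1D:", G.ravel())
```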

2.6 Statistical Functions#

NumPy provides functions for basic statistics.

print("Maximum Value:", G.max())
Maximum Value: 9
print("Minimum Value:", G.min())
Minimum Value: 1
print("Mean Value:", G.mean())
Mean Value: 5.0
print("Sum of Elements:", G.sum())
Sum of Elements: 45
print("Column-wise Sum:", G.sum(axis=0))
Column-wise Sum: [12 15 18]
print("Row-wise Sum:", G.sum(axis=1))
Row-wise Sum: [ 6 15 24]
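Two details are worth knowing: std() computes the population standard deviation by default (ddof=0), and axis=0 aggregates down the columns while axis=1 aggregates across the rows. A short sketch of a few more reductions:

```python
import numpy as np

G = np.arange(1, 10).reshape(3, 3)

print("Standard deviation:", G.std())               # population std (ddof=0)
print("Sample standard deviation:", G.std(ddof=1))  # divides by n - 1 instead of n
print("Column-wise mean:", G.mean(axis=0))          # [4. 5. 6.]

flat_idx = G.argmax()                               # index into the flattened array
row, col = np.unravel_index(flat_idx, G.shape)      # convert back to (row, column)
print("Position of the maximum:", int(row), int(col))
```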

2.7 Stacking and Concatenation#

NumPy allows combining multiple arrays in different ways.

### Horizontal Stacking
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
hstacked = np.hstack((a, b))
print("Horizontally Stacked:\n", hstacked)
Horizontally Stacked:
 [[1 2 5 6]
 [3 4 7 8]]
### Vertical Stacking
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
vstacked = np.vstack((a, b))
print("Vertically Stacked:\n", vstacked)
Vertically Stacked:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
### Concatenation
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
concatenated = np.concatenate((a, b), axis=0)  # Concatenate along rows
print("Concatenated:\n", concatenated)
Concatenated:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
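np.concatenate generalizes both helpers: for 2D arrays, axis=0 reproduces np.vstack and axis=1 reproduces np.hstack, whereas np.stack creates a new dimension. A brief sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# concatenate along axis=1 joins columns, matching np.hstack for 2D arrays
side_by_side = np.concatenate((a, b), axis=1)
print(np.array_equal(side_by_side, np.hstack((a, b))))  # True

# np.stack adds a new leading axis instead of joining an existing one
stacked = np.stack((a, b))
print("Shape after np.stack:", stacked.shape)  # (2, 2, 2)
```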

Conclusion#

Mastering NumPy is essential for data science, machine learning, and numerical computing, providing an efficient way to handle large datasets.

3. Pandas basics#

Pandas is a powerful and flexible data analysis and manipulation library for Python. It is widely used in data science and any scenario where structured data processing is needed.

Installing Pandas#

If you haven’t installed Pandas yet, you can do so using:

pip install pandas

Then, import it in your Python script:

import pandas as pd

3.1 Creating Data Structures#

The two primary data structures in Pandas are:

  • Series: A one-dimensional labeled array.

  • DataFrame: A two-dimensional table-like structure with labeled rows and columns.

Creating a Pandas Series#

A Pandas Series is similar to a column in an Excel spreadsheet. It consists of an array of data with an associated index.

# Creating a pandas Series
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)
a    10
b    20
c    30
d    40
dtype: int64

Creating a Pandas DataFrame#

A DataFrame is a two-dimensional table with labeled rows and columns.

# Creating a pandas DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}
df = pd.DataFrame(data)
print(df)
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
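The two structures are closely related: selecting a single column of a DataFrame returns a Series that shares the DataFrame's row index. A short sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

ages = df['Age']                # a single column is a Series
print(type(ages).__name__)      # Series
print(ages.index.tolist())      # same row index as the DataFrame: [0, 1, 2]
print(df.dtypes)                # each column keeps its own dtype
print(df['Salary'].to_numpy())  # underlying values as a NumPy array
```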

3.2 Reading and Writing Data#

Pandas can read from and write to various file formats such as CSV, Excel, etc.

### Reading from a CSV File
import os
current_dir = os.getcwd()
csv_path = os.path.join(current_dir, "numpy-pandas-lp-basics-data", "data.csv")

df = pd.read_csv(csv_path)
print(df)  
  Name   Age    Salary
0    A  25.0   50000.0
1    B  30.0   60000.0
2    C  25.0   70000.0
3    D  40.0   80000.0
4    E  30.0   55000.0
5    F  45.0   65000.0
6    G  45.0   90000.0
7    H  50.0  100000.0
8    I   NaN       NaN
### Writing to a CSV File
df = df.drop(columns=['Salary'])  # Drop the Salary column
output_path = os.path.join(current_dir, "numpy-pandas-lp-basics-data", "output.csv")
df.to_csv(output_path, index=False) 
df = pd.read_csv(output_path)
print(df)  
  Name   Age
0    A  25.0
1    B  30.0
2    C  25.0
3    D  40.0
4    E  30.0
5    F  45.0
6    G  45.0
7    H  50.0
8    I   NaN
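The example above depends on a local data.csv file. A self-contained sketch of the same read/write round trip, using an in-memory io.StringIO buffer in place of a file (the CSV text is hypothetical sample data). An empty field is read as NaN, which is why Age loads as float64 rather than int64 in the output above:

```python
import io
import pandas as pd

csv_text = """Name,Age,Salary
A,25,50000
B,30,60000
I,,
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)        # Age and Salary become float64 because of the empty fields

buffer = io.StringIO()
df.drop(columns=['Salary']).to_csv(buffer, index=False)  # write without the index column
print(buffer.getvalue())

df_back = pd.read_csv(io.StringIO(buffer.getvalue()))
print(df_back.columns.tolist())  # ['Name', 'Age']
```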

3.3 Data Inspection and Manipulation#

Use head(), tail(), info(), and describe() to show the first five rows, the last five rows, a summary of the DataFrame (column types and non-null counts), and summary statistics of the numeric columns, respectively.

print(df.head())   # First 5 rows
  Name   Age
0    A  25.0
1    B  30.0
2    C  25.0
3    D  40.0
4    E  30.0
print(df.tail())   # Last 5 rows
  Name   Age
4    E  30.0
5    F  45.0
6    G  45.0
7    H  50.0
8    I   NaN
df.info()   # Prints a summary of the DataFrame directly (returns None)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    9 non-null      object 
 1   Age     8 non-null      float64
dtypes: float64(1), object(1)
memory usage: 272.0+ bytes
print(df.describe())  # Statistical summary
             Age
count   8.000000
mean   36.250000
std     9.910312
min    25.000000
25%    28.750000
50%    35.000000
75%    45.000000
max    50.000000
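info() reports non-null counts, but the number of missing values per column is often needed directly. A small sketch, using a hypothetical DataFrame shaped like the one above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'I'],
    'Age': [25.0, 30.0, 25.0, np.nan],
})

print(df.isna().sum())           # missing values per column
print(df['Age'].value_counts())  # frequency of each distinct age (NaN excluded)
print(df['Age'].mean())          # NaN is skipped by default: (25 + 30 + 25) / 3
```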

You can select specific data:

print(df['Name'])  # Select a column
0    A
1    B
2    C
3    D
4    E
5    F
6    G
7    H
8    I
Name: Name, dtype: object
print(df[['Name', 'Age']])  # Select multiple columns
  Name   Age
0    A  25.0
1    B  30.0
2    C  25.0
3    D  40.0
4    E  30.0
5    F  45.0
6    G  45.0
7    H  50.0
8    I   NaN
print(df.loc[0])  # Select a row by index label
Name       A
Age     25.0
Name: 0, dtype: object
print(df.iloc[1])  # Select a row by integer position
Name       B
Age     30.0
Name: 1, dtype: object

You can also filter data easily:

filtered_df = df[df['Age'] > 30]  # Select rows where Age > 30
print(filtered_df)
  Name   Age
3    D  40.0
5    F  45.0
6    G  45.0
7    H  50.0
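Multiple conditions are combined with the element-wise operators & and |, not Python's and/or, and each condition must be wrapped in parentheses because of operator precedence. A short sketch with a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Age': [25, 30, 25, 40, 30, 45],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid_age = df[(df['Age'] >= 30) & (df['Age'] < 45)]
print(mid_age)

# isin() tests membership in a list of values
selected = df[df['Name'].isin(['A', 'F'])]
print(selected)

# .loc accepts a boolean mask for the rows plus a column label
names_over_30 = df.loc[df['Age'] > 30, 'Name']
print(names_over_30.tolist())  # ['D', 'F']
```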

3.4 Data Cleaning#

Pandas provides functions for cleaning data, such as handling missing values and changing data types.

### Handling Missing Values
df.dropna(inplace=True)  # Remove rows with NaN values
print(df)
  Name   Age
0    A  25.0
1    B  30.0
2    C  25.0
3    D  40.0
4    E  30.0
5    F  45.0
6    G  45.0
7    H  50.0
### Changing Data Types
df['Age'] = df['Age'].astype(int)  # Convert Age column to integer
print(df)
  Name  Age
0    A   25
1    B   30
2    C   25
3    D   40
4    E   30
5    F   45
6    G   45
7    H   50
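dropna() discards whole rows. Two alternatives are worth knowing: fillna() replaces missing values instead, and the nullable "Int64" dtype allows integer conversion while keeping missing values (a plain astype(int) raises an error if any NaN remains). A minimal sketch:

```python
import numpy as np
import pandas as pd

ages = pd.Series([25.0, 30.0, np.nan])

filled = ages.fillna(ages.mean())  # replace NaN with the mean of the other values (27.5)
print(filled.tolist())

nullable = ages.astype("Int64")    # capital "I": pandas nullable integer dtype
print(nullable.dtype, nullable.isna().sum())

cast_failed = False
try:
    ages.astype(int)               # plain int cannot represent NaN
except ValueError as err:
    cast_failed = True
    print("astype(int) failed:", err)
```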

3.5 Modifying Data#

With Pandas, you can easily add new columns, rename columns, sort data, and more.

### Adding a New Column
import os
current_dir = os.getcwd()
csv_path = os.path.join(current_dir, "numpy-pandas-lp-basics-data", "data.csv")
df = pd.read_csv(csv_path)  # Reload the original data, which still includes the Salary column

df['Bonus'] = df['Salary'] * 0.10  # Add a new column
print(df)
  Name   Age    Salary    Bonus
0    A  25.0   50000.0   5000.0
1    B  30.0   60000.0   6000.0
2    C  25.0   70000.0   7000.0
3    D  40.0   80000.0   8000.0
4    E  30.0   55000.0   5500.0
5    F  45.0   65000.0   6500.0
6    G  45.0   90000.0   9000.0
7    H  50.0  100000.0  10000.0
8    I   NaN       NaN      NaN
### Renaming Columns
df.rename(columns={'Name': 'Employee Name'}, inplace=True)
print(df)
  Employee Name   Age    Salary    Bonus
0             A  25.0   50000.0   5000.0
1             B  30.0   60000.0   6000.0
2             C  25.0   70000.0   7000.0
3             D  40.0   80000.0   8000.0
4             E  30.0   55000.0   5500.0
5             F  45.0   65000.0   6500.0
6             G  45.0   90000.0   9000.0
7             H  50.0  100000.0  10000.0
8             I   NaN       NaN      NaN
### Sorting Data
df.sort_values(by='Age', ascending=False, inplace=True)
print(df)
  Employee Name   Age    Salary    Bonus
7             H  50.0  100000.0  10000.0
5             F  45.0   65000.0   6500.0
6             G  45.0   90000.0   9000.0
3             D  40.0   80000.0   8000.0
1             B  30.0   60000.0   6000.0
4             E  30.0   55000.0   5500.0
0             A  25.0   50000.0   5000.0
2             C  25.0   70000.0   7000.0
8             I   NaN       NaN      NaN
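The examples above modify df in place. An equivalent sketch, using hypothetical data, chains the same kinds of operations so that each step returns a new DataFrame and the original stays unchanged (this style is a choice, not a requirement):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Age': [25, 50, 30],
    'Salary': [50000, 100000, 60000],
})

result = (
    df.assign(Bonus=lambda d: d['Salary'] * 0.10)   # add a new column
      .rename(columns={'Name': 'Employee Name'})    # rename a column
      .sort_values(by='Age', ascending=False)       # oldest first
      .reset_index(drop=True)                       # renumber rows 0..n-1
)
print(result)
print(df.columns.tolist())  # the original df is unchanged
```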

3.6 Merging and Joining Data#

Pandas allows you to combine data from multiple DataFrames.

### Merging DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Salary': [50000, 60000, 70000]})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)
   ID     Name  Salary
0   1    Alice   50000
1   2      Bob   60000
2   3  Charlie   70000
### Concatenating DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)
   A  B
0  1  3
1  2  4
0  5  7
1  6  8
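Note the repeated index labels (0, 1, 0, 1) in the concatenated result; pass ignore_index=True to renumber the rows. Likewise, pd.merge performs an inner join by default, so rows whose key appears in only one DataFrame are dropped. A sketch with hypothetical data:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
stacked = pd.concat([df1, df2], ignore_index=True)  # new index 0..3
print(stacked.index.tolist())

names = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
salaries = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 80000]})

inner = pd.merge(names, salaries, on='ID')             # default how='inner': IDs 1 and 2 only
left = pd.merge(names, salaries, on='ID', how='left')  # keeps every name; Charlie gets NaN
print(left)
```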

3.7 Pivot Tables and Crosstabs#

Pandas supports powerful data summarization with pivot tables and crosstabs.

### Creating a Pivot Table
pivot = df.pivot_table(values='Salary', index='Age', aggfunc='mean')
print(pivot)
        Salary
Age           
25.0   60000.0
30.0   57500.0
40.0   80000.0
45.0   77500.0
50.0  100000.0
### Creating a Crosstab
crosstab = pd.crosstab(df['Age'], df['Salary'])
print(crosstab)
Salary  50000.0   55000.0   60000.0   65000.0   70000.0   80000.0   90000.0   \
Age                                                                            
25.0           1         0         0         0         1         0         0   
30.0           0         1         1         0         0         0         0   
40.0           0         0         0         0         0         1         0   
45.0           0         0         0         1         0         0         1   
50.0           0         0         0         0         0         0         0   

Salary  100000.0  
Age               
25.0           0  
30.0           0  
40.0           0  
45.0           0  
50.0           1  
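Using a continuous column such as Salary as a crosstab key produces a sparse table of mostly zeros, as above. Pivot tables and crosstabs are most informative with categorical keys. A sketch with hypothetical sales data, adding a columns dimension and row/column totals:

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'South'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q1', 'Q2'],
    'Revenue': [100, 150, 200, 50, 120],
})

# Rows = Region, columns = Quarter, cells = total revenue; margins adds an 'All' row and column
pivot = sales.pivot_table(values='Revenue', index='Region', columns='Quarter',
                          aggfunc='sum', margins=True)
print(pivot)

# crosstab counts how many records fall into each Region/Quarter combination
counts = pd.crosstab(sales['Region'], sales['Quarter'])
print(counts)
```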

Conclusion#

This section introduced the fundamental concepts of Pandas, covering data structures, reading and writing files, data inspection and manipulation, data cleaning, merging, and summarization with pivot tables and crosstabs. Pandas is an essential tool for anyone working with structured data in Python, making data analysis more efficient and intuitive.

4. Linear Programming basics#

Linear Programming (LP) is a mathematical technique for optimizing (maximizing or minimizing) a linear objective function subject to linear equality and inequality constraints. It is widely used in fields such as operations research, logistics, production planning, and energy systems.

Introduction to LP Optimization Models#

LP problems generally take the following form.

The objective function:

\[\text{maximize or minimize } f(x) = b_1x_1 + b_2x_2 + \dots + b_nx_n\]

Subject to the constraints:

\[ Kx \leq a \]

where:

  • \(f(x)\) is the objective function to be optimized,

  • \(x = (x_1, x_2, \dots, x_n)^\top\) is the vector of decision variables,

  • \(b_1, b_2, \dots, b_n\) are the coefficients of the objective function,

  • \(K\) is the matrix of constraint coefficients, with one row per constraint,

  • \(a\) is the vector of constraint bounds (right-hand sides).

Nonnegativity bounds \(x \geq 0\) are often added as well, as in the example below.
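Problems stated with \(\geq\) or \(=\) constraints, or posed as minimizations, can be brought into the \(\leq\) form above with standard rewrites. Writing \(k_i\) for the \(i\)-th row of \(K\) and \(a_i\) for the \(i\)-th entry of \(a\):

\[ k_i x \geq a_i \;\Longleftrightarrow\; -k_i x \leq -a_i \]

\[ k_i x = a_i \;\Longleftrightarrow\; k_i x \leq a_i \;\text{ and }\; -k_i x \leq -a_i \]

\[ \min_x \; f(x) \;\Longleftrightarrow\; \max_x \; \bigl(-f(x)\bigr) \]

The last rewrite yields the same optimal \(x\); only the sign of the optimal objective value flips.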

Using Gurobi for Linear Programming#

Gurobi [1] is a commercial solver for mathematical optimization problems, including linear, mixed-integer, and quadratic programs. The gurobipy [2] Python library provides a user-friendly interface to define and solve optimization models. The gurobipy package includes a size-limited trial license, which allows users to solve only small problems [2]; students and faculty affiliated with academic institutions are eligible for a free, full-featured academic license [2].

Installing Gurobi#

If you haven’t installed Gurobipy yet, use:

pip install gurobipy

Implementing a Linear Program with Gurobi#

Problem Definition#

Consider a simple LP problem:

Linear Programming Problem#

Maximize:

\[f = 2x + 3y\]

Subject to:

\[3x + 2y \leq 11\]
\[x + y \leq 4\]
\[x \geq 0, \quad y \geq 0\]
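In the notation of the general form, this problem has \(b = (2, 3)\), \(K = \begin{pmatrix} 3 & 2 \\ 1 & 1 \end{pmatrix}\), and \(a = (11, 4)\). For a bounded feasible region like this one, an optimum is attained at a vertex, so a two-variable problem can be checked by brute force. The sketch below uses only NumPy and is meant as a sanity check, not a replacement for a solver: it adds the bounds \(-x \leq 0\) and \(-y \leq 0\) as extra rows of \(K\), intersects every pair of constraint boundaries, and keeps the best feasible intersection point:

```python
import itertools
import numpy as np

# Objective coefficients b, constraint matrix K, and bounds a for: maximize 2x + 3y
b = np.array([2.0, 3.0])
K = np.array([[3.0, 2.0],    # 3x + 2y <= 11
              [1.0, 1.0],    # x + y <= 4
              [-1.0, 0.0],   # -x <= 0  (x >= 0)
              [0.0, -1.0]])  # -y <= 0  (y >= 0)
a = np.array([11.0, 4.0, 0.0, 0.0])

best_point, best_value = None, -np.inf
for i, j in itertools.combinations(range(len(a)), 2):
    A_pair = K[[i, j]]
    if abs(np.linalg.det(A_pair)) < 1e-12:
        continue  # parallel boundaries have no unique intersection
    vertex = np.linalg.solve(A_pair, a[[i, j]])  # point where both constraints are tight
    if np.all(K @ vertex <= a + 1e-9):           # keep only feasible vertices
        value = b @ vertex
        if value > best_value:
            best_point, best_value = vertex, value

print("Best vertex (x, y):", best_point)
print("Objective value:", best_value)
```

The number of vertex candidates grows combinatorially with problem size, which is why real models are handed to a solver such as Gurobi instead.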

We will solve this problem using Gurobi in Python.

First, import the gurobipy library (aliased here as grb) and the GRB constants class, which holds values such as GRB.CONTINUOUS, GRB.MAXIMIZE, and GRB.OPTIMAL.

import gurobipy as grb
from gurobipy import GRB

Then, you need to create a new optimization model instance, define the decision variables, and set up the objective function.

# Create a new optimization model
model = grb.Model("LP")
# Define decision variables
x = model.addVar(name="x", vtype=GRB.CONTINUOUS, lb=0)
y = model.addVar(name="y", vtype=GRB.CONTINUOUS, lb=0)
# Set objective function
model.setObjective(2*x + 3*y, GRB.MAXIMIZE)
Set parameter Username
Academic license - for non-commercial use only - expires 2026-02-10

Next, you can define the constraints:

# Add constraints
model.addConstr(3*x + 2*y <= 11, "Constraint 1")
model.addConstr(x + y <= 4, "Constraint 2")
<gurobi.Constr *Awaiting Model Update*>

Now, you can invoke Gurobi to solve this optimization problem:

# Optimize the model
model.optimize()
Gurobi Optimizer version 12.0.1 build v12.0.1rc0 (win64 - Windows 11.0 (26100.2))

CPU model: 12th Gen Intel(R) Core(TM) i5-12500H, instruction set [SSE2|AVX|AVX2]
Thread count: 12 physical cores, 16 logical processors, using up to 16 threads

Optimize a model with 2 rows, 2 columns and 4 nonzeros
Model fingerprint: 0x9496ece8
Coefficient statistics:
  Matrix range     [1e+00, 3e+00]
  Objective range  [2e+00, 3e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [4e+00, 1e+01]
Presolve time: 0.01s
Presolved: 2 rows, 2 columns, 4 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    5.0000000e+30   3.250000e+30   5.000000e+00      0s
       1    1.2000000e+01   0.000000e+00   0.000000e+00      0s

Solved in 1 iterations and 0.01 seconds (0.00 work units)
Optimal objective  1.200000000e+01
# Display results
if model.status == GRB.OPTIMAL:
    print(f"Optimal solution: x = {x.x}, y = {y.x}")
    print(f"Optimal objective value: f = {model.objVal}")
else:
    print("No optimal solution found.")
Optimal solution: x = 0.0, y = 4.0
Optimal objective value: f = 12.0

The script prints the optimal values of \(x\) and \(y\) together with the optimal objective value: \(x = 0\), \(y = 4\), and \(f = 12\).
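Interpreting a solution usually starts with checking which constraints are binding. A short NumPy sketch, with the solution values typed in from the output above, computes each constraint's slack \(a - Kx\). gurobipy also exposes this after optimization through each constraint's Slack attribute (for example by iterating over model.getConstrs()):

```python
import numpy as np

K = np.array([[3.0, 2.0],     # Constraint 1: 3x + 2y <= 11
              [1.0, 1.0]])    # Constraint 2: x + y <= 4
a = np.array([11.0, 4.0])
x_opt = np.array([0.0, 4.0])  # optimal (x, y) as printed by the solver above

slack = a - K @ x_opt         # unused capacity in each constraint
for name, s in zip(["Constraint 1", "Constraint 2"], slack):
    status = "binding" if np.isclose(s, 0.0) else "not binding"
    print(f"{name}: slack = {s}, {status}")
```

Here Constraint 2 is binding (x + y is at its limit of 4), while Constraint 1 has 3 units of slack. Since y earns more per unit than x (3 versus 2), the whole budget of Constraint 2 goes to y.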

Conclusion#

Linear Programming is a mathematical optimization technique for finding the best outcome of a linear objective under linear constraints, and Gurobi is a solver that handles such models efficiently. This section serves as a foundation for more advanced optimization problems, including real-world applications such as grid optimization.

References#

[1] Gurobi Optimization, LLC, “Gurobi Optimizer Reference Manual,” 2024.

[2] Gurobi Optimization, LLC, “GurobiPy: Python interface for the Gurobi Optimizer,” PyPI, [Online]. Available: https://pypi.org/project/gurobipy/