[TOC]
Guide to the code
This section introduces how to set up and run the prediction model.
Installation
You can either fork the code or download the latest .zip to your working folder.
Then, open a command prompt in the working folder and install FloGen with:
python setup.py install
If you plan to modify the code, install it in development mode instead:
python setup.py develop
Dataset establishment
The first thing to do before training is to establish a dataset. The dataset differs slightly from an ordinary one: it is a series dataset.
This means we have a database of many different prior flowfields
(for example, the flowfields of many different airfoils under the design cruise condition: \(foil_1, foil_2, \cdots, foil_{N_f}\)).
Meanwhile, to train the model in a supervised manner, we also need many target flowfields for each prior flowfield \(f\)
(for example, the flowfields of each airfoil under many different operating conditions (AoA): \(c_1, c_2, \cdots, c_{N_c(f)}\)).
The complete database can be written as:
where the flowfields for the same \(f\) are called one series, or one group.
Notice that the last subscript \(N_c\) is a function of \(f\), which means the number can be different for different airfoils.
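The series structure can be pictured with a small sketch. This is a plain-Python illustration, not FloGen code; the dictionary layout and the names `database` and `training_samples` are assumptions made only for this example.

```python
# Hypothetical layout: one entry per prior flowfield (series index i_f),
# each holding the operating conditions (AoA) of its target flowfields.
database = {
    0: [0.0, 1.0, 2.0, 3.0],   # foil_1 has N_c = 4 conditions
    1: [0.5, 1.5],             # foil_2 has N_c = 2 conditions
    2: [0.0, 2.0, 4.0],        # foil_3 has N_c = 3 conditions
}

def training_samples(db):
    """List every (i_f, i_c, condition) sample in the database."""
    samples = []
    for i_f, conditions in sorted(db.items()):
        for i_c, c in enumerate(conditions):
            samples.append((i_f, i_c, c))
    return samples

samples = training_samples(database)
series_sizes = {i_f: len(c) for i_f, c in database.items()}
print(len(samples))    # total number of target flowfields
print(series_sizes)    # N_c(f) differs from series to series
```

The total number of samples is the sum of \(N_c(f)\) over all series, which is why the number of conditions is allowed to vary between airfoils.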
Prepare data
The data should be stored in two .npy files in advance (some examples are given in the section Flowfield dataset).
data
The first data file contains the flowfields and should be named data<yourname>.npy, where <yourname> can be replaced with any valid string. Its dimensions are as follows:
The first dimension is the total number of prior and target flowfields.
The second dimension is the number of channels. In flowfield reconstruction tasks, each channel can represent a mesh coordinate (e.g., x, y) or a flow variable (e.g., p, T, u, v).
The third and any remaining dimensions describe each flowfield. If the input data is a profile (like a pressure profile on the airfoil surface), there is only a third dimension. If the input data is a field, there are a third and a fourth dimension.
index
The second data file contains the index information of the flowfields and should be named index<yourname>.npy, where <yourname> is the same as in data<yourname>.npy.
For each flowfield, \(N_\text{info}\) values can be stored. They should obey the format below:
| index | note | description |
|---|---|---|
| 0 | \(i_f\) | Index of the prior flowfield |
| 1 | \(i_c\) | Index of the current flowfield’s condition among its series |
| 2 | | Index of the prior flowfield’s condition among its series |
| 3 ~ 3+DC | \(c\) | The condition values of the current flowfield (the length of this part depends on the dimension of a condition code) |
| 4+DC~4+2DC | | The condition values of the prior flowfield of its series (the length of this part depends on the dimension of a condition code) |
| more | | Auxiliary data |
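As an illustration of this format, the sketch below assembles index rows for a condition dimension of 1. It is not part of FloGen: the helper `build_index_rows` is hypothetical, and it assumes that the current condition occupies the column right after the three leading indices and that the prior condition follows it. The column ranges in the table can be read in more than one way, so check the layout against your FloGen version.

```python
def build_index_rows(series_conditions, prior_positions):
    """Assemble index rows for condition dimension D_c = 1 (assumed layout).

    series_conditions: list of lists, the conditions of each series
    prior_positions:   position of the prior flowfield's condition
                       inside each series
    """
    rows = []
    for i_f, conditions in enumerate(series_conditions):
        i_ref = prior_positions[i_f]
        c_ref = conditions[i_ref]
        for i_c, c in enumerate(conditions):
            # [i_f, i_c, index of prior condition, c, prior c]
            rows.append([float(i_f), float(i_c), float(i_ref), c, c_ref])
    return rows

rows = build_index_rows([[0.0, 1.0, 2.0], [1.0, 3.0]], prior_positions=[1, 0])
for row in rows:
    print(row)
```

A list built this way can be turned into an array with `numpy.array(rows)` and written to disk with `numpy.save`.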
Import dataset with ConditionDataset
The dataset should be constructed before training with the following code:
from flowvae.dataset import ConditionDataset as Cd
fldata = Cd(file_name='<yourname>', d_c=1, c_mtd='all', n_c=None, c_map=None, c_no=1, test=100, data_base='data/', is_last_test=True, channel_take=None)
Remark for saving the index: Sometimes we need to save the list of which flowfields we have chosen for training. The ConditionDataset is designed to do so: when c_mtd is other than 'load', an index list will be saved in the data_base folder with the name <yourname>_<c_no>dataindex.txt. So next time when you want to use this index map, simply use c_mtd='load' and assign the desired c_no.
The arguments of the ConditionDataset are:
| argument | type | description |
|---|---|---|
| | | name of the data file |
| | | dimension of the condition values |
| | | for each series, sometimes we want to choose some of it for training. This argument decides how to choose the conditions used in training |
| | | Default: |
| | | Default: |
| | | Default: |
| | | Default: |
| | | Default: |
| | | Default: |
| | | Default: |
Several useful functions of ConditionDataset
During training, the ConditionDataset acts like the original Dataset in PyTorch. There are several extra functions you may be interested in for post-processing or other purposes:
get_series
fldata.get_series(idx, ref_idx=None)
function: Return the flowfields of the whole series with the assigned series index (\(i_f\)).
arguments:
idx: int, the series index (\(i_f\))
return: Dict with {'flowfields': flowfield, 'condis': condis, 'ref': ref, 'ref_aoa': ref_aoa}
'flowfields': np.ndarray (size: \(N_c(f) \times C \times H \times W\)), the flowfields of the assigned series
'condis': np.ndarray (size: \(N_c(f) \times D_c\)), the operating condition values of the assigned series
'ref': np.ndarray (size: \(C \times H \times W\)), the prior flowfield of the assigned series
'ref_aoa': np.ndarray (size: \(D_c\)), the operating condition values of the prior flowfield in the assigned series
get_index_info
fldata.get_index_info(i_f, i_c, i_idx)
function: Return the value at a specific position in index.npy.
arguments:
i_f: int, the series index (\(i_f\))
i_c: int, the condition index in the series (\(i_c\))
i_idx: int, the index of the information vector
return: the value
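The lookup can be pictured as a search over the rows of the index file. The sketch below is a plain-Python stand-in, not the way ConditionDataset implements it; `lookup_index_info` and the sample rows are hypothetical.

```python
def lookup_index_info(rows, i_f, i_c, i_idx):
    """Return entry i_idx of the row whose first two entries are (i_f, i_c)."""
    for row in rows:
        if int(row[0]) == i_f and int(row[1]) == i_c:
            return row[i_idx]
    raise KeyError(f"no flowfield with i_f={i_f}, i_c={i_c}")

# Made-up index rows: [i_f, i_c, prior condition index, c, prior c]
rows = [
    [0, 0, 1, 0.0, 1.0],
    [0, 1, 1, 1.0, 1.0],
    [1, 0, 0, 2.5, 2.5],
]
aoa = lookup_index_info(rows, i_f=0, i_c=1, i_idx=3)
print(aoa)
```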
Construct the model
The backbone of the FloGen is the Encoder-Decoder model or its variational version, VAE. To predict a series of off-design flowfields, the operating condition for the target flowfield needs to be introduced. The model can be divided into three procedures as shown in the figure:

The Encoder extracts latent variables (which can also be seen as statistical features) from the input prior flowfield.
The Concatenator combines the latent variables (related only to the prior flowfield) with the operating condition of the target flowfield. There are several strategies for this combination, both deterministic and stochastic, which are introduced in the sections below.
The Decoder takes the concatenated vector as input, and generates the target flowfield.
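The bookkeeping of the concatenation step can be sketched in a few lines. This is only an illustration under assumptions: the helper `concatenate` is hypothetical, the condition code is placed at the end of the vector for simplicity, and the actual strategy used by FloGen depends on the concatenation mode.

```python
def concatenate(flow_latent, code):
    """Join the prior-flowfield latent variables with the condition code."""
    return list(flow_latent) + list(code)

latent_dim = 12   # total latent dimension, condition code included
code_dim = 1      # dimension of the operating condition (e.g. AoA)

flow_latent = [0.0] * (latent_dim - code_dim)   # stand-in for the encoder output
code = [2.5]                                    # target operating condition

decoder_input = concatenate(flow_latent, code)
print(len(decoder_input))
```

In this sketch, a total latent dimension of 12 with a one-dimensional condition leaves 11 entries for the flow features.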
Set the encoder and decoder
To construct the model, we first need to assign the proper encoder and decoder:
from flowvae.base_model import convEncoder_Unet, convDecoder_Unet
_encoder = convEncoder_Unet(in_channels=2, last_size=[5], hidden_dims=[64, 128, 256])
_decoder = convDecoder_Unet(out_channels=1, last_size=[5], hidden_dims=[256, 128, 64, 64], sizes = [24, 100, 401], encoder_hidden_dims=[256, 128, 64, 2])
The table below shows the available encoder and decoder classes.
| Type | Encoder | Decoder |
|---|---|---|
| base class | | |
| dense connected | | |
| convolution (1D/2D) | | |
| Unet convolution (1D/2D) | | |
| ResNet | | |
For more details on the encoder and decoder, please see here
Set the model
Then we can construct the prediction model with the assigned encoder and decoder. This is done by constructing a frameVAE or Unet object. Their arguments are the same, for example:
vae_model = frameVAE(latent_dim=12, encoder=_encoder, decoder=_decoder, code_mode='ved1', dataset_size=None, decoder_input_layer=0, code_dim=1, code_layer=[], device = 'cuda:0')
The arguments of the frameVAE and Unet are:
argument |
type |
description |
|---|---|---|
|
|
the total dimension of the latent variable (include code dimension) |
|
the encoder |
|
|
the decoder |
|
|
|
the mode to introduce condition codes in the concatenator. See the table below. |
|
|
Default |
|
Default |
|
|
|
Default |
|
|
Default |
|
|
Default |
The available concatenator modes can be found in the following table; see Concatenation strategy for details.
perspective |
probabilistic |
implicit |
semi-implicit |
explicit |
none |
|---|---|---|---|---|---|
ae |
d. |
|
|||
ae |
vf. |
|
|
|
|
ed |
d. |
|
|||
ed |
v. |
|
|||
ed |
vf. |
|
ae = auto-encoder perspective, ed = encoder-decoder perspective; d. = deterministic, v. = variational, vf. = only variational with flow features
Training
Set operator
To train the model, we need another class, the operator AEOperator, to conduct the training process:
op = AEOperator(opt_name='<new>', model=vae_model, dataset=fldata,
                recon_type='field',
                input_channels=(None, None),
                recon_channels=(1, None),
                num_epochs=300, batch_size=8,
                split_train_ratio=0.9,
                recover_split=None,
                output_folder="save",
                shuffle=True, ref=False,
                init_lr=0.01)
The arguments of the AEOperator are:
| argument | type | description |
|---|---|---|
| | | name of the problem |
| | | the model to be trained |
| | | The dataset to train the model |
| Save & Load | | |
| | | Default |
| Training | | |
| | | Default |
| | | Default |
| Dataset | | |
| | | Default |
| | | Default |
| | | Default |
| | | Default |
| Model | | |
| | | Default |
| | | Default |
| | | Default |
| | | Default |
Set loss parameters
Then, we also need to set some parameters for the loss calculation. Several loss terms are involved in the training process: the basic loss terms (the reconstruction, code, and index losses) and the physics-based loss terms (the NS loss and the aerodynamic loss).
The basic loss terms involved depend on the code concatenation mode and are summarized in the following table.
loss term |
|
|
|
|
|
|
|
|---|---|---|---|---|---|---|---|
reconstruction |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
index |
KL-p |
KL-p |
KL-p |
MSE-p |
KL-n |
KL-n |
|
code |
MSE |
MSE |
KL-p: Kullback-Leibler divergence (KL divergence) to the prior latent variables \(z_r\).
MSE-p: Mean square error to the prior latent variables \(z_r\).
KL-n: KL divergence to the standard normal distribution \(\mathcal N(0,1)\).
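For reference, the three measures named above have standard closed forms for diagonal Gaussian distributions. The sketch below writes them in plain Python. It is not taken from FloGen: the function names are hypothetical, and it assumes the prior latent variables \(z_r\) are described by their own mean and log-variance.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over dimensions (KL-n)."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))

def kl_between_gaussians(mu, log_var, mu_r, log_var_r):
    """KL( N(mu, exp(log_var)) || N(mu_r, exp(log_var_r)) ), summed (KL-p)."""
    total = 0.0
    for m, lv, mr, lvr in zip(mu, log_var, mu_r, log_var_r):
        total += 0.5 * (lvr - lv
                        + (math.exp(lv) + (m - mr) ** 2) / math.exp(lvr)
                        - 1.0)
    return total

def mse(z, z_r):
    """Mean square error between two latent vectors (MSE-p)."""
    return sum((a - b) ** 2 for a, b in zip(z, z_r)) / len(z)

mu, log_var = [0.5, -1.0], [0.0, -0.5]
print(kl_to_standard_normal(mu, log_var))
print(kl_between_gaussians(mu, log_var, [0.0, 0.0], [0.0, 0.0]))
print(mse(mu, [0.0, 0.0]))
```

When the reference distribution is the standard normal, the second function reduces to the first, which is a quick consistency check on the formulas.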
The physics-based loss terms are optional. They depend only on the reconstructed flowfield.
To set the loss parameters, call:
op.set_lossparas()
All of the parameters have default values, so you only need to pass the parameters you want to change. The arguments related to the code and index loss are:
| key | type | default | description |
|---|---|---|---|
| | | | the weight of code loss, 0.1 is recommended |
| | | | the weight of index KLD loss |
| | | | the epoch to start counting the index KLD loss |
| | | | the method to get avg_latent value |
The arguments related to the physics-based loss are:
| key | type | default | description |
|---|---|---|---|
| | | | the mode to calculate smooth loss |
| | | | the epoch to start counting the smooth loss |
| | | | the weight of the smooth loss |
| | | | ( |
| | | | ( |
| | | | the weight of aerodynamic loss |
| | | | the epoch to start counting the aero loss |
Set the optimizer and scheduler
Then we need to set the optimizer and the scheduler to train the model. They are based on torch.optim, and assigned to the operator as follows:
op.set_optimizer(<optimizer_name>, <keyword arguments of the optimizer>)
op.set_scheduler(<scheduler_name>, <keyword arguments of the scheduler>)
For both set_optimizer and set_scheduler, the first argument is the class name in torch.optim and torch.optim.lr_scheduler, respectively. The remaining arguments are the keyword arguments of the chosen optimizer or scheduler. Here is an example:
op.set_optimizer('Adam', lr=0.01)
op.set_scheduler('ReduceLROnPlateau', mode='min', factor=0.1, patience=5)
Recommended setting
By default, the optimizer is set to 'Adam' with the init_lr assigned when initializing the AEOperator. If you want to change the optimizer, set it explicitly after constructing the AEOperator.
For the scheduler, a warmup strategy is recommended. It first increases the learning rate from a smaller value and then reduces it. FloGen provides two warmup strategies in flowvae.utils:
Warmup with exponent
from flowvae.utils import warmup_lr
op.set_scheduler('LambdaLR', lr_lambda=warmup_lr)
\[\begin{split}\mathrm{LR} = \mathrm{LR}_\text{init} \times \left\{\begin{array}{ll} 1+0.5 \times \text{epoch}, & \text{epoch}<20 \\ 10 \times 0.95^{\text{epoch}-20}, & \text{epoch}\ge 20 \end{array}\right.\end{split}\]
Warmup with plateau
from flowvae.utils import warmup_plr_lr
op.set_scheduler('LambdaLR', lr_lambda=warmup_plr_lr)
\[\begin{split}\mathrm{LR} = \mathrm{LR}_\text{init} \times \left\{\begin{array}{ll} 1+0.5 \times \text{epoch}, & \text{epoch}<20 \\ 10, & 20 \le \text{epoch}\le 50 \\ 1, & 50 \le \text{epoch}\le 100 \\ 0.1, & 100 \le \text{epoch}\le 150 \\ 0.01, & 150 \le \text{epoch}\le 200 \\ 0.001, & 200 \le \text{epoch}\le 250 \\ 0.0001, & 250 \le \text{epoch}\end{array}\right.\end{split}\]
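To make the two schedules concrete, the formulas can be transcribed into plain Python multipliers of the kind a `LambdaLR` scheduler accepts. The functions below are a sketch of the formulas only, not the code in flowvae.utils, and their names are hypothetical. Since the plateau intervals share their end points in the formula, this sketch assumes a shared boundary epoch belongs to the earlier interval.

```python
def warmup_exp(epoch):
    """Linear warm-up for 20 epochs, then exponential decay."""
    if epoch < 20:
        return 1 + 0.5 * epoch
    return 10 * 0.95 ** (epoch - 20)

def warmup_plateau(epoch):
    """Linear warm-up for 20 epochs, then a stepwise reduction."""
    if epoch < 20:
        return 1 + 0.5 * epoch
    # (upper epoch bound, multiplier); a shared boundary goes to the earlier interval
    steps = [(50, 10), (100, 1), (150, 0.1), (200, 0.01), (250, 0.001)]
    for upper, factor in steps:
        if epoch <= upper:
            return factor
    return 0.0001

for epoch in (0, 10, 20, 60, 120, 300):
    print(epoch, warmup_exp(epoch), warmup_plateau(epoch))
```

Both multipliers start at 1 and reach about 10 at the end of the warm-up, so the init_lr passed to the operator is the smallest learning rate of the warm-up phase, not the peak.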
Train the model
To train a model, use:
op.train_model(save_check=100, save_best=True, v_tqdm=True)
The arguments are:
save_check: (int) the interval at which the checkpoint file is saved to the given path output_path
save_best: (bool) default: True, whether to save the best model
v_tqdm: (bool) default: True, whether to use tqdm to display progress. When the output is written to a file, tqdm may cause problems.
Post process
After the model is trained, the FloGen provides several useful functions to post-process the reconstructed flowfield data. Most of them aim to obtain the aerodynamic coefficients of the surfaces in the flowfield.
Use the model to predict
First, we need to call the model to predict the flowfield of a new airfoil and/or under new operating conditions. This can be done with the encode and sample functions of the frameVAE class. The encode function obtains the latent variables (mu and log_var) from the new prior flowfield, and the sample function generates the new flowfield from the given operating condition (code) and the latent variables. Here is an example:
import numpy as np
import torch

#* construct the prior flowfield and prior condition
aoa_ref = torch.from_numpy(np.array([1.0]))  # prior condition is 1.0
data_ref = torch.from_numpy(np.concatenate((geom, field_ref), axis=0))  # prior flowfield
data_ref = data_ref.float().unsqueeze(0).to(device)  # add the batch dimension and move to device
#* use the encoder to obtain latent variables (or their distribution)
mu, log_var = vae_model.encode(data_ref)
#* generate the airfoil's new profiles under other operating conditions with the model
for aoa in aoas:
    aoa_residual = aoa - aoa_ref  # get residual operating conditions
    field_residual = vae_model.sample(num_samples=1, code=aoa_residual, mu=mu, log_var=log_var)  # sample the latent variables and get the residual field
    field_reconstruct = field_residual + field_ref
Some remarks on the above code:
The function encode takes a batched input, so remember to add the batch dimension with unsqueeze(0) if the input does not have one.
During prediction, whatever the concatenation strategy is, the decoder only needs the latent variables of the prior field. So mu and log_var can be obtained in advance and need no update during the prediction for one airfoil.
The function vae_model.sample adapts automatically to the concatenation strategy:
if the strategy is stochastic, several latent variables are sampled from the latent distribution and fed to the decoder, and each of them leads to one reconstructed field. This gives a way to evaluate the uncertainty of the reconstructed field. The number of samples is set by num_samples.
if the strategy is deterministic, no sampling takes place and the argument log_var is ignored. There is also no need to set num_samples greater than one.
The Unet decoder needs feature maps from the encoder, but the batch dimension of the feature maps (usually 1) may differ from that of the decoder input (which equals num_samples). FloGen provides a function to replicate the encoder’s feature maps: vae_model.repeat_feature_maps(num_samples)
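The difference between the stochastic and deterministic cases can be sketched with the usual reparameterization step of a VAE, \(z = \mu + \sigma \epsilon\) with \(\sigma = \exp(0.5 \log\sigma^2)\). The code below is a plain-Python illustration under that assumption; `sample_latents` is a hypothetical helper, and the sampling inside vae_model.sample may differ in detail.

```python
import math
import random

def sample_latents(mu, log_var, num_samples, stochastic=True, seed=0):
    """Draw latent vectors z = mu + exp(0.5 * log_var) * eps."""
    if not stochastic:
        # Deterministic strategy: log_var is ignored, one latent vector suffices
        return [list(mu)]
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
        samples.append(z)
    return samples

mu, log_var = [0.2, -0.4, 1.0], [-2.0, -2.0, -2.0]
samples = sample_latents(mu, log_var, num_samples=5)
print(len(samples))   # one latent vector, hence one reconstructed field, per sample
print(sample_latents(mu, log_var, num_samples=5, stochastic=False))
```

Decoding each sampled vector and looking at the spread of the resulting fields gives the uncertainty estimate mentioned above.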
Calculate aerodynamic coefficients from the reconstructed fields
There are some functions to obtain the angle of attack, lift, and drag from a 2D flowfield. They are listed in the table below:
| name | description | arguments | returns |
|---|---|---|---|
| | extract the angle of attack (AoA) from the far-field velocity field | - | ( |
| | extract pressure values at the airfoil surface from the pressure field | - | Tuple( |
| | get the geometry variables on the airfoil surface | - | |
| | integrate the force on x and y direction | - | |
| | get the lift and drag | | |
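Two of these steps follow textbook relations and can be sketched independently of the library. The helpers below are hypothetical and are not FloGen's functions: the angle of attack is taken as the direction of the far-field velocity, and lift and drag are obtained by rotating the x and y forces into the wind axes. Angles in degrees and the sign conventions are assumptions.

```python
import math

def aoa_from_velocity(u, v):
    """Angle of attack in degrees from far-field velocity components."""
    return math.degrees(math.atan2(v, u))

def force_to_lift_drag(fx, fy, aoa_deg):
    """Rotate body-axis forces (fx, fy) into wind-axis (lift, drag)."""
    a = math.radians(aoa_deg)
    lift = fy * math.cos(a) - fx * math.sin(a)   # component normal to the freestream
    drag = fx * math.cos(a) + fy * math.sin(a)   # component along the freestream
    return lift, drag

aoa = aoa_from_velocity(u=1.0, v=0.05)
lift, drag = force_to_lift_drag(fx=0.01, fy=0.5, aoa_deg=aoa)
print(aoa, lift, drag)
```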
There are some functions to obtain the x,y direction force from a 1D pressure profile. They are listed in the table below:
| name | description | arguments | returns |
|---|---|---|---|
| | integrate the force on x and y direction | - | |
| | integrate the lift and drag | - | |
| | obtain the mass and momentum flux through a line | - | |
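As an illustration of what integrating a pressure profile involves, the sketch below sums the pressure force over the panels of a closed surface contour. It is a minimal example under stated assumptions and not FloGen's implementation: the function name is hypothetical, the contour is assumed to be ordered counter-clockwise, the trapezoidal rule is used on each panel, and viscous forces are ignored.

```python
def integrate_pressure_force(xs, ys, ps):
    """Integrate pressure over a closed, counter-clockwise polyline.

    xs, ys: node coordinates; ps: pressure at each node.
    Returns (fx, fy), the pressure force per unit span acting on the body.
    """
    fx = fy = 0.0
    n = len(xs)
    for i in range(n):
        j = (i + 1) % n                       # wrap around to close the contour
        dx, dy = xs[j] - xs[i], ys[j] - ys[i]
        p_mid = 0.5 * (ps[i] + ps[j])         # trapezoidal rule on the panel
        # For a counter-clockwise contour, outward normal times panel length
        # is (dy, -dx); pressure pushes against the outward normal.
        fx += -p_mid * dy
        fy += p_mid * dx
    return fx, fy

# Unit square with higher pressure on the two bottom nodes
xs, ys = [0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]
fx, fy = integrate_pressure_force(xs, ys, ps=[2.0, 2.0, 1.0, 1.0])
print(fx, fy)
```

A uniform pressure over a closed contour gives zero net force, which is a convenient sanity check for any such integration routine.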
To speed up calculation for series data, these functions also have batch versions, in which both the input and the output gain a leading batch dimension.
origin name |
batch version name |
|---|---|
|
|
|
|