Now you should have DeepFaceLab (DFL below) ready to use after following the steps in Part 1. At the top level of the DeepFaceLab_Mac_M1 directory (or whatever name you chose at installation), there are three sub-directories:
- workspace: the default workspace. All scripts look for their input data here, and all trained models are saved here.
- DeepFaceLab: the original DeepFaceLab code with macOS modifications.
- scripts: script files whose numeric prefix gives the order to run them in. A second number after the dot marks an optional step. Usually you only have to pick one of these, though you are not limited to one.
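To see the run order at a glance, you can sort the script names by their numeric prefix. This is a small sketch of my own, not something shipped with DFL; the function name is made up, and it assumes every script name starts with a number.

```shell
# List the helper scripts in the order implied by their numeric prefixes.
# LC_ALL=C keeps "." as the decimal separator, so 4.1 sorts right after 4.
# Usage: list_scripts_in_order [scripts_dir]
list_scripts_in_order() {
    dir="${1:-scripts}"
    # Keep only names that start with a digit, then sort them numerically.
    ls "$dir" | grep '^[0-9]' | LC_ALL=C sort -n
}
```

Run it from the top-level directory as `list_scripts_in_order scripts`.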
Here is a quick start:
- Drop the target movie file at workspace/data_dst.mp4 and the source movie file at workspace/data_src.mp4.
- In the scripts directory, run the scripts in order: 2_ -> 3_ -> 4_1 -> 5._ -> 6_ -> 7_ -> 8_
- Step 6 takes significant time. The final movie appears in workspace/
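The first quick-start step can be scripted as below. This is only a sketch: the function name is hypothetical, and it assumes the source video should be named data_src.mp4, matching the data_src folder used later in this post.

```shell
# Copy the two input videos to the file names the quick start expects.
# Usage: stage_videos <dfl_root> <target_video> <source_video>
stage_videos() {
    root="$1"; target="$2"; source_video="$3"
    mkdir -p "$root/workspace" || return 1
    # The target (destination) video is the one whose face gets replaced.
    cp "$target" "$root/workspace/data_dst.mp4" || return 1
    # The source video provides the face to swap in.
    cp "$source_video" "$root/workspace/data_src.mp4"
}
```

For example, `stage_videos ~/DeepFaceLab_Mac_M1 ~/Movies/mv.mp4 ~/Movies/face.mp4`, where both movie paths are placeholders for your own files.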
Sounds easy, and in a perfect world it is. In practice, you still need extra effort to get an acceptable result, and there are many tutorials and discussion boards covering that. Here we focus on replacing steps 2, 3, 4, and 8 with Scene Selector’s Collection Manager. Though you could modify the scripts to let DFL work directly on the exported files, we keep the copy-file method so you can go back to the native method at any time. The following test was done on an M1 Max (32GB, 32-core GPU) with Scene Selector version 4.5 or later.
Preparing the Destination Movie
Open a movie file in Scene Selector. The file must be in a format macOS can recognize; that is, if it cannot be opened by QuickTime Player, it cannot be opened by Scene Selector either. If the file is not compatible, I recommend transcoding it with HandBrake and choosing the ‘video toolbox’ encoder to make use of the hardware encoder. For the MV I tested (4K, 24fps, 3:52), transcoding to H.265 10-bit 4K took about 1 minute.
For a quick test, I used the anchors to select a clip with only one singer in it. Copy it (command-c, or Copy from the Edit menu).
Open a new Collection Manager (Tools -> Collection Manager). Select the Movie tab and click + to create a new collection. Click the name of the collection to rename it, then click somewhere else or press the tab key to confirm the new name. Click the Go button to start the new collection.
For the first run, it is better to drop an image containing a face into workspace/data_dst/ and run “./5.1_data_dst_extract_faces_S3FD” so that DFL generates its own aligned face. Then, before you paste the movie, drag that face image from workspace/data_dst/aligned/(face image file) to the Collection Manager’s Alignment Pane to capture the base face orientation and type used by DFL.
Now paste (command-v) into Collection Manager. The option panel asks how you want to import the movie. Because this is the destination, do not skip any frames and keep blurry faces. Face vector is optional; it helps sort multiple faces and identify frames where face detection failed.
The clip has 141 frames in total. Scene Selector detected 104 faces in 33 seconds. It missed 2 faces in extreme cases, like the image at left, which required manual adjustment.
DFL detected 112 faces; 8 were incorrect (not faces), and 2 were missed. It took 2:25 on GPU and 2:41 on CPU. GPU utilization was not good (90% at half speed). My guess is that the inference tasks are not large enough to fill the pipeline.
GPU utilization is also low for Scene Selector; I think most of the work is offloaded to the ANE (Apple Neural Engine). Enabling face vector, which involves an extra CoreML model, added only 3 seconds.
ANE power ramped up when importing started.
The next step is to export the movie frames. Click “Export Originals”, choose a location, and use the default settings. Copy all files in the exported movie folder to workspace/data_dst/, then delete the movie folder. DFL stores the face features in data chunks inside the JPEG files, so we need to run a script to make the face images recognizable by DFL: “./5.1_data_dst_bless_from_json.sh”.
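The copy-then-delete step can be wrapped in a small helper. This is a sketch under my own assumptions: the function name and the example export path are hypothetical, and it deletes the export folder only after the copy succeeds.

```shell
# Copy exported frames into workspace/data_dst/ and remove the export folder.
# Usage: import_dst_frames <export_folder> <dfl_root>
import_dst_frames() {
    export_dir="$1"; root="$2"
    dst="$root/workspace/data_dst"
    [ -d "$export_dir" ] || { echo "no such folder: $export_dir" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # Copy first; delete the export folder only if the copy succeeded.
    cp -R "$export_dir"/. "$dst"/ && rm -rf "$export_dir"
}
# Afterwards, from the scripts directory:
#   ./5.1_data_dst_bless_from_json.sh
```

For example, `import_dst_frames ~/Desktop/my_clip ~/DeepFaceLab_Mac_M1`, where ~/Desktop/my_clip stands for wherever you exported the frames.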
Preparing the Source
Open Collection Manager again. Choose the face collection tab and create a new collection. For the source, you can import movies and images without limitation. In fact, for training purposes you can prepare the destination data this way too, especially if you want to target the same face in different videos. Importing from a video gives you different angles of the face, but expect some blurry images due to motion and varying light conditions.
Click ‘Export Faces’, choose the group to export, and check the alignment file option, which is required for conversion to DFL.
Now copy the aligned folder to workspace/data_src and run the command ‘./4.1_data_src_bless_from_json.sh’.
If you have a downloaded or self-trained XSeg model, use the script ‘5_XSeg_data_(dst or src)_trained_mask_apply.sh’ to apply XSeg to the faces. This is required for some training options. Remember to check that XSeg was applied correctly with ‘5_XSeg_data_(dst or src)_mask_edit.sh’, which is also how you prepare data to train your own XSeg models.
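Since the XSeg scripts come in a dst and a src flavor, a tiny helper can build the right file name for you. This is a sketch with a made-up function name; it assumes the script names follow exactly the pattern quoted above.

```shell
# Build an XSeg script name for one side of the swap.
# Usage: xseg_script <dst|src> <trained_mask_apply|mask_edit>
xseg_script() {
    side="$1"; action="$2"
    case "$side" in
        dst|src) ;;
        *) echo "side must be dst or src" >&2; return 1 ;;
    esac
    case "$action" in
        trained_mask_apply|mask_edit) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    echo "5_XSeg_data_${side}_${action}.sh"
}
```

From the scripts directory, `"./$(xseg_script src trained_mask_apply)"` applies the mask to the source faces, and `"./$(xseg_script src mask_edit)"` opens the editor to check the result.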
Training the Model
Now you are ready to start training. Run ‘./6_train_SAEHD.sh’. If you have downloaded a pretrained model and copied it to workspace/model/, it will be offered as a starting point for training. Training from scratch takes a lot more time.
When you start with a pretrained model, first copy only the *.dat files to the model folder. Start the training, select the model, and press the return key to enter the option setup. At the last option, change the pretraining setting to ‘no’. Let training run until the preview window appears, then press return to stop it. Now copy all the *.npy files over the ones in the model directory. Without this procedure, training always starts from scratch.
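The two-stage copy above can be scripted. This is a minimal sketch, not part of DFL; the function name and the ~/Downloads/pretrained folder in the usage lines are made up for illustration.

```shell
# Copy only the files with one extension from a downloaded model folder.
# Usage: copy_model_files <pretrained_dir> <model_dir> <dat|npy>
copy_model_files() {
    from="$1"; to="$2"; ext="$3"
    mkdir -p "$to" || return 1
    found=0
    for f in "$from"/*."$ext"; do
        [ -e "$f" ] || continue   # the glob matched nothing
        cp "$f" "$to"/ || return 1
        found=1
    done
    # Fail if there was nothing to copy, so a wrong path is noticed.
    [ "$found" -eq 1 ]
}
# Stage 1: copy_model_files ~/Downloads/pretrained workspace/model dat
# Then run ./6_train_SAEHD.sh, set pretraining to 'no', and stop at the preview.
# Stage 2: copy_model_files ~/Downloads/pretrained workspace/model npy
```

Keeping the two stages as separate calls makes it harder to copy the *.npy files too early by accident.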
There are many options to set, but they are beyond the scope of this blog. If you train on many images, you may sometimes hit a “too many open files” error. In that case, run ‘ulimit -n 1024’ (or “unlimited”) in the command line before training.
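If you do not want to lower a limit that is already high enough, you can check it first. The helper below is a sketch with a hypothetical name. The limit applies to the current shell and the programs started from it, so run this in the same terminal session you train from.

```shell
# Raise the soft limit on open files for the current shell if it is too low.
# Usage: raise_open_files [wanted_limit]   (defaults to 1024)
raise_open_files() {
    want="${1:-1024}"
    current="$(ulimit -n)"
    # "unlimited" or an already larger value needs no change.
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$want" ]; then
        return 0
    fi
    ulimit -n "$want"
}
```

Call `raise_open_files` and then start ‘./6_train_SAEHD.sh’ from the same shell; `ulimit -n` on its own prints the value currently in effect.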
Training is time- and resource-consuming. On a Mac, it is better to use a lower resolution even if you have plenty of RAM. Training with the default setting (resolution 320) takes 7s per iteration at batch=8. The example here uses 160, where the time per iteration ranges from 0.5 to 1.5s depending on how many additional options are enabled. Serious producers can try out the settings on a Mac and then migrate to cloud computing.
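To get a feel for what these per-iteration times mean, multiply them by the number of iterations you plan to run. The sketch below does that arithmetic; the function name is made up, and the 100000 iterations in the example is purely a hypothetical figure, not a recommendation from this post.

```shell
# Estimate wall-clock training time in hours.
# Usage: training_hours <seconds_per_iteration> <iterations>
training_hours() {
    # LC_ALL=C makes awk read "0.5" with a dot as the decimal separator.
    LC_ALL=C awk -v s="$1" -v n="$2" 'BEGIN { printf "%.1f\n", s * n / 3600 }'
}
```

With these numbers, `training_hours 7 100000` prints 194.4, while `training_hours 0.5 100000` prints 13.9 and `training_hours 1.5 100000` prints 41.7, which is why the lower resolution is the practical choice on a Mac.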