When it is trained enough (judged from the preview), we can try the model out. Run the script with “./7_merge_SAEHD.sh” and choose the model. No single setting fits all scenarios, so remember to use interactive mode to find a combination that fits yours. If there is an occlusion in front of the face, XSeg is recommended to produce a more realistic image.
When all frames have been processed, copy all the merged images from workspace/data_dst/merged/ to the original folder of your exported movie (the folder with timing.json and audio.m4a). Go back to Scene Selector and run the Image Movie Generator from the Tools menu. Click the folder icon, choose the folder you just copied the files to, and click Start. The new movie should be ready momentarily in the same folder.
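The copy-back step can be wrapped in a small helper. This is only a sketch assuming the default DFL workspace layout; both paths come from your own setup:

```shell
# Copy DFL's merged frames into the export folder holding timing.json / audio.m4a.
copy_merged_back() {   # $1 = DFL workspace dir, $2 = export folder
  mkdir -p "$2"
  cp "$1"/data_dst/merged/* "$2"/
}
```

For example: copy_merged_back ~/DeepFaceLab_Mac_M1/workspace ~/Movies/my_export (adjust both paths to where you installed DFL and where you exported the movie).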
Now you should have DeepFaceLab (DFL below) ready, following the steps in Part 1. At the top level of the DeepFaceLab_Mac_M1 directory (or whatever name you set at installation), the subdirectories serve the following roles:
workspace: the default workspace; all scripts look for their input data here, and all trained models are saved here.
DeepFaceLab: the original one with macOS modifications.
scripts: script files whose numeric prefixes indicate the order in which to run them. A secondary number after the dot marks an optional variant; usually you pick one (but you are not limited to one) at that step.
Here is a quick start:
Drop the target movie file into workspace/ as data_dst.mp4, and the source movie file as data_src.mp4.
In the scripts directory, run the scripts in order: 2_ -> 3_ -> 4.1_ -> 5.1_ -> 6_ -> 7_ -> 8_
Step 6 takes significant time. The final movie appears in workspace/.
Sounds easy, and in a perfect world it would be. In practice, you still need extra effort to get an acceptable result, and there are many tutorials and discussion boards to help. Here we focus on replacing steps 2, 3, 4, and 8 with Scene Selector’s Collection Manager. Though you could modify the scripts to let DFL work on the exported files directly, we keep the copy-file method so you can fall back to the native workflow at any time. The following tests were done on an M1 Max with 32 GB RAM and a 32-core GPU, with Scene Selector Version 4.5 or later.
Preparing the Destination Movie
Open a movie file in Scene Selector. The movie file must be in a format macOS can recognize, meaning that if it cannot be opened by QuickTime Player, it cannot be opened by Scene Selector either. If the file is not compatible, I recommend transcoding with HandBrake using the ‘video toolbox’ encoder to utilize the hardware encoder. For the MV I tested (4K, 24 fps, 3:52), transcoding to H.265, 10-bit, 4K took about 1 minute.
To make a quick test, I used the anchors to select a clip with only one singer in it, then copied it (Command-C or from the Edit menu).
Open a new Collection Manager (Tools -> Collection Manager). Select the Movie tab and click + to create a new collection. Click the collection’s name to rename it, then click somewhere else or press the Tab key to make sure the name changed. Click the Go button to start the new collection.
For a first-time run, it is better to drop an image with a face into workspace/data_dst/ and generate the face format used by DFL with “./5.1_data_dst_extract_faces_S3FD”. Then drag the resulting face image from workspace/data_dst/aligned/ into the Collection Manager’s Alignment Pane to capture the base face orientation and type used by DFL before you paste the movie.
Now paste (Command-V) into the Collection Manager. The option panel will ask how you want to import the movie. Because this is the destination, let’s not skip any frames and keep blurry faces. The face vector is optional; it helps sort multiple faces and identify frames where face detection failed.
141 frames in total; 104 faces were detected in 33 seconds. The 2 missed faces were extreme cases, like the image at left, and required manual adjustment.
DFL detected 112 faces: 8 were incorrect (not faces) and 2 were missed. It took 2:25 on GPU and 2:41 on CPU. GPU utilization was not good (90% at half clock speed); I suspect the inference tasks are not large enough to fill the pipeline.
GPU utilization is also low for Scene Selector; I believe most of the work is offloaded to the ANE (Apple Neural Engine). Enabling the face vector, which involves an extra Core ML model, only took an extra 3 seconds. ANE power spiked as the import started.
The next step is to export the movie frames. Click “Export Originals”, choose a location, and use the default settings. Copy all files in the movie folder to workspace/data_dst/, then delete the movie folder. DFL stores the face features in JPEG data chunks, so we need to run a script to make the face images recognizable to DFL: “./5.1_data_dst_bless_from_json.sh”.
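The copy step can be scripted. This is a sketch assuming the default workspace layout; the bless script itself still has to be run from the scripts directory afterwards:

```shell
# Stage the exported frames where DFL expects the destination data.
stage_frames_for_dfl() {   # $1 = exported movie folder, $2 = DFL workspace
  mkdir -p "$2/data_dst"
  cp "$1"/* "$2/data_dst"/
  # afterwards the exported movie folder can be deleted, then run
  # ./5.1_data_dst_bless_from_json.sh from the scripts directory
}
```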
Preparing the Source
Open the Collection Manager again. Choose the face collection tab and create a new one. For the source, you can import movies and images without limitation. In fact, for training purposes you can prepare the destination data this way too, especially if you want to target the same face in different videos. Importing from a video can capture different angles of the face, but expect some blurry images due to motion and varying lighting conditions. Click ‘Export Faces’, choose the group to export, and check the alignment file option, which is required for conversion to DFL.
Now copy the aligned folder to workspace/data_src and run ‘./4.1_data_src_bless_from_json.sh’.
If you have a downloaded or self-trained XSeg model, use the script ‘5_XSeg_data_(dst or src)_trained_mask_apply.sh’ to apply XSeg to the faces; it is required for some training options. Remember to check whether XSeg was applied correctly with ‘5_XSeg_data_(dst or src)_mask_edit.sh’, which is also used when preparing data to train your own XSeg models.
Training the Model
Now you are ready to start training. Run ‘./6_train_SAEHD.sh’. If you have downloaded a pretrained model and copied it to workspace/model/, it becomes selectable as a starting point; training from scratch takes a lot more time.
When you start from a pretrained model, first copy only the *.dat files to the model folder. Start the training, select the model, and press the Return key to enter the option setup. At the last option, change the pretraining mode to ‘no’. Let it start with the preview window, then press Return to stop training. Now copy all the *.npy files over, replacing the files in the model directory. Without this procedure, training always starts from scratch.
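The two copy phases are easy to mix up, so here they are as a pair of helpers; a sketch only, assuming the usual *.dat and *.npy filename patterns of a SAEHD model folder:

```shell
# Phase 1, before the first run: bring over only the model options.
copy_model_options() {   # $1 = pretrained model dir, $2 = workspace/model
  cp "$1"/*.dat "$2"/
}
# Phase 2, after starting once with pretraining set to 'no' and stopping:
# replace the freshly created weights with the pretrained ones.
copy_model_weights() {   # $1 = pretrained model dir, $2 = workspace/model
  cp "$1"/*.npy "$2"/
}
```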
There are many options to set, but they are beyond the scope of this blog. If you train from many images, you may sometimes hit a “too many open files” error. In the command line, run ‘ulimit -n 1024’ (or ‘ulimit -n unlimited’).
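In shell form the fix looks like this; the change lasts only for the current shell session, so run it in the same terminal you launch training from:

```shell
ulimit -n        # show the current soft limit on open files
ulimit -n 1024   # raise it before launching the trainer
```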
Training is a time- and resource-consuming process. To run on a Mac, it is better to use a lower resolution even if you have plenty of RAM. Training with the default setting (resolution 320) takes 7 s per iteration at batch size 8. The example here uses 160; time per iteration ranges from 0.5 to 1.5 s depending on how many additional options are enabled. Serious producers can try out settings on the Mac and then migrate to cloud computing.
The previous article introduced how to do a face swap with the Deepfake repository. Deepfakes evolve very fast, and the earlier tricks that depended on the alignment file no longer work.
A more intuitive tool is DeepFaceLab, which claims that 90% of the face-swapping works on the internet were made with it.
In the previous article, acceleration depended on PlaidML, which used OpenCL. Now that Apple supports TensorFlow and PyTorch via Metal, making the tool compatible with macOS is easier.
What we provide here:
An introduction to preparing the environment from the ground up on an M1 Mac. A simple script is provided to install all required packages. The script does not work on the Intel platform without some modifications.
How to import faces, correct them, and export them to DeepFaceLab for processing.
How to prepare a movie and feed the extracted frames to DeepFaceLab, and how to merge them back after the materials are processed.
Apple’s MPS support is not complete and there are bugs to be fixed; I had to move some operations to the CPU, which surely slows down the process. Running on the GPU is only about 5-6 times faster than on the CPU, a sign that the GPU path is not fully optimized (or that the CPU is just too fast?).
Only 32-bit data is supported in MPS, which slows down the training process. More RAM is required for higher resolutions; sometimes 32 GB of system memory is barely enough.
Though GPU performance of the M1 Pro and M1 Max looks good in some tests, the 32-core M1 Max GPU is only comparable to a card like the RTX 3060 at the SAME power level: the M1 Max GPU consumes 40 W at its peak, while an RTX card can draw 8 times more.
Begin the Journey
I tried to make the installation process as simple as possible. It should work without much interaction even starting from a fresh system, but you still need to look out for some errors during the installation.
First, clone the main script. Open Terminal (launch it from Launchpad), change the directory via the “cd” command to where you want it installed, and type the clone command.
You can add a folder name if you don’t want to use the default name “DeepFaceLab_Mac_M1”. If git is not available, the system will ask to install the Xcode Command Line Tools; follow the instructions and install them. After successfully cloning the source, enter the directory.
Run the script.
You may need to respond to prompts in the terminal if brew is not already installed. More than 1 GB of files will be downloaded and installed, so be patient. If the terminal shows any critical errors, such as cloning interrupted by network problems, just re-run the script to fix it.
Thanks to the efforts of the contributors at Deepfakes (https://github.com/deepfakes/faceswap). Scene Selector is not required for generating face-swapped videos; it is just a helper for the first and last steps described in part 1 (https://scene-selector.com/2019/10/03/play-deepfake-on-mac/). Though as a Mac user you might not be a fan of typing commands in a terminal, I found some problems with the Faceswap GUI running on my Mac, so I will use the command line in this example.
Assuming you have set up the environment in part 1, start Scene Selector (please use Version 3.6.3 or later). To make it quick, I chose a target clip from Blackpink’s MV and swapped the face of Jennie to Rose.
The first step is to prepare the data for Faceswap.
Call out Menu->Collection Manager and choose the “Face” segment.
Add a new Collection. I named it “faceswap”.
Before importing, change the alignment method to Center.
You need 2 sets of faces: one to swap from and one to swap to. I was too lazy to collect more high-resolution photos for them; the suggestion from Faceswap is thousands of faces, covering different expressions and orientations. I used the faces from the video instead.
Create a new group for each person and relocate the faces into them, just like this. Drop the low-quality and incorrectly recognized ones. I got around 300 faces for each person from this video.
Next step is to export the faces for training.
In your terminal, change directory to “faceswap”. Set options “-A” and “-B” to the folders you just exported and “-M” to the location where the model will be created. You can resume an interrupted training at any time later with the same command. I decreased the batch size to 32 (the default is 64, I think) because the computer became unresponsive at higher batch sizes. I used the following command to start the training.
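As a hedged sketch only (flag spellings vary between Faceswap versions; run ‘python faceswap.py train -h’ to confirm yours), the invocation takes this shape, with the angle-bracketed paths as placeholders:

```shell
python faceswap.py train -A <exported-faces-A> -B <exported-faces-B> -M <model-folder> -bs 32
```

Re-running the same command later resumes the interrupted training from the saved model.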
* Faceswap used to take alignments.json as the default alignment file; the latest version has changed it to “alignments.fsa”. If you would like to use a mask to train the model, you have to generate the mask and set “mask_type” in config/train.ini to one of the types you have generated. Here is an example that generates a mask based on the alignments.json produced by Scene Selector.
python tools.py mask -p all -M components -it faces -i /Volumes/Extended/faceswap/faceswap-jennie/ -a /Volumes/Extended/faceswap/faceswap-jennie/alignments.json
Wait some more. My AMD RX580, though much faster than the CPU, still takes 5-7 seconds per iteration, and the training won’t fully converge. Generally speaking, a loss below 0.02 implies good results, but you should judge by looking at the preview.
Start working on the video clips. Use the Collection Manager to create a new Movie collection. A movie collection can hold only one video clip copied from the Scene Selector movie player: select a clip, copy it, and use “Paste Copied Video Clip” in the Movie collection.
Once the Collection Manager shows only the target face and some “no need to change” frames, choose “Export Originals”. You will get a folder of original-frame images and an alignment file; the timing file and audio file will also be used later.
We have to generate the mask file again, this time on the output images. Use “frames” for “-it”, since these alignments are in the coordinates of the entire frame.
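Mirroring the earlier mask command but with “-it frames”, the regeneration might look like this (the paths are placeholders for your exported frames folder):

```shell
python tools.py mask -p all -M components -it frames -i <exported-frames-folder> -a <exported-frames-folder>/alignments.json
```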
You can use the FFMPEG option to merge the movie frames into a video or GIF, or alternatively use Scene Selector’s Movie Generator. When you use the Movie Generator, move the audio and timing files into the output folder and select it, or remove the other image files from the top-level folder.
The video will pop out in a new window. Enjoy.
Faceswap is not a headline function of Scene Selector, because the major task depends on the Faceswap open-source project. However, Scene Selector provides an easy way to prepare the data: it takes advantage of Apple’s improving face-landmark recognition technology, plus manual adjustment for extreme cases. It is recommended to upgrade your Mac to Catalina, which comes with version 3 of the face tool and will save you time adjusting faces. You spend 1% of the time importing the video and filtering the faces, and the computer does the rest. It still meets our target.
Extract two sets of faces. You will need many of them, covering different expressions and positions, to get a good result.
Train the model to learn how to transform face A into face B using the data from step 1.
Convert the face on each movie frame, then merge the frames back into a movie.
Every step makes heavy use of the GPU, though it can also run on the CPU at about 1/10 the speed. Step 2 especially: even starting from a pretrained model, it takes about 4-6 hours to generate a good new model (on an AMD RX580 eGPU), and far longer on the CPU.
Here are the step-by-step instructions if you don’t have a Python virtual environment ready.
Go to https://brew.sh and install Homebrew with the terminal command on the page.
Create your isolated environment with the following command; you can use your preferred location.
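One common way to create and activate an isolated Python environment (the location is your choice; the home-directory path here is just an assumption):

```shell
python3 -m venv "$HOME/faceswap-env"      # create the environment
source "$HOME/faceswap-env/bin/activate"  # enter it
python -V                                 # the interpreter now comes from the env
```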
Your PlaidML might not be set to the most powerful GPU you plan to use; run the plaidml-setup command to set it correctly. If you are running an Nvidia card on macOS 10.13.*, you can also use PlaidML instead of building TensorFlow for GPU yourself.
You may already know that more and more pop singers in Japan and South Korea appear as teams. There are several reasons: reaching more audiences with minimal effort, and using the drawing power of the top performers to raise the exposure of inexperienced or junior members.
If you are a fan of one member of such a group, you might be curious how that member ranks within the group. It is easy to search the web for rankings from general polls, but there is actually a cue to how the companies behind these groups weight each member.
Let’s choose one team. First, we need a movie file of the target team performing; it can be a live performance or an MV, and downloading one from YouTube is not difficult. I chose OH MY GIRL’s SSFWL for the first example, a live performance broadcast on TV. JDownloader can help here.
The next step is to prepare the model. The face recognition model in Scene Selector is trained by supervised learning: each model identifies only the faces it has been trained on, and any face not in the original dataset will be identified as the closest match unless you enable the L2 filter and the face is far enough from the dataset. That should be fine for this use case. Preparing a model is very easy, but preparing the data may take some effort. If you are familiar with all the members you are about to work on, extracting faces from the video itself gives the best accuracy; the drawback is that you have to separate some of them yourself. Please note that Apple’s face recognition does not work very well on live stages with complicated lighting conditions.
Building a model from a known source may be the easiest way to collect celebrity headshots. Some collect them from popular search engines like Google or Bing; however, real experience shows the search results may not be very good, especially for members who are not very popular. A better idea is to retrieve photos from the fans, who rarely mistake their idols for others. I prefer to download them from Pinterest boards created for one specific person. Scene Selector provides an import option for such an organized folder, and you can ignore “group pictures” that would require manually picking the correct face.
You are now ready to use “Create Face Classification Model”. Before that, it is better to use “Reduce Samples” to consolidate the collected samples. We have found the following steps work best for creating a general model:
Train with “Use Unassigned For Validation”.
Save the model and use it to identify all faces at 25% confidence.
Assign mismatched faces with “Assign Prediction Mismatch”.
Train with “Use Unassigned For Validation” again.
For the first step, you can let the trainer create validation data from 15% of each group if you have enough samples. In the last step, you always want the deviating samples assigned for training.
OK, it’s the moment of truth. Run the classification once with the newly created model. We got this from the statistics button.
That’s it. Note that it is impossible to avoid errors caused by motion blur in this type of video.