Playing Deepfake on Mac, part 2: Example

Thanks to the efforts of the contributors at Deepfakes (https://github.com/deepfakes/faceswap). Scene Selector is not required for generating face-swapped videos; it is just a helper for the first and last steps described in part 1 (https://scene-selector.com/2019/10/03/play-deepfake-on-mac/). Although, as a Mac user, you might not be a fan of typing commands in a terminal, I ran into some problems with the Faceswap GUI on my Mac, so I will use the command line in this example.

I assume you have set up the environment from part 1. Start Scene Selector (please use version 3.6.3 or later). To keep it quick, I chose a target clip from Blackpink’s MV and swapped Jennie’s face to Rose’s.

The first step is to prepare the data for Faceswap.

  • Open Menu -> Collection Manager and choose the “Face” segment.
  • Add a new Collection. I named it “faceswap”.
  • Before importing, change the alignment method to Center.
  • You need two sets of faces: one to swap from and one to swap to. Faceswap suggests thousands of them, covering different expressions and orientations, but I was too lazy to collect that many high-resolution photos, so I used faces from the video instead.
Skip the feature vector and sort the face collection manually if you don’t have a recognition model to sort the faces, or if the collection contains only a few different people. It saves a lot of time during importing.
  • Create a new group for each person and move the faces into them. Drop the low-quality and incorrectly recognized ones. I got around 300 faces for each person from this video.
  • The next step is to export the faces for training.
Check “Alignment File” only when you use models like “dfaker” with the “--warp-to-landmarks” option. Exporting the alignment file needs access to the original files to map the alignments to a different coordinate system and calculate a hash for each face, which significantly slows down the export.
  • In your terminal, change directory to “faceswap”. Set options “-A” and “-B” to the folders you just exported and “-m” to the location where the model will be created. You can resume an interrupted training session at any time later with the same command. I decreased the batch size to 32 (the default is 64, I think) because the computer became unresponsive at higher batch sizes. I used the following command to start the training.
python faceswap.py train -A /Volumes/Extended/faceswap/faceswap-jennie/ -B /Volumes/Extended/faceswap/faceswap\ rose/ -m /Volumes/Extended/dfaker_model_jennie_rose/ -t dfaker -ss 500 -bs 32 -p

* Faceswap used to take alignments.json as the default alignment file; the latest version has changed the default to “alignments.fsa”. If you would like to use a mask to train the model, you have to generate the masks first and set “mask_type” in config/train.ini to a type that has actually been generated. Here is an example that generates masks from the alignments.json exported by Scene Selector.

python tools.py mask -p all -M components -it faces -i /Volumes/Extended/faceswap/faceswap-jennie/ -a /Volumes/Extended/faceswap/faceswap-jennie/alignments.json
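
As noted above, the chosen mask type also has to be enabled in config/train.ini. A minimal excerpt might look like the following; I believe the setting sits in the [global] section, but the exact layout varies between Faceswap versions, so check the comments inside the file. The value must match a mask type that was actually generated (components here).

[global]
mask_type = components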

  • Wait
  • Wait some more. My AMD RX580, though much faster than running on the CPU, still takes 5-7 seconds per iteration. The training never fully converges; generally speaking, a loss below 0.02 implies good results, but you should judge by looking at the preview.
This is the preview window after 20000 iterations.
Faceswap generates TensorBoard logs; use tensorboard --logdir="your model log directory" to visualize the loss trends. I found it did not improve much after a certain number of iterations. I am not sure whether a more carefully chosen face set would help, but I think it’s time to stop.
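
For example, here is a minimal sketch pointing TensorBoard at the model directory used above; the exact subfolder that holds the event files depends on the Faceswap version, so pick whichever folder contains them.

# launch TensorBoard against the training model directory,
# then open http://localhost:6006 in a browser to see the loss curves
tensorboard --logdir=/Volumes/Extended/dfaker_model_jennie_rose/
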
  • Now start working on the video clips. Use Collection Manager to create a new Movie collection. A Movie collection can only hold one video clip copied from the Scene Selector movie player: select one clip, copy it, and use “Paste Copied Video Clip” to add it to the Movie collection.
Face recognition is not as smart as you (yet). For example, I found two frames with no detected faces between two similar frames and had to add them back by manual alignment. Focusing on fitting the face contour is enough here.
  • Once the Collection Manager shows only the face frames and some “no need to change” frames, choose “Export Originals”. You will get a folder of original frame images and an alignment file. The timing file and audio file will also be used later.
  • We have to generate the mask file again for the exported images. This time we use “frames” for “-it”, because the alignments are in the coordinate system of the entire frame.
python tools.py mask -M components -it frames -i /Volumes/Extended/faceswap/Others/ -a /Volumes/Extended/faceswap/Others/alignments.json

There are many options for convert; I used the default settings, as follows.

python faceswap.py convert -i /Volumes/Extended/faceswap/Others/ -o /Volumes/Extended/faceswap/output/ -m /Volumes/Extended/dfaker_model_jennie_rose -M components -s avg-color

You can use FFmpeg to merge the movie frames into a video or GIF. Alternatively, use Scene Selector’s Movie Generator to merge them into a video. When you use Movie Generator, move the audio and timing files into the output folder and select it, or remove the other image files in the top-level folder.
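
Here is a minimal FFmpeg sketch; the frame naming pattern, frame rate, and audio file name are assumptions, so adjust them to match what was actually exported.

# assumes frames named frame_00001.png, frame_00002.png, ... at 30 fps and an exported audio.m4a
ffmpeg -framerate 30 -i /Volumes/Extended/faceswap/output/frame_%05d.png -i /Volumes/Extended/faceswap/output/audio.m4a -c:v libx264 -pix_fmt yuv420p -c:a copy -shortest swapped.mp4
# or turn the result into a GIF
ffmpeg -i swapped.mp4 -vf "fps=15,scale=480:-1" swapped.gif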

  • The video will pop out in a new window. Enjoy.

Face swapping is not a headline feature of Scene Selector, since the major task depends on the Faceswap open source project. However, Scene Selector provides an easy way to prepare the data: it takes advantage of Apple’s improving face landmark recognition technology, with manual adjustment for extreme cases. I recommend upgrading your Mac to Catalina, which comes with version 3 of the face tools and will save you time adjusting faces. You spend about 1% of the time importing the video and filtering the faces, and the computer does the rest for you. That still meets our goal.

Playing Deepfake on Mac, part 1: Setup

This is a quick update on how to use Scene Selector to prepare and merge movie frames for a face-swapped movie. You still need the code from https://github.com/deepfakes/faceswap.

In short, face-swapping a movie includes 3 steps:

  1. Extract faces of the two people. You will need many faces, covering different expressions and positions, to get a good result.
  2. Train a model on the data from step 1 to learn how to transform face A into face B.
  3. Convert the face on each movie frame and merge the frames back into a movie.
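
These three steps map directly onto Faceswap’s extract, train, and convert subcommands. Here is a minimal sketch, with the uppercase directory names as placeholders:

# 1. extract faces from the source material (run once for each person)
python faceswap.py extract -i INPUT_DIRECTORY -o FACES_DIRECTORY
# 2. train a model on the two face sets
python faceswap.py train -A FACES_A_DIRECTORY -B FACES_B_DIRECTORY -m MODEL_DIRECTORY
# 3. convert the target frames with the trained model
python faceswap.py convert -i FRAMES_DIRECTORY -o OUTPUT_DIRECTORY -m MODEL_DIRECTORY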

Every step involves heavy GPU usage, though it can run on the CPU at roughly 1/10 of the GPU speed. Step 2 in particular, even when starting from a pretrained model, takes about 4-6 hours to produce a good new model (on an AMD RX580 eGPU); it takes significantly longer on the CPU.

Here are the step-by-step instructions if you don’t have a Python virtual environment ready.

Go to https://brew.sh and install Homebrew with the terminal command on the page.

  1. Create your isolated environment with the following commands. You can use your preferred location.
brew install pyenv-virtualenv
pyenv install 3.6.6
pyenv virtualenv 3.6.6 faceswap-env

Add the following lines to your shell environment file (pre-10.15: ~/.bash_profile; on 10.15 with zsh: ~/.zshrc) so pyenv can work properly.

eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

Restart the shell or execute the commands above. Then use the following command to start the working environment.

pyenv activate faceswap-env
  2. Now change directory to the place where you want the Faceswap program to be downloaded.
cd ~/Downloads
git clone https://github.com/deepfakes/faceswap.git
  3. Activate your working environment (you have to do this for every new terminal session).
  4. Change into the faceswap directory.
  5. Run the install script and follow the instructions.
  6. Set up PlaidML (if you use a GPU).
# setup plaidml to choose the GPU.
python setup.py
plaidml-setup

Now you can test if your setup works with a directory of images or a video file.

python faceswap.py extract -i INPUT_DIRECTORY -o OUTPUT_DIRECTORY

PlaidML might not be set to the most powerful GPU you plan to use; run the plaidml-setup command to select the device correctly. If you are running an NVIDIA card on macOS 10.13.*, you can also use PlaidML instead of building TensorFlow for GPU yourself.

Who is more popular?

You may already know that more and more pop singers in Japan and South Korea appear as part of a group. There are several reasons: reaching a larger audience with minimal effort, and using the drawing power of top performers to raise the exposure of inexperienced or junior members.

If you are a fan of one member of a group, you might be curious how that member ranks among all the members. You can easily search the web for rankings from general polls, but there is actually a clue to how the companies behind these groups weight each member.

Let’s choose one group for this. First, we need to prepare a movie file of the target group performing. It can be a live performance or an MV; downloading one from YouTube is not difficult. I chose OH MY GIRL’s SSFWL for the first example, a live performance broadcast on TV. JDownloader can help here.

The next step is to prepare the model. The face recognition model in Scene Selector is trained with supervised learning, so each model identifies only the faces it has been trained on. Any face that is not in the original dataset will be identified as the closest known one, unless you enable the L2 filter and the face is far enough from the dataset. That should be fine in this use case. Preparing a model is very easy, but preparing the data might take some effort. If you are familiar with all the members you are about to work on, extracting the faces from the video you are working on gives the best accuracy. The downside is that you have to separate some of them yourself. Please note that Apple’s face recognition does not work very well on live stages with complicated lighting conditions.

Collecting celebrity headshots from a known source may be the easiest way to build up a model. Some people collect them from popular search engines like Google or Bing, but real experience shows the search results might not be very good, especially for members who are not very popular. A better idea is to retrieve the photos from their fans, who rarely mistake their idols for others. I prefer to download them from Pinterest boards created specifically for one person. Scene Selector provides an import option for an organized folder, and you can ignore “group pictures” that would require you to manually pick the correct face.

Name the folders with the identifiers you want to use and put them in the same location, so they can be imported with the sections created automatically.
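
For example, a layout like the following lets Scene Selector create one section per folder; the parent folder and member identifiers here are only hypothetical names.

# one folder per member, named with the identifier you want to use
mkdir -p ~/headshots/member-a ~/headshots/member-b ~/headshots/member-c
# put each person's photos into their folder, then import ~/headshots from Scene Selector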

You are now ready to use “Create Face Classification Model” to create one. Before that, it is better to use “Reduce Samples” to assign the collected samples together. We have found the following steps work best for creating a general model:

  1. Train with “Use Unassigned For Validation”.
  2. Save the model and use it to identify all faces at 25% confidence.
  3. Assign mismatched faces with “Assign Prediction Mismatch”.
  4. Train with “Use Unassigned For Validation” again.

For the first step, you can let the trainer create the validation data from 15% of each group if you have enough samples. You always want the deviating samples to be assigned to training in the last step.

In a small group, reaching 100% validation accuracy is not difficult. If you find some samples that never fit in, you can add duplicate copies to increase their weight, but beware of overtraining. 90% is usually good enough for non-serious tasks; you should not rely on the model for serious tasks anyway.

OK, it’s the moment of truth. Run the classification once with the newly created model. We got this from the statistics button.

That’s it. Note that it is impossible to avoid errors caused by motion blur in this type of video.