Audio and video recording

The Quobis Communication Platform supports the recording of audio and video streams. Call recording can be used in a number of situations, such as meeting regulatory requirements, verifying accuracy, reviewing the quality of customer service, employee training, etc. This page describes the different configuration options available to choose what to record, how to record, who has the right to start a recording, and the characteristics of the recorded files.

What can be recorded

Recording of one-to-one calls and multi-party calls is supported for both video and audio streams. Likewise, if you are developing an application using the SDKs, you can record any conference room that has been created with the invitation method. Recording of meetings and public rooms is not supported in the current version.

Who can record: recording permissions

Administrators can set up domain-wide and per-user policies to decide which users are able to record the outbound calls they make and/or to set mandatory recording by default. In addition, administrators can decide with a system-wide configuration whether incoming calls from the PSTN or a SIP trunk are recorded.

Permissions for recording of internal calls

The ability to record a one-to-one call or a multiparty call relies on two permissions, named recordingProfile and recordingType, which are grouped into the recording permission set named RecordingPermissions. The following table explains the available values of these two fields:

Values of the RecordingPermission set

recordingType: indicates which media streams can be recorded. The following values are available (only one of them can be set):

  • all: both the audio and video streams will be recorded

  • video: only the video stream will be recorded

  • audio: only the audio stream will be recorded

  • none: neither the audio nor the video stream will be recorded

recordingProfile: indicates the recording policy for this user or domain. The following values are available (only one of them can be set):

  • always: every call will be recorded by default, regardless of the caller's choice

  • optional: lets the caller choose whether to record or not

  • skip: calls made by this caller will never be recorded, regardless of the user's choice

Please note that these permissions relate to the user who starts the call (the caller) and to the domain that this user belongs to. In other words, the decision about whether a call can be recorded has nothing to do with the rest of the participants in the call; it only involves the caller who initiates it.

The RecordingPermissions are set, updated and retrieved via the following REST API endpoints:

  • /permissions: endpoint to manage user permissions. Restricted to the user performing the request

  • /permissions/domain/{id}: endpoint to manage domain permissions. Restricted to the domain administrator

The following code snippet shows an example of the answer to a “GET” query for the permissions of a user, which are returned in JSON format:

{
  "recording": {
    "recordingType": "all",
    "recordingProfile": "always"
  }
}

Please note that the system does not have any permissions created by default.
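As a quick sanity check when integrating with these endpoints, a returned answer can be validated against the values listed above. The following is a minimal illustrative sketch; validate_recording_permissions is a hypothetical helper, not part of the platform API.

```python
# Illustrative sanity check of a RecordingPermissions answer against the
# values listed above; validate_recording_permissions is a hypothetical
# helper, not part of the platform API.
VALID_TYPES = {"all", "video", "audio", "none"}
VALID_PROFILES = {"always", "optional", "skip"}

def validate_recording_permissions(payload):
    """Return True if the 'recording' block holds valid values."""
    recording = payload.get("recording")
    if recording is None:
        return False   # remember: no permissions are created by default
    return (recording.get("recordingType") in VALID_TYPES
            and recording.get("recordingProfile") in VALID_PROFILES)

answer = {"recording": {"recordingType": "all", "recordingProfile": "always"}}
print(validate_recording_permissions(answer))  # True
```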

Permissions for recording of SIP calls

The permissions explained above do not apply to the recording of incoming calls. A separate parameter controls whether incoming SIP calls (from the PSTN or from a SIP trunk) can be recorded. This parameter (recordIncomingPSTN) can be found in the QSS.trunk service. If it is set to true, every incoming SIP call will be recorded regardless of the user permissions. If it is set to false, incoming calls from the PSTN won't be recorded. Its default value is false.

Please note that this is a system-wide parameter and only affects incoming calls, not outgoing calls. Its configuration is explained in the upgrade section.

Permissions precedence

In addition to the permissions explained above, there is a generic system-wide parameter in the QSS.rooms service which defines whether the system records or not in the absence of permissions (in other words, it only applies when there are no permissions configured). This parameter is configured in the config.json file of the QSS configuration as explained here:

... other parameters
record = none, all, video, audio
...

with the following available values:

  • none: no call will be recorded

  • all: every call will be recorded

  • video: video stream will be recorded

  • audio: audio stream will be recorded

Internal calls

The recording permissions apply both to the user and to the domain, with the user permissions taking precedence. That means that, in case of conflict between the user and domain permissions, the user permissions are the ones that apply. When the user has no recording permissions, the domain permissions apply as a fallback. Beyond that, if there are neither user nor domain permissions, a generic configuration in the QSS configuration applies to the entire system (see section “Service configuration” below).

In summary, the precedence for internal calls is as follows: user permissions -> domain permissions -> system configuration. As an example, if a system is provisioned with the following permissions, every call made by Alice will be automatically recorded. On the other hand, calls started by other users without any permissions won't be recorded.

  • Domain permissions: [recordingType = “none”, recordingProfile = “skip”]

  • User “Alice” permissions: [recordingType = “all”, recordingProfile = “always”]
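The precedence above can be sketched as follows. resolve_recording and its arguments are illustrative names, not the actual platform implementation; for the "optional" profile the sketch assumes the caller has chosen to record.

```python
# Sketch of the precedence described above (user -> domain -> system).
# resolve_recording is an illustrative helper, not the platform code;
# for the "optional" profile it assumes the caller opted in to record.
def resolve_recording(user_perms, domain_perms, system_record="none"):
    """Return which streams are recorded for a call started by this user."""
    perms = user_perms or domain_perms   # user permissions take precedence
    if perms is None:
        return system_record             # system-wide 'record' fallback
    if perms["recordingProfile"] == "skip":
        return "none"
    return perms["recordingType"]

domain = {"recordingType": "none", "recordingProfile": "skip"}
alice = {"recordingType": "all", "recordingProfile": "always"}

print(resolve_recording(alice, domain))  # all  -> Alice's calls are recorded
print(resolve_recording(None, domain))   # none -> other users' calls are not
```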

External SIP calls

In this case there is no precedence, as the only valid configuration parameter is recordIncomingPSTN. Once a call is set to be recorded, this information is populated in the “createRoom” message and the parameter “record” is set to true.

What is recorded

Once a call is set to be recorded, it is recorded from the beginning (“start time”) until the end (“end time”). Stopping the recording once it has started is not possible in the current version. The start time is computed when the callee answers the call. The end time is calculated as follows:

  • In one-to-one calls, where there are only two participants, the end time is when either of them hangs up.

  • In multiparty calls, the end time is when one of the last two remaining users leaves the call (the system then hangs up the remaining user automatically).
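These rules can be sketched as follows; recording_interval is an illustrative helper, not part of the platform.

```python
# Illustrative sketch of the start/end rules above; recording_interval is
# a hypothetical helper, not part of the platform.
def recording_interval(answer_time, leave_times):
    """start time = when the callee answers the call;
    end time = when the call is left with a single participant."""
    leaves = sorted(leave_times)
    # With two or more participants, the call effectively ends when the
    # second-to-last participant leaves: the system then hangs up the
    # remaining one automatically.
    end_time = leaves[-2] if len(leaves) >= 2 else leaves[-1]
    return answer_time, end_time

# One-to-one call: ends as soon as either side hangs up.
print(recording_interval("10:00:00", ["10:05:00", "10:05:01"]))
# ('10:00:00', '10:05:00')
```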

The recording includes the actual audio and video streams and metadata such as start time, end time and conference room ID.

Output format

Once the call is over, the system will process the audio and video streams and generate the recording files according to the service configuration. Processing may take more or less time depending on the configuration options explained below. Developers can check that the recordings are ready, as an event is generated and sent to the queue configured in the processedQueue parameter.

The system can provide the recording in two output formats:

  • Split: each video stream is provided separately

  • Merged: all video streams are provided mixed into a single matrix view

Please note that these output formats are mutually exclusive, which means that only one configuration can be active at a time. In addition, this is a system-wide configuration, which means that it applies to every recorded call on the platform.

The default output format, when none is configured by the administrator, is “merged”.

Split

This is the most straightforward option, where the video of each participant is provided in a separate video file. These files may have different sizes in multiparty calls, as not every participant may have spent the same amount of time in the conference. On the audio side, all the audio streams come mixed from the audiomixer, so they are provided in a single audio file. Additionally, a JSON file is provided to relate all these files together along with the conference metadata. In summary, in a multiparty call with N participants, the system will generate:

  • One WAV audio file, containing all the mixed audio streams

  • N video files (webm or mp4), each containing the video stream of one participant

  • One JSON file with conference metadata

The format of the JSON file is as follows; it contains the list of video files and the audio file, along with their corresponding start and end times.

{
  "uuid": "61408347",
  "startTime": "2020-05-25T07:42:28.633Z",
  "stopTime": "2020-05-25T07:42:43.826Z",
  "shouldRecordAudio": true,
  "shouldRecordVideo": true,
  "videos": [
    {
      "startTime": "2020-05-25T07:42:29.126Z",
      "stopTime": "2020-05-25T07:42:43.756Z",
      "filename": "/recording/postprocessed/61408347-c48eb251b24289c951345fa3c57f123e4ee2/rec-20200525-074229-5dde2b2fbd53a5c690.webm",
      "size": "0.123412412341MB",
      "userID": "5dde2b2fbd53a5c690"
    },
    {
      "startTime": "2020-05-25T07:42:36.126Z",
      "stopTime": "2020-05-25T07:42:43.756Z",
      "filename": "/recording/postprocessed/61408347-c48eb251b24289c951345fa3c57f123e4ee2/rec-20200525-074229-074229-5f2av4415c2e44f681.webm",
      "size": "0.323412412341MB",
      "userID": "5f2av4415c2e44f681"
    }
  ],
  "audio": {
    "filename": "/recording/postprocessed/roomId-confbridgeId/rec-20200525-074229-roomId-confbridgeID.wav",
    "startTime": "2020-05-25T07:42:28.633Z",
    "stopTime": "2020-05-25T07:42:43.826Z",
    "size": "0.23412412341MB"
  }
}
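As an illustration, the following minimal sketch reads such a metadata file and computes each participant's recorded video duration; video_durations is a hypothetical helper, not part of the platform.

```python
from datetime import datetime

# Illustrative sketch: compute each participant's recorded video duration
# from the split-mode metadata file shown above. video_durations is a
# hypothetical helper, not part of the platform.
def video_durations(metadata):
    """Map userID -> seconds of recorded video."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    out = {}
    for video in metadata["videos"]:
        start = datetime.strptime(video["startTime"], fmt)
        stop = datetime.strptime(video["stopTime"], fmt)
        out[video["userID"]] = (stop - start).total_seconds()
    return out

meta = {"videos": [{"startTime": "2020-05-25T07:42:29.126Z",
                    "stopTime": "2020-05-25T07:42:43.756Z",
                    "userID": "5dde2b2fbd53a5c690"}]}
print(video_durations(meta))  # {'5dde2b2fbd53a5c690': 14.63}
```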

The selection of this output format is done by setting split_recording_output to true in the recording service configuration.

Merged

When this output format is selected, the server will generate a single video file containing a matrix that shows all the participants on the screen, adding and removing them according to the time when they joined or left the conference. This output format requires post-processing of the raw recording files, so it might not be suitable if the recordings need to be available right after the call has finished.

The maximum number of participants that this matrix can show is 9 (a 3x3 matrix). The number of participants in the matrix has an upper bound set by the mix_limit parameter. If a conference has more participants than this value, or the value is higher than 9, then the output format will be split instead of merged (even if the split_recording_output parameter is set to false). Also note that, if split_recording_output is set to true, the output format will always be split.

The following video shows an actual recording of a call with four participants, where you can see how new participants are added to the matrix:

[Video: example of a recording with four participants in merged mode]


The selection of this output format is done by setting split_recording_output to false in the recording service configuration (this is the default configuration).

Recording quality

All the audio and video codecs supported for making calls are also available for recording. The quality of the recorded files can be configured to suit the recording requirements, also taking into account the resulting file size, which can have an impact on the storage requirements.

  • On the audio side, files are stored in WAV format (128 kb/s, 8 bits per sample). As an example, one minute of audio recording generates a 1 MB WAV audio file.

  • On the video side, an important point to take into account is that VP9 provides better quality when recorded and consumes less disk space, but its processing takes longer than VP8's. The recording quality can be configured by choosing between three quality levels: low, medium and high. As a rule of thumb, one minute of video processed with low quality takes 2 MB of disk space, while the same video with high quality takes 4 MB.
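A back-of-the-envelope estimate based on the figures above (about 1 MB per minute of audio, 2 MB per minute of low-quality video and 4 MB per minute of high-quality video); estimate_storage_mb is purely illustrative.

```python
# Back-of-the-envelope storage estimate based on the figures above:
# ~1 MB/min for WAV audio, ~2 MB/min for low-quality video, ~4 MB/min
# for high-quality video. estimate_storage_mb is a hypothetical helper.
RATES_MB_PER_MIN = {"audio": 1, "low": 2, "high": 4}

def estimate_storage_mb(minutes, video_quality="low"):
    """Rough disk usage for one recorded call (mixed audio + one video file)."""
    return minutes * (RATES_MB_PER_MIN["audio"] + RATES_MB_PER_MIN[video_quality])

print(estimate_storage_mb(60, "high"))  # 300 -> one-hour high-quality call
```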

The following images compare screenshots of the same conference call, one recorded with low quality (left) and the other with high quality (right), using the VP8 codec:

../_images/recording_quality_low_high_comparision.png

Screenshot of recording with low quality (left) and high quality (right), using VP8 codec (click to enlarge)

Note

Please note that the video size is fixed to 640x480 pixels, regardless of the video quality.

Encryption

Recordings can also be encrypted for better security and integrity using GPG encryption. In order to use GPG encryption, the administrator needs to provide a public GPG key. This can be done using the open source software available at GPG.

Decryption is done using the gpg command with the --decrypt option. The private key to which the message was encrypted is needed:

gpg --output video_unencrypted --decrypt video_encrypted.gpg

The encrypted recordings remain protected even if the servers are compromised and an attacker is able to download the recording files. The encryption takes place immediately after the call has finished. The private key must be protected in a secure device.

Service configuration

The configuration of this service is described in the upgrade notes.

Available logs

These are the main log messages generated by this service:

  • debug: File encryption is disabled (enabled)

  • info: FilenameGenerator configured with: {"filenameFormat":"rec-{{uuid}}-{{confbridgeid}}"}

  • debug: Recording started

  • debug: Received a process recording event with id: confbridgeID and recording info recording_info. This log is printed when a conference ends, an event is broadcast and the postprocessing starts.

  • debug: Processing video: /sippo-recording/raw_files/test/janus-test-userId1-sfuvideo-1615975667821-video.mjr. For each video in the conference, there will be a log indicating the preprocessing of the videos (conversion from .mjr to .webm or .mp4).

  • debug: Trying to encrypt file: /sippo-recording/postprocessed/rec-20200525-074229-test-test.webm

  • debug: Recording with ID 23423432 processed correctly.

Known limitations

Once a call has started, the recording configuration cannot be changed and recording cannot be activated or deactivated.

If a call is recorded, the complete call is recorded, from the first user joining until the last user leaving, even if a user with the skip recording profile is involved.

Implementation details

This service generates media recordings from the media captured by the SFU and the audiomixer. When a conference room is created with recording activated, two things happen:

  • the audiomixer saves all the mixed audio in a single file (by default, the audio file is stored in /var/spool/asterisk/monitor on the Asterisk machine with the name recording-${confbridgeId}.wav).

  • the SFU saves the video flows of all the participants who joined the conference and published video (from a webcam or by sharing the screen).

These files are persisted in .mjr format (a custom format where each file basically contains a structured dump of the RTP packets exactly as they arrived). When a conference with recording activated ends, the wrapper service emits an event to the message broker with all the conference information.

Example of such an event:

{
  "conferenceId": "conferenceId",
  "recordingInfo": {
    "uuid": "conferenceUuid",
    "completed": "2020-05-25T07:42:43.826Z",
    "video": {
      "video1": {
        "sfuvideo": {
          "1590392556275": {
            "filename": "/test/test-video-1.mjr",
            "stopTime": "2020-05-25T07:42:43.826Z",
            "startTime": "2020-05-25T07:42:36.277Z"
          }
        }
      },
      "video2": {
        "sfuvideo": {
          "1590392549123": {
            "filename": "/test/test-video-2.mjr",
            "stopTime": "2020-05-25T07:42:43.756Z",
            "startTime": "2020-05-25T07:42:29.126Z"
          }
        }
      }
    },
    "audio": {
      "asterisk": {
        "sipcall": {
          "1590392548633": {
            "filename": "audio.wav",
            "stopTime": "2020-05-25T07:42:43.826Z",
            "startTime": "2020-05-25T07:42:28.633Z"
          }
        }
      }
    }
  }
}
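For illustration, the following minimal sketch walks such an event and collects every .mjr video file that the preprocessing step needs to convert; mjr_files is a hypothetical helper, not platform code.

```python
# Illustrative sketch: collect every .mjr video file referenced by such an
# event, i.e. the files the preprocessing step must convert. mjr_files is
# a hypothetical helper, not platform code.
def mjr_files(event):
    files = []
    for participant in event["recordingInfo"].get("video", {}).values():
        for stream in participant.values():        # e.g. "sfuvideo"
            for recording in stream.values():      # keyed by timestamp
                files.append(recording["filename"])
    return sorted(files)

event = {"recordingInfo": {"video": {
    "video1": {"sfuvideo": {"1590392556275": {"filename": "/test/test-video-1.mjr"}}},
    "video2": {"sfuvideo": {"1590392549123": {"filename": "/test/test-video-2.mjr"}}}}}}
print(mjr_files(event))  # ['/test/test-video-1.mjr', '/test/test-video-2.mjr']
```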

When the recording service receives this event, it starts processing the recordings. This process is divided into three steps:

Preprocessing

This step is in charge of preprocessing all the .mjr files and converting them to the corresponding media files. To do this, the service uses a binary provided by Janus, called janus-pp-rec. Janus actually only dumps the RTP frames it receives to a file in a structured way, so that they can be post-processed later on to extract playable media files. This utility processes those files in order to get a working media file that you can play with an external player. The tool generates a .webm file if the recording includes VP8 frames, an .opus file if it includes Opus frames, an .mp4 file if it includes H.264 frames, and a .wav file if it includes G.711 (mu-law or a-law) frames.

After this step, all .mjr files will have been converted to .webm or .mp4 files (in our case, a conference can use VP8 or H.264 as video codec).

Processing Data

This step is in charge of processing all the data received in the event emitted by the wrapper and generating all the information needed in the postprocessing step to create the final file. It generates a list of intervals, where each interval ends when the display layout changes (a participant entering or leaving the conference produces a change in the layout).
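A simplified sketch of this interval computation, assuming join/leave events sorted by time; layout_intervals is illustrative, not the actual implementation.

```python
# Simplified sketch of the interval computation: the display layout changes
# whenever a participant joins or leaves, so each interval spans the time
# between two consecutive layout changes while someone is on screen.
# layout_intervals is illustrative, not the actual implementation.
def layout_intervals(events):
    """events: list of (time, user, 'join'|'leave') tuples sorted by time.
    Returns a list of (start, end, active_users) tuples."""
    intervals, active, prev = [], set(), None
    for time, user, kind in events:
        if prev is not None and active and time > prev:
            intervals.append((prev, time, frozenset(active)))
        if kind == "join":
            active.add(user)
        else:
            active.discard(user)
        prev = time
    return intervals

events = [(0, "a", "join"), (5, "b", "join"), (8, "a", "leave"), (10, "b", "leave")]
for interval in layout_intervals(events):
    print(interval)
```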

Postprocessing

This step receives the intervals calculated in the previous step and:

  • Creates a partial video for each interval received.

  • Once all partial videos are created, concatenates them into a single video file.

  • Mixes the concatenated video with the audio, generating the final file.

  • Moves the final file to the destination folder.

  • If clean_temporary_files is activated, deletes all temporary files related to the processing.

  • Emits an event to RabbitMQ indicating the processed conference.

  • Acks to RabbitMQ to remove this task from the queue.

Things to keep in mind

Some things to keep in mind:

  • If the recording service fails in any of the previous steps, it emits an event to RabbitMQ (in the failure_queue) indicating the error and the conference that failed. In this case, even if clean_temporary_files is active, the temporary files will not be deleted, in order not to lose them without having a processed final file.

  • If the conference has no audio (or some problem occurred while Asterisk was saving the audio), a final file will be generated with all the concatenated videos but without audio.

  • The processed file will have this format: recording-${confbridgeId}-${roomUuid}.webm

  • For the recording to work correctly, the service needs access to the media files persisted by the audiomixer and by the SFU (this can be done by creating shared volumes between containers, using an NFS network, etc.).