[ 2014 ] [ Last update: December 14, 2018 ]

by Yann Ics



Cite this article:
Yann Ics, Documentation of the executable script enkode, 2014
Last update: December 14, 2018
Date retrieved: 15/12/18
Permanent link: http://experimental.mus-ics.net/wiki/doku.php?id=article_1

Some modifications to the initial proposal have led to a restructuring of the original text.
The aim was to make the script and its documentation as clear as possible, so that enkode is easier to use.
Accordingly, some procedures have simply been removed, and others have been rationalized.


enkode is an executable bash script that can be run from the command line under Unix or Linux. It analyzes a signal, given as a sound file, according to the modalities described here. The script was designed more specifically for percussion music of indeterminate pitch. These modalities concern analytic discrimination in terms of duration (sample segmentation), dynamics, and timbre (defined by the relative pitch, the salience of bass frequencies, and f0), so that each segment is analyzed as an event described by an array of values. enkode then writes these data to stdout, for structural or formal analysis and as parameters for sound synthesis.

This is an experimental procedure intended to illustrate one way of extracting relevant characteristics from a sonic segment.

enkode relies on two main programs: PRAAT 1), dedicated to the analysis of sound files, and the Lisp compiler SBCL 2), used for data processing. The shell script enkode acts as the intermediary between them.

Event segmentation

Dynamic profile

First, we use PRAAT, which has the particularity that each analysis can be driven by a simple script. This script first computes a cochleagram, an analysis based on the perception of sound by the ear. This makes it possible to take into account the encoding performed by the inner ear for transmission to the brain. The result is expressed on a Bark scale 3) divided into 24 critical bands of hearing, with the perceived sound level expressed in phons. The phon is a weighted expression of the sound level according to an equal-loudness curve (see figure 1), which reflects the sensitivity of the human auditory system. The equal-loudness curves were established empirically in 1933 by Fletcher and Munson, and revised in 1956 by Robinson and Dadson.

Figure 1 – Equal loudness curves.
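To make the Bark scale more concrete, here is a minimal Python sketch of a hertz-to-Bark conversion. It assumes the formula 7·ln(f/650 + √(1 + (f/650)²)), which, to our knowledge, is the one documented for PRAAT's hertzToBark function; the name hz_to_bark is ours, and enkode itself leaves this computation to PRAAT.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in hertz to Bark.

    Assumed formula: 7 * ln(f/650 + sqrt(1 + (f/650)**2)),
    which is exactly 7 * asinh(f / 650).
    """
    return 7.0 * math.asinh(f_hz / 650.0)


if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0, 10000.0):
        print(f"{f:8.1f} Hz -> {hz_to_bark(f):5.2f} Bark")
```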

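To complete this representation of our sound perception, the phons of the PRAAT analysis are converted into sones 4) as follows:

$$ N = 2^{(L_N - 40)/10} $$

where L_N is the loudness level in phons and N the loudness in sones.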

Thus, by adding up, for each frame, the loudness of all its bands (the frame's spectrum, which constitutes the timbre profile), we get a relevant dynamic profile.

Figure 2 – Dynamic profile of a sample generated by PRAAT.
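The phon-to-sone conversion and the per-frame summation can be sketched in Python as follows. This is an illustration under assumptions, not enkode's code: the cochleagram is represented as a hypothetical list of frames, each holding the excitation in phons of every band, and the names phon_to_sone and dynamic_profile are ours. Following the text, the band loudnesses of a frame are simply added.

```python
from typing import List, Sequence


def phon_to_sone(level_phon: float) -> float:
    """Loudness in sones: 40 phon -> 1 sone, each +10 phon doubles it."""
    return 2.0 ** ((level_phon - 40.0) / 10.0)


def dynamic_profile(cochleagram: Sequence[Sequence[float]]) -> List[float]:
    """One loudness value per frame: the sum, in sones, of the loudness
    of all the bands of that frame (its timbre profile)."""
    return [sum(phon_to_sone(level) for level in frame) for frame in cochleagram]


if __name__ == "__main__":
    # Two frames of two bands each, levels in phons (made-up values).
    print(dynamic_profile([[40.0, 50.0], [60.0, 30.0]]))  # [3.0, 4.5]
```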

To output the profile data, add the option +profile.
The time step of the PRAAT analysis can be set with the option -t, --time-step (default value: 0.01 seconds).

TextGrid

The TextGrid is a connected sequence of labeled intervals, with boundaries in between. Here, each interval corresponds to an event.

From the profile of figure 2, we segment the sample to discriminate each event it contains, in order to write an appropriate TextGrid file for further analysis in PRAAT. To do so, we select each peak and each valley of the profile and apply the following rules:

  • if the sample starts on a peak (for example, because of an inappropriate cut), this peak is deleted;
  • if a peak's level is lower than a minimal loudness, the peak is deleted together with the valley that precedes it;
  • if the loudness difference between a peak and the adjacent valley (or between a valley and the adjacent peak) is below a threshold loudness, both are removed.

By default, both the minimal loudness and the threshold loudness are estimated as the mean of the absolute differences between peaks and valleys. They can also be set independently, in sones: the option --loudness-min-threshold sets the minimal loudness required to validate an event, and the option --loudness-diff-threshold sets the loudness difference between two events required to validate their discrimination.

To get the TextGrid file, just add the option +textgrid.
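The segmentation rules above can be sketched in Python as follows. This is our reading of the rules, not enkode's implementation: the profile is a plain list of loudness values (in sones) sampled every time_step seconds, a weak peak is removed together with its preceding valley, and the third rule is applied greedily, always removing first the pair of neighbouring extrema with the smallest loudness difference. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Extremum = Tuple[int, str]  # (frame index, "peak" or "valley")


def find_extrema(profile: Sequence[float]) -> List[Extremum]:
    """Alternating valleys and peaks, always starting with a valley.

    If the profile starts by falling (the sample starts on a peak), that
    first peak is never recorded, which implements the first rule.
    """
    extrema: List[Extremum] = []
    direction = 0  # +1 rising, -1 falling, 0 not known yet
    for i in range(1, len(profile)):
        step = profile[i] - profile[i - 1]
        if step == 0:
            continue  # plateau: keep the current direction
        new_direction = 1 if step > 0 else -1
        if direction == 0 and new_direction == 1:
            extrema.append((0, "valley"))  # the sample starts in a valley
        elif direction == 1 and new_direction == -1:
            extrema.append((i - 1, "peak"))
        elif direction == -1 and new_direction == 1:
            extrema.append((i - 1, "valley"))
        direction = new_direction
    return extrema


def segment(profile: Sequence[float], time_step: float,
            min_loudness: Optional[float] = None,
            diff_threshold: Optional[float] = None) -> List[float]:
    """Return the times (s) of the valleys that start each event."""
    extrema = find_extrema(profile)
    gaps = [abs(profile[a] - profile[b]) for (a, _), (b, _) in zip(extrema, extrema[1:])]
    default = sum(gaps) / len(gaps) if gaps else 0.0  # mean |peak - valley|
    if min_loudness is None:
        min_loudness = default
    if diff_threshold is None:
        diff_threshold = default

    # Second rule: a peak below the minimal loudness goes, with its preceding valley.
    kept: List[Extremum] = []
    for index, kind in extrema:
        if kind == "peak" and profile[index] < min_loudness:
            if kept and kept[-1][1] == "valley":
                kept.pop()
            continue
        kept.append((index, kind))

    # Third rule: drop neighbouring extrema whose loudness difference is too small.
    while len(kept) >= 2:
        diffs = [abs(profile[kept[j][0]] - profile[kept[j + 1][0]])
                 for j in range(len(kept) - 1)]
        j = min(range(len(diffs)), key=diffs.__getitem__)
        if diffs[j] >= diff_threshold:
            break
        del kept[j:j + 2]

    return [index * time_step for index, kind in kept if kind == "valley"]


if __name__ == "__main__":
    profile = [0, 4, 1, 5, 4.8, 5, 0.5, 3, 0]  # made-up profile, in sones
    print(segment(profile, 0.01, min_loudness=1.0, diff_threshold=1.0))  # [0.0, 0.02, 0.06]
```

With these explicit thresholds, the small dip between the two peaks at 5 sones is merged into a single event; with the default thresholds (both equal to the mean difference), the last, weaker event is merged as well.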

Extracting the required values

Given the sample and its associated TextGrid, a PRAAT script computes the following values for each event: duration, loudness, centroid, bass loudness, and f0.

Duration

The duration of each event is estimated as the Δt between two consecutive valleys of the profile described above. The duration of the last event is measured from the last valley to the end of the sample.
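This duration rule amounts to taking differences between consecutive boundaries, with the end of the sample as the final boundary. A minimal Python sketch, assuming the valley times are given as a list in seconds (for instance the boundaries of the TextGrid intervals); the name event_durations is ours.

```python
from typing import List, Sequence


def event_durations(valley_times: Sequence[float], sample_duration: float) -> List[float]:
    """Duration of each event in seconds.

    Each event lasts from its valley to the next one; the last event lasts
    from the last valley to the end of the sample.
    """
    bounds = list(valley_times) + [sample_duration]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(event_durations([0.0, 0.17, 0.35], 0.5))
```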

Loudness

The loudness of a sound event is expressed in sones. For each event, it is defined as:

$$ N = \int 2^{(e(f) - 40)/10} \, df $$

with f the frequency in Bark and e(f) the excitation in phons.
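On a sampled excitation pattern, the integral becomes a sum. The Python sketch below assumes e(f) is sampled on a regular Bark grid; the default step of 0.1 Bark and the name event_loudness are our assumptions, not values taken from enkode.

```python
from typing import Sequence


def event_loudness(excitation_phon: Sequence[float], bark_step: float = 0.1) -> float:
    """Loudness N (sones) of one event from its excitation pattern.

    excitation_phon holds e(f) sampled every bark_step Bark; the integral
    of 2 ** ((e(f) - 40) / 10) over f is approximated by a Riemann sum.
    """
    return bark_step * sum(2.0 ** ((e - 40.0) / 10.0) for e in excitation_phon)


if __name__ == "__main__":
    # 1 Bark of flat excitation at 40 phon amounts to 1 sone.
    print(event_loudness([40.0] * 10, bark_step=0.1))
```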

Centroid

The relative pitch of a given sound event, in the present case an inharmonic sound, is estimated by the centroid of its timbre profile (that is, its spectrum), expressed in hertz.
The spectral centroid represents the frequency center of gravity of a signal: the average of f over the entire frequency domain, weighted by the power spectrum. The centroid is defined as follows:

$$ f_c = \frac{\int_0^{\infty} f \, P(f) \, df}{\int_0^{\infty} P(f) \, df} $$

with P(f) the power spectrum.
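For a discrete spectrum, the integrals become sums over the frequency bins. A minimal Python sketch, assuming the spectrum is given as parallel lists of bin frequencies and powers; in enkode this value is computed by PRAAT, and spectral_centroid is a hypothetical name.

```python
from typing import Sequence


def spectral_centroid(freqs_hz: Sequence[float], power: Sequence[float]) -> float:
    """Discrete centroid: sum(f * P(f)) / sum(P(f)) over the spectrum bins."""
    total = sum(power)
    if total <= 0:
        raise ValueError("the power spectrum must have a positive sum")
    return sum(f * p for f, p in zip(freqs_hz, power)) / total


if __name__ == "__main__":
    print(spectral_centroid([100.0, 200.0, 300.0], [1.0, 2.0, 1.0]))  # 200.0
```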

Bass loudness

To isolate the salience (presence) of bass frequencies, we apply a low-pass filter and then compute the loudness in the same way as above.

Figure 3 – Low-pass filter, showing the width (fixed at ±50 Hz) of the transition region between pass band and stop band around the cutoff frequency.

The cutoff frequency can be set with the option --cutoff-frequency (default value: 100 Hz).
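The exact filter shape is not specified beyond figure 3. As an assumption, the Python sketch below models the low-pass filter as a frequency-domain gain that is 1 below cutoff − 50 Hz, 0 above cutoff + 50 Hz, and follows a raised-cosine (Hann-shaped) curve in between; the bass loudness would then be computed from the filtered spectrum like any other loudness. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def lowpass_gain(f_hz: float, cutoff_hz: float = 100.0, half_width_hz: float = 50.0) -> float:
    """Gain of the low-pass filter at f_hz (assumed raised-cosine transition)."""
    low, high = cutoff_hz - half_width_hz, cutoff_hz + half_width_hz
    if f_hz <= low:
        return 1.0  # pass band
    if f_hz >= high:
        return 0.0  # stop band
    x = (f_hz - low) / (high - low)  # position in the transition, 0 to 1
    return 0.5 * (1.0 + math.cos(math.pi * x))


def lowpass_spectrum(freqs_hz: Sequence[float], amplitudes: Sequence[float],
                     cutoff_hz: float = 100.0) -> List[float]:
    """Apply the gain bin by bin to an amplitude spectrum."""
    return [a * lowpass_gain(f, cutoff_hz) for f, a in zip(freqs_hz, amplitudes)]


if __name__ == "__main__":
    print(lowpass_spectrum([25.0, 75.0, 100.0, 125.0, 200.0], [1.0] * 5))
```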

f0

From each segment, we derive the timbre profile by smoothing its spectrum with the cepstrum method.

The signal can be viewed as a superposition of a short wave vibration with the “period” of F0 and long wave vibrations representing the course of the transmission function.
Then, smoothing means low-pass filtering the signal in such a way that the present short-wave vibration with the “period” of F0 is removed and the envelope remains. If F0 is known, a band-stop filter can be used instead of the low-pass, i.e., a filter that blocks only the undesired oscillations and their immediate vicinity, but lets everything else pass. 5)

The F0 value used for this smoothing in PRAAT is set to 500 Hz by default and can be changed with the option --smooth-frequency.
Only the first peak (or partial) of the smoothed profile is then retained as f0.
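The smoothing and the choice of f0 can be sketched in Python as follows. This is an illustration, not PRAAT's algorithm: the log-magnitude spectrum is treated as a signal along the frequency axis and transformed with a naive DFT into its cepstrum; every component that oscillates faster than once per bandwidth_hz (500 Hz by default, mirroring the default above) is zeroed with a rectangular lifter; and the result is transformed back. The first local maximum of the smoothed profile is then taken as f0. The rectangular lifter and all names are our assumptions.

```python
import cmath
from typing import List, Optional, Sequence


def _dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, enough for a sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def cepstral_smooth(log_spectrum: Sequence[float], bin_hz: float,
                    bandwidth_hz: float = 500.0) -> List[float]:
    """Keep only the slow variations of a log-magnitude spectrum.

    A component making k cycles over the n * bin_hz Hz of the spectrum has a
    "period" of n * bin_hz / k Hz; it is kept only if that period is at
    least bandwidth_hz.
    """
    n = len(log_spectrum)
    k_max = int(n * bin_hz // bandwidth_hz)
    cepstrum = _dft(log_spectrum)
    for k in range(n):
        if min(k, n - k) > k_max:  # fast ripple, e.g. due to the harmonics of F0
            cepstrum[k] = 0j
    return [v.real for v in _dft(cepstrum, inverse=True)]


def first_peak_frequency(freqs_hz: Sequence[float], profile: Sequence[float]) -> Optional[float]:
    """Frequency of the first local maximum of the profile, or None."""
    for i in range(1, len(profile) - 1):
        if profile[i - 1] < profile[i] >= profile[i + 1]:
            return freqs_hz[i]
    return None


if __name__ == "__main__":
    bin_hz = 25.0
    freqs = [i * bin_hz for i in range(64)]
    # Made-up log spectrum: one broad hump plus a fast ripple.
    log_spec = [-abs(f - 400.0) / 100.0 + 0.5 * (-1) ** i for i, f in enumerate(freqs)]
    print(first_peak_frequency(freqs, cepstral_smooth(log_spec, bin_hz)))
```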

For details on the algorithms and parameters used in PRAAT, see the PRAAT manual: http://www.fon.hum.uva.nl/praat/manual/Intro.html

Discrimination into classes

The values produced by the previous analysis are intended to be discriminated into n classes in order to get a numerical score, mainly for structural analysis and as synthesis parameters.
This is done by a recursive discrimination based on the mean of the values.

With the option -I, --as-int, the result is a list of positive integers with one event per line and, in columns, the class numbers of duration, loudness, centroid, bass loudness, and f0 respectively. The result can also be converted to a thrifty code 6) with the option -T, --as-tc, or to a Gray code with the option -G, --as-gc.
Each of these options takes as argument either a single positive number, used as the coefficient of recursivity for all parameters, or a list of 5 positive numbers, that is, one coefficient of recursivity for each event value.
Each class number refers to the value of an attractor. The ordered list of attractors of each parameter is written on the first 5 lines of the info file.
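The two binary conversions can be sketched in Python under assumptions: the Gray code is taken as the standard reflected binary code of the class number k, computed as k XOR (k >> 1) on a fixed number of bits, and the thrifty code as a one-hot word with the 1 at position k counted from the left, an orientation we chose because footnote 6 does not state it. With 3 bits, the classes 5, 1, 1, 3 and 6 are written 1 1 1 0 0 1 0 0 1 0 1 0 1 0 1, which is the first line of the --as-gc=3 example in the "Using enkode" section; these are the classes we obtain for the first event of that example by taking the nearest attractor in its info file, but this is our reading, not enkode's code. The function names are hypothetical.

```python
from typing import List, Sequence


def gray_bits(class_number: int, width: int) -> List[int]:
    """Reflected binary Gray code of a class number, most significant bit first."""
    gray = class_number ^ (class_number >> 1)
    return [(gray >> i) & 1 for i in reversed(range(width))]


def thrifty_bits(class_number: int, n_classes: int) -> List[int]:
    """One-hot ("thrifty") word: a single 1 among n_classes digits.

    Putting class 1 in the leftmost position is an assumption.
    """
    return [1 if i == class_number - 1 else 0 for i in range(n_classes)]


def encode_event(classes: Sequence[int], width: int) -> List[int]:
    """Concatenate the Gray codes of the 5 class numbers of one event."""
    return [bit for c in classes for bit in gray_bits(c, width)]


if __name__ == "__main__":
    print(*encode_event([5, 1, 1, 3, 6], width=3))  # 1 1 1 0 0 1 0 0 1 0 1 0 1 0 1
```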

This discrimination method determines a number of classes, each associated with an attractor. The attractors are computed as arithmetic means of the data set and of its successive subsets, and a coefficient of recursivity determines the number of attractors. When the argument n is an integer, the number of attractors is 2^n − 1; when n is a non-integer (float) number, the number of attractors is 2^⌊n⌋.
This algorithm is described in figures 4 to 7.

Figure 4 – [ coefficient of recursivity: 1 ] In this example, the data are a cluster of bass loudness values. The attractor A1 is the arithmetic mean of all points, which splits them into two subsets: E1L and E1H.

Figure 5 – [ coefficient of recursivity: 2 ] From the two subsets of figure 4 (E1L and E1H), we determine in the same way two new attractors (A2 and A3 respectively), each of which generates two subsets: E2L, E2H, E3L, and E3H.

Figure 6 – [ coefficient of recursivity: 3 ] Repeating the process described above (see figures 4 and 5) yields 4 new attractors (A4, A5, A6, and A7), calculated from the subsets of figure 5. This gives a total of 7 attractors (or classes).

Figure 7 – When the argument is a non-integer number, for example 2.5, the number of attractors equals the number of attractors for a coefficient of recursivity of 3 (the ceiling of 2.5, see figure 6) minus the number of attractors for a coefficient of recursivity of 2 (the floor of 2.5, see figure 5), that is, 7 − 3 = 4. This gives the following 4 attractors: A4, A5, A6, and A7.

Note that the recursion stops when all data are captured by an attractor, so the actual number of classes can be smaller than the number implied by the argument.
Also keep in mind that the number of classes grows exponentially with the argument, so choose this number (or these numbers) carefully.
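The recursive discrimination can be sketched in Python as follows. Several points are our assumptions rather than documented behaviour: a value equal to an attractor goes to the low subset, a subset stops splitting once all its values are identical (our reading of "captured by an attractor"), and each value receives the class of its nearest attractor. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def attractors(values: Sequence[float], n: float) -> List[float]:
    """Sorted attractors for a coefficient of recursivity n.

    Integer n: all levels 1..n (up to 2**n - 1 attractors).
    Non-integer n: only level ceil(n) (up to 2**floor(n) attractors).
    """
    depth = math.ceil(n)
    first_level = 1 if n == int(n) else depth
    found: List[float] = []
    subsets = [list(values)]
    for level in range(1, depth + 1):
        next_subsets = []
        for subset in subsets:
            mean = sum(subset) / len(subset)  # the attractor of this subset
            if level >= first_level:
                found.append(mean)
            if len(set(subset)) > 1:  # otherwise the subset is captured
                next_subsets.append([x for x in subset if x <= mean])  # low part
                next_subsets.append([x for x in subset if x > mean])   # high part
        subsets = next_subsets
    return sorted(set(found))


def classify(values: Sequence[float], attrs: Sequence[float]) -> List[int]:
    """Class number (1-based) of the nearest attractor for each value."""
    return [1 + min(range(len(attrs)), key=lambda i: abs(v - attrs[i])) for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12]  # made-up values
    for n in (1, 2, 2.5, 3, 4):
        print(n, attractors(data, n))
    print(classify(data, attractors(data, 2)))  # [1, 1, 1, 3, 3, 3]
```

With these six values, n = 4 yields only 9 attractors instead of 15, because several subsets are captured early, which illustrates the note on the recursion stopping.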

Command line use

Install enkode

  • Create a personal bin directory (for example: /Users/.../bin)
  • Put enkode in this folder.
  • Add the following line to the file ~/.profile:
    export PATH=/Users/.../bin:$PATH

To install the man page:

  • Create a personal man directory (for example: /Users/.../man) and do:
    $ cd /Users/.../man
    $ mkdir man1
  • Then put enkode.1 in the folder /Users/.../man/man1
  • Add the following line to the file /etc/man.conf:
    MANPATH /Users/.../man

Using enkode

  • Default behavior:
    $ enkode test.wav 
    0.17                 1.6176831311550595   134.9666063915343    1.6176863819702374   2449.40185546875    
    0.17999999999999997  2.369882360726847    242.79367466776372   1.6336070600730854   226.0986328125      
    0.10999998           2.682344554691973    165.7555933370044    1.630174271225802    193.798828125       
    0.16000002000000002  2.409505328983807    490.23522589942763   1.621555159203991    506.0302734375      
    0.12                 2.8189283150414046   376.76391524755496   1.6099590539158921   360.68115234375     
    0.15000000000000002  2.9952563588370276   208.1729851715329    1.61750299823566     199.18212890625     
    0.15000000000000002  2.706413086859603    200.54168093278764   1.6137858225795005   209.94873046875     
    0.26                 2.6257251443919585   335.7016569817828    1.618495489201809    282.623291015625    
    0.2999999            2.4622193818243      560.9312325024231    1.617397509720567    360.68115234375     
    0.10999999999999988  2.924126996894437    418.98597422663266   1.6160059775148008   457.58056640625     
    0.1700001            3.1781678297237383   294.2101641423233    1.6194272625364172   231.48193359375     
    0.16999999999999993  3.2936782771572073   285.3289201281076    1.6197114664876744   258.3984375 
  • Using some options:
    $ enkode --as-gc=3 +textgrid test.wav
    1 1 1 0 0 1 0 0 1 0 1 0 1 0 1  
    1 1 1 0 0 1 0 1 1 1 1 1 0 0 1  
    0 1 1 0 1 0 0 0 1 1 1 1 0 0 1  
    1 1 1 0 0 1 1 0 1 0 1 0 1 1 1  
    0 1 0 0 1 0 1 1 1 0 0 1 1 1 0  
    1 1 0 1 1 0 0 0 1 0 1 0 0 0 1  
    1 1 0 0 1 0 0 0 1 0 1 1 0 0 1  
    1 0 1 0 1 1 1 1 0 0 1 0 0 1 1  
    1 0 1 0 1 1 1 0 0 0 1 0 1 1 0  
    0 1 1 1 1 0 1 1 1 0 1 1 1 1 1  
    1 1 1 1 1 1 0 1 0 0 1 0 0 0 1  
    1 1 1 1 0 1 0 1 1 0 1 0 0 1 1      
    $ cd ~/Documents/enkode/test/
    $ ls
    $ head -5 info
    0.08555549 0.10299997 0.11727274 0.1457143 0.16400005 0.27999997  
    2.2972732 2.5919032 2.7392185 2.9693668 3.138432 3.3690338 3.698465  
    201.36325 256.70795 318.20212 346.7401 385.8473 453.65323 540.83215  
    1.6098472 1.6150945 1.6182427 1.6255808 1.6327201 1.690434  
    221.12943 267.19806 313.2667 370.75565 447.48688 2449.4019 
  • Error behavior:
    $ enkode -I '(2 3.5 6)' test.wav
    ... error during process, check error in ~/Documents/enkode/error.log ...


November 23, 2018: enkode 5.0 released.

New features and improvements
• Removed options:
-n, --as-numbers
-D, --duration, --pts
-d, --dynamic
-p, --relative-pitch
-b, --brightness, --angle
-B, --bass
+info, +duration, +loudness, +brightness, +centroid, +f0, +bass
-s, --score, --rep, --midi, --merge
-M, --M2T, --ram-threshold, --max-timeout
-o, --out
• New options, taking as argument one discrimination number (or several, as a list):
-I --as-int as class number
-T --as-tc class number converted to thrifty code
-G --as-gc class number converted to Gray code
• Added option: -S, --spectrum outputs the spectrum analysis, with durations, to stdout (see the article Melody to Tone).
• By default, the options --loudness-min-threshold and --loudness-diff-threshold are estimated as the mean value of the differences Δx between peaks and valleys.
  • Command line version 5.0 enkode


1) PRAAT is free software for the analysis of speech in phonetics. It was designed, and continues to be developed, by Paul Boersma and David Weenink of the University of Amsterdam. It can be downloaded from: http://www.fon.hum.uva.nl/praat/.
2) SBCL stands for Steel Bank Common Lisp. It is free software and a mostly conforming implementation of the ANSI Common Lisp standard. It can be downloaded from: http://www.sbcl.org/.
3) The Bark scale is a psychoacoustical scale proposed by Eberhard Zwicker in 1961.
4) With this unit, a sound perceived as twice as loud has twice the loudness value.
5) Source of definitions cited: Smoothing Spectra by the Cepstrum Method.
6) The thrifty code writes each value as a single 1 bit among n binary digits, which requires a discrimination into card(An) = n classes. For instance, for n = 7 digits, this translates into the following matrix:
article_1.txt · Last modified: 2018/12/13 19:51 (external edit)