
[ 2014 ] [ Last update June 20, 2018 ]

by Yann Ics

by.cmsc@gmail.com

http://experimental.mus-ics.net

Cite this article:
Yann Ics, Documentation of the executable script enkode, 2014
Last update: June 20, 2018
Date retrieved: 17/07/18
Permanent link: http://experimental.mus-ics.net/wiki/doku.php?id=article_1

Make sure you have the latest version of enkode.
See the Download section or subscribe to the RSS feed.


Presentation

enkode is an executable bash script that can be used from the command line under Unix. It encodes a sound file according to the modalities described here. The script was designed for percussion music of undetermined pitch, so these modalities concern analytic discrimination in terms of duration, dynamics and timbre (defined by relative pitch, brightness and salience of bass frequencies). enkode writes its data to stdout, which makes the result easy to manage from the command line. The data are expressed either as thrifty code or as numbers, for analysis and sound synthesis.

enkode requires two programs. The first is PRAAT1), which analyses the sound files; the second is the Lisp compiler SBCL2), which processes the data. The shell script enkode mediates between them.


Encoding

For convenience, the sound file is called the sample.

Analytical discrimination3) is encoded as thrifty code4). This consists in writing one piece of information by setting one bit among n digits, which gives a discrimination of the order of card(An) = n.

This translates to a value of n = 7 digits by the following matrix:
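Thrifty code is a one-of-n code: class k is written as n digits with a single 1 in position k, so the n = 7 classes presumably form the rows of a 7 × 7 identity matrix. The following Python sketch illustrates this under that assumption; thrifty_code is a hypothetical name, modelled on the Lisp function >thrifty-code listed later in this article.

```python
def thrifty_code(k, n):
    """Return n digits with a single 1 at position k (counted from 1).

    k = 0, or any k outside 1..n, yields all zeros, as >thrifty-code does.
    """
    return [1 if i == k else 0 for i in range(1, n + 1)]


# One row per class for n = 7 digits.
matrix = [thrifty_code(k, 7) for k in range(1, 8)]
for row in matrix:
    print(" ".join(str(d) for d in row))
```

Each printed row is the code of one class; the third class, for instance, is written 0 0 1 0 0 0 0.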

The number of digits is set with the following options:

  • -D, --duration: default value equal to 31;
  • -d, --dynamic: default value equal to 11;
  • -p, --relative-pitch: default value equal to 6.

And also (see the subsection Number or list for more about the discrimination of the following two options):

  • -b, --brightness: default value equal to 4;
  • -B, --bass: default value equal to 7.

Thrifty code is the default output; it can be changed with the option -n, --as-number.

Event segmentation

Dynamic profile

As a preliminary step, we use the analysis provided by PRAAT, which has the particularity of running each analysis from a simple script. This script first generates a cochleagram, an analysis based on the perception of sound by the ear, which lets us consider the same encoding as the one transmitted from the inner ear to the brain. The result is expressed on a Bark scale5) divided into 24 critical bands of hearing, with the perceived sound level expressed in phons. The phon is a weighted expression of the sound level according to the equal-loudness curves (see figure 1), which reflect the sensitivity of the human auditory system. The equal-loudness curves were established empirically in 1933 by Fletcher and Munson, and revised in 1956 by Robinson and Dadson.

Figure 1 – Equal loudness curves.

To complete this representation of our sound perception, the phon values from the PRAAT analysis are converted into sones6).
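The usual relation between a loudness level P in phons and a loudness S in sones, stated here as the standard definition and assumed to be the one intended (the symbols P and S are chosen for illustration), is:

```latex
S = 2^{\,(P - 40)/10}
```

A level of 40 phons thus corresponds to 1 sone, and each additional 10 phons doubles the loudness.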

Thus, by adding the loudness of each frame (designating the timbre profile) we get a relevant dynamic profile.
To get the profile data, just add the option +profile.

Figure 2 – Dynamic profile of a sample generated by PRAAT.


This is done by the following praat script.

form Get loudness profil
	sentence File ...
        positive time_step ...
endform

	Read from file... 'file$'
	current_sound$ = selected$ ("Sound")
             td = Get total duration
             writeFileLine: "total-duration", 'td'
	select Sound 'current_sound$'
	
# Cochleagram ... 
# arg1: time_step (seconds) 
# arg2: frequency_resolution (Bark) 
# arg3: window_length (seconds) 
# arg4: forward-masking_time (seconds)
	do ("To Cochleagram...", 'time_step', 0.1, 0.03, 0.03)
	do ("To Matrix")

ncolumn = do("Get number of columns")

# delete the file that the loop below appends to
filedelete 'defaultDirectory$'/profile
	for a from 1 to ncolumn
		selectObject: "Cochleagram 'current_sound$'"
		To Excitation (slice): a*'time_step'
                dat = do("Get loudness")
		fileappend 'defaultDirectory$'/profile 'dat' 'newline$'
		selectObject: "Excitation 'current_sound$'"
                Remove
	endfor

select all
Remove

The time step needed for the PRAAT analysis can be set with the option -t, --time-step (default value is 0.01 second).

TextGrid

In PRAAT, a TextGrid is used for annotation (segmentation and labelling): it labels characteristic intervals of a sequence, with boundaries in between. Here, each interval corresponds to an event.

Starting from the profile in figure 2, the sample is segmented to discriminate each event, so that an appropriate TextGrid file can be written for further analysis in PRAAT. To do so, every peak and every valley of the profile is selected, and the following conditions are applied:

  • if the sample starts on a peak (for example because of an inappropriate « cut »), that peak is deleted;
  • if a peak's level is lower than a given minimal loudness, it is deleted together with the valley that precedes it;
  • if the difference between a consecutive peak and valley, or valley and peak, is below a given loudness threshold, both are removed.

This is done with the following Lisp functions.

;; read loudness profile file
 
(defun read-file (file)
  (mapcar #'read-from-string
	  (with-open-file (in-stream file
				     :direction :input
				     :element-type 'character)
	    (loop with length = (file-length in-stream)
	       while (< (file-position in-stream) length)
	       collect (read-line in-stream)))))
 
;; format data to a valid sequence
 
(defun mk-seq (profil tspa) 
  (let (r) (dotimes (i (length profil) r) (push (* tspa i) r))
       (mapcar #'list (reverse r) profil)))
 
;; get primitive
 
(defun minmax (seq)
  (let ((r (list (cons 0 (car seq)))))
    (loop for i in (cdr seq)
       do
	 (cond ((= (caddar r) (cadr i)) (push (cons 0 i) r))
	       ((< (caddar r) (cadr i)) (push (cons 1 i) r))
	       ((> (caddar r) (cadr i)) (push (cons -1 i) r))))
    r))
 
;; get peaks and valleys
 
(defun stream-minmax (seq)
  (let ((r (list (car seq))))
    (loop for e in (cdr seq)
       do
	 (when (not (or (equalp (car e) (caar r)) (equalp (car e) 0))) (push e r))) (append (last seq) r)))
 
(defun peaks (seq)
  (let ((r)
	(m-lst (stream-minmax (minmax seq))))
    (loop for i in m-lst do (when (= 1 (car i)) (push i r)))
    (if (= -1 (caadr m-lst))
	(cons (cdar m-lst) (mapcar #'cdr (reverse r)))
        (mapcar #'cdr (reverse r)))))
 
(defun valleys (seq)
  (let ((r)
	(m-lst (stream-minmax (minmax seq))))
    (loop for i in m-lst do (when (= -1 (car i)) (push i r)))
    (if (= 1 (caadr m-lst))
	(cons (cdar m-lst) (mapcar #'cdr (reverse r)))
        (mapcar #'cdr (reverse r)))))
 
(defun stream-v (seq)
  (mapcar #'cdr (stream-minmax (minmax seq))))
 
;; filter the streaming sequence according to the conditions:
;;   1 - the first item has to be a valley
;;   2 - each peak has to be greater than or equal to the threshold defined by int
;;   3 - the absolute value of the difference between two consecutive items has to be greater than or equal to the threshold defined by diff
 
(defun filter-seq (seq-v int diff)
  (let* ((seq (if (< (- (cadar seq-v) (cadadr seq-v)) 0) seq-v (cdr seq-v)))
	 (peaks-lst (peaks seq))
	 (thelast (car (last seq-v)))
	 (r (list (car seq))))
    (loop for i in (cdr seq) do
	 (if (null r) 
	     (push i r)
	     (if (member i peaks-lst :test #'equalp)
		 (if (and (>= (cadr i) int)
			  (>= (abs (- (cadr i) (cadar r))) diff))
		     (push i r) 
		     (setf r (cdr r)))
		 (if (>= (abs (- (cadr i) (cadar r))) diff)
		     (push i r)
		     (setf r (cdr r))))))
    (if (equalp thelast (car r))
	(reverse r)
	(reverse (cons thelast r)))))
 
(defun get-end-time (seq-v int)
  (if (>= (cadar (last seq-v)) int) seq-v (get-end-time (butlast seq-v) int)))
 
;; write duration file 
 
(defun x->dx (lst)
  (mapcar #'(lambda (x) (apply #'- (reverse x))) (butlast (maplist #'(lambda (x) (list (car x) (cadr x))) lst))))
 
(defun get-duration (seq total-duration)
  (let ((l (mapcar #'car (peaks seq))))
    (x->dx (nconc l (list total-duration)))))
 
(defun mk-file (lst str-dir &optional string-comment)
  (with-open-file (stream (make-pathname :directory (pathname-directory str-dir)
					 :name (pathname-name str-dir)
					 :type (pathname-type str-dir))
			  :direction :output
			  :if-exists :supersede
			  :if-does-not-exist :create)
    (when (not (null string-comment))
       (apply #'format stream (list string-comment))
       (format stream "~&"))
    (loop for i in lst
          do (format stream "~{~D ~} ~&" (if (listp i) i (list i)))))) 
 
;; write TextGrid file
 
(defun subgroup (lst)
  (let (r)
    (loop for i from 0 to (- (length lst) 2)
	 do (push (list (1+ i) (nth i lst) (nth (1+ i) lst)) r)) 
    (if (zerop (cadar (last r))) 
	(reverse r)
	(cons (list nil 0.0 (cadar (last r))) (reverse r)))))
 
(defun mk-praat-file-textgrid (sub-lst td str-dir)
  (with-open-file (stream (make-pathname :directory (pathname-directory str-dir)
					 :name (pathname-name str-dir)
					 :type (pathname-type str-dir))
			  :direction :output
			  :if-exists :supersede
			  :if-does-not-exist :create)
    (format stream "File type = \"ooTextFile\"~&Object class = \"TextGrid\"~2&")
    (format stream "~D~&~D~&<exists>~&1~&\"IntervalTier\"~&\"cons\"~&~D~&~D~&~D~&" 0.0 td (cadar sub-lst) (caddar (reverse sub-lst)) (length sub-lst))
    (loop for i in sub-lst
       do
	 (if (eq (car i) nil)
	     (format stream "~D~&~D~&\"\"~&" (cadr i) (caddr i))
	     (format stream "~D~&~D~&\"~D\"~&" (cadr i) (caddr i) (car i))))))
 
;;;; written in the shell script
 
;;(defparameter *tspa* <tspa>) ; time step
;;(defparameter *mlt* <mlt>)   ; minimal loudness threshold
;;(defparameter *dlt* <dlt>)   ; differential loudness threshold
;;(defparameter *profil* (read-file <path_to_profile_file>))
;;(defparameter *total-duration* (car (read-file <path_to_total-duration_file>)))
;;(defparameter *seq* (filter-seq (get-end-time (stream-v (mk-seq *profil* *tspa*)) *mlt*) *mlt* *dlt*))
;;(mk-file (get-duration *seq* *total-duration*) <path_to_duration_file>)
;;(mk-praat-file-textgrid (subgroup (mapcar #'car (valleys *seq*))) *total-duration* <path_to_TextGrid_file>)

The minimal loudness, in sones, required to validate an event can be set with the option --loudness-min-threshold (default value is 2.5 sones). The loudness difference, in sones, required between two events to validate their discrimination can be set with the option --loudness-diff-threshold (default value is 0.5 sone).
To get the TextGrid file, just add the option +textgrid.
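The three segmentation conditions can also be illustrated independently of the Lisp code. The Python sketch below is a simplified reading of them, not a transcription of filter-seq: the function names, the sample profile and the choice to treat the first frame as a valley when the profile rises from it are all assumptions made for the example.

```python
def turning_points(profile):
    """Return (index, value, kind) for the start point and every peak or valley."""
    points, last_dir = [], 0
    for i in range(1, len(profile)):
        diff = profile[i] - profile[i - 1]
        direction = (diff > 0) - (diff < 0)
        if direction == 0:
            continue  # flat step: the current direction is unchanged
        if last_dir == 0:
            # The start counts as a valley if the profile rises, else as a peak.
            points.append((0, profile[0], "valley" if direction == 1 else "peak"))
        elif direction != last_dir:
            kind = "peak" if last_dir == 1 else "valley"
            points.append((i - 1, profile[i - 1], kind))
        last_dir = direction
    return points


def filter_events(points, min_loudness, min_diff):
    """Apply the three conditions; points are removed in peak/valley pairs."""
    kept = []
    for point in points:
        _, value, kind = point
        if not kept:
            if kind == "valley":  # condition 1: never start on a peak
                kept.append(point)
            continue
        too_quiet = kind == "peak" and value < min_loudness   # condition 2
        too_close = abs(value - kept[-1][1]) < min_diff        # condition 3
        if too_quiet or too_close:
            kept.pop()  # drop the previous point together with this one
        else:
            kept.append(point)
    return kept


# Hypothetical loudness profile in sones, one value per frame.
profile = [0.0, 1.0, 5.0, 1.0, 1.2, 1.0, 6.0, 0.5, 2.0, 0.4]
for index, value, kind in filter_events(turning_points(profile), 2.5, 0.5):
    print(index, value, kind)
```

With the default thresholds (2.5 and 0.5 sones), the small bump at 1.2 sones and the final peak at 2.0 sones are discarded, leaving two events.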

Duration

At the same time, a file called duration is generated. The duration of each event is estimated from the ∆x between two valleys. The duration of the last peak is estimated from the last valley relative to the total duration of the sample.
To get the duration data, just add the option +duration.
To encode and retrieve the real time, a value for the perception time smear7) is needed; it can be set with the option --pts (default value is 0.05 second).
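As a sketch of the ∆x computation (the counterpart of x->dx and get-duration in the Lisp code above), assuming a hypothetical list of boundary times in seconds:

```python
def durations(boundaries, total_duration):
    """Differences between successive boundary times; the last event runs to the end."""
    times = list(boundaries) + [total_duration]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical boundaries at 0.0, 0.5 and 1.25 s in a 2-second sample.
print(durations([0.0, 0.5, 1.25], 2.0))
```

The last value is the distance from the last boundary to the total duration of the sample, as described above.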

Extracting the required values

Now, with the sample and its associated TextGrid, a PRAAT script extracts the following values: loudness, centroid, brightness, f0 and bass loudness.

Loudness

The loudness of a sound event is expressed in sones. It is calculated for each event in PRAAT as described in the section Dynamic profile.
To get the loudness data, just add the option +loudness.

Relative pitch

The relative pitch is estimated, here in the case of an inharmonic sound, by the centroid of the timbre profile of a given sound event, and is expressed in hertz.

Centroid

The spectral centroid represents the frequency centre of gravity of a signal: it is the average of f over the entire frequency domain, weighted by the power spectrum P(f).
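In symbols, writing f_c for the centroid (the notation fc is used below for brightness; the integration bounds are an assumption), the weighted average described above reads:

```latex
f_c = \frac{\int_0^{\infty} f \, P(f) \, \mathrm{d}f}{\int_0^{\infty} P(f) \, \mathrm{d}f}
```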

To get the centroid data, just add the option +centroid.

Brightness

The brightness of a sound is evaluated from the value of its spectral centroid relative to its fundamental frequency, which, in the case of an inharmonic sound, is equal to the value of the first partial.
This brightness, estimated as fc divided by f0, indicates a more or less « bright » sound for a value greater than one, and a more or less « flat » sound for a value less than one.
Thus, we have to estimate the centroid of the sound event and its first partial, called f0.

To get the brightness data, just add the option +brightness.

f0

From each segment, the timbre profile is derived by smoothing the spectrum with the cepstrum method.

The signal can be viewed as a superposition of a short wave vibration with the “period” of F0 and long wave vibrations representing the course of the transmission function.
Then, smoothing means low-pass filtering the signal in such a way that the present short-wave vibration with the “period” of F0 is removed and the envelope remains. If F0 is known, a band-stop filter can be used instead of the low-pass, i.e., a filter that blocks only the undesired oscillations and their immediate vicinity, but lets everything else pass. 8)

The value of F0 used in PRAAT is estimated at 500 Hz and can be set with the option --smooth-frequency.
Then, only the first peak, or partial, of the profile is retained. It is called f0.
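The search for the first peak can be sketched as follows. This Python version follows the logic of the PRAAT procedure get_first_partial listed further down (ignore an initial descent, then return the first bin that is followed by a drop); the function name and the sample values are hypothetical.

```python
def first_partial(freqs, amps):
    """Frequency of the first local maximum of a smoothed spectrum, or None."""
    rising = False
    for i in range(1, len(amps)):
        if amps[i] >= amps[i - 1]:
            rising = True              # the profile has stopped descending
        elif rising:
            return freqs[i - 1]        # first drop after a rise: a peak
    return None                        # no peak found in the profile


# Hypothetical smoothed spectrum: the first peak is at 300 Hz.
print(first_partial([0, 100, 200, 300, 400], [5, 3, 4, 6, 2]))
```

When no peak is found, the PRAAT script falls back to half the cutoff frequency; the sketch simply returns None in that case.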

To get the f0 data, just add the option +f0.

Bass loudness

Sometimes, especially in polyphonic music, we perceive deep bass frequencies that are not captured by the relative pitch, which is expressed by the centroid as seen above.
To correct this, a low-pass filter is applied and the loudness is then computed as before.

Figure 3 – Low-pass filter showing the width (fixed to +/- 50 Hz) of the region between pass and stop, according to the cutoff frequency.

To get the bass data, just add the option +bass.
The cut off frequency can be set with the option --cutoff-frequency (default value is 100 Hz).


The following PRAAT script gets the centroid, f0, brightness, loudness and bass loudness data for each event defined by the TextGrid.

form Get values
    sentence soundfile ...
    sentence textgrid ...
    positive fcut ...
    positive foss ...
    positive time_step ...
endform

procedure get_first_partial 
	i = 1
        up = 0
	f0 = 0
        nbs = Get number of bins
	repeat
		val1 = Get real value in bin: 'i'
		val2 = Get real value in bin: 'i'+1
		i = i + 1
	        if 'val1' - 'val2' > 0 and 'up' = 0  
		   up = 0
                else
                   up = 1
                endif
		if  'val1' > 'val2' and 'up' = 1 and 'f0' = 0
		   f0 = Get frequency from bin number: 'i'-1
		endif
	until 'i' = 'nbs'
endproc

Read from file... 'soundfile$'
current_sound$ = selected$("Sound") 
Read from file... 'textgrid$'
current_textgrid$ = selected$("TextGrid")

filedelete 'defaultDirectory$'/centroid
filedelete 'defaultDirectory$'/f0
filedelete 'defaultDirectory$'/brightness
filedelete 'defaultDirectory$'/loudness
filedelete 'defaultDirectory$'/bass

select TextGrid 'current_textgrid$'
n = Get number of intervals: 1  
n = Get label of interval: 1, n 
    	for nn from 1 to n

	select Sound 'current_sound$'
	plus TextGrid 'current_textgrid$'
	Extract intervals where: 1, "no", "is equal to", "'nn'"
        	do ("To Spectrum...", "yes")
        	centroid = Get centre of gravity... 2
		if (centroid = undefined)
			centroid = 'fcut'/2
		endif	
      		appendFileLine("centroid", 'centroid')
		Cepstral smoothing: 'foss'
		call get_first_partial 
		if (f0 = 0)
			f0 = 'fcut'/2 
		endif
		appendFileLine("f0", 'f0')
		select Spectrum 'current_sound$'_'nn'_1
		Remove	
		brightness = 'centroid'/'f0'
		appendFileLine("brightness", 'brightness')

      	select Sound 'current_sound$'_'nn'_1

        nocheck To Cochleagram: 'time_step', 0.1, 0.03, 0.03
	nocheck To Excitation (slice): 0
	loudness = nocheck Get loudness

        if (loudness = undefined)
                loudness = 1.6
		bass = 1.6
		appendFileLine("loudness", 'loudness')
	      	appendFileLine("bass", 'bass')
	select Sound 'current_sound$'_'nn'_1
	Remove
        else
	appendFileLine("loudness", 'loudness')

	select Cochleagram 'current_sound$'_'nn'_1
	plus Excitation 'current_sound$'_'nn'_1
	Remove
	
	select Sound 'current_sound$'_'nn'_1
      	do ("Filter (stop Hann band)...", 'fcut', 0, 50)

      	select Sound 'current_sound$'_'nn'_1_band

	To Cochleagram: 'time_step', 0.1, 0.03, 0.03
	To Excitation (slice): 0
	bass = Get loudness

      	appendFileLine("bass", 'bass')

	select Spectrum 'current_sound$'_'nn'_1
      	plus Sound 'current_sound$'_'nn'_1
      	plus Sound 'current_sound$'_'nn'_1_band
        plus Cochleagram 'current_sound$'_'nn'_1_band
        plus Excitation 'current_sound$'_'nn'_1_band
      	Remove
	endif
	endfor
select all
Remove

Concerning the algorithms and the parameters used in PRAAT, you can get all information you might need at this web address: http://www.fon.hum.uva.nl/praat/manual/Intro.html

Discrimination in class

Integer

For the options dynamic and relative pitch, the value is the number of digits, or classes.

Duration

To encode the duration ∆x, an absolute scale based on the perception time smear pts is used: the duration is divided by pts and then rounded to the nearest integer.

Thus, the number of digits determines the maximum duration that can be encoded, and the discrimination depends on the value of the perception time smear. Note that if the duration is less than pts divided by 2, it is encoded as « all zeros », which is then interpreted as an appoggiatura or ignored. In some cases, notably during the cochleagram analysis, a segment whose duration is too small (below about 0.1 second) is simply ignored. Also, the number of output digits is equal to the maximum encoded duration. The default value acts as the duration ceiling, that is, the clipping value for durations, in order to limit the number of output digits.
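A minimal Python sketch of this encoding, modelled on the Lisp function >duration listed below (the function name and the sample durations are hypothetical; the defaults are those of --pts and --duration):

```python
def encode_durations(dxs, pts=0.05, digits=31):
    """Express each duration in units of pts, rounded and clipped to `digits`.

    Returns the classes and the number of output digits, which is the largest
    class actually present. Expects a non-empty list of durations in seconds.
    """
    classes = [min(round(dx / pts), digits) for dx in dxs]
    return classes, max(classes)


classes, width = encode_durations([0.5, 0.02, 10.0, 0.03])
print(classes, width)
```

Here 0.02 s is shorter than pts / 2 and falls into class 0 (« all zeros »), while 10 s exceeds the ceiling and is clipped to 31.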

Proportional scaling

To encode dynamics and relative pitch, a proportional scaling is used, as follows: { arg min L, λ, arg max L } ⇒ { 1, [k.λ], n }, with L the dataset and n the number of digits, or classes. The scaling is linear for dynamics and logarithmic for relative pitch. It is a relative scale, based on the maximum and minimum values of the sequence under consideration.

  • Linear: this is the case for dynamics, between the maximum and minimum loudness values.
  • Logarithmic: this is the case for relative pitch, between the maximum and minimum centroid values, in accordance with the human perception of musical intervals, which is approximately logarithmic.
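Both scalings can be sketched in a few lines of Python. This is an illustration modelled on the Lisp functions >dynamic and >rel-pitch listed below, assuming the values are positive and not all equal; the function names and sample values are hypothetical.

```python
import math


def scale_linear(values, n):
    """Map min(values)..max(values) linearly onto the classes 1..n."""
    lo, hi = min(values), max(values)
    return [round(1 + (n - 1) * (v - lo) / (hi - lo)) for v in values]


def scale_log(values, n):
    """Split log10(min)..log10(max) into n equal bins numbered 1..n."""
    lo, hi = math.log10(min(values)), math.log10(max(values))
    width = (hi - lo) / n
    return [min(n, 1 + int((math.log10(v) - lo) / width)) for v in values]


# Hypothetical loudness values (sones) on 11 classes.
print(scale_linear([2.0, 4.0, 7.0, 12.0], 11))
# Hypothetical centroid values (Hz) on 6 classes.
print(scale_log([100, 500, 2000, 10000], 6))
```

In both cases the smallest value of the sequence falls into class 1 and the largest into class n.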

Number or list

For the options brightness and bass, the argument is either a list of « borders » or a number that sets a recursive discrimination based on the mean of the values.

List of borders

This consists in listing n borders, each given by its ordinate at the origin. The number of digits, or classes, is then equal to n + 1. The borders are estimated empirically.

In the case of brightness, an angle theta, expressed in radians, is estimated relative to the centroid (see figure 5).

Figure 4 – Representation of the brightness according to the centroid of every event of a given sample.

Figure 5 – Same as figure 4 with logarithmic scale. B1, B2 and B3 are the borders delimiting 4 classes, with their respective ordinates at the origin b1, b2 and b3.
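For scalar data such as the bass loudness, classification against a list of borders can be sketched as below. This Python example covers only the scalar case (not the rotated brightness plane), the border values are invented for illustration, and border_class is a hypothetical name.

```python
from bisect import bisect_right


def border_class(x, borders):
    """Class 1..n+1 of x for n borders; a value on a border joins the upper class."""
    return 1 + bisect_right(sorted(borders), x)


borders = [1.0, 2.0, 4.0]  # hypothetical borders: 3 borders give 4 classes
print([border_class(x, borders) for x in (0.5, 1.0, 3.0, 9.0)])
```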

Number of recursivity

This way of discriminating data determines a number of classes relative to attractors. Each attractor is the arithmetic mean of a set of data, and a coefficient of recursivity determines how many attractors are computed. When the argument n is an integer, the number of attractors is equal to 2^n − 1; when n is a floating-point number, it is equal to the number obtained for the ceiling of n minus the number obtained for the floor of n (see figure 9).
See the description of this algorithm on figures 6 to 9.

Figure 6 – [ coefficient of recursivity : 1 ] This example represents a cluster of bass loudness values. The attractor A1 is the arithmetic mean of all points, which creates two subsets: E1L and E1H.

Figure 7 – [ coefficient of recursivity : 2 ] From the two subsets of the figure 6 (E1L and E1H), we determine in the same manner two new attractors (respectively A2 and A3) generating in this way two subsets for each attractor, respectively E2L, E2H, E3L and E3H.

Figure 8 – [ coefficient of recursivity : 3 ] Just repeat the process described above (see Figures 6 and 7) to obtain 4 new attractors (A4, A5, A6 and A7) calculated from subsets of Figure 7. This gives us a total of 7 attractors (or classes).

Figure 9 – When the argument is a floating-point number, for example 2.5, the number of attractors is equal to the number of attractors for a coefficient of recursivity of 3 (the ceiling of 2.5, see figure 8) minus the number of attractors for a coefficient of recursivity of 2 (the floor of 2.5, see figure 7). This gives the following 4 attractors: A4, A5, A6 and A7.

To apply this algorithm to brightness, the angle theta must be taken into account: the plane is rotated by theta so that the recursive discrimination can be applied to the rotated values.
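Reading the Lisp function mat-rotate listed below, the rotation acts on the base-10 logarithms of the centroid f_c and of the brightness b, and only the rotated ordinate y' is passed on to the recursive discrimination. The symbols b, x' and y' are introduced here for illustration:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \log_{10} f_c \\ \log_{10} b \end{pmatrix}
```

In that function, theta defaults to −π/4.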

The angle theta, expressed in radians, can be set with the option --angle.

Note that the recursion stops when all data are captured by an attractor. The number of classes can therefore be smaller than the argument suggests.
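The recursive splitting around the mean can be sketched in Python as follows. The sketch follows the description given with figures 6 to 8 for an integer coefficient; it does not reproduce mk-attract line by line, and the function names and data are hypothetical. Classes are counted from the highest attractor, as the Lisp function nearest does.

```python
def attractors(data, depth):
    """Means obtained by splitting `data` around its mean, recursively `depth` times."""
    if depth == 0 or not data:
        return []
    mean = sum(data) / len(data)
    low = [x for x in data if x <= mean]
    high = [x for x in data if x > mean]
    if not high:  # all values are equal: nothing left to split
        return [mean]
    return attractors(low, depth - 1) + [mean] + attractors(high, depth - 1)


def nearest_class(x, attrs):
    """1-based rank of the attractor closest to x, counted from the highest one."""
    ranked = sorted(attrs, reverse=True)
    return 1 + min(range(len(ranked)), key=lambda i: abs(ranked[i] - x))


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical bass loudness values
for depth in (1, 2, 3):
    print(depth, attractors(data, depth))
print([nearest_class(x, attractors(data, 2)) for x in data])
```

With a coefficient of 3, this data set yields the full 2^3 − 1 = 7 attractors; a set of identical values yields a single attractor whatever the coefficient, which illustrates why the number of classes can be smaller than expected.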


This is done with the following Lisp functions.

;; read data file
 
(defun read-file (file)
  (mapcar #'read-from-string
	  (with-open-file (in-stream file
				     :direction :input
				     :element-type 'character)
	    (loop with length = (file-length in-stream)
	       while (< (file-position in-stream) length)
	       collect (read-line in-stream)))))
 
;; convert to thrifty code
 
(defun replace-a (new n lst)
  (mapcar #'(lambda (a) (if (= (setq n (1- n)) -1) new a)) lst))
 
(defun >thrifty-code (a digit)
  (replace-a 1 (1- a) (make-list digit :initial-element 0)))
 
;; discrimination
;; of duration
 
(defvar *ndur* nil)
 
(defun >duration (lst pts digit &optional ctc)
  (let* ((tmp (loop for i in (mapcar #'(lambda (x) (round (/ x pts))) lst) 
		 if (> i digit) 
		 collect digit 
		 else collect i))
	 (dur-digit (reduce #'max tmp)))
    (setf *ndur* dur-digit)
    (if ctc (loop for i in tmp collect (>thrifty-code i dur-digit)) tmp)))
 
;; of dynamic
 
(defun scaling (lst minout maxout minin maxin) 
  (let ((ratio (/ (- maxout minout) (- maxin minin))))
    (mapcar #'(lambda (x) (+ minout (* ratio (- x minin)))) lst)))
 
(defun >dynamic (lst digit &optional ctc)
  (let ((tmp (mapcar #'round (scaling lst 1 digit (reduce #'min lst) (reduce #'max lst)))))
    (if ctc (loop for i in tmp collect (>thrifty-code i digit)) tmp)))
 
;; of relative pitch
 
(defun range (item alst)
  (length (loop for i in alst for pos from 0 until (> i item) collect pos)))
 
(defun >rel-pitch (centroid digit &optional ctc)
  (let* ((maxi (log (reduce #'max centroid) 10))
         (mini (log (reduce #'min centroid) 10))
         (int-digit (/ (- maxi mini) digit))
         (alst (reverse (cons (1+ maxi) (let ((r (list (1- mini)))) (dotimes (i (1- digit) r) (push (+ mini (* (1+ i) int-digit)) r)))))) 
         (lst (mapcar #'(lambda (x) (log x 10)) centroid))
	 (tmp (loop for i in lst collect (range i alst))))
    (if ctc (loop for i in tmp collect (>thrifty-code i digit)) tmp)))
 
;; of brightness and bass according
;; a list of borders
 
(defun mat-trans (lst)
  (apply #'mapcar #'list lst))
 
(defun mk-boundary-class (dat oob-lst &optional ctc (theta -0.785398))
  (let ((oblst (sort (copy-tree oob-lst) #'<)))
    (if (listp (car dat))
	(let ((tmp (loop for i in dat collect
			(cond ((< (log (cadr i) 10) (+ (* -1 (tan theta) (car oblst)) (log (car i) 10))) 1)
			      ((>= (log (cadr i) 10) (+ (* -1 (tan theta) (car (reverse oblst))) (log (car i) 10))) (1+ (length oob-lst)))
			      (t (let ((aobl (loop for k from 0 to (- (length oblst) 2) collect (list (nth k oblst) (nth (1+ k) oblst)))))
				   ;; pos starts at 2: class 1 is reserved for values below the first border
				   (car (loop for j in aobl for pos from 2
					   when (and (>= (log (cadr i) 10) (+ (* -1 (tan theta) (car j)) (log (car i) 10))) (< (log (cadr i) 10) (+ (* -1 (tan theta) (cadr j)) (log (car i) 10))))
					   collect pos))))))))
	  (if ctc (loop for i in tmp collect (>thrifty-code i (1+ (length oob-lst)))) tmp))
	(let ((tmp (loop for i in dat collect
			(cond ((< i (car oblst)) 1)
			      ((>= i (car (last oblst))) (1+ (length oob-lst)))
			      (t (let ((aobl (loop for k from 0 to (- (length oblst) 2) collect (list (nth k oblst) (nth (1+ k) oblst)))))
				   ;; pos starts at 2 here as well, for the same reason
				   (car (loop for j in aobl for pos from 2
					   when (and (>= i (car j)) (< i (cadr j)))
					   collect pos))))))))
	  (if ctc (loop for i in tmp collect (>thrifty-code i (1+ (length oob-lst)))) tmp)))))
 
;; a recursive discrimination
 
(defun nearest (a alst)
  (let ((al (sort (copy-tree alst) #'>)))
    (caar (sort (loop for i in al for pos from 1 collect (list pos (abs (- i a)) i)) #'< :key #'cadr))))
 
(defun flat-once (lst)
  (let (r) (loop for i in lst do
		(if (listp i) (dolist (e i r) (push e r)) (push nil r)))
       (reverse r)))
 
(defun mean (lst)
  (if (listp (car lst))
      (list (mean (mapcar #'car lst)) (mean (mapcar #'cadr lst)))
      (/ (reduce #'+ lst) (length lst))))
 
(defun split-mean-lst (lst)
  (if (= 1 (length (remove-duplicates lst)))
      (list lst (list lst))
      (let* ((m (mean lst))
	     (subsup (loop for i in lst when (> i m) collect i)) 
	     (subinf (loop for i in lst when (<= i m) collect i))
	     (msup (when subsup (mean subsup)))
	     (minf (when subinf (mean subinf))))
	(if (or (null msup) (null minf))
	    (list lst (list lst))
	    (list (list minf m msup) (list subinf subsup))))))
 
(defun mk-attract (lst n &optional r)
  (if (or (equalp r (split-mean-lst lst)) (>= (length (car r)) (1- (expt 2 n)))) (sort (car r) #'>)
      (if (null r) (mk-attract lst n (split-mean-lst lst))
	  (let ((ll (loop for i in (cadr r) collect (split-mean-lst i))))
	    (mk-attract lst n (list (remove-duplicates (union (flat-once (mapcar #'car ll)) (car r))) (flat-once (mapcar #'cadr ll))))))))
 
(defun mat-rotate (xy &optional rad)
  (let ((th (if rad rad (/ pi -4))))
    (list (- (* (log (car xy) 10) (cos th)) (* (log (cadr xy) 10) (sin th))) (+ (* (log (car xy) 10) (sin th)) (* (log (cadr xy) 10) (cos th))))))
 
(defun mk-attract-class (data coef &optional ctc (rad -0.785398))
  (let* ((dat (if (listp (car data)) (mapcar #'cadr (loop for i in data collect (mat-rotate i (if rad rad (/ pi -4))))) data))
	 (ma1 (mk-attract dat (if (floatp coef) (ceiling coef) coef)))
	 (ma2 (if (floatp coef) (loop for pos from 0 to (1- (length ma1)) when (evenp pos) collect (nth pos ma1)) ma1))
	 (tmp (loop for i in dat collect (nearest i ma2))))
    (if ctc (loop for i in tmp collect (>thrifty-code i (length ma2))) tmp)))
 
;; write file
 
(defun flatten (lst)
  (if (endp lst)
      lst
      (if (atom (car lst))
	  (append (list (car lst)) (flatten (cdr lst)))
	  (append (flatten (car lst)) (flatten (cdr lst))))))
 
(defun mk-file (str-dir ctc &rest dat) 
  (let ((lst (mat-trans dat)))
    (with-open-file (stream (make-pathname :directory (pathname-directory str-dir)
					   :name (pathname-name str-dir))
			    :direction :output
			    :if-exists :supersede
			    :if-does-not-exist :create)
      (loop for i in lst
	     do 
	       (if ctc
		   (format stream "~{~D ~} ~&" (flatten i))
		   (format stream "~{~D ~} ~&" i))))))
 
;;;; written in the shell script under optional conditions
 
;;(defparameter *ctc* <ctc>)     ; thrifty code  [t/nil]
;;(mk-file <path_to_out_file> *ctc*
;; 	  (>duration (read-file <path_to_duration>) <$pts> <$dur> *ctc*) 
;;   	  (>dynamic (read-file <path_to_loudness>) <$dyn> *ctc*)
;;	  (>rel-pitch (read-file <path_to_centroid>) <$relp> *ctc*)
;;	  (if (listp <$brght>)
;;	      (mk-boundary-class (mat-trans (list (read-file <path_to_centroid>) (read-file <path_to_brightness>))) <$brght> *ctc* <$rad>)
;;	      (mk-attract-class (mat-trans (list (read-file <path_to_centroid>) (read-file <path_to_brightness>))) <$brght> *ctc* <$rad>))
;;	  (if (listp <$bass>)
;;	      (mk-boundary-class (read-file <path_to_bass>) <$bass> *ctc*)
;;	      (mk-attract-class (read-file <path_to_bass>) <$bass> *ctc*)))

Command line use

Install enkode

  • Create a personal bin directory (for example: /Users/.../bin)
  • Put enkode in this folder.
  • Add the following to file ~/.profile :
    export PATH=/Users/.../bin:$PATH

To create man page:

  • Create a personal man directory (for example: /Users/.../man) and do:
    $ cd /Users/.../man
    $ mkdir man1
  • Then put enkode.1 in the folder /Users/.../man/man1
  • Add the following to file /etc/man.conf :
    MANPATH /Users/.../man

Using enkode

  • Default behavior:
    $ enkode test.wav 
    0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1
    0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0  
    0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1  
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0  
    0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0  
    0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0  
    0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0  
    0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0  
    0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1  
    0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0  
    0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0  
    0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0  
    0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0  
    ...
  • Using some options:
    $ enkode --pts=0.02 --as-numbers +textgrid +brightness +info test.wav
    10 11 6 4 7
    6 2 5 2 6  
    6 5 6 2 7  
    3 11 2 1 1  
    6 6 2 1 2  
    9 7 4 4 2  
    7 8 1 1 1  
    9 5 5 4 6  
    7 5 6 1 7  
    9 7 2 1 2  
    8 4 6 1 5  
    6 5 6 2 5  
    9 8 1 4 1  
    ...  
    $ cd ~/Documents/enkode/test/
    $ ls
    info
    brightness	
    test.TextGrid
  • Error behavior:
    $ enkode --bass=0 test.wav
    ... error during process, check error in ~/Documents/enkode/error.log ...
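The relation between the two output forms shown above can be illustrated with a short Python sketch. It assumes that the first line of the numeric output (10 11 6 4 7) gives the number of digits of each field, as stated in the notes for version 4.0 below, and converts the second line of that example (6 2 5 2 6) into thrifty code. The function name is hypothetical.

```python
def row_to_thrifty(row, lengths):
    """Concatenate the one-of-n code of each field of a numeric output row."""
    digits = []
    for value, n in zip(row, lengths):
        digits += [1 if i == value else 0 for i in range(1, n + 1)]
    return digits


lengths = [10, 11, 6, 4, 7]  # first line of the numeric example above
row = [6, 2, 5, 2, 6]        # second line of the same example
print(" ".join(str(d) for d in row_to_thrifty(row, lengths)))
```

The result has 10 + 11 + 6 + 4 + 7 = 38 digits, with exactly one digit set per field.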

Download

June 20, 2018: enkode 4.5 released.

• Fix f0 procedure in praat script for profile without peak.
  • Command line version 4.5 enkode

Version history

May 16, 2018: enkode 4.4

• Fix agreement between number of segments with the label of the last segment from TextGrid in praat script.

March 04, 2018: enkode 4.3

• From now on, the analysis can be done with a fixed duration. In this case, the value of the duration must be a floating-point number, which gives the duration of each segment in seconds.
Note that, if required, the last segment is extended up to the value of the duration, but only in the result, not for the computation.
• The intensity analysis was replaced with a loudness analysis, which is more relevant to our own aural perception. This applies to the whole frequency range and to the filtered frequency range used for the bass of each segment. The option +intensity was therefore replaced with +loudness, and the unit is now the sone instead of the dB SPL. Bear in mind that the limit of aural perception is around 0.1 second; below this value, the analysis becomes less relevant.
• Fix add spectrum data file with option +all.
• Fix script values.praat concerning the cochleagram analysis.
• Fix scripts ofp.praat and values.praat for long sequence.
• Fix the functions mk-attract and split-mean-lst concerning the recursivity with disparate data in encode.lisp.

October 07, 2017: enkode 4.2

• Fix compatibility with Linux.
• Remove 'rapport process' – useless.
The following two new options are only conversion tools (for more information, see the article Convert a midi file to a SuperCollider score).
• Add new option --midi, which transposes a midi file to a score plus its histogram.
• Add new option --merge, which merges all tracks of a given midi file.

January 27, 2017: enkode 4.1

• Fix energy profile for long sequences: the dynamic space size of the RAM can be increased with the option --ram-threshold, initially defined by (sb-ext:dynamic-space-size), and the process time is limited with the option --max-timeout, 480 seconds by default, using the command line timeout. If either of these limits is reached, the result is set to 0.0.
• Fix long argument – add ${args} before the associated short argument.
• Fix zip name as its own timestamp.
• Add 'rapport process' to info.
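
The fallback described for --max-timeout can be pictured with the timeout command alone. This is only a sketch of the general pattern, assuming GNU coreutils timeout (which exits with status 124 when the limit is reached); run_with_limit is a hypothetical helper and is not taken from the enkode script.

```shell
# Hypothetical helper: run a command under a time limit (in seconds).
# If the limit is reached, print 0.0 instead of the command output.
run_with_limit() {
  limit="$1"
  shift
  if out=$(timeout "$limit" "$@"); then
    status=0
  else
    status=$?
  fi
  if [ "$status" -eq 124 ]; then
    # GNU timeout reports an expired limit with exit status 124.
    printf '0.0\n'
  else
    printf '%s\n' "$out"
  fi
}

# A command that finishes in time keeps its own output.
fast=$(run_with_limit 2 echo 3.7)

# A command that exceeds the limit falls back to 0.0.
slow=$(run_with_limit 1 sleep 3)

printf 'fast: %s\nslow: %s\n' "$fast" "$slow"
```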

December 30, 2016: enkode 4.0

• From now on, the first line of the stdout (standard output) is reserved for the length of the respective discriminations (to remove it, pipe with sed '1d'; to get it, pipe with head -n 1).
• Fix the function mk-attract (number of recursions and disparate data) plus error message info.
• Remove arg -i --info
• Add +info
• Add +spectrum
• Add +all
• New stdout (standard output) options: --score and --M2T (see man page for more info and the article Writing score for SuperCollider).
• Also, the info files are henceforth located in the Documents folder, where a zip archive of the previous call of enkode is created for history.
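
As an illustration of the reserved first line of the stdout, the header can be separated from the data with standard tools. In the minimal sketch below, the function sample_output is a hypothetical stand-in for a call to enkode, and the content of its header line is assumed for the example only.

```shell
# Hypothetical stand-in for an enkode call: one header line, then data lines.
# The header content shown here is an assumption made for this example.
sample_output() {
  printf '%s\n' '31 11 6 4 7' '7 5 6 1 7' '9 7 2 1 2' '8 4 6 1 5'
}

# Keep only the first line (the reserved header).
header=$(sample_output | head -n 1)

# Drop the first line and keep the data.
data=$(sample_output | sed '1d')

printf 'header: %s\n' "$header"
printf '%s\n' "$data"
```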

April 18, 2016: enkode 3.0

• From now on, the data result is displayed on the stdout (standard output). Thus, the options --lisp and --none have been removed.
• A new option --smooth-frequency sets the frequency bandwidth used for the cepstral smoothing.
• An info file is written with the option -i, --info (this option replaces the previous -o, -out), along with the usual extra data files (see man enkode), into the INFO folder. The info file displays some new information such as the time, the date and the run time of the script.
[ Be careful: the INFO folder is updated each time the command enkode runs, so all the previous info is lost. ]

July 20, 2015: enkode 2.1

• An info file is written to the output directory.
• The result directory is created in the working directory.
• The result directory is named after the original file without its extension; inside it, the name of the last folder gives the discrimination numbers, respectively duration, dynamic, relative pitch, brightness and bass level.
[ Be careful: when some parameters are changed (e.g. --pts, --loudness-min-thres, --loudness-diff-thres, --cutoff-frequency, etc.), the files inside the last folder, if they exist, are updated, that is, erased or overwritten. ]

December 12, 2014: enkode 2.0

• Rationalize code.
• Rationalize arguments.
• Man page.

NEW FEATURE(S) IN VERSION 04.14.3 (enkode 1.2)

• Version in error.log;
• Options -b and -l integrate list (borders) and number (coeff).
• Option -g adds bass loudness for gnuplot.
• Option -a angle of rotation plane for brightness conversion.
• Add for float recursivity 2^(floor coeff).
• Modification and adaptation of the code in init2.lisp.

NEW FEATURE(S) IN VERSION 02.14.2 (enkode 1.1)

• Reformat the help information without less.
• Remove last element from *CENTROIDBRIGHTNESS* and *BINT* in init2.lisp.
• Remove disbass and redefine bass as a recursive coefficient. In init2.lisp this involves: removing threscut and reg; adding flat-once, mean, split-mean-lst and mk-attract; redefining encode-bass with bint and coef as args.
• Change date (deprecated %N) for script timing to: perl -MTime::HiRes -e 'print Time::HiRes::time(),"\n"'.
• Add formate function to remove, if needed, the last lines from the binary file.

NEW FEATURE(S) IN VERSION 12.13.1 (enkode 1.0)

First version tested.
1) PRAAT is free software for the analysis of speech in phonetics. It was designed, and continues to be developed, by Paul Boersma and David Weenink of the University of Amsterdam. It can be downloaded from this link: http://www.fon.hum.uva.nl/praat/.
2) SBCL means Steel Bank Common Lisp. It is free software and a mostly-conforming implementation of the ANSI Common Lisp standard. It can be downloaded from this link: http://www.sbcl.org/.
3) Analytical discrimination consists in analysing the signal in a specific way in order to get the expected data, which are then discriminated into n classes (see section Discrimination in class).
4) The choice of this kind of encoding standardizes the data for writing as well as for reading.
5) The Bark scale is a psychoacoustical scale proposed by Eberhard Zwicker in 1961.
6) In this unit, a sound perceived as twice as loud has double the loudness value.
7) Area of uncertainty where two sounds are no longer perceived as separate, estimated at about 1/20th of a second (see Alain Daniélou, Sémantique musicale, Hermann, Paris 1978).
8) Source of definitions cited: Smoothing Spectra by the Cepstrum Method.
article_1.txt · Last modified: 2018/07/02 09:07 (external edit)