article_5

[ 2018 ]

by *Yann Ics*

http://experimental.mus-ics.net

Cite this article: Yann Ics, Analytical modeling, 2018. Date retrieved: 14/11/18. Permanent link: http://experimental.mus-ics.net/wiki/doku.php?id=article_5

This article proposes one approach to automatic analysis as modeling. Some steps are inevitably empirical, and the modeling is mostly a matter of optimizing computation in order to resolve combinatorial issues.

To illustrate this analytical modeling, it is applied to an extract of an interpretation of a *padjembel* Gwoka rhythm.

This work is carried out on the margins of the project *Neuromuse3* and has been inspired by the research report *Morphologie* developed by Frédéric Voisin and Jacopo Baboni Schilingi in 1999.

The aim of the encoding is to normalize data from a sound file using the analytic tools of the software Praat, driven by the command-line tool *enkode*. The result is defined as a multidimensional array of 5 dimensions. This array contains the duration, the loudness, the relative pitch as the centroid, the brightness and the salience of bass frequencies as filtered loudness.

$ enkode -n +textgrid padj.mp3 > padj.dat

*Illustration 1*: Waveform of the first 5 seconds of the sample with its associated segmentation according to the *textgrid* generated by the previous command *enkode*. [ For the record, this illustration has been generated with the bash script in the annex. ]

The hierarchical clustering of the previous multidimensional data – called CAH – is built within the context of the artificial neural network *neuromuse3*, according to the position of the neurons as events inside a 5-dimensional *Euclidean* space.

CL-USER> (require 'N3)
CL-USER> (in-package :N3)
N3> (create-mlt 'padj 10 10 :carte (list 'data-map (read-file "padj.dat")))
;; note that the number of neurons is ignored because it is set with the (length (remove-duplicates (cdr data)))
;; also the input number is ignored because the computation is done according to the coordinates of the neurons
N3> (dendrogram padj 3 :with-data t)
;; the second argument is the aggregation type (Ward’s method in this case)
-196.01764+

The function `dendrogram` generates a data file with the number of nodes according to the trimming distance, associated with the minimum distance of the parent node and the sum of the intra-class inertia of the child nodes of that parent node.

Now, the idea is to get the optimum number of classes according to the distance from the parent node and the intra-class inertia. There are no rules, so the choice is empirical and depends on the degree of accuracy that the analysis requires. All that needs to be known is that the distance has to be maximal and the inertia minimal.
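As an illustration of the inertia criterion, here is a minimal Python sketch. It assumes that the intra-class inertia is the sum of the squared Euclidean distances between each event and the centroid of its class; the function name `intra_class_inertia` and the toy data are hypothetical and are not part of *neuromuse3*.

```python
from collections import defaultdict

def intra_class_inertia(points, labels):
    """Sum, over all classes, of the squared Euclidean distances
    between each point and the centroid of its class."""
    classes = defaultdict(list)
    for point, label in zip(points, labels):
        classes[label].append(point)
    total = 0.0
    for members in classes.values():
        dim = len(members[0])
        centroid = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        total += sum((p[k] - centroid[k]) ** 2 for p in members for k in range(dim))
    return total

# Toy 2-dimensional events; the article works with 5 dimensions.
points = [(0, 0), (2, 0), (10, 0), (10, 2)]
print(intra_class_inertia(points, [0, 0, 1, 1]))  # two compact classes
print(intra_class_inertia(points, [0, 0, 0, 0]))  # a single class
```

Grouping the events into fewer classes can only keep or increase this sum, which is why it has to be weighed against the distance from the parent node.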

However, the optimization can be made more accurate by increasing the number of the current *fanaux*.

*Illustration 2*: The curve is the sum of the intra-class inertia by trimming. The lines are the peaks of the curve of minimum distance from the parent node (the number is the number of classes at this point) and the red number is the number of classes for the minimum value of the intra-class inertia. Note that in this graph, the inertia curve is scaled with the impulse segments in order to fit on the same diagram.

In this case, a segmentation into 5 classes – referred to as A, B, C, D, and E – is retained as the result.

N3> (alpha-seq padj -196.01764+ 5)
(E E E B B E E C C B E A E B B A E C C B E B D D D D D A D A D D A C B E B D A D D D A D A D D B B C B E B A D B A B C E D D D D A B B B B D B C C B A E A B E A E C C B A C B B A E C B B B A D A B B A B C C D A B E B B A B C C E E A A B D B A A C C B A C B B A A C B C A D A B D D E C E D)

The contrastive analysis consists in segmenting an array of symbols according to a marker, defined by its number of occurrences as the smallest sub-structure; in other words, according to the number of repetitions of a short sequence, which the brain focuses on more readily as a relevant marker to memorize. This is done recursively until all symbols are different. A side effect of this algorithm is that, in case of strict equality between the occurrences of different sequences, the choice depends on the sorting algorithm of the Lisp implementation, from which the first item is retained. The function `structure-s` takes as argument the symbolic sequence previously computed.

Also, when the concatenation affects only 2 adjacent items, the algorithm merges all local repetitions with the possible head of the next item – that is to say only when one item is equal to the head of the following item. For instance the sequence AB AB ABC becomes ABABABC.
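The merging rule can be sketched in Python as follows. This is only one possible reading of the rule, not the code of `structure-s`: the function name `merge_repetitions` is hypothetical, and the sketch assumes that the next item is absorbed only when it follows a local repetition and starts with the repeated item.

```python
def merge_repetitions(items):
    """Concatenate runs of identical adjacent items; when the item that
    follows such a run starts with the repeated item, absorb it too."""
    merged = []
    i = 0
    while i < len(items):
        current = items[i]
        j = i + 1
        while j < len(items) and items[j] == current:
            j += 1
        group = items[i:j]
        # Absorb the next item only after a local repetition (assumption).
        if len(group) > 1 and j < len(items) and items[j].startswith(current):
            group.append(items[j])
            j += 1
        merged.append("".join(group))
        i = j
    return merged

print(merge_repetitions(["AB", "AB", "ABC"]))
print(merge_repetitions(["AB", "CD", "CD", "E"]))
```

The first call reproduces the example above, where AB AB ABC becomes ABABABC.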

N3> (structure-s (list (alpha-seq padj -196.01764+ 5)) :result :last)
(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED)

The paradigmatic analysis makes it possible to observe typological variations within an object or a corpus.

There are no rules either for the paradigmatic discrimination, but an analysis by hierarchical clustering with the single linkage or the complete linkage as aggregation can offer some guidelines.

With this approach, we can refine the proximity between 'sub-structures' according to the current musical context. The main idea is to use the Levenshtein distance algorithm together with two preliminary algorithms, which are respectively defined by the distance according to the local repetition, and the distance between two bijective sequences as patterns *a* and *b* according to the decomposition into permutation cycles^{1)} – called σ – defined by *c* = | *Ο*_{σ}(*x*) |, such that δ(*a*,*b*) = | *lcm*(*c*_{a}) – *lcm*(*c*_{b}) |.
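The distance δ can be illustrated with a short Python sketch. It rests on an assumption that the article does not state: each pattern is read as the permutation sending every position to the rank of its symbol in sorted order. The names `cycle_lengths`, `lcm_all` and `transposition_distance` are hypothetical.

```python
from functools import reduce
from math import gcd

def cycle_lengths(pattern):
    """Cycle lengths of the permutation sending each position of `pattern`
    to the rank of its symbol in sorted order (symbols must be distinct)."""
    rank = {symbol: r for r, symbol in enumerate(sorted(pattern))}
    sigma = [rank[symbol] for symbol in pattern]
    seen, lengths = set(), []
    for start in range(len(sigma)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:  # follow the orbit of `start` under sigma
            seen.add(x)
            x = sigma[x]
            length += 1
        lengths.append(length)
    return lengths

def lcm_all(values):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)

def transposition_distance(a, b):
    """delta(a, b) = | lcm(c_a) - lcm(c_b) |"""
    return abs(lcm_all(cycle_lengths(a)) - lcm_all(cycle_lengths(b)))

print(cycle_lengths("BCA"), cycle_lengths("BAC"))
print(transposition_distance("BCA", "BAC"))
```

Under this reading, BCA is a single cycle of length 3 and BAC splits into cycles of lengths 2 and 1, so the two patterns are at distance | 3 – 2 | = 1.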

Let A and B be sub-sequences; the distance between A and B is computed as follows:

- Remove common local duplicate(s) such as A → A' and B → B'.
  Then the *'repetition distance'* is *d*_{1} = | A \ A' | + | B \ B' |
- Remove pattern such as A'' = A' \ C and B'' = B' \ C with C = A' ∩ B'.
  Then the *'transposition distance'* is applied to the pattern C as *d*_{2}
- Apply *'Levenshtein distance'* between A'' and B'' as *d*_{3}

Then the total distance is *d*_{1} × *w*_{1} + *d*_{2} × *w*_{2} + *d*_{3} × *w*_{3}, where the weights *w*_{i} are respectively 1/2, 1/2 and 1 by default.
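The last step and the weighted sum can be sketched in Python. The Levenshtein distance below is the standard dynamic-programming version; `total_distance` is a hypothetical helper, and the values passed for *d*_{1} and *d*_{2} are placeholders, since the repetition and transposition steps are not reproduced here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]

def total_distance(d1, d2, d3, weights=(0.5, 0.5, 1.0)):
    """Weighted sum d1*w1 + d2*w2 + d3*w3 with the default weights."""
    w1, w2, w3 = weights
    return d1 * w1 + d2 * w2 + d3 * w3

d3 = levenshtein("EEEB", "BEEC")
print(d3, total_distance(0, 0, d3))  # d1 = d2 = 0 are placeholder values
```

Changing `weights` is one way to experiment with the empirical setting: a larger *w*_{3} gives more importance to plain edit operations than to repetition or transposition.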

The setting – that is to say the aggregation type (single or complete linkage) and the weight of each algorithm applied to estimate proximity – remains empirical, but the field of investigation is significantly reduced and intuitive enough for this modeling to be integrated into an automatic process.

*Illustration 3*: Single-linkage clustering.

*Illustration 4*: Complete-linkage clustering.

The two previous illustrations were generated with the online application iTOL – with the display mode set to the unrooted tree –, from their respective Newick files computed from the contrastive analysis as a dendrogram.

N3> (dendrogram '(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED) 1|2)

Both the single and the complete linkages offer the possibility to define five paradigmatic fields – more apparent on the second tree. In any case, it is the teleological object that determines the setting, both for the number of discriminations and for the way the discrimination is done in terms of distance.

Structure is defined as a set of relations that elements maintain among themselves, allowing the constitution of a coherent system. Thus, form and structure are two interrelated notions that determine the immanent or transcendent view of the system.

In the current work, some characteristics call for a morphogenetic point of view, treating the music as a dynamic system. Morphogenesis is an 'in time' analytic system for observing formal variations according to identified structural processes. Indeed, this kind of traditional musical event is not a piece of music with a determined form, but rather a music that evolves 'in time' according to the feelings and some codification^{2)} in terms of proclivity involving each participant.

So, the current systemic analysis focuses on two aspects: the relationship between adjacent sub-structures defined by the contrastive analysis, taken as a derivative according to the distance defined for the paradigmatic analysis; and the process of succession, in terms of the probability of the elements constituting the sub-structures.

According to the distance between two adjacent sequences, the derivative clustering consists in segmenting the whole sequence into parts 'in time' delimited by the mean distance.

*Illustration 5*: The first letter of each sequence marks the distance level – on the y axis – to the next sequence. The horizontal line is the mean distance over all combinations of two different sequences.

;; Thus, the initial sequence is segmented as follows:
N3> (part-s '(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED))
((EEEB BEEC CBEAEB BAEC)
 (CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD)
 (BCCBA EA BEAEC CBA)
 (CBBAECB)
 (BBADA BBABC)
 (CDA BEBBABC)
 (CEEA ABD BAA CCBA CBBAA CBC ADA BDDE))

Note that the last sub-sequence CED is omitted because there is no distance defined from this sequence, but this sequence is of course implicitly associated with the last sequence BDDE as distance.
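The principle of this segmentation can be sketched in Python under explicit assumptions: a cut is placed after every item whose distance to its successor exceeds the mean distance over all pairs of items. The function `split_on_mean` is hypothetical and is not the code of `part-s`; unlike `part-s`, it keeps the last item in the last part, and the toy distance stands in for the paradigmatic distance.

```python
from itertools import combinations

def split_on_mean(items, distance):
    """Cut the sequence after each item whose distance to its successor
    exceeds the mean distance over all pairs of items."""
    pairs = list(combinations(items, 2))
    mean = sum(distance(a, b) for a, b in pairs) / len(pairs)
    parts, current = [], [items[0]]
    for left, right in zip(items, items[1:]):
        if distance(left, right) > mean:
            parts.append(current)
            current = []
        current.append(right)
    parts.append(current)
    return parts

# Toy data: numbers stand in for sub-structures, |a - b| for the distance.
print(split_on_mean([1, 2, 10, 11, 30], lambda a, b: abs(a - b)))
```

With these toy values the mean distance is 13.4, so the only cut falls between 11 and 30.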

In this analysis, the approach consists in evaluating the probability that an event occurs given the *n* previous events.

For instance, the probability of the events succeeding the sub-sequence BE, with the sample of this article analyzed in the chapter Symbolisation, is:

N3> (next-event-probability '(B E) (alpha-seq padj -196.01764+ 5))
B => 57.143 %
A => 28.571 %
E => 14.286 %

Note that the sum of the probabilities is equal to 100 % – or very close to it, owing to rounding errors caused by computer systems^{3)}.
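The same computation can be sketched in Python by counting the events that directly follow each occurrence of the context. The function below borrows the name of the N3 function for readability, but it is an independent sketch and not its implementation; the sequence is the symbolic sequence given in the chapter Symbolisation.

```python
from collections import Counter

def next_event_probability(context, sequence):
    """Relative frequency of each event that directly follows `context`."""
    k = len(context)
    followers = Counter(
        sequence[i + k]
        for i in range(len(sequence) - k)
        if sequence[i:i + k] == context
    )
    total = sum(followers.values())
    return {event: count / total for event, count in followers.most_common()}

SEQ = ("E E E B B E E C C B E A E B B A E C C B E B D D D D D A D A D D A C "
       "B E B D A D D D A D A D D B B C B E B A D B A B C E D D D D A B B B B "
       "D B C C B A E A B E A E C C B A C B B A E C B B B A D A B B A B C C D "
       "A B E B B A B C C E E A A B D B A A C C B A C B B A A C B C A D A B D "
       "D E C E D").split()

for event, p in next_event_probability(["B", "E"], SEQ).items():
    print(f"{event} => {100 * p:.3f} %")
```

The context BE occurs 7 times in the sequence, followed 4 times by B, twice by A and once by E, which gives the percentages listed above.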

Throughout this article, we 'deconstructed' a sample given as a sound file – according to the discriminative analysis of *enkode* as events, and moreover as symbols and as sub-structures with their relationships – with a view to a 'reconstruction' according to a formal grammar defined as a *musical L-system*, or as a *Markov chain*, which could be weighted as a developmental process as seen previously in the chapter Systemic analysis.

Here is one way to experiment with a *Markov* chain according to the transition probability matrix of the function `next-event-probability`:

- let *S* be the initial sequence according to the alphabet *e* = {*a, b, c, …*} such as *e* ⊂ *S*
- let *P*(*e*) be the probability of an occurrence *e*_{n} in *S*
- let *w* be the sub-sequence as the previous state and set with an initial element such as *w* = *P*(*e*) or *w* = *e*_{n}; then the next event is *P*(*e*|*w*)
- if *P*(*e*|*w*) does not exist or if *P*(*e*_{n}|*w*) = 1^{*} with |*w*| > 1, then minimize *i* ∈ ]0, |*w*[*i*]| = 1] such as ∃ *i* ∈ ℕ: *P*(*e*|*w*[*i*]) ∈ *e*, knowing that *w*[*i*] is the position of and from the beginning of the reduced *w* as a tail sub-sequence, or in other words as a suffix.

^{*} In this case and from this event, we have to solve the *max order problem*^{4)}. Indeed, the generated sequence would be a strict copy of the initial sequence and would not allow any variation of the latter. This behavior is obviously not interesting in this context.
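The steps above can be sketched in Python as a variable-order chain that falls back on a shorter suffix of the previous state. This is one possible reading and not the N3 implementation: the names `continuations`, `reduce_context` and `generate`, the `order` parameter and the toy sequence are all hypothetical.

```python
import random
from collections import Counter

def continuations(seq, context):
    """Count the events that directly follow `context` in the list `seq`."""
    k = len(context)
    return Counter(seq[i + k] for i in range(len(seq) - k)
                   if seq[i:i + k] == context)

def reduce_context(seq, context):
    """Drop elements from the head of `context` until it is usable."""
    w = list(context)
    while w:
        nxt = continuations(seq, w)
        # Reject unseen contexts, and contexts longer than one element
        # whose continuation is certain (P = 1).
        if nxt and not (len(nxt) == 1 and len(w) > 1):
            return w, nxt
        w = w[1:]  # keep the tail (suffix) only
    return w, Counter(seq)  # empty context: fall back on P(e)

def generate(seq, length, order=3, seed=None):
    """Generate `length` events, using at most `order` previous events."""
    rng = random.Random(seed)
    out = [rng.choice(seq)]
    while len(out) < length:
        _, nxt = reduce_context(seq, out[-order:])
        events = list(nxt)
        out.append(rng.choices(events, weights=[nxt[e] for e in events])[0])
    return out

seq = list("ABABAC")
print(reduce_context(seq, ["A", "B"])[0])
print("".join(generate(seq, 20, order=3, seed=1)))
```

In the toy sequence, the context AB is always followed by A, so it is reduced to its suffix B before the next event is drawn; this is what prevents the output from being a plain copy of the input.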

Also, whether this work is done on a portion of a piece of music or on the whole musical work, this analytic process is carried out *a posteriori* and concerns structural relationships, according to the principle of immanence. In other words, it operates 'out time', a bit like a background process in an artificial intelligence context – in reference, for instance, to brain activity during sleep.

The '*a posteriori*' structural analysis naturally differs from the analysis that could be carried out in real time. During the temporal flux, several factors interfere:

- The marker delimiting two sub-sequences, which can evolve over time and differ for each discrimination (incoming information can change or consolidate the probabilities already acquired);
- The concentration and the type of focus – which can vary – during listening;
- The background of the subject, notably his/her musical education and his/her own experience of the sound phenomenon.

There are probably more factors, but the point here is not to list them exhaustively. Instead, the aim is to illustrate the elusive character of an 'objective' analysis. In practice, this consists in minimizing these factors in order to reach a formalism proposing a convincing modeling. This can be done with repeated listenings of the work allowing a holistic analysis – at least for the first two factors above –, or with an algorithmic analysis or a synthesis of different types of analysis, but *a posteriori*. In both cases, this takes time; and the result remains dependent on the axioms (or prerequisites) – i.e. the formalization step – and the teleological object – i.e. the modeling step.

#!/bin/bash
# $1 = soundfile
# $2 = textgrid
# $3 = duration
name=`basename "$1" | cut -d. -f1`
dur=$3
sox $1 1.wav trim 0 $dur
dsr=`soxi -s 1.wav`
# convert sound file to data text
nc=`soxi -c 1.wav`
if [ "$nc" -eq 1 ]
then
    sf=`sox 1.wav 1.dat`
elif [ "$nc" -eq 2 ]
then
    sox 1.wav 2.wav remix 1,2
    sf=`sox 2.wav 1.dat`
else
    echo "Accept only mono (1 channel) or stereo (2 channels) sound file."
fi
> 2.dat
# the number of bin sample divided by n allows to reduce the number of data
n=10
tail -n +3 1.dat > 3.dat
value=0
while read line
do
    if [ $(( $value % $n )) -eq 0 ] ; then
        echo -e "$line" | xargs >> 2.dat
    fi
    let value=value+1
done < 3.dat
# write timing segmentation
l=`cat $2 |wc -l`
ll=`expr $l - 11`
tail -n $ll $2 > 4.dat
awk 'NR == 1 || NR % 3 == 0' 4.dat > 5.dat
while read p; do
    if [ 1 -eq "$(echo "${p} < ${dur}" | bc)" ]
    then
        echo `awk "BEGIN {printf \"%.3f\n\", $p}"` >> 4.dat
    fi
done < 5.dat
# write gnuplot file
echo "set terminal png size 1200,300" > 1.pl
echo "set output '$name.png'" >> 1.pl
echo "unset border;unset xtics;unset ytics" >> 1.pl
echo "plot \"4.dat\" every ::0::$dsr using 1:(\$1 <=$dur ? 2 : 0) title '' with impulses lc rgb \"#DDDDDD\", \"2.dat\" every ::0::$dsr using 1:(\$2+1) with lines lc rgbcolor \"#a0a0b0\" title \"\"" >> 1.pl
gnuplot 1.pl
rm 1.pl 1.dat 2.dat 3.dat 4.dat 5.dat 1.wav 2.wav

article_5.txt · Last modified: 2018/10/30 10:38 (external edit)