article_6

[ 2018 ]

by *Yann Ics*

http://experimental.mus-ics.net

Cite this article: Yann Ics, *Analytical modeling*, 2018. Date retrieved: 20/08/18. Permanent link: http://experimental.mus-ics.net/wiki/doku.php?id=article_6

This article proposes one approach to automatic analysis as modeling. Some steps are inevitably empirical, and the modeling is mostly a matter of optimising computation in order to resolve combinatorial issues.

To illustrate this analytical modeling, it is applied to an extract of an interpretation of a *padjembel* Gwoka rhythm.

This work sits on the margins of the project *Neuromuse3* and was inspired by the research report *Morphologie* developed by Frédéric Voisin and Jacopo Baboni Schilingi in 1999.

The aim of the encoding is to normalise data from a sound file using the analytic tools of the software Praat, driven by the command-line tool *enkode*. The result is a multidimensional array of 5 dimensions, which contains the duration, the loudness, the relative pitch as centroid, the brightness and the salience of bass frequencies as filtered loudness.

$ enkode -n +textgrid padj.mp3 > padj.dat

*Illustration 1*: Waveform of the first 5 seconds of the sample with its associated segmentation according to the *textgrid* generated by the previous command *enkode*. [ For the record, this illustration has been generated with the bash script in the annex. ]

The hierarchical clustering of the previous multidimensional data (called CAH) is built inside the artificial neural network *neuromuse3* context, according to the position of the neurons as events inside a 5-dimensional *Euclidean* space.

    CL-USER> (require 'N3)
    CL-USER> (in-package :N3)
    N3> (create-mlt 'padj 10 10 :carte (list 'data-map (read-file "padj.dat")))
    ;; note that the number of neurons is ignored because it is set with
    ;; (length (remove-duplicates (cdr data)))
    ;; also the number of inputs is ignored because the computation is done
    ;; according to the coordinates of the neurons
    N3> (dendrogram padj 3 :with-data t)
    ;; the second argument is the aggregation type (Ward's method in this case)
    -196.01764+

The function `dendrogram` generates a data file with the number of nodes according to the trimming distance, associated with the minimum distance of the parent node and the sum of the intra-class inertia of the children nodes of the parent node.

Now, the idea is to get the optimum number of classes according to the distance from the parent node and the intra-class inertia. There are no rules, so the choice is empirical and depends on the degree of analytical accuracy required. All that needs to be known is that the distance has to be maximum and the inertia minimum.
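As a rough illustration of the inertia criterion, the following Python sketch computes the sum of the intra-class inertia of a partition, taken here as the sum of squared Euclidean distances from each point to the centroid of its class. The function name `intra_class_inertia`, the toy 2-D points and the unnormalised definition are assumptions made for the example; it is not the code of `dendrogram` in *neuromuse3*.

```python
from collections import defaultdict

def intra_class_inertia(points, labels):
    """Sum over classes of the squared distances to the class centroid."""
    classes = defaultdict(list)
    for point, label in zip(points, labels):
        classes[label].append(point)
    total = 0.0
    for members in classes.values():
        dim = len(members[0])
        centroid = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        total += sum(
            sum((p[k] - centroid[k]) ** 2 for k in range(dim)) for p in members
        )
    return total

# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0, 0), (2, 0), (10, 0), (10, 4)]
print(intra_class_inertia(points, [0, 0, 1, 1]))  # partition into two classes
print(intra_class_inertia(points, [0, 0, 0, 0]))  # everything in a single class
```

On this toy data the two-class partition yields a much lower inertia than the single class, which is the kind of trade-off against the trimming distance that the choice of the number of classes has to settle.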

Note that in the following graph, the inertia curve is scaled with the impulse segments in order to fit on the same graph.

*Illustration 2*: The curve is the sum of the intra-class inertia by trimming. The lines are the peaks of the curve of minimum distance from the parent node (the number is the number of classes at this point) and the red number is the number of classes for the minimum value of the intra-class inertia.

In this case, a segmentation into 5 classes, referred to as A, B, C, D and E, is retained as the result.

    N3> (alpha-seq padj -196.01764+ 5)
    (E E E B B E E C C B E A E B B A E C C B E B D D D D D A D A D D A C B E B D A D D D A D A D D B B C B E B A D B A B C E D D D D A B B B B D B C C B A E A B E A E C C B A C B B A E C B B B A D A B B A B C C D A B E B B A B C C E E A A B D B A A C C B A C B B A A C B C A D A B D D E C E D)

The contrastive analysis segments an array of symbols according to a marker, chosen by its number of occurrences as the smallest sub-structure. In other words, the marker is selected by its number of repetitions, favouring a short sequence, which the brain focuses on more readily as a relevant marker to memorise. This is done recursively until all symbols are different. A side effect of this algorithm is that, in case of strict equality between different occurrences of sequences, the choice depends on the sorting algorithm of the Lisp implementation, from which the first item is retained. The function `structure-s` takes as argument the symbolic sequence previously computed.

Also, when the concatenation affects only 2 adjacent items, the algorithm merges all local repetitions together with, possibly, the head of the next item, that is to say only when one item is equal to the head of the following item. For instance the sequence AB AB ABC becomes ABABABC.
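One possible reading of this merging rule can be sketched in Python as follows: an item is concatenated with the next one whenever it is equal to the head (prefix) of that next item. The function name `merge_prefix_runs` is hypothetical, and the condition that the concatenation affects only 2 adjacent items is not modelled here, so this is an illustration of the rule rather than the behaviour of `structure-s`.

```python
def merge_prefix_runs(items):
    """Concatenate each item with the next one while it is a prefix of it."""
    merged = []
    group, last = None, None
    for item in items:
        if last is not None and item.startswith(last):
            group += item  # `last` equals the head of `item`: extend the group
        else:
            if group is not None:
                merged.append(group)
            group = item
        last = item
    if group is not None:
        merged.append(group)
    return merged

print(merge_prefix_runs(["AB", "AB", "ABC"]))  # ['ABABABC']
print(merge_prefix_runs(["AB", "CD", "CDE"]))  # ['AB', 'CDCDE']
```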

    N3> (structure-s (list (alpha-seq padj -196.01764+ 5)) :result :last)
    (EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED)

The paradigmatic analysis makes it possible to observe typological variations within an object or a corpus.

There are no rules for the paradigmatic discrimination either, but an analysis by hierarchical clustering with single linkage or complete linkage as aggregation can offer some guidelines.

With this approach, we can refine the measure of proximity between 'sub-structures' according to the current musical context. The main idea is to use the Levenshtein distance algorithm together with two preliminary algorithms, defined respectively by the distance according to the local repetition, and by the distance between two bijective sequences as patterns *a* and *b* according to their decomposition into permutation cycles^{1)} (the permutation being called σ), defined by *c* = | *O*_{σ}(*x*) |, such that δ(*a*,*b*) = | *lcm*(*c*_{a}) − *lcm*(*c*_{b}) |.
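The distance δ can be sketched in Python under one simplifying assumption: each pattern is already given as a permutation of the indices 0..n-1, so that the cycle lengths are the sizes of the orbits of σ. How a symbolic pattern is mapped to such a permutation is not specified here, and the helper names (`cycle_lengths`, `lcm_of`, `cycle_distance`) are hypothetical.

```python
from math import gcd

def cycle_lengths(perm):
    """Lengths of the cycles (orbits) of a permutation of 0..n-1."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, x = 0, start
        while not seen[x]:
            seen[x] = True
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths

def lcm_of(values):
    """Least common multiple of a list of positive integers."""
    result = 1
    for v in values:
        result = result * v // gcd(result, v)
    return result

def cycle_distance(a, b):
    """delta(a, b) = | lcm(c_a) - lcm(c_b) |."""
    return abs(lcm_of(cycle_lengths(a)) - lcm_of(cycle_lengths(b)))

a = [1, 2, 0, 4, 3]  # cycles (0 1 2)(3 4): lengths 3 and 2, lcm 6
b = [1, 0, 2, 3, 4]  # cycles (0 1)(2)(3)(4): lengths 2, 1, 1, 1, lcm 2
print(cycle_distance(a, b))  # 4
```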

Let A and B be sub-sequences. The distance between A and B is computed as follows:

- Remove common local duplicate(s), such that A → A' and B → B'. Then the *'repetition distance'* is *d*_{1} = | A \ A' | + | B \ B' |.

- Remove the common pattern, such that A'' = A' \ C and B'' = B' \ C, with C = A' ∩ B'. Then the *'transposition distance'* is applied to the pattern C as *d*_{2}.

- Apply the *'Levenshtein distance'* between A'' and B'' as *d*_{3}.

Then the total distance is *d*_{1} × *w*_{1} + *d*_{2} × *w*_{2} + *d*_{3} × *w*_{3}, where the weights *w*_{i} default to 1/2, 1/2 and 1 respectively.
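The last step and the weighted sum can be sketched in Python as follows. The Levenshtein distance is the standard edit distance. The functions `levenshtein` and `total_distance` are illustrative names, and the values passed for *d*_{1} and *d*_{2} are hypothetical placeholders, since the repetition and transposition steps are not implemented in this sketch.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def total_distance(d1, d2, d3, weights=(0.5, 0.5, 1.0)):
    """Weighted sum d1*w1 + d2*w2 + d3*w3 with the default weights 1/2, 1/2, 1."""
    w1, w2, w3 = weights
    return d1 * w1 + d2 * w2 + d3 * w3

d3 = levenshtein("BEAEC", "BDDE")  # two sub-structures taken as plain strings
print(d3)                          # 3
print(total_distance(1, 2, d3))    # d1=1 and d2=2 are made-up values: 4.5
```

Changing `weights` is one way to explore the empirical setting: a larger third weight makes the edit distance dominate the proximity.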

The setting (that is to say the aggregation type, single or complete linkage, and the weight of each algorithm applied to estimate proximity) remains empirical, but the field of investigation is significantly reduced and intuitive enough to integrate this modelling into any automatic process.

*Illustration 3*: Single-linkage clustering.

*Illustration 4*: Complete-linkage clustering.

The two previous illustrations are generated from the Newick files with the online application iTOL, with the display mode set to unrooted tree.

These two Newick files are computed from the contrastive analysis as dendrograms.

N3> (dendrogram '(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED) 1|2)

Both the single and the complete linkage make it possible to define five paradigmatic fields, which are more apparent on the second tree. In any case, it is the teleological object which determines the setting, both for the number of discriminations and for the way the discrimination is done in terms of distance.

A structure is defined as a set of relations that elements maintain with one another, allowing the constitution of a coherent system. Thus, form and structure are two interrelated notions that determine the immanent or transcendent view of the system.

In the present work, some characteristics call for a morphogenetic point of view, as a dynamic system. Morphogenesis is an 'in time' analytic system for observing formal variations according to identified structural processes. Indeed, this kind of traditional musical event is not a piece of music with a determined form, but rather a music that evolves 'in time' according to the feelings and to some codification^{2)} in terms of proclivity involving each participant.

So, the current systemic analysis will focus on the relationship between adjacent sub-structures defined by the contrastive analysis, as a derivative according to the distance defined for the paradigmatic analysis, and on the process of successivity in terms of the probability of the elements constituting the sub-structures.

Based on the distance between two adjacent sequences, the derivative clustering consists in segmenting the whole sequence into parts 'in time' delimited by the mean distance.
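A minimal Python sketch of this segmentation is given below. It assumes that a cut is placed between two adjacent sequences whenever their distance exceeds the mean distance taken over all pairs of different sequences. The function `split_by_mean_distance`, the cut rule and the toy distance (difference of lengths) are assumptions for illustration. They are not the definition used by `part-s`, and the sketch needs at least two distinct sequences.

```python
from itertools import combinations

def split_by_mean_distance(items, distance):
    """Cut the list where the distance to the next item exceeds the mean distance."""
    distinct = list(dict.fromkeys(items))  # distinct items, original order kept
    pairs = list(combinations(distinct, 2))
    threshold = sum(distance(a, b) for a, b in pairs) / len(pairs)
    parts, current = [], [items[0]]
    for prev, nxt in zip(items, items[1:]):
        if distance(prev, nxt) > threshold:
            parts.append(current)  # large jump: close the current part
            current = [nxt]
        else:
            current.append(nxt)
    parts.append(current)
    return parts, threshold

def length_gap(a, b):
    """Toy distance (hypothetical): absolute difference of lengths."""
    return abs(len(a) - len(b))

parts, threshold = split_by_mean_distance(["AB", "ABC", "ABCDEFG", "ABCDEF"], length_gap)
print(threshold)  # 3.0
print(parts)      # [['AB', 'ABC'], ['ABCDEFG', 'ABCDEF']]
```

In practice the `distance` argument would be the weighted distance of the paradigmatic analysis rather than this toy function.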

*Illustration 5*: The first letter of each sequence marks the distance level, on the y axis, to the next sequence. The horizontal line is the mean distance over all combinations of two different sequences.

    ;; Thus, the initial sequence is segmented as follows:
    N3> (part-s '(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED))
    ((EEEB BEEC CBEAEB BAEC)
     (CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD)
     (BCCBA EA BEAEC CBA)
     (CBBAECB)
     (BBADA BBABC)
     (CDA BEBBABC)
     (CEEA ABD BAA CCBA CBBAA CBC ADA BDDE))

Note that the last sub-sequence CED is omitted because no distance is defined from this sequence, but it is of course implicitly associated with the preceding sequence BDDE.

In this analysis, the approach consists in evaluating the probability that an event occurs given the *n* previous events.

For instance, the probability of the events succeeding the sub-sequence BE, with the sample of this article analysed in the chapter Symbolisation, is:

    N3> (next-event-probability '(B E) (alpha-seq padj -196.01764+ 5))
    B => 57.143 %
    A => 28.571 %
    E => 14.286 %

Note that the sum of the probabilities is equal to 100 %, or very close to it, owing to rounding errors caused by computer systems^{3)}.
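The estimate can be reproduced with a few lines of Python that count which event follows each occurrence of the context in the symbolic sequence. This is a sketch written for this purpose, not the source of the Lisp function: the Python name `next_event_probability` and the use of relative frequencies are assumptions. The sequence is the one of the chapter Symbolisation, written here as the concatenation of its sub-structures.

```python
from collections import Counter

def next_event_probability(context, sequence):
    """Relative frequency of each event directly following `context`."""
    context, sequence = list(context), list(sequence)
    n = len(context)
    followers = Counter(
        sequence[i + n]
        for i in range(len(sequence) - n)
        if sequence[i:i + n] == context
    )
    total = sum(followers.values())
    return {event: count / total for event, count in followers.items()}

SEGMENTS = ("EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB "
            "BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA "
            "BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED")
sequence = SEGMENTS.replace(" ", "")  # 144 symbols

for event, p in sorted(next_event_probability("BE", sequence).items(),
                       key=lambda item: -item[1]):
    print(f"{event} => {100 * p:.3f} %")
```

The context BE occurs 7 times in the sequence, followed 4 times by B, twice by A and once by E, which gives the three percentages quoted above.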

Throughout this article, we carried out a 'deconstruction' of a sample as a sound file, according to the discriminative analysis of *enkode* as events, and moreover as symbols and as sub-structures with their relationships, with a view to a 'reconstruction' according to a formal grammar defined for instance as an *L-system* 'in time', which could be weighted as a developmental process as seen previously in the chapter Systemic analysis.

Also, whether this work is done on a portion of a piece of music or on the whole musical work, this analytic process is done *a posteriori* and concerns structural relationships, according to the principle of immanence. In other words it is 'out time', a bit like a background process in an artificial intelligence context, during sleep …

    #!/bin/bash
    # $1 = soundfile
    # $2 = textgrid
    # $3 = duration
    name=`basename "$1" | cut -d. -f1`
    dur=$3
    # keep only the first $dur seconds of the sound file
    sox "$1" 1.wav trim 0 $dur
    dsr=`soxi -s 1.wav`
    # convert sound file to data text
    nc=`soxi -c 1.wav`
    if [ "$nc" -eq 1 ]
    then
        sox 1.wav 1.dat
    elif [ "$nc" -eq 2 ]
    then
        # mix the two channels down to mono first
        sox 1.wav 2.wav remix 1,2
        sox 2.wav 1.dat
    else
        echo "Accept only mono (1 channel) or stereo (2 channels) sound file."
        exit 1
    fi
    > 2.dat
    # the number of bin sample divided by n allows to reduce the number of data
    n=10
    # skip the two header lines written by sox
    tail -n +3 1.dat > 3.dat
    value=0
    while read line
    do
        if [ $(( $value % $n )) -eq 0 ] ; then
            echo -e "$line" | xargs >> 2.dat
        fi
        let value=value+1
    done < 3.dat
    # write timing segmentation
    l=`cat "$2" | wc -l`
    ll=`expr $l - 11`
    tail -n $ll "$2" > 4.dat
    awk 'NR == 1 || NR % 3 == 0' 4.dat > 5.dat
    while read p; do
        if [ 1 -eq "$(echo "${p} < ${dur}" | bc)" ]
        then
            echo `awk "BEGIN {printf \"%.3f\n\", $p}"` >> 4.dat
        fi
    done < 5.dat
    # write gnuplot file
    echo "set terminal png size 1200,300" > 1.pl
    echo "set output '$name.png'" >> 1.pl
    echo "unset border;unset xtics;unset ytics" >> 1.pl
    echo "plot \"4.dat\" every ::0::$dsr using 1:(\$1 <=$dur ? 2 : 0) title '' with impulses lc rgb \"#DDDDDD\", \"2.dat\" every ::0::$dsr using 1:(\$2+1) with lines lc rgbcolor \"#a0a0b0\" title \"\"" >> 1.pl
    gnuplot 1.pl
    # clean up temporary files (2.wav only exists for stereo input)
    rm -f 1.pl 1.dat 2.dat 3.dat 4.dat 5.dat 1.wav 2.wav

article_6.txt · Last modified: 2018/07/22 11:53 (external edit)