article_6

[ 2018 ]

by Yann Ics

by.cmsc@gmail.com

http://experimental.mus-ics.net

Cite this article:
Yann Ics, Analytical modeling, 2018
Date retrieved: 17/10/18
Permanent link: http://experimental.mus-ics.net/wiki/doku.php?id=article_6

Introduction

This article proposes one approach to automatic analysis as modeling. Some steps are inevitably empirical, and the modeling is mostly concerned with optimizing the computation, that is, with resolving combinatorial issues.

To illustrate this analytical modeling, it is applied to an extract of a performance of a padjembel Gwoka rhythm.

This work is carried out on the margins of the Neuromuse3 project and was inspired by the research report Morphologie, written by Frédéric Voisin and Jacopo Baboni Schilingi in 1999.


Encoding – enkode analysis

The aim of the encoding is to normalize data from a sound file using the analytic tools of the software Praat, driven by the command-line tool enkode. The result is a multidimensional array of 5 dimensions, which contains the duration, the loudness, the relative pitch as the centroid, the brightness, and the salience of bass frequencies as filtered loudness.

$ enkode -n +textgrid padj.mp3 > padj.dat

Illustration 1: Waveform of the first 5 seconds of the sample with its associated segmentation, according to the textgrid generated by the previous enkode command. [ For the record, this illustration was generated with the bash script in the annex. ]


Symbolisation – hierarchical clustering

The hierarchical clustering of the previous multidimensional data, called CAH (from the French classification ascendante hiérarchique), is built within the neuromuse3 artificial neural network, according to the position of the neurons as events inside a 5-dimensional Euclidean space.

CL-USER> (require 'N3)
CL-USER> (in-package :N3)
N3> (create-mlt 'padj 10 10 :carte (list 'data-map (read-file "padj.dat")))
;; note that the number of neurons is ignored because it is set to (length (remove-duplicates (cdr data)))
;; the number of inputs is also ignored because the computation relies on the coordinates of the neurons
N3> (dendrogram padj 3 :with-data t)
;; the second argument is the aggregation type (Ward’s method in this case)
-196.01764+

The function dendrogram generates a data file that lists, for each trimming distance, the number of nodes together with the minimum distance of the parent node and the sum of the intra-class inertia of its children nodes.

Now, the idea is to find the optimum number of classes according to the distance from the parent node and the intra-class inertia. There is no fixed rule, so the choice is empirical and depends on the degree of accuracy required by the analysis. All that needs to be known is that the distance has to be maximal and the inertia minimal.
Note that in the following graph, the inertia curve is scaled to the impulse segments so that both fit on the same graph.

Illustration 2: The curve is the sum of the intra-class inertia by trimming. The lines are the peaks of the curve of minimum distance from the parent node (the number is the number of classes at this point) and the red number is the number of classes for the minimum value of the intra-class inertia.

In this case, a segmentation into 5 classes, referred to as A, B, C, D, and E, is retained as the result.
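As a rough illustration of the clustering and of the inertia criterion discussed above, here is a minimal Python sketch of agglomerative clustering with Ward's criterion, followed by the sum of the intra-class inertia for a given number of classes. It is not the neuromuse3 implementation: the function names (`ward_clusters`, `total_inertia`) and the toy 5-dimensional points are assumptions made for the example.

```python
from itertools import combinations


def centroid(points):
    """Mean point of a non-empty list of equal-length coordinate tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def inertia(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_cost(a, b):
    """Increase of the total intra-class inertia if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward_clusters(points, k):
    """Naive agglomerative clustering: merge the cheapest pair until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
        clusters[i] = merged
    return clusters


def total_inertia(clusters):
    """Sum of the intra-class inertia over all clusters."""
    return sum(inertia(c) for c in clusters)


# Toy data (hypothetical): five events in a 5-dimensional space.
points = [
    (0.0, 0, 0, 0, 0), (0.1, 0, 0, 0, 0),
    (5.0, 5, 5, 5, 5), (5.1, 5, 5, 5, 5),
    (9.0, 0, 9, 0, 9),
]
for k in (1, 2, 3):
    print(k, round(total_inertia(ward_clusters(points, k)), 3))
```

Printing the total inertia for several values of k gives the kind of curve shown in Illustration 2, from which the number of classes is then chosen empirically.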

N3> (alpha-seq padj -196.01764+ 5)
(E E E B B E E C C B E A E B B A E C C B E B D D D D D A D A D D A C B E B D A
 D D D A D A D D B B C B E B A D B A B C E D D D D A B B B B D B C C B A E A B
 E A E C C B A C B B A E C B B B A D A B B A B C C D A B E B B A B C C E E A A
 B D B A A C C B A C B B A A C B C A D A B D D E C E D)

Contrastive analysis – segmentation by marker

The contrastive analysis consists in segmenting an array of symbols according to a marker. The marker is the smallest sub-structure with the highest number of occurrences; in other words, it is selected for its number of repetitions and for its shortness, since a short repeated sequence is more readily picked up by the brain as a relevant marker to memorize. This is done recursively until all symbols are different. A side effect of this algorithm is that, in case of strict equality between different occurrences of sequences, the choice depends on the sorting algorithm of the Lisp implementation, from which the first item is retained. The function structure-s takes as argument the symbolic sequence previously computed.
Also, when the concatenation affects only 2 adjacent items, the algorithm merges all local repetitions with the possible head of the next item, that is to say only when one item is equal to the head of the following item. For instance, the sequence AB AB ABC becomes ABABABC.
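The merging rule of the last paragraph can be sketched in Python as follows. This is only one reading of the rule, not the code of structure-s: the function name `merge_local_repetitions` is hypothetical, and the sketch assumes that the head of the next item is absorbed only after at least one repetition has been found.

```python
def merge_local_repetitions(items):
    """Merge runs of identical adjacent items, plus a following item
    that starts with the repeated item (its 'head')."""
    merged = []
    i = 0
    while i < len(items):
        current = items[i]
        j = i + 1
        # Absorb the immediate repetitions of the current item.
        while j < len(items) and items[j] == current:
            j += 1
        run = current * (j - i)
        # Assumption: the head rule applies only after a repetition.
        if j - i > 1 and j < len(items) and items[j].startswith(current):
            run += items[j]
            j += 1
        merged.append(run)
        i = j
    return merged


print(merge_local_repetitions(["AB", "AB", "ABC", "CD"]))  # ['ABABABC', 'CD']
```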

N3> (structure-s (list (alpha-seq padj -196.01764+ 5)) :result :last) 
(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED
 DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA
 CBBAA CBC ADA BDDE CED)         

Paradigmatic analysis – unrooted tree

The paradigmatic analysis makes it possible to observe typological variations within an object or a corpus.
There is no fixed rule for paradigmatic discrimination either, but an analysis by hierarchical clustering with single linkage or complete linkage as aggregation can offer some guidelines.

With this approach, we can refine the proximity between 'sub-structures' according to the current musical context. The main idea is to use the Levenshtein distance, preceded by two preliminary algorithms: the distance according to local repetition, and the distance between two bijective sequences, taken as patterns a and b, according to their decomposition into permutation cycles (see note 1). For a permutation σ, each cycle has length c = |O_σ(x)|, the size of the orbit of x under σ, and the distance is δ(a, b) = |lcm(c_a) - lcm(c_b)|, where c_a and c_b denote the cycle lengths of a and b.
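To make the cycle-based distance δ concrete, here is a small Python sketch. It assumes that each pattern has already been encoded as a permutation of 0..n-1 (how a symbolic pattern is mapped to a permutation is not specified here), and the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def cycle_lengths(perm):
    """Lengths of the cycles of a permutation given as a list,
    where perm[x] is the image of x."""
    seen = set()
    lengths = []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:  # follow the orbit of start
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths


def lcm_all(values):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)


def cycle_distance(a, b):
    """delta(a, b) = |lcm(c_a) - lcm(c_b)|."""
    return abs(lcm_all(cycle_lengths(a)) - lcm_all(cycle_lengths(b)))


# [1, 0, 2] has cycles of length 2 and 1; [1, 2, 0] is a single 3-cycle.
print(cycle_distance([1, 0, 2], [1, 2, 0]))  # 1
```

The least common multiple of the cycle lengths is the order of the permutation, so δ compares how many applications each pattern needs before it returns to its starting arrangement.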

Let A and B be sub-sequences; the distance between A and B is computed as follows:

  1. Remove the common local duplicate(s), such that A → A' and B → B'
    Then the 'repetition distance' is d1 = | A \ A' | + | B \ B' |
  2. Remove the common pattern C = A' ∩ B', such that A'' = A' \ C and B'' = B' \ C
    Then the 'transposition distance' is applied to the pattern C as d2
  3. Apply the 'Levenshtein distance' between A'' and B'' as d3

The total distance is then d1 × w1 + d2 × w2 + d3 × w3, where the weights w1, w2 and w3 default to 1/2, 1/2 and 1 respectively.
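The weighted sum can be sketched in Python as below. This is a simplified reading, not the N3 code: local duplicates are removed by collapsing every run of identical adjacent symbols (an assumption about step 1), step 2 is not implemented, so d2 is passed in as a plain number and the Levenshtein distance is applied directly to A' and B'. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insertion, deletion, substitution)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


def collapse_runs(seq):
    """Keep one symbol per run of identical adjacent symbols."""
    out = []
    for s in seq:
        if not out or out[-1] != s:
            out.append(s)
    return "".join(out)


def total_distance(a, b, d2=0.0, weights=(0.5, 0.5, 1.0)):
    """d1*w1 + d2*w2 + d3*w3, with d2 supplied by the caller."""
    a1, b1 = collapse_runs(a), collapse_runs(b)
    d1 = (len(a) - len(a1)) + (len(b) - len(b1))  # removed duplicates
    d3 = levenshtein(a1, b1)
    w1, w2, w3 = weights
    return d1 * w1 + d2 * w2 + d3 * w3


print(total_distance("EEEB", "BEEC"))
```

Changing the `weights` tuple is the lever mentioned in the text: it shifts the balance between repetition, transposition, and edit distance.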

The setting, that is to say the aggregation type (single or complete linkage) and the weight of each algorithm applied to estimate proximity, remains empirical, but the field of investigation is significantly reduced and intuitive enough to integrate this modeling into an automatic process.

Illustration 3: Single-linkage clustering.

Illustration 4: Complete-linkage clustering.

The two previous illustrations were generated with the online application iTOL, with the display mode set to unrooted tree, from their respective Newick files computed from the contrastive analysis as a dendrogram.

N3> (dendrogram '(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED) 1|2)        

Both the single and the complete linkages make it possible to define five paradigmatic fields, which are more apparent on the second tree. In any case, it is the teleological object that determines the setting, both for the number of discriminations and for the way the discrimination is done in terms of distance.


Systemic analysis – clustering proximity

The systemic analysis addresses the set of relations that hold elements together and allow the constitution of a coherent system. Thus, form and structure are two interrelated notions that determine the immanent or transcendent view of the system.

In the current work, some characteristics call for a morphogenetic point of view, that of a dynamic system. Morphogenesis is an 'in time' analytic system for observing formal variations according to identified structural processes. Indeed, this kind of traditional musical event is not a piece of music with a determined form, but rather a music that evolves 'in time' according to the feelings and to some codification (see note 2) in terms of the proclivity of each participant.

So, the current systemic analysis focuses on two aspects: the relationship between adjacent sub-structures defined by the contrastive analysis, treated as a derivative according to the distance defined for the paradigmatic analysis; and the process of succession, in terms of the probability of the elements constituting the sub-structures.

Derivative clustering

According to the distance between two adjacent sequences, the derivative clustering consists in segmenting the whole sequence into parts 'in time' delimited by the mean distance.

Illustration 5: The first letter of each sequence marks the distance level, on the y axis, to the next sequence. The horizontal line is the mean distance over all combinations of two different sequences.

;; Thus, the initial sequence is segmented as follows:
N3> (part-s '(EEEB BEEC CBEAEB BAEC CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD BCCBA EA BEAEC CBA CBBAECB BBADA BBABC CDA BEBBABC CEEA ABD BAA CCBA CBBAA CBC ADA BDDE CED))
((EEEB BEEC CBEAEB BAEC)
 (CBEBDD DDDADADDA CBEBDADDDA DADDB BCBEBADBA BCED DDDABBBBD)
 (BCCBA EA BEAEC CBA) (CBBAECB) (BBADA BBABC) (CDA BEBBABC)
 (CEEA ABD BAA CCBA CBBAA CBC ADA BDDE))

Note that the last sub-sequence CED is omitted because no distance is defined from this sequence; it is of course implicitly associated, as a distance, with the preceding sequence BDDE.
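The principle of the derivative clustering can be sketched in Python as follows. This is not the part-s function: the name `part_by_mean` is hypothetical, the cut is assumed to happen when the distance to the next sequence is strictly above the mean, the last sequence is kept in the final part rather than omitted, and the placeholder distance `length_gap` stands in for the paradigmatic distance only to keep the example short.

```python
from itertools import combinations


def part_by_mean(seqs, dist):
    """Split seqs after every item whose distance to the next item
    exceeds the mean distance over all pairs of different sequences."""
    unique = list(dict.fromkeys(seqs))  # different sequences, order kept
    pairs = list(combinations(unique, 2))
    mean = sum(dist(a, b) for a, b in pairs) / len(pairs)
    parts, current = [], [seqs[0]]
    for left, right in zip(seqs, seqs[1:]):
        if dist(left, right) > mean:  # assumption: strict comparison
            parts.append(current)
            current = []
        current.append(right)
    parts.append(current)
    return parts


def length_gap(a, b):
    """Placeholder distance (hypothetical): difference of lengths."""
    return abs(len(a) - len(b))


print(part_by_mean(["EEEB", "BEEC", "CBEBDADDDA", "DDDADADDA", "EA"], length_gap))
```

Any distance function with the same signature can be passed as `dist`, which is where the weighted paradigmatic distance would be plugged in.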

Developmental process

In this analysis, the approach consists in evaluating the probability that an event occurs given the n previous events.
For instance, for the sample of this article analyzed in the chapter Symbolisation, the probabilities of the events following the sub-sequence BE are:

N3> (next-event-probability '(B E) (alpha-seq padj -196.01764+ 5))
B => 57.143 %
A => 28.571 %
E => 14.286 %

Note that the sum of the probabilities is equal to 100 %, or very close to it because of rounding errors caused by computer systems (see note 3).
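The counting behind these probabilities can be sketched in a few lines of Python. The function `next_event_distribution` is a hypothetical stand-in, not the N3 function, and the example uses only the first 39 symbols of the sequence computed in the chapter Symbolisation, so its figures differ from those shown above.

```python
from collections import Counter


def next_event_distribution(context, seq):
    """Relative frequency of each event that directly follows
    an occurrence of context in seq."""
    context = list(context)
    n = len(context)
    counts = Counter(
        seq[i + n]
        for i in range(len(seq) - n)
        if list(seq[i:i + n]) == context
    )
    total = sum(counts.values())
    return {event: count / total for event, count in counts.most_common()}


# First 39 symbols of the symbolic sequence.
seq = "EEEBBEECCBEAEBBAECCBEBDDDDDADADDACBEBDA"
for event, p in next_event_distribution("BE", seq).items():
    print(f"{event} => {100 * p:.3f} %")
```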


Resolution

In this article, we proceeded to a 'deconstruction' of a sample as a sound file – according to the discriminative analysis of enkode as events, and moreover as symbols and as sub-structures and their relationships – with a view to a 'reconstruction' according to a formal grammar defined as a musical L-system or as a Markov chain, which could be weighted by a developmental process as seen previously in the chapter Systemic analysis.

Here is one way to experiment with a Markov chain according to the transition probability matrix of the function next-event-probability:

  • let S be the initial sequence over the alphabet e = {a, b, c, …}
  • let P(e) be the probability of an occurrence e_n in S
  • let w be the sub-sequence taken as the previous state, initialized with a single element, either drawn according to P(e) or set to a given e_n; the next event is then drawn according to P(e|w)
  • if P(e|w) does not exist, or if P(e_n|w) = 1* with |w| > 1, then reduce w to its suffix w[i], choosing the smallest position i (counted from the beginning of w) for which P(e|w[i]) exists; the suffix can be reduced down to a single element, |w[i]| = 1.

    * In this case and from this event on, we have to solve the max order problem (see note 4). Otherwise, the generated sequence would be a strict copy of the initial sequence and would not allow any variation of the latter. This behavior is obviously not interesting in this context.
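The procedure above can be sketched in Python as a Markov generator that shortens its context to a suffix when needed. This is a hedged illustration rather than the author's implementation: the names `successors`, `backoff` and `generate`, the `max_order` parameter, and the choice to fall back to the global distribution P(e) when even a single-element context has no continuation are all assumptions.

```python
import random
from collections import Counter


def successors(seq, context):
    """Count the events that directly follow context in seq.
    With an empty context this is the global count behind P(e)."""
    n = len(context)
    return Counter(
        seq[i + n]
        for i in range(len(seq) - n)
        if tuple(seq[i:i + n]) == tuple(context)
    )


def backoff(seq, context):
    """Shorten context from the left until P(e|w) exists and is not a
    forced continuation (probability 1) of a context longer than 1."""
    context = tuple(context)
    while True:
        counts = successors(seq, context)
        copying = len(counts) == 1 and len(context) > 1
        if (counts and not copying) or not context:
            return context, counts
        context = context[1:]  # keep the suffix


def generate(seq, length, max_order=3, seed=None):
    """Generate `length` events; max_order (hypothetical) bounds |w|."""
    rng = random.Random(seed)
    _, start = backoff(seq, ())
    out = rng.choices(list(start), weights=list(start.values()))
    while len(out) < length:
        _, counts = backoff(seq, out[-max_order:])
        out += rng.choices(list(counts), weights=list(counts.values()))
    return out


source = list("EEEBBEECCBEAEBBAECCBEBDDDDDADADDACBEBDA")
print("".join(generate(source, 16, seed=0)))
```

Shortening the context whenever the continuation is forced is what prevents the generator from replaying the source verbatim, at the cost of a weaker memory of the preceding events.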

Also, whether this work is done on a portion of a piece of music or on the whole musical work, this analytic process is done a posteriori and is more about structural relationships, according to the principle of immanence. In other words, it works 'out of time', a bit like a background process in an artificial intelligence context, in reference to brain activity during sleep for instance.


Discussion

The 'a posteriori' structural analysis is naturally different from the analysis that could be done in real time. During the temporal flux, several factors interfere:

  • the marker delimiting two sub-sequences, which can evolve over time and be different for each discrimination (the incoming information can change or consolidate the probabilities already acquired);
  • the concentration and the type of focus, which can be versatile, during listening;
  • and the background of the subject, notably his/her musical education and his/her own experience of the sound phenomenon.

There are probably more factors, but the point here is not to list them exhaustively. Instead, the aim is to illustrate the elusive character of an 'objective' analysis. In practice, the task consists in minimizing these factors in order to reach a formalism proposing a convincing modeling. This can be done with repeated listenings of the work, allowing a holistic analysis (at least for the first two factors), or with an algorithmic analysis or a synthesis of different types of analysis, but a posteriori. In both cases, this takes time, and the result remains dependent on the axioms (or prerequisites), i.e. the formalization step, and on the teleological object, i.e. the modeling step.


Source code

Annex

#!/bin/bash
# $1 = soundfile
# $2 = textgrid
# $3 = duration
name=$(basename "$1" | cut -d. -f1)
dur=$3
# keep only the first $dur seconds of the sound file
sox "$1" 1.wav trim 0 "$dur"
# number of samples in the trimmed file
dsr=$(soxi -s 1.wav)
 
# convert sound file to data text
nc=$(soxi -c 1.wav)
if [ "$nc" -eq 1 ]
then
sox 1.wav 1.dat
elif [ "$nc" -eq 2 ]
then
# mix both channels down to a single channel
sox 1.wav 2.wav remix 1,2
sox 2.wav 1.dat
else
echo "Accept only mono (1 channel) or stereo (2 channels) sound file." >&2
rm -f 1.wav
exit 1
fi
 
> 2.dat
# keep one sample out of every n to reduce the amount of data
n=10
# skip the two header lines written by sox
tail -n +3 1.dat > 3.dat
value=0
while read -r line
do
    if [ $(( value % n )) -eq 0 ] ; then
        echo -e "$line" | xargs >> 2.dat
    fi
    value=$(( value + 1 ))
done < 3.dat
 
# write timing segmentation 
l=$(wc -l < "$2")
ll=$(( l - 11 ))
tail -n "$ll" "$2" > 4.dat
awk 'NR == 1 || NR % 3 == 0' 4.dat > 5.dat
while read -r p; do
if [ 1 -eq "$(echo "${p} < ${dur}" | bc)" ]
then
awk "BEGIN {printf \"%.3f\n\", $p}" >> 4.dat
fi
done < 5.dat
 
# write gnuplot file
echo "set terminal png size 1200,300" > 1.pl
echo "set output '$name.png'" >> 1.pl
echo "unset border;unset xtics;unset ytics" >> 1.pl
echo "plot \"4.dat\" every ::0::$dsr using 1:(\$1 <=$dur ? 2 : 0) title '' with impulses lc rgb \"#DDDDDD\", \"2.dat\" every ::0::$dsr using 1:(\$2+1) with lines lc rgbcolor \"#a0a0b0\" title \"\"" >> 1.pl
gnuplot 1.pl
# 2.wav only exists for stereo input, hence -f
rm -f 1.pl 1.dat 2.dat 3.dat 4.dat 5.dat 1.wav 2.wav
1) Marc Deléglise, Permutations et cycles, Université Lyon 1, Capes Externe Math, 2010–2011.
2) Yann Ics, Les rythmes du Gwoka, Article documentaire, 2012–2015.
3) Numerical Computation Guide, What Every Computer Scientist Should Know About Floating-Point Arithmetic, July 2001, pp. 171 to 264.
4) Alexandre Papadopoulos, Pierre Roy, François Pachet. Avoiding Plagiarism in Markov Sequence Generation, UPMC Paris 06, 2014.
article_6.txt · Last modified: 2018/10/13 19:31 (external edit)