Automatic Induction of FrameNet lexical units in Italian

In this paper we investigate the applicability of automatic methods for frame induction to improve the coverage of IFrameNet, a novel lexical resource based on Frame Semantics for Italian. The experimental evaluations show that the adopted methods, based on neural word embeddings, pave the way for the assisted development of a large-scale lexical resource for Italian.


Introduction
When dealing with large-scale lexical resources, such as FrameNet (Baker et al., 1998), PropBank (Palmer et al., 2005), VerbNet (Schuler, 2005) or VerbAtlas (Di Fabio et al., 2019), the semi-automatic association between predicates and lexical items (also known as Lexical Units or LUs) is crucial to improve the coverage of a resource while limiting the costs of its manual annotation. Several approaches to this semi-supervised task exist, as discussed in QasemiZadeh et al. (2019). In particular, Pennacchiotti et al. (2008) exploited distributional models of lexical meaning (Sahlgren, 2006; Croce and Previtali, 2010) to induce new LUs consistently with the Frame Semantics theory (Baker et al., 1998), representing word meanings and semantic frames through geometrical word spaces. As a result, this approach makes it possible to induce new LUs when applied to the English version of FrameNet. However, this is a quite consolidated resource with many existing LUs connected to each semantic predicate, i.e., each frame. The applicability of this method in scenarios where only one or two LUs are available for each frame is still an open issue. At the same time, since the work of Pennacchiotti et al. (2008), the application of neural approaches to the acquisition of word embeddings (Mikolov et al., 2013; Baroni et al., 2014; Ling et al., 2015) has significantly improved geometrical models of lexical semantics, both in terms of representation capability and of scalability.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In this paper we thus investigate the applicability of the method proposed in Pennacchiotti et al. (2008) to boost the coverage of a novel and still limited lexical resource based on Frame Semantics in Italian. This resource has been developed within the IFrameNet (IFN) project (Basili et al., 2017), which aims at creating a large-coverage FrameNet-like resource for Italian and at producing a complete dictionary in which every lexical entry 1 is linked to all the frames it can evoke (i.e., the frames for which it is a LU). At the moment, while the resource counts more than 7,700 lexical items associated to 1,048 frames, each lexical item is connected, on average, to only 1.3 frames, which is problematic considering the high polysemy of Italian words (Casadei, 2014).
The experimental evaluation shows that neural word embeddings enable the effective application of the distributional approach from Pennacchiotti et al. (2008) to improve the coverage of IFN. Moreover, the adopted distributional framework allowed us to develop a graphical semantic browser to support annotators in assigning new LUs to frames. This study paves the way to the semi-automatic development of IFN and investigates the applicability of neural word embeddings to the incremental semi-automatic LU induction process.

Related Work
In the development of FrameNet and FrameNet-like resources for new languages, one important task is the creation of a large-scale dictionary, in order to guarantee an effective application in semantic analyses or NLP tasks. In fact, the limited coverage of FrameNet has been identified as one of the main reasons of failures (Pennacchiotti et al., 2008; Pavlick et al., 2015). For these reasons, and given the high costs of manual annotation, both in terms of time and resources (i.e., human annotators), the automatic (or semi-automatic) expansion of the dictionary for FrameNet and FrameNet-like resources has received attention over the years. Several methods to populate frames with new Lexical Units, both in FrameNet (Baker et al., 2007; Pavlick et al., 2015; Ustalov et al., 2018; QasemiZadeh et al., 2019; Anwar et al., 2019; Yong and Torrent, 2020) and in FrameNet-like resources (Johansson and Nugues, 2007; Tonelli, 2010; Johansson, 2014; Hayoun and Elhadad, 2016), have been widely investigated. Some of the methodologies proposed to automatically expand FrameNet exploit the alignment between WordNet and FrameNet data (Johansson and Nugues, 2007; Pennacchiotti et al., 2008; Ferrández et al., 2010). Another strategy is the one adopted by Pavlick et al. (2015), who enlarge FrameNet coverage through automatic paraphrasing. The majority of the works dealing with automatic frame induction, however, exploit distributional methods: this is the case of the work on which this research relies the most, i.e., Pennacchiotti et al. (2008), as well as of more recent works such as Ustalov et al. (2018) and Yong and Torrent (2020). Ustalov et al. (2018), for example, model frame induction as a tri-clustering problem and use dependency triples automatically extracted from a Web-scale corpus. Yong and Torrent (2020) propose to combine dense representations from the hidden layers of a masked language model with sparse representations based on substitutes for the target word in context.

IFrameNet status
The IFrameNet project (Basili et al., 2017) relied, as a starting point, on the achievements of previous research on the development of Italian resources annotated according to Frame Semantics (De Cao et al., 2010), i.e., a set of automatically induced LUs covering 554 of the 1,224 frames in FrameNet.
Since the beginning, our main objective has been to improve the coverage of the resource in terms of annotated frames, increasing the number of LUs and the number of annotated sentences representing each predicate. Starting from the results achieved in 2017, we enlarged the dictionary and provided an initial set of LUs for those frames without any annotation. We also revised the whole dictionary and expunged the LUs whose lemma had low frequency 2 in CORIS (Corpus di Italiano Scritto) (Rossini Favretti et al., 2002). Since CORIS is a large-scale and general-purpose Italian corpus (without biases towards any domain), we speculate that LUs not represented in it can hardly characterize a frame in Italian. Moreover, we worked on the frame annotation of sample sentences taken from the CORIS corpus. We relied on CORIS because it is domain independent and suitable to represent the generic notion of frames. Currently, the resource contains:
• 7,776 lexical entries, of which 1,130 adjectives, 4,309 nouns and 2,337 verbs;
• 10,379 LUs (nouns, verbs and adjectives) validated in terms of pairs of lexical entries and evoked frame(s);
• 1,048 frames with at least one LU, among which 743 frames are represented with at least one sentence. Among the 176 frames that still do not have any LU in their dictionary, 134 are marked as Non-Lexical in FrameNet, 12 do not have any LU in FrameNet but are not explicitly marked as Non-Lexical, 18 are not represented in FrameNet by any noun, verb or adjective and, finally, for just 8 frames it was difficult to find LUs in Italian (e.g. IMPROVISED EXPLOSIVE DEVICE or SHORT SELLING);
• 5,208 sentences annotated and validated with at least one LU;
• an average of 9.9 LUs assigned to each frame;
• an average of 1.3 frames associated to each LU.
Among the existing LUs, 5,960 are assigned to only one frame. Given that the Italian language is highly polysemous, it is probable that many LUs evoke more than one frame. This work aims at reducing this limitation.

Automatic Frame Induction
For frame induction we rely on distributional methods as in Pennacchiotti et al. (2008), described hereafter.
Distributional representation. As a first step, we obtain a distributional representation of the CORIS corpus and represent each LU in the word space as a vector l. We investigated three slightly different approaches for the acquisition of the word spaces: the Continuous Bag-of-Words model (CBOW), the Skip-gram model (Mikolov et al., 2013) and the Structured Skip-gram (sskip-gram) model (Ling et al., 2015). The sskip-gram is a modification of the skip-gram model that is sensitive to the position of words and, thus, more suitable for capturing their syntactic properties (Ling et al., 2015). Our hypothesis is that this last model would be more suitable for capturing the frame properties of LUs, since syntax is, in general, in agreement with semantic arguments (i.e., Frame Elements, FEs) and their order.
"Framehood" representation. As a second step, we exploit the obtained embeddings to represent the meaning of frames. We assume that a frame f can be described by the set of its LUs l ∈ F and that the LU vectors l can thus be used to acquire a distributional representation for each frame. In a nutshell, for each frame we: (i) select all the LUs of its dictionary, (ii) apply a clustering algorithm to the LU vectors l. A frame will then be represented as a set of clusters: given that each frame can have various nuances and that it can be representative of non-overlapping senses, sparse in the semantic space, we represent it through its "clusters of senses". This captures, in the semantic space, the possible "framehood" distributions, as dense regions of LUs. In this work, we applied standard K-means (Hartigan and Wong, 1979), so that each frame is represented as a set of k clusters. For each frame, k is empirically set to the square root of the number of LUs in that frame: k = √|l|, where |l| denotes the count of LUs l of that frame.
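The "framehood" step above can be sketched as follows. This is a minimal, illustrative implementation, not the original code of the paper: it assumes LU embeddings are already available as NumPy vectors, and the plain K-means routine and function names are ours.

```python
import math
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means: returns k centroids of the rows of X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each LU vector to its nearest centroid
        assign = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        # recompute centroids (keep the old one if a cluster empties)
        C = np.vstack([X[assign == j].mean(axis=0) if (assign == j).any() else C[j]
                       for j in range(k)])
    return C

def frame_clusters(lu_vectors):
    """Represent a frame as k = sqrt(|LUs|) cluster centroids of its LU vectors."""
    X = np.vstack(lu_vectors)
    k = max(1, math.isqrt(len(X)))  # integer square root of the number of LUs
    return kmeans(X, k)

# toy frame with 9 LU embeddings in a 4-dimensional space -> 3 centroids
rng = np.random.default_rng(1)
print(frame_clusters(rng.normal(size=(9, 4))).shape)
```

Each centroid then acts as the prototype of one "cluster of senses" of the frame.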
In this way, each frame f will have k clusters, depending on the number of its LUs, and the centroid of each cluster will represent the prototype for a subset of the senses of the frame.
New LU induction. Once the distributional representations for frames and LUs are obtained, the third step involves the automatic induction of frames given a candidate lexical item. For each candidate predicate word, we compute the distance between its vector and the sets of clusters representing the frames. The "nearest" clusters will be those containing a set of LUs more closely related to the input lexical item, so that the corresponding frames will be suggested as its evoking frames.
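The induction step can be sketched as ranking frames by the cosine similarity between the candidate word vector and the nearest of each frame's cluster centroids. A minimal sketch with hypothetical frame names and toy 2-dimensional centroids:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def suggest_frames(candidate, frame_centroids, b=10):
    """Rank frames by the similarity between the candidate vector
    and the nearest of each frame's cluster centroids; return the best b."""
    score = {f: max(cosine(candidate, c) for c in centroids)
             for f, centroids in frame_centroids.items()}
    return sorted(score, key=score.get, reverse=True)[:b]

# toy example: two frames with one (hypothetical) centroid each
frames = {"KILLING": [np.array([1.0, 0.1])],
          "FOOD":    [np.array([0.0, 1.0])]}
print(suggest_frames(np.array([0.9, 0.2]), frames, b=2))  # KILLING ranked first
```

Using the maximum similarity over centroids means a frame is suggested if the candidate falls close to any of its "clusters of senses".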

Experimental Evaluation
In order to assess the quality of the proposed method, we evaluate its capability of rediscovering the frames manually associated to a lexical item. We apply a leave-one-out schema: for each candidate lexical item, we remove it from the dictionary and query the model to "suggest" up to 10 frames. In practice, we rebuild the clusters and then compute the distance between the lexical item's vector and the set of clusters representing all frames. Then, we compare the suggested frames with the frames that were originally linked to the LU. As in Pennacchiotti et al. (2008), we compute Accuracy as the fraction of LUs that are correctly re-assigned to the original frame. Accuracy is computed at different levels b: a LU is correctly assigned if one of its gold-standard frames appears among the best-b frames ranked by the model. In fact, as LUs can have more than one correct frame, we deem as "correct" an assignment for which at least one of the correct frames is among the best-b.
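The best-b Accuracy described above can be computed as in the following sketch; the toy predictions and gold assignments are illustrative only.

```python
def accuracy_at_b(predictions, gold, b):
    """Fraction of LUs for which at least one gold frame appears
    among the best-b frames ranked by the model."""
    hits = sum(1 for lu, ranked in predictions.items()
               if set(ranked[:b]) & set(gold[lu]))
    return hits / len(predictions)

# toy leave-one-out outcome for two LUs
preds = {"impiccare.v": ["KILLING", "EXECUTION"],
         "agnello.n":   ["FOOD", "ANIMALS"]}
gold  = {"impiccare.v": ["EXECUTION"],
         "agnello.n":   ["FOOD"]}
print(accuracy_at_b(preds, gold, 1))  # 0.5: only "agnello.n" hits at b = 1
print(accuracy_at_b(preds, gold, 2))  # 1.0: both hit within the best 2
```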
The model is evaluated by sampling the test bed according to two dimensions, as reported in Table 1. First, we considered the Part-of-Speech (POS) of the LUs (i.e., rows in Table 1). In fact, lexical items having different POS are generally projected in different sub-spaces within word spaces. We thus evaluate the model considering separately LUs and frames containing adjectives (a), nouns (n) or verbs (v). For the sake of completeness, we also evaluated the model without any selection by POS (row a-n-v). When a frame does not contain any LU represented in the word space with a required POS, it is discarded during the evaluation: as an example, the actual dictionary contains 631 frames with at least one noun. Then, we filtered frames by applying a threshold to the number of LUs a frame should be connected to in order to be considered (columns in Table 1), as follows: first, we considered all frames containing at least one LU whose lemma occurred at least 20 times in CORIS, without applying any other restriction (column 1); then we filtered frames with at least 2 valid LUs 3 (column 2); finally, we filtered frames with at least 5 valid LUs (column 5). Both filter policies can be combined, and the stricter these policies are, the lower the number of frames considered in the evaluation. As a consequence, the Accuracy baseline of a model which randomly assigns LUs to frames depends on the number of selected frames: when no filter is applied (row a-n-v and column 1) a random assignment would achieve an Accuracy of 0.09% = 1/1,041, or 0.4% = 1/250 when only frames containing at least 5 nouns are selected. Table 2 reports the experimental results of a model derived using a sskip-gram model (Ling et al., 2015) 4.
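The random baselines quoted above follow directly from the number of candidate frames: a model suggesting b frames uniformly at random is correct with probability b divided by the number of selected frames. A one-line sketch:

```python
def random_baseline(n_frames, b=1):
    """Expected Accuracy of suggesting b frames uniformly at random
    out of n_frames candidate frames."""
    return b / n_frames

print(f"{random_baseline(1041):.2%}")  # 1/1,041: no filter applied
print(f"{random_baseline(250):.2%}")   # 1/250: frames with at least 5 nouns
```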
If we consider the performance over nouns only (n), we see that, when a reasonable threshold is set (row th = 2), in 48% of cases one of the original frames evoked by the noun under analysis is found in first position (column b − 1). If we consider the first two frames proposed by the system (b − 2), the Accuracy rises to 61%, and it keeps increasing as we consider more frames. This is impressive considering that the corresponding random baselines are 0.2% = 1/463 and 0.4% = 2/463. If we jointly consider nouns, verbs and adjectives (a-n-v), the performance is slightly lower: for example, with the same threshold th = 2 and considering only two suggested frames (b − 2), the Accuracy is 61%. It means that, on average, the model's capability of assigning LUs (ignoring their POS) to frames is slightly lower. This is confirmed by the general drop obtained when only verbs or adjectives are considered: for verbs, considering only the best suggestion (b − 1), the Accuracy ranges from 25%, if we do not apply any threshold, to 32% with th = 2, and to 42% with th = 5. This is mainly due to the higher polysemy characterizing verbs and adjectives with respect to nouns (Casadei, 2014). In any case, this result is still remarkable considering that, for verbs, the baseline in the setting th = 2 and b = 1 corresponds to 0.2% = 1/514.
Discussion. It is worth noting that our dictionary is largely incomplete, and thus some of the cases counted as "incorrect assignments" are instead frames that are evoked by the LU under analysis and that should be added to the dictionary. Moreover, we can see that many of the b − 10 frames are often related, at different degrees, with the lexical entry under analysis and with the frames for which it is a LU.
For example, when considering the lexical entry "impiccare.v" (hang.v), the model does not retrieve among the b − 10 suggestions the only "correct" frame, i.e., the frame EXECUTION. However, the closest frame identified is KILLING, which not only is linked to EXECUTION by an Inheritance relation, but also appears to be evoked by "impiccare.v". Similarly, the system is not able to re-assign the lexical entries "innalzarsi.v" (raise.v and rise.v), "innocenza.n" (innocence.n) and "radiazione.n" (radiation.n or expulsion.n). However, in the b − 10 of "innalzarsi.v" the frame CHANGE POSITION ON A SCALE appears in fourth position, and it can be evoked by "innalzarsi.v" in sentences such as "La marea si innalzava" (The tide was rising); in the b − 10 of "innocenza.n" the frame CANDIDNESS appears in first position, and it is evoked by this LU in sentences such as "Lei rispose con innocenza" (She answered genuinely). The term "radiazione.n" is present in the dictionary only with the meaning expulsion.n and it is linked only to EXCLUDE MEMBER. Nevertheless, the system proposes the frame NUCLEAR PROCESS in first position, thus retrieving one correct meaning of the LU, i.e., "radiation.n". For "alleato.a" (ally.n, also shown in Figure 1) the system proposes a "correct" frame in ninth position; however, we find in second position the frame MEMBER OF MILITARY, which can plausibly be evoked. Moreover, the LU "agnello.n" (lamb.n) evokes in the dictionary only the frame FOOD; yet, as correctly suggested by the system, it is also a LU of the frame ANIMALS. For "agnello.n" the system also proposes, in sixth position, PEOPLE BY MORALITY, which recalls the idea of innocence and righteousness that represents (at least for the Italian language) a metaphorical extension of the meaning of "lamb.n", strongly influenced by the religious image of the lamb.
Figure 1: An example of the IFrameNet Navigator for the LU alleato.a
In some other cases, the system suggests relations between frames. For example, if we consider the lexical entry "identico.a" (identical.a, from IDENTICALITY), we see that among the best-10 frames the system proposes frames such as SIMILARITY (first position) or DIVERSITY (seventh position). If we look at the frame-to-frame relations in FrameNet, we see that IDENTICALITY and SIMILARITY, or IDENTICALITY and DIVERSITY, are not directly connected even if they appear, on closer analysis, strictly related.

IFrameNet Navigator
In order to make the model valuable for the annotators, we also developed a Graphical User Interface, called IFrameNet Navigator. It allows querying and navigating the geometrical representation of semantic phenomena, as it displays, for each lexical entry in the dictionary, the best-10 frames. These can also be selected to browse the set of LUs assigned to the cluster underlying the frame, as shown in Figure 1. Finally, each LU can be selected to browse the list of corresponding annotated sentences.
The objectives of the Navigator are: (i) to support the analysis of the currently modeled lexical entries (and the corresponding LUs); (ii) to support the validation of the current sentence classification; (iii) to support the mining of the CORIS corpus for improving the semantic coverage of the resource for the Italian language; (iv) in perspective, to offer support towards crowdsourcing. This tool will be publicly released to trigger collaborative validation and annotation as an extension of the IFrameNet and CORIS resources.

Conclusions and Research Perspectives
In this work, we presented the current state of the IFrameNet project, which aims at developing a large-scale lexical resource based on Frame Semantics in Italian. Moreover, we investigated the applicability of a method for the automatic induction of FrameNet Lexical Units to improve the coverage of the current resource, in terms of the number of frames assigned to the almost 8,000 existing lexical entries.
With respect to previous work, i.e., Pennacchiotti et al. (2008), we empirically demonstrate the beneficial impact of neural word embeddings in the overall workflow for Italian. The robustness of the adopted model is confirmed even when it is applied to a resource with a limited average number of frames associated to each Lexical Unit. The experimental evaluations showed, in many cases, the valuable support of the method in discovering new Lexical Units by suggesting novel evoked frames. Moreover, the error analysis suggested that most of the "discarded" frames still hold various kinds of relationships, as defined in FrameNet, with the "correct" ones, such as Inheritance or Usage. In some cases, it also highlighted metaphorical meanings that the lexical entries can assume.
As future work, we will certainly exploit the IFrameNet Navigator to extend the current Italian LU dictionary, support the annotation of novel sentences and introduce frame-to-frame relations in Italian. Another path that might be worth investigating is the exploitation of dependency-based word embeddings for the distributional representation of LUs and frames. This may be beneficial, since dependency-based contexts highlight more functional similarities (Levy and Goldberg, 2014). Finally, we plan to use the derived frame distributions to augment existing contextualized embeddings in support of Frame Induction (Sikos and Padó, 2019) or Semantic Role Labeling (Shi and Lin, 2019) tasks.