(GSoC) Week-14 Recent Updates

Hello. This blog is about a few updates I made in recent weeks. I'm also excited to share that I will be giving a lightning talk about my project at the Google Summer of Code Virtual Contributor Lightning Talks 2023. It's going to be a short talk (3 minutes max) summarizing my project, and I am really looking forward to it!

In this blog, I will cover the recent updates: adding coreference resolution, some small speedups we got in the code, and so on.

Background

In the last blog, I gave details about how much time each component in our pipeline takes to process each triple. Based on that, I tried to make things a little faster, and we have achieved some speedup: processing a sentence containing one triple (where "processing" means doing entity linking and relation mapping) used to take around 5 seconds and now takes around 1.8 seconds.
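To make the per-component timing and the overall speedup concrete, here is a minimal sketch of how one might measure them. It assumes each pipeline stage can be wrapped as a callable taking the raw sentence (a simplification, since in the real pipeline relation mapping consumes the entity linker's output), and the component names and stand-in lambdas are hypothetical placeholders, not the project's actual code.

```python
import time
from typing import Callable, Dict


def profile_components(components: Dict[str, Callable[[str], object]],
                       sentence: str) -> Dict[str, float]:
    """Run each component on the sentence and record its wall-clock time in seconds."""
    timings = {}
    for name, component in components.items():
        start = time.perf_counter()
        component(sentence)
        timings[name] = time.perf_counter() - start
    return timings


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new version is than the old one."""
    return before_s / after_s


if __name__ == "__main__":
    # Hypothetical stand-ins for the real entity-linking and relation-mapping stages.
    components = {
        "entity_linking": lambda s: s.split(),
        "relation_mapping": lambda s: s.lower(),
    }
    print(profile_components(components, "Einstein was born in Ulm."))
    # 5 s -> 1.8 s per sentence, as reported above.
    print(f"Overall speedup: {speedup(5.0, 1.8):.1f}x")
```

With the numbers above, going from about 5 seconds to about 1.8 seconds per sentence works out to roughly a 2.8x overall speedup.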

I have also added coreference resolution, as discussed in the TODOs blog. This should help resolve entity mentions (for example, a pronoun that refers back to an earlier entity), so we are less likely to miss important relations because an entity was not recognized. However, I haven't yet analysed how much time this component takes.
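As an illustration of why coreference helps downstream entity linking, here is a minimal sketch that rewrites each mention in a coreference cluster to the cluster's first mention, so a later "She" becomes "Marie Curie" before linking. It assumes the coreference model returns clusters as lists of non-overlapping (start, end) character spans; the actual output format depends on the library used, and `resolve_mentions` is a hypothetical helper, not the project's implementation. Choosing the first mention as the representative is also a simplifying assumption.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def resolve_mentions(text: str, clusters: List[List[Span]]) -> str:
    """Replace every mention in a cluster with the text of the cluster's first mention."""
    replacements = []
    for cluster in clusters:
        first_start, first_end = cluster[0]
        representative = text[first_start:first_end]
        for start, end in cluster[1:]:
            replacements.append((start, end, representative))
    # Apply replacements right-to-left so earlier character offsets stay valid.
    for start, end, representative in sorted(replacements, reverse=True):
        text = text[:start] + representative + text[end:]
    return text


if __name__ == "__main__":
    text = "Marie Curie was born in Warsaw. She won two Nobel Prizes."
    she = text.index("She")
    clusters = [[(0, 11), (she, she + 3)]]  # "Marie Curie" and "She"
    print(resolve_mentions(text, clusters))
```

Running the example prints "Marie Curie was born in Warsaw. Marie Curie won two Nobel Prizes.", so the second sentence now yields a linkable entity mention.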

Speedup

I found that simply using the transformers pipeline for GENRE, instead of calling the model and tokenizer individually, led to a 3.4x speedup in entity linking. I haven't pinned down the exact cause yet; it most likely comes from optimizations built into the transformers pipeline.
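For reference, here is a rough sketch of what the pipeline-based setup could look like. It assumes the `facebook/genre-linking-blink` checkpoint from the Hugging Face Hub and the `text2text-generation` task; the checkpoint the project actually uses may differ, and the beam settings are illustrative, not tuned. GENRE expects the mention to be wrapped in `[START_ENT]` / `[END_ENT]` markers, which the hypothetical `mark_mention` helper does. The constrained-decoding trie GENRE can optionally use is left out.

```python
def mark_mention(sentence: str, start: int, end: int) -> str:
    """Wrap sentence[start:end] in GENRE's entity markers."""
    return (sentence[:start] + "[START_ENT] " + sentence[start:end]
            + " [END_ENT]" + sentence[end:])


def build_genre_linker(model_name: str = "facebook/genre-linking-blink"):
    """Build a transformers pipeline for GENRE (assumed checkpoint name).

    transformers is imported lazily so mark_mention works without it installed.
    """
    from transformers import pipeline
    return pipeline("text2text-generation", model=model_name)


if __name__ == "__main__":
    linker = build_genre_linker()  # downloads the model on first use
    text = mark_mention("Einstein was a German physicist.", 15, 21)
    # Illustrative beam search settings; each candidate is a predicted entity title.
    for candidate in linker(text, num_beams=5, num_return_sequences=5):
        print(candidate["generated_text"])
```

The pipeline bundles tokenization, generation and decoding into one call, which is the main structural difference from driving the model and tokenizer by hand.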

Next up?

I will be looking to speed up and scale the pipeline further so that it can run on the entire Wikipedia.