NVIDIA TensorRT-LLM Coming To Windows, Brings Huge AI Boost To Consumer PCs Running GeForce RTX & RTX Pro GPUs

NVIDIA has announced that TensorRT-LLM is coming to Windows soon and will bring a huge AI boost to PCs running RTX GPUs.

NVIDIA RTX GPU-Powered PCs To Get Free AI Performance Boost In Windows With Upcoming TensorRT-LLM Support

Back in September, NVIDIA announced its TensorRT-LLM library for data centers, which offered an 8x gain on the industry's top AI GPUs such as the Hopper H100 and the Ampere A100. Taking full advantage of the tensor core acceleration featured on NVIDIA's GeForce RTX & RTX Pro GPUs, the latest version will deliver up to 4x faster performance in LLM inference workloads.

Earlier, we explained that one of the biggest updates TensorRT-LLM brings is a new scheduler known as in-flight batching, which allows work to enter and exit the GPU independently of other tasks. It allows several smaller queries to be processed dynamically while large compute-intensive requests are running on the same GPU. TensorRT-LLM makes use of optimized open-source models, which allow for higher speedups as batch sizes increase. Starting today, these optimized open-source models are available to the public for download at developer.nvidia.com.
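To make the scheduling idea concrete, here is a toy simulation of in-flight batching next to a static-batching baseline. It is only a sketch under stated assumptions and not NVIDIA's scheduler: the function names, the "one decode step per tick" cost model, and the rule that a static batch returns results only once its longest request is done are all made up for illustration.

```python
from collections import deque


def in_flight_batching(requests, max_batch):
    """Toy model: finished requests leave and waiting ones join on every step.

    `requests` is a list of (name, decode_steps) pairs with decode_steps >= 1.
    Returns a dict mapping each name to the step at which it finished.
    """
    waiting = deque(requests)
    active = {}          # name -> decode steps still needed
    finished_at = {}
    step = 0
    while waiting or active:
        # Fill any free batch slots before running the next step.
        while waiting and len(active) < max_batch:
            name, steps = waiting.popleft()
            active[name] = steps
        step += 1
        for name in list(active):
            active[name] -= 1
            if active[name] == 0:
                finished_at[name] = step
                del active[name]   # slot is free for the next step
    return finished_at


def static_batching(requests, max_batch):
    """Baseline (assumed): a batch is held until its longest request is done."""
    finished_at = {}
    step = 0
    for i in range(0, len(requests), max_batch):
        batch = requests[i:i + max_batch]
        step += max(steps for _, steps in batch)
        for name, _ in batch:
            finished_at[name] = step
    return finished_at


if __name__ == "__main__":
    # Hypothetical workload: one long request and three short ones.
    work = [("long", 8), ("a", 2), ("b", 2), ("c", 2)]
    print("in-flight:", in_flight_batching(work, max_batch=2))
    print("static:   ", static_batching(work, max_batch=2))
```

In this made-up workload the short requests finish while the long one is still running under in-flight batching, whereas the static baseline makes them wait for the long request or for the next batch.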

The added AI acceleration from TensorRT-LLM will help drive various daily productivity tasks such as engaging in chat, summarizing documents and web content, and drafting emails and blogs. It can also be used to analyze data and generate vast amounts of content using what is available to the model.

So how will TensorRT-LLM help consumer PCs running Windows? In a demo, NVIDIA compared a standard open-source pre-trained LLM such as LLaMa-2 with a model running on TensorRT-LLM. When a query is passed to LLaMa-2, it draws on what it learned from a large generalized dataset like Wikipedia. Such models have no up-to-date information from after they were trained, nor do they know about domain-specific datasets they weren't trained on. They certainly won't know about any data stored on your personal devices or systems, so you won't get the specific data that you are looking for.

There are two approaches to solving this problem. One is fine-tuning, where the LLM is optimized around a specific dataset, but that takes a lot of time depending on the size of the dataset. The other approach is RAG, or Retrieval-Augmented Generation, which uses a localized library that can be filled with the dataset you want the LLM to go through and then leverages the language understanding capabilities of that LLM to provide you with information that comes only from that dataset.
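The retrieval half of RAG can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how NVIDIA's demo is built: it ranks documents with a simple bag-of-words cosine similarity instead of learned embeddings, it stops at building the prompt rather than calling an LLM, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents from the local library most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(q, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Put the retrieved text in front of the question for the LLM to answer from."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Hypothetical local library; in practice this would be your own files.
    library = [
        "The office printer is on the second floor.",
        "The cafeteria opens at nine.",
    ]
    print(build_prompt("Where is the printer?", library))
```

The resulting prompt would then be passed to the LLM, so the answer is grounded in the local library rather than in whatever the model saw during training.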

In the example, a question is asked about the NVIDIA tech integrations within Alan Wake 2. The standard LLaMa-2 model is unable to find a proper answer, but the other model with TensorRT-LLM, which is fed data from 30 GeForce News articles in the local repository, provides the required information without any problems. So TensorRT-LLM provides a relevant answer and also does it faster than the LLaMa-2 model. Furthermore, NVIDIA confirmed that you can use TensorRT-LLM to accelerate almost any model. This is just one of the many use cases where NVIDIA TensorRT-LLM can leverage AI to deliver faster and more productive PC experiences in Windows, so stay tuned for more announcements in the future.

Written by Hassan Mujtaba

Source: Wccftech
