I have an Alienware R15: 32G DDR5, i9, RTX 4090. I was able to load a 70B GGML model, offloading 42 layers onto the GPU using oobabooga. After the initial load and first text generation, which is extremely slow at ~0.2 t/s, subsequent text generation is about 1.2 t/s. I noticed SSD activity (likely due to low system RAM) on the first text generation; there is virtually no SSD activity on subsequent text generations.

I'm thinking about upgrading the RAM to 64G, which is the max on the Alienware R15. Will it help, and if so, does anyone have an idea how much improvement I can expect? I appreciate any feedback or alternative suggestions.

For those wondering: I purchased 64G DDR5 and switched out my existing 32G. The RAM speed increased from 4.8GHz to 5.6GHz. Unfortunately, even with more RAM at higher speed, generation speed is about the same, 1-1.5 t/s.

```
llama_print_timings: prompt eval time = 24325.61 ms /  54 tokens (  450.47 ms per token,    2.22 tokens per second)
llama_print_timings:      sample time =    45.61 ms / 200 runs   (    0.23 ms per token, 4385.00 tokens per second)
llama_print_timings:        load time = 24325.80 ms
21:21:37 INFO:Loaded the model in 9.06 seconds.
```

Hope this helps someone considering upgrading RAM to get higher inference speed on a single 4090.
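For anyone reproducing the setup: oobabooga loads GGML models through llama-cpp-python, so the same partial offload can be expressed directly. A minimal sketch, assuming a quantized 70B file on disk (the model path below is a placeholder, not the file I used):

```python
# Minimal sketch of the partial-offload setup via llama-cpp-python,
# the loader oobabooga wraps for GGML models. The model path is a
# hypothetical placeholder; n_gpu_layers=42 matches the split above.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-70b.ggmlv3.q4_K_M.bin",  # hypothetical filename
    n_gpu_layers=42,  # layers pushed to the RTX 4090; the rest run from system RAM
    n_ctx=2048,       # context window
)

result = llm("Explain GPU layer offloading in one sentence.", max_tokens=200)
print(result["choices"][0]["text"])
```

The remaining layers run on the CPU, which is why system-RAM behavior, not the GPU, dominates the generation speed here.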
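As for why the faster RAM barely moved the needle: with 42 of the model's layers on the GPU, every generated token still has to stream the CPU-resident layers' weights out of system RAM, so the token rate is capped by memory bandwidth rather than capacity. A back-of-the-envelope sketch, where the model size, layer count, and bandwidth figures are illustrative assumptions rather than measurements:

```python
# Rough ceiling on the CPU-side token rate for a partially offloaded 70B
# model. All figures are illustrative assumptions, not measurements.
model_gb = 38.0      # ~70B parameters at 4-bit quantization
total_layers = 80    # Llama-2-70B transformer layer count (assumed model)
gpu_layers = 42      # offloaded, served from VRAM
cpu_gb = model_gb * (total_layers - gpu_layers) / total_layers  # bytes streamed from RAM per token

for label, bw_gbs in [("DDR5-4800 dual channel", 76.8),
                      ("DDR5-5600 dual channel", 89.6)]:
    # Each token reads every CPU-resident weight once, so the ideal
    # rate is bandwidth / bytes-per-token; real throughput is lower.
    print(f"{label}: <= {bw_gbs / cpu_gb:.1f} t/s from the CPU portion")
```

The 4800-to-5600 step is only about 17% more theoretical bandwidth, so even the ideal case predicts a small gain, consistent with generation staying in the same 1-1.5 t/s band.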