Hi Santosh
To narrow down the pain point, can you split the data load into two steps: first from the source up to the PSA, and then from the PSA to the data target?
This exercise will show whether the problem lies in extracting the data through the DataSource or in the routine/transformation while loading the data from the PSA to the cube.
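Once you have the start and end times of both steps, comparing them is simple. Below is only a rough sketch in Python, not anything BW-specific: the function names and the timestamps are made up for illustration, so replace them with the times you actually see for your own loads.

```python
from datetime import datetime

# Assumed timestamp format; adjust to however you note down the times.
FMT = "%Y-%m-%d %H:%M:%S"

def stage_seconds(start: str, end: str) -> float:
    """Return the elapsed seconds between two timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()

def slower_stage(extract_secs: float, load_secs: float) -> str:
    """Name the step that took longer."""
    if extract_secs > load_secs:
        return "extraction to PSA"
    if load_secs > extract_secs:
        return "PSA to data target"
    return "no clear difference"

# Placeholder timestamps, NOT real measurements.
extract = stage_seconds("2024-01-01 01:00:00", "2024-01-01 01:10:00")
load = stage_seconds("2024-01-01 01:10:00", "2024-01-01 02:40:00")
print(slower_stage(extract, load))
```

If the first step dominates, look at the extractor and the source system; if the second dominates, look at the routines/transformation.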
Performance results on development/integration systems are not always conclusive, as those systems have different environments, less data, and often reduced hardware, so it is likely that you did not encounter the performance issue there.
Please post the results if you have already performed the above-mentioned exercise.
Regards
Ashish