A company collects and transforms data files from third-party providers by using an on-premises SFTP server. The company uses a Python script to transform the data.
The company wants to reduce the overhead of maintaining the SFTP server and storing large amounts of data on premises. However, the company does not want to change the existing upload process for the third-party providers.
Which solution will meet these requirements with the LEAST development effort?
A company is reading data from various customer databases that run on Amazon RDS. The databases contain many inconsistent fields. For example, a customer record field that is named place_id in one database is named location_id in another database. The company wants to link customer records across different databases, even when many customer record fields do not match exactly.
Which solution will meet these requirements with the LEAST operational overhead?
A human resources company maintains a 10-node Amazon Redshift cluster to run analytics queries on the company’s data. The Amazon Redshift cluster contains a product table and a transactions table, and both tables have a product_sku column. The tables are over 100 GB in size. The majority of queries run on both tables.
Which distribution style should the company use for the two tables to achieve optimal query performance?
A company hosts an on-premises PostgreSQL database that contains historical data. An internal legacy application uses the database for read-only activities. The company’s business team wants to move the data to a data lake in Amazon S3 as soon as possible and enrich the data for analytics.
The company has set up an AWS Direct Connect connection between its VPC and its on-premises network. A data analytics specialist must design a solution that achieves the business team’s goals with the LEAST operational overhead.
Which solution meets these requirements?