Sunday, January 21, 2018

Near-Real Time Vendor Data On-Prem

The company I am working for has a copy of a vendor database on-prem for reporting and analysis.  The database is over 4 TB at this time, and constantly growing due to acquisitions and continued patient care.

Currently, our process for updating the data is log shipping, a tried and true method which has been successfully updating the database for years now.  Requirements are ever-changing, and there is a desire to have data that is updated more often during the day.  This article describes the current process, and explains how Availability Groups can be used to fulfill the requests.

Click here for a pdf that shows the current process, and both new processes, as explained in detail below.

Current Process = Log Shipping

  • Every hour the transaction logs are downloaded from the vendor's SFTP site directly to the database server.
  • Starting at 5pm every night, the restore process kicks off.
  • Once the restore is completed, the SQL job kicks off other SQL jobs which are ETL processes.  The data is updated in the DW staging database, in prep for the main ETL job to run.
  • Pros/Cons
    • Con: The database is not accessible during this time, which is normally 2-6 hours each night.
    • Pro: There are checkspoints in the process to help determine all log files have been received by comparing dates, another checks the lsn numbers to make sure the chain is not broken, etc., so the process is fairly straight forward.

New Process = SQL 2014 Availability Group

  • Create a new SQL server within our company's data center, but within the vendor's domain.
  • Configure the SQL server to be included in the same Availability Group as the production instance.  It will not allow failover, and will be set to asynchronous mode to ensure the application/production environment wouldn't be negatively affected.
  • From a security and responsibility point of view, the server would belong to the vendor.  Our company would not have access to remote into the server, nor to connect to the SQL instance.
  • The server would have 2 NICs, one to talk to the vendor domain, and the other to specifically talk to our DW server.  
  • The existing virtual SQL server will stay in place, along with the existing log shipping method.  The server will stay updated every 24 hours to help ensure if there is extended downtime, either scheduled or unexpected, we will still be able to access data that is within 24 hours.
  • With this setup, the only access that our company would have is a linked server connection from our DW server to the new server.  Therefore, automated queries (and some adHoc queries) would run using the linked server.  Any type of data structure discovery or development of queries would happen from the existing instance where our specific BI personnel have access to connect to the SQL instance.
  • Pros/Cons
    • Pro: Since the vendor's production instance already uses SQL Availability Groups, this process just adds another to the existing group.  Much less work for the vendor's DBA team.
    • Pro: Since the new server is within the vendor's domain, no extra work needs to be done on our company's IT staff for monitoring the server and SQL.
    • Con: With no direct access to the server, the queries that pull a large amount of data will need to be pulled through the network.  There are ways of mitigating this somewhat, including making sure everyone tunes their queries to do as much work on the linked server as possible so that the least amount of data is being brought across to the server unnecessarily.
    • Con: Any downtime for our company will need to be pre-planned with the vendor.  Since the server is in their control, they will have their own monitoring for both the server and for SQL.  They will need to be aware of anytime it is known that downtime is coming.

New Process = SQL 2016 Availability Group

* This would be either starting with SQL 2016 fresh, or starting with SQL 2014 and then upgrading to it when the vendor is prepared.

  • Create a new SQL server within our company's data center, within our company's domain.
  • Configure the SQL server to be included in the same Availability Group as the production instance.  It will not allow failover, and will be set to asynchronous mode to ensure the application/production environment wouldn't be negatively affected.
  • The server would have 2 NICs, one to talk to the vendor domain, and the other to specifically talk to our domain.
  • The existing virtual SQL server will stay in place, along with the existing log shipping method.  The server will stay updated every 24 hours to help ensure if there is extended downtime, either scheduled or unexpected, we will still be able to access data that is within 24 hours.
  • With this setup, the server is part of our company's domain and therefore falls within our responsibility.  SQL will need access to both domains for the Availability Group.
  • Pros/Cons
    • Pro: Since the vendor's production instance already uses SQL Availability Groups, this process just adds another to the existing group.  Much less work for the vendor's DBA team.
    • Pro: Since SQL 2016 allows an Availability Group to not require a Windows Server Failover Cluster, that removes the restriction of being in the vendor's domain.
    • Con: Any downtime for our company will need to be pre-planned with the vendor.  Since the server is in their control, they will have their own monitoring for both the server and for SQL.  They will need to be aware of anytime it is known that downtime is coming.

 

Archives

Powered by mod LCA