How to Delete duplicate Rows from Table

In the below table studId 1,6 and 9 is repeated which should be deleted.

studId studentName age
1 Mugil 35
2 Vinu 36
3 Viju 42
4 Mani 35
5 Madhu 36
6 Mugil 35
7 Venu 37
8 Banu 34
9 Mugil 35

Below query wont work on MySQL but the format doesn’t change. When you take Max only last occurrence of row would be taken and others would be excluded.

DELETE FROM tblStudents TS
 WHERE TS.studId NOT IN (SELECT MAX(TSS.studId)
                            FROM tblStudents TSS
                        GROUP BY TSS.studentName, TSS.age)c

The same could be done using MIN function.

SELECT TS.* FROM tblStudents TS
 WHERE TS.studId NOT IN (SELECT MIN(TSS.studId)
                           FROM tblStudents TSS
                          GROUP BY TSS.studentName, TSS.age)                        

Output

studId studentName age
6 Mugil 35
9 Mugil 35

Q1.Optimistic vs. Pessimistic locking
Optimistic Locking is a strategy where you read a record, take note of a version number (other methods to do this involve dates, timestamps or checksums/hashes) and check that the version hasn’t changed before you write the record back. When you write the record back you filter the update on the version to make sure it’s atomic. (i.e. hasn’t been updated between when you check the version and write the record to the disk) and update the version in one hit.If the record is dirty (i.e. different version to yours) you abort the transaction and the user can re-start it.

This strategy is most applicable to high-volume systems and three-tier architectures where you do not necessarily maintain a connection to the database for your session. In this situation the client cannot actually maintain database locks as the connections are taken from a pool and you may not be using the same connection from one access to the next.Optimistic locking doesn’t necessarily use a version number. Other strategies include using (a) a timestamp or (b) the entire state of the row itself. The latter strategy is ugly but avoids the need for a dedicated version column, in cases where you aren’t able to modify the schema.

Pessimistic Locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid Deadlocks. To use pessimistic locking you need either a direct connection to the database (as would typically be the case in a two tier client server application) or an externally available transaction ID that can be used independently of the connection.

In the latter case you open the transaction with the TxID and then reconnect using that ID. The DBMS maintains the locks and allows you to pick the session back up through the TxID.

Optimistic locking is used when you don’t expect many collisions. It costs less to do a normal operation but if the collision DOES occur you would pay a higher price to resolve it as the transaction is aborted.Pessimistic locking is used when a collision is anticipated. The transactions which would violate synchronization are simply blocked.
To select proper locking mechanism you have to estimate the amount of reads and writes and plan accordingly

Optimistic needs a three-tier architectures where you do not necessarily maintain a connection to the database for your session whereas Pessimistic Locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking you need either a direct connection to the database.optimistic (versioning) is faster because of no locking but (pessimistic) locking performs better when contention is high and it is better to prevent the work rather than discard it and start over.Optimistic locking works best when you have rare collisions

Q2.What is the Need for Indexing in Database Tables?
An index can be used to efficiently find all rows matching some column in your query and then walk through only that subset of the table to find exact matches. If you don’t have indexes on any column in the WHERE clause, the SQL server have to walk through the whole table and check every row to see if it matches, which may be a slow operation on big tables.The index can also be a UNIQUE index, which means that you cannot have duplicate values in that column, or a PRIMARY KEY which in some storage engines defines where in the database file the value is stored.

Q3.Clustered and Non Clustered Index
A Clustered index determines the physical order of data in a table.There can be only one clustered index per table (the clustered index IS the table). All other indexes on a table are termed non-clustered.A clustered index means you are telling the database to store close values actually close to one another on the disk. This has the benefit of rapid scan / retrieval of records falling into some range of clustered index values.

For example, you have two tables, Customer and Order:

Customer
----------
ID
Name
Address

Order
----------
ID
CustomerID
Price

If you wish to quickly retrieve all orders of one particular customer, you may wish to create a clustered index on the “CustomerID” column of the Order table. This way the records with the same CustomerID will be physically stored close to each other on disk (clustered) which speeds up their retrieval.The index on CustomerID will obviously be not unique, so you either need to add a second field to “uniquify” the index or let the database handle that for you but that’s another story.

Since the clustered index is actually related to how the data is stored, there is only one of them possible per table (although you can cheat to simulate multiple clustered indexes).

A non-clustered index is different in that you can have many of them and they then point at the data in the clustered index. You could have e.g. a non-clustered index at the back of a phone book which is keyed on (town, address)

You can have only one clustered index per table because this defines how the data is physically arranged. If you wish an analogy, imagine a big room with many tables in it. You can either put these tables to form several rows or pull them all together to form a big conference table, but not both ways at the same time. A table can have other indexes, they will then point to the entries in the clustered index which in its turn will finally say where to find the actual data.

Clustered Index

  1. Only one clustered index can be there in a table
  2. Sort the records and store them physically according to the order
  3. Data retrieval is faster than non-clustered indexes
  4. Do not need extra space to store logical structure

Non Clustered Index

  1. There can be any number of non-clustered indexes in a table
  2. Do not affect the physical order. Create a logical order for data rows and use pointers to physical data files
  3. Data insertion/update is faster than clustered index
  4. Use extra space to store logical structure

Q4.What is Staging/Factory table?
Staging table is a temporary table that is used to stage the data for temporary purpose just before loading it to the Target table from the Source Table. As the data resides temporarily, you can do various stuff on that data like
De-duping,Cleansing, Normalizing to multiple tables, De-Normalizing from multiple to a single table and Extrapolating

Q5.Staging vs Temp table?
Staging tables are permanent table just database tables containing your business data in some form or other. Staging is the process of preparing your business data, usually taken from some business application.Temporary tables can be created at runtime and can do the all kinds of operations that one normal table can do. But, based on the table types, the scope is limited. These tables are created inside tempdb database.When we are doing large number of row manipulation in stored procedures. This is useful to replace the cursor. We can store the result set data into a temp table, then we can manipulate the data from there. When we are having a complex join operation.

Permanent table is faster if the table structure is to be 100% the same since there’s no overhead for allocating space and building the table.

Temp table is faster in certain cases (e.g. when you don’t need indexes that are present on permanent table which would slow down inserts/updates)

Q6.What are different types of Tables?
Normal tables are exactly that, physical tables defined in your database.

Local temporary tables are temporary tables that are available only to the session that created them. These tables are automatically destroyed at the termination of the procedure or session that created them.

Global temporary tables are temporary tables that are available to all sessions and all users. They are dropped automatically when the last session using the temporary table has completed. Both local temporary tables and global temporary tables are physical tables created within the tempdb database.

Table variables are stored within memory but are laid out like a table. Table variables are partially stored on disk and partially stored in memory. It’s a common misconception that table variables are stored only in memory. Because they are partially stored in memory, the access time for a table variable can be faster than the time it takes to access a temporary table.

Q7. Procedures vs Functions?

Stored Procedures (SP) Functions (UDF – User Defined Function)
SP can return zero , single or multiple values. Function must return a single value (which may be a scalar or a table).
We can use transaction in SP We can’t use transaction in UDF.
SP can have input/output parameter. Only input parameter
We can call function from SP. We can’t call SP from function.
We can’t use SP in SELECT/ WHERE/ HAVING statement. We can use UDF in SELECT/ WHERE/HAVING statement.
We can use exception handling using Try-Catch block in SP. We can’t use Try-Catch block in UDF.

Q8.Table vs View?
Table is a preliminary storage for storing data and information in RDBMS. A table is a collection of related data entries and it consists of columns and rows.

A view is a virtual table whose contents are defined by a query. Unless indexed, a view does not exist as a stored set of data values in a database. The advantage of a view is that it can join data from several tables thus creating a new view of it

Advantages over table are

  1. We can combine columns/rows from multiple table or another view and have a consolidated view.
  2. Views can be used as security mechanisms by letting users access data through the view, without granting the users permissions to directly access the underlying base tables of the view
  3. It acts as abstract layer to downstream systems, so any change in schema is not exposed and hence the downstream systems doesn’t get affected.
  4. Instead of sending the complex query to the database all the time, you can save the query as a view and then SELECT * FROM view

Q9.Can View could be indexed?
Yes. View can be indexed.The big disadvantage of indexed views is that they are recreated every time the underlying table data changes. That restricts the use of indexed views to data that does not change often, typically in a data warehouse or business intelligence environment.

Q10.Which one is Faster Optimistic or Pessimistic?
Optimistic locking assumes concurrent transactions can complete without affecting each other. So Optimistic locking is faster because no locks are enforced while doing transactions.Optimistic locking does not cause transactions to wait for each other.Optimistic locking possibly causes a transaction to fail, but it does so without any “lock” ever having been taken.
The word “optimistic” derives from exactly the “I will not be taking actual locks because I hope they won’t be needed anyway. If it turns out I was wrong about that, I will accept the inevitable failure.”

Unidirectional – bidirectional relationship provides navigational access in one direction

  Parent -----> Child

i.e you can go from parent to child, but you cannot go back from children to parent.

However, if there were no pointer to Parent in Child:

class Child { }

Bidirectional – bidirectional relationship provides navigational access in both directions

  Parent <-----> Child

i.e you can go from a Parent to its child, and vice-versa: the parent knows about its child, the child knows about its parent

class Parent {
  Child* children;
}

class Child {
  Parent* parent;
}

Idle Scenarios
One to One unidirectional Mapping
employee knows the employer

One to One Bidirectional Mapping
employer knows the employee and employee knows the employer

One to Many unidirectional Mapping
employee has a skill in his skill set which is not used by other employees

One to Many bidirectional Mapping
employee has a skill in his skill set which is not used by other employees and
employer knows that employee has this skill in skill set

Many to One unidirectional Mapping
employees knows which employer he is going to work but employer has no idea about employee

employees will have employerId in their entity class but employer has no details of employee

Many to One bidirectional Mapping
Many employees work for one Employer. The employer knows about employee and employee knows about employer

employees will have employerId in their entity class but employer will have empid of employee

Many to Many unidirectional Mapping
Employee knows the employers he has worked for but employers does not know details of employee who worked for them

Many to Many bidirectional Mapping
Employee knows about the employers he has worked for and employers knows about employee who has worked for them.

  1. JDBC-ODBC Bridge Driver
  2. JDBC-Native API
  3. Network Protocol driver(Pure Java Driver)
  4. Thin driver(Pure Java Driver)

What is Difference between Thin client driver and Thick Client driver?
Thick driver – type 1 and type 2 JDBC drivers are called thick drivers.It provides JDBC access via ODBC drivers. ODBC binary code, and in many cases, database client code, must be loaded on each client machine that uses such a driver. and as the program is connected to database by an extra layer, hence thick drivers.

Thin drivers: Drivers converts JDBC calls into the network protocol, used directly by DBMS, allowing a direct call from the client machine to the DBMS server and providing a practical solution for intranet access.

What is the simplest SQL Query to find the second largest value?

SELECT MAX(col)
  FROM table
 WHERE col < (SELECT MAX(col)
                FROM table)
SELECT MAX(col) 
  FROM table 
 WHERE col NOT IN (SELECT MAX(col) FROM table);
Posted in SQL.

FOR cur IN (SELECT  process_Id 
              FROM  tbl_process)
LOOP
  SELECT (MAX(TO_NUMBER(substr(p.old_id, instr(p.old_id, '_', -1) + 1))))
    INTO new_id
    FROM tbl_process p
   WHERE process_Id = cur.process_Id;

  IF new_id IS NULL THEN
    new_id := cur.process_Id || '_1';
  ELSE
    new_id := new_id + 1;
    new_id := cur.process_Id || '_' || new_id;
  END IF;
END LOOP;

If you want a query which returns rows in column containing NULL or empty space then the using NULL alone in where clause will not return empty space rows. So the query should be altered as below

Actual Query

 SELECT COUNT (*) 
   FROM TABLE 
  WHERE COL_NAME IS NULL;

Altered Query

  SELECT COUNT (*) 
    FROM TABLE 
   WHERE trim(COL_NAME) IS NULL

(or)

Altered Query

  SELECT COUNT (*) 
    FROM TABLE  
    WHERE COL_NAME IS NULL OR 
          LENGTH(TRIM (COL_NAME)) = 0