Author Archives: Shubham Jha

Shubham Jha

Salesforce engineer. Love learning about new tech.

Data Skew in Salesforce

Data is being generated at an explosive pace, and storing and managing it keeps getting harder. Reports from multiple magazines and portals suggest that 90 percent of the world's data was created in the last two years alone. The pace keeps increasing, and we are slowly approaching a point where we may not be able to deal with all of this data. Salesforce is no exception: org instances hold large numbers of records driven by business process needs. When this data is not managed properly, the org drifts into a condition known as data skew.

What is Data Skew?

Data skew generally refers to a condition where data is distributed unevenly in a large data set. In Salesforce, data skew occurs when more than 10,000 child records are related to a single parent record, or when more than 10,000 records of any object are owned by a single Salesforce user. This skew leads to major performance hits and long-running processes, both of which should be avoided.
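The two conditions above can be checked with a simple count per parent or per owner. The following is a minimal sketch in Python, assuming the records have already been exported as plain dictionaries; the field names (AccountId, OwnerId), the sample IDs and the helper name find_skewed_keys are illustrative assumptions, not part of any Salesforce API.

```python
from collections import Counter

SKEW_THRESHOLD = 10_000  # the limit quoted in the definition above


def find_skewed_keys(records, key_field, threshold=SKEW_THRESHOLD):
    """Return {key: count} for every key referenced by more than `threshold` records."""
    counts = Counter(
        r[key_field] for r in records if r.get(key_field) is not None
    )
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data: 10,001 contacts parked under one account and owner,
# plus 50 contacts under another account and owner.
contacts = (
    [{"Id": f"C{i}", "AccountId": "ACC_MISC", "OwnerId": "USER_A"} for i in range(10_001)]
    + [{"Id": f"D{i}", "AccountId": "ACC_ACME", "OwnerId": "USER_B"} for i in range(50)]
)

skewed_parents = find_skewed_keys(contacts, "AccountId")  # parent-side skew
skewed_owners = find_skewed_keys(contacts, "OwnerId")     # ownership skew
print(skewed_parents)
print(skewed_owners)
```

The same function covers both cases because both are "too many records pointing at one key"; only the field being counted changes.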

Types of Data Skew in Salesforce

Three types of data skew exist in Salesforce which are as follows:

  1. Account Skew
  2. Ownership Skew
  3. Lookup Skew

 

1. Account Skew

This type of Salesforce data skew arises when a large number of child records sit under a single account record. It is a very common scenario, because it is tempting to place all unwanted or unassigned records under an account named ‘Miscellaneous’ or ‘Unassigned’. As easy and harmless as it may look, it can cause major issues such as record locking and slow sharing calculations. This is mainly because certain standard objects, like Opportunity and Account, have special data relationships that maintain record access under private sharing models. The problems you will face under account skew are:

  • Record Locking: When a large number of child records are updated in separate threads, the system locks both the child being updated and its parent record to maintain database integrity for each update. As a result, the parent record may be locked by one thread while another thread is trying to update a child of the same parent, and the second update has to wait or fails with a lock error.
  • Sharing Problems: When many child records are associated with a single parent record, a simple change in sharing settings can trigger a chain of time-consuming processes. Even a minor change, like updating the owner of a parent record, may cause all the sharing rules on the child records to be recalculated, along with the role hierarchy.

Possible Way for Avoiding Account Skew:

The way to avoid account skew is to distribute such child records across multiple accounts rather than accumulating them under a single record. An even distribution of child records across parent accounts protects the org against performance hits caused by account skew.
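One way to picture this even distribution is a round-robin assignment of unassigned child records to several "bucket" accounts instead of one. The sketch below is illustrative Python, not Salesforce code; the account IDs, the record counts and the function name spread_across_parents are assumptions made for the example.

```python
from collections import Counter
from itertools import cycle


def spread_across_parents(child_ids, parent_ids):
    """Assign each child to a parent in round-robin order.

    Returns a dict mapping child_id -> parent_id.
    """
    if not parent_ids:
        raise ValueError("at least one parent account is required")
    # cycle() repeats the parents endlessly; zip() stops when children run out.
    return dict(zip(child_ids, cycle(parent_ids)))


# Hypothetical data: 25,000 unassigned children and three bucket accounts.
child_ids = [f"C{i}" for i in range(25_000)]
bucket_accounts = ["ACC_UNASSIGNED_1", "ACC_UNASSIGNED_2", "ACC_UNASSIGNED_3"]

assignment = spread_across_parents(child_ids, bucket_accounts)
per_parent = Counter(assignment.values())
print(per_parent)
```

With three buckets, each account ends up with roughly a third of the records, which keeps every parent under the 10,000-child mark in this example. The number of buckets needed depends on how many records have to be parked.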

 

2. Ownership Skew
Ownership skew is another type of data skew that is very common in Salesforce. It occurs when more than 10,000 records are owned by a single Salesforce user. Since every record in Salesforce must have an owner, organizations often create a default owner or queue that receives all unassigned or unused records. This is a popular solution, and it works for small data sets, but it breaks down with large volumes of data. It increases the probability of performance issues whenever sharing settings change or a similar operation runs. For example, if a user who owns a large number of records is moved within the role hierarchy, the sharing rules for all records owned by that user are re-evaluated, which results in a long-running operation.

Possible Ways for Avoiding Ownership Skew:

  • The best way to avoid this kind of skew is to distribute such records evenly among multiple users rather than assigning them all to a single user.
  • If you have to keep a single owner, the performance impact can be reduced by not assigning that user (the record owner) to a role.
  • If the owner must have a role, try to keep the user at the top of the role hierarchy. This avoids the user being moved around the role hierarchy.
  • Make sure that the user is not a member of any public group which is acting as the source for a sharing rule.

 

3. Lookup Skew
Lookup skew is similar to account skew but can affect a broader range of objects. It happens when a large number of records are associated with a single record in the lookup object. Since lookup fields can exist on standard as well as custom objects, lookup skew can arise on any object in the organization. This happens regardless of whether the lookup exists on a single object or across multiple objects.

Possible Ways for Avoiding Lookup Skew:

  • One method is to distribute the skew across multiple lookup values. The main cause of the problem is that a large number of records look up to the same record. By providing additional lookup values to spread the load, record lock exceptions can be minimized or even eliminated.
  • Remove unnecessary workflow rules or Process Builder processes on the objects to reduce record save time. Also, make sure that synchronous Apex code and triggers are well optimized.
  • If the number of lookup values is small and fixed, you can use picklist values to represent them instead of lookup fields.

 

Conclusion

Data plays a crucial role in the business architecture of large organizations, so these problems are very common. By taking a few precautions while designing the architecture, data skew problems can be avoided. Keeping data evenly distributed is still the best bet for avoiding these skews and their repercussions.

Posted in Uncategorized.

Machine Learning vs. Artificial Intelligence-The identical twins or are they really?

Since you are reading this, I assume you are aware of, or have at least heard about, Machine Learning and Artificial Intelligence. As two of the hottest buzzwords in the industry right now, they are often used interchangeably, which leads to some confusion. However, the two terms have different meanings and applications. They are strongly related, though, through a containment relationship in which the former is a subset of the latter. Let's dive into these topics and try to find the reason for this confusion and how to resolve it.

Why this confusion?

The main culprit behind this confusion is the interchangeable use of the two terms and the limited knowledge of the subject among both developers and users. Artificial intelligence is heavily dependent on machine learning, which has led to the perception that both terms refer to the same thing. This confusion has spread like wildfire in the industry, and only experts in the field know the clear distinction between the terms.

Machine Learning and AI confusion

Artificial Intelligence-The Big Brother

Artificial Intelligence is intelligence demonstrated by machines that emulates human-like thinking and behavior, allowing them to make their own decisions in real-life situations. Going by the computer science definition, AI is the study of intelligent agents, which are devices that perceive their environment and take actions that maximize their chances of achieving their goals. These agents mimic certain cognitive functions that humans associate with the human mind, such as problem solving and learning. AI traditionally attempts to solve problems such as Reasoning, Knowledge Representation, Learning, Planning, Natural Language Processing etc. Creating an intelligent agent that can think like humans is the long-term goal, since it makes use of all the techniques mentioned above.

Continue reading

Posted in Artificial Intelligence, Machine Learning, Salesforce AI, Salesforce Einstein, Salesforce Machine Learning.

“Playing with the Sentiments”-a blog on Sentiment Analysis

People have always been interested in what other people think and what opinions they hold. Since the inception of the internet, increasing numbers of people have been using websites and social media platforms to express their opinions. Thanks to platforms such as Facebook, Twitter etc., it has become feasible to analyze and extract public opinion on a certain topic, news story, product, or brand. Opinions mined from such services can be valuable. The data can be analyzed and presented to easily identify the online mood (positive, negative or neutral). This allows individuals or businesses to be proactive rather than reactive when a negative conversational thread is emerging. Alternatively, positive sentiment can be leveraged to identify product advocates, as well as to shape the business strategy by showing which parts of it are working.
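To make the idea of labelling the online mood as positive, negative or neutral concrete, here is a minimal word-list sketch in Python. It is a deliberately naive illustration and not how any particular Salesforce or Einstein feature works; the word lists, the sample posts and the function name classify_sentiment are all assumptions made for the example.

```python
import re

# Hypothetical, tiny word lists used only for illustration.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def classify_sentiment(text):
    """Label text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "I love the new release, great job!",
    "Support was slow and the app is terrible.",
    "The update shipped on Tuesday.",
]
labels = [classify_sentiment(p) for p in posts]
print(labels)
```

A real system would need to handle negation, sarcasm and context, which is why trained models are normally used instead of fixed word lists.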

Salesforce Sentiment Analysis

Continue reading

Posted in Learn Salesforce, salesforce certified, salesforce consultant, Salesforce Einstein, Uncategorized.

Big Objects in Salesforce

Have you ever been in a situation where you needed to store a gigantic amount of data in your Salesforce org? Are you tired of using third-party storage systems and writing web service calls to fetch massive amounts of data? Having to crunch those large numbers, though a reflection of your own success, can be troublesome and can degrade performance. Salesforce has come to the rescue again by introducing big objects, which are objects with massive storage capabilities on the Salesforce platform itself. They provide consistent performance over data sets on the order of billions of records and are accessible through a standard set of APIs to your org or to an external system.

Continue reading

Posted in apex develeopment, force.com app development, Salesforce, sfdc.

Salesforce & IoT

“Anything that can be connected, will be connected.” This quote by Jacob Morgan of Forbes.com sums up what the future holds as IoT continues to grow. Simply put, the Internet of Things is a system of electronic devices or components that are interconnected through the Internet and capable of exchanging data and information. The devices range from cellphones, coffee makers, headphones and smart wearables to machine components such as an airplane engine. IoT allows these devices to be sensed and controlled remotely over existing network infrastructure, which creates the opportunity to integrate them directly with other devices or networks. So basically, IoT is a giant network of connected things.

Why use IoT & How big is it?

A survey conducted by HP estimated that the growth of IoT will be exponential and that, by 2025, over one trillion devices will be connected through IoT. Another report by Cisco predicts that IoT will generate $14.4 trillion in value across all industries in the next decade. These estimates point towards a fully automated future.

Continue reading

Posted in Apex Development, Salesforce, salesforce consultant, salesforce for healthcare, salesforce for small business, Salesforce.com, Visualforce.

NLP – Natural Language Processing

For a long time, engineers have been striving to make machines perform tasks that human beings do, which led to the birth of the field of machine learning. Understanding the language humans speak is a vital part of this field. The branch of computer science that deals with human-machine interaction, and in particular with programs that can process natural language efficiently, is known as Natural Language Processing, usually abbreviated as NLP.

NLP sits at the intersection of computer science, artificial intelligence and computational linguistics. “By utilizing Natural Language Processing algorithms, developers can organize and structure textual data to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.” (En.wikipedia.org, 2017)

Natural Language Processing is characterized as a hard problem in computer science, since human language is rarely precise or plainly spoken. To understand human language, one must understand not only the words but also their meaning and context, and how they interconnect to form meaning. The vagueness and ambiguity of human language make it difficult for computers to learn, even though humans learn it easily.
 

Components of NLP

There are two components of NLP which are listed as follows:

    • Natural Language Understanding (NLU)
      This involves understanding the different aspects of the language and mapping natural-language input text to useful representations. It is the harder of the two components, since it has to deal with the ambiguity and complexity of the language. There are three main levels of ambiguity:

          1. Word-level or Lexical Ambiguity
          2. Syntax Level or Parsing Ambiguity
          3. Referential Ambiguity

 

    • Natural Language Generation (NLG)
      As evident from the name, NLG is the process of producing or generating meaningful phrases and sentences in the form of natural language. It involves text planning, sentence planning and text realization.

 

NLP Terminology

Syntax: It refers to the arrangement of words that form a sentence. It also involves determining the structural role of each word in the sentence.

Phonology: It is the study of how sounds are organized systematically.

Morphology: It is the study of how words are constructed from primitive meaningful units.

Semantics: It deals with the meaning of words and how they can be joined/combined to form meaningful sentences.

Discourse: This determines how the immediately preceding sentence can affect the interpretation of the next sentence.

Pragmatics: This deals with how the interpretation of a sentence changes according to the situation.

 

What can developers use NLP algorithms for?

    • Summarizing blocks of text to extract the meaningful information from the given text, ignoring the remaining non-relevant text
    • Understanding the input and generating the output in Chatbots
    • Deriving the sentiment of a piece of text using Sentiment analysis
    • Breaking up large text into simpler tokens such as sentences or words
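The last item in the list, breaking text into sentences and words, can be sketched with nothing more than regular expressions. The Python below is a naive illustration using only the standard library; the function names and the sample text are assumptions, and the libraries described in the next section handle abbreviations, punctuation and other languages far more robustly.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace.

    Naive on purpose: abbreviations such as 'e.g.' would be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Return lowercased word tokens, keeping internal apostrophes (e.g. don't)."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", sentence.lower())


text = "NLP is hard. Human language is rarely precise! Don't machines struggle?"
sentences = split_sentences(text)
tokens = [split_words(s) for s in sentences]
print(sentences)
print(tokens)
```

Tokens like these are the usual input for the other tasks in the list, such as summarization or sentiment analysis.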

 

Some Open Source NLP Libraries

    • Apache OpenNLP
      It is a Java-based machine learning toolkit provided by Apache that supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution. OpenNLP also includes maximum entropy and perceptron-based machine learning. It provides built-in Java classes for each piece of functionality, as well as a command line interface for testing the pre-built agents.

 

    • Natural Language Toolkit (NLTK)
      It is a platform for building Python programs to read and process human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.

 

    • Stanford CoreNLP
      Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words and their parts of speech, mark up the structure of sentences in terms of phrases and syntactic dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get the quotes people said, etc.

 

    • MALLET
      MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. Apart from classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields.

 

These are a few of the many open source libraries and toolkits for Natural Language Processing that developers can use in their applications.

In conclusion, Natural Language Processing is an important part of the artificial intelligence field and deserves attention from anyone who wants to master Machine Learning or Artificial Intelligence.

 

References

 

Posted in Agile, Learn Salesforce.